Biogen-MDCK
Dataset: Download it here.
Dataset description: 2,642 compounds with measured efflux ratio values in the MDR1-MDCK assay, released by Biogen.
Dataset preprocessing
- Download the original dataset from here; it contains 2,642 compounds with measured efflux ratio values.
- Use RDKit to convert the SMILES strings to their canonical forms (most are already canonical).
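
The canonicalization step can be sketched roughly as follows. This is a minimal illustration, not the exact script used to prepare the dataset: the helper name `canonicalize` and the example SMILES are hypothetical, and the sketch assumes RDKit's default canonical SMILES settings.

```python
from rdkit import Chem


def canonicalize(smiles):
    """Return RDKit's canonical SMILES, or None if the input cannot be parsed."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return Chem.MolToSmiles(mol)


# Hypothetical inputs; the real SMILES would be read from the downloaded file.
raw_smiles = ["OCC", "c1ccccc1", "C(C)(=O)O"]
canonical_smiles = [canonicalize(s) for s in raw_smiles]

# Count how many inputs were already in canonical form.
unchanged = sum(raw == canon for raw, canon in zip(raw_smiles, canonical_smiles))
print(f"{unchanged} of {len(raw_smiles)} SMILES were already canonical")
```

Returning `None` for unparseable input, rather than raising, makes it easy to flag problem rows for inspection before deciding how to handle them.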
Reference
- C. Fang, Y. Wang, R. Grater, S. Kapadnis, C. Black, P. Trapa, and S. Sciabola, Prospective validation of machine learning algorithms for absorption, distribution, metabolism, and excretion prediction: An industrial perspective, Journal of Chemical Information and Modeling 63, 3263 (2023).