Biogen-RLM
Dataset: Download it here.
Dataset description: 3,054 compounds with measured intrinsic clearance, reported as \(\log \text{CL}_\text{int}\) with \(\text{CL}_\text{int}\) in units of mL/min/kg, released by Biogen.
Dataset preprocessing
- Download the original dataset from here, which contains 3,054 compounds with available measurements;
- Use RDKit to convert the SMILES strings to their canonical forms (most are already canonical).
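The canonicalization step above can be sketched as follows. This is a minimal illustration, not the exact script used to prepare the dataset: the helper names `canonicalize_smiles` and `canonicalize_all` and the example SMILES are hypothetical, and RDKit's default `MolToSmiles` settings are assumed.

```python
from rdkit import Chem


def canonicalize_smiles(smiles):
    """Return RDKit's canonical SMILES, or None if the input cannot be parsed."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    # MolToSmiles produces canonical SMILES by default.
    return Chem.MolToSmiles(mol)


def canonicalize_all(smiles_list):
    """Canonicalize a list of SMILES and count how many strings were changed."""
    canonical = [canonicalize_smiles(s) for s in smiles_list]
    n_changed = sum(
        1 for s, c in zip(smiles_list, canonical) if c is not None and c != s
    )
    return canonical, n_changed


# Hypothetical inputs for illustration; these are not taken from the dataset.
examples = ["OCC", "c1ccccc1O", "C1=CC=CC=C1"]
canonical, n_changed = canonicalize_all(examples)
print(canonical, n_changed)
```

Returning `None` for unparsable strings, rather than raising, makes it easy to inspect any problematic entries before deciding how to handle them.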
Reference
- C. Fang, Y. Wang, R. Grater, S. Kapadnis, C. Black, P. Trapa, and S. Sciabola, Prospective validation of machine learning algorithms for absorption, distribution, metabolism, and excretion prediction: An industrial perspective, Journal of Chemical Information and Modeling 63, 3263 (2023).