Biogen-hPPB
Dataset: Download it here.
Dataset description: 194 compounds with measured values of the logarithm of the percentage unbound to human plasma proteins (hPPB), released by Biogen.
Dataset preprocessing
- Download the original dataset from here; it contains 194 compounds with available measurements;
- Use RDKit to convert the SMILES strings to their canonical forms (most are already canonical).
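The canonicalization step can be sketched roughly as follows. This is a minimal illustration, not the script actually used to build the dataset: the function names, the `SMILES` column key, and the choice to raise on unparsable entries are all assumptions made for the example. It relies only on RDKit's `Chem.MolFromSmiles` and `Chem.MolToSmiles`, and it counts how many strings change, which gives a way to check the remark that most SMILES are already canonical.

```python
def rdkit_canonical(smiles):
    """Return RDKit's canonical SMILES, or None if the input cannot be parsed."""
    # Imported lazily so the helper below can be used (and tested) without RDKit.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    # MolToSmiles produces canonical SMILES by default.
    return Chem.MolToSmiles(mol)


def canonicalize_rows(rows, smiles_key="SMILES", canonicalize=rdkit_canonical):
    """Return copies of the rows with canonical SMILES, plus the number changed.

    `smiles_key` is a hypothetical column name; adjust it to the actual file.
    """
    out, changed = [], 0
    for row in rows:
        canonical = canonicalize(row[smiles_key])
        if canonical is None:
            # Assumption: treat unparsable SMILES as an error rather than dropping them.
            raise ValueError(f"Unparsable SMILES: {row[smiles_key]!r}")
        if canonical != row[smiles_key]:
            changed += 1
        # Copy the row so the input records are left untouched.
        out.append({**row, smiles_key: canonical})
    return out, changed
```

With the downloaded file loaded as a list of dictionaries (for example via `csv.DictReader`), `canonicalize_rows(rows)` returns the rewritten records and the count of SMILES strings that were not already in canonical form.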
Reference
- C. Fang, Y. Wang, R. Grater, S. Kapadnis, C. Black, P. Trapa, and S. Sciabola, Prospective validation of machine learning algorithms for absorption, distribution, metabolism, and excretion prediction: An industrial perspective, Journal of Chemical Information and Modeling 63, 3263 (2023).