AZ(AstraZeneca)-lipo
Dataset: Download it here.
Dataset description: 4,195 compounds and their measured \(\log D_{7.4}\) values, i.e. the octanol-water distribution coefficient at pH 7.4, measured using a shake-flask method and deposited by AstraZeneca in ChEMBL.
Dataset preprocessing
- Extract the raw dataset from ChEMBL 34 using assay ID CHEMBL3301363, which contains 4,200 records;
- For the 3 compounds with 2 different \(\log D_{7.4}\) values, drop the 2 compounds (4 rows) whose two values differ widely (by 0.44 and 0.57 log units):
| logD7.4 | canonical_smiles |
|---|---|
| 3.04 | CN1CCC[C@@H]1CCOC@(c1ccccc1)c1ccc(Cl)cc1 |
| 3.48 | CN1CCC[C@@H]1CCOC@(c1ccccc1)c1ccc(Cl)cc1 |
| -0.66 | CN1[C@@H]2CC[C@H]1CC@@HC2 |
| -0.09 | CN1[C@@H]2CC[C@H]1CC@@HC2 |
and take the average of the two values for the third compound, which differ by only 0.10 log units (mean 2.21):
| logD7.4 | canonical_smiles |
|---|---|
| 2.16 | C=C[C@H]1CN2CC[C@H]1C[C@H]2C@Hc1ccnc2ccc(OC)cc12 |
| 2.26 | C=C[C@H]1CN2CC[C@H]1C[C@H]2C@Hc1ccnc2ccc(OC)cc12 |
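
The duplicate handling described above can be sketched with pandas. This is a minimal illustration, not the script used to build the dataset: the compound keys (`cpd_A` to `cpd_D`) are placeholders rather than real SMILES, the function name `deduplicate` is hypothetical, and the 0.3 log-unit cutoff is an assumption, since the source only says "large difference". Any cutoff between 0.10 and 0.44 would give the same outcome for the three compounds listed.

```python
import pandas as pd

# Toy records mirroring the three duplicated compounds above, plus one
# compound measured once. Keys are placeholders, not real SMILES.
records = pd.DataFrame(
    {
        "canonical_smiles": [
            "cpd_A", "cpd_A", "cpd_B", "cpd_B", "cpd_C", "cpd_C", "cpd_D",
        ],
        "logD7.4": [3.04, 3.48, -0.66, -0.09, 2.16, 2.26, 1.50],
    }
)

# Assumed cutoff in log units; the source does not state a threshold.
MAX_SPREAD = 0.3


def deduplicate(df: pd.DataFrame, max_spread: float = MAX_SPREAD) -> pd.DataFrame:
    """Drop compounds whose replicate values disagree, average the rest."""
    stats = df.groupby("canonical_smiles", sort=False)["logD7.4"].agg(
        ["min", "max", "mean"]
    )
    spread = stats["max"] - stats["min"]
    # Keep compounds whose replicates agree within the cutoff
    # (single measurements have a spread of 0 and are always kept).
    kept = stats.loc[spread <= max_spread, "mean"]
    return kept.round(2).rename("logD7.4").reset_index()


cleaned = deduplicate(records)
print(cleaned)
```

In this toy example 7 rows become 2, a loss of 5 rows (4 dropped, 2 merged into 1), which matches the reduction from 4,200 to 4,195 described above.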
Reference
- M. Wenlock and N. Tomkinson, Experimental in vitro DMPK and physicochemical data on a set of publicly disclosed compounds.