Note:

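  • All SMILES strings are canonical;
  • The “x” (input) column is “canonical_smiles”, and the “y” (target) column for each task is listed in the tables below;
  • N: number of data points in each task.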
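
The sketch below shows how one task might be read into parallel lists of SMILES and labels. It assumes that each task is distributed as a UTF-8 CSV file with a header row, and that the curly quotes in the tables only delimit the column name and are not part of it. Check both assumptions against the linked dataset files. `load_task` is a hypothetical helper, not part of any released package.

```python
import csv
from typing import List, Tuple


def load_task(path: str, y_column: str,
              x_column: str = "canonical_smiles") -> Tuple[List[str], List[str]]:
    """Hypothetical helper: return parallel lists of SMILES and labels.

    Assumes a comma-separated file with a header row. Labels are returned as
    strings so that regression targets (e.g. logS) and classification targets
    (e.g. low_solubility) load the same way; convert them afterwards as needed.
    """
    smiles, labels = [], []
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        # Fail loudly on a mistyped column name instead of returning nothing.
        for column in (x_column, y_column):
            if column not in header:
                raise KeyError(f"column {column!r} not found in {path}")
        for row in reader:
            label = row[y_column]
            # Skip rows with a blank label (assumption: blank means "not measured").
            if label is None or label.strip() == "":
                continue
            smiles.append(row[x_column])
            labels.append(label)
    return smiles, labels
```

For example, `load_task("ESOL.csv", "logS")` would return the SMILES and logS values. The file name here is hypothetical. Multi-word targets such as `LOG HLM_CLint (mL/min/kg)` are passed the same way, without the surrounding quotes.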

ADME

Aqueous solubility

Task        N       y               Dataset  Preprocessing
ESOL        1,084   logS            here     here
EPA-sol     10,093  logS            here     here
AZ-sol      1,763   logS            here     here
Biogen-sol  2,173   logS            here     here
NCATS-sol   2,453   low_solubility  here     here
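
If logS is the base-10 logarithm of molar solubility in mol/L, the usual convention for ESOL, then the compound's molecular weight converts it to mg/mL. The units for the other solubility sets are an assumption here and should be checked against each preprocessing note. The function name below is hypothetical.

```python
def logs_to_mg_per_ml(log_s: float, mol_weight: float) -> float:
    """Convert log10 molar solubility (assumed mol/L) to mg/mL.

    mol/L * g/mol = g/L, and 1 g/L is numerically equal to 1 mg/mL.
    """
    if mol_weight <= 0:
        raise ValueError("molecular weight must be positive")
    molar = 10.0 ** log_s  # S in mol/L
    return molar * mol_weight  # g/L, i.e. mg/mL
```

As an example, logS = -2 for a compound with a molecular weight of 200 g/mol gives 0.01 mol/L. That is 2 g/L, or 2 mg/mL.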

Lipophilicity

Task     N      y        Dataset  Preprocessing
AZ-lipo  4,195  logD7.4  here     here

Permeability

Task               N      y                             Dataset  Preprocessing
CSU-Caco2          1,018  logPapp                       here     here
USTL-Caco2         1,780  logPapp                       here     here
Biogen-MDCK        2,642  “LOG MDR1-MDCK ER (B-A/A-B)”  here     here
NCATS-PAMPA-pH7.4  2,033  low_moderate_permeability     here
NCATS-PAMPA-pH5    486    low_permeability              here
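
The Biogen-MDCK target is the logarithm of the MDR1-MDCK efflux ratio, ER = Papp(B-A) / Papp(A-B). The sketch below assumes a base-10 logarithm. The function names are hypothetical.

```python
import math


def efflux_ratio(papp_b_to_a: float, papp_a_to_b: float) -> float:
    """ER = Papp(B-A) / Papp(A-B); both permeabilities must share units."""
    if papp_a_to_b <= 0 or papp_b_to_a <= 0:
        raise ValueError("permeabilities must be positive")
    return papp_b_to_a / papp_a_to_b


def log_efflux_ratio(papp_b_to_a: float, papp_a_to_b: float) -> float:
    """Base-10 log of the efflux ratio (base 10 is an assumption)."""
    return math.log10(efflux_ratio(papp_b_to_a, papp_a_to_b))


def efflux_ratio_from_log(log_er: float) -> float:
    """Invert the log transform to recover ER from a dataset value."""
    return 10.0 ** log_er
```

A log ER of 0 means transport is equal in both directions (ER = 1). Positive values mean B-A transport exceeds A-B transport, which indicates net efflux.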

Plasma protein binding (PPB)

Task         N      y                                                 Dataset  Preprocessing
AZ-rPPB      717    log_pct_unbound                                   here
Biogen-rPPB  168    “LOG PLASMA PROTEIN BINDING (RAT) (% unbound)”    here     here
AZ-dPPB      244    log_pct_unbound                                   here
AZ-mPPB      162    log_pct_unbound                                   here
AZ-hPPB      1,614  log_pct_unbound                                   here     here
Biogen-hPPB  194    “LOG PLASMA PROTEIN BINDING (HUMAN) (% unbound)”  here     here
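
All of the PPB targets above are log-transformed percent unbound. The sketch below assumes base 10 and converts a value back to percent unbound, fraction unbound (fu), and percent bound. The helper names are hypothetical.

```python
def pct_unbound(log_pct_unbound: float) -> float:
    """Back-transform log10(% unbound) to % unbound (base 10 assumed)."""
    return 10.0 ** log_pct_unbound


def fraction_unbound(log_pct_unbound: float) -> float:
    """fu in [0, 1]: % unbound divided by 100."""
    return pct_unbound(log_pct_unbound) / 100.0


def pct_bound(log_pct_unbound: float) -> float:
    """% bound = 100 - % unbound."""
    return 100.0 - pct_unbound(log_pct_unbound)
```

For instance, a value of 1.0 corresponds to 10% unbound. That gives fu = 0.1 and 90% bound.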

Hepatocyte stability

Task   N    y                                  Dataset  Preprocessing
AZ-rH  837  “LOG RH_CLint (uL/min/1E6 cells)”  here
AZ-hH  407  “LOG HH_CLint (uL/min/1E6 cells)”  here

Liver microsomal stability

Task        N      y                            Dataset  Preprocessing
NCATS-rLM   2,528  unstable                     here
Biogen-rLM  3,054  “LOG RLM_CLint (mL/min/kg)”  here     here
AZ-hLM      1,102  “LOG HLM_CLint (mL/min/g)”   here
Biogen-HLM  3,087  “LOG HLM_CLint (mL/min/kg)”  here     here

CYP450 interactions

Task                   N      y              Dataset
CYP1A2_CHEMBL1741322   9,600  pchembl_value  here
CYP2C9_CHEMBL1614027   2,898  pchembl_value  here
CYP2C9_CHEMBL1741325   7,220  pchembl_value  here
CYP2C19_CHEMBL1613777  3,518  pchembl_value  here
CYP2C19_CHEMBL1741323  8,850  pchembl_value  here
CYP2D6_CHEMBL1614110   3,343  pchembl_value  here
CYP2D6_CHEMBL1741321   5,461  pchembl_value  here
CYP3A4_CHEMBL1613886   6,471  pchembl_value  here
CYP3A4_CHEMBL1614108   6,471  pchembl_value  here
CYP3A4_CHEMBL1741324   8,628  pchembl_value  here
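
ChEMBL defines the pChEMBL value as -log10 of the molar activity (IC50, EC50, Ki, Kd and similar measures). A pchembl_value can therefore be converted back to a concentration, as sketched below. The helper names are hypothetical.

```python
import math


def pchembl_to_nanomolar(pchembl: float) -> float:
    """Invert pChEMBL = -log10(activity in mol/L); return the activity in nM."""
    # 10**(-p) mol/L * 1e9 nM per (mol/L) = 10**(9 - p) nM
    return 10.0 ** (9.0 - pchembl)


def nanomolar_to_pchembl(nanomolar: float) -> float:
    """Forward transform from a concentration in nM to a pChEMBL-style value."""
    if nanomolar <= 0:
        raise ValueError("concentration must be positive")
    return 9.0 - math.log10(nanomolar)
```

A pchembl_value of 6 corresponds to 1,000 nM (1 µM). Each unit increase means a tenfold lower concentration, which indicates higher potency.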

Acute toxicity

NCATS-LD50

Task        N       y                                            Dataset
rat-SC      1,886   “rat_subcutaneous_LD50_(−log(mol/kg))”       here
rat-IV      2,464   “rat_intravenous_LD50_(−log(mol/kg))”        here
rat-IP      5,001   “rat_intraperitoneal_LD50_(−log(mol/kg))”    here
rat-oral    10,151  “rat_oral_LD50_(−log(mol/kg))”               here
mouse-SC    6,754   “mouse_subcutaneous_LD50_(−log(mol/kg))”     here
mouse-IV    16,967  “mouse_intravenous_LD50_(−log(mol/kg))”      here
mouse-IP    36,267  “mouse_intraperitoneal_LD50_(−log(mol/kg))”  here
mouse-oral  23,350  “mouse_oral_LD50_(−log(mol/kg))”             here

Note:

  • See here for the data extraction steps;
  • SC: subcutaneous; IV: intravenous; IP: intraperitoneal
  • LD50: median lethal dose
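
The sketch below assumes that “−log(mol/kg)” in the column names denotes the negative base-10 logarithm of the LD50 in mol/kg. Under that assumption, a value converts back to the more familiar mg/kg given the molecular weight. Because of the negative log, larger target values correspond to smaller lethal doses, that is, more toxic compounds. The helper names are hypothetical.

```python
import math


def ld50_mg_per_kg(neg_log_mol_per_kg: float, mol_weight: float) -> float:
    """Convert -log10(LD50 in mol/kg) to LD50 in mg/kg (base 10 assumed).

    mol/kg * g/mol = g/kg; multiplying by 1000 gives mg/kg.
    """
    if mol_weight <= 0:
        raise ValueError("molecular weight must be positive")
    mol_per_kg = 10.0 ** (-neg_log_mol_per_kg)
    return mol_per_kg * mol_weight * 1000.0


def neg_log_ld50(mg_per_kg: float, mol_weight: float) -> float:
    """Forward transform: LD50 in mg/kg to -log10(mol/kg)."""
    if mg_per_kg <= 0 or mol_weight <= 0:
        raise ValueError("dose and molecular weight must be positive")
    return -math.log10(mg_per_kg / (mol_weight * 1000.0))
```

For example, a target value of 3.0 for a compound of 250 g/mol corresponds to 1e-3 mol/kg. That is 0.25 g/kg, or 250 mg/kg.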