Online OCHEM multi-task model for solubility and lipophilicity prediction of platinum complexes

Journal of Inorganic Biochemistry, Volume 269, August 2025, 112890

Highlights

First online model to predict water solubility of platinum Pt(II)/Pt(IV) complexes.

Updated online model to predict lipophilicity of platinum Pt(II)/Pt(IV) complexes.

Rigorous time-split validation of water solubility model with post-2017 compounds.

New water solubility data for 18 platinum complexes.

Multitask model predicts solubility and lipophilicity simultaneously.
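
The time-split validation named in the highlights can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the record fields (`year`, `logS`), the toy values, the mean-value baseline standing in for a trained model, and the choice to place 2017 itself in the test set are all assumptions made for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def time_split(records, cutoff_year=2017):
    """Split records into (historical, prospective) sets by publication year.

    Assumption: records dated cutoff_year or later go to the prospective set.
    """
    train = [r for r in records if r["year"] < cutoff_year]
    test = [r for r in records if r["year"] >= cutoff_year]
    return train, test


# Hypothetical toy data; the values are invented and carry no chemical meaning.
records = [
    {"year": 2010, "logS": -2.0},
    {"year": 2014, "logS": -3.0},
    {"year": 2016, "logS": -4.0},
    {"year": 2018, "logS": -1.0},
    {"year": 2021, "logS": -5.0},
]

train, test = time_split(records)

# Placeholder "model": predict the training-set mean for every test compound.
baseline = sum(r["logS"] for r in train) / len(train)
test_rmse = rmse([r["logS"] for r in test], [baseline] * len(test))
print(f"train={len(train)} test={len(test)} prospective RMSE={test_rmse:.2f}")
```

Because the model only ever sees the earlier compounds, the prospective RMSE measures how well it extrapolates to chemistry reported later, which is typically a harder test than random cross-validation.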

Abstract

Predicting the solubility and lipophilicity of platinum(II, IV) complexes is essential for prioritizing potential anticancer candidates in drug discovery. This study introduces the first publicly available online model for predicting the solubility of platinum complexes, addressing the lack of literature and models for this property. Using a time-split dataset, we developed a consensus model with a Root Mean Squared Error (RMSE) of 0.62 in 5-fold cross-validation on a training set of 284 historical compounds (solubility data reported prior to 2017). However, the RMSE increased to 0.86 when the model was applied to a prospective test set of 108 compounds reported after 2017. Further analysis of the high prediction errors revealed that they are primarily attributable to the underrepresentation of novel chemical scaffolds, particularly Pt(IV) derivatives, in the training set. For instance, a series of eight phenanthroline-containing compounds, not covered by the training set's chemical space, had an RMSE of 1.3. When the model was redeveloped using a combined dataset, the RMSE of this series decreased significantly to 0.34 under the same validation protocol. Additionally, we developed an interpretable linear model to identify structural features and functional groups that influence the solubility of platinum complexes. We further validated the correlation between solubility and lipophilicity, consistent with the Yalkowsky General Solubility Equation. Building on these insights, we developed a final multitask model that simultaneously predicts solubility and lipophilicity as two endpoints with RMSE = 0.62 and 0.44, respectively. The data and the final model are available at https://ochem.eu/article/31.

Graphical abstract

Water solubility measurements of Pt(II)/Pt(IV) complexes were used to develop consensus models using descriptors and representation-learning methods. While the chemical series structurally distinct from the training set showed large prediction errors (a), they were not outliers in the model developed from the extended chemical space (b).

Keywords

Platinum Pt(II)/Pt(IV) complexes

Water solubility

Lipophilicity

Consensus model

Neural networks

Representation learning

Abbreviations

ADMET

Absorption, Distribution, Metabolism, Excretion, and Toxicity

GIT

gastrointestinal tract

MPNN

message passing neural network

OCHEM

Online Chemical Modeling Environment

QSPR

quantitative structure-property relationship

CNN

Convolutional Neural Network

SMILES

Simplified Molecular Input Line Entry System

SMARTS

SMILES Arbitrary Target Specification

ASNN

Associative Neural Network

InChI

International Chemical Identifier

RMSE

Root Mean Squared Error

EFG

Extended Functional Group
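
For reference, the Yalkowsky General Solubility Equation mentioned in the abstract is commonly written in the form below. This is the textbook form, formulated for organic non-electrolytes; the abstract states only that the observed solubility-lipophilicity correlation is consistent with it, so how the authors applied it to platinum complexes is not specified here. The symbols are the usual ones: S_w is the molar aqueous solubility, MP the melting point in degrees Celsius, and K_ow the octanol-water partition coefficient.

```latex
\log_{10} S_w = 0.5 - 0.01\,(\mathrm{MP} - 25) - \log_{10} K_{ow}
```

As a worked example with hypothetical values, a compound with MP = 125 °C and log K_ow = 1.0 gives log S_w = 0.5 - 0.01 × 100 - 1.0 = -1.5. At a fixed melting point, each unit increase in lipophilicity lowers the predicted log solubility by one unit, which is the inverse relationship a model predicting both endpoints together can exploit.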

© 2025 The Author(s). Published by Elsevier Inc.
