A central challenge of modern physicochemical biology is to establish the relationship between protein structure and function. Recent developments in molecular modeling techniques have had a substantial impact on this effort. However, to perform high-performance computations and to implement quantum mechanical and molecular dynamics simulations, it is essential to understand the flexibility of protein structure. There is now no doubt that proteins in living systems exist as ensembles of different conformers. The variety of their properties cannot be explained by a single static structure determined, for example, by X-ray diffraction. The same considerations apply not only to the general task of constructing protein structures, but also to more specific problems, such as the structural organization of individual regions (Kopylov et al., 2023). For example, mobile sites in unresolved 3D structures are of particular interest.
However, exploring the complex and high-dimensional free energy landscapes of proteins remains a challenging task in computational biology. Conventional methods, such as homology modeling and ab initio approaches like Rosetta, have contributed significantly to the field by predicting protein structures and providing insights into conformational states. These methods have laid the groundwork for understanding protein dynamics and continue to be indispensable in structural biology. More recently, state-of-the-art techniques utilizing neural networks (NNs) have introduced new opportunities for improving the efficiency and accuracy of conformational sampling and modeling. Enhanced sampling techniques, such as metadynamics, combined with these advancements, have proven invaluable for overcoming free energy barriers and sampling rare events.
Metadynamics relies on the definition of collective variables (CVs) to bias molecular dynamics simulations, but selecting optimal CVs often requires expert knowledge and is prone to limitations. Neural networks, with their ability to model high-dimensional data and learn non-linear relationships, offer a solution by enabling the automatic discovery of CVs. For instance, deep learning models such as State Predictive Information Bottlenecks have been integrated with Bias Exchange Metadynamics (BE-metaD) to uncover complex protein folding and unfolding pathways, revealing intricate details of protein conformational dynamics (Pomarici et al., 2023). Similarly, AlphaFold-guided slow feature analysis, combined with metadynamics, has accelerated the sampling of protein–ligand interactions and allosteric transitions (Vats et al., 2024).
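To make the role of a collective variable concrete, the following is a minimal sketch of the history-dependent bias used in standard (fixed-height, not well-tempered) metadynamics for a single one-dimensional CV s: V(s) = sum over k of h * exp(-(s - s_k)^2 / (2 * sigma^2)), where s_k are the CV values at which Gaussians were deposited. The function names, the default height and width, and the use of NumPy are illustrative assumptions and are not taken from any of the cited works.

```python
import numpy as np

def metad_bias(s, centers, height=1.0, sigma=0.1):
    """Sum of Gaussians of fixed height and width deposited at `centers`.

    s       : CV value(s) at which the bias is evaluated
    centers : CV values visited at previous deposition steps (may be empty)
    """
    s = np.atleast_1d(np.asarray(s, dtype=float))
    centers = np.asarray(centers, dtype=float)
    diff = s[:, None] - centers[None, :]          # shape (n_points, n_gaussians)
    return height * np.exp(-0.5 * (diff / sigma) ** 2).sum(axis=1)

def metad_force(s, centers, height=1.0, sigma=0.1):
    """Bias force -dV/ds along the CV; it pushes the system away from visited values."""
    s = np.atleast_1d(np.asarray(s, dtype=float))
    centers = np.asarray(centers, dtype=float)
    diff = s[:, None] - centers[None, :]
    gauss = np.exp(-0.5 * (diff / sigma) ** 2)
    return height * (diff / sigma ** 2 * gauss).sum(axis=1)

# Hypothetical usage: two Gaussians already deposited near s = 0.
visited = [0.0, 0.05]
grid = np.linspace(-0.5, 0.5, 5)
print(metad_bias(grid, visited))
print(metad_force(grid, visited))
```

Because the bias is a function of s alone, any barrier along a direction that the chosen CV does not resolve is left unbiased, which is why CV selection is decisive for the method.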
Methods capable of generating conformational ensembles for specific protein sequences have also been developed, such as tools derived from structure prediction algorithms and generative neural networks (Zhang et al., 2023, Zheng et al., 2024, Jing et al., 2024). These approaches provide insights into the range of possible structural states a protein can adopt. However, their application to molecular dynamics simulations and their ability to restore conformations from trajectory data remain underexplored. Such methods may also prove invaluable in generating additional configurational variability, extending the exploratory phase space accessible to metadynamics. By broadening the range of conformational states available for sampling, these techniques can enhance the resolution and accuracy of free energy landscape exploration.
In addition to CV optimization, neural networks enhance the exploration of metastable states and conformational transitions. Frameworks such as ANN-COLVAR (Trapl et al., 2019), VAMPnet (Chen et al., 2019), Markov state models (Wang et al., 2024), TS-DAR (Liu et al., 2025) and deep autoencoders (Bandyopadhyay and Mondal, 2021, Mansoor et al., 2024) have been applied to protein folding simulations and the discovery of metastable ensembles, demonstrating their ability to resolve detailed features of energy landscapes. Time-lagged autoencoders (TLAEs) (Wehmeyer and Noé, 2018) and deep time-lagged independent component analysis (Deep-TICA) (Bonati et al., 2021) are machine learning techniques increasingly applied to identify slow, collective motions by learning temporal dependencies and nonlinear transformations from molecular dynamics trajectories. By automating the extraction of relevant motions, TLAEs and Deep-TICA enhance the efficiency of free energy calculations and pathway identification, providing valuable insights into protein function and dynamics. These methods are computationally expensive, however: they may require large datasets and significant training time, and selecting optimal lag times and model hyperparameters demands domain expertise. Generative models, including neural network-based dynamic ensemble predictors, have been particularly successful in studying rare events such as cryptic pocket formations and protein flexibility (Zheng et al., 2023, Wang and Tiwary, 2021).
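The idea of selecting slow motions from time-lagged data can be illustrated with ordinary linear TICA, the linear method that Deep-TICA generalizes with a neural network. The sketch below is a simplified NumPy version under stated assumptions: a single trajectory, a symmetrized covariance estimator, and a global mean; the function name `tica` and its parameters are hypothetical and do not correspond to the API of any cited package.

```python
import numpy as np

def tica(X, lag, n_components=1, eps=1e-10):
    """Linear TICA on a trajectory X of shape (n_frames, n_features).

    Returns the leading eigenvalues (autocorrelations at the given lag)
    and the corresponding projection vectors, slowest first.
    """
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                    # remove the mean of each feature
    A, B = X[:-lag], X[lag:]                  # time-lagged pairs (x_t, x_{t+lag})
    n = len(A)
    C0 = (A.T @ A + B.T @ B) / (2 * n)        # symmetrized instantaneous covariance
    Ct = (A.T @ B + B.T @ A) / (2 * n)        # symmetrized time-lagged covariance
    # Whiten with C0 so the generalized problem Ct v = lambda C0 v becomes standard.
    w, V = np.linalg.eigh(C0)
    keep = w > eps                            # drop numerically degenerate directions
    W = V[:, keep] / np.sqrt(w[keep])
    lam, U = np.linalg.eigh(W.T @ Ct @ W)
    order = np.argsort(lam)[::-1][:n_components]
    return lam[order], W @ U[:, order]

# Hypothetical usage: one slow and one fast synthetic feature.
t = np.arange(2000)
X = np.column_stack([np.sin(2 * np.pi * t / 500), (-1.0) ** t])
eigenvalues, components = tica(X, lag=1)
slow_cv = (X - X.mean(axis=0)) @ components[:, 0]   # projection onto the slowest mode
```

The choice of `lag` plays the same role as the time-lag hyperparameter discussed above: too short a lag fails to separate slow from fast motions, while too long a lag leaves few effective samples.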
The integration of neural networks with metadynamics has also been transformative in studying ligand binding thermodynamics and kinetics. For example, machine learning-guided metadynamics has elucidated binding modes and free energy landscapes in drug-target interactions, offering mechanistic insights into receptor-ligand systems (Siddiqui et al., 2023). Furthermore, deep learning models have been applied to study allosteric mechanisms in G-protein coupled receptors (GPCRs), linking conformational changes to biological functions (Ghorbani et al., 2022). These advances highlight the potential of combining neural networks with metadynamics to address the limitations of traditional sampling techniques. Neural networks enable automated CV discovery, dynamic modeling of conformational ensembles, and high-resolution characterization of protein energy landscapes. By integrating data-driven methods with traditional molecular dynamics, this approach has broad applications in drug discovery, enzyme engineering, and the fundamental understanding of biomolecular processes.
Although collective variables derived from the latent space of conventional variational autoencoders have been successfully applied in metadynamics (Bandyopadhyay and Mondal, 2021), their general feasibility remains to be thoroughly explored. We propose to use hyperspherical variational autoencoders (Davidson et al., 2018) to pack the available conformational space into a format that can be effectively utilized as a collective variable in metadynamics. A recent study applied a hyperspherical latent space to prevent the dispersion loss term from pushing data points infinitely far apart (Liu et al., 2025): 684 pairwise distances were used as input features for a DNA motor protein in order to identify transition states. We expect, however, that the structure of the hyperspherical latent space can better compact the conformational paths defined by the dihedral angles of amino acid residues (Liu et al., 2025), and thus has the potential to be incorporated directly into sampling procedures, including metadynamics. The presented approach does not require prior system-specific knowledge and enables the reconstruction of the energy surface between states based on the learned representations of the protein's conformational dynamics. The adequacy of the suggested approach was verified by characterizing the folding of the Trp-cage protein, modeling the equilibrium between conformational states of ubiquitin, and studying the conformational states of mobile loops in the active center of flavin-dependent 2-hydroxybiphenyl-3-monooxygenase (HbpA).
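Two ingredients of this idea can be sketched without reference to a specific network architecture, which is not described in this section. The example below assumes that backbone dihedral angles are encoded as sine/cosine pairs to respect their periodicity, and that a latent vector is placed on the unit hypersphere by L2 normalization, as is done for the mean direction in hyperspherical autoencoders. All function names are hypothetical, and the encoder itself is omitted.

```python
import numpy as np

def dihedral_features(angles_rad):
    """Encode dihedral angles (radians) as interleaved (sin, cos) pairs.

    Accepts shape (n_angles,) or (n_frames, n_angles); the last axis is doubled.
    """
    a = np.asarray(angles_rad, dtype=float)
    pairs = np.stack([np.sin(a), np.cos(a)], axis=-1)
    return pairs.reshape(*a.shape[:-1], -1)

def to_unit_sphere(z, eps=1e-12):
    """Project latent vector(s) onto the unit hypersphere by L2 normalization."""
    z = np.asarray(z, dtype=float)
    norm = np.linalg.norm(z, axis=-1, keepdims=True)
    return z / np.maximum(norm, eps)

def geodesic_distance(u, v):
    """Great-circle distance between two points on the unit hypersphere."""
    u, v = to_unit_sphere(u), to_unit_sphere(v)
    return np.arccos(np.clip(np.sum(u * v, axis=-1), -1.0, 1.0))

# Hypothetical usage: phi/psi angles of one frame, and two latent points.
features = dihedral_features([-1.0, 2.5])          # 4 features for 2 angles
print(features)
print(geodesic_distance([1.0, 0.0, 0.0], [0.0, 2.0, 0.0]))
```

Because every point on the hypersphere has unit norm, the latent coordinates are bounded, which is a convenient property for a variable that is later biased on a fixed grid.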