Numerous cellular functions, such as transcription, translation, DNA replication and repair, and gene regulation, rely on interactions between proteins and nucleic acids [1, 2, 3]. Disruption and misregulation of these interactions have been linked to a number of human diseases [4, 5, 6]. This motivates studies that decipher their molecular underpinnings, including the identity of the interacting amino acids and nucleotides. While these details can be learned from structures of protein–nucleic acid complexes [7,8], such structures are available for only a small fraction of interactions. Correspondingly, computational methods that predict nucleic acid–binding residues in protein sequences have been developed to help address this gap [9, 10, 11, 12, 13, 14, 15∗∗, 16, 17, 18∗]. Around 90 predictors of DNA-binding residues (DBRs) and RNA-binding residues (RBRs) in protein sequences were released over the last two decades [15]. There are also numerous methods that predict DBRs and RBRs in protein structures, with recently released examples including GraphBind [19], GeoBind [20], and PNAbind [21]. Moreover, some recent tools, such as GraphSite [16], GLMSite [22], GPSFun [23], and EquipNAS [24], make these predictions using protein structures that are predicted from sequences with AlphaFold2 [25] and ESMFold [26], thereby indirectly generating predictions from sequences. A recent study showed that the sequence-based methods have been applied in a wide range of projects, with 21 predictors cited over 100 times each [15]. For instance, our DisoRDPbind [27] was recently used to investigate the impact of mutations on chromosome condensation and segregation [28], the liquid–liquid phase separation propensities of herpes simplex virus-1 proteins [29], and the abundance and functions of intrinsic disorder in polyomaviruses [30].
Another popular method, ProNA2020 [31], was applied to identify novel overlapping genes [32], to study the type VI secretion systems that are linked to inflammatory bowel disease [33], and to study the CRISPR/Cas9 system in the context of hepatitis B virus disruption [34]. Predictors of RBRs and DBRs are readily accessible as web servers and/or standalone code, and their precomputed predictions are now available via online databases, such as DescribePROT [35,36] and GPSiteDB [18]. These developments have accelerated recently, with 20 new predictors published in 2023 and 2024. We focus on the sequence-based methods and investigate these recent advances, highlighting trends in the scope, predictive models, availability, and accuracy of the recently released tools. We also discuss several aspects that need further attention to move this active field of research forward.