RNA aptamers are single-stranded oligonucleotides, typically 20 to 80 bases in length [1,2]. Their ability to bind specifically and with exceptional affinity not only to proteins but also to a diverse range of other biomolecules makes them a promising alternative to monoclonal antibodies. Compared with antibodies, RNA aptamers have distinct advantages and disadvantages. They offer greater stability at room temperature, resistance to proteases, and lower synthesis cost. They also exhibit minimal batch-to-batch variation in mass production, are easy to modify to refine specificity and affinity, and can target molecules inaccessible to antibodies. On the other hand, they are susceptible to nuclease-mediated degradation, which leads to a short half-life in vivo. Consequently, many RNA aptamers require chemical modifications to suit specific applications. Several review articles compare the advantages and disadvantages of RNA aptamers and monoclonal antibodies in more detail [1,3,4]. Among the diverse potential targets of RNA aptamers, proteins are the most prevalent and crucial. Numerous successful applications of protein-targeting RNA aptamers have emerged in domains such as pathogen and cancer recognition, environmental contamination screening, biosensors, and fundamental research. We therefore focus this review on computational methods related to protein-RNA aptamer complexes.
The primary sequences of RNA aptamers are intrinsic to protein binding, and their identification is crucial to the success of aptamer screening efforts. The gold-standard methodology for RNA aptamer sequence discovery is in vitro systematic evolution of ligands by exponential enrichment (SELEX). Several comprehensive reviews on SELEX and its numerous variants provide in-depth discussions of their limitations and the corresponding technical adjustments made for specific application domains [5,6]. Despite its dominant role in aptamer sequence discovery, the success rate of SELEX is relatively low, with only an approximately 30% chance of recovering a sequence with satisfactory affinity [6,7]. To address this low hit rate, machine learning methods, particularly deep learning (DL)-based methods, have been developed to explore the latent sequence space, which is only partially covered by SELEX data, with the aim of identifying sequences with high affinity and specificity for the target protein.
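To give a sense of why SELEX data can cover only part of the sequence space, the short sketch below counts the possible RNA sequences of a given length (4^N for the four bases) and compares that number with a library size. The library size of 10^15 molecules is an illustrative assumption made for this example, not a figure taken from the studies cited here, and the function names are hypothetical.

```python
def sequence_space_size(length: int, alphabet_size: int = 4) -> int:
    """Number of distinct sequences of the given length (A, C, G, U)."""
    return alphabet_size ** length


def covered_fraction(length: int, library_size: int) -> float:
    """Upper bound on the fraction of sequence space a library can sample.

    Assumes every molecule in the library is a unique sequence, which is
    an optimistic simplification.
    """
    return min(1.0, library_size / sequence_space_size(length))


# Assumption for illustration only: a starting library of 10**15 molecules.
ASSUMED_LIBRARY_SIZE = 10**15

if __name__ == "__main__":
    for n in (20, 40, 80):
        space = sequence_space_size(n)
        frac = covered_fraction(n, ASSUMED_LIBRARY_SIZE)
        print(f"length {n}: {space:.3e} sequences, covered fraction <= {frac:.3e}")
```

Under this assumption, a library could in principle enumerate all 20-mers, but it would sample only a vanishing fraction of the sequences at the upper end of the 20 to 80 base range, which is the gap that generative and predictive models attempt to fill.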
Post-SELEX optimization involves altering the sequence or making chemical modifications to enhance the functionality of the aptamer. Because the affinity and specificity of RNA aptamers are structure-dependent, high-resolution structures of the aptamer and its complex with the target serve as the foundation for any rational optimization approach. Experimental methods for atomic-resolution structure determination, i.e., X-ray crystallography, nuclear magnetic resonance (NMR), and cryo-electron microscopy (cryo-EM), are time-consuming and resource-intensive, which makes them unsuitable for an iterative aptamer modification workflow. Higher-throughput experimental methods such as small-angle X-ray scattering (SAXS) and chemical footprinting offer valuable structural insights despite their lower resolution. For instance, SAXS reveals particle morphology, while chemical footprinting identifies key binding interactions, and both serve as useful constraints for structure modeling. Efficient computational methods, in contrast, have been extensively employed in aptamer research and have proven highly effective in streamlining the post-SELEX optimization process [8, 9, 10]. For instance, Vasconcelos et al. recently performed an in silico analysis of interactions between human transferrin receptor-1 (hTfR) and an aptamer-RNA conjugate to investigate the mechanism of hTfR-specific, aptamer-based delivery of therapeutics [11]. The authors used RNA tertiary structure modeling, molecular docking, and molecular dynamics (MD) simulations to reveal atomic details of the interactions. Dixit et al. combined experimental techniques with protein structure modeling and molecular docking to study the interactions of HIV-1 integrase (IN) with INI1/SMARCB1 and TAR RNA [12]. These studies underscore the significance of computational methods in RNA aptamer research and illustrate a general paradigm of employing computational methods for protein-RNA recognition and modeling, as depicted in Figure 1.
In this review, we delve into the computational methodologies employed for the design of RNA aptamer sequences, as well as the prediction of RNA tertiary structures and protein-RNA complex structures. The discussion encompasses the technical aspects of both conventional methods and DL-based approaches used in these areas.