Systematic Benchmarking of Deep-learning Methods for Tertiary RNA Structure Prediction

Akash Bahai, Chee Keong Kwoh, Yuguang Mu, Yinghui Li.

Abstract

The 3D structure of RNA critically influences its functionality, and understanding this structure is vital for deciphering RNA biology. Experimental methods for determining RNA structures are labour-intensive, expensive, and time-consuming. Computational approaches have emerged as valuable tools, leveraging physics-based principles and machine learning to predict RNA structures rapidly. Despite advancements, the accuracy of computational methods remains modest, especially when compared to protein structure prediction.

Introduction

RNA molecules are essential players in various cellular processes, extending beyond their initial role as passive carriers of genetic information. Their diverse functions, including gene expression regulation, enzymatic activity, and regulatory mechanisms, have highlighted the importance of understanding RNA at a structural level. The three-dimensional (3D) structure of RNA plays a critical role in determining its function. Unlike DNA, RNA molecules can act as single-stranded entities, folding into intricate, dynamic architectures that dictate their biological activity and interactions with other molecules.

Methods

We compiled a list of deep-learning (DL) based methods for RNA structure prediction from the recent literature and implemented them locally on our systems to allow a large-scale comparison across multiple targets. We also included two fragment-assembly-based (non-ML) methods to allow an overall comparison of deep-learning methods against traditional methods. The details of the benchmarked methods are provided in Table 1. In the following, we describe the methods in more detail.

Discussion

In this study, we conducted a comprehensive benchmarking of various RNA structure prediction methods across diverse datasets, each presenting a different level of difficulty. While our primary focus was on deep-learning-based approaches, we also incorporated fragment-assembly methods to compare the effectiveness of machine learning (ML) with that of traditional techniques. We evaluated seven 3D RNA structure prediction methods on three datasets, encompassing a total of 66 target RNAs.
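A benchmark of this shape (several methods, several datasets, many targets) is typically summarised by pooling per-target scores for each method within each dataset. The following is a minimal sketch of that bookkeeping, not the authors' analysis scripts. The method names, dataset names, target identifiers, and score values are all hypothetical placeholders, and the score is assumed to be a distance-like metric for which lower is better.

```python
from collections import defaultdict
from statistics import mean

# Toy records: (method, dataset, target_id, score).
# All names and values are made up for illustration only.
records = [
    ("method_A", "dataset_1", "t1", 4.0),
    ("method_A", "dataset_1", "t2", 6.0),
    ("method_A", "dataset_2", "t3", 10.0),
    ("method_B", "dataset_1", "t1", 5.0),
    ("method_B", "dataset_1", "t2", 9.0),
    ("method_B", "dataset_2", "t3", 8.0),
]


def mean_score_by_method_and_dataset(rows):
    """Average the per-target scores for every (method, dataset) pair."""
    grouped = defaultdict(list)
    for method, dataset, _target, score in rows:
        grouped[(method, dataset)].append(score)
    return {key: mean(scores) for key, scores in grouped.items()}


def rank_methods(summary, dataset):
    """Order methods on one dataset by ascending mean score.

    Assumes a lower score is better (as for a distance-like metric).
    """
    scored = [(m, s) for (m, d), s in summary.items() if d == dataset]
    return [m for m, _ in sorted(scored, key=lambda pair: pair[1])]


summary = mean_score_by_method_and_dataset(records)
for (method, dataset), value in sorted(summary.items()):
    print(f"{method}\t{dataset}\t{value:.2f}")
```

Keeping the results in a flat list of records makes it straightforward to add further methods, datasets, or metrics without changing the aggregation code; for a metric where higher is better, the sort order in `rank_methods` would need to be reversed.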

Acknowledgments

The computational work for this article was performed on resources of the National Supercomputing Centre (NSCC), Singapore (https://www.nscc.sg).

Citation: Bahai A, Kwoh CK, Mu Y, Li Y (2024) Systematic benchmarking of deep-learning methods for tertiary RNA structure prediction. PLoS Comput Biol 20(12): e1012715. https://doi.org/10.1371/journal.pcbi.1012715

Editor: Yang Zhang, National University of Singapore, SINGAPORE

Received: June 11, 2024; Accepted: December 10, 2024; Published: December 30, 2024.

Copyright: © 2024 Bahai et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The FASTA sequences of the target RNAs from all the datasets, the PDB files of the models predicted by all seven methods, the comparison metrics, and the comparative analysis scripts are all available at https://github.com/akashbahai/rna_benchmarking. The benchmarked prediction methods are available at the following links: RoseTTAFold2NA at https://github.com/uw-ipd/RoseTTAFold2NA; DeepFoldRNA at https://github.com/robpearc/DeepFoldRNA; DRfold at https://github.com/leeyang/DRfold/; RhoFold at https://github.com/Dharmogata/RhoFold; trRosettaRNA at https://yanglab.qd.sdu.edu.cn/trRosettaRNA/; RNAComposer at https://rnacomposer.cs.put.poznan.pl/; 3dRNA at http://biophy.hust.edu.cn/new/3dRNA.

Funding: This study is supported by the Nanyang Technological University Singapore, under its Accelerating Creativity and Excellence (ACE) award (https://www.ntu.edu.sg/research/research-careers/accelerating-creativity-and-excellence-(ace)) - Project ID: NTU-ACE2021-07, awarded to Y.L., M.Y. and K.C.K. This study is also partially funded by the Ministry of Education, Singapore (https://www.moe.gov.sg/), under its Academic Research Fund Tier 1 - Project ID: RG33/20, awarded to Y.L. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.