Dongmei Li, Martin S. Zand, Timothy D. Dye, Maciej L. Goniewicz, Irfan Rahman, Zidian Xie
RNA-seq is a high-throughput sequencing technology widely used for gene transcript discovery and quantification under different biological or biomedical conditions. A fundamental research question in most RNA-seq experiments is the identification of differentially expressed genes among experimental conditions or sample groups. Numerous statistical methods for RNA-seq differential analysis have been proposed since the emergence of the RNA-seq assay.
High-throughput transcriptome sequencing technologies have had a profound impact on our ability to address an increasingly diverse range of biological and biomedical problems, and have improved our understanding of human diseases by capturing an accurate picture of molecular processes within the cell. RNA-seq has become a major assay for measuring relative transcript abundance and diversity, and is now a standard tool in the life sciences research community.
Materials and Methods:
For our purposes, consider the null hypothesis of no differential gene expression. Among m hypothesis tests, m0 represents the number of tests for which no differential expression exists (i.e., the number of "true null hypotheses"). R represents the number of rejected null hypotheses, and V represents the number of tests that result in false rejections (i.e., the number of false discoveries). S represents the number of tests that result in true rejections (i.e., the number of true discoveries), so that R = V + S.
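The quantities above can be tallied directly when the truth is known, as in a simulation. The following is a minimal sketch, not the authors' code: the function name `discovery_counts` and its inputs are hypothetical, the false discovery proportion is taken as V/R (set to 0 when R = 0, so that the FDR is its expectation over repeated experiments), and power is assumed to mean S/(m - m0), the fraction of truly differentially expressed genes that are detected.

```python
from typing import Dict, Sequence


def discovery_counts(is_true_null: Sequence[bool],
                     rejected: Sequence[bool]) -> Dict[str, float]:
    """Tally m, m0, R, V, S for one set of tests with known truth.

    is_true_null[i] is True when gene i is NOT differentially expressed.
    rejected[i] is True when the test for gene i rejected the null.
    (Both inputs are illustrative assumptions, not from the article.)
    """
    if len(is_true_null) != len(rejected):
        raise ValueError("inputs must have the same length")

    m = len(is_true_null)                      # total number of tests
    m0 = sum(bool(x) for x in is_true_null)    # true null hypotheses
    R = sum(bool(x) for x in rejected)         # all rejections
    # False discoveries: rejected although the null is true.
    V = sum(1 for null, rej in zip(is_true_null, rejected) if null and rej)
    S = R - V                                  # true discoveries

    # False discovery proportion for this one experiment; 0 when R = 0.
    fdp = V / R if R > 0 else 0.0
    # Assumed definition of power: share of non-null genes detected.
    power = S / (m - m0) if m > m0 else float("nan")

    return {"m": m, "m0": m0, "R": R, "V": V, "S": S,
            "fdp": fdp, "power": power}


if __name__ == "__main__":
    truth = [True, True, True, False, False, False]
    calls = [True, False, False, True, True, False]
    print(discovery_counts(truth, calls))
```

Averaging `fdp` over many simulated data sets gives an empirical estimate of the FDR for a given method and scenario.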
We evaluated eight commonly used RNA-seq differential analysis methods in this study through both simulation studies and real RNA-seq data examples. We compared the FDR control, power, apparent test power, and stability of the eight methods under different scenarios with varied library sizes, distribution assumptions, and sample sizes. Our studies show that library size has little effect on performance, because all of the methods adjust for library size.
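To illustrate what adjusting for library size means in its simplest form, the sketch below rescales raw counts to counts per million (CPM). This is an illustrative assumption only: the evaluated methods each use their own normalization scheme, and the function name `counts_per_million` is hypothetical.

```python
from typing import List, Sequence


def counts_per_million(counts: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale a genes-by-samples count table to counts per million.

    counts[g][j] is the raw read count of gene g in sample j.
    Each sample (column) is divided by its own library size, so
    samples sequenced to different depths become comparable.
    """
    if not counts:
        return []
    n_samples = len(counts[0])
    if any(len(row) != n_samples for row in counts):
        raise ValueError("all rows must have the same number of samples")

    # Library size = total reads assigned to genes in each sample.
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    if any(size == 0 for size in lib_sizes):
        raise ValueError("a sample has zero total counts")

    return [[row[j] * 1e6 / lib_sizes[j] for j in range(n_samples)]
            for row in counts]


if __name__ == "__main__":
    # Sample 2 has twice the sequencing depth of sample 1,
    # but the same relative expression for every gene.
    raw = [[10, 20], [30, 60], [60, 120]]
    for row in counts_per_million(raw):
        print(row)
```

In this toy table the second sample has twice the depth of the first but identical relative expression, so both columns coincide after rescaling.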
We thank the Center for Integrated Research Computing at the University of Rochester for providing high performance computing resources. We also would like to thank the anonymous reviewers for their insightful comments and suggestions that helped to further improve our manuscript.
Citation: Li D, Zand MS, Dye TD, Goniewicz ML, Rahman I, Xie Z (2022) An evaluation of RNA-seq differential analysis methods. PLoS ONE 17(9): e0264246. https://doi.org/10.1371/journal.pone.0264246
Editor: Dov Joseph Stekel, University of Nottingham, UNITED KINGDOM
Received: February 5, 2022; Accepted: August 30, 2022; Published: September 16, 2022.
Copyright: © 2022 Li et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The summarized RNA-seq count data from two mouse strains in neuroscience research can be downloaded from http://bowtie-bio.sourceforge.net/recount/countTables/bottomly_count_table.txt. The R code for simulations and real data analysis can be found in the GitHub repository https://github.com/DongmeiLi2017/RNA-seq-Analysis-Methods-Comparison.
Funding: This work is supported by the University of Rochester’s Clinical and Translational Science Award (CTSA) number UL1 TR000042, UL1 TR002001, and U24TR002260 from the National Center for Advancing Translational Sciences of the National Institutes of Health (Drs. Li and Zand). Dr. Zand is also supported by the National Institute of Allergy and Infectious Diseases and the National Institute of Immunology, grant numbers AI098112 and AI069351. This study was supported by the National Institute of Environmental Health Sciences with grant number NIH 1R21ES032159-01A1 and National Institute on Aging with grant number NIH 1U54AG075931-01 (Drs. Li and Rahman). This study was supported by the grants from the WNY Center for Research on Flavored Tobacco Products (CRoFT) under cooperative agreement U54CA228110 which is supported by the National Cancer Institute of the National Institutes of Health (NIH) and the Food and Drug Administration (FDA) Center for Tobacco Products (Drs. Li, Goniewicz, Rahman, Xie). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH and FDA. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.