Xiangying Sun, Zhezhen Wang, Johnathon M. Hall, Carlos Perez-Cervantes, Alexander J. Ruthenburg, Ivan P. Moskowitz, Michael Gribskov, Xinan H. Yang
Long noncoding RNAs (lncRNAs) localize in the cell nucleus and influence gene expression through a variety of molecular mechanisms. Chromatin-enriched RNAs (cheRNAs) are a unique class of lncRNAs that are tightly bound to chromatin and putatively function to locally cis-activate gene transcription. CheRNAs can be identified by biochemical fractionation of nuclear RNA followed by RNA sequencing, but until now, a rigorous analytic pipeline for nuclear RNA-seq has been lacking. In this study, we survey four computational strategies for nuclear RNA-seq data analysis and develop a new pipeline, Tuxedo-ch, which outperforms other approaches. Tuxedo-ch assembles a more complete transcriptome and identifies cheRNA with higher accuracy than other approaches. We used Tuxedo-ch to analyze benchmark datasets of K562 cells and further characterize the genomic features of intergenic cheRNA (icheRNA) and their similarity to enhancer RNAs (eRNAs). We quantify the transcriptional correlation of icheRNA and adjacent genes and show that icheRNA is more positively associated with neighboring gene expression than eRNA or cap analysis of gene expression (CAGE) signals. We also explore two novel genomic associations of cheRNA, which indicate that cheRNAs may function to promote or repress gene expression in a context-dependent manner. IcheRNA loci with significant levels of H3K9me3 modifications are associated with active enhancers, consistent with the hypothesis that enhancers are derived from ancient mobile elements. In contrast, antisense cheRNA (as-cheRNA) may play a role in local gene repression, possibly through local RNA:DNA:DNA triple-helix formation.
Both the nucleoplasm and the chromatin fraction of the nucleus are enriched in long noncoding RNAs (lncRNAs). Many nuclear lncRNAs affect coding gene expression, alter chromatin organization, and are important in diverse biological processes [2, 3]. Among these nuclear lncRNAs, chromatin-enriched RNAs (cheRNAs) have gene-regulatory roles [4–7]. In our recent studies, we found that individual cheRNAs that promote essential gene-enhancer contacts depend on a transcription factor [5, 7]. However, a robust analytic pipeline for identifying cheRNAs as a group of functional nuclear RNAs has not been developed.
S1 and S2 Tables list all publicly available datasets analyzed in this study.
We compared three pipelines with the original cheRNA-identification pipeline. Each pipeline includes four analytic steps: sequence mapping, transcript assembly, transcriptome construction, and signature identification (Fig 1C). The computational strategies used in the latter three steps differed among the four pipelines (S2 File). The source file for the Tuxedo-ch pipeline is provided in S1 File.
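The shared four-step structure, with only the latter three steps swapped between pipelines, can be pictured as a composition of interchangeable functions. The following is a hypothetical skeleton for illustration only: the step names follow the text, but the `placeholder` functions, the variant labels, and the assumption that the mapping step is identical across pipelines (our reading of the sentence above) are not taken from S1 or S2 File, and no real alignment or assembly tools are invoked.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A step takes the current state and returns an updated copy.
Step = Callable[[dict], dict]

# The four analytic steps, in the order described in the text.
STEP_ORDER: List[str] = ["mapping", "assembly", "transcriptome", "signature"]


@dataclass
class Pipeline:
    name: str
    steps: Dict[str, Step]

    def run(self, state: dict) -> dict:
        # Apply each step in order; every step consumes the previous output.
        for step_name in STEP_ORDER:
            state = self.steps[step_name](state)
        return state


def placeholder(label: str) -> Step:
    # Hypothetical stand-in for a real tool: it only records its label.
    def _step(state: dict) -> dict:
        return {**state, "trace": state.get("trace", []) + [label]}
    return _step


def make_pipeline(name: str, mapping: Step, variant: str) -> Pipeline:
    # Assumed: mapping is shared; the latter three steps vary per pipeline.
    steps: Dict[str, Step] = {"mapping": mapping}
    for step_name in STEP_ORDER[1:]:
        steps[step_name] = placeholder(f"{step_name}:{variant}")
    return Pipeline(name, steps)


shared_mapping = placeholder("mapping:shared")
pipelines = [make_pipeline(f"pipeline_{i}", shared_mapping, f"v{i}") for i in range(1, 5)]
traces = {p.name: p.run({"sample": "K562"})["trace"] for p in pipelines}
```

Keeping each step behind the same function signature is what makes a controlled comparison possible: swapping one strategy leaves the other steps untouched, so differences in output can be attributed to the swapped steps.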
Detailed analysis of nuclear RNA-seq data, including lncRNAs that are shorter than 1,000 bases or transcribed at low levels, sheds new light on cis-regulatory elements. Operationally, cheRNAs are defined by their statistically significant enrichment in the chromatin pellet fraction after biochemical fractionation of nuclei. With our improved computational strategy, we have examined the molecular characteristics of cheRNAs in greater detail than was previously possible. First, we find that cheRNAs are more likely to be transcribed from noncoding regions, whereas sneRNAs are mostly transcribed from protein-coding regions. Second, icheRNAs are transcribed at lower levels and are largely unannotated, whereas isneRNAs are more highly transcribed and more frequently annotated. Traditional transcriptome profiling of noncoding RNA by techniques such as total RNA-seq yields the broadest survey of transcripts but has limited ability to detect low-expression transcripts such as icheRNAs. Consequently, previous analyses of noncoding RNA focused primarily on transcripts with relatively high transcription levels (e.g., isneRNA and as-sneRNA). In contrast, isolating and sequencing chromatin-enriched RNAs from a nuclear extract identifies, with greater sensitivity, low-expression noncoding RNAs that conventional sequencing and analysis methods have previously missed. Third, we have shown that icheRNA, in contrast to isneRNA, is mostly noncoding, non-polyadenylated, and positively correlated with the expression of neighboring coding genes (Fig 5A–5E). Notwithstanding the similarity of these features to those of eRNA, icheRNA has several molecular characteristics that distinguish it from eRNA. For example, icheRNA is generally longer than eRNA (median length ~4,400 bases for icheRNA versus ~350 bases for eRNA), and icheRNA shows only modest overlap with the enhancer marks (H3K27ac, H3K4me1, and EP300) that canonically define eRNA (Fig 4F).
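The operational definition of cheRNA, statistically significant enrichment in the chromatin pellet relative to the nucleoplasm, can be sketched as a fold-change filter combined with a false-discovery-rate filter. This is a minimal illustration, not the Tuxedo-ch implementation: it assumes that per-transcript mean normalized expression in each fraction and per-transcript p-values from some upstream differential test are already available, and the thresholds (log2 fold change of at least 1, adjusted p-value of at most 0.05), the pseudocount, the function names, and the toy values are all hypothetical.

```python
import math
from typing import Dict, List, Set


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p-values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_chromatin_enriched(
    chromatin_mean: Dict[str, float],
    nucleoplasm_mean: Dict[str, float],
    pvals: Dict[str, float],
    min_log2fc: float = 1.0,   # hypothetical fold-change cutoff
    max_q: float = 0.05,       # hypothetical FDR cutoff
    pseudocount: float = 1.0,  # avoids log of zero for unexpressed transcripts
) -> Set[str]:
    """Return IDs significantly enriched in the chromatin fraction."""
    ids = sorted(chromatin_mean)
    qvals = benjamini_hochberg([pvals[t] for t in ids])
    enriched = set()
    for t, q in zip(ids, qvals):
        log2fc = math.log2(
            (chromatin_mean[t] + pseudocount) / (nucleoplasm_mean[t] + pseudocount)
        )
        if q <= max_q and log2fc >= min_log2fc:
            enriched.add(t)
    return enriched


# Toy values for three hypothetical transcripts.
chromatin = {"t1": 200.0, "t2": 50.0, "t3": 10.0}
nucleoplasm = {"t1": 20.0, "t2": 45.0, "t3": 100.0}
p = {"t1": 0.001, "t2": 0.6, "t3": 0.002}
print(call_chromatin_enriched(chromatin, nucleoplasm, p))  # {'t1'}
```

Here `t3` passes the significance filter but is enriched in the nucleoplasm, and `t2` is not significant, so only `t1` is called; requiring both direction and significance is what separates "chromatin-enriched" from merely "differentially localized".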
Moreover, some icheRNAs (e.g., XIST) are known to act as repressive rather than activating regulators. Taking all of this evidence together, we conclude that icheRNA defines chromatin-localized regulatory lncRNAs more comprehensively than cis-activating eRNA does.
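The positive association between icheRNA and neighboring coding-gene expression discussed above can be quantified, in its simplest form, by correlating expression across samples for every gene within a fixed genomic neighborhood. The sketch below shows one such approach and is not necessarily the statistic used in the study: the 50 kb window, the choice of Pearson correlation, the `(chromosome, start, end)` tuples, and all gene names and values are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Locus = Tuple[str, int, int]  # (chromosome, start, end)


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; NaN if either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def neighbor_correlations(
    rna_locus: Locus,
    rna_expr: List[float],
    genes: Dict[str, Locus],
    gene_expr: Dict[str, List[float]],
    window: int = 50_000,  # hypothetical neighborhood size in bases
) -> Dict[str, float]:
    """Correlate one RNA's expression with every gene within `window` bases."""
    chrom, start, end = rna_locus
    result = {}
    for gene, (g_chrom, g_start, g_end) in genes.items():
        if g_chrom != chrom:
            continue
        # Gap between the two intervals (0 if they overlap).
        gap = max(0, g_start - end, start - g_end)
        if gap <= window:
            result[gene] = pearson(rna_expr, gene_expr[gene])
    return result


# Toy data: one hypothetical icheRNA measured in four samples.
rna = ("chr1", 100_000, 104_400)
rna_levels = [1.0, 2.0, 3.0, 4.0]
genes = {
    "geneA": ("chr1", 120_000, 130_000),  # downstream, within window
    "geneB": ("chr1", 500_000, 510_000),  # too far away
    "geneC": ("chr2", 101_000, 102_000),  # different chromosome
    "geneD": ("chr1", 90_000, 99_000),    # upstream, within window
}
levels = {
    "geneA": [2.0, 4.0, 6.0, 8.0],
    "geneB": [5.0, 1.0, 5.0, 1.0],
    "geneC": [1.0, 1.5, 2.0, 2.5],
    "geneD": [4.0, 3.0, 2.0, 1.0],
}
corrs = neighbor_correlations(rna, rna_levels, genes, levels)
```

Here `corrs` contains only `geneA` (correlation 1.0) and `geneD` (correlation -1.0). Comparing the distribution of such neighbor correlations between icheRNA, eRNA, and CAGE loci is one way to frame the "more positively associated" comparison, although the study's own statistic may differ.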
We would like to acknowledge ENCODE for contributing sequencing datasets. We specifically acknowledge the assistance of Lorenzo Pesce, Kazutaka Takahashi, the Purdue University ITaP Research Computing (RCAC) team, and the University of Chicago Research Computing Center in providing high-performance computing services. We thank Jeffrey D. Steimle and Kohta Ikegami for discussion on H3K9me3 patterns.
Citation: Sun X, Wang Z, Hall JM, Perez-Cervantes C, Ruthenburg AJ, Moskowitz IP, et al. (2020) Chromatin-enriched RNAs mark active and repressive cis-regulation: An analysis of nuclear RNA-seq. PLoS Comput Biol 16(2): e1007119. https://doi.org/10.1371/journal.pcbi.1007119
Editor: Ferhat Ay, La Jolla Institute for Allergy and Immunology, UNITED STATES
Received: May 17, 2019; Accepted: January 14, 2020; Published: February 10, 2020
Copyright: © 2020 Sun et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data underlying the results presented in the study are all publicly available. The cheRNA identifications made by Tuxedo-ch in three cell types are accessible on GitHub (https://github.com/xyang2uchicago/Tuxedo-ch).
Funding: XS, ZW, XY were supported by NIH National Library of Medicine (NLM) (R21LM012619). JH was supported by NIH Genetics and Regulation Training Grant (T32 GM07197). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors declare no competing interests.