Understanding and Mitigating the Impact of Ambient mRNA Contamination in Single-cell RNA-sequencing Analysis
Jantarika Kumar Arora, Louisa K. James, Varodom Charoensawan.
Abstract
Droplet-based single-cell RNA sequencing (scRNA-seq) is frequently affected by contamination from cell-free mRNAs, known as “ambient mRNAs”, which can substantially distort the interpretation of single-cell transcriptome data. In this study, we investigate the impact of ambient mRNA contamination on differential gene expression and biological pathway enrichment analyses, using two independent scRNA-seq datasets: ten peripheral blood mononuclear cell (PBMC) samples from dengue-infected patients and forty-two scRNA-seq samples of human fetal liver tissues.
Introduction
Single-cell RNA-sequencing (scRNA-seq) has become a powerful technique for investigating transcriptomic profiles and complex cellular heterogeneity at single-cell resolution. This technology not only offers profound insights into cellular heterogeneity, but also improves our understanding of the functions of highly complex biological systems in both normal and disease-related physiological contexts.
Materials and Methods
All datasets used in this study were processed using the same pipeline, described as follows. Raw FASTQ files were aligned and quantified using the CellRanger Single-Cell Software Suite (version 8.0.1) and the reference human genome GRCh38-2024-A (10x Genomics, USA). Standard preprocessing steps for individual single-cell data, including normalization of gene expression levels, scaling, clustering, and dimensionality reduction, were carried out using Seurat (version 5.2.1).
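To illustrate what the normalization step does, the following is a minimal sketch of log-normalization for a single cell, written in plain Python rather than with Seurat. It assumes the commonly used scheme in which each count is divided by the cell's total count, multiplied by a scale factor of 10,000, and transformed with log(1 + x). The text does not state which normalization method or parameters were used, so the scale factor, the function name `log_normalize`, and the toy counts are illustrative assumptions only.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Log-normalize one cell's raw counts.

    Each count is divided by the cell's total count, multiplied by
    `scale_factor`, and transformed with log(1 + x).
    The scale factor of 10,000 is an assumption, not taken from the paper.
    """
    total = sum(counts.values())
    if total == 0:
        # An empty cell has no signal to normalize.
        return {gene: 0.0 for gene in counts}
    return {
        gene: math.log1p(count / total * scale_factor)
        for gene, count in counts.items()
    }


# Hypothetical raw counts for one cell (gene symbol -> UMI count).
cell = {"CD3E": 5, "MS4A1": 0, "LYZ": 15}
normalized = log_normalize(cell)

for gene, value in normalized.items():
    print(f"{gene}: {value:.3f}")
```

Genes with zero counts remain at zero after this transformation, so a small nonzero count introduced by ambient mRNA remains visible as a nonzero normalized value.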
Discussion
To demonstrate the presence of ambient mRNA contamination and its impact on downstream analysis, we first utilised a publicly available peripheral blood mononuclear cell (PBMC) dataset. The dataset exhibited contamination from ambient mRNAs or background noise, as evidenced by the presence of nonzero counts of known marker genes in unexpected cell types.
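The kind of check described above can be sketched as follows: for a given marker gene, compute the fraction of cells in each annotated cell type that carry a nonzero count. This is a simplified illustration in plain Python, not the authors' analysis code. The function name `fraction_expressing`, the cell identifiers, and the toy counts are hypothetical; MS4A1 is used as an example of a well-known B-cell marker.

```python
def fraction_expressing(counts_by_cell, cell_types, gene, cell_type):
    """Fraction of cells annotated as `cell_type` with a nonzero count for `gene`."""
    cells = [cell for cell, label in cell_types.items() if label == cell_type]
    if not cells:
        return 0.0
    n_nonzero = sum(1 for cell in cells if counts_by_cell[cell].get(gene, 0) > 0)
    return n_nonzero / len(cells)


# Hypothetical toy data: raw counts per cell and cell-type annotations.
counts_by_cell = {
    "t1": {"CD3E": 8, "MS4A1": 0},
    "t2": {"CD3E": 6, "MS4A1": 1},
    "t3": {"CD3E": 9, "MS4A1": 0},
    "t4": {"CD3E": 7, "MS4A1": 2},
    "b1": {"CD3E": 0, "MS4A1": 12},
    "b2": {"CD3E": 1, "MS4A1": 9},
}
cell_types = {
    "t1": "T cell", "t2": "T cell", "t3": "T cell", "t4": "T cell",
    "b1": "B cell", "b2": "B cell",
}

# A B-cell marker detected in a sizeable fraction of T cells, at low counts,
# is the pattern that would suggest ambient mRNA contamination.
for label in ("B cell", "T cell"):
    frac = fraction_expressing(counts_by_cell, cell_types, "MS4A1", label)
    print(f"MS4A1 detected in {frac:.0%} of {label}s")
```

In this toy example the marker appears in every B cell, as expected, but also in half of the T cells at low counts. A nonzero fraction alone is not proof of contamination, so in practice it would be interpreted alongside count magnitude and the composition of empty droplets.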
Acknowledgments
This research utilised Queen Mary’s Apocrita HPC facility, supported by QMUL Research-IT (http://doi.org/10.5281/zenodo.438045). We acknowledge the ITS Research Team at QMUL for their support. Resources for data processing were also provided by Mahidol University and the Office of the Ministry of Higher Education, Science, Research and Innovation under the Reinventing University project: the Center of Excellence in AI-Based Medical Diagnosis (AI-MD) sub-project. We thank Sarintip Nguantad for running CellBender to preprocess single-cell RNA-seq data.
Citation: Arora JK, James LK, Charoensawan V (2025) Understanding and mitigating the impact of ambient mRNA contamination in single-cell RNA-sequencing analysis. PLoS One 20(9): e0332440. https://doi.org/10.1371/journal.pone.0332440
Editor: Wan-Tien Chiang, Augusta University, TAIWAN
Received: August 6, 2024; Accepted: August 30, 2025; Published: September 24, 2025.
Copyright: © 2025 Arora et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Single-cell RNA-sequencing datasets: The raw sequencing reads of the single-cell RNA-seq datasets used in this study include the peripheral blood mononuclear cell (PBMC) datasets, which consist of eight single-cell experiments from dengue patients and one healthy donor [13], available through the ArrayExpress repository: E-MTAB-9467. Another healthy sample was obtained from the 10x Genomics website (4k PBMCs from a healthy donor, Single Cell Gene Expression Dataset by Cell Ranger 2.1.0, 10x Genomics, 2017, November 8, https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/pbmc4k). In addition, the raw sequencing data of forty-two single-cell experiments of human fetal liver tissues [21] were obtained from the ArrayExpress database, under the accession number: E-MTAB-7407. The raw and filtered 10x Genomics species-mixing dataset, which contains a mixture of human HEK293T and mouse NIH3T3 cells, is available at https://www.10xgenomics.com/datasets/10-k-1-1-mixture-of-human-hek-293-t-and-mouse-nih-3-t-3-cells-3-v-3-1-3-1-standard-6-0-0, retrieved on 20 June 2025.
Funding: This project is funded by the mid-career researcher grant from the National Research Council of Thailand (NRCT) and Mahidol University (N42A670557) through VC. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.