Compositionally Aware Estimation of Cross-correlations for Microbiome Data
Ib Thorsgaard Jensen, Luc Janss, Simona Radutoiu, Rasmus Waagepetersen.
Abstract
In the field of microbiome studies, it is of interest to infer correlations between abundances of different microbes (here referred to as operational taxonomic units, OTUs). Several methods that account for the compositional nature of sequencing data exist. However, these methods cannot infer correlations between OTU abundances and other variables. In this paper, we introduce two novel methods, SparCEV (Sparse Correlations with External Variables) and SparXCC (Sparse Cross-Correlations between Compositional data), for quantifying correlations between OTU abundances and either continuous phenotypic variables or components of other compositional datasets, such as transcriptomic data.
Introduction
Sequencing data are ubiquitous in modern biology. For example, RNA-seq data have been used to identify genes associated with clinical outcomes of cancer patients, for human disease profiling, and to identify genes with possible links to Rett syndrome. Microbiome data have drawn much attention in recent years, particularly regarding the human gut microbiome. The composition of the human gut microbiome has been shown to be associated with several aspects of human health, such as obesity and metabolic disorders. More recently, the integration of microbiome data with other omics data has received increasing interest.
Materials and Methods
The aim of this paper is to estimate the correlation between the log abundance $\log a_i$ of OTU $i$ and other log-transformed variables. However, we only have access to the observed read counts, denoted $x_i$ for OTU $i$. To compare the different strategies theoretically and to develop new methods, we adopt a simplified modelling framework.
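To see why observed proportions alone are not enough, the sketch below simulates hypothetical log absolute abundances $\log a_i$ and an external log-transformed variable $y$, forms relative abundances through the closure $r_i = a_i / \sum_j a_j$ (assumed here for illustration), and compares the target $\mathrm{corr}(\log a_0, y)$ with the naive $\mathrm{corr}(\log r_0, y)$. All settings (sample size, number of OTUs, the latent factor `z`, the dominant OTU) are illustrative assumptions, not taken from the paper, and this is not SparCEV or SparXCC; it only shows how closure can distort a correlation.

```python
import numpy as np

# Illustrative simulation; all settings below are assumptions, not from the paper.
rng = np.random.default_rng(1)
n_samples, n_otus = 5000, 10

z = rng.normal(size=n_samples)                 # latent factor shared with y
log_a = rng.normal(size=(n_samples, n_otus))   # log absolute abundances
log_a[:, 0] += z                               # OTU 0: positively related to y
log_a[:, -1] += 5.0 + 2.0 * z                  # dominant OTU, strongly related to y
y = z + rng.normal(scale=0.5, size=n_samples)  # external log-transformed variable

a = np.exp(log_a)
r = a / a.sum(axis=1, keepdims=True)           # closure: r_i = a_i / sum_j a_j
log_r = np.log(r)

corr_true = np.corrcoef(log_a[:, 0], y)[0, 1]   # target: corr(log a_0, y)
corr_naive = np.corrcoef(log_r[:, 0], y)[0, 1]  # naive: corr(log r_0, y)
print(f"corr(log a_0, y) = {corr_true:.2f}")
print(f"corr(log r_0, y) = {corr_naive:.2f}")
```

In this setting the naive estimate has the opposite sign of the target: since $\log r_0 = \log a_0 - \log \sum_j a_j$, and the total is dominated by an OTU that itself tracks $y$, the denominator term overwhelms the true positive association.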
Discussion
For the theoretical considerations in this paper, we assume, like Friedman and Alm [13], that the data follow the model in (1). According to this model, the true relative abundances $r_i$ are observed directly, which would only be the case with infinite sequencing depth. We nevertheless assess the different correlation estimation methods on data simulated with SparseDOSSA2, a more realistic setting in which the $x_i$ are noisy observations of the $r_i$. Specifically, SparseDOSSA2 assumes that the $x_i$ are multinomially distributed given the library size $N$ and the $r_i$.
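The multinomial sampling step described above can be sketched as follows. This reproduces only that step, not SparseDOSSA2 itself: the true relative abundances are hypothetically drawn from a Dirichlet distribution and the library sizes are hypothetical integers, both assumptions for illustration. The helper `sample_counts` is likewise a hypothetical name.

```python
import numpy as np

def sample_counts(r, library_sizes, rng):
    """Draw read counts x ~ Multinomial(N, r) independently for each sample.

    r: (n_samples, n_otus) array of true relative abundances (rows sum to 1).
    library_sizes: length-n_samples array of library sizes N.
    """
    return np.vstack([rng.multinomial(n, p) for n, p in zip(library_sizes, r)])

rng = np.random.default_rng(0)
n_samples, n_otus = 4, 6
# Hypothetical true relative abundances (assumption: Dirichlet with unit parameters).
r = rng.dirichlet(np.ones(n_otus), size=n_samples)
# Hypothetical library sizes N, one per sample.
N = rng.integers(1_000, 10_000, size=n_samples)
x = sample_counts(r, N, rng)

# Observed proportions x_i / N are noisy estimates of r_i; the noise shrinks as N grows.
p_hat = x / x.sum(axis=1, keepdims=True)
print(x)
print(np.abs(p_hat - r).max())
```

Because the counts are random, an OTU with a small but positive $r_i$ can be observed with $x_i = 0$, which is one reason estimators built on log-transformed counts must handle zeros with care.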
Acknowledgments
We thank Adrián Gómez Repollés for assistance with the dermatitis data. We thank Thorsten Thiergart and Ruben Garrido-Oter for assistance with the plant microbiome data. We thank B Kirtley Amos and Max Gordon for critical reading. We thank Sha Zhang for supplying the data used to construct the templates for gene expression data in the simulation studies. We thank Taylor Grace FitzGerald for copy-editing.
Citation: Jensen IT, Janss L, Radutoiu S, Waagepetersen R (2024) Compositionally aware estimation of cross-correlations for microbiome data. PLoS ONE 19(6): e0305032. https://doi.org/10.1371/journal.pone.0305032
Editor: Enrique Hernandez-Lemus, Instituto Nacional de Medicina Genomica, MEXICO
Received: October 18, 2023; Accepted: May 22, 2024; Published: June 28, 2024.
Copyright: © 2024 Jensen et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All data used in this paper can be found at https://github.com/IbTJensen/Microbiome-Cross-correlations/. The raw sequencing data from Byrd et al. can be found in NCBI BioProject 46333, and the OTU table was originally obtained from Morton et al. at https://github.com/knightlab-analyses/reference-frames. The raw sequencing data from Thiergart et al. can be found in the European Nucleotide Archive (ENA). The 16S dataset has project accession no. PRJEB34100, and the ITS dataset has project accession no. PRJEB34099. The OTU tables were originally obtained from https://github.com/ththi/Lotus-Symbiosis.
Funding: This work was supported by the Bill and Melinda Gates Foundation and by the Foreign, Commonwealth & Development Office through Engineering the Nitrogen Symbiosis for Africa (ENSA; OPP11772165). Ib Thorsgaard Jensen and Rasmus Waagepetersen were supported by research grant VIL57389 from Villum Fonden. The funders played no role in the content of this paper.
Competing interests: The authors have declared that no competing interests exist.