Thiago S. Guzella, Vasco M. Barreto, Jorge Carneiro
Phenotypic variation in the copy number of gene products expressed by cells or tissues has been the focus of intense investigation. To what extent the observed differences in cellular expression levels are persistent or transient is an intriguing question. Here, we develop a quantitative framework that resolves the expression variation into stable and unstable components. The difference between the expression means in two cohorts isolated from any cell population is shown to converge to an asymptotic value, with a characteristic time, τT, that measures the timescale of the unstable dynamics. The asymptotic difference in the means, relative to the initial value, measures the stable proportion of the original population variance. Empowered by this insight, we analysed the T-cell receptor (TCR) expression variation in CD4 T cells. About 70% of TCR expression variance is stable in a diverse polyclonal population, while over 80% of the variance in an isogenic TCR transgenic population is volatile. In both populations the TCR levels fluctuate with a characteristic time of 32 hours. This systematic characterisation of the expression variation dynamics, relying on time series of cohorts’ means, can be combined with technologies that measure gene or protein expression in single cells or in bulk.
Phenotypic variation among organisms or cells is a theme of growing importance in biology. Macroscopic phenotypes, such as body structures or physiologic responses, have long been studied, but one phenotype particularly suitable for quantification that has received attention in recent decades is the amount of specific mRNAs and proteins expressed by single cells. Advances in genomics have allowed the analysis of genetic contributions to variation in gene expression, in terms of so-called expression quantitative trait loci (eQTL) [1, 2]. In this case, expression levels, typically assessed via mRNA levels, are treated as quantitative traits, and one is interested in the specific loci underlying variation in expression levels among different individuals. The increasing availability of single-cell resolution genomics, proteomics and metabolomics technologies has enabled molecular biologists to analyse cell lineages and tissues, showing that what were previously perceived as homogeneous cell populations are in fact complex mixtures of often transient and interchangeable cellular types and cellular states (see discussion in ). In parallel to these studies linking phenotypes to genotype, the literature on stochastic gene expression [4–8], reviewed in , has brought to light the variation in expression levels in isogenic cells, even when these are in the same cellular state and in the same environment. The variation is typically attributed to the “noise” resulting from the small copy number of molecules involved in the process.
Materials and methods
This research project was ethically reviewed and approved by the Ethics Committee of the Instituto Gulbenkian de Ciência, and by the Portuguese National Entity that regulates the use of laboratory animals (DGAV, Direção Geral de Alimentação e Veterinária; license reference: 0421/000/000/2013). All experiments conducted on animals followed the Portuguese (Decreto-Lei number 113/2013) and European (Directive 2010/63/EU) legislation concerning housing, husbandry and animal welfare.
In this article, we introduce a new approach to analyse the variation in protein expression levels in a cell population, which enables measuring the characteristic dynamics of the fluctuations in cellular expression and estimating the magnitude of stable and unstable contributions to the variation across cells. The analysis is based on the realisation that the difference between the means of log-transformed expression levels in two selected cohorts isolated from a population of interest converges with approximately exponential dynamics to an asymptotic value. By normalising this asymptotic value by the difference in cohorts’ means immediately after their isolation, one obtains an unbiased estimate of the proportion of population variance that is explained by the stable component, while the mean convergence time τT measures the timescale of the unstable component dynamics. This key insight stems from perceiving any cell population as a mixture of many independent subpopulations, each with a characteristic mean expression level that is fixed in time yet varies among the subpopulations. Under these assumptions, the population variance equals the sum of the variance of the subpopulation means, which embodies the stable component of variation, and the variance of the expression level within the subpopulations, which represents the unstable component.
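The cohort argument can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code, under assumptions that are ours: each cell's log expression is the sum of a fixed Gaussian mean (stable component) and an Ornstein-Uhlenbeck fluctuation (unstable component), and the two cohorts are the bottom and top 10% of cells at time zero. The function names, the grid-search fit, and the input values (stable fraction 0.7 and a 32-hour timescale, borrowed from the abstract purely as illustrative inputs) are hypothetical choices.

```python
import numpy as np


def simulate_cohort_gap(n_cells=20000, var_stable=0.7, var_unstable=0.3,
                        tau=32.0, times=None, frac=0.1, seed=0):
    """Simulate the difference between the means of two sorted cohorts.

    Assumed toy model: log expression = fixed per-cell mean + OU fluctuation.
    """
    rng = np.random.default_rng(seed)
    if times is None:
        times = np.arange(0.0, 241.0, 8.0)  # hours, hypothetical sampling
    stable = rng.normal(0.0, np.sqrt(var_stable), n_cells)
    unstable = rng.normal(0.0, np.sqrt(var_unstable), n_cells)

    # "Sort" the cohorts once, at time zero, from the tails of the population.
    x0 = stable + unstable
    lo_cut, hi_cut = np.quantile(x0, [frac, 1.0 - frac])
    low, high = x0 <= lo_cut, x0 >= hi_cut

    gaps = []
    t_prev = times[0]
    for t in times:
        # Exact update of a stationary OU process over the elapsed interval.
        rho = np.exp(-(t - t_prev) / tau)
        noise = rng.normal(size=n_cells)
        unstable = rho * unstable + np.sqrt(var_unstable * (1.0 - rho**2)) * noise
        t_prev = t
        x = stable + unstable
        gaps.append(x[high].mean() - x[low].mean())
    return times, np.array(gaps)


def fit_gap(times, gaps, tau_grid=None):
    """Fit gap(t) = plateau + amplitude * exp(-t / tau) by grid search on tau."""
    if tau_grid is None:
        tau_grid = np.linspace(1.0, 200.0, 400)
    best = None
    for tau in tau_grid:
        # For a fixed tau the model is linear in (plateau, amplitude).
        design = np.column_stack([np.ones_like(times), np.exp(-times / tau)])
        coef, *_ = np.linalg.lstsq(design, gaps, rcond=None)
        sse = np.sum((design @ coef - gaps) ** 2)
        if best is None or sse < best[0]:
            best = (sse, tau, coef)
    _, tau_hat, (plateau, amplitude) = best
    # Asymptotic gap relative to the fitted initial gap: stable variance fraction.
    stable_fraction = plateau / (plateau + amplitude)
    return stable_fraction, tau_hat


if __name__ == "__main__":
    t, d = simulate_cohort_gap()
    r, tau_hat = fit_gap(t, d)
    print(f"estimated stable fraction: {r:.2f}, estimated timescale: {tau_hat:.1f} h")
```

In this Gaussian toy model the normalised gap is expected to follow R + (1 − R)·exp(−t/τ), with R the ratio of the stable variance to the total variance, so the fitted plateau-to-initial ratio should approximate the stable fraction used to generate the data.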
We are grateful to Jocelyne Demengeot and Henrique Teotónio for their support during the development of this work and to Alberto Darszon and Vera Martins for reading an earlier version of this manuscript. We thank Rui Gardner, Telma Lopes and Cláudia Bispo for assistance with flow cytometry analysis and cell sorting.
Citation: Guzella TS, Barreto VM, Carneiro J (2020) Partitioning stable and unstable expression level variation in cell populations: A theoretical framework and its application to the T cell receptor. PLoS Comput Biol 16(8): e1007910. https://doi.org/10.1371/journal.pcbi.1007910
Editor: Martin Meier-Schellersheim, National Institutes of Health, UNITED STATES
Received: July 18, 2019; Accepted: April 24, 2020; Published: August 25, 2020
Copyright: © 2020 Guzella et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The code and data sets are freely available at the following URLs in the data repository of the Instituto Gulbenkian de Ciência: http://downloads.igc.gulbenkian.pt/jcarneir/GuzellaetalPLoSComputBiol_code.zip and http://downloads.igc.gulbenkian.pt/jcarneir/GuzellaetalPLoSComputBiol_data.zip
Funding: This work was supported by a grant from the Fundação para a Ciência e Tecnologia (FCT) (PTDC/BIA-BCM/108020/2008). TSG was supported by a fellowship from FCT (SFRH/BD/33572/2008). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.