Modelling Transcription with Explainable AI Uncovers Context-specific Epigenetic Gene Regulation at Promoters and Gene Bodies
Kashyap Chhatbar, Adrian Bird, Guido Sanguinetti
Abstract
Transcriptional regulation involves complex interactions with chromatin-associated proteins, but disentangling these mechanistically remains challenging. Here, we generate deep learning models to predict RNA Pol-II occupancy from chromatin-associated protein profiles in unperturbed conditions. We evaluate the suitability of Shapley Additive Explanations (SHAP), a widely used explainable AI (XAI) approach, to infer functional relevance and analyse regulatory mechanisms across diverse datasets.
Introduction
Understanding how cells regulate gene expression during physiological or disease processes remains one of the most important open problems in biology, with potentially immense implications for biomedicine and biotechnology. While the fundamental process of regulation by transcription factor binding has been extensively studied, the complexity of sequence-dependent signal integration in a crowded chromatin environment remains largely unknown.
Materials and Methods
Raw sequencing data for RNA-seq and TT-seq experiments were also downloaded from GEO. Reads were aligned to the reference genome using Bowtie2. PCR duplicates were identified and removed using SAMtools markdup to eliminate artifacts introduced during library preparation. Gene-level read counts were quantified using featureCounts from the Subread package. Differential gene expression analysis was performed using DESeq2 in R with default parameters to identify significantly differentially expressed genes.
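The processing steps described above can be sketched as a sequence of command lines. The following Python snippet is only an illustration and not the authors' pipeline: it assumes paired-end reads, and the sample name, index prefix, file names and thread count are hypothetical placeholders. It builds the commands without running them, so they can be inspected first. The name-sort, fixmate and coordinate-sort steps are included because `samtools markdup` expects mate tags added by `samtools fixmate -m` for paired reads. The DESeq2 step is omitted because it is run in R.

```python
import shlex


def build_pipeline(sample, index_prefix, fastq_r1, fastq_r2, annotation_gtf, threads=4):
    """Return argument lists for aligning, de-duplicating and counting one sample.

    All file names are derived from `sample` and are hypothetical placeholders.
    """
    sam = f"{sample}.sam"
    name_sorted = f"{sample}.namesorted.bam"
    fixmate = f"{sample}.fixmate.bam"
    coord_sorted = f"{sample}.sorted.bam"
    dedup = f"{sample}.dedup.bam"
    counts = f"{sample}.counts.txt"
    return [
        # Align paired-end reads to the reference genome index.
        ["bowtie2", "-p", str(threads), "-x", index_prefix,
         "-1", fastq_r1, "-2", fastq_r2, "-S", sam],
        # Name-sort so that fixmate sees read pairs together.
        ["samtools", "sort", "-n", "-o", name_sorted, sam],
        # Add mate score tags that markdup uses for paired reads.
        ["samtools", "fixmate", "-m", name_sorted, fixmate],
        # Coordinate-sort, as required by markdup.
        ["samtools", "sort", "-o", coord_sorted, fixmate],
        # Mark duplicates and remove them (-r).
        ["samtools", "markdup", "-r", coord_sorted, dedup],
        # Gene-level counts; -p declares paired-end input. Newer Subread
        # releases additionally need --countReadPairs to count fragments.
        ["featureCounts", "-T", str(threads), "-p",
         "-a", annotation_gtf, "-o", counts, dedup],
    ]


commands = build_pipeline("sample1", "genome_index", "sample1_R1.fastq.gz",
                          "sample1_R2.fastq.gz", "genes.gtf")
for cmd in commands:
    print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or each list passed to `subprocess.run(cmd, check=True)` once the tools are installed and the placeholder paths are replaced with real ones.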
Discussion
A key strength of this study lies in the ability of SHAP analysis to identify biologically meaningful relationships directly from data derived from unperturbed cell systems. The model, trained solely on unperturbed conditions, captures the relationships between protein occupancy and transcriptional output without prior knowledge of perturbations. Remarkably, by ranking genes based on SHAP importance values, direct targets of perturbation can be accurately predicted. This represents a potentially significant advance: regulatory targets and their context-dependent interplay can be inferred without the need for time-consuming and expensive perturbation experiments.
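The idea of ranking genes by SHAP importance can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the toy linear "model", the three protein features and the gene names below are hypothetical, and the Shapley values are computed exactly by enumerating feature subsets, which is only feasible for a handful of features (SHAP relies on approximations for larger models). Features absent from a subset are set to a baseline value, here zero occupancy.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def rank_genes_by_shap(predict, genes, baseline, feature_index):
    """Sort gene names by the absolute Shapley value of one feature, largest first."""
    scores = {
        name: abs(shapley_values(predict, x, baseline)[feature_index])
        for name, x in genes.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def toy_model(v):
    """Hypothetical predictor: Pol II signal from occupancy of three proteins."""
    return 2.0 * v[0] + 0.5 * v[1] - 1.0 * v[2]


# Hypothetical occupancy profiles (protein 0, protein 1, protein 2) per gene.
genes = {
    "geneA": [1.0, 0.2, 0.1],
    "geneB": [0.1, 0.9, 0.3],
    "geneC": [0.5, 0.5, 0.5],
}
baseline = [0.0, 0.0, 0.0]

ranking = rank_genes_by_shap(toy_model, genes, baseline, feature_index=0)
print(ranking)
```

In this toy setting, the genes at the top of the ranking are those whose prediction depends most on protein 0, which is the sense in which a SHAP-based ranking nominates candidate targets of perturbing that protein.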
Acknowledgments
We are grateful to Sara Giuliani for helpful discussions and feedback on the manuscript.
Citation: Chhatbar K, Bird A, Sanguinetti G (2025) Modelling transcription with explainable AI uncovers context-specific epigenetic gene regulation at promoters and gene bodies. PLoS Genet 21(10): e1011908. https://doi.org/10.1371/journal.pgen.1011908
Editor: Charles G. Danko, Cornell University, UNITED STATES OF AMERICA
Received: February 4, 2025; Accepted: October 6, 2025; Published: October 23, 2025.
Copyright: © 2025 Chhatbar et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The raw sequencing data used in this study were obtained from the Gene Expression Omnibus (GEO) and are publicly available under the following accession numbers: GSE199805, GSE181714 and GSE159400. The source code, processed data and model weights are publicly available on the GitHub repository https://github.com/kashyapchhatbar/SHAP-analysis.
Funding: KC was supported by a scholarship from College of Science and Engineering, University of Edinburgh. This work was supported by a Wellcome Investigator Award to AB (ref. 222507), a European Research Council Advanced grant (ref. Gen-Epix - 694295) and a core grant (ref. 203149) to the Wellcome Centre for Cell Biology. GS acknowledges co-funding from Next Generation EU, in the context of the National Recovery and Resilience Plan, Investment PE1 - Project FAIR “Future Artificial Intelligence Research”. This resource was co-financed by the Next Generation EU (DM 1555 del 11.10.22). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.