Marta Sanvicente-García, Albert García-Valiente, Socayna Jouide, Jessica Jaraba-Wallace, Eric Bautista, Marc Escobosa, Avencia Sánchez-Mejías, Marc Güell
Currently available tools for characterizing gene editing do not always report precise relative proportions of the different types of edits present in a bulk population of edited cells. We have developed CRISPR-Analytics (CRISPR-A), a comprehensive and versatile genome editing web application and Nextflow pipeline that supports the design and analysis of gene editing experiments. CRISPR-A provides a robust gene editing analysis pipeline composed of data analysis tools and simulation.
CRISPR-based gene editing has become a fundamental toolbox to cover a large variety of research and applied needs. It facilitates the editing of endogenous genomic loci and systematic interrogation of genetic elements and causal genetic variations. Nowadays, it is even on the verge of becoming a therapeutic reality in vivo. Despite tremendous advances, DNA editing and writing still involve imperfect protocols which need to be optimized and evaluated. This makes it essential to have tools that enable accurate characterization of gene editing outcomes.
Simulation algorithm development
SimGE is built to account for the different layers of classes and their proportions. The proportion of edited and unedited sequences can be determined by either of two sgRNA efficiency predictors, the Moreno-Mateos and Doench 2016 scores, which give the most reliable on-target activity predictions.
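As an illustration of the first layer only (edited versus unedited), the following minimal sketch draws a class label for each simulated read from a single efficiency value. It is not SimGE's implementation: the function names are hypothetical, and treating a predictor score rescaled to the range 0 to 1 as a per-read editing probability is an assumption made here for clarity.

```python
import random


def simulate_edit_labels(n_reads, efficiency, seed=0):
    """Label each simulated read as 'edited' or 'unedited'.

    `efficiency` is ASSUMED to be an sgRNA efficiency score already rescaled
    to [0, 1] and interpreted as the probability that a read is edited.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie in [0, 1]")
    rng = random.Random(seed)  # seeded for reproducible simulations
    return [
        "edited" if rng.random() < efficiency else "unedited"
        for _ in range(n_reads)
    ]


def edited_fraction(labels):
    """Return the fraction of reads labelled 'edited'."""
    return labels.count("edited") / len(labels)


# Hypothetical usage: 10,000 reads with an assumed efficiency of 0.6.
labels = simulate_edit_labels(10_000, 0.6)
fraction = edited_fraction(labels)
```

With a large number of reads, the observed edited fraction should approach the supplied efficiency. Further layers (edit type, size, position) would be sampled conditionally on a read being edited.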
To compare the different sample sizes and the positions of the indels, we needed to define a distance metric. When clustering the observations into groups, we computed the distance between each pair of observations (S6 Table), which gives a measure of the dissimilarity among them.
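The idea of a pairwise distance between observations can be sketched as follows. The metric actually used is the one reported in S6 Table and is not reproduced here. This example assumes, purely for illustration, that each observation is summarized by an indel position and an indel size, and that plain Euclidean distance is used. The observation names and values are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Hypothetical observations: (indel position relative to the cut site, indel size in bp).
observations = {
    "del_a": (-3, 5),
    "del_b": (-2, 5),
    "ins_c": (0, 1),
}


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_distances(obs):
    """Return {(name_1, name_2): distance} for every unordered pair of observations."""
    return {
        (m, n): euclidean(obs[m], obs[n])
        for m, n in combinations(sorted(obs), 2)
    }


distances = pairwise_distances(observations)
```

A table of this kind is the usual input to a clustering step: small distances mark observations likely to fall in the same group, and large distances mark dissimilar ones.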
NGS is the method that enables the identification of all the different outcomes produced by genome editing tools. Various online and command-line tools are available to determine the percentage of edits achieved in genome editing experiments. Even so, most of these tools do not retrieve all possible kinds of editing events and are not flexible enough to cover the whole diversity of genome editing tools.
We would like to thank María, Alejandro, Aitor, Andrea, Javier, Joana, Othmane, Jon, María, Leandro, Guillermo and Yabel for their collaboration in the examination of reads to generate a ground truth data set.
Citation: Sanvicente-García M, García-Valiente A, Jouide S, Jaraba-Wallace J, Bautista E, Escobosa M, et al. (2023) CRISPR-Analytics (CRISPR-A): A platform for precise analytics and simulations for gene editing. PLoS Comput Biol 19(5): e1011137. https://doi.org/10.1371/journal.pcbi.1011137
Editor: Ilya Ioshikhes, CANADA
Received: February 24, 2023; Accepted: April 30, 2023; Published: May 30, 2023.
Copyright: © 2023 Sanvicente-García et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Next-generation sequencing data are available in the European Nucleotide Archive under the Study accession number PRJEB53901. Previously published data used in this paper can be found under the following accession numbers: PRJNA326019, PRJNA486372, PRJNA208620 and PRJNA304717. SimGE developed R package can be installed with devtools: devtools::install_bitbucket("synbiolab/SimGE"). Code for CRISPR-A pipeline has been made available in Bitbucket https://bitbucket.org/synbiolab/crispr-a_nextflow/ and through the web page application https://synbio.upf.edu/crispr-a/. This pipeline will also be added to the NF-core community. Custom analysis scripts for data analysis and visualization are freely available at https://bitbucket.org/synbiolab/crispr-a_figures/.
Funding: This work was supported by the European Commission (European Union Horizon 2020 grant 825825 to MG), Ramón y Cajal program (grant RYC-2015-17734 to MG), Fundación Ramón Areces (grant “Advanced gene editing technologies to restore LAMA2 on merosin-deficient congenital muscular dystrophy type 1A” to MG) and Ministerio de Ciencia e Innovación de España (Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020 «Advanced methodologies for precise and efficient gene delivery» grant PID2020-118597RB-I00 to MG). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors declare that they have no conflict of interest.