Phenotype to Genotype: A New and Rapid Approach Using Whole-genome Sequencing
McKenna Feltes, Aleksey V. Zimin, Sofia Angel, Nainika Pansari, Monica R. Hensley, Jennifer L. Anderson, Meng-Chieh Shen, Mackenzie Klemek, Yi Shen, Vighnesh S. Ginde, Hannah Kozan, Nhan V. Le, Vivian P. Truong, Meredith H. Wilson, Steven L. Salzberg, Steven A. Farber.
Abstract
Forward genetic screening is a powerful approach to assign functions to genes and can be used to elucidate the many genes whose functions remain unknown. A key step in forward genetic screening is mapping: identification of the gene causing the phenotype. Existing mapping methods use a bioinformatic mapping-by-sequencing approach based on allelic frequency calculations, which often identifies large genomic regions containing an intractable number of candidate genes for testing.
Introduction
Forward genetic screening is an established and effective technique for assigning new functions to genes, and is a particularly exciting strategy to address the ~20% of human genes with unknown functions through screens in model organisms such as zebrafish. Point mutants induced chemically (e.g., with N-ethyl-N-nitrosourea [ENU] or ethyl methanesulfonate [EMS]) are powerful tools for the dissection of gene function in a variety of model organisms. However, compared with more easily identifiable genome modifications (e.g., transposon-mediated insertions, CRISPR/Cas9-mediated deletions), identifying the causative single base pair substitution that underlies a particular phenotype is significantly more challenging.
Materials and Methods
For recessive mutations, the causative locus and surrounding genomic region will be homozygous in mutant animals, while regions outside of the locus will be more heterozygous, as zebrafish are highly polymorphic. To leverage this principle, we designed an algorithm that utilizes whole-genome sequencing (WGS) data from mutant and wild-type sibling genomic DNA to identify regions of the genome that are more homozygous in mutant animals.
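The principle above can be illustrated with a minimal Python sketch. It is not the WheresWalker implementation. The function names, the 0.2 to 0.8 allele-fraction band used to call a pooled site heterozygous, the minimum depth, and the fixed window size are all assumptions chosen for illustration. The sketch counts heterozygous sites per genomic window in the mutant and sibling pools and reports the window where mutant heterozygosity drops the most relative to siblings.

```python
from collections import defaultdict


def is_het(ref_count, alt_count, min_depth=8, low=0.2, high=0.8):
    """Return True/False for a heterozygous-looking pooled site, or None if
    coverage is too low. The thresholds are illustrative assumptions."""
    depth = ref_count + alt_count
    if depth < min_depth:
        return None
    alt_fraction = alt_count / depth
    return low <= alt_fraction <= high


def window_het_counts(sites, window=1_000_000):
    """Count heterozygous sites per window in each pool.

    sites: iterable of (chrom, pos, mut_ref, mut_alt, sib_ref, sib_alt)
    returns: {(chrom, window_start): (mutant_het, sibling_het)}
    """
    counts = defaultdict(lambda: [0, 0])
    for chrom, pos, mut_ref, mut_alt, sib_ref, sib_alt in sites:
        mut = is_het(mut_ref, mut_alt)
        sib = is_het(sib_ref, sib_alt)
        if mut is None or sib is None:
            continue  # skip sites without enough coverage in both pools
        key = (chrom, (pos // window) * window)
        counts[key][0] += int(mut)
        counts[key][1] += int(sib)
    return {key: tuple(value) for key, value in counts.items()}


def most_homozygous_window(counts, min_sib_het=3):
    """Return ((chrom, window_start), ratio) for the window with the lowest
    mutant-to-sibling heterozygous-site ratio, or None if no window has
    enough informative sibling sites."""
    best = None
    for key, (mut_het, sib_het) in counts.items():
        if sib_het < min_sib_het:
            continue
        ratio = mut_het / sib_het
        if best is None or ratio < best[1]:
            best = (key, ratio)
    return best


# Toy data: the second window is homozygous in mutants but not in siblings.
toy_sites = [
    ("chr1", 10, 10, 10, 10, 10),
    ("chr1", 20, 12, 8, 9, 11),
    ("chr1", 30, 10, 10, 10, 10),
    ("chr1", 110, 0, 20, 10, 10),
    ("chr1", 120, 20, 0, 14, 6),
    ("chr1", 130, 1, 19, 10, 10),
]
toy_counts = window_het_counts(toy_sites, window=100)
linked = most_homozygous_window(toy_counts)
```

In this toy input, the window starting at position 100 has no heterozygous sites in the mutant pool while the sibling pool remains heterozygous, so it is reported as the linked region. Real data would need per-sample variant calls, quality filters, and smoothing across windows.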
Discussion
Here we introduce WheresWalker, a mutation mapping protocol based on bulk segregant analysis, and demonstrate its ability to identify multiple genetic variants, including four novel mutations responsible for dark yolk phenotypes in zebrafish. The WheresWalker software 1) calculates a SNP index based on decreased mutant heterozygosity to identify genomic regions linked to a mutant phenotype, 2) filters variants to generate a list of the most likely candidates, and 3) automatically identifies background polymorphisms that can be used for recombinant mapping to refine the computationally defined interval and narrow the list of candidate genes.
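As a rough illustration of steps 2 and 3, the following Python sketch separates variants inside a linked interval into likely causative candidates and background polymorphisms that could serve as mapping markers. This is a sketch under stated assumptions, not the WheresWalker code: the `Variant` fields, the effect labels, the allele-fraction cutoffs, and the rule that markers are simply the non-damaging variants with the same segregation pattern are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    mut_alt_frac: float  # alt-allele fraction in the mutant pool
    sib_alt_frac: float  # alt-allele fraction in the sibling pool
    effect: str          # hypothetical annotation label


# Hypothetical set of annotation labels treated as potentially damaging.
DAMAGING = {"missense", "nonsense", "splice_site", "frameshift"}


def segregates_with_phenotype(v, hom_cutoff=0.9, sib_max=0.8):
    """True if the variant looks homozygous in mutants but is not fixed in
    siblings. Cutoffs are illustrative assumptions."""
    return v.mut_alt_frac >= hom_cutoff and v.sib_alt_frac <= sib_max


def in_interval(v, chrom, start, end):
    """True if the variant lies inside the linked interval (inclusive)."""
    return v.chrom == chrom and start <= v.pos <= end


def candidate_variants(variants, chrom, start, end):
    """Step 2 (sketch): linked, segregating, potentially damaging variants."""
    return [
        v for v in variants
        if in_interval(v, chrom, start, end)
        and segregates_with_phenotype(v)
        and v.effect in DAMAGING
    ]


def mapping_markers(variants, chrom, start, end):
    """Step 3 (sketch): linked, segregating variants that are not predicted
    to be damaging, treated here as background polymorphisms."""
    return [
        v for v in variants
        if in_interval(v, chrom, start, end)
        and segregates_with_phenotype(v)
        and v.effect not in DAMAGING
    ]


# Toy variants on a made-up interval chr5:100-200.
toy_variants = [
    Variant("chr5", 150, 1.0, 0.33, "missense"),    # candidate
    Variant("chr5", 160, 1.0, 0.35, "synonymous"),  # marker
    Variant("chr5", 170, 0.5, 0.50, "nonsense"),    # not homozygous in mutants
    Variant("chr5", 900, 1.0, 0.30, "nonsense"),    # outside the interval
    Variant("chr2", 150, 1.0, 0.30, "missense"),    # different chromosome
    Variant("chr5", 180, 1.0, 1.00, "missense"),    # fixed in both pools
]
candidates = candidate_variants(toy_variants, "chr5", 100, 200)
markers = mapping_markers(toy_variants, "chr5", 100, 200)
```

Keeping the segregation test separate from the annotation test means the same linked variants feed both lists, so a marker and a candidate differ only in predicted effect.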
Acknowledgments
The authors acknowledge Dr. Rebecca Burdine for providing WIK zebrafish and the Carnegie Embryology Sequencing Core facility, particularly Allison Pinder and Frederick Tan, for supporting sequencing efforts. In addition, the authors acknowledge Jasmine James, Tye Chicha, Victoria Murphy, and Camille Coffey for phenotyping screen mutants during lab rotations, and Julia Baer, who managed the fish facility during the screen.
Citation: Feltes M, Zimin AV, Angel S, Pansari N, Hensley MR, Anderson JL, et al. (2025) Phenotype to genotype: A new and rapid approach using whole-genome sequencing. PLoS Genet 21(7): e1011702. https://doi.org/10.1371/journal.pgen.1011702
Editor: Mary C. Mullins, University of Pennsylvania School of Medicine, UNITED STATES OF AMERICA
Received: November 26, 2024; Accepted: April 28, 2025; Published: July 14, 2025.
Copyright: © 2025 Feltes et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: WGS datasets are available on the NCBI Sequence Read Archive (SRA) under BioProject PRJNA1187516. Accession numbers are provided in S3 File. WheresWalker is publicly available on GitHub at https://github.com/alekseyzimin/WheresWalker.
Funding: This work was supported by grants from the National Institutes of Health (https://www.nih.gov/): F32GM144223 (M.F.), R01DK093399 (S.A.F.), R01GM63904 (S.A.F.), R01HL158054 (S.A.F.), and R01HG006677 (S.L.S.), and the National Science Foundation (https://www.nsf.gov/): IOS-2432298 (A.V.Z.). Additional support for this work was provided by the Carnegie Institution for Science endowment and the G. Harold and Leila Y. Mathers Charitable Foundation (https://mathersfoundation.org/) (S.A.F.). The funders did not play any role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.