High-coverage nanopore sequencing of samples from the 1000 Genomes Project to build a comprehensive catalog of human genetic variation [RESOURCES]

Jonas A. Gustafson1,2,35, Sophia B. Gibson1,3,35, Nikhita Damaraju1,5,35, Miranda P.G. Zalusky1, Kendra Hoekzema3, David Twesigomwe6, Lei Yang7, Anthony A. Snead8, Phillip A. Richmond9, Wouter De Coster10,11, Nathan D. Olson12, Andrea Guarracino13,14, Qiuhui Li15, Angela L. Miller1, Joy Goffena1, Zachary B. Anderson1, Sophie H.R. Storz1, Sydney A. Ward1, Maisha Sinha1, Claudia Gonzaga-Jauregui16, Wayne E. Clarke17,18, Anna O. Basile17, André Corvelo17, Catherine Reeves17, Adrienne Helland17, Rajeeva Lochan Musunuri17, Mahler Revsine15, Karynne E. Patterson3, Cate R. Paschal4,19, Christina Zakarian3, Sara Goodwin20, Tanner D. Jensen21, Esther Robb22, The 1000 Genomes ONT Sequencing Consortium, University of Washington Center for Rare Disease Research (UW-CRDR), Genomics Research to Elucidate the Genetics of Rare Diseases (GREGoR) Consortium, William Richard McCombie20, Fritz J. Sedlazeck23,24,25, Justin M. Zook12, Stephen B. Montgomery21, Erik Garrison13, Mikhail Kolmogorov26, Michael C. Schatz14, Richard N. McLaughlin Jr.2,7, Harriet Dashnow27,28, Michael C. Zody16, Matt Loose29, Miten Jain30,31,32, Evan E. Eichler3,33,34 and Danny E. Miller1,4,33

1Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, Washington 98195, USA; 2Molecular and Cellular Biology Program, University of Washington, Seattle, Washington 98195, USA; 3Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA; 4Department of Laboratory Medicine and Pathology, University of Washington, Seattle, Washington 98195, USA; 5Institute for Public Health Genetics, University of Washington, Seattle, Washington 98195, USA; 6Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg 2193, South Africa; 7Pacific Northwest Research Institute, Seattle, Washington 98122, USA; 8Department of Biology, New York University, New York, New York 10003, USA; 9Alamya Health, Baton Rouge, Louisiana 70806, USA; 10Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, VIB, Antwerp 2650, Belgium; 11Department of Biomedical Sciences, University of Antwerp, Antwerp 2000, Belgium; 12Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA; 13Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA; 14Human Technopole, Milan 20157, Italy; 15Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA; 16International Laboratory for Human Genome Research, Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Mexico City 76230, Mexico; 17New York Genome Center, New York, New York 10013, USA; 18Outlier Informatics Inc., Saskatoon, Saskatchewan S7H 1L4, Canada; 19Department of Laboratories, Seattle Children's Hospital, Seattle, Washington 98195, USA; 20Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA; 21Department of Genetics, Stanford University, Stanford, California 94305, USA; 22Department of Computer Science, Stanford University, Stanford, California 94305, USA; 23Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; 24Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA; 25Department of Computer Science, Rice University, Houston, Texas 77251, USA; 26Cancer Data Science Laboratory, National Cancer Institute, NIH, Bethesda, Maryland 20892, USA; 27Department of Human Genetics, University of Utah, Salt Lake City, Utah 84112, USA; 28Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, Colorado 80045, USA; 29Deep Seq, School of Life Sciences, University of Nottingham, Nottingham NG7 2TQ, UK; 30Department of Bioengineering, Northeastern University, Boston, Massachusetts 02115, USA; 31Department of Physics, Northeastern University, Boston, Massachusetts 02115, USA; 32Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts 02115, USA; 33Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, Washington 98195, USA; 34Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA

35 These authors contributed equally to this work.

Corresponding author: dm1@uw.edu

Abstract

Fewer than half of individuals with a suspected Mendelian or monogenic condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control data sets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project (1KGP) Oxford Nanopore Technologies Sequencing Consortium aims to generate LRS data from at least 800 of the 1KGP samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37× and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.

Footnotes

[Supplemental material is available for this article.]

Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.279273.124.

Freely available online through the Genome Research Open Access option.

Received March 4, 2024. Accepted September 16, 2024.
