Targeted long-read sequencing enriches disease-relevant genomic regions of interest to provide complete Mendelian disease diagnostics

Research Article | Genetics | Ophthalmology | Open Access | 10.1172/jci.insight.183902

Kenji Nakamichi,1,2 Jennifer Huey,1,2 Riccardo Sangermano,3 Emily M. Place,3 Kinga M. Bujakowska,3 Molly Marra,4 Lesley A. Everett,4 Paul Yang,4 Jennifer R. Chao,1,2 Russell N. Van Gelder,1,2,5 and Debarshi Mustafi1,2,6,7

1Department of Ophthalmology, University of Washington, Seattle, Washington, USA.

2Roger and Karalis Johnson Retina Center, Seattle, Washington, USA.

3Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, Massachusetts, USA.

4Casey Eye Institute, Oregon Health & Science University, Portland, Oregon, USA.

5Departments of Laboratory Medicine and Pathology and Biological Structure, University of Washington, Seattle, Washington, USA.

6Brotman Baty Institute for Precision Medicine, Seattle, Washington, USA.

7Division of Ophthalmology, Seattle Children’s Hospital, Seattle, Washington, USA.

Address correspondence to: Debarshi Mustafi, Department of Ophthalmology, University of Washington and Roger and Karalis Johnson Retina Center, 750 Republican St., E273, Seattle, Washington 98109, USA. Phone: 206.221.2029; Email: debarshi@uw.edu.

Published September 12, 2024

Published in Volume 9, Issue 20 on October 22, 2024
JCI Insight. 2024;9(20):e183902. https://doi.org/10.1172/jci.insight.183902.
© 2024 Nakamichi et al. This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
Received: June 26, 2024; Accepted: September 10, 2024

Abstract

Despite advances in sequencing technologies, a molecular diagnosis remains elusive in many patients with Mendelian disease. Current short-read clinical sequencing approaches cannot provide chromosomal phase information or epigenetic information without further sample processing, which is not routinely done and can result in an incomplete molecular diagnosis in patients. The ability to provide phased genetic and epigenetic information from a single sequencing run would improve the diagnostic rate of Mendelian conditions. Here, we describe targeted long-read sequencing of Mendelian disease genes (TaLon-SeqMD) using a real-time adaptive sequencing approach. Optimization of bioinformatic targeting enabled selective enrichment of multiple disease-causing regions of the human genome. Haplotype-resolved variant calling and simultaneous resolution of epigenetic base modification could be achieved in a single sequencing run. The TaLon-SeqMD approach was validated in a cohort of 18 individuals with previous genetic testing targeting 373 inherited retinal disease (IRD) genes, yielding the complete molecular diagnosis in each case. This approach was then applied in 2 IRD cases with inconclusive testing, which uncovered noncoding and structural variants that were difficult to characterize by standard short-read sequencing. Overall, these results demonstrate that TaLon-SeqMD provides rapid phased-variant calling to establish the molecular basis of Mendelian diseases.

Introduction

The clinical heterogeneity of Mendelian disorders makes genetic testing essential in providing a precise diagnosis. However, despite remarkable advances in sequencing technologies over the past 20 years, nearly half of patients with Mendelian disease lack a complete molecular diagnosis (1–3). The precise identification of genotypic causes of disease as well as chromosomal phase information has taken on new importance, as treatment is only indicated for specific genetic defects in Mendelian conditions such as inherited retinal diseases (IRDs), for which FDA-approved gene therapy exists (4). The current standard-of-care testing approach to genetically diagnose affected patients is targeted short-read exome-based panel sequencing (5). Compared with short-read exome sequencing, short-read genome sequencing (GS) provides increased diagnostic efficiency (6), but it has only provided a modest increase in molecular diagnosis (7). The missing genetic causality of disease is thought to reside in structural (8) and noncoding variants (9, 10) within known disease-causing loci, which may be difficult to sequence with short reads. More importantly, short-read GS methods do not yield haplotype information (11), so subsequent familial segregation studies are required to establish a molecular diagnosis in cases of autosomal recessive inheritance.

Long-read GS approaches from Pacific Biosciences and Oxford Nanopore Technologies (ONT) (12) have the potential to overcome these limitations by readily sequencing intronic and flanking genomic regions. Furthermore, by linking variants on single long reads, long-read GS offers the added benefit of genomic phase information to provide a molecular diagnosis from the proband alone (13, 14). Whereas long-read GS offers immense genomic information, data storage and processing can make analysis costly and burdensome (15). In practice, Mendelian diseases are predominantly caused by disease alleles located within a limited number of genomic loci, so focused genome-level sequencing of particular disease-causing loci would be more clinically relevant. Furthermore, this would eliminate the ethical issues and familial burden of managing incidental findings uncovered by GS unrelated to the diagnostic aim (16, 17), which can be a reason families defer genetic testing (18). However, current methods for targeted sequencing are labor intensive and not easily modifiable. Target panel enrichment with solution-based selection methods (19), commonly used in commercial exome-based gene panel testing, is difficult to modify to include new genomic regions. Genomic regions can be targeted with Cas9 to ligate adapters for long-read sequencing of multiple loci, but this strategy is limited by the size of fragments that can be targeted (20) and requires significant effort if targeting multiple loci.

To overcome these limitations, we leveraged long-read sequencing technology from ONT with a real-time bioinformatic adaptive sequencing functionality that allows rapid classification of the generated current signal to determine whether a DNA molecule should be sequenced or not (21, 22). In this work, we show that targeted long-read sequencing of Mendelian disease genes (TaLon-SeqMD) is customizable to multiple genomic loci (here, all IRD-associated genes). We developed metrics to analyze proper targeting of genomic loci to generate an optimized targeting genomic reference for use with standardly prepared genomic DNA (gDNA) libraries. After benchmarking the performance of TaLon-SeqMD in individuals who previously underwent CLIA-approved clinical molecular testing, we utilized TaLon-SeqMD to solve the genetic basis of disease in 2 individuals with a clinical presentation consistent with IRD, but with prior inconclusive genetic testing.

Results

Optimized genomic reference targeting provides focused depth of coverage for haplotype-resolved variant calling of targeted genomic loci. With adaptive sampling on the ONT platform, emergent reads of single DNA molecules are compared in real time against a database of desired (positive selection) or undesired (negative selection) sequences, and unwanted sequences are aborted by reversal of charge at the level of individual nanopores. To evaluate the use of adaptive sampling to accurately identify IRD variants, we designed a custom panel encompassing a comprehensive list of genes implicated in IRDs (n = 373, Supplemental Table 1; supplemental material available online with this article; https://doi.org/10.1172/jci.insight.183902DS1). For each gene, the entire locus and flanking 50 kb of sequence in each direction were targeted via positive selection. A 50-kb flank was chosen so that long reads originating outside of each gene were captured and the entire gene was effectively covered. In total, the panel covered 54.3 megabases (Mb), corresponding to approximately 1.7% of the human haploid genome. For sequencing library preparation, high molecular weight genomic DNA was extracted from blood samples of consenting individuals for sequencing on ONT MinION flow cells. Real-time basecalling was carried out using the “super-accurate” model parameters on a custom Linux-based computing workstation equipped with 2 NVIDIA RTX A6000 graphics cards and an AMD Threadripper Pro 4995WX 64-core, 128-thread desktop processor.
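The panel design described above (each gene locus plus 50 kb on either side, written out as target intervals) can be illustrated with a minimal Python sketch. The coordinates below are hypothetical placeholders rather than real gene positions, the function names are illustrative, and merging of overlapping padded targets is an assumption that the text does not state. Clipping at chromosome ends is omitted for brevity.

```python
from typing import List, Tuple

# (chromosome, start, end) using 0-based, half-open coordinates as in BED files.
Interval = Tuple[str, int, int]

FLANK = 50_000  # 50-kb flank on each side of a gene, as described in the text


def pad_and_merge(genes: List[Interval], flank: int = FLANK) -> List[Interval]:
    """Pad each gene by `flank` bases and merge overlapping targets per chromosome.

    Merging is an assumption made for this sketch; chromosome-end clipping
    is not handled.
    """
    padded = sorted((c, max(0, s - flank), e + flank) for c, s, e in genes)
    merged: List[Interval] = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def panel_size(targets: List[Interval]) -> int:
    """Total number of bases covered by the target intervals."""
    return sum(end - start for _, start, end in targets)


# Hypothetical gene coordinates, for illustration only.
example_genes = [
    ("chr11", 1_000_000, 1_020_000),
    ("chr11", 1_090_000, 1_100_000),
    ("chr3", 500_000, 506_000),
]
targets = pad_and_merge(example_genes)
bed_lines = [f"{chrom}\t{start}\t{end}" for chrom, start, end in targets]
```

In this toy example the two chr11 genes lie within 100 kb of each other, so their padded intervals merge into a single target.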

We first determined the optimal settings required to efficiently target the different genomic loci using the Genome Reference Consortium Human Build 38 (GRCh38) (23). A browser extensible data (BED) file of genomic coordinates from GRCh38 of each of the 373 IRD genomic regions was used for initial targeting. To assess proper targeting, the DNA bases expected to be mapped to each targeted genomic locus were calculated as a fraction of the total bases of all targeted loci (54.3 Mb). This was then compared to the observed DNA bases that were uniquely mapped to each targeted locus after a sequencing run. A linear regression of the observed versus expected bases revealed that 31 genomic loci exhibited a lower-than-expected number of observed reads (Supplemental Figure 1A). Closer examination of these genomic regions revealed that entire genes or portions of a gene did not map correctly due to inherent errors in the GRCh38 assembly (Supplemental Figure 1, B and C). Masking these selected genomic regions and generating a new GRCh38 reference assembly file was a major advance that led to proper targeting and improved correlation of these points on the linear regression (Supplemental Figure 1, D–F). Moreover, this method of read alignment assessment for a sequencing run can be modified for any targeted set of genomic loci to determine the optimal targeting reference necessary for accurate variant calling.
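The observed-versus-expected comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: expected bases per locus are taken as the locus's share of the panel multiplied by the total on-target bases, a least-squares line is fitted by hand, and loci are flagged with a hypothetical cutoff (`min_ratio`, here 0.5) that does not come from the article. Locus names and counts are invented.

```python
from typing import Dict, List, Tuple


def expected_bases(locus_lengths: Dict[str, int], total_observed: float) -> Dict[str, float]:
    """Expected mapped bases per locus: its fraction of the panel times total on-target bases."""
    panel = sum(locus_lengths.values())
    return {name: total_observed * length / panel for name, length in locus_lengths.items()}


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def undercovered(locus_lengths: Dict[str, int], observed: Dict[str, float],
                 min_ratio: float = 0.5) -> List[str]:
    """Loci whose observed bases fall below `min_ratio` of expectation (hypothetical cutoff)."""
    expected = expected_bases(locus_lengths, sum(observed.values()))
    return sorted(name for name in locus_lengths if observed[name] < min_ratio * expected[name])


# Invented locus lengths (bp) and uniquely mapped bases, for illustration only.
lengths = {"A": 100_000, "B": 200_000, "C": 300_000, "D": 400_000}
observed = {"A": 1_000_000, "B": 2_100_000, "C": 500_000, "D": 4_400_000}

expected = expected_bases(lengths, sum(observed.values()))
names = sorted(lengths)
slope, intercept = fit_line([expected[n] for n in names], [observed[n] for n in names])
flagged = undercovered(lengths, observed)
```

Loci flagged this way would be candidates for closer inspection of the reference at that region, which is the step the text describes before masking.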

With proper targeting parameters established and optimized for our IRD panel, we sought to compare sequencing efficiency of TaLon-SeqMD to nontargeted long-read GS using DNA libraries prepared from the same individual with IRD (subject 1) on ONT MinION flow cells. In the first flow cell we carried out long-read GS without any targeting, and in the second flow cell we utilized adaptive sampling to target the preselected 373 IRD genomic loci using our updated reference file. TaLon-SeqMD produced enhanced depth of coverage of all 373 loci, whereas with whole-genome sequencing there were gaps, with regions of interest exhibiting little to no read coverage (Figure 1A and Supplemental Figure 2). The depth of sequencing allowed for phasing of the disease-causing variant in the rhodopsin (RHO) gene in this individual with the TaLon-SeqMD run, but not with the GS run (Figure 1B). The mean per-base coverage of the adaptive sampling channels was 25× compared with 3× from the nonadaptive sampling channels (Figure 1C), whereas the GS flow cell resulted in a modest 5× mean per-base coverage of IRD gene loci (Figure 1D). This reduced depth of sequencing resulted in a statistically significant decrease in phased regions of the targeted loci with GS. TaLon-SeqMD resulted in phasing of 85% of targeted loci (median of 100%) compared with 64% (median of 69%) with GS (Figure 1D). More importantly, there were entire genomic regions that were unable to be phased with the GS run.

Figure 1

TaLon-SeqMD generates selective whole-gene coverage of IRD genes to allow phased-variant identification. (A) Coverage maps and sequencing alignments of a 1500-kb region of chromosome 11 with whole-genome sequencing (WGS) and targeted sequencing of IRD disease-gene loci in that region (TMEM138, TMEM216, BEST1, ASRGL1, ROM1) demonstrate that bioinformatic targeting provides focused depth of sequencing. The locations of the targeted regions are marked. (B) The rhodopsin (RHO) locus is shown to demonstrate that the increased depth obtained from a targeted run compared with a whole-genome run allows for haplotyping to conclusively demonstrate that a disease variant segregates on a single allele. (C) Examination of the coverage across the genome shows selected enrichment of bases covered by the panel genes (blue dots) compared with background coverage of the genome from nonadaptive reads. (D) Box-and-whisker plots show that targeted panel sequencing results in 25× mean per-base coverage compared with 3× with nonadaptive reads and 5× with WGS on a single MinION flow cell. Calculation of phase breadth of the data revealed that TaLon-SeqMD was able to phase significantly more of the targeted genomic regions than WGS.

TaLon-SeqMD validates clinical sequencing data and provides full molecular diagnoses in genotypically diverse Mendelian disease cases. To establish that TaLon-SeqMD can provide diagnostic information in Mendelian disease cases, we enrolled individuals who had undergone clinical molecular testing in CLIA-approved facilities. We analyzed DNA samples from 19 additional individuals, which included 14 affected IRD individuals and 5 unaffected family members. Across all samples, the mean per-base coverage of the 373 loci was 22.74 ± 2.88, with a mean read length of 7090 ± 2595 bases from the adaptive sampling channels. We achieved greater than 15-fold enrichment on average across all samples of our 373 targeted genomic loci (Figure 2). More importantly, we demonstrated that greater than 91% (0.91 ± 0.04) of all targeted genomic loci were fully phased across all samples from a single sequencing run. The median across all samples was 100%, and examination of the lower quartile revealed that greater than 96% of targeted genomic loci were fully phased across the samples. Most importantly, despite the range in panel coverage, read length, and phase breadth across the cohort, we were able to deliver a molecular diagnosis in each case.
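The two summary metrics used here, fold enrichment and phase breadth, can be made concrete with a short Python sketch. The definitions are assumptions chosen for illustration (enrichment as the ratio of mean on-target to mean background coverage, and phase breadth as the fraction of a targeted locus spanned by phase blocks); the article does not spell out its exact formulas, and all numbers below are invented.

```python
from typing import List, Tuple


def fold_enrichment(on_target_mean: float, background_mean: float) -> float:
    """Ratio of mean on-target coverage to mean background coverage (assumed definition)."""
    if background_mean <= 0:
        raise ValueError("background coverage must be positive")
    return on_target_mean / background_mean


def phase_breadth(locus: Tuple[int, int], phase_blocks: List[Tuple[int, int]]) -> float:
    """Fraction of the locus (start, end) covered by phase blocks, counting overlaps once."""
    start, end = locus
    covered = 0
    cursor = start  # rightmost position already counted
    for block_start, block_end in sorted(phase_blocks):
        lo = max(block_start, cursor)
        hi = min(block_end, end)
        if hi > lo:
            covered += hi - lo
            cursor = hi
    return covered / (end - start)


# Invented values, for illustration only.
enrichment = fold_enrichment(20.0, 1.25)
breadth = phase_breadth((0, 1000), [(0, 400), (300, 600), (900, 1200)])
```

Under these definitions, a locus counts as fully phased when its phase breadth equals 1.0, that is, when phase blocks span the whole targeted interval.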

Figure 2

TaLon-SeqMD across a cohort of familial and isolated proband samples provides selective enrichment of targeted genomic loci and phasing of disease-relevant genes to allow for a full molecular diagnosis in each case. Box-and-whisker plots illustrate the depth of coverage from adaptive targeted channels (purple) and nonadaptive channels (green) as well as the phase breadth (orange) for each subject. (A) Familial individuals were first assessed across 3 different families. TaLon-SeqMD demonstrated an average of 19-fold enrichment of targeted genomic loci and average phase breadth of 0.91, allowing complete verification of allelic architecture of disease variants. (B) Eight proband samples were then examined with TaLon-SeqMD and 16-fold mean enrichment of targeted genomic loci and average phase breadth of 0.91 was achieved across samples. For subject 16 who exhibited the lowest overall sequencing output that led to a mean phase breadth of 0.84, the clinically relevant variants could still be phased to reassign a VUS to provide a full molecular diagnosis.

We initially examined familial data to verify phased variants in disease-affected individuals (Figure 2A). We carried out TaLon-SeqMD on 3 families with disease variants in USH2A (subjects 2–4), TPP1 (subjects 5–8), and USH2A (subjects 9–12). In each case we were able to correctly phase the proband samples with our approach and confirm the allelic architecture with familial data. We then shifted our attention to 8 disease-affected individuals in whom variant phasing was not possible due to lack of familial DNA (Figure 2B). We first showed how different arrangements of complex variants in ABCA4 can lead to varied phenotypic presentations in 2 cases (subjects 13 and 14). The ability to phase variants allowed reclassification of variants of uncertain significance (VUS) to likely pathogenic to provide a complete molecular diagnosis in 2 cases (subjects 15 and 16). In 1 case (subject 17) without clinical testing results at the time of TaLon-SeqMD, we showed that pathogenic variants lying over 526 kb apart could be identified and phased to provide a rapid molecular diagnosis. We further show that in 2 cases (subjects 18 and 19) TaLon-SeqMD provided a full molecular diagnosis after indeterminate clinical short-read sequencing. Finally, we show in subject 20 that the ability to sequence native DNA allows decoding the base methylation signal to identify potentially important epigenetic features of the genome in the context of disease.

Allelic architecture of variants revealed by TaLon-SeqMD can prioritize variants for further analysis to establish a molecular diagnosis. DNA in each prepared library is stochastically sampled to perform positive selection for full-length sequencing, so we hypothesized that expansion from a single gene to 373 genomic loci should not affect depth of coverage. To test this hypothesis, we first examined familial data (family 1) of 2 affected siblings (subjects 2 and 3) with Usher syndrome type 2 (USH2) and their unaffected mother (subject 4) in whom we had previously carried out targeted long-read single-gene analysis of USH2A (13). We found that expanding our targeting to 373 genomic loci did not result in decreased coverage of USH2A relative to single-gene-targeting sequencing (Supplemental Figure 3) and could still provide phased-variant calling for molecular diagnosis. We next examined family 2 afflicted with a syndromic IRD caused by variants in the TPP1 gene, which is one of the most prevalent forms of juvenile neuronal ceroid lipofuscinosis (JNCL) (24), to better understand how allelic architecture may influence disease phenotype. Clinical exome testing had identified a nonsense variant (c.837C>G, p.Tyr279Ter) and a potential second variant in a noncoding region (c.508+4T>C) in both affected siblings. Targeted variant testing of unaffected parents (subjects 5 and 6) revealed that they each harbored 1 of the 2 variants, providing evidence that the suspected disease variants lie in trans. Phenotypically, the older sibling (subject 7) had evidence of retinal disease, but the younger sibling (subject 8) had normal retinal findings. TaLon-SeqMD of all 4 family members provided on average 20× coverage of all targeted loci and phasing of over 90% of all targeted genes, including full-phased coverage of TPP1 (Figure 3A). Closer examination of the 800-bp region of TPP1 containing exons 4–6 demonstrated that the unaffected parents each harbored 1 variant, whereas both affected siblings harbored both the nonsense and noncoding variants in a trans configuration (Figure 3B), confirming previous clinical testing data. Our analysis pipeline identified the nonsense variant in TPP1 as the top disease-variant candidate. Moreover, since we had the benefit of also sequencing other syndromic retinal disease loci, we were able to rule out other genetic etiologies. The atypical presentation of JNCL suggested that the allele harboring the noncoding variant likely retained some activity due to reduced, but not abolished, protein product (25). To test this hypothesis, we carried out a splicing assay that showed the noncoding allele, c.508+4A>G, functions as a nonessential splice site leading to aberrant splicing compared with the normal allele, which led to a decrease in, but not complete loss of, the native protein product (Figure 3C), thereby providing a biochemical basis for a molecular diagnosis of atypical TPP1-associated JNCL.

Figure 3

Targeted panel sequencing reveals haplotagged variants in the TPP1 gene. (A) Coverage maps show the sequence alignments at the TPP1 locus and surrounding 50-kb region denoted by on-target reads. There is negligible coverage in off-target regions flanking the gene. (B) A closer view of the region encompassing exons 4 to 6 reveals the location of the 2 potential disease-causing SNVs. The unaffected parents are shown to each possess 1 variant, whereas both affected children possess both. (C) Since both of the affected individuals had atypical forms of Batten disease, the intronic variant was hypothesized to be a hypomorph, which was demonstrated to display variant-induced aberrant splicing compared with the normal allele, as shown graphically with a sashimi plot.

TaLon-SeqMD provides phased-variant identification for allelic localization of complex variants to explain phenotypic findings and aid in VUS reassignment. Complex alleles, where 2 or more disease variants may lie in cis, can complicate disease diagnosis. Understanding of the precise allelic architecture of disease variants can be critical in disease prognosis. To investigate these cases, we describe family 3 in which 2 affected siblings with USH2 were found to harbor 3 pathogenic variants in the USH2A gene with clinical exome panel testing. Targeted variant testing of the unaffected parents revealed that the father harbored 2 of the variants, whereas the mother harbored the other variant. TaLon-SeqMD of all 4 family members (subjects 9–12) produced fully phased coverage of the large USH2A locus, spanning 1.2 Mb, across all 4 individuals. The data demonstrated that the father harbored 2 heterozygous disease-causing variants in cis (c.6159del and c.4106C>T) and the mother harbored 1 heterozygous disease-causing variant (c.9270C>A). Both affected offspring exhibited all 3 variants, with the c.6159del and c.4106C>T variants in cis and the c.9270C>A variant in trans to the other 2 (Figure 4A). Despite the c.4106C>T and c.9270C>A variants being over 614 kb apart, TaLon-SeqMD was able to phase the variants from a single sequencing run. Whereas all 4 family members underwent long-read sequencing, these results demonstrate that the precise allelic architecture of the complex disease variants could be revealed from sequencing of a single individual without the need for familial samples.
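The cis/trans logic for the three USH2A variants in family 3 can be illustrated with a short Python sketch. It assumes phased calls in the common VCF style, where a genotype such as `0|1` places the alternate allele on the second haplotype and two calls are only comparable when they share a phase set. The genotype strings and the phase set identifier below are hypothetical values chosen to reproduce the arrangement described in the text; they are not taken from the study data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PhasedCall:
    name: str        # variant label
    genotype: str    # phased genotype string, e.g. "0|1" or "1|0"
    phase_set: int   # identifier of the phase block containing the call


def alt_haplotype(call: PhasedCall) -> Optional[int]:
    """Return 0 or 1 for the haplotype carrying the alternate allele, or None
    if the call is not a phased heterozygous genotype."""
    left, sep, right = call.genotype.partition("|")
    if sep != "|" or {left, right} != {"0", "1"}:
        return None
    return 0 if left == "1" else 1


def configuration(a: PhasedCall, b: PhasedCall) -> str:
    """Classify two heterozygous calls as 'cis', 'trans', or 'unknown'."""
    hap_a, hap_b = alt_haplotype(a), alt_haplotype(b)
    if hap_a is None or hap_b is None or a.phase_set != b.phase_set:
        return "unknown"
    return "cis" if hap_a == hap_b else "trans"


# Hypothetical phased calls for an affected sibling (phase set ID is invented).
v_del = PhasedCall("c.6159del", "1|0", phase_set=1)
v_4106 = PhasedCall("c.4106C>T", "1|0", phase_set=1)
v_9270 = PhasedCall("c.9270C>A", "0|1", phase_set=1)

paternal_pair = configuration(v_del, v_4106)
across_parents = configuration(v_4106, v_9270)
```

The shared phase set is the key requirement: two variants separated by hundreds of kilobases can only be assigned cis or trans from the proband alone when a single phase block spans both.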

Figure 4

Haplotype-resolved assembly of complex phased variants provides insight into differential disease phenotypes. (A) Two affected siblings with USH2A-associated Usher syndrome were found to have 3 pathogenic variants in USH2A from clinical testing without phase information. Targeted long-read sequencing demonstrated that the unaffected father harbored 2 pathogenic variants in cis, whereas the unaffected mother harbored 1 pathogenic variant. Targeted long-read GS correctly identified the variant architecture from the probands alone with the 2 variants inherited from the father in trans to the variant inherited from the mother. Variant architecture can influence disease progression, as evidenced in ABCA4-associated Stargardt disease. (B) An adolescent subject with severe disease, as exhibited by chorioretinal atrophy with a large region of hypoautofluorescence in the central macula, had 3 pathogenic alleles, 2 of which were severe and were in a trans configuration. (C) In comparison, a middle-aged individual with mild disease, as exhibited by hyperautofluorescent flecks without atrophy in the central macula, also had 3 pathogenic alleles, but had 2 hypomorphic alleles in cis, both of which were in trans to a severe allele.

The allelic arrangement of disease variants has been shown to be integral in determining the prognosis of particular Mendelian diseases such as ABCA4-related Stargardt disease (26). This gene has well-characterized hypomorphic variants that can lead to different disease phenotypes (27, 28). We present 2 cases in which familial DNA was not available to determine the allelic architecture and thus explain phenotypic differences in ABCA4-related Stargardt disease. In the first case (subject 13), TaLon-SeqMD revealed that a hypomorphic variant (c.3113C>T, p.Ala1038Val) was in cis with a severe disease-causing variant (c.1622T>C, p.Leu541Pro), both of which lie in trans to another severe variant (c.2041C>T, p.Arg681Ter). This individual exhibited early-onset vision loss in adolescence with a severe clinical phenotype noted on retinal imaging (Figure 4B). In the second case (subject 14), TaLon-SeqMD revealed that 2 hypomorphic variants (c.5603A>T, p.Asn1868Ile and c.2588G>C, p.Gly863Ala) lie in cis and are in trans to a severe variant (c.5461-10T>C). Molecular studies have shown that when the hypomorphic variants c.5603A>T/p.Asn1868Ile and c.2588G>C/p.Gly863Ala lie in cis there is relatively normal protein expression and functionality (29). This is consistent with a later onset of disease, as exhibited in this individual, who had only mild retinal findings and preserved visual function by her mid-30s (Figure 4C).

This demonstrates that, in the absence of familial DNA, TaLon-SeqMD can provide precise variant-level insight from the allelic arrangement revealed by phased data sets. In certain cases, demonstrating that a VUS lies in trans to a known pathogenic variant is the final criterion needed to reassign its pathogenicity (30) and provide a full molecular diagnosis. We show this in an adolescent with early-onset ABCA4-related Stargardt disease (subject 15) who harbors the known pathogenic variant c.5461-10T>C noted in the previous individual (Figure 4C). TaLon-SeqMD demonstrated that the VUS previously identified by clinical sequencing, c.3413T>C, p.Leu1138Pro, was in trans to this variant, which allowed its reassignment to likely pathogenic (Supplemental Figure 4). Similarly, deducing the allele-level variant architecture in an individual (subject 16) with PDE6A-associated retinitis pigmentosa (RP) allowed reassignment of c.1646T>C, p.Leu549Pro to likely pathogenic (Supplemental Figure 5).

Phased-variant calls from TaLon-SeqMD provide rapid disease diagnostics in autosomal recessive cases of disease. A critical issue in Mendelian disease diagnostics is the turnaround time for clinical results, which can affect treatment options. Furthermore, initial genetic testing results are often incomplete in autosomal recessive diseases because chromosomal phase information is not available. Secondary analysis must then be carried out, which extends the time to a complete diagnosis and can only occur if familial DNA is available. We show that TaLon-SeqMD not only provides phased genomic data sets, but does so rapidly using a single MinION flow cell. We enrolled an individual who had just been clinically diagnosed with RP and had a history of congenital hearing loss, which was strongly suggestive of a syndromic disorder such as USH2. A sample from this individual (subject 17) was being sent for clinical genetic testing for an IRD, so we simultaneously obtained a sample to carry out TaLon-SeqMD. After sequencing and analysis, we found that the individual exhibited a known pathogenic splicing variant in the USH2A gene (c.12067-2A>G) along with a previously uncharacterized frameshift variant (c.3299dup, p.Glu1100GlufsTer8) in trans, which was predicted to be pathogenic (Figure 5, A–C). We carried out post hoc analysis of the data to identify when sufficient reads were present to phase the disease variants and found that within 12 hours of sequencing, the 2 variants residing over 526 kb apart had been identified and phased (Figure 5D). Clinical genetic test results were available after 7 weeks, confirming both variants in USH2A found from TaLon-SeqMD, but without the benefit of phase information.
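The idea behind such a post hoc time-to-phase analysis can be sketched in Python under a deliberately simplified model, which is an assumption and not the authors' method. In this model, two distant variants count as linked once every adjacent pair of heterozygous sites between them is spanned by at least `min_reads` reads. The function name, the read tuple layout, and the toy coordinates and timestamps are all hypothetical.

```python
from typing import Iterable, Optional, Sequence, Tuple

Read = Tuple[float, int, int]  # (hours since run start, start, end)


def hours_until_linked(
    reads: Iterable[Read],
    het_sites: Sequence[int],
    pos_a: int,
    pos_b: int,
    min_reads: int = 2,
) -> Optional[float]:
    """Earliest read time at which pos_a and pos_b are chained together.

    Simplified assumption: a link between adjacent heterozygous sites is
    established once `min_reads` reads span both sites. Returns None if
    the chain is never completed.
    """
    lo, hi = sorted((pos_a, pos_b))
    if lo == hi:
        raise ValueError("the two variant positions must differ")
    chain = sorted(s for s in set(het_sites) | {lo, hi} if lo <= s <= hi)
    pending = set(zip(chain, chain[1:]))
    support = {link: 0 for link in pending}
    for hours, start, end in sorted(reads):
        for link in list(pending):
            if start <= link[0] and end >= link[1]:
                support[link] += 1
                if support[link] >= min_reads:
                    pending.discard(link)
        if not pending:
            return hours
    return None


# Toy coordinates and timestamps invented for illustration only.
toy_sites = [100, 400, 700, 1000]
toy_reads = [
    (1.0, 50, 450),
    (2.0, 90, 500),
    (3.0, 350, 800),
    (4.0, 380, 1100),
    (5.0, 650, 1050),
]

print(hours_until_linked(toy_reads, toy_sites, 100, 1000))
```

A real analysis would rerun variant calling and read-based phasing on time-binned subsets of the reads; the sketch only conveys why phasing two specific variants can complete well before the whole locus is phased.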

Figure 5

Targeted panel genome sequencing allows for rapid identification of disease-causing variants. (A) In an affected individual with no prior genetic testing, targeted long-read panel sequencing revealed 2 likely disease-causing variants in the USH2A gene, (B) an SNV noted to be pathogenic, and (C) a duplication leading to a frameshift and early termination that was in trans. (D) Post hoc analysis of the sequencing data revealed that the 2 variants in USH2A were identified and properly phased within 12 hours of sequencing, whereas the entire USH2A gene could be phased 30 hours after sequencing.

Comprehensive genomic profiling with TaLon-SeqMD can provide full molecular diagnosis in cases with missing heritability after clinical sequencing. We show that TaLon-SeqMD can be instrumental in monoallelic cases where only one pathogenic variant was identified after clinical exome-based sequencing. Not only can the second causative variant be identified with targeted long-read sequencing, but phased data can demonstrate that the second variant is in trans to the identified variant to further validate its role in disease. We show that phased data sets can provide complete molecular diagnosis in 2 autosomal recessive cases of USH2. In the first case, a known pathogenic coding variant in USH2A had been identified with exome-based panel clinical sequencing. Targeted long-read sequencing of this individual (subject 18) identified a pathogenic noncoding deep intronic variant (c.14134-3169A>G) that activates a pseudoexon, introducing a premature termination codon (31). This variant was found to reside in trans to the previously identified coding variant (Figure 6, A and B), which provided a complete molecular diagnosis in this individual.

Figure 6

Targeted whole-genome long-read sequencing can detect complex SVs and deep intronic variants to provide insight in cases of missing heritability. In 2 cases of individuals with Usher syndrome, a pathogenic coding SNV was found with initial clinical exome-based panel testing. (A) The complete USH2A locus was covered, which allowed examination of noncoding regions. (B) Closer view of a 13-kb region encompassing intron 64 to exon 68 shows the known coding variant in exon 68 (red arrow) and the noncoding variant (black arrow) lie in trans. (C) In the second case, we show the 242-kb region encompassing the known coding variant and the large structural deletion encompassing exons 42 and 43, with the 2 variants segregating in trans. (D) Closer examination of the coding SNV shows long-read data are able to segregate the variant on a single haplotype. (E) Long-read data of the SV are able to again show it segregates on a single chromosome, with precise breakpoint detection compared with short-read data.

Another cause of missing heritability in monoallelic cases is structural variants (SVs). SVs account for a lower percentage of IRD cases than single nucleotide changes and small insertions and deletions (8). This may be because SVs cannot be detected as reliably using standard short-read sequencing approaches. Long-read sequencing is superior in SV detection (32) and can thus better aid in the diagnosis of SVs contributing to IRDs (33). Furthermore, the higher resolution of sequences with long reads allows for more accurate SV detection to determine the precise breakpoint locations (34). We present a case (subject 19) where a pathogenic coding variant in USH2A was identified, but initial clinical exome panel testing did not identify a second variant. Targeted long-read sequencing showed that, in addition to the known pathogenic variant, there was a large deletion encompassing exons 42 and 43 that resided in trans (Figure 6C). We also carried out short-read genome sequencing in this individual to compare the 2 methods in identifying the SV. Both approaches identified the coding variant in exon 66, but only long-read sequencing was able to provide phase information (Figure 6D). In the short-read genome data in the region of the SV, there is clear copy number variation suggestive of a deletion, but since reads do not span this region, it is unclear where the precise genomic breakpoints are (Figure 6E). We utilized a deep learning–based SV tool (
