Characterizing tandem repeat complexities across long-read sequencing platforms with TREAT and otter [METHODS]

Niccoló Tesi1,2,3,6, Alex Salazar1,6, Yaran Zhang1, Sven van der Lee1,2, Marc Hulsman1,2,3, Lydian Knoop1, Sanduni Wijesekera1, Jana Krizova1, Anne-Fleur Schneider1, Maartje Pennings4, Kristel Sleegers5, Erik-Jan Kamsteeg4, Marcel Reinders3 and Henne Holstege1,2,3

1 Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
2 Department of Neurology, Alzheimer Center Amsterdam, Amsterdam Neuroscience, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
3 Delft Bioinformatics Lab, Delft University of Technology, 2628CD Delft, The Netherlands
4 Department of Genome Diagnostics, Radboud University Medical Center, 6525GA Nijmegen, The Netherlands
5 Complex Genetics of Alzheimer's Disease Group, Antwerp Center for Molecular Neurology, VIB, Antwerp B-2650, Belgium

6 These authors contributed equally to this work.

Corresponding authors: n.tesi@amsterdamumc.nl, a.n.salazar@amsterdamumc.nl, h.holstege@amsterdamumc.nl

Abstract

Tandem repeats (TRs) play important roles in genomic variation and disease risk in humans. Long-read sequencing allows for the accurate characterization of TRs; however, the underlying bioinformatics perspectives remain challenging. We present otter and TREAT: otter is a fast targeted local assembler, cross-compatible across different sequencing platforms. It is integrated in TREAT, an end-to-end workflow for TR characterization, visualization, and analysis across multiple genomes. In a comparison with existing tools based on long-read sequencing data from both Oxford Nanopore Technologies (ONT, Simplex and Duplex) and Pacific Biosciences (PacBio, Sequel II and Revio), otter and TREAT achieve state-of-the-art genotyping and motif characterization accuracy. Applied to clinically relevant TRs, TREAT/otter significantly identify individuals with pathogenic TR expansions. When applied to a case-control setting, we replicate previously reported associations of TRs with Alzheimer's disease, including those near or within APOC1 (P = 2.63 × 10⁻⁹), SPI1 (P = 6.5 × 10⁻³), and ABCA7 (P = 0.04) genes. Finally, we use TREAT/otter to systematically evaluate potential biases when genotyping TRs using diverse ONT and PacBio long-read sequencing data sets. We show that, in rare cases (0.06%), long-read sequencing suffers from coverage drops in TRs, including the disease-associated TRs in ABCA7 and RFC1 genes. Such coverage drops can lead to TR misgenotyping, hampering the accurate characterization of TR alleles. Taken together, our tools can accurately genotype TRs across different sequencing technologies and with minimal requirements, allowing end-to-end analysis and comparisons of TRs in human genomes, with broad applications in research and clinical fields.

Footnotes

[Supplemental material is available for this article.]

Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.279351.124.

Freely available online through the Genome Research Open Access option.

Received March 15, 2024. Accepted October 3, 2024.
