Reanalysis of genomic data in rare disease is highly effective in increasing diagnostic yields but remains limited by manual approaches. Automation and optimization for high specificity will be necessary to ensure scalability, adoption and sustainability of iterative reanalysis. We developed a publicly available automated tool, Talos, and validated its performance using data from 1,089 individuals with rare genetic disease. Trio-based analysis identified 86% of known in-scope diagnoses, returning one variant per case on average. Variant burden reduced to one variant per 200 cases on iterative monthly reanalysis cycles. Application to an unselected cohort of 4,735 undiagnosed individuals identified 248 diagnoses (5.2% yield): 73 (29%) due to new gene-disease relationships, 56 (23%) due to new variant-level evidence, and 119 (48%) due to improved filtering and analysis strategies. Our automated, iterative reanalysis model, applied to thousands of rare disease patients, demonstrates the feasibility of delivering frequent, systematic reanalysis at scale.
Competing Interest Statement
KBH has received funding for unrelated work from UCB, Praxis Precision Medicines, and RogCon Biosciences Inc, acts as an investigator for a study with Encoded Therapeutics, has served on an advisory board for UCB Australia, and is a member of the Scientific and Medical Board for SCN2A Asia-Pacific. HLR, KDA and KES received research funding from Microsoft. AODL was a paid consultant for unrelated work for Tome Biosciences, Ono Pharma USA, and Addition Therapeutics. GS and JW are employees of Microsoft Corporation. DGM is a paid advisor to Insitro and GSK, and receives research funding from Google and Microsoft, unrelated to the work described in this manuscript. The other authors have no conflicts of interest to declare.
Funding Statement
The project "A national large-scale automated reanalysis program" was funded by the Australian Government Genomics Health Futures Mission (MRF2008820). The research conducted at the Murdoch Children's Research Institute was supported by the Victorian Government Operational Infrastructure Support Program. J.Ch. is generously supported by The Royal Children's Hospital Foundation as The Chair in Genomic Medicine. Analysis was supported by the Centre for Population Genomics (Garvan Institute of Medical Research and Murdoch Children's Research Institute) and funded in part by a National Health and Medical Research Council investigator grant to DGM (2009982). KBH is supported by an MCRI Clinician-Scientist Fellowship and research funding from the MRFF and NHMRC. AJM is supported by a Queensland Health Advancing Clinical Research Fellowship. DRT is supported by research funding from the MRFF, NHMRC and Mito Foundation. Sequencing and prior analysis of the Australian Genomics rare disease cohorts were funded by the National Health and Medical Research Council (GNT1113531 and GNT2000001) and the Genomics Health Futures Mission (GHFM76747 and EPCD000028). Sequencing and prior analysis of the RGP cohort were provided by the Broad Institute Center for Mendelian Genomics (Broad CMG) and were funded by the National Human Genome Research Institute (NHGRI) grants UM1HG008900 (with additional support from the National Eye Institute, and the National Heart, Lung and Blood Institute), U01HG011755 (GREGoR consortium), R01HG009141, and in part by the Chan Zuckerberg Initiative Donor-Advised Fund at the Silicon Valley Community Foundation (funder DOI 10.13039/100014989) grants 2019-199278, 2020-224274, 2022-316726, 2022-309464 (https://doi.org/10.37921/236582yuakxy) and in part by a research grant from Illumina Inc.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
HREC/77735/RCHM-2021 (Royal Children's Hospital Melbourne, Australia)
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
Clinically reported variants have been submitted to ClinVar under accession number SUB15325041. Sequence Compressed Reference-orientated Alignment Map (CRAM) files and metadata for the Rare Genomes Project are available via dbGaP under accession number phs003047 (GREGoR data set). Access is managed by a data access committee and is based on the requester's intended use and the uses permitted by the data submitter, as defined by consent codes (some are health/medical/biomedical research [HMB] and some are general research use [GRU]). Access to data from other cohorts is restricted by the terms of their project-specific protocols. The authors will facilitate reasonable access requests.