Construction and evaluation of a new rat reference genome assembly, GRCr8, from long reads and long-range scaffolding [RESOURCES]

Kai Li1, Melissa L. Smith2, J. Chris Blazier3, Kelli J. Kochan3, Jonathan M.D. Wood4, Kerstin Howe4, Anne E. Kwitek5, Melinda R. Dwinell5, Hao Chen6, Julia L. Ciosek1, Patrick Masterson7, Terence D. Murphy7, Theodore S. Kalbfleisch1 and Peter A. Doris8

1Gluck Equine Genomics Center, University of Kentucky, Lexington, Kentucky 40503, USA
2Department of Biochemistry and Molecular Biology, University of Louisville School of Medicine, Louisville, Kentucky 40202, USA
3Texas A&M Institute for Genome Sciences and Society, Texas A&M University, College Station, Texas 77843, USA
4Tree of Life, Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, United Kingdom
5Department of Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin 53226, USA
6Department of Pharmacology, University of Tennessee Health Sciences Center, Memphis, Tennessee 38163, USA
7National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
8Center for Human Genetics, Brown Foundation Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center, Houston, Texas 77030, USA

Corresponding author: peter.a.doris@uth.tmc.edu

Abstract

We report the construction and analysis of a new reference genome assembly for Rattus norvegicus, the laboratory rat, a widely used experimental animal model organism. The assembly has been adopted as the rat reference assembly by the Genome Reference Consortium and is named GRCr8. The assembly has employed 40× Pacific Biosciences (PacBio) HiFi sequencing coverage and scaffolding using optical mapping and Hi-C. We used genomic DNA from a male BN/NHsdMcwi (BN) rat of the same strain and from the same colony as the prior reference assembly, mRatBN7.2. The assembly is at chromosome level with 98.7% of the sequence assigned to chromosomes. All chromosomes have increased in size compared with the prior assembly and k-mer analysis indicates that the subject animal is fully inbred and that the genome is represented as a single haploid assembly. Notable increases are observed in Chromosomes 3, 11, and 12 in the prospective rDNA regions. In addition, Chr Y has increased threefold in size and is more consistent with the rat karyotype than previous assemblies. Several other chromosomes have grown by the incorporation of sizable discrete new blocks. These contain highly repetitive sequences and encode numerous previously unannotated genes. In addition, centromeric sequences are incorporated in most chromosomes. Genome annotation has been performed by NCBI RefSeq, which confirms improvement in assembly quality and adds more than 1100 new protein coding genes. PacBio Iso-Seq data have been acquired from multiple tissues of the subject animal and are released concurrently with the new assembly to aid further analyses.

Received March 7, 2024. Accepted September 10, 2024.
