Purpose Ocular screening in primary care is increasingly recognized as a valuable opportunity to detect a wide range of eye conditions at an early stage, especially among individuals with systemic risk factors. The Primary Care Screening dataset is a large-scale, real-world collection of de-identified color fundus photographs (CFP) acquired during routine diabetic retinopathy (DR) screening visits in primary care clinics across the United States, and is available through the Institute for Digital Health (IDHea)—a secure research platform established by Topcon Healthcare, Inc.
Methods The dataset includes CFP from individuals who participated in eye screening at 643 clinical sites in the United States. The majority of images were obtained using the TRC-NW400 (Topcon Corp., Tokyo, Japan), although other imaging devices were also used. Each image was graded by an eye care specialist using the International Clinical Diabetic Retinopathy (ICDR) grading system. Graders also recorded image quality and a wide range of other retinal findings. In addition, the dataset includes patient-level demographics (age, sex, and 3-digit ZIP code), image metadata, and model-generated annotations such as AutoMorph image quality, vascular metrics, and retinal pigment scores.
Results As of March 2025, the dataset includes 427,182 CFP from 161,705 subjects, representing 372,528 eyes and 186,264 individual visits. Image quality was graded as Excellent or Good (218,139 eyes), Fair (109,127 eyes), or Unreadable (37,491 eyes). Most eyes (n = 293,391) had no diabetic retinopathy present (NDRP), while 15,514 had mild non-proliferative DR (NPDR), 10,856 moderate NPDR, 1,140 severe NPDR, and 1,644 proliferative DR. Diabetic macular edema was present in 4–34% of NPDR categories and 18% of proliferative DR cases. Additional findings included drusen or pigmentary changes (n = 9,649), glaucoma suspect (n = 4,257), and macular degeneration (n = 3,581).
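As a rough aid to reading the grade distribution above, the following minimal sketch converts the reported eye-level DR counts into proportions. The counts are copied from the Results; the variable and function names are illustrative only, and the denominator is the sum of the five listed grade counts rather than the 372,528 total eyes. Treating that sum as the graded population is an assumption, not something the abstract states.

```python
# Eye-level DR grade counts as reported in the Results paragraph.
dr_counts = {
    "No DR": 293_391,
    "Mild NPDR": 15_514,
    "Moderate NPDR": 10_856,
    "Severe NPDR": 1_140,
    "Proliferative DR": 1_644,
}


def grade_shares(counts):
    """Return each grade's fraction of the summed counts."""
    total = sum(counts.values())
    return {grade: n / total for grade, n in counts.items()}


# Assumption: the five listed grades together form the graded population.
total_graded = sum(dr_counts.values())
shares = grade_shares(dr_counts)

# Eyes with any grade of DR (everything except "No DR").
any_dr = total_graded - dr_counts["No DR"]

for grade, share in shares.items():
    print(f"{grade:<18}{dr_counts[grade]:>9,}  {share:6.2%}")
print(f"{'Any DR':<18}{any_dr:>9,}  {any_dr / total_graded:6.2%}")
```

The same pattern could be applied to the image quality categories, with the same caveat about which denominator is appropriate.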
Conclusion The Primary Care Screening dataset represents one of the largest real-world collections of retinal images acquired in primary care settings. Its size, diversity, and detailed expert grading make it a valuable resource for research into automated screening, ocular disease prevalence, and AI model development. An independent Data Access and Governance committee oversees research applications to ensure responsible use. The dataset will be made available via the secure IDHea research platform. More information is available at IDHea.net.
Competing Interest Statement R.C., A.G., J.U., J.B., J.L., J.C., M.K.D. are employees of Topcon Healthcare, Inc. A.P.K. has acted as a paid consultant or lecturer to Abbvie, Aerie, Allergan, Google Health, Heidelberg Engineering, Novartis, Reichert, Santen, Thea and Topcon. P.A.K. has acted as a consultant for Retina Consultants of America, Roche, Boehringer Ingelheim, and Bitfount and is an equity owner in Big Picture Medical. He has received speaker fees from Zeiss, Thea, Apellis, and Roche. He has received travel support from Bayer and Roche. He has attended advisory boards for Topcon, Bayer, Boehringer Ingelheim, and Roche. M.P.L. is an employee of Microsoft, and has acted as a consultant for Topcon Healthcare, Inc.
Funding Statement IDHea is funded by Topcon Healthcare, Inc. A.P.K. is supported by a UK Research and Innovation Future Leaders Fellowship (MR/Y033930/1), an Alcon Research Institute Young Investigator Award and a Lister Institute for Preventive Medicine Award. P.A.K. is supported by a UK Research & Innovation Future Leaders Fellowship (MR/T019050/1), Moorfields Eye Charity with The Rubin Foundation Charitable Trust (GR001753), and an Alcon Research Institute Senior Investigator Award.
Author Declarations I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Data sharing agreements were established with each participating primary care site, permitting the use of de-identified data for research purposes. In instances where a site lacked the technical capabilities to de-identify data independently, a Health Insurance Portability and Accountability Act (HIPAA) Business Associate Agreement (BAA) was executed to facilitate the secure transfer of identifiable information solely for de-identification by an authorized technology team. Only after the data had been de-identified, in accordance with the HIPAA Privacy Rule's Safe Harbor method (45 C.F.R. 164.514(b)(2)), were datasets made accessible to the study team for research use. Under HIPAA, data de-identified per the Safe Harbor standard are no longer considered protected health information (PHI) and may be used or disclosed for research purposes without individual authorization or Institutional Review Board (IRB) approval. Furthermore, according to the U.S. Department of Health and Human Services Office for Human Research Protections (OHRP), research involving only de-identified data does not constitute human subjects research as defined under 45 C.F.R. 46.102(e) and thus does not require IRB review. Given that our study utilized only de-identified data, with no access to identifiable private information or biospecimens, it does not involve human subjects as per the regulatory definition. Consequently, IRB review or waiver was not required for this research.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes