Datasheet for the IDHea Primary Eye Care Dataset: A Real-World Ocular Imaging Resource for Research

Purpose Real-world ocular imaging datasets are essential for advancing research in artificial intelligence (AI), autonomous disease screening, and clinical decision support. The Primary Eye Care dataset is a large-scale collection of de-identified retinal imaging data from routine optometric care, made available through the Institute for Digital Health (IDHea)—a secure research platform established by Topcon Healthcare, Inc. This dataset provides an opportunity to study eye health in a community setting and will be available via this cloud-based platform.

Methods Data were collected and de-identified from individuals who underwent imaging as part of their routine care across 40 optometry practices in the United States and one practice in Australia. The dataset includes three-dimensional optical coherence tomography (OCT) scans and color fundus photographs acquired using Maestro devices (Topcon Corp., Tokyo, Japan), along with demographic data including age and sex. Imaging data were converted to DICOM format, and OCT analysis metrics such as retinal layer thicknesses were derived. Additional labels including image quality, vessel metrics, and retinal pigment score were generated using open-source AI models.

Results The dataset comprises 873,291 image acquisitions from 276,061 subjects with a mean age of 43.8 years (standard deviation = 19.5). Of these subjects, 48.7% were female, 36.2% were male, and 15.1% had no sex reported. Most OCT scans followed the 12 × 9 mm 3D Wide protocol (86.3%), with additional 3D Macula, 3D Disc, anterior segment, radial, and line scans. A total of 59,049 subjects (21.4%) had two or more scans separated by ≥365 days. Pre-processed metrics and AI-derived labels, such as TopQ image quality scores, glaucoma risk score, and AutoMorph features, are included. Of the OCT scans, 89.4% scored above 25 on the TopQ scale, indicating reliable image quality. A propensity score-matched test subset (approximately 10%) was held out to enable consistent benchmarking across studies.
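
As an illustration of the longitudinal criterion above, the following minimal Python sketch flags subjects whose scans are separated by at least 365 days and re-derives the reported 21.4% share. It assumes the criterion refers to the gap between a subject's earliest and latest acquisition, which the text does not specify. The subject IDs, dates, and the function name `has_longitudinal_followup` are hypothetical and are not part of the dataset or its tooling.

```python
from datetime import date


def has_longitudinal_followup(scan_dates, min_gap_days=365):
    """Return True if the earliest and latest scans are >= min_gap_days apart.

    Assumption: the criterion is the span between first and last acquisition.
    """
    if len(scan_dates) < 2:
        return False
    return (max(scan_dates) - min(scan_dates)).days >= min_gap_days


# Hypothetical, made-up acquisition dates keyed by de-identified subject ID.
scans_by_subject = {
    "S001": [date(2021, 3, 1), date(2022, 3, 1)],  # exactly 365 days apart
    "S002": [date(2021, 3, 1), date(2021, 9, 1)],  # under one year apart
    "S003": [date(2022, 5, 10)],                   # single scan only
}

longitudinal = sorted(
    subject
    for subject, dates in scans_by_subject.items()
    if has_longitudinal_followup(dates)
)

# Share of subjects with follow-up, using the counts reported in the Results.
share_percent = round(59_049 / 276_061 * 100, 1)
```

With the counts reported above, `share_percent` evaluates to 21.4, matching the stated proportion.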

Conclusion The Primary Eye Care Dataset provides a large-scale, real-world collection of ocular imaging data, reflecting a largely healthy, community-based population attending routine optometric visits. This makes it particularly valuable for developing AI models aimed at early detection and prevention at the population level, where most eyes are healthy and disease prevalence is low. Data access is governed by an independent committee to ensure ethical and responsible use; more information is available at IDHea.net.

Competing Interest Statement

R.C., A.G., Y.H., C.L., J.J., J.U., N.P., J.C., M.K.D. are employees of Topcon Healthcare, Inc. A.P.K. has acted as a paid consultant or lecturer to AbbVie, Aerie, Allergan, Google Health, Heidelberg Engineering, Novartis, Reichert, Santen, Thea and Topcon. P.A.K. has acted as a consultant for Retina Consultants of America, Roche, Boehringer Ingelheim, and Bitfount and is an equity owner in Big Picture Medical. He has received speaker fees from Zeiss, Thea, Apellis, and Roche. He has received travel support from Bayer and Roche. He has attended advisory boards for Topcon, Bayer, Boehringer Ingelheim, and Roche. M.P.L. is an employee of Microsoft, and has acted as a consultant for Topcon Healthcare, Inc. T.B. has acted as a consultant for Topcon Healthcare, Inc.

Funding Statement

IDHea is funded by Topcon Healthcare, Inc. A.P.K. is supported by a UK Research & Innovation Future Leaders Fellowship (MR/Y033930/1), an Alcon Research Institute Young Investigator Award and a Lister Institute for Preventive Medicine Award. P.A.K. is supported by a UK Research & Innovation Future Leaders Fellowship (MR/T019050/1), Moorfields Eye Charity with The Rubin Foundation Charitable Trust (GR001753), and an Alcon Research Institute Senior Investigator Award.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

IRB approval was obtained from the Advarra Institutional Review Board (IRB) (Columbia, MD, USA) for the retrospective collection of de-identified data in the US (IRB # CR00611155). IRB approval was also obtained for the Australia dataset from the Bellberry Human Research Ethics Committee (Protocol # 2022-04-345-FR-1).

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes
