UCF-MultiOrgan-Path: A Public Benchmark Dataset of Histopathologic Images for Deep Learning Model Based Organ Classification

ABSTRACT

A pathologist makes a diagnosis by examining glass slides of tissue samples under a light microscope. The entire tissue specimen can be digitized and stored as a Whole Slide Image (WSI) for further analysis. However, managing and manually diagnosing hundreds of images is time-consuming and requires specific expertise. As a result, there is extensive ongoing research on computer-aided diagnosis of these digitally acquired pathology images. Deep learning has gained significant attention for its effectiveness in disease classification and in the segmentation of cancer cells in histopathologic images. Building a robust and accurate deep learning model requires a large number of annotated images. However, it is challenging to find enough annotated public images to validate or construct a new pre-trained model for pathology images, because annotation is labor-intensive and time-consuming, requires expert knowledge, and is constrained by privacy concerns surrounding medical data. Current public datasets are often limited to specific organs, specific cancer types, or binary classification tasks, which hinders the ability of models trained on them to generalize across diverse pathology applications. This lack of diversity makes it difficult to develop models that perform well on a wide range of diseases, organs, or multiclass classification problems, limiting their use in broader real-world diagnostic scenarios. To address this limitation, we introduce the UCF multi-organ histopathologic (UCF-MultiOrgan-Path) dataset, which comprises 977 WSIs from cadavers containing tissues of multiple organs such as the lung, kidney, liver, and pancreas. We constructed the dataset by filtering ∼1700 WSIs; the result spans 15 distinct organ classes and yields ∼2.38 million patches of 512×512 pixels. For technical validation, we provide two approaches: a patch-based approach for patch- and slide-level classification, and a slide-based approach using multiple instance learning (MIL) for slide-level classification.
Our dataset, which contains a large number of WSIs and millions of extracted patches representing diverse organ classes, can serve as a benchmark for training and validating deep learning models, especially organ classification models.
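To make the patch-based approach concrete, the sketch below shows two generic steps such a pipeline typically involves: tiling a slide into non-overlapping 512×512 patches and turning per-patch organ predictions into a single slide-level label. It is an illustrative sketch under stated assumptions, not the authors' pipeline. The function names `patch_grid` and `slide_label_by_majority` are hypothetical, tissue/background filtering is omitted, and majority voting is only one possible aggregation rule (the abstract does not say which rule the authors use).

```python
from collections import Counter
from typing import Iterable, List, Tuple

PATCH_SIZE = 512  # patch edge length in pixels, as stated in the abstract


def patch_grid(width: int, height: int, size: int = PATCH_SIZE) -> List[Tuple[int, int]]:
    """Hypothetical helper: top-left (x, y) coordinates of non-overlapping
    size x size tiles that fit entirely inside a width x height slide level.
    Partial tiles at the right/bottom edges are dropped; no tissue filtering."""
    return [
        (x, y)
        for y in range(0, height - size + 1, size)
        for x in range(0, width - size + 1, size)
    ]


def slide_label_by_majority(patch_labels: Iterable[str]) -> str:
    """Hypothetical helper: aggregate per-patch organ predictions into one
    slide-level label by majority vote. Ties go to the label seen first,
    because Counter.most_common keeps first-encountered order for equal counts."""
    counts = Counter(patch_labels)
    if not counts:
        raise ValueError("no patch predictions to aggregate")
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    # A 1200 x 1100 pixel region holds a 2 x 2 grid of full 512 x 512 patches.
    print(patch_grid(1200, 1100))
    # Two of three patches are predicted as liver, so the slide is labeled liver.
    print(slide_label_by_majority(["liver", "lung", "liver"]))
```

Majority voting is chosen here only because it is the simplest slide-level rule; a MIL model, as in the slide-based approach, instead learns how to weight patches when forming the slide prediction.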

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This study did not receive any funding.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The University of Central Florida Institutional Review Board issued a non-human subject determination for this study.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes
