Domain specific models outperform large vision language models on cytomorphology tasks

Abstract

Large vision-language models (LVLMs) show impressive capabilities in image understanding across domains. However, their suitability for high-risk medical diagnostics remains unclear. We systematically evaluate four state-of-the-art LVLMs and three domain-specific models on key cytomorphological benchmarks: peripheral blood cell classification, morphology assessment, bone marrow cell classification, and cervical smear malignancy detection. Performance is assessed under zero-shot, few-shot, and fine-tuned conditions. LVLMs underperform significantly: the best LVLM achieves a zero-shot F1 score of 0.057 ± 0.008 for malignancy detection, close to the random baseline of 0.039, and only 0.15 ± 0.01 in the few-shot setting. In contrast, domain-specific models reach up to 0.83 in accuracy. Even after fine-tuning, a dedicated hematology model outperforms GPT-4o. While LVLMs offer explainability via text, we find the visual-language grounding unreliable: the morphological features mentioned by the models often do not match the properties of the single cell shown. Our findings suggest that LVLMs require substantial improvements before use in high-stakes diagnostic settings.
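To make the comparison against a random baseline concrete, the following is a minimal sketch of how a chance-level F1 score and its spread could be estimated by repeated uniform guessing. It assumes a macro-averaged F1 and uniform sampling over the classes; the abstract does not state which averaging or baseline procedure was used, and the names `macro_f1` and `random_baseline` as well as the toy labels are hypothetical.

```python
import random


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true.

    Assumption: macro averaging; the source does not specify the averaging.
    """
    classes = sorted(set(y_true))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class is never hit.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def random_baseline(y_true, n_trials=100, seed=0):
    """Mean and sample standard deviation of macro F1 under uniform guessing."""
    rng = random.Random(seed)
    classes = sorted(set(y_true))
    trials = []
    for _ in range(n_trials):
        guesses = [rng.choice(classes) for _ in y_true]
        trials.append(macro_f1(y_true, guesses))
    mean = sum(trials) / len(trials)
    var = sum((x - mean) ** 2 for x in trials) / (len(trials) - 1)
    return mean, var ** 0.5


if __name__ == "__main__":
    # Toy, balanced 10-class label set (hypothetical, not data from the study).
    labels = [f"class_{i % 10}" for i in range(1000)]
    mean, std = random_baseline(labels)
    print(f"chance-level macro F1: {mean:.3f} ± {std:.3f}")
```

A model score is only informative relative to such a baseline: with many classes, chance-level F1 is already small, so a low absolute score can still be either well above or barely above random guessing.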

Key findings

LVLMs perform poorly on cytomorphology tasks, often near chance level and far below domain-specific models.

Even after fine-tuning, LVLMs lag behind domain-specific models.

While LVLMs provide textual justifications, these often reflect generic descriptions rather than image-specific morphological features.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This study did not receive any funding.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

All data used in this study are publicly available:

HiCervix comprises 40,229 cervical cells from 4,496 whole slide images, categorized into 29 classes. The dataset includes normal epithelial cells, infectious agents, and malignant cells. https://zenodo.org/records/11081816

WBCAtt contains morphology annotations for 10,300 images from the Acevedo data set [31]. Labels are provided for 11 fine-grained morphological attributes such as nucleus shape, chromatin density, granularity, and cytoplasm texture. https://github.com/apple2373/wbcatt/tree/main/submission

Acevedo provides 17,092 white blood cell images from peripheral blood smears, labeled with 11 different cell type annotations. https://data.mendeley.com/datasets/snkd93bnjr/1

BMC is a collection of 171,373 white blood cell images from bone marrow smears collected from 945 patients. The cells were expert-labeled into 21 different cell types. https://www.cancerimagingarchive.net/collection/bone-marrow-cytomorphology_mll_helmholtz_fraunhofer/

MLL23 was used only as an external test set for fine-tuned models. It includes over 40,000 expert-annotated peripheral blood single-cell images categorized into 18 classes. https://zenodo.org/records/14277609

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes
