The research took place in the well-established cervical cancer screening program nested within public sector primary care health facilities in Lusaka [16, 17]. Women in this analysis were recruited under informed consent from clinics that primarily offer screening and treatment with thermal ablation or large loop excision of the transformation zone (LLETZ) by trained nurses. Standard-of-care screening was conducted by experienced nurses, who examined women using visual inspection with acetic acid (VIA) aided by high-quality digital cameras; the magnified, illuminated images permit inspection of the surface morphology of the cervix and facilitate expert telemedicine quality assurance. Because the program emphasizes sensitive criteria to avoid missing precancer/cancer, ~25% of women screen positive, partly reflecting the high HIV prevalence [17, 18]. As a research ‘add-on’ for this study, the nurses also took an additional triplicate set of images using a Samsung Galaxy J8™ smartphone camera. They also collected a cervical swab that was sent for subsequent HPV testing using the BD Onclarity™ assay system installed at a major referral hospital in Lusaka.
Expert gynecologic pathology review was available. Cases of cervical precancer/cancer were defined clinically as women having histologic CIN2, CIN3, or cancer. Glandular neoplasia was uncommon and was grouped with the corresponding severity of squamous diagnoses (AIS with CIN3, ADC with SCC). Controls were women with completely visible squamocolumnar junctions (Type 1 or 2 Transformation Zones) whose visual screen was judged to be normal and not requiring referral for biopsy, combined with those who were referred but had histologic findings < CIN2.
HPV testing
The results of the HPV testing performed in Lusaka were obtained by Onclarity batch testing for research purposes only and were unconnected to clinical management. Onclarity provides HPV typing that can approximate the type groups ranked in order of carcinogenicity [19]. Specifically, the assay yields results individually for HPV 16, 18, 31, 45, 51, and 52, but combines 33/58, 56/59/66, and 35/39/68. For the purposes of this research, the results were further grouped based on established risk of cancer in a hierarchical classification as HPV16, else HPV18/45, else HPV 31/33/52/58, else HPV 35/51/56/59/66/68. Of note, the inclusion of HPV 35 in the lowest risk group is now known to be an error (among individuals of African heritage, it properly belongs with the other HPV 16-related types in the HPV 31 group) [20], and the incorrect inclusion of HPV 66 as carcinogenic is another acknowledged limitation of this assay [21], leading to some false positives.
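The hierarchical grouping described above can be illustrated with a minimal Python sketch. It is not the study's analysis code: the channel labels, the function name `hpv_type_group`, and the "HPV-negative" label are hypothetical, and placing the combined 35/39/68 channel in the lowest-risk group is an assumption based on the grouping stated in the text.

```python
# Illustrative sketch only; names and labels are assumptions, not the study's code.
# Each entry is (group label, assay channels that map to it), highest risk first.
HIERARCHY = [
    ("HPV16", {"16"}),
    ("HPV18/45", {"18", "45"}),
    ("HPV31/33/52/58", {"31", "52", "33/58"}),
    # Assumption: the combined 35/39/68 channel is placed in the lowest-risk group.
    ("HPV35/51/56/59/66/68", {"51", "56/59/66", "35/39/68"}),
]

KNOWN_CHANNELS = set().union(*(channels for _, channels in HIERARCHY))


def hpv_type_group(positive_channels):
    """Return the highest-risk group among the positive assay channels."""
    positives = set(positive_channels)
    unknown = positives - KNOWN_CHANNELS
    if unknown:
        raise ValueError(f"Unrecognized assay channel(s): {sorted(unknown)}")
    # Walk the hierarchy from highest to lowest risk; the first match wins.
    for label, channels in HIERARCHY:
        if positives & channels:
            return label
    return "HPV-negative"
```

Under this scheme a woman positive for both HPV 16 and HPV 51 is assigned to the HPV16 group, because the hierarchy stops at the first (highest-risk) match.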
Automated visual evaluation (AVE) algorithm
The AVE algorithm was pre-trained on the NCI cervical image bank that contains more than 150,000 images taken with Cerviscopes (35 mm film images called Cervigrams, subsequently digitized) or DSLR camera images taken by beam splitting of Zeiss colposcope images [13]. The reader is referred elsewhere for detailed description of the logic, training, and initial validation of this deep-learning algorithm [12,13,14,15]. As noted above, the algorithm yields an ordered three-level classification of severity ("likely precancer/cancer", "indeterminate", or "normal" appearance). Its performance has been validated on internal "hold-back" test sets but, prior to this presentation, had not yet been validated in combination with HPV genotyping on an external dataset using a different image device in a distinct screening population.
Treatment and histologic diagnoses
An important aspect of the Zambian screening program is expert treatment of screen-positive women [16]. If visually assessed lesions meet the WHO criteria for ablation by cryotherapy or thermal ablation of the transformation zone, that treatment is performed [22]. For the purposes of this research, women underwent biopsy prior to ablation to detail underlying pathology. If more extensive treatment was needed, either LLETZ was performed or punch biopsies were taken to exclude invasion, as guided by clinical assessment and/or expert review of digital cervigrams.
The case and control histologic diagnoses in this study were therefore based on punch biopsies or LLETZ specimens, evaluated by an expert pathologist. As stated above, women who screened negative were also included as controls despite having no biopsy (as were those with negative digital cervicography/biopsy), given the very sensitive threshold for VIA positivity, high rates of referral, and the substantial expertise of the examining nurses.
Data analysis
The population diagram for the study is shown in Fig. 2. The associations of HPV type group and AVE classification with histologic outcome were visualized for all women having all three variables (Additional file 1: Fig. S1).
Fig. 2 CONSORT diagram of the Zambia dataset
The data analysis proceeded as follows. First, we tested whether the candidate AVE algorithm could be transferred for immediate use, without modification, to the J8 images, an image type not previously included in AVE training. We postulated that portability might require retraining of the AVE algorithm to familiarize it with the new image type. A small subset of images from women in the Zambian screening clinic was used for retraining; it comprised 80 individuals in each classification (precancer/cancer, indeterminate, normal). The retraining images were added incrementally (20, then 40, then 60, then 80) to the NCI core collection to determine how many images of the previously unfamiliar kind were needed to transfer the algorithm successfully; 40 was chosen as the number that achieved reasonable performance (Additional file 1: Table S1).
Once AVE was trained to analyze the Zambian J8 images, we assessed repeatability of the AVE results obtained from the three replicate images of the same individual captured by the J8 smartphone camera. Repeatability was assessed as an ordinal 3 × 3 table since the output was three ordinal classes of increasing severity: normal, indeterminate (HPV-positive patients with some equivocal/borderline/look-alike cervical changes), and precancer/cancer. In assessing repeatability, the percentage of individuals who were extremely misclassified across replicates was of special interest (i.e., normal images classified as precancer/cancer, or vice versa).
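One way such a repeatability table could be tabulated is sketched below in Python. The text does not specify how the three replicates were cross-classified, so this sketch assumes that each of the three replicate pairs per woman is counted once and that a woman is flagged as extremely discordant if her replicates include both "normal" and "precancer/cancer". The function names and class labels are hypothetical.

```python
from itertools import combinations

# Ordered classes of increasing severity, as described in the text.
CLASSES = ["normal", "indeterminate", "precancer/cancer"]
INDEX = {name: i for i, name in enumerate(CLASSES)}


def repeatability_table(replicates):
    """Build a 3 x 3 table of pairwise replicate agreement.

    replicates: list of 3-tuples of class labels, one tuple per woman.
    Assumption: each of the three replicate pairs is counted once,
    in capture order (row = earlier image, column = later image).
    """
    table = [[0] * 3 for _ in range(3)]
    for labels in replicates:
        for first, second in combinations(labels, 2):
            table[INDEX[first]][INDEX[second]] += 1
    return table


def percent_extreme_discordance(replicates):
    """Percent of women whose replicates span both extremes."""
    extremes = {"normal", "precancer/cancer"}
    flagged = sum(1 for labels in replicates if extremes <= set(labels))
    return 100.0 * flagged / len(replicates)
```

Counts on the diagonal of the table represent replicate pairs that agree; the two far corners hold the extreme disagreements that were of special interest.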
Accuracy of a test is typically judged by the correct identification of cases and non-cases, generally assessed in a 2 × 2 table by sensitivity, specificity, and their tradeoff (area under the receiver operating characteristic curve, or AUC). However, in this version of AVE, a large "gray zone" of indeterminate results was established between precancer/cancer and normal. Thus, the analysis assessed a 3 × 3 matrix (which shows the three diagnostic truth classes as one dimension and the three-level test classification as the other). The worst inaccuracies, i.e., the percent of extreme errors, were again of special interest (precancer/cancer called normal, or vice versa).
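The 3 × 3 accuracy matrix and the extreme-error percentage can be sketched in Python as follows. This is an illustration under stated assumptions rather than the study's analysis code: the truth-class labels are assumed to mirror the three AVE output classes, the denominator for the extreme-error percentage is assumed to be all women in the matrix, and the function names are hypothetical.

```python
from collections import Counter

# Assumption: truth classes use the same three ordered labels as the AVE output.
CLASSES = ("normal", "indeterminate", "precancer/cancer")


def confusion_3x3(truth, predicted):
    """Rows are truth classes, columns are AVE classes, both in CLASSES order."""
    counts = Counter(zip(truth, predicted))
    return [[counts[(t, p)] for p in CLASSES] for t in CLASSES]


def extreme_error_percent(matrix):
    """Percent of all women falling in the two far corners of the matrix.

    Far corners: true normal called precancer/cancer, and
    true precancer/cancer called normal.
    """
    total = sum(sum(row) for row in matrix)
    extreme = matrix[0][2] + matrix[2][0]
    return 100.0 * extreme / total
```

A per-class denominator (for example, extreme errors among true precancer/cancer only) would be an equally defensible choice; the overall percentage is used here only to keep the sketch short.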