Breast cancer is the most common cancer in the world.1 Current guidelines recommend that estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 testing be performed on all cases of invasive breast cancer because these markers are therapeutic predictors and prognostic markers in clinical practice.2–4 ER and PR testing are mostly performed using immunohistochemistry (IHC), in which nuclear staining of any intensity in ≥1% of tumor cells is regarded as positive.2 However, patients with ER-low-positive (1%-10%) breast cancer may derive limited benefit from endocrine therapy.2 Precise ER and PR IHC results are thus crucial for predicting how beneficial endocrine therapy will be for patients with breast cancer.
Staining and interpretation are key components of IHC. An optimized approach to immunohistochemical staining may be used to detect positive staining in a tissue that is known to express low levels of the evaluated marker. To monitor the performance of semiquantitative IHC, the control materials require components of high-positive, low-positive, and negative staining results. Ideal control materials are also easily obtainable, available in large quantities, and capable of producing staining patterns that are easy to interpret. Tumor tissue, such as breast carcinoma, has also been used as a control material for ER and PR staining, but its scarcity and heterogeneity limit its use in this regard. The uterine cervix is a high-positive nonneoplastic tissue commonly used as a control material. The American Society of Clinical Oncology (ASCO)/College of American Pathologists (CAP) guidelines suggest using tonsil tissue as a low-positive control for ER staining and as a negative control for PR staining.2 To improve and ensure the quality of ER IHC, the Committee of Breast Pathology of Taiwan Society of Pathology conducted the ER IHC proficiency test (PT) in 2022. In this study, we verified the suitability of tonsil tissue as a control material for monitoring ER staining. We hypothesized that optimal staining would reduce interlaboratory variations in IHC results.
2. METHODS
The PT activities were conducted in compliance with local ethical regulations and adhered to the ethical standards outlined in the World Medical Association’s Declaration of Helsinki. The study protocol was deemed exempt from individual institutional review board approval.
2.1. Tissue microarray
Fabrication of the tissue microarray (TMA) involved the collection of 2-mm-diameter tissue cores from 21 breast cancer specimens. These specimens were processed in accordance with ASCO/CAP guidelines for ER IHC to ensure optimal handling of the specimens and preanalytical factors. To serve as an external control, a piece of tonsil tissue with epithelium and germinal centers was included in the TMA, which was then sectioned at a thickness of 4 µm.
2.2. Procedures of the PT
The ER IHC PT is provided as an optional test for pathology institutions in Taiwan. Participants were provided with one unstained TMA slide, as described in the preceding section, and were required to perform ER IHC staining on the TMA and interpret the respective staining results independently. The staining protocol and results were submitted online, and the TMA slides were sent back to the PT office for quality review.
2.3. Data collection
The staining protocol used by each participant and the participant's interpretation of each tissue core were collected. The percentage of ER-positive tumor cells was reported in intervals of 0%, <1%, 1% to 10%, 11% to 50%, and >50%. The intensity of the staining was classified as no staining, weak, intermediate, or strong. The interpretations were categorized as negative, low-positive, or positive, per prevailing ASCO/CAP guidelines.2
Consensus for an individual core was established if ≥80% of the participants provided the same interpretation. Interpretations that reached consensus were regarded as reference results. Positive or low-positive responses for interpretations with negative consensus were indicative of overcalls, and negative or low-positive responses for interpretations with positive consensus were indicative of undercalls. To identify whether discrepancies were due to staining issues or interpretation errors, two pathologists from the Committee of Breast Pathology independently reviewed the slides with discordant answers to provide references for interpretation and possible causes of discrepancies.
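The consensus rule and the overcall/undercall definitions above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not the software used in the PT; the function names and the example vote counts are hypothetical.

```python
from collections import Counter

CONSENSUS_THRESHOLD = 0.80  # from the text: >=80% of participants agree


def consensus(responses, threshold=CONSENSUS_THRESHOLD):
    """Return the consensus interpretation for one core, or None if none is reached."""
    if not responses:
        return None
    category, count = Counter(responses).most_common(1)[0]
    return category if count / len(responses) >= threshold else None


def classify_response(response, reference):
    """Label a single participant response against the reference result."""
    if reference is None:
        return "no consensus"
    if response == reference:
        return "concordant"
    if reference == "negative":
        return "overcall"   # positive or low-positive against a negative consensus
    if reference == "positive":
        return "undercall"  # negative or low-positive against a positive consensus
    return "discordant"     # low-positive consensus: not defined in the text


# Hypothetical core: 70 of 74 participants report "positive".
votes = ["positive"] * 70 + ["low-positive"] * 3 + ["negative"]
reference = consensus(votes)
labels = Counter(classify_response(v, reference) for v in votes)
print(reference, dict(labels))
```

A core whose most frequent interpretation falls below the threshold returns `None`, which corresponds to the cores reported here as having no consensus.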
2.4. Review of staining quality
The quality of ER staining was reviewed by two pathologists of the Committee of Breast Pathology. The quality evaluation was based on the following staining pattern in tonsil tissue recommended by the 2020 ASCO/CAP guidelines:2 (1) weak-to-moderate nuclear staining of the ER-positive dispersed germinal center cells and squamous epithelium, with each part given a score of 2 (sufficient), 1 (insufficient), or 0 (no stain), and (2) the ER-negative B cells in the mantle zone, given a score of 1 (negative) or 0 (positive). A suboptimal nuclear counterstain, either too faint for the morphology to be appreciated or so strong that it masked weak ER staining, resulted in a deduction of 0.5 points from the score. Additionally, the low-positive tonsil tissue served as a reference for interpreting the lower boundary for ER positivity, where germinal center cells should exhibit low-positive staining results, with 1% to 10% of the cells showing weak-to-moderate intensity of ER expression. If the ER staining was stronger than a low-positive result, 0.5 points were subtracted from the score to account for overstaining. The final quality score ranged from 0 to 5 and was categorized as optimal (score = 5), good (score ≥4 and <5), borderline (score ≥2 and <4), or poor (score <2).
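As a worked illustration of the scoring scheme described above, the following Python sketch adds up the component scores, applies the two 0.5-point deductions, and maps the result to a quality category. The function and parameter names are hypothetical, and clamping the score at 0 is an assumption made only to keep the result within the stated 0 to 5 range.

```python
def quality_score(germinal_center, squamous_epithelium, mantle_zone_negative,
                  counterstain_suboptimal=False, overstained=False):
    """Tonsil-control quality score (0 to 5) following the rubric in the text.

    germinal_center, squamous_epithelium: 2 (sufficient), 1 (insufficient), 0 (no stain)
    mantle_zone_negative: True if mantle-zone B cells are ER-negative (1 point)
    """
    for part in (germinal_center, squamous_epithelium):
        if part not in (0, 1, 2):
            raise ValueError("component scores must be 0, 1, or 2")
    score = float(germinal_center + squamous_epithelium)
    score += 1.0 if mantle_zone_negative else 0.0
    if counterstain_suboptimal:
        score -= 0.5  # counterstain too faint or too strong
    if overstained:
        score -= 0.5  # germinal center staining stronger than low-positive
    return max(score, 0.0)  # assumption: scores are not allowed below 0


def quality_category(score):
    """Map a quality score to the categories used in the text."""
    if score == 5:
        return "optimal"
    if score >= 4:
        return "good"
    if score >= 2:
        return "borderline"
    return "poor"


# Hypothetical slide: sufficient staining everywhere, clean mantle zone, but overstained.
s = quality_score(2, 2, True, overstained=True)
print(s, quality_category(s))  # 4.5 good
```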
2.5. Statistical methods
The Chi-square (χ²) test was used to compare categorical data, continuous variables were analyzed using the Kruskal-Wallis test, and interparticipant agreement was assessed using Fleiss’ kappa, for which values closer to 1 indicate greater agreement. Staining parameters with a p value of <0.1 were candidates for inclusion in multivariate logistic regression models of their influence on staining quality. The final model was obtained using backward elimination and included only significant parameters. Spearman rho was used to measure the correlation between staining quality and participant response. A two-tailed p value of <0.05 was considered significant.
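For readers unfamiliar with Fleiss' kappa, the following is a minimal pure-Python sketch of the standard formula. It is not the statistical software used in this study; the function name and the small example matrix are hypothetical, and the sketch assumes every core is rated by the same number of participants.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a matrix where counts[i][j] is the number of raters
    who assigned subject i (e.g. a tissue core) to category j."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must have the same number of ratings")

    # Observed agreement per subject, then averaged over subjects.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall share of each category.
    n_categories = len(counts[0])
    p_j = [sum(row[j] for row in counts) / (n_subjects * n_raters)
           for j in range(n_categories)]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")

    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical example: 4 cores, 5 raters, categories (negative, low-positive, positive).
example = [
    [5, 0, 0],
    [0, 1, 4],
    [0, 0, 5],
    [2, 3, 0],
]
print(round(fleiss_kappa(example), 3))
```

Values near 1 indicate that raters agree far more often than expected by chance, which is how the kappa coefficients reported in the Results can be read.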
3. RESULTS
3.1. Association between staining quality and staining parameters
Among the 74 participants in this study, 31.1%, 33.8%, 17.6%, and 17.6% had optimal, good, borderline, and poor staining quality, respectively (Table 1). Only two (3%) participants performed ER staining manually, and 72 (97%) used automated stainers. The antibody clones 6F11 (53%) and SP1 (43%) were the most commonly used. Poor staining quality was observed in 33% (10/30) of the participants using Ventana autostainers; among these participants, 60% (6/10) used concentrated antibodies. Multivariate logistic regression models demonstrated that the use of Ventana autostainers and the use of concentrated antibodies were significantly associated with poor staining quality (Table 2).
Table 1 - Staining quality in terms of various staining parameters
Parameter               N   Poor        Borderline  Good        Optimal     p
Participant             74  13 (17.6%)  13 (17.6%)  25 (33.8%)  23 (31.1%)
Platform                                                                    0.063
  Autostainer-Leica     38  3 (7.9%)    11 (28.9%)  10 (26.3%)  14 (36.8%)
  Autostainer-Ventana   30  10 (33.3%)  2 (6.7%)    10 (33.3%)  8 (26.7%)
  Autostainer-Dako      3   0 (0%)      0 (0%)      2 (66.7%)   1 (33.3%)
  Manual                2   0 (0%)      0 (0%)      2 (100%)    0 (0%)
  Autostainer-Biocare   1   0 (0%)      0 (0%)      1 (100%)    0 (0%)
Antigen retrieval                                                           0.505
  Autostainer           70  13 (18.6%)  13 (18.6%)  21 (30%)    23 (32.9%)
  Steamer               2   0 (0%)      0 (0%)      2 (100%)    0 (0%)
  Water bath            1   0 (0%)      0 (0%)      1 (100%)    0 (0%)
  Pressure cooker       1   0 (0%)      0 (0%)      1 (100%)    0 (0%)
HIER buffer                                                                 0.315
  High pH (>7.0)        63  13 (20.6%)  10 (15.9%)  20 (31.7%)  20 (31.7%)
  Low pH (<7.0)         11  0 (0%)      3 (27.3%)   5 (45.5%)   3 (27.3%)
HIER time, min                                                              0.226
  ≤15                   2   0 (0%)      1 (50%)     1 (50%)     0 (0%)
  16-30                 45  4 (8.9%)    10 (22.2%)  15 (33.3%)  16 (35.6%)
  31-45                 13  6 (46.2%)   1 (7.7%)    4 (30.8%)   2 (15.4%)
  46-60                 7   1 (14.3%)   1 (14.3%)   2 (28.6%)   3 (42.9%)
  >60                   7   2 (28.6%)   0 (0%)      3 (42.9%)   2 (28.6%)
Antibody clone                                                              0.115
  6F11                  39  4 (10.3%)   11 (28.2%)  12 (30.8%)  12 (30.8%)
  SP1                   32  9 (28.1%)   2 (6.3%)    10 (31.3%)  11 (34.4%)
  EP1                   2   0 (0%)      0 (0%)      2 (100%)    0 (0%)
  ID5                   1   0 (0%)      0 (0%)      1 (100%)    0 (0%)
Antibody concentration                                                      0.067
  RTU                   24  4 (16.7%)   2 (8.3%)    11 (45.8%)  7 (29.2%)
  >0.01                 17  7 (41.2%)   1 (5.9%)    5 (29.4%)   4 (23.5%)
  0.01                  23  1 (4.3%)    7 (30.4%)   7 (30.4%)   8 (34.8%)
  <0.01                 10  1 (10%)     3 (30%)     2 (20%)     4 (40%)
HIER = heat-induced epitope retrieval; RTU = ready-to-use.
Consensus was reached in 18 cores (85.7%), among which 10 were positives and eight were negatives. Consensus was not reached in any of the cores with a low-positive result. In the 10 cores with positive consensus, >50% of tumor nuclei expressed ER; eight cores had strong intensities and two cores had intermediate intensities. The eight cores with negative consensus had no ER staining on tumor nuclei. The concordance rate between participant responses and the consensus was high (median 1, range from 0.82 to 1) (Table 3). The concordance rates were not significantly different across staining quality (Kruskal-Wallis test p = 0.541). Interparticipant agreement kappa coefficients for optimal, good, borderline, and poor staining quality were 0.842, 0.793, 0.713, and 0.753, respectively.
Table 3 - Concordance rate grouped by staining quality
                Total (n = 74)  Poor (n = 13)  Borderline (n = 13)  Good (n = 25)  Optimal (n = 23)  p*
Median (range)  1 (0.82-1)      1 (0.94-1)     1 (0.82-1)           1 (0.89-1)     1 (0.89-1)        0.541
Table 4 lists 19 discordant participant responses, of which eight (42.1%) were overcalls and 11 (57.9%) were undercalls. In the central review, 63.2% (12/19) and 36.8% (7/19) of discordant responses could be attributed to staining problems and misinterpretation, respectively.
Table 4 - List of discordant interpretations
Institute no.  Core #  Staining quality  Participant response  Consensus  Possible reason of discordance
Overcall
6              5       Good              Low-positive          Negative   Overstaining
6              18      Good              Low-positive          Negative   Overstaining
21             5       Optimal           Low-positive          Negative   Misinterpretation
21             18      Optimal           Low-positive          Negative   Misinterpretation
42             7       Borderline        Positive              Negative   Misinterpretation
42             18      Borderline        Positive              Negative   Misinterpretation
59             15      Optimal           Low-positive          Negative   Misinterpretation
70             5       Good              Low-positive          Negative   Overstaining
Undercall
15             2       Poor              Low-positive          Positive   Inadequate staining
15             19      Poor              Negative              Positive   Inadequate staining
24             19      Poor              Low-positive          Positive   Inadequate staining
32             19      Poor              Low-positive          Positive   Inadequate staining
42             8       Borderline        Negative              Positive   Misinterpretation
44             19      Poor              Low-positive          Positive   Inadequate staining
47             19      Poor              Low-positive          Positive   Inadequate staining
50             19      Borderline        Low-positive          Positive   Inadequate staining
53             2       Borderline        Low-positive          Positive   Inadequate staining
64             2       Good              Negative              Positive   Misinterpretation
74             2       Poor              Low-positive          Positive   Inadequate staining
Inadequate staining accounted for 75% (9/12) of the staining problems and was the main cause (81.8%) of undercalls. Additionally, undercalls were clustered in core #19 (n = 6) and core #2 (n = 4); the staining intensity of these two cores was reported as weak or intermediate by more than 80% of the participants. Poor (70%) and borderline (20%) staining quality with insufficient staining explained most of the 10 undercalls clustered in core #19 and core #2 (Fig. 1).
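The proportions quoted for the discordant responses follow directly from the rows of Table 4. The short Python sketch below transcribes those rows and re-derives the counts; the variable and function names are illustrative only.

```python
from collections import Counter

# Rows transcribed from Table 4:
# institute, core, staining quality, participant response, consensus, reason
TABLE_4 = """
6,5,Good,Low-positive,Negative,Overstaining
6,18,Good,Low-positive,Negative,Overstaining
21,5,Optimal,Low-positive,Negative,Misinterpretation
21,18,Optimal,Low-positive,Negative,Misinterpretation
42,7,Borderline,Positive,Negative,Misinterpretation
42,18,Borderline,Positive,Negative,Misinterpretation
59,15,Optimal,Low-positive,Negative,Misinterpretation
70,5,Good,Low-positive,Negative,Overstaining
15,2,Poor,Low-positive,Positive,Inadequate staining
15,19,Poor,Negative,Positive,Inadequate staining
24,19,Poor,Low-positive,Positive,Inadequate staining
32,19,Poor,Low-positive,Positive,Inadequate staining
42,8,Borderline,Negative,Positive,Misinterpretation
44,19,Poor,Low-positive,Positive,Inadequate staining
47,19,Poor,Low-positive,Positive,Inadequate staining
50,19,Borderline,Low-positive,Positive,Inadequate staining
53,2,Borderline,Low-positive,Positive,Inadequate staining
64,2,Good,Negative,Positive,Misinterpretation
74,2,Poor,Low-positive,Positive,Inadequate staining
"""

rows = [line.split(",") for line in TABLE_4.strip().splitlines()]


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# A negative consensus makes a discordant response an overcall; a positive one, an undercall.
call_counts = Counter("overcall" if r[4] == "Negative" else "undercall" for r in rows)
reason_counts = Counter(r[5] for r in rows)
staining_problems = len(rows) - reason_counts["Misinterpretation"]
undercall_cores = Counter(r[1] for r in rows if r[4] == "Positive")

print("overcalls / undercalls:", call_counts["overcall"], call_counts["undercall"])
print("staining problems:", staining_problems, f"({pct(staining_problems, len(rows))}%)")
print("inadequate staining among undercalls:",
      f"{pct(reason_counts['Inadequate staining'], call_counts['undercall'])}%")
print("undercalls by core:", dict(undercall_cores))
```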
Fig. 1: Tonsil control with optimal staining shows dispersed positive staining of the germinal center (arrowhead) and squamous epithelium (A), and the invasive tumor of concurrent core 19 is diffusely positive (B). Insufficient staining of the tonsil control (C) and invasive tumor (D) leads to undercall. High-power views are provided in the inserts on the right side.
By contrast, overstaining accounted for only 37.5% (3/8) of the overcalls, whereas misinterpretation was the cause of most overcalls (62.5%). The three overcalls with overstaining exhibited increased staining in the mantle zone of the tonsil controls (Fig. 2C). Three overcalls with optimal staining were caused by nonspecific faint staining, which was weaker than that of the tonsil control, being misinterpreted as positive staining (Fig. 2D).
Fig. 2: Compared with the tonsil control with optimal staining quality (A), the nonspecific background stain of the invasive tumor can be ignored (B). Note that the high-power view in the insert on the right side shows no staining in the mantle zone in the upper part and a weak-positive nucleus in the germinal center in the lower area (A). However, for the case with an overstained tonsil control with faint nuclear staining (red arrowhead) in the mantle zone (C), staining of tumor nuclei weaker than that in the germinal center of the tonsil control can be misinterpreted as positive (D).
3.4. Analyses of cores without consensus
Of the three cores (#16, #20, and #21) in which consensus was not reached, the evaluations of core #16 had discrepancies in the identification of invasive tumors and were thus eliminated from the analysis. The results for core #20 reflected the difficulty in interpreting ER staining, with 43.2%, 41.9%, and 14.9% of participants’ interpretations being negative, low-positive, and positive, respectively. However, higher quality staining was associated with a greater likelihood of a low-positive or positive response (Fig. 3). Poor staining quality was significantly associated with a negative response (Spearman rho = 0.507, p < 0.001). As illustrated in Fig. 4, core #21 had a similar association between staining quality and participant response (Spearman rho = 0.610, p < 0.001).
Fig. 3: Participant responses of core #20 grouped by staining quality.
Fig. 4: Participant responses of core #21 grouped by staining quality.
The central review of core #20 revealed low-positive results with weak intensity on the slides that had optimal staining quality and negative results on slides that had poor staining quality (Table 5). Core #21 showed positive results with intermediate intensity on the slides that had optimal and good staining quality, and 42.3% (11/26) of the slides with borderline or poor staining quality had either low-positive or negative results.
Table 5 - Central review of cores without consensus by staining quality
Result          Total (n = 74)  Poor (n = 13)  Borderline (n = 13)  Good (n = 25)  Optimal (n = 23)  p
Core #20                                                                                             <0.001
  Negative      30 (40.5%)      13 (100%)      9 (69.2%)            8 (32%)        0 (0%)
  Low-positive  40 (54.1%)      0 (0%)         4 (30.8%)            13 (52%)       23 (100%)
  Positive      4 (5.4%)        0 (0%)         0 (0%)               4 (16%)        0 (0%)
Core #21                                                                                             <0.001
  Negative      4 (5.4%)        4 (30.8%)      0 (0%)               0 (0%)         0 (0%)
  Low-positive  7 (9.5%)        4 (30.8%)      3 (23.1%)            0 (0%)         0 (0%)
  Positive      63 (85.1%)      5 (38.5%)      10 (76.9%)           25 (100%)      23 (100%)
In the PT conducted in Taiwan, 64.9% of ER IHC stains exhibited optimal or good staining qualities, which indicates room for improvement. The use of Ventana autostainers and the use of concentrated antibodies were strongly associated with poor staining quality. Although the concordance rate did not significantly differ between levels of staining quality, interparticipant agreement decreased as staining quality declined. Of the 19 discordant participant responses, 63.2% could be attributed to staining problems, and 36.8% could be attributed to misinterpretation. Poor staining quality due to inadequate staining was the main reason for the undercalls, whereas misinterpretation was the cause of most overcalls. Of the cores for which consensus was absent, the low-positive cores were negative on slides with poor staining quality. False-negative results due to poor staining quality can have a significant impact on the diagnostic process. Although the use of tonsil tissue as an external control for ER IHC is well-established, staining problems and misinterpretations can still occur. These problems are difficult to identify in daily practice due to the scarcity of critical cases that are either ER-low-positive or close to the threshold values of 1% and 10%. Therefore, laboratories should strive to improve the quality of ER staining and interpretation to ensure accurate diagnosis.
Our results demonstrated that the use of concentrated antibodies on Ventana autostainers was strongly associated with poor staining quality. By contrast, the 2022 NordiQC assessment reported that 70.4% (19/27) of its participants using concentrated antibodies on Ventana autostainers and 45.6% (26/57) of those using concentrated antibodies on non-Ventana autostainers achieved optimal results.5,6 With ready-to-use antibodies, optimal results were achieved by 56.1% (247/440) of participants using Ventana autostainers and 65.5% (135/206) of participants using non-Ventana autostainers. This highlights the importance of using appropriate concentrations of antibodies in IHC tests to ensure accurate and reliable results. To ensure optimal staining quality, sound validation procedures should be applied, antibody concentrations should be adjusted accordingly, and the proper staining protocols should be used.
Staining problems accounted for 63.2% of discordant participant responses in this study. However, concordance rates did not significantly differ between levels of staining quality. This may be because ER expression in the cores with consensus was either strongly positive (10/18) or completely negative (8/18). Poor staining quality caused by inadequate staining would, in such cores, yield weaker positive staining or the same negative staining and therefore did not always lead to misclassification. Only two cores (2/21, 10%) were considered challenging to evaluate, either because of low-positive ER expression or because their expression was close to the threshold values of 1% and 10%. However, these challenging cores were excluded from the concordance analysis because consensus (≥80% of participants giving the same interpretation) was not reached.
Observations from daily clinical practice are consistent with the results of this study. Low-positive ER status can be challenging to interpret, but the proportion of low-positive cases is relatively small; the prevalence has been reported to be 2% to 7%.7–9 Of the 1323 consecutive cases of invasive carcinoma from 2021 to 2022 in our department (Table 6), cases in which ER was strongly positive (Allred score 6-8) or completely negative (Allred score 0) accounted for 72.3% and 21.2% of all cases, respectively. ER-low-positive cases (1%-10%) and those close to the threshold values of 1% and 10% (>0%-33%) made up only 1.9% and 6.0% of all cases, respectively.
Table 6 - ER IHC results of consecutive cases of invasive carcinoma from 2021 to 2022
ER IHC             n (%)
Positive cells, %
  0                281 (21.2)
  <1               35 (2.6)
  1-10             25 (1.9)
  11-33            19 (1.4)
  34-66            42 (3.2)
  ≥67              921 (69.6)
Allred score
  0                281 (21.2)
  2                35 (2.6)
  3-5              51 (3.9)
  6-8              956 (72.3)
ER IHC = estrogen receptor immunohistochemistry.
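The percentages in Table 6 and in the preceding paragraph can be re-derived from the case counts. The sketch below is an arithmetic check only; the dictionary keys are illustrative labels for the table's categories.

```python
# Case counts by percentage of ER-positive cells, transcribed from Table 6.
positive_cells = {"0": 281, "<1": 35, "1-10": 25, "11-33": 19, "34-66": 42, ">=67": 921}
# Case counts by Allred score, transcribed from Table 6.
allred = {"0": 281, "2": 35, "3-5": 51, "6-8": 956}

total = sum(positive_cells.values())


def pct(count):
    """Share of all cases, in percent, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Cases above 0% and up to 33%, i.e. close to the 1% and 10% thresholds.
near_threshold = positive_cells["<1"] + positive_cells["1-10"] + positive_cells["11-33"]

print("total cases:", total)
print("ER-low-positive (1%-10%):", pct(positive_cells["1-10"]), "%")
print("near thresholds (>0%-33%):", near_threshold, "cases,", pct(near_threshold), "%")
print("Allred 6-8:", pct(allred["6-8"]), "%; Allred 0:", pct(allred["0"]), "%")
```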
Patients with ER-low-positive results are considered eligible for endocrine treatment, but data on the overall benefit of this treatment remain limited.2 Additionally, low concordance among pathologists might lead to inconsistent ER reports. The 2020 ASCO/CAP guidelines recommend that laboratories establish and follow a standard operating procedure to confirm or adjudicate ER results for cases with weak stain intensity or ≤10% of cells staining.2
This study found a clear correlation between staining quality and misinterpretation among challenging cases through analyzing the results of the PT and central review of staining quality. As demonstrated by our study