Interval breast cancer rates for tomosynthesis vs mammography population screening: a systematic review and meta-analysis of prospective studies

In this systematic review and meta-analysis, we focus on prospective studies (including RCTs) reporting ICR at follow-up of participants screened with DBT vs DM to determine the effect on this key outcome in BC screening. Although earlier reviews have examined DBT screen detection [2, 4, 28], we address a critical evidence gap regarding ICR using high-level evidence (all prospective studies). Alongside DBT’s known effect of significantly increasing CDR [2, 4, 28], we report suggestive evidence that DBT may be associated with a reduction in ICR. Currently, many organised screening programmes conditionally recommend DBT instead of DM with very low certainty of evidence due to little or no effect on interval cancers. Our findings are timely and relevant to these screening practice recommendations [8].

The inclusion of both RCTs and prospective non-randomised studies expanded the available sample size for evidence synthesis and enabled us to explore the potential for bias that may have been introduced in non-randomised studies. While we did not find that DBT significantly reduced the ICR compared to DM, the estimate and CI provided an indication of a possible effect (pooled RD −2.92 per 10,000; 95% CI: −6.39 to 0.54); this can be considered weak (suggestive) evidence of an effect. Although our subgroup analyses found no evidence that non-randomised studies were biased for any outcome, we did find evidence that studies that sourced cohorts from different periods or locations may have biased ICR estimates. Three studies sourced groups from different populations (this was done to allow comparison of ICR from independent groups): the OVVV study assigned screening modality based on the county of residence (DBT in Oslo, DM in Vestfold and Vestre Viken), and the OTST and Trento pilot studies compared cohorts from different time periods (using historical cohorts). It has been well documented that BC incidence rates vary over time and between regions in Europe [29, 30]. While incidence rates may be affected by many background factors, it is plausible that baseline ICR risk differs for groups sampled at different times and from different regions. A subgroup test revealed that studies that sourced groups from the same period and region produced ICR outcomes that differed from studies that did not (p = 0.02). One explanation for the difference across subgroups is that sampling from different populations may lead to cohorts with different baseline cancer risks, which could bias within-study estimates of ICR. Pooling the subgroup of studies that sampled groups from the same timeframe and region (n = 7), DBT produced a relative reduction in IC (RR: 0.72, 95% CI: 0.58–0.89, I² = 6.6%) with an absolute reduction of 5.50 IC per 10,000 screens (95% CI: −9.47 to −1.54, I² = 29.7%). Relative heterogeneity (as indicated by I²) fell from 52% with all studies to 29% when only including studies that sourced screening groups from the same population. This suggests that studies that sourced cohorts from different timeframes and regions may have introduced substantial heterogeneity in estimates. It also indicates that DBT reduced the IC rate compared to DM when the analysis was restricted to studies that sampled groups from the same population. In interpreting these findings, it should be noted that the subgroup of studies that sampled from the same time and population (Fig. 2b) was an exploratory comparison, and more evidence needs to accumulate to confirm this ICR finding.
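To make the pooled quantities above concrete, the following is a minimal sketch of how a pooled risk difference per 10,000 screens, its 95% CI, and I² can be computed from per-study counts. It assumes inverse-variance weighting with a DerSimonian-Laird random-effects estimator, which may not be the exact model used in this review. The study counts and the function names (`risk_difference`, `pool_random_effects`) are hypothetical and are not data or methods from the included studies.

```python
import math

# Hypothetical counts, NOT data from the included studies:
# (interval cancers DBT, screens DBT, interval cancers DM, screens DM)
studies = [
    (20, 15000, 30, 15000),
    (12, 10000, 14, 10000),
    (35, 30000, 52, 30000),
]

def risk_difference(ic_dbt, n_dbt, ic_dm, n_dm):
    """Return the risk difference (DBT minus DM) and its variance."""
    p1, p0 = ic_dbt / n_dbt, ic_dm / n_dm
    variance = p1 * (1 - p1) / n_dbt + p0 * (1 - p0) / n_dm
    return p1 - p0, variance

def pool_random_effects(effects):
    """Pool (estimate, variance) pairs; assumes DerSimonian-Laird weights."""
    w = [1 / v for _, v in effects]
    fixed = sum(wi * e for wi, (e, _) in zip(w, effects)) / sum(w)
    # Cochran's Q measures dispersion of study estimates around the fixed-effect mean.
    q = sum(wi * (e - fixed) ** 2 for wi, (e, _) in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for _, v in effects]
    pooled = sum(wi * e for wi, (e, _) in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    # I² is the share of total variability attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, i2

effects = [risk_difference(*s) for s in studies]
pooled, lower, upper, i2 = pool_random_effects(effects)
scale = 10_000  # report per 10,000 screens, as in the text
print(f"Pooled RD per 10,000: {pooled * scale:.2f} "
      f"(95% CI {lower * scale:.2f} to {upper * scale:.2f}), I2 = {i2:.1f}%")
```

A CI that spans zero, as for the all-studies estimate reported above, corresponds to a non-significant pooled difference at the 5% level.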

Examining secondary outcomes confirms that DBT significantly increased CDR (pooled absolute RD 24.17/10,000 screens) vs DM, with consistent findings in subgroup analysis (by study design) and in sensitivity analysis. In contrast, there was no evidence of a difference in recall rate between DBT and DM in pooled analyses, and recall rates were heterogeneous across studies. These findings are consistent with previous reviews that have also found DBT has a higher CDR than DM, but mixed recall rates [2, 4, 27].

There is an increasing body of evidence on DBT’s greater CDR relative to DM, particularly in biennial screening. This is in contrast to sparse information and inconclusive findings on the follow-up outcomes of DBT-screened populations, specifically ICRs [11]. Such evidence is central to informing BC screening policy decisions. Interval BCs are diagnosed after a negative screen and have been found to share the prognostic features of clinically diagnosed cancers [9]. For these reasons, they are monitored as an indication of screening sensitivity and potential benefit [9, 31]. We have reported ICR as a surrogate measure for screening effect. In our main analysis, we found weak, inconclusive evidence that DBT reduces ICR relative to DM screening. However, in a post-hoc sensitivity analysis including only comparable populations, we found that DBT has a beneficial effect at follow-up by reducing ICR, meaning that DBT detected some of the cancers that would have clinically progressed within two years of screening, possibly contributing a mortality benefit. This suggests the increased CDR of DBT does not completely reflect over-detection [10]. While more evidence is needed to confirm any relative difference in ICR between screening methods, this review provides initial support for DBT in programmatic screening.

The review draws attention to possible bias introduced by sampling groups from different populations. While we cannot disentangle the true influence of all sources of bias, the difference in ICR results between trials that sourced groups from the same time and location and those that did not presents suggestive evidence that such factors may influence ICR. Policymakers and practitioners may need to be mindful of such design features when using synthesised screening outcomes in decisions.
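A comparison between two subgroup estimates of this kind can be sketched as a simple z-test on the difference between the pooled estimates. This is an illustrative assumption about the form of the test, not necessarily the procedure used in this review. The first subgroup uses the RD and CI reported above for studies with same-period, same-region groups; the second subgroup's estimate and standard error are hypothetical placeholders, as are the function names.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Recover a standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z)

def subgroup_difference_p(est_a, se_a, est_b, se_b):
    """Two-sided p-value for the difference between two independent estimates.

    With two subgroups this is equivalent to a chi-square test on 1 degree of freedom.
    """
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return math.erfc(abs(z) / math.sqrt(2))

# Reported in this review (per 10,000 screens): same period and region subgroup.
rd_same = -5.50
se_same = se_from_ci(-9.47, -1.54)

# Hypothetical values for the other subgroup; NOT reported in the text above.
rd_other, se_other = 1.0, 2.0

p = subgroup_difference_p(rd_same, se_same, rd_other, se_other)
print(f"SE (same population subgroup): {se_same:.2f}")
print(f"p-value for subgroup difference (illustrative): {p:.3f}")
```

Because the second subgroup's inputs are placeholders, the printed p-value is not expected to reproduce the p = 0.02 reported in this review.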

A limitation of our subgroup analyses (Fig. 2) is that subgroups were informed by the authors’ judgments about the most important potential sources of bias (whether randomisation was used; location or time differences across groups being compared). This approach is not exhaustive, and it is possible there are other sources of bias from known or unknown factors. Examples of possible sources of bias in specific trials are differences in reading strategies between arms in the Córdoba breast tomosynthesis screening trial (standard double-reading for DM and quadruple reading for DBT), and the 5% ‘opt-out’ from DBT in the BreastScreen Victoria pilot trial, which meant that around 10% of women in the DM group chose not to receive DBT.

Previous studies have found that DBT may be more effective for dense breasts [32]. However, we did not attempt to analyse whether outcomes interact with breast density because only four of the eligible studies reported such data [18, 19, 20, 25]. Future reviews should consider using IPD meta-analysis or recovering additional aggregate information to enable the evaluation of whether breast density is an effect modifier of screening benefit. Another limitation of our review is the inability to directly evaluate overdiagnosis. Our findings indicate that an additional 24 cancers are detected per 10,000 screens using DBT, and that DBT may also produce approximately 3–5 fewer interval cancers per 10,000 screens during the biennial interval; if this is the case, a subset of the additional cancers detected by DBT would not have progressed to interval cancers within two years (so there may be some lead time effect) [10]. Longer-term follow-up data are needed to estimate overdiagnosis for DBT vs DM screening.
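The arithmetic behind this comparison can be laid out explicitly. The sketch below uses only the point estimates quoted in this review and treats them as exact, ignoring their confidence intervals. It is a naive illustration rather than an estimate of overdiagnosis, and the function name `offset_summary` is hypothetical.

```python
def offset_summary(extra_detected, fewer_interval):
    """Return (extra detections not offset by fewer interval cancers, share offset)."""
    remainder = extra_detected - fewer_interval
    share = fewer_interval / extra_detected
    return remainder, share

# Point estimates quoted in this review, per 10,000 screens.
extra_detected = 24.17  # pooled CDR risk difference, DBT vs DM
ic_reductions = {
    "all studies": 2.92,
    "same period and region subgroup": 5.50,
}

for label, reduction in ic_reductions.items():
    remainder, share = offset_summary(extra_detected, reduction)
    print(f"{label}: {remainder:.2f} additional detections per 10,000 "
          f"not matched by fewer interval cancers ({share:.0%} matched)")
```

On these point estimates, most of the additional detections are not matched by a reduction in interval cancers within two years, which is why longer follow-up is needed to separate lead time from overdiagnosis.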

In conclusion, we present initial but weak evidence that DBT is detecting additional clinically meaningful cancers. While this is a promising finding, future research is needed to quantify the longer-term benefits (beyond 2 years) and potential overdiagnosis.
