How to evaluate perfusion imaging in post-treatment glioma: a comparison of three different analysis methods

This study shows that, in its current state, VOI-based DSC PW-MRI analysis yields a slightly higher diagnostic accuracy than a ‘hot spot’ or purely visual approach. It also demonstrates the subjectivity of the ‘hot spot’ and visual approaches, which VOI-based analysis largely circumvents.

When used to distinguish TP from TRA in this study, an rCBVmean cut-off value of 1.11 mL/100 g in the VOI analysis resulted in a sensitivity of 72% and a specificity of 76% (AUROC = 0.82). In comparison, the highest-scoring observer using the visual assessment had a sensitivity of 41% and a specificity of 91%. The ‘hot spot’ assessment of the DSC perfusion data achieved a sensitivity of 67% and a specificity of 70% (AUROC = 0.69). While the sensitivity and specificity achieved in this study are mediocre, it should be noted that in clinical practice a treatment decision would be made after a follow-up period and with predating scans available; in that setting, a higher sensitivity and specificity are to be expected. This study is meant to compare the predictive accuracy of these techniques based on a single-moment analysis.
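
As an illustration of how a sensitivity and specificity follow from a single rCBVmean cut-off, the sketch below applies the 1.11 threshold to a small set of lesions. The lesion values and labels are hypothetical and not taken from the study cohort, and the rule "classify as TP when rCBVmean is at or above the cut-off" is an assumption made for the example.

```python
def sensitivity_specificity(values, is_tp, cutoff):
    """Return (sensitivity, specificity) for a 'value >= cutoff means TP' rule.

    values : rCBVmean per lesion (hypothetical numbers in this sketch)
    is_tp  : reference standard per lesion, True for TP and False for TRA
    """
    true_pos = false_pos = true_neg = false_neg = 0
    for value, truth in zip(values, is_tp):
        predicted_tp = value >= cutoff  # assumed decision rule
        if predicted_tp and truth:
            true_pos += 1
        elif predicted_tp and not truth:
            false_pos += 1
        elif not predicted_tp and truth:
            false_neg += 1
        else:
            true_neg += 1
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical lesions, for illustration only.
rcbv_mean = [0.8, 1.3, 1.5, 0.9, 1.0, 2.1, 1.2, 0.7]
reference = [False, True, True, True, False, True, False, False]

sens, spec = sensitivity_specificity(rcbv_mean, reference, cutoff=1.11)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Moving the cut-off upwards in this sketch trades sensitivity for specificity, which is the trade-off summarised by the ROC curve.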

On a group level, the rCBVmax of a lesion showed a moderate correlation with the volume of the lesion (r = 0.64; p < 0.001), while the rCBVmean did not appear to be influenced by the volume (r = 0.17; p = 0.24). The nature of this correlation on an individual level has not been tested in this study, so the assumption that a large lesion will have a higher rCBVmax and is thus more likely to be TP should not be made without consulting other means of differentiating TP and TRA. The differences found in rCBVmax, rCBVmean and median volumes between the TP and TRA groups in this study match the hypothesis that perfusion differs between the two states. However, as age differed significantly between the groups (t(48) = 2.09, p = 0.04), with the TRA group having a higher mean age, this comparison should be interpreted with caution, as the age difference could also explain the differences found.

DSC PW-MRI is widely used in the radiological follow-up of post-treatment glioma lesions to distinguish TRA from TP. Three recent reviews on this topic reported sensitivities ranging from 83% to 93% and specificities ranging from 75% to 88%, indicating good diagnostic accuracy for distinguishing TRA from TP [8,9,10]. However, the studies included in these three meta-analyses mainly used the ‘hot spot’ technique. In addition, these studies differed from ours, as some of them differentiated between pseudoprogression and radiation necrosis, while we used TRA as a term to cover both. The included studies also often used predating or follow-up scans in their analysis rather than single-moment data. In this study, the single-moment ‘hot spot’ methodology was performed by three experienced neuro-radiologists and resulted in an AUROC (0.69) that was significantly lower than the AUROC based on the rCBVmean from the VOI analysis (0.82), as tested with DeLong’s test (Z = -2.26, p = 0.023). In the ‘hot spot’ assessment, the ICC showed poor reliability in the placement of references (ICC = 0.54), despite the use of a clearly communicated reference placement in the contralateral centrum semiovale, which should have had the least inter-observer variability [17]. This might be explained by variation in the slice in which the readers chose to place their reference. Nevertheless, this level of subjectivity despite an agreed reference placement underlines the weakness of manual placement. The placement of the ‘hot spot’ itself showed excellent reliability between the observers (ICC = 0.89), indicating that the manually placed ‘hot spot’ does provide similar values between readers. In addition, the ‘hot spot’ placement showed moderate reliability when compared to a complete VOI analysis (ICC = 0.72), indicating that, if done correctly, the ‘hot spot’ placement does match the rCBV found in a complete VOI analysis.

Additionally, a single-moment visual assessment of the same DSC perfusion data was included in this study. It showed large differences in the sensitivity and specificity achieved per neuro-radiologist, ranging from 41% to 86% and from 33% to 91%, respectively. The agreement between the neuro-radiologists or residents ranged from poor (κ = -0.98, κ = -0.72) to substantial (κ = 0.65), indicating a high degree of subjectivity. While this assessment deviated from clinical practice due to the lack of follow-up or predating images, the sensitivity and specificity found per radiologist indicate that a high value for one is obtained only by greatly sacrificing the other. In this diagnostic dilemma a high specificity is preferred, as the aim is to correctly diagnose TP to allow for earlier treatment, but the sacrificed sensitivity means that TP is often missed if specificity is prioritised.
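
For readers less familiar with the κ statistic, the following minimal sketch computes Cohen's kappa for two readers from first principles: the observed agreement is corrected for the agreement expected by chance from each reader's own TP/TRA frequencies. The reader calls below are hypothetical and are not the ratings collected in this study.

```python
def cohens_kappa(reader_a, reader_b):
    """Cohen's kappa for two readers rating the same lesions.

    Raises ZeroDivisionError when chance agreement equals 1
    (both readers gave every lesion the same single label).
    """
    n = len(reader_a)
    labels = set(reader_a) | set(reader_b)
    # Proportion of lesions on which both readers gave the same call.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Agreement expected by chance, from each reader's marginal frequencies.
    expected = sum(
        (reader_a.count(label) / n) * (reader_b.count(label) / n)
        for label in labels
    )
    return (observed - expected) / (1 - expected)


# Hypothetical calls for six lesions; the second reader calls TP more often.
reader_1 = ["TP", "TRA", "TRA", "TP", "TRA", "TRA"]
reader_2 = ["TP", "TP", "TRA", "TP", "TP", "TRA"]

kappa = cohens_kappa(reader_1, reader_2)
print(f"kappa = {kappa:.2f}")
```

In this sketch the readers agree on four of six lesions, yet kappa is noticeably lower than the raw agreement because one reader calls TP far more often than the other.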

A comparison between the inter-operator agreement of the ‘hot spot’ analysis and the agreement between the readers in the visual assessment illustrates that the visual approach is most susceptible to subjectivity. In the visual assessment the differences between readers were very large, while in the ‘hot spot’ approach the placement of the ROI was reliable between all readers. Only the reference placement showed variability. We feel the disagreement in the visual assessment reflects the clinical tendencies of radiologists as the expert readers were not instructed to act more defensively or otherwise. In this case a defensive choice would be to opt for the ‘worst-case’ scenario, being TP. Reader Z chose to utilise a more defensive approach than the others, resulting in more frequent TP diagnoses (predictions reader Z: 39 TP, reader X: 14 TP and reader Y: 18 TP), thus causing the found disagreement. This disagreement underlines the subjectivity of visual assessment even between trained radiologists.

All in all, this study confirms that a single-moment, exclusively visual assessment of post-operative glioma DSC PW-MRI data is vulnerable to inter-observer variability. This finding and the low ICC found in the reference placement of the ‘hot spot’ analysis match the high inter-observer variability found by Kouwenberg et al. [27] and Smits et al. [16]. Both studies describe that the placement of ‘hot spots’ and references in post-treatment glioma DSC PW-MRI data shows low reliability and reproducibility. It is therefore recommended that, if rCBV is visually measured in post-operative glioma patients, this is carried out by two readers and interpreted with caution.

Semi-automatic complete-VOI analysis of DSC data has been reported to have good discriminative power to distinguish TP from TRA. For example, in a recent study, rCBV values of a complete VOI obtained from DSC PW-MRI yielded an AUROC of 0.81 to distinguish TRA from TP [21]. Similar results were obtained with rCBV values from a complete VOI analysis of DSC PW-MRI data in the setting of metastatic neuro-oncological disease. In the study of Kuo et al., thirty subjects with 37 lesions were investigated (20 TRA; 17 TP). Using rCBV values of a VOI obtained from DSC PW-MRI data yielded an AUROC of 0.79 to discriminate TRA from TP [20]. The outcomes of these papers are corroborated by the current results, as we found an AUROC of 0.82 (95% CI: 0.70–0.94) for the mean rCBV in the semi-automatic VOI analysis.
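
To make the AUROC values above more concrete: the AUROC equals the probability that a randomly chosen TP lesion has a higher rCBV than a randomly chosen TRA lesion, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The rCBV values are hypothetical, and this is an illustration of the statistic only, not the software or procedure used in this study or in the cited papers.

```python
def auroc(tp_values, tra_values):
    """AUROC as the fraction of (TP, TRA) pairs ranked correctly.

    A pair counts as 1 when the TP lesion has the higher value,
    as 0.5 when the values are tied, and as 0 otherwise.
    """
    score = 0.0
    for tp_value in tp_values:
        for tra_value in tra_values:
            if tp_value > tra_value:
                score += 1.0
            elif tp_value == tra_value:
                score += 0.5
    return score / (len(tp_values) * len(tra_values))


# Hypothetical mean rCBV values per lesion, for illustration only.
tp_lesions = [1.3, 1.5, 0.9, 2.1]
tra_lesions = [0.8, 1.0, 1.2, 0.7]

print(f"AUROC = {auroc(tp_lesions, tra_lesions):.3f}")
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.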

It has been reported that implementation of DSC PW-MRI in routine follow-up MRI of glioma can aid the early diagnosis of TP [28]. In this study, a standardised perfusion acquisition protocol and a standardised methodology to process data with well-validated criteria were used. This has been recommended by others for research on post-treatment radiological evaluation of glioma patients [13]. However, harmonisation of the imaging protocol and the post-processing workflow remains lacking, which probably explains the wide range of cut-off values reported in the literature [6]. These differences also hinder the sharing and pooling of imaging data. Nevertheless, important steps towards the standardisation of DSC acquisition parameters have been taken in recent years [12,13,14].

Strengths and limitations

In the VOI analysis, the entire T1-weighted contrast-enhancing lesion was included, which limits the inter- and intra-observer variability when compared to manual placement of regions of interest. The IB-Rad Tech software used and its semi-automatic processing of DSC-PWI images have been described to further reduce user-related variability [22], as its automated standardisation circumvents the manual placement of references, which was shown to be especially vulnerable to subjectivity. Another strength of the current study is that, for the first time, single-moment rCBV values of VOIs were directly compared with rCBV values of ‘hot spots’ and a purely visual assessment, which are commonly used in standard clinical reading of DSC PW-MRI data in the post-treatment evaluation of gliomas to distinguish TP from TRA.

Important limitations of the current study are its retrospective nature and the lack of an external validation cohort in which the observed threshold values can be tested. It must be emphasised that the differences between the rCBV values of the VOIs were valid on a group level; the usefulness of this technique in individual patients needs further investigation.

The single-moment analysis of the PWI data can also be seen as a limitation, as a three- to six-month period of follow-up imaging and pre-operative scans are usually used to differentiate between TP and TRA. However, readers were informed that the MR data they read were the first imaging study on which a new or growing contrast-enhancing lesion appeared. Therefore, the single-moment analysis is not expected to have affected the outcome of the reader study.

The heterogeneity of the studied population can be seen as a limitation, as the included subtypes differ inherently in malignancy grade and therefore in behaviour and treatment options. However, all included subtypes can progress on follow-up imaging and can show new enhancing lesions in the post-treatment setting. As this study investigated the diagnostic accuracy of three approaches to analyse DSC PW-MRI data when new contrast-enhancing lesions occur, the impact of the rather heterogeneous population is considered minimal.

The small number of patients per subgroup (when stratified by, for example, tumor type or treatment schedule) prevented a statistically sound subgroup analysis and the identification of potential differences between the pathologies. As the scope of this study was to elucidate differences between the methodologies, this was deemed acceptable by the authors. A future study with larger subgroups per pathology and treatment should aid in the identification of such differences.

A final limitation is the use of a 1.5 T MRI system instead of a 3 T system. A 3 T scanner would allow for an increased signal-to-noise ratio (SNR) and increased temporal and spatial resolution. However, DSC PW-MRI is often not limited by SNR, and 1.5 T glioma imaging shows an almost perfect correlation with 3 T for MR measures such as rCBV and identified lesion volume [29], so a 1.5 T system should be sufficient to answer our study objectives.
