Efficiency of pulmonary nodule risk scoring systems in the Turkish population

When evaluated independently of other variables, all scoring systems significantly differentiated benign from malignant nodules. By nodule size, no scoring system gave significant results for nodules < 1 cm, whereas all scoring systems successfully differentiated benign from malignant nodules > 1 cm.

By nodule attenuation, no scoring system gave significant results for ground-glass nodules, whereas all scoring systems gave significant results for solid and semi-solid nodules.

In our study, among the parameters included in these models, age, nodule diameter, gender, spiculation, emphysema, and FDG uptake of the nodule were significantly associated with malignancy, whereas no significant association was found between the other variables and malignancy.

The significance levels did not change when each scoring system was evaluated at the optimal threshold value determined for our cases or at other threshold values. A history of granulomatous disease also did not significantly change the number of nodules.

When benign and malignant cases were compared using the scores from the pulmonary nodule malignancy prediction models evaluated in this study, malignant cases had statistically significantly higher risk scores with all three models. However, the mean malignancy risk score of benign nodules was 16.99% for the Brock model, 22.89% for the Mayo model, and 31.17% for the Herder model. The malignancy probability thresholds for the benign/malignant distinction are 5% in the ACCP and Fleischner guidelines and 10% in the BTS guidelines [3, 5, 6]. Most of the nodules in our study therefore had risk scores above these thresholds. In our opinion, this is because almost all of the nodules in this study were operated on in our clinic due to a moderate or high suspicion of malignancy. To distinguish optimally between benign and malignant nodules in patients with high mean risk scores by all three models and with many clinical and radiological risk factors, it was necessary to determine new malignancy probability thresholds for the models evaluated here instead of using those specified in the guidelines. Statistical analysis gave optimal thresholds of 19.5% for the Brock model, 23.1% for the Mayo model, and 56% for the Herder model. Although the new thresholds slightly decreased the overall sensitivity of the models in distinguishing benign from malignant nodules, they improved other parameters, especially specificity and positive predictive value.
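The paper does not state how its optimal threshold values were derived. One common choice is the cut-off that maximizes Youden's J (sensitivity + specificity - 1). The Python sketch below assumes that criterion and uses made-up risk scores and pathology labels (not study data) to show how moving from a guideline-style 10% cut-off to a data-driven one can trade a little sensitivity for higher specificity and positive predictive value. All function names are hypothetical.

```python
"""Sketch: choosing a risk-score cut-off by Youden's J (assumed criterion)."""


def confusion_at(scores, labels, threshold):
    """Count TP, FP, TN, FN when score >= threshold is called malignant (label 1)."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        called_malignant = score >= threshold
        if called_malignant and label == 1:
            tp += 1
        elif called_malignant:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def ratio(num, den):
    """Safe division: NaN when the denominator is zero."""
    return num / den if den else float("nan")


def metrics_at(scores, labels, threshold):
    """Sensitivity, specificity, PPV and NPV at one threshold."""
    tp, fp, tn, fn = confusion_at(scores, labels, threshold)
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    Only observed scores are tried as candidate cut-offs; ties keep the lowest.
    """
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        m = metrics_at(scores, labels, t)
        j = m["sensitivity"] + m["specificity"] - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical predicted malignancy risks (%) and pathology labels (1 = malignant).
scores = [3, 8, 12, 18, 21, 25, 30, 40, 55, 70]
labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

t_opt, j_opt = youden_threshold(scores, labels)
print("optimal threshold:", t_opt, "J =", round(j_opt, 2))
print("at 10%:", metrics_at(scores, labels, 10))
print("at optimum:", metrics_at(scores, labels, t_opt))
```

With these toy numbers, the 10% cut-off catches every malignant nodule but calls three benign nodules malignant, while the Youden cut-off of 30 misses one malignant nodule and removes all false positives, a small-scale version of the sensitivity/specificity trade-off described above.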

The performance of the models was evaluated using the area under the ROC curve (AUC). The Herder model performed significantly better than the Brock and Mayo models, whose AUC values were very close to each other. This suggests that FDG uptake on PET–CT may play an important role in the evaluation of pulmonary nodules. During development of the Herder model, the performance of FDG uptake alone in differentiating benign and malignant nodules was examined, and it did not differ significantly from that of the Mayo model, which was also used in this study. However, after the FDG uptake of the nodule was integrated into the Mayo model, the final Herder model performed statistically significantly better than both the Mayo model and FDG uptake alone [9]. Thus, the level of FDG uptake of a nodule on PET–CT is not sufficient on its own to assess the probability that the nodule is malignant. When PET–CT findings are evaluated together with the patient's other clinical and radiological features, however, they become a valuable tool for determining the probability of malignancy.
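The AUC has a simple probabilistic reading: it is the probability that a randomly chosen malignant nodule receives a higher risk score than a randomly chosen benign one, with ties counted as one half, which equals the Mann-Whitney U statistic divided by the number of malignant-benign pairs. The Python sketch below computes it directly for two hypothetical models scoring the same six invented nodules. The paper does not state which test was used to compare AUCs; a formal comparison of correlated AUCs (for example DeLong's method) is not shown here.

```python
"""Sketch: ROC AUC as the Mann-Whitney probability of correct ranking."""


def auc_mann_whitney(scores, labels):
    """P(score of a malignant nodule > score of a benign nodule), ties = 0.5."""
    malignant = [s for s, y in zip(scores, labels) if y == 1]
    benign = [s for s, y in zip(scores, labels) if y == 0]
    if not malignant or not benign:
        raise ValueError("need at least one malignant and one benign nodule")
    concordant = 0.0
    for m in malignant:
        for b in benign:
            if m > b:
                concordant += 1.0
            elif m == b:
                concordant += 0.5
    return concordant / (len(malignant) * len(benign))


# Same six hypothetical nodules (1 = malignant), scored by two hypothetical models.
labels = [0, 0, 0, 1, 1, 1]
model_a = [10, 20, 35, 30, 50, 60]  # one benign nodule outranks a malignant one
model_b = [10, 20, 25, 30, 50, 60]  # perfect ranking

print(round(auc_mann_whitney(model_a, labels), 3))
print(auc_mann_whitney(model_b, labels))
```

Because the AUC depends only on how scores are ranked, a model can keep a reasonable AUC even when its absolute risk estimates for benign nodules sit well above guideline thresholds, as observed in this study.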

In addition, the AUC values for all three models in our study were lower than those reported in the original publications on the models' development and validation (0.96 for the Brock model, 0.79 for the Mayo model, and 0.92 for the Herder model) [7,8,9]. Among other factors, this may be because almost all of the patients included from our clinic were being followed due to a relatively high risk of malignancy. As a result, the scores of malignant and benign nodules were generally closer to each other than in the populations in which the models were originally developed.

Compared with the AUC values obtained when all nodules were included, all three models showed a significant loss of performance in differentiating benign from malignant subcentimetric nodules. There could be several reasons for this. First, the numbers of benign and malignant nodules included in this study were very close to each other (27 benign, 25 malignant), whereas the malignancy rate in the development populations was below 5% in both cohorts for the Brock model, 23% for the Mayo model, and 57% for the Herder model [7,8,9]. This difference may have contributed to the loss of performance. Furthermore, all of the nodules evaluated in the development of the Mayo model were detected by chest X-ray [8]. Because subcentimetric nodules are more difficult to detect on chest X-ray than larger nodules, the subcentimetric nodules detected in that cohort may have had different characteristics from those in our study. The same reasoning applies to the Herder model, because its parameters other than the PET–CT findings are calculated as in the Mayo model. In addition, since none of the subcentimetric nodules in our study showed moderate or high FDG uptake, the guiding effect of PET–CT was limited, which may have reduced the effectiveness of the Herder model. Consistent with this result, the BTS guidelines do not recommend the Herder model for nodules smaller than 8 mm [5]. Because all of the nodules in the original Brock study were detected by CT, the proportion of subcentimetric nodules was higher than in the populations used to develop the other models [7]. Although the difference was not statistically significant, we think this explains why the Brock model achieved a higher AUC than the other models for subcentimetric nodules.
However, in our clinic, many patients with malignant subcentimetric nodules that, according to the guidelines, should have been followed up conservatively or even discharged from follow-up were operated on thanks to the individual experience and initiative of experienced radiologists and clinicians in our hospital, and these patients received curative treatment at the earliest possible stage. Clinicians or radiologists with considerable experience sometimes apply clinical judgment that differs from the calculation model or guideline, and this judgment can be as effective as risk prediction models because it takes more variables and prior experience into account [12]. In addition, these nodules are difficult to detect intraoperatively as well as during follow-up, and marking methods can be used preoperatively [13].
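One well-established way in which the malignancy rate of a population matters, as discussed above for the subcentimetric nodules, is through predictive values: at a fixed sensitivity and specificity, Bayes' theorem gives PPV = sens·prev / (sens·prev + (1 - spec)·(1 - prev)), so PPV and NPV shift with prevalence. The Python sketch below uses an illustrative operating point (not a result of this study) and the malignancy rates quoted above, including roughly 48% for the 25 malignant of 52 nodules; 5% stands in for the Brock cohorts' "below 5%".

```python
"""Sketch: how prevalence changes PPV and NPV at a fixed operating point."""


def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from Bayes' theorem."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Illustrative operating point (assumed, not taken from the study).
SENS, SPEC = 0.80, 0.80

for label, prev in [("Brock cohorts (<5%)", 0.05), ("Mayo (23%)", 0.23),
                    ("this study (25/52)", 25 / 52), ("Herder (57%)", 0.57)]:
    ppv, npv = predictive_values(SENS, SPEC, prev)
    print(f"{label:22s} PPV={ppv:.2f} NPV={npv:.2f}")
```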

For nodules larger than 1 cm, all three models gave statistically significant results for both solid and semi-solid nodules. In our opinion, once the disadvantages of subcentimetric nodules were removed, the malignancy risk estimation models distinguished benign from malignant nodules successfully. In addition, because nodules larger than 1 cm do not pose the diagnostic difficulties seen with subcentimetric nodules, the AUC values were significantly higher.

In our opinion, the AUC values for nodules of 11–20 mm were higher than those for nodules of 21–30 mm because of false positive results in large benign nodules. The mean malignancy probabilities calculated for benign nodules of 11–20 mm were 15.24%, 17.51%, and 26.24% for the Brock, Mayo, and Herder models, respectively, whereas those for benign nodules of 21–30 mm were 34.55%, 48.44%, and 59.66%. In other words, for every model, the mean malignancy probability of benign nodules of 21–30 mm was higher than the optimal threshold calculated for that model (19.5%, 23.1%, and 56%, respectively). This increases false positive results and degrades the performance measures of the models. Among the three models, the Herder model had the highest AUC and the Brock model the lowest in both size ranges; however, the AUC values of the Brock and Mayo models did not differ significantly. This finding suggests that PET–CT is an important tool in the management of nodules > 1 cm.

The efficacy of the models was also compared according to nodule attenuation, and the malignancy probability scores obtained for ground-glass nodules showed that only the Brock model distinguished malignant from benign nodules adequately. Like subcentimetric nodules, ground-glass nodules are very difficult to evaluate. Spiculation, which differed significantly between benign and malignant nodules in our study, is not seen in ground-glass nodules because of their structure, and these nodules generally have low FDG avidity. It was therefore an expected finding that the Mayo and Herder models, which cannot make optimal use of these two factors, could not distinguish effectively between benign and malignant ground-glass nodules.

Regarding the AUC values for ground-glass nodules, in our opinion the poor performance of the Mayo and Herder models may be explained, as with subcentimetric nodules, by the fact that the nodules in the population used to develop the Mayo model were evaluated after chest radiographs had been reviewed [8]. Because ground-glass nodules, especially small ones, are difficult to detect on chest radiographs, the proportion of ground-glass nodules in the original study was probably very low compared with our study. Since the parameters of the Herder model other than the PET–CT findings are based on the Mayo model, the Herder model likely shares this problem. The Brock model, on the other hand, was developed from nodules detected by CT, and nodule attenuation was integrated into the model [7]. However, in the Brock model, ground-glass character reduces the probability of malignancy, whereas 75% of the ground-glass nodules in our study were malignant. Despite this, interestingly, the Brock model yielded its highest AUC value for ground-glass nodules among all of the groups evaluated with it in this study.

All three models successfully differentiated malignant from benign semi-solid nodules. Most of the semi-solid nodules in this study were larger than 1 cm, which may be why, in contrast to ground-glass nodules, all of the models produced significant results for them. In addition, the mean malignancy probabilities assigned by the Brock model to benign and malignant semi-solid nodules were higher than those assigned to solid and ground-glass nodules. Consistent with the model, this suggests that a semi-solid nature increases the probability that a nodule is malignant.

For semi-solid nodules, the AUC values of the Mayo and Herder models, but not the Brock model, increased significantly compared with ground-glass nodules. In our opinion, this is probably related to the fact that the Mayo and Herder models were developed from nodules detected by chest radiography, as discussed above for ground-glass nodules [8, 9]. The Mayo and Herder models may have distinguished benign and malignant semi-solid nodules better than ground-glass nodules because the solid components of these nodules are larger, which also increases the probability of their detection by chest X-ray. In addition, the higher FDG uptake of semi-solid nodules compared with ground-glass nodules in this study likely contributed to the improved performance of the Herder model. The Herder model performed best of the three models for semi-solid nodules.

All three models also successfully differentiated malignant from benign solid nodules. Approximately 80% of the nodules in the two cohorts of the original Brock model study were solid. Since the Mayo model and the Herder model, which is derived from it, were developed from nodules seen on chest X-rays, the majority of the nodules in those studies were most likely solid. Models developed in populations dominated by solid nodules may therefore have differentiated benign and malignant solid nodules more effectively in our study.

Based on the AUC values for solid nodules, the Herder model was the most effective, followed by the Mayo and Brock models, respectively. The Mayo and Herder models achieved their highest AUC values in this group among all of the groups evaluated in this study. As mentioned above, this is likely because these two closely related models were developed and validated in populations with high numbers of solid nodules. In addition, the AUC value of the Herder model for solid nodules was very close to that of the original study (0.92) [9], which highlights the value of PET–CT for solid nodules.

Various studies have identified many lung cancer risk factors based on the clinical and demographic characteristics of patients and the radiological characteristics of nodules. However, the effects of these risk factors, especially clinical and demographic ones, may differ according to the structure of local populations and geographical features [10]. In the Brock model, female gender, family history of lung cancer, nodule type, nodule location, and number of nodules are parameters that affect the probability of malignancy, because they differed significantly between benign and malignant cases in the population in which the model was developed [7]. In the present study, male gender was significantly associated with malignancy, and no significant differences between benign and malignant cases were found for the other parameters. Similarly, a history of smoking and a history of extrapulmonary malignancy at least 5 years earlier are risk factors in the Mayo and Herder models, but in our study neither differed significantly between benign and malignant cases [8, 9]. Thus, these parameters contributed less to the differentiation of benign and malignant nodules in our study than in the original populations. In such cases, it is inevitable that the performance of all three models will decrease.

The main contribution of this study is that it evaluated nodules in the Turkish population with the currently used malignancy scoring systems against definitive postoperative pathology results. This allows the risk calculation models to be calibrated retrospectively before they are used in a new local population and provides new optimal threshold values, as suggested in the literature [14].
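The paper does not describe a specific recalibration procedure. One widely used approach for adapting an existing risk model to a new population is logistic recalibration, which refits only an intercept a and a slope b on the model's log-odds: logit(p_new) = a + b·logit(p_model). Here a = 0 and b = 1 mean no change, and b < 1 indicates that the original predictions are too extreme for the new population. The Python sketch below fits a and b by Newton-Raphson using only the standard library; the probabilities and labels are hypothetical, as are the function names.

```python
"""Sketch: logistic recalibration (intercept + slope) of an existing risk model."""
import math


def logit(p):
    return math.log(p / (1 - p))


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def recalibrate(model_probs, labels, max_iter=50, tol=1e-10):
    """Fit labels ~ Bernoulli(sigmoid(a + b * logit(p))) by Newton-Raphson.

    Assumes 0 < p < 1 and that the labels are not perfectly separated by p.
    """
    x = [logit(p) for p in model_probs]
    a, b = 0.0, 1.0  # start from "model is already calibrated"
    for _ in range(max_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, labels):
            mu = sigmoid(a + b * xi)
            w = mu * (1 - mu)
            g_a += yi - mu  # gradient of the log-likelihood
            g_b += (yi - mu) * xi
            h_aa += w  # entries of the information matrix
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + step_a, b + step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return a, b


def apply_recalibration(p, a, b):
    """Map an original model probability to its recalibrated value."""
    return sigmoid(a + b * logit(p))


# Hypothetical original-model probabilities and local pathology results.
probs = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

a, b = recalibrate(probs, labels)
print(f"intercept={a:.3f} slope={b:.3f}")
print(f"0.30 -> {apply_recalibration(0.30, a, b):.3f}")
```

At the fitted values the mean recalibrated probability equals the observed malignancy rate in the local sample (calibration-in-the-large), while the slope adjusts how spread out the predictions are.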

The main limitation of the study is that it was conducted among patients who were followed in the thoracic surgery clinic of a referral center and operated on for pulmonary nodules. As a result, most of the nodules in this study were already considered risky by clinicians and radiologists.

In conclusion, all models effectively differentiated benign from malignant pulmonary nodules in all groups except subcentimetric and ground-glass nodules. However, in none of the groups in which the models were effective did the AUC values reach those obtained in the original studies. This highlights the need to optimize the models and malignancy risk thresholds for this population or to develop a new model.
