Patients who received treatment at our institution from March 2022 to April 2023 were included in this study. The patient plans were generated using the Eclipse treatment planning system (TPS) (Varian Medical Systems, Palo Alto, CA, version 15.6) or the Pinnacle TPS (Philips Radiation Oncology Systems, Fitchburg, WI, version 16.2) and delivered on a Varian IX linear accelerator with a 6 MV photon beam. PSQA was carried out using the ArcCHECK phantom from Sun Nuclear Corporation (Melbourne, FL, USA). Prior to the study, accelerator commissioning and ArcCHECK phantom configuration were performed following the vendor’s standard procedures. The measurements from the annular detector array of the ArcCHECK phantom were used to reconstruct 3D dose distributions on the patient’s CT scans with the planned dose perturbation (PDP) algorithm implemented in 3DVH software (version 3.0), an add-on tool for ArcCHECK [28]. The workflow of this research is illustrated in Fig. 1.
Plans with insufficient target coverage resulting from OAR overlap and measurement files with excessive hot or cold points during reconstruction were excluded from the study. In total, 687 IMRT and VMAT plans were successfully reconstructed and used for subsequent model development and verification. Detailed plan information is provided in Table 1. Global GPRs were calculated using a 3%/2 mm criterion with a 10% threshold, and a GPR value of 90% was used as the cutoff between QA pass and fail. For DVH metrics, because the treatment sites varied among patients, only PTV metrics were calculated: PTV D95 (minimum dose received by 95% of the planning target volume), PTV D2 (minimum dose received by the hottest 2% of the planning target volume), PTV Dmean (mean dose received by the planning target volume), homogeneity index (HI) and conformity index (CI).
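The PTV dose metrics and the pass/fail label described above can be sketched as follows. This is a minimal illustration, not the 3DVH computation: it assumes equal voxel volumes, a boolean PTV mask on the dose grid, and that a GPR of exactly 90% counts as a pass. The function names `dvh_metrics` and `qa_label` are hypothetical, and HI and CI are omitted because their definitions are not given in the text.

```python
import numpy as np

def dvh_metrics(dose, ptv_mask):
    """Return D95, D2 and Dmean (same units as `dose`) for voxels inside the PTV.

    Assumes all voxels have equal volume, so volume fractions are voxel fractions.
    """
    ptv_dose = np.asarray(dose, dtype=float)[np.asarray(ptv_mask, dtype=bool)]
    return {
        # Dx is the minimum dose received by the hottest x% of the volume,
        # i.e. the (100 - x)th percentile of the voxel doses.
        "D95": float(np.percentile(ptv_dose, 5)),
        "D2": float(np.percentile(ptv_dose, 98)),
        "Dmean": float(ptv_dose.mean()),
    }

def qa_label(gpr_percent, threshold=90.0):
    """1 = pass, 0 = fail. Treating exactly 90% as a pass is an assumption."""
    return int(gpr_percent >= threshold)
```

Applying `qa_label` to the measured GPR of each plan yields the binary target for the classification task, while the measured GPR itself serves as the regression target.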
Fig. 1 Workflow of this study. The metrics in the orange boxes are used as 1D predictive inputs, while the planned dose is used as the 3D input for the 3D-only predictive models. Both are used together as multimodal inputs for the combined model. ACPDP: ArcCHECK planned dose perturbation algorithm
Table 1 Plan characteristic distributions
Complexity metrics were calculated for all 687 plans, following the methodology described in previous studies [29, 30]. These complexity metrics captured information on monitor units, leaf aperture, and leaf movement. Routine QA for the linear accelerators (linacs) was performed as recommended in TG-142 [31], and the linac QA metrics measured on, or closest to, the day of the PSQA measurement were also recorded, such as absolute dose variation, flatness and symmetry. Additionally, certain dosimetric parameters of the plan were considered in the model, including the HI and CI of the target volume, the volume of the PTV, and the prescription dose. In total, 71 one-dimensional metrics were incorporated into the model; the specific metrics are listed in Supplementary Materials Table S1.
Combined model establishment
Given the diverse treatment sites of the patients, the planning dose grids ranged in size from 63 × 68 × 62 to 336 × 259 × 263 voxels. Additionally, there were two grid spacings: 2.5 × 2.5 × 2.5 mm³ and 3 × 3 × 3 mm³. For consistency, the dose data were preprocessed and rescaled to a standardized size of 192 × 192 × 192 before entering the model. The 71 1D metrics were normalized using the Z-score method so that they could be concatenated with the dose features within the model. In this study, we developed a novel combined architecture based on the Swin-Transformer to fuse the multimodal inputs [32]. We employed 2 Swin-Transformer blocks, with 4 and 8 attention heads, respectively. The 3D Swin-Transformer blocks processed the planning doses of the patients. The features extracted from the 3D dose were then passed through an average pooling layer, resulting in a one-dimensional feature vector with 256 elements. In parallel, the 71 1D inputs were processed by a multi-layer perceptron (MLP) with a hidden layer of 128 units to obtain another 256 1D features. These two sets of features were then combined and used directly for multi-task prediction. Within each Swin-Transformer block, the MLP was configured with 2 hidden layers, each with twice as many units as the input dimension. Dropout layers were incorporated in both the 1D metrics branch and the Swin-Transformer blocks to prevent overfitting. The architecture of the network employed in this study is illustrated in Fig. 2.
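The preprocessing and fusion steps described above can be sketched in PyTorch as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the Swin-Transformer backbone is omitted and replaced by a precomputed 256-element dose feature vector; trilinear interpolation, the ReLU activation, the dropout rate of 0.1, the single linear layer per task, and the five DVH outputs (D95, D2, Dmean, HI, CI) are all assumptions. The names `resize_dose`, `zscore` and `FusionHead` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def resize_dose(dose, size=192):
    """Resample a (D, H, W) dose grid to size^3 (trilinear interpolation is assumed)."""
    x = dose[None, None].float()  # add batch and channel dimensions
    x = F.interpolate(x, size=(size, size, size), mode="trilinear", align_corners=False)
    return x[0, 0]

def zscore(metrics, mean, std, eps=1e-8):
    """Z-score the 1D metrics; mean and std would come from the training set."""
    return (metrics - mean) / (std + eps)

class FusionHead(nn.Module):
    """Fuse pooled 3D dose features with the 1D metrics and predict three targets."""

    def __init__(self, n_metrics=71, feat_dim=256, n_dvh=5, p_drop=0.1):
        super().__init__()
        # 1D branch: 71 metrics -> hidden layer of 128 units -> 256 features
        self.mlp = nn.Sequential(
            nn.Linear(n_metrics, 128), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(128, feat_dim),
        )
        # One linear head per task on the concatenated 512-element vector (assumed)
        self.gpr_head = nn.Linear(2 * feat_dim, 1)
        self.dvh_head = nn.Linear(2 * feat_dim, n_dvh)
        self.qa_head = nn.Linear(2 * feat_dim, 1)

    def forward(self, dose_feat, metrics):
        fused = torch.cat([dose_feat, self.mlp(metrics)], dim=1)
        gpr = self.gpr_head(fused).squeeze(1)                    # GPR regression
        dvh = self.dvh_head(fused)                               # DVH metric regression
        p_pass = torch.sigmoid(self.qa_head(fused)).squeeze(1)   # probability of QA pass
        return gpr, dvh, p_pass
```

In this sketch, `dose_feat` stands in for the output of the average pooling layer that follows the 3D backbone, so the same head could be attached to any of the backbones compared in the study.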
Fig. 2 Network architecture utilized in this study. (a) The overall architecture of the combined model. (b) Reprocess block of the input image, including rescaling and a convolution layer, used for all backbones. (c) The simplified 3D Swin-Transformer block, consisting of two successive layers. (d), (e) The 3D residual blocks and U-Net encoder used in this study for comparison. n or 2n denotes the number of output channels of the convolution layer, where n is the number of input channels of the block; s denotes the stride of the convolution or pooling kernel. Norm: normalization layer (batch normalization in (b), (d) and (e); layer normalization in (c)). MSA: Multi-head Self-attention. MLP: Multilayer Perceptron
The dataset was split into training, validation and test sets. The validation set was used to tune the hyperparameters of the model during the training phase, and the independent test set was used only once, after the model was developed. For the regression tasks of predicting the DVH metrics and the GPR, the mean square error (MSE) loss function was employed. For the QA classification task, the binary cross-entropy (BCE) loss function was used. The overall loss function was defined as the sum of these three individual losses, as given in Eq. 1.
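$$\begin{aligned}\mathrm{Total\,loss} = {} & \frac{1}{N}\sum\limits_{i=1}^{N} \left(\gamma^{i} - \hat{\gamma}^{i}\right)^{2} + \frac{1}{N}\sum\limits_{i=1}^{N}\sum\limits_{j=1}^{n} \left(D_{j}^{i} - \hat{D}_{j}^{i}\right)^{2} \\ & - \frac{1}{N}\sum\limits_{i=1}^{N} \left(\hat{y}^{i}\log y^{i} + \left(1 - \hat{y}^{i}\right)\log\left(1 - y^{i}\right)\right)\end{aligned}$$
(1)
Note: N: total number of samples; n: number of DVH metrics; γ: gamma passing rate; D: DVH metrics; y: the probability of passing the QA criteria; a hat (e.g. $\hat{y}$) denotes the label of the variable.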
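A minimal NumPy sketch of the three-term loss follows, assuming each term is averaged over the N samples and the DVH term is summed over the n metrics of each sample. The function name `total_loss`, the argument names and the clipping constant `eps` are illustrative choices, not taken from the study.

```python
import numpy as np

def total_loss(gpr_pred, gpr_true, dvh_pred, dvh_true, p_pass, qa_true, eps=1e-7):
    """Sum of GPR MSE, DVH MSE and QA binary cross-entropy.

    gpr_*: shape (N,); dvh_*: shape (N, n); p_pass: predicted pass probability,
    shape (N,); qa_true: 0/1 labels, shape (N,).
    """
    gpr_pred, gpr_true = np.asarray(gpr_pred, float), np.asarray(gpr_true, float)
    dvh_pred, dvh_true = np.asarray(dvh_pred, float), np.asarray(dvh_true, float)
    # Clip probabilities away from 0 and 1 so the logarithms stay finite
    p = np.clip(np.asarray(p_pass, float), eps, 1 - eps)
    y = np.asarray(qa_true, float)

    mse_gpr = np.mean((gpr_true - gpr_pred) ** 2)
    mse_dvh = np.mean(np.sum((dvh_true - dvh_pred) ** 2, axis=1))
    bce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    return float(mse_gpr + mse_dvh + bce)
```

Because the three terms are added without weights, their relative magnitudes depend on the units chosen for the GPR and the DVH metrics (for example, percent versus fraction); the text does not state which scaling was used.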
Finally, we used the optimal hyperparameters to retrain the model on a combined dataset consisting of the 447 training samples and the 90 validation samples, and evaluated its performance on the independent test set of 150 cases. Predictions were evaluated using the mean absolute error (MAE) and the area under the receiver operating characteristic (ROC) curve (AUC). The proposed deep network architecture was implemented using PyTorch [33] and executed on an NVIDIA GeForce RTX 3090Ti GPU with 24 GB of memory.
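The two evaluation measures can be sketched as below. This is an illustrative NumPy version, not the evaluation code used in the study; `mae` and `roc_auc` are hypothetical names, and the AUC is computed from its pairwise-ranking definition rather than by integrating an explicit ROC curve.

```python
import numpy as np

def mae(pred, true):
    """Mean absolute error between predicted and measured values."""
    return float(np.mean(np.abs(np.asarray(pred, float) - np.asarray(true, float))))

def roc_auc(scores, labels):
    """AUC as the probability that a random positive case outscores a random
    negative case, with ties counted as one half."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, int)
    pos, neg = scores[labels == 1], scores[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    diff = pos[:, None] - neg[None, :]  # all positive-negative score differences
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)
```

With 150 test cases the pairwise comparison is inexpensive; for much larger sets a rank-based formulation would be preferable.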
Other models
We also trained models using either the 3D dose or the 1D metrics alone (shown in Fig. 1) and compared their results with those of the combined model, in order to assess the individual predictive capability of each input type. For the 3D models, we adopted network architectures commonly used in medical dose processing, namely ResNet and a U-Net encoder; their architectures are shown in Fig. 2. For the 1D model based on the 1D metrics, we adopted a three-layer MLP with 2 hidden layers of 128 and 256 units. These models were trained on the same dataset as the combined model. For clinical practice, two ways of obtaining the QA classification were also compared in terms of sensitivity and specificity: predicting the pass/fail class directly, and deriving it from the predicted GPR.
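The comparison between the two classification routes can be sketched as follows. Several choices here are assumptions rather than details from the study: a failing plan is treated as the positive class, the direct classifier is thresholded at a pass probability of 0.5, and the 90% GPR cutoff is applied to both the predicted and the measured GPR. The names `sens_spec` and `compare_routes` are hypothetical.

```python
import numpy as np

def sens_spec(pred_fail, true_fail):
    """Sensitivity and specificity with 'QA fail' as the positive class (assumed)."""
    pred_fail = np.asarray(pred_fail, bool)
    true_fail = np.asarray(true_fail, bool)
    tp = np.sum(pred_fail & true_fail)     # failing plans correctly flagged
    fn = np.sum(~pred_fail & true_fail)    # failing plans missed
    tn = np.sum(~pred_fail & ~true_fail)   # passing plans correctly cleared
    fp = np.sum(pred_fail & ~true_fail)    # passing plans wrongly flagged
    sens = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    spec = tn / (tn + fp) if (tn + fp) > 0 else float("nan")
    return float(sens), float(spec)

def compare_routes(p_pass, gpr_pred, gpr_true, gpr_cut=90.0, p_cut=0.5):
    """Compare direct classification with thresholding the predicted GPR."""
    true_fail = np.asarray(gpr_true, float) < gpr_cut
    direct = np.asarray(p_pass, float) < p_cut        # route 1: classifier output
    via_gpr = np.asarray(gpr_pred, float) < gpr_cut   # route 2: predicted GPR
    return {
        "direct": sens_spec(direct, true_fail),
        "via_gpr": sens_spec(via_gpr, true_fail),
    }
```

Reporting both measures for each route makes the trade-off explicit: a route that misses fewer failing plans may do so at the cost of flagging more passing plans for measurement.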