We collected pathological sections and clinical data for 513 ccRCC patients from The Cancer Genome Atlas (TCGA) database, using formalin-fixed, paraffin-embedded samples. Scanned slides with excessive labeling over the tissue area, damaged slides, and slides without tumor were excluded, and only one sample was selected per patient. The clinical data included survival time, survival status, ethnicity, and staging. Additionally, we acquired 144 renal cancer paraffin samples from Shanghai Outdo Biotech Company (Shanghai, China) for external validation. As the TCGA database is publicly available for research purposes, no ethical approval was required.
2.2 Image pre-processing
Color is a critical concern in whole-slide images (WSIs), and standardizing and verifying color on digital slide displays is crucial for implementing digital pathology effectively. Color variation arises primarily from differences in histology laboratory protocols and practices. Capture parameters such as illumination and filters, along with the image processing inherent to digital systems and display characteristics, can also influence the displayed color [14, 15]. Consequently, after the H&E images were acquired, the originals were cropped into small patches, which were color-normalized using the Macenko method with appropriate modifications [16]. Z-score normalization was subsequently applied to the RGB channels to standardize the distribution of image intensities.
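The z-score step can be illustrated with a minimal sketch. It assumes the statistics are computed per patch and per channel, which the text does not specify; the function name `zscore_channels` and the sample pixel values are hypothetical, and a real pipeline would normally vectorize this with an array library.

```python
from statistics import fmean, pstdev

def zscore_channels(pixels):
    """Standardize each RGB channel to zero mean and unit variance.

    `pixels` is a list of (r, g, b) tuples. Assumption: statistics are
    computed per patch; a constant channel is mapped to zeros.
    """
    channels = list(zip(*pixels))  # three tuples: all R, all G, all B values
    normalized = []
    for values in channels:
        mean = fmean(values)
        std = pstdev(values)  # population standard deviation
        if std == 0:
            normalized.append([0.0] * len(values))
        else:
            normalized.append([(v - mean) / std for v in values])
    # Regroup the three channel lists back into per-pixel tuples
    return [tuple(px) for px in zip(*normalized)]

# Hypothetical 4-pixel patch, for illustration only
patch = [(200, 120, 180), (180, 100, 160), (220, 140, 200), (200, 120, 180)]
normalized_patch = zscore_channels(patch)
```

After this step each channel of the patch has mean 0 and standard deviation 1, so intensity distributions are comparable across patches.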
2.3 DL feature extraction and selection
Because WSIs are very large, typically 100,000 × 80,000 pixels, they were segmented into numerous patches. Tissue regions were exhaustively partitioned into non-overlapping patches of 256 × 256 pixels at 20× magnification using the OpenSlide library in Python. Feature vectors were then extracted by feeding these 256 × 256 pixel patches into a modified ResNet50 model pretrained on ImageNet.
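The non-overlapping tiling can be sketched as a grid of patch origins. This is an illustration only: the helper `patch_origins` is hypothetical, tissue detection is omitted, and incomplete patches at the right and bottom edges are assumed to be discarded. With OpenSlide, each origin would typically be passed to `read_region` to read the corresponding patch.

```python
def patch_origins(width, height, patch_size=256):
    """Yield the top-left (x, y) of every full, non-overlapping patch.

    Assumption: partial patches at the right and bottom edges are dropped.
    """
    for y in range(0, height - patch_size + 1, patch_size):
        for x in range(0, width - patch_size + 1, patch_size):
            yield (x, y)

# Small hypothetical region (1024 x 600 pixels) instead of a full WSI
origins = list(patch_origins(1024, 600))
```

For this 1024 × 600 region the grid has four columns and two rows, so eight patches are produced and the bottom 88 rows of pixels are left out.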
2.4 Deep learning training
The 513 ccRCC pathology slides were randomly divided into a training set (80%) and a validation set (20%) for DL. A histopathological classification model for ccRCC was established, and its robustness was further assessed in an external validation set. Training was conducted using a 10-fold Monte Carlo cross-validation strategy. To validate the accuracy of the pathology model in identifying regions, we evaluated it using receiver operating characteristic (ROC) curves at the patch level.
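Monte Carlo cross-validation draws a fresh random split on every repetition rather than rotating through fixed folds. The sketch below assumes ten independent 80/20 splits at the case level; the function name `monte_carlo_splits`, the seed, and the use of integer case IDs are illustrative assumptions.

```python
import random

def monte_carlo_splits(case_ids, n_splits=10, train_fraction=0.8, seed=0):
    """Return n_splits independent random (train, validation) partitions."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    n_train = round(len(case_ids) * train_fraction)
    splits = []
    for _ in range(n_splits):
        shuffled = list(case_ids)
        rng.shuffle(shuffled)  # new random order for each repetition
        splits.append((shuffled[:n_train], shuffled[n_train:]))
    return splits

# 513 cases, identified here by hypothetical integer IDs
splits = monte_carlo_splits(range(513))
```

With 513 cases, each repetition yields 410 training and 103 validation cases. Unlike k-fold cross-validation, a given case may appear in the validation set of several repetitions or of none.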
2.5 Attention map generation
CLAM produces interpretable heatmaps, allowing an intuitive analysis of how each tissue region contributes to the model’s predictions in each WSI [17]. These heatmaps give pathologists insight into the histological and cytological features most closely linked to high predictive value. To account for the differing significance of regions in the pathological image for the model’s predictions, we calculated and saved the unstandardized attention scores for all patches extracted from the image, using the attention branch corresponding to the model’s predicted class. The attention score learned by CLAM for each patch was converted into a percentile, and the percentiles for each WSI were normalized to the range [0, 1], where 1 represented the most informative and 0 the least informative regions. The normalized scores were then converted into RGB colors using a colormap and overlaid on their corresponding spatial positions in the pathological image, highlighting areas of high attention in red and areas of low attention in blue.
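The conversion from raw attention scores to heatmap colors can be sketched as follows. This is not CLAM's own code: the rank-based percentile, the tie handling, and the linear blue-to-red ramp are simplifying assumptions, and the function names and scores are hypothetical.

```python
def attention_to_percentiles(scores):
    """Map raw attention scores to percentile ranks scaled to [0, 1].

    Assumption: ties are broken by patch order rather than averaged.
    """
    n = len(scores)
    if n == 1:
        return [1.0]
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    for rank, idx in enumerate(order):
        ranks[idx] = rank / (n - 1)  # lowest score -> 0, highest -> 1
    return ranks

def blue_to_red(p):
    """Simple linear stand-in for a colormap: 0 is blue, 1 is red."""
    return (round(255 * p), 0, round(255 * (1 - p)))

raw_scores = [0.02, 0.9, 0.4, -1.3, 0.4]  # hypothetical unnormalized scores
percentiles = attention_to_percentiles(raw_scores)
colors = [blue_to_red(p) for p in percentiles]
```

Because percentiles depend only on rank, the resulting colors are insensitive to the scale of the raw scores, which makes heatmaps comparable across slides.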
2.6 The least absolute shrinkage and selection operator (LASSO)
LASSO regression is an extension of linear regression that performs variable selection and regularization while fitting a generalized linear model. An L1 regularization term constrains the model coefficients, shrinking some to exactly zero and thereby achieving feature selection. The complexity of the LASSO model is controlled by the parameter λ, with larger values of λ retaining fewer features [18]. The R package ‘glmnet’ was used for LASSO analysis to select the most relevant DL features. The optimal value of the penalty parameter λ was determined through tenfold cross-validation.
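The effect of λ on the number of retained features can be illustrated in a special case. For a linear model with an orthonormal design, the LASSO solution is the soft-thresholded least-squares coefficient. The sketch below uses that special case only; it is not the algorithm used by glmnet, and the coefficient values are hypothetical.

```python
def soft_threshold(value, lam):
    """Shrink a coefficient toward zero by lam; set it to zero if |value| <= lam."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

# Hypothetical least-squares coefficients for five features
ols_coefficients = [2.5, -0.3, 0.8, -1.2, 0.05]

# Count how many features survive at each penalty strength
selected_counts = {
    lam: sum(1 for b in ols_coefficients if soft_threshold(b, lam) != 0.0)
    for lam in (0.1, 0.5, 1.0)
}
```

As λ increases from 0.1 to 0.5 to 1.0, the number of nonzero coefficients falls from four to three to two, which is the selection behavior described above.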
2.7 Identification and validation of histopathologic DL‑signature
We used LASSO-Cox modeling to construct a DL signature, and a DL signature-associated risk score was calculated for each patient by summing the products of each DL feature and its regression coefficient. This risk score was independently evaluated in an external validation set of ccRCC patients. Patients were categorized into high-risk and low-risk groups using the median risk score. Kaplan-Meier curves were drawn, and differences in survival outcomes between the groups were analyzed using the log-rank test. The prognostic efficacy of the histopathological DL signature was assessed by the area under the curve (AUC), derived from time-dependent ROC curves. Furthermore, the predictive power of the DL features was examined using multivariable Cox analysis that integrated clinical factors.
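The risk score and the median split can be sketched with a few lines of code. All feature values and coefficients below are hypothetical, and assigning patients with a score exactly at the median to the low-risk group is an assumption the text does not specify.

```python
from statistics import median

def risk_score(features, coefficients):
    """Linear predictor: sum of each feature value times its coefficient."""
    return sum(x * b for x, b in zip(features, coefficients))

coefficients = [0.8, -0.5, 0.3]  # hypothetical LASSO-Cox coefficients
patients = {                     # hypothetical DL feature values
    "P1": [1.0, 2.0, 0.5],
    "P2": [0.2, 0.1, 1.0],
    "P3": [2.0, 0.5, 0.0],
    "P4": [0.5, 1.5, 1.0],
}

scores = {pid: risk_score(f, coefficients) for pid, f in patients.items()}
cutoff = median(scores.values())
# Assumption: scores strictly above the median are labeled high risk
groups = {pid: "high" if s > cutoff else "low" for pid, s in scores.items()}
```

In this toy example P2 and P3 fall above the median and are labeled high risk, while P1 and P4 are labeled low risk.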
2.8 Statistical analysis
All analyses were performed with R (version 4.3.1) or Python (version 3.7.12). The Wilcoxon test was used to analyze differences between two groups. The Kaplan-Meier method was used to estimate overall survival, and the log-rank test was used to compare Kaplan-Meier curves. Statistical tests were considered significant at p < 0.05.
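For readers unfamiliar with the Kaplan-Meier method, the product-limit estimator can be written compactly. This is a didactic sketch with hypothetical follow-up data, not the software used in the study; in practice an established survival analysis package would be used.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    `events` holds 1 for an observed death and 0 for a censored patient.
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)  # still under observation
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - deaths / at_risk           # product-limit update
        curve.append((t, survival))
    return curve

# Hypothetical follow-up times (months) and event indicators
times = [5, 8, 8, 12, 15]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

Here the estimated survival probability drops to 0.8 at month 5, 0.6 at month 8, and 0.3 at month 12. Censored patients lower the number at risk at later times without causing a drop themselves.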