Learnable prototype-guided multiple instance learning for detecting tertiary lymphoid structures in multi-cancer whole-slide pathological images

Tertiary lymphoid structures (TLS) are specialized immune tissues that are typically absent from certain organs under normal physiological conditions. However, they frequently emerge in a variety of pathological conditions, including pathogenic infections, autoimmune diseases, allograft rejection, and multiple types of cancer (Yu et al., 2024; Zhao et al., 2024). Structurally and functionally similar to secondary lymphoid organs, TLS provide a well-organized microenvironment that supports antigen presentation and lymphocyte activation (Bilotta et al., 2022; Xiao and Yu, 2021). In tumor tissues, the presence of TLS is closely associated with antitumor immune responses and correlates strongly with immune activity, therapeutic efficacy, and overall survival (He et al., 2020; Lynch et al., 2021; Tanaka et al., 2023; Nakamura et al., 2023). TLS are therefore regarded as significant biomarkers of favorable clinical prognosis (Wang et al., 2022b; Yu et al., 2023a). Accurately detecting the presence and characteristics of TLS is thus important for studying the tumor immune microenvironment, and can reveal potential therapeutic targets and inform clinical intervention (Schumacher and Thommen, 2022; Teillaud et al., 2024).

Despite the well-established importance of TLS in tumor immunology, their detection remains challenging (Aoyama et al., 2021; Zhao et al., 2024; Yu et al., 2024). Current methods predominantly rely on histopathological techniques such as immunohistochemistry (IHC) and multiplex immunofluorescence (mIF), which visualize TLS by identifying specific immune cell types and revealing their cellular composition and spatial organization (Peng et al., 2023; Chen et al., 2024b). Although these methods define TLS accurately and clearly, they are costly, operationally complex, and time-consuming, which makes them impractical for large-scale studies. Hematoxylin and eosin (H&E) staining, by contrast, is widely used in routine practice because it is inexpensive and efficient (Zhao et al., 2024; Zhang et al., 2024). However, its interpretation relies heavily on pathologists' expertise, particularly when TLS boundaries are indistinct, which leads to substantial inter-observer variability. This subjectivity reduces consistency across laboratories and complicates multi-institutional studies. In addition, the absence of standardized criteria for TLS size, morphology, and cellular composition hampers cross-institutional comparison and the reproducibility of research findings (Wang et al., 2022a; He et al., 2024). Finally, TLS in whole-slide images (WSIs) are typically sparse and irregularly distributed, occupying only a small fraction of the tissue. Combined with high background noise and complex tissue architecture, this sparsity makes automated detection of TLS exceedingly difficult, often beyond the capabilities of traditional approaches and manual assessment (Sautès-Fridman et al., 2019; Koksoy et al., 2024).

With the rapid development of digital pathology, deep learning methods have made remarkable progress in pathological image analysis (Janowczyk and Madabhushi, 2016; Jimenez-del-Toro et al., 2017; Deng et al., 2020; Kim et al., 2022). WSIs are ultra-high-resolution digitized tissue sections whose size and complexity render traditional image processing techniques inadequate. Deep learning-based WSI analysis, however, faces two major obstacles. First, the extraordinary resolution of WSIs (often tens of thousands of pixels in each dimension) makes it impractical to feed them directly into deep learning models, so they must be divided into smaller patches for processing. Second, the sheer data volume and complex content of each slide make exhaustive patch-level annotation impractical, so fine-grained labels are rarely available. Multiple instance learning (MIL) has emerged as a compelling solution to both problems. MIL treats each WSI as a bag of patch instances and learns from bag-level labels alone, without instance-level annotations, which makes it well suited to large-scale pathological image analysis (Pathak et al., 2015; Ilse et al., 2018; Carbonneau et al., 2018; Fatima et al., 2023).
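As a concrete illustration of the MIL formulation, the following minimal pure-Python sketch implements attention-based MIL pooling in the style of Ilse et al. (2018): each patch embedding receives an attention score, the slide embedding is the attention-weighted average of the patch embeddings, and a linear classifier maps it to a slide-level probability. This is a generic example, not LPGMIL; the parameters `V`, `w`, and `u` are fixed toy values standing in for learned weights, and the function names are hypothetical.

```python
import math

def matvec(M, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def softmax(scores):
    # Numerically stable softmax.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_mil_pool(bag, V, w):
    """Attention-based MIL pooling over one bag (one WSI).

    bag: list of d-dim patch embeddings
    V:   L x d matrix, w: L-dim vector (learned in practice; toy values here)
    Returns the slide embedding z and the per-patch attention weights a.
    """
    # Score each instance h_k as w^T tanh(V h_k).
    scores = [sum(wi * math.tanh(vi) for wi, vi in zip(w, matvec(V, h))) for h in bag]
    a = softmax(scores)
    # Slide embedding: attention-weighted average of the instance embeddings.
    z = [sum(a_k * h[j] for a_k, h in zip(a, bag)) for j in range(len(bag[0]))]
    return z, a

def predict_bag(z, u, b=0.0):
    # Linear classifier + sigmoid on the pooled embedding -> slide-level probability.
    logit = sum(ui * zi for ui, zi in zip(u, z)) + b
    return 1.0 / (1.0 + math.exp(-logit))

# Toy bag of three 2-D patch embeddings; in training only the slide label is known.
bag = [[0.1, 0.0], [0.9, 0.8], [0.2, 0.1]]
V = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, 1.0]
z, a = attention_mil_pool(bag, V, w)
prob = predict_bag(z, u=[1.0, 1.0])
```

Because the attention weights sum to one, they can also be inspected as a coarse heatmap of which patches drove the slide-level decision.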

MIL methods have demonstrated significant potential in tumor microenvironment analysis, but their application to TLS identification remains relatively underexplored. Because the histological appearance of TLS varies substantially across cancer types in whole-slide pathological images (see Fig. 1), conventional MIL methods trained on a single cohort face several challenges in detecting TLS in WSIs. 1) Morphological heterogeneity: TLS exhibit distinct structural variations across cancer types. For instance, in breast cancer TLS often appear as compact clusters of lymphocytes, whereas in lung cancer they tend to be more diffuse. MIL methods that rely solely on local morphological features therefore generalize poorly to TLS in other cancer types, which reduces detection accuracy and robustness. 2) Spatial heterogeneity: the spatial distribution of TLS can vary significantly across tumor types and patients; in lung cancer, for example, TLS tend to be distributed along bronchovascular structures. Such diverse spatial patterns greatly complicate classification tasks that depend exclusively on morphological features. 3) Variability in TLS biogenesis: the developmental stage and cellular composition of TLS differ across cancer types. Some cancers exhibit TLS with well-defined germinal centers, whereas in others TLS may lack this feature entirely. 4) Background complexity: the tumor microenvironment surrounding TLS varies across cancer types, making it difficult to distinguish TLS from different histological backgrounds. In gastric cancer, for instance, TLS must be identified within inflammatory regions rich in plasma cells, whereas in bladder cancer they are often embedded within the muscularis propria. These distinctive characteristics, together with clinical demands, pose substantial challenges for single-cohort MIL methods and limit their generalizability across cancer types.

To address these challenges, this paper proposes a weakly supervised method, learnable prototype-guided multiple instance learning (LPGMIL). Because TLS are sparse and highly heterogeneous, LPGMIL uses HoVer-Net (Graham et al., 2019) to select lymphocyte-dense instances from WSIs as candidate prototypes, and then applies hierarchical clustering to derive global prototypes from these candidates. Previous studies primarily used fixed global prototypes for modeling and inference; such prototypes lack flexibility and struggle to adapt to the complex backgrounds and diversity of TLS, which limits the model's generalization and adaptability (Yu et al., 2023b; Yan et al., 2024). Although some recent methods have introduced learnable prototypes (Rymarczyk et al., 2022; Yang et al., 2023; Liu et al., 2024), their interaction with WSI features still largely relies on simple gated attention mechanisms or on similarity measures that yield a single scalar score (e.g., cosine similarity or Euclidean distance), which cannot effectively capture the heterogeneity of TLS. More importantly, existing prototype-based MIL methods generally overlook the long-range dependencies between instances within a WSI, which are crucial for capturing the global structure of TLS and enhancing the model's discriminative power. To this end, LPGMIL integrates a state-space model and learnable prototype reasoning with a multi-branch masked multi-head attention mechanism to adaptively capture the heterogeneous features of TLS. The state-space model reconstructs global WSI features, while the learnable prototypes and masked attention extract discriminative local features. Together, these components improve the model's adaptability and robustness in TLS detection, an aspect that remains insufficiently explored in existing MIL-based methods.
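The prototype-construction step can be illustrated with a minimal pure-Python sketch. It assumes that each patch already has an embedding and a lymphocyte fraction (in LPGMIL these would be derived from a nucleus segmentation and classification model such as HoVer-Net; here they are toy numbers), and it uses average-linkage agglomerative clustering with Euclidean distance. The threshold, the linkage, the distance, and the function names are illustrative assumptions, not the paper's settings. The resulting centroids are plain vectors; in a learnable-prototype design they could serve as initial values that are then updated during training.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_vector(vectors):
    # Component-wise mean of a list of equal-length vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def select_candidates(features, lymph_fractions, threshold=0.5):
    # Hypothetical filter: keep patches whose lymphocyte fraction is >= threshold.
    return [f for f, p in zip(features, lymph_fractions) if p >= threshold]

def average_linkage_prototypes(candidates, k):
    # Agglomerative clustering with average linkage: repeatedly merge the pair
    # of clusters with the smallest mean pairwise distance until k remain.
    clusters = [[i] for i in range(len(candidates))]
    while len(clusters) > k:
        best = None
        for (i, ci), (j, cj) in combinations(enumerate(clusters), 2):
            dist = sum(euclidean(candidates[a], candidates[b])
                       for a in ci for b in cj) / (len(ci) * len(cj))
            if best is None or dist < best[0]:
                best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: combinations() guarantees j > i
    # Each global prototype is the centroid of one cluster.
    return [mean_vector([candidates[a] for a in c]) for c in clusters]

# Toy 2-D "patch embeddings" with per-patch lymphocyte fractions.
features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9], [9.0, 0.0]]
lymph_fractions = [0.8, 0.7, 0.9, 0.6, 0.1]
candidates = select_candidates(features, lymph_fractions, threshold=0.5)
prototypes = average_linkage_prototypes(candidates, k=2)
```

Average linkage is used here because it is less prone than single linkage to chaining distinct groups together; this naive loop is cubic in the number of candidates, so a real pipeline would rely on an optimized clustering library.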

The main contributions of this study are as follows:

• We propose a TLS-oriented MIL framework that builds learnable global prototypes from intrinsic TLS features, enabling tailored modeling of TLS features.

• We propose an adaptive TLS feature-capture strategy that integrates a state-space model with multi-branch masked multi-head attention to handle TLS heterogeneity and sparsity.

• Our method achieves competitive performance on both multi-cancer and single-cancer datasets, demonstrating its potential utility for multi-cancer TLS detection and its promise for future clinical applications.
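The adaptive feature-capture strategy replaces a single similarity score between prototypes and instances with masked multi-head attention. The rough sketch that follows illustrates that general idea and is not the LPGMIL architecture: each prototype acts as an attention query over a WSI's instances, a mask restricts it to its `top_k` most cosine-similar instances, and the feature dimension is split across `n_heads` heads. The top-k cosine mask, the absence of learned query/key/value projections, the omission of multiple branches, and all function names are illustrative assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na > 0 and nb > 0 else 0.0

def softmax_masked(logits, keep):
    # Softmax over kept positions only; masked positions get weight 0.
    top = max(logits[i] for i in keep)
    exps = [math.exp(logits[i] - top) if i in keep else 0.0 for i in range(len(logits))]
    total = sum(exps)
    return [e / total for e in exps]

def masked_prototype_attention(prototypes, instances, top_k, n_heads=2):
    """Prototypes are queries; instances are keys and values.

    Each prototype attends only to its top_k instances by cosine similarity,
    and scaled dot-product attention is computed per head on a feature chunk.
    Returns one d-dim summary vector per prototype and the per-head weights.
    """
    d = len(instances[0])
    assert d % n_heads == 0, "feature dim must divide evenly into heads"
    hd = d // n_heads
    outputs, all_weights = [], []
    for p in prototypes:
        sims = [cosine(p, h) for h in instances]
        ranked = sorted(range(len(instances)), key=lambda i: sims[i], reverse=True)
        keep = set(ranked[:top_k])  # the attention mask shared by all heads
        out, head_weights = [], []
        for s in range(0, d, hd):
            q = p[s:s + hd]
            logits = [dot(q, h[s:s + hd]) / math.sqrt(hd) for h in instances]
            attn = softmax_masked(logits, keep)
            # Weighted sum of this head's slice of the instance features.
            out.extend(sum(attn[i] * instances[i][j] for i in range(len(instances)))
                       for j in range(s, s + hd))
            head_weights.append(attn)
        outputs.append(out)
        all_weights.append(head_weights)
    return outputs, all_weights

protos = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
insts = [[0.9, 0.1, 0.8, 0.0], [0.0, 1.0, 0.1, 0.9],
         [1.0, 0.0, 0.9, 0.1], [0.5, 0.5, 0.5, 0.5]]
outputs, weights = masked_prototype_attention(protos, insts, top_k=2, n_heads=2)
```

Unlike a single cosine score, each prototype here yields a full feature vector summarizing the instances it matches, and the mask keeps a few relevant patches from being diluted by the large surrounding background.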
