Integrating WGCNA and machine learning to distinguish active pulmonary tuberculosis from latent tuberculosis infection based on neutrophil extracellular trap-related genes

Pulmonary tuberculosis (PTB), a chronic infectious disease caused by Mycobacterium tuberculosis (MTB), is transmitted primarily via airborne droplets [1]. Drug resistance in MTB is closely related to the unique lipid metabolism of its cell wall, and imbalances in lipid metabolism directly promote drug tolerance. In particular, the AccD3 gene, which encodes the β subunit of acetyl-CoA carboxylase (a core enzyme in malonyl-CoA biosynthesis), regulates cell wall integrity and has become a potential drug target [2]. The World Health Organization's 2024 report indicates that more than a quarter of the global population is infected with MTB (latent tuberculosis infection, LTBI), and that approximately 5% of those infected may progress to active tuberculosis within two years of infection, significantly increasing the risk of death and imposing a heavy economic burden on low- and middle-income countries [3,4]. However, existing diagnostic methods such as the tuberculin skin test (TST) and the interferon-gamma release assay (IGRA) reflect only the immune memory that follows MTB exposure. They lack specific markers that dynamically reflect lesion activity and the state of host-pathogen interaction, and therefore cannot reliably distinguish latent from active tuberculosis, two states that differ substantially in clinical significance, treatment, and management [[5], [6], [7], [8]]. Developing methods and biomarkers that accurately distinguish these two states is therefore crucial for interrupting transmission, optimizing treatment, and improving prognosis.

Neutrophils orchestrate adaptive immune responses and regulate chronic inflammation [9], and they have long been recognized as the predominant cell type infected by mycobacteria in patients with active PTB [10]. A similar strategy of identifying markers on the basis of immune regulatory mechanisms is widely used to stage other chronic infectious diseases, such as chronic hepatitis B (e.g., IL-1α, IL-2, and oxidative stress markers) [11]. Although neutrophils cannot directly kill mycobacteria, they may assist host defense by interacting with other cell types [12,13]. Neutrophils release neutrophil extracellular traps (NETs), structures composed of DNA, histones, and granule proteins that mediate immune responses by trapping pathogens [14]. Recent studies implicate NETs in the host response to TB. Schechter et al. [15] demonstrated elevated plasma levels of NETs, neutrophil elastase, and myeloperoxidase in patients with active TB, which correlated with disease severity. Another study found that mycobacteria induce MMP-8-containing NETs in vitro and that sputum from PTB patients contains increased NETs, with MMP-8 secretion linked to lung tissue damage [16]. García-Bengoa et al. [17] reported that the MTB proteins PE18, PPE26, and PE31 stimulate NET formation in human blood-derived neutrophils by elevating intracellular reactive oxygen species (ROS). Together, these findings indicate that NETs play an important role in the pathogenesis of PTB and suggest their potential as diagnostic markers. However, the diagnostic value of NET-related genes in PTB has not yet been established.

With the development of high-throughput technologies, omics data have become increasingly accessible. Network analysis grounded in systems biology is a powerful tool for mining such data and is therefore particularly important in bioinformatics research [18,19]. Weighted Gene Co-Expression Network Analysis (WGCNA) identifies modules of co-expressed genes and tests their associations with specific phenotypes [20]. Building on these modules, machine learning algorithms can further explore the relationships between genes and disease phenotypes and thereby reveal potential biological mechanisms. The Least Absolute Shrinkage and Selection Operator (LASSO), Recursive Feature Elimination (RFE), Random Forest (RF), and the Boruta algorithm are commonly used feature selection methods. LASSO regularizes high-dimensional data, reducing the risk of overfitting and improving prediction accuracy while simultaneously performing feature selection [21]. RFE recursively eliminates redundant features and retains the most explanatory feature subset [22]. RF, a non-linear model, evaluates feature importance and can identify genes that contribute substantially to classification [23]. Built on RF, the Boruta algorithm compares the original features with randomly permuted "shadow features" to select truly important variables, further reducing false positives and improving the stability of the selection [24]. Because these methods select features from different perspectives, they are complementary, and combining them makes the selection more robust. Previous research combined WGCNA with several of these methods (LASSO, RFE, RF, and Boruta) to identify apoptosis-related key genes in Alzheimer's disease and build corresponding diagnostic models [25]. Similarly, Wang et al. used WGCNA together with LASSO, RFE, and other methods to identify multiple immune infiltration-related diagnostic biomarkers for extrapulmonary tuberculosis [26]. These findings offer a useful framework for the molecular classification and precise diagnosis of complex diseases.
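
To make the network-construction step concrete, the following is a minimal NumPy sketch of the two core formulas behind an unsigned WGCNA network: the soft-thresholded adjacency a_ij = |cor(x_i, x_j)|^β and the topological overlap matrix TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 − a_ij), where l_ij = Σ_u a_iu·a_uj and k_i is the connectivity of gene i. It is not the API of the WGCNA R package; the function names, the default β = 6, and the input layout (samples in rows, genes in columns) are assumptions for illustration. Module detection (hierarchical clustering of 1 − TOM followed by tree cutting) is not shown.

```python
import numpy as np


def soft_threshold_adjacency(expr: np.ndarray, beta: float = 6.0) -> np.ndarray:
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta.

    expr: samples x genes matrix (layout assumed for this sketch).
    beta: soft-thresholding power (6.0 is a hypothetical default).
    """
    corr = np.corrcoef(expr, rowvar=False)  # genes x genes Pearson correlation
    adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 0.0)  # self-connections are excluded before computing TOM
    return adj


def topological_overlap(adj: np.ndarray) -> np.ndarray:
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    a = adj.copy()
    np.fill_diagonal(a, 0.0)
    # With a zero diagonal, (a @ a)_ij sums a_iu * a_uj over u != i, j.
    shared = a @ a
    k = a.sum(axis=1)  # connectivity of each gene
    min_k = np.minimum.outer(k, k)
    tom = (shared + a) / (min_k + 1.0 - a)
    np.fill_diagonal(tom, 1.0)
    return tom


# Hypothetical usage on random data standing in for an expression matrix.
rng = np.random.default_rng(0)
expr = rng.normal(size=(30, 8))  # 30 samples x 8 genes
tom = topological_overlap(soft_threshold_adjacency(expr, beta=6.0))
dissimilarity = 1.0 - tom  # input to hierarchical clustering into modules
```

In WGCNA practice, β is usually chosen as the smallest power at which the network approximates scale-free topology rather than fixed in advance, and genes with zero variance should be filtered first because they make the correlation undefined.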

This study combined WGCNA with multiple machine learning algorithms (LASSO, RFE, RF, and Boruta) to systematically screen for and identify NET-related characteristic genes, and constructed a miRNA regulatory network to explore their potential mechanisms. Using GSE39939 as the training set and GSE39940 as the validation set, this study identified, for the first time, a group of NET-related molecular markers with high diagnostic value. This strategy provides a potential diagnostic tool for accurately differentiating latent from active tuberculosis and opens a new direction for mechanistic research on key NET-related genes in PTB.
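
The multi-algorithm screening strategy can be sketched with scikit-learn as follows. This is a hedged illustration under several assumptions, not the study's pipeline: the data are synthetic (`make_classification`) rather than the GSE39939/GSE39940 expression profiles; a single `train_test_split` stands in for the two independent cohorts; LASSO is implemented as L1-penalized logistic regression; Boruta is replaced by a simplified single-pass shadow-feature comparison (the full algorithm, available for example in the third-party BorutaPy package, iterates and applies a statistical test); the consensus gene set is assumed to be the intersection of the four selections; and the helper names and hyperparameters (`C`, `k`, `n_estimators`) are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def lasso_select(X, y, C=0.1):
    """Genes with non-zero coefficients in an L1-penalized logistic regression."""
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(X, y)
    return set(np.flatnonzero(model.coef_[0]).tolist())


def rfe_select(X, y, k=10):
    """Genes retained by recursive feature elimination around a linear classifier."""
    rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=k).fit(X, y)
    return set(np.flatnonzero(rfe.support_).tolist())


def rf_select(X, y, k=10, seed=0):
    """Top-k genes ranked by random-forest impurity importance."""
    rf = RandomForestClassifier(n_estimators=300, random_state=seed).fit(X, y)
    return set(np.argsort(rf.feature_importances_)[::-1][:k].tolist())


def shadow_select(X, y, seed=0):
    """Simplified Boruta-style step: keep genes whose importance exceeds that of
    every column-wise permuted "shadow" copy (one pass, no statistical test)."""
    rng = np.random.default_rng(seed)
    shadows = np.column_stack([rng.permutation(col) for col in X.T])
    rf = RandomForestClassifier(n_estimators=300, random_state=seed)
    rf.fit(np.hstack([X, shadows]), y)
    imp = rf.feature_importances_
    n = X.shape[1]
    return set(np.flatnonzero(imp[:n] > imp[n:].max()).tolist())


# Synthetic stand-in: rows are samples, columns are module genes; 1 = active TB, 0 = LTBI.
X, y = make_classification(n_samples=120, n_features=30, n_informative=5,
                           n_redundant=0, shuffle=False, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)
scaler = StandardScaler().fit(X_train)  # fit on training data only
X_train_s, X_val_s = scaler.transform(X_train), scaler.transform(X_val)

selections = {
    "LASSO": lasso_select(X_train_s, y_train),
    "RFE": rfe_select(X_train_s, y_train),
    "RF": rf_select(X_train_s, y_train),
    "shadow": shadow_select(X_train_s, y_train),
}
consensus = sorted(set.intersection(*selections.values()))
print("consensus genes:", consensus)

if consensus:
    # Diagnostic model on the consensus genes, scored on the held-out set.
    clf = LogisticRegression(max_iter=1000).fit(X_train_s[:, consensus], y_train)
    auc = roc_auc_score(y_val, clf.predict_proba(X_val_s[:, consensus])[:, 1])
    print(f"validation AUC: {auc:.3f}")
```

With real data, the validation AUC should come from a cohort that played no part in feature selection or model fitting; selecting genes on pooled data before splitting would inflate the estimate.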

DIAGNOSTIC MICROBIOLOGY AND INFECTIOUS DISEASE
