Intelligent Prediction Platform for Sepsis Risk Based on Real-Time Dynamic Temporal Features: Design Study


Introduction

Sepsis, a life-threatening clinical syndrome triggered by infection, continues to impose a significant global burden due to its persistently high incidence and mortality rates. In the United States, approximately 750,000 individuals are diagnosed with sepsis annually, with a mortality rate exceeding 30%; Europe reports 150,000 sepsis-related deaths yearly [,]. The economic impact is staggering, with annual US health care costs exceeding US $20.3 billion []. Patients with sepsis experience hospital stays twice as long as those with other conditions, and the incidence of severe sepsis continues to rise by 13% annually []. Early detection is critical to reducing mortality; yet, current diagnostic methods lack accuracy and real-time capability []. The multifactorial nature of sepsis makes early diagnosis difficult, and the relatively low specificity of its diagnostic indicators makes misdiagnosis likely []. Timely initiation of protocolized treatment bundles significantly improves survival [], with evidence suggesting that a 1-hour resuscitation bundle may become the cornerstone of septic shock management []. However, sepsis pathophysiology is complex, microbiological confirmation is time-consuming, and existing scoring systems (eg, Mortality in Emergency Department Sepsis or Sequential Organ Failure Assessment) have limitations []. Thus, establishing a real-time sepsis prediction system is imperative to reduce clinical mortality.

Current research has leveraged artificial intelligence (AI) to develop sepsis prediction models using dynamic physiological and laboratory data. Machine learning has gained traction in critical care for disease diagnosis [], outcome prediction [-], and clinical decision support []. Recent AI models for sepsis prediction outperform traditional methods [-], but reliance on non–real-time laboratory data and the “black-box” nature of AI models hinder clinical adoption [,]. These models often fall short in real-time performance and interpretability, limiting their clinical utility. Intensive care unit (ICU) clinicians require transparent, medically logical AI tools to preserve decision-making autonomy, aligning with evidence-based principles. Current sepsis prediction models are also rarely integrated into practical platforms, which further restricts their use at the bedside.

This study addresses these gaps by developing machine learning models based on dynamic features from real-time, noninvasive physiological indicators. We use local and global interpretability methods to enhance clinical trust and establish a web-based sepsis prediction platform.


Methods

Data Source

Data were extracted from the MIMIC-IV (Medical Information Mart for Intensive Care IV) database, developed by the MIT (Massachusetts Institute of Technology) Laboratory for Computational Physiology. This deidentified database includes clinical and waveform data from ICU and emergency department patients at Beth Israel Deaconess Medical Center (2008-2019) [,]. The physiological indicators were obtained from patient monitors. Blood oxygen saturation (SpO2), heart rate (HR), and respiratory rate (RES) were collected 1-10 times per hour. Blood glucose (GLU) and body temperature (TEM) were measured once every 1 to 5 hours, depending on the patient’s condition. Other indicators were collected hourly.

Study Cohort and Variable Selection

Patients with sepsis were identified per the 2018 Chinese Guidelines for Sepsis or Septic Shock Management (positive blood culture + antibiotic use + Sequential Organ Failure Assessment score ≥2). A control cohort included ICU patients without sepsis. Among 1118 patients (550 with sepsis and 568 controls), 8 real-time physiological indicators were selected: HR, systolic blood pressure (SP), diastolic blood pressure (DP), mean arterial pressure (MP), RES, TEM, SpO2, and GLU. Stratified sampling divided the cohort into training (n=894) and test (n=224) groups. Baseline characteristics showed no significant differences between the 2 groups (Table 1).

Table 1. t test analysis of characteristics between training and test groups^a.

| Parameter | Training (n=894), median (IQR) | Test (n=224), median (IQR) | P value |
| --- | --- | --- | --- |
| Heart rate (bpm) | 86.1 (30.5-157) | 86.5 (43-137.5) | .81 |
| Systolic BP^b (mm Hg) | 119.6 (48-191) | 115.5 (36.8-187) | .08 |
| Diastolic BP (mm Hg) | 61.5 (10-116) | 61.9 (19-121) | .67 |
| Mean arterial pressure (mm Hg) | 77.8 (24.3-149) | 75.7 (8-127) | .08 |
| Respiratory rate (bpm) | 19.8 (5-44) | 19.6 (6-52) | .60 |
| Temperature (°C) | 36.8 (31.7-40.4) | 36.8 (33.1-39.9) | .74 |
| SpO2^c (%) | 96 (28.4-100) | 95.6 (23-100) | .42 |
| Blood glucose (mg/dL) | 140.6 (42-309) | 139.3 (63-326) | .75 |

^a P<.05 indicates statistical significance.

^b BP: blood pressure.

^c SpO2: blood oxygen saturation.
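The stratified training/test split of the cohort can be illustrated with a minimal sketch. The 80/20 fraction, the random seed, and the function name `stratified_split` are assumptions made for illustration; the text reports only the resulting group sizes, not the exact procedure.

```python
import random

def stratified_split(labels, train_fraction=0.8, seed=0):
    """Split sample indices so each class keeps the same train/test proportion."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = int(train_fraction * len(idx))  # per-class cut point (assumed 80%)
        train_idx += idx[:cut]
        test_idx += idx[cut:]
    return train_idx, test_idx

# Cohort sizes from the text: 550 patients with sepsis (1) and 568 controls (0).
labels = [1] * 550 + [0] * 568
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Under the assumed 80/20 fraction, the per-class cuts (440 of 550 and 454 of 568) add up to the reported group sizes of 894 and 224.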

Data Preprocessing

First, the data underwent outlier removal. Since part of the MIMIC-IV data was manually entered by health care providers, potential input errors or anomalies were addressed. Clinical knowledge was applied to define valid physiological ranges and filter absolute outliers (for instance, TEM: 20 °C-50 °C; SpO2: 21%-100%; HR: 0-300 bpm). Data points outside these ranges were deemed invalid and removed. Next, missing data imputation was performed to address gaps in the cleaned dataset. A hybrid approach combining multiple interpolation methods was applied:

1. Mean imputation: Replacing missing values with the overall mean of the feature.

2. Class-specific mean imputation: Using mean values from subgroups (eg, sepsis vs nonsepsis cohorts).

3. Linear interpolation: Filling gaps using linear trends between adjacent valid data points.

4. Forward-fill imputation: Propagating the last valid observation forward.

The entire preprocessing workflow is illustrated in Figure 1.

Figure 1. The entire preprocessing workflow.
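The outlier filter and the hybrid imputation described above can be sketched as follows. This is an illustration only: the text does not state the order in which the imputation methods were combined, so the order used here (linear interpolation, then forward fill, then a fallback mean) and all function names are assumptions. Only the three ranges quoted in the text are included.

```python
# Valid ranges quoted in the text; other indicators would need their own limits.
VALID_RANGES = {"TEM": (20.0, 50.0), "SpO2": (21.0, 100.0), "HR": (0.0, 300.0)}

def remove_outliers(values, low, high):
    """Replace readings outside [low, high] with None (treated as missing)."""
    return [v if v is not None and low <= v <= high else None for v in values]

def linear_interpolate(values):
    """Fill interior gaps on the straight line between adjacent valid points."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled

def forward_fill(values):
    """Propagate the last valid observation into later gaps."""
    filled, last = [], None
    for v in values:
        last = v if v is not None else last
        filled.append(last)
    return filled

def mean_fill(values, fallback_mean):
    """Fill anything still missing with an overall or class-specific mean."""
    return [fallback_mean if v is None else v for v in values]

def impute(values, fallback_mean):
    # Assumed order of the hybrid approach; the source does not specify it.
    return mean_fill(forward_fill(linear_interpolate(values)), fallback_mean)

hr_raw = [None, 88.0, 420.0, 94.0, None]   # 420 bpm is outside 0-300
hr_clean = remove_outliers(hr_raw, *VALID_RANGES["HR"])
hr_filled = impute(hr_clean, fallback_mean=86.0)  # 86.0 is a made-up cohort mean
print(hr_filled)
```

In this toy series the out-of-range reading is interpolated, the trailing gap is forward-filled, and the leading gap falls back to the mean because no earlier observation exists.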

To minimize reliance on long-term temporal dependencies, the prediction model was designed to operate on a 3-hour sliding window of real-time physiological data. This allows the model to initiate sepsis risk prediction immediately after 3 hours of monitoring. A 3-hour window allows sufficient time for sign monitoring and subsequent sepsis prediction, providing doctors with ample time for intervention. From the 3-hour time series of each physiological indicator, 3 linear parameters were computed: mean value over the 3-hour window, fluctuation coefficient (SD within the window), and endpoint value (the last recorded value in the window).

This generated a 24-dimensional feature vector for each patient, structured as follows:

Endpoint values: HR, SP, DP, MP, RES, TEM, SpO2, and GLU.

Mean values: mean-HR, mean-SP, mean-DP, mean-MP, mean-RES, mean-TEM, mean-SpO2, and mean-GLU.

Fluctuation coefficients (SDs): var-HR (variation in heart rate), var-SP (variation in systolic blood pressure), var-DP (variation in diastolic blood pressure), var-MP (variation in mean arterial pressure), var-RES (variation in respiratory rate), var-TEM (variation in body temperature), var-SpO2 (variation in blood oxygen saturation), and var-GLU (variation in blood glucose).
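A minimal sketch of how the 24-dimensional feature vector could be assembled from one 3-hour window is shown below. The readings are invented for illustration, and the use of the population SD (`pstdev`) is an assumption, since the text does not say whether the sample or population SD was used.

```python
from statistics import mean, pstdev

INDICATORS = ["HR", "SP", "DP", "MP", "RES", "TEM", "SpO2", "GLU"]

def feature_vector(window):
    """Turn {indicator: readings in the 3-hour window} into 24 named features."""
    endpoints = [window[n][-1] for n in INDICATORS]   # last value in the window
    means = [mean(window[n]) for n in INDICATORS]     # mean over the window
    sds = [pstdev(window[n]) for n in INDICATORS]     # fluctuation coefficient (SD)
    names = (INDICATORS
             + [f"mean-{n}" for n in INDICATORS]
             + [f"var-{n}" for n in INDICATORS])
    return names, endpoints + means + sds

# Hypothetical hourly readings for one patient (not taken from MIMIC-IV).
window = {
    "HR": [88.0, 90.0, 92.0], "SP": [118.0, 110.0, 102.0],
    "DP": [62.0, 60.0, 58.0], "MP": [80.0, 77.0, 73.0],
    "RES": [18.0, 20.0, 22.0], "TEM": [36.8, 37.4, 38.1],
    "SpO2": [97.0, 96.0, 94.0], "GLU": [140.0, 150.0, 163.0],
}
names, vector = feature_vector(window)
print(len(vector), dict(zip(names, vector))["var-HR"])
```

Keeping the names alongside the values makes it straightforward to drop the three glucose features for a vitals-only model.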

All features were standardized (z score normalization) to ensure consistent scaling before model training. This preprocessing pipeline ensures robust, clinically meaningful inputs for the subsequent machine learning workflow.
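The z score step can be sketched as below. Fitting the mean and SD on the training group only and reusing them for the test group is an assumption (a common way to avoid information leakage), not something the text states; the function names and numbers are illustrative.

```python
from statistics import mean, pstdev

def fit_scaler(rows):
    """Compute the per-feature mean and SD from the training rows."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    sds = [pstdev(c) or 1.0 for c in columns]  # constant column: divide by 1
    return means, sds

def transform(rows, means, sds):
    """Apply z = (x - mean) / SD feature by feature."""
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

train_rows = [[1.0, 10.0], [3.0, 10.0]]   # toy data with 2 features
test_rows = [[5.0, 12.0]]
means, sds = fit_scaler(train_rows)
train_z = transform(train_rows, means, sds)
test_z = transform(test_rows, means, sds)
print(train_z, test_z)
```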

Sepsis Prediction Model

We developed and evaluated the sepsis prediction models against the realities of vital sign monitoring in the ICU, so that they generalize to a broader range of clinical settings. Critically ill patients in the ICU differ in how often GLU needs to be monitored. Glucose metabolism disorders are common in critically ill patients, and their incidence rises progressively from sepsis to severe sepsis to septic shock []; these disorders not only reflect abnormal hormone secretion and disease severity but are also closely related to increased mortality and complications []. Abnormally high or low GLU is one of the causes of organ dysfunction. GLU therefore needs to be monitored more frequently in patients with insufficient energy intake, high catabolism, impaired GLU regulation, or ongoing insulin treatment, as well as in critically ill patients with diabetes or hypoglycemia. In most other ICU patients, by contrast, GLU is usually measured only once or twice a day.

Therefore, to improve the generalization of sepsis prediction models, we developed and evaluated 2 sepsis prediction models, one based on high-frequency glucose monitoring and the other based on routine vital signs monitoring without glucose. By combining the 2 models, a model with greater generality in a wider range of clinical settings was achieved.
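One way the 2 models could be combined is a simple routing rule: use the glucose-based model when glucose features are available for the window and fall back to the vitals-only model otherwise. The text does not describe the combination mechanism, so this rule, the function name `route_prediction`, and the placeholder models are all assumptions for illustration.

```python
GLUCOSE_KEYS = ("GLU", "mean-GLU", "var-GLU")

def route_prediction(features, glucose_model, vitals_model):
    """Pick a model based on whether glucose features exist for this window."""
    has_glucose = all(features.get(k) is not None for k in GLUCOSE_KEYS)
    if has_glucose:
        return "with-glucose", glucose_model(features)
    vitals_only = {k: v for k, v in features.items() if k not in GLUCOSE_KEYS}
    return "vitals-only", vitals_model(vitals_only)

# Placeholder models that return fixed risk scores; real models would be trained.
glucose_model = lambda feats: 0.8
vitals_model = lambda feats: 0.6

frequent = {"HR": 92.0, "GLU": 163.0, "mean-GLU": 151.0, "var-GLU": 9.4}
routine = {"HR": 92.0, "GLU": None, "mean-GLU": None, "var-GLU": None}
print(route_prediction(frequent, glucose_model, vitals_model))
print(route_prediction(routine, glucose_model, vitals_model))
```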

This study used 6 machine learning algorithms (support vector machine, random forest, ExtraTrees, XGBoost [Extreme Gradient Boosting], AdaBoost, and logistic regression) to construct real-time sepsis prediction models. Model performance was evaluated and compared using metrics including accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUROC).
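For reference, the evaluation metrics named above can be computed from predictions as in the following sketch. The labels, scores, and the 0.5 decision threshold are invented for illustration and are not results from this study; AUROC is computed here through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case).

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1-score for binary labels (1 = sepsis)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}

def auroc(y_true, scores):
    """Fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]        # hypothetical model outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
metrics = classification_metrics(y_true, y_pred)
auc = auroc(y_true, scores)
print(metrics, auc)
```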

Model Interpretation

The TreeSHAP (Tree-Based Shapley Additive Explanations) method was used to provide both local (individual prediction) and global explanations for the sepsis prediction model. TreeSHAP, grounded in cooperative game theory, constructs an additive explanation model that treats all features as “contributors” to the prediction outcome. For each sample, TreeSHAP generates a Shapley value for each feature, quantifying its marginal contribution to the model’s decision. For the sepsis prediction model with 24 features, let the original model be f, trained using AdaBoost. The explanation model g in SHAP (Shapley Additive Explanations) is defined as:

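$$g(x) = \phi_0 + \sum_{i=1}^{24} \phi_i$$

Here, $x=(x_1, x_2, \dots, x_{24})$ represents the feature vector of a single patient, $f(x)$ is the prediction from the original model, $g(x)$ is the prediction from the explanation model, $\phi_0 = E[f(x)]$ is the baseline value, and $\phi_i$ is the Shapley value for the $i$th feature:

$$\phi_i = \sum_{S \subseteq \{x_1,\dots,x_{24}\} \setminus \{x_i\}} \frac{|S|!\,(24-|S|-1)!}{24!}\,\bigl[f_x(S \cup \{x_i\}) - f_x(S)\bigr]$$

$S$ is a subset of the feature set $\{x_1, x_2, \dots, x_{24}\}$, with $2^{24}-1$ possible combinations. $|S|$ is the number of features in subset $S$. $f_x(S \cup \{x_i\})$ and $f_x(S)$ are model predictions with and without feature $i$, respectively. Global interpretation aggregates local explanations by averaging the absolute Shapley values across all samples.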
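The Shapley definition can be made concrete with a brute-force sketch on a toy 3-feature model. It is not TreeSHAP, which obtains the same kind of attribution efficiently for tree ensembles; here every subset is enumerated, and features outside the subset are replaced by a fixed baseline, which is a simplifying assumption. The toy model and all names are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every subset of the other features."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the patient's values; the rest stay at baseline.
        return model([x[j] if j in subset else baseline[j] for j in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(v):
    # One additive term plus one interaction term between features 1 and 2.
    return 2 * v[0] + v[1] * v[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi, sum(phi), toy_model(x) - toy_model(baseline))
```

The additive feature receives its full effect, the interaction is shared equally between the two interacting features, and the values sum to the difference between the prediction and the baseline prediction. With 24 features this enumeration would be far too slow, which is why a tree-specific algorithm is used in practice.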

Ethical Considerations

Our study was conducted in accordance with the guidelines of the Helsinki Declaration. The Review Committee of the Massachusetts Institute of Technology and Beth Israel Deaconess Medical Center approved access to the MIMIC-IV database. Authors fulfilled the database access request. All these data were deidentified; therefore, the study was exempt from ethical approval and informed consent requirements.


Results

Model Prediction Performance

The performances of the sepsis prediction models based on high-frequency glucose monitoring are summarized in Figure 2 and Table 2. Among all models, AdaBoost achieved the highest AUROC of 0.76 (95% CI 0.74-0.77) and the highest or joint-highest accuracy of 0.70 (95% CI 0.68-0.71), precision of 0.69 (95% CI 0.68-0.71), and F1-score of 0.69 (95% CI 0.67-0.70), demonstrating the best overall real-time prediction capability. The performances of the sepsis prediction models based on routine vital signs monitoring without glucose are summarized in Figure 3 and Table 3. Among these models, AdaBoost achieved the highest precision of 0.67 (95% CI 0.65-0.68) and AUROC of 0.75 (95% CI 0.74-0.77), with a joint-highest accuracy of 0.67 (95% CI 0.66-0.69). These results show that the model based on high-frequency GLU monitoring predicts better, whereas the model based on routine vital signs monitoring applies to a wider range of scenarios.

Figure 2. Receiver operating characteristic curves of machine learning models based on high-frequency glucose monitoring. AUC: area under the curve; ROC: receiver operating characteristic; SVM: support vector machine; XGBoost: Extreme Gradient Boosting.

Table 2. Models’ performance comparison of machine learning models based on high-frequency glucose monitoring.

| Models | Accuracy, median (IQR) | Precision, median (IQR) | Recall, median (IQR) | F1-score, median (IQR) | AUROC^a, median (IQR) |
| --- | --- | --- | --- | --- | --- |
| SVM^b | 0.64 (0.61-0.66) | 0.64 (0.62-0.67) | 0.60 (0.59-0.61) | 0.62 (0.60-0.65) | 0.70 (0.69-0.72) |
| Random forest | 0.68 (0.65-0.70) | 0.65 (0.62-0.67) | 0.70 (0.68-0.73) | 0.67 (0.65-0.69) | 0.75 (0.74-0.77) |
| ExtraTrees | 0.69 (0.67-0.71) | 0.69 (0.67-0.71) | 0.66 (0.65-0.68) | 0.67 (0.65-0.70) | 0.75 (0.72-0.78) |
| XGBoost^c | 0.70 (0.68-0.72) | 0.68 (0.68-0.69) | 0.71 (0.70-0.72) | 0.69 (0.68-0.69) | 0.74 (0.73-0.75) |
| AdaBoost | 0.70 (0.68-0.71) | 0.69 (0.68-0.71) | 0.67 (0.66-0.69) | 0.69 (0.67-0.70) | 0.76 (0.74-0.77) |
| Logistic | 0.63 (0.62-0.63) | 0.62 (0.61-0.62) | 0.62 (0.61-0.63) | 0.62 (0.62-0.63) | 0.69 (0.67-0.70) |

^a AUROC: area under the receiver operating characteristic curve.

^b SVM: support vector machine.

^c XGBoost: Extreme Gradient Boosting.

Figure 3. Receiver operating characteristic curves of machine learning models based on routine vital signs monitoring without glucose. AUC: area under the curve; ROC: receiver operating characteristic; SVM: support vector machine; XGBoost: Extreme Gradient Boosting.

Table 3. Models’ performance comparison of machine learning models based on routine vital signs monitoring without glucose.

| Models | Accuracy, median (IQR) | Precision, median (IQR) | Recall, median (IQR) | F1-score, median (IQR) | AUROC^a, median (IQR) |
| --- | --- | --- | --- | --- | --- |
| SVM^b | 0.61 (0.59-0.64) | 0.61 (0.59-0.63) | 0.57 (0.55-0.60) | 0.59 (0.57-0.61) | 0.69 (0.67-0.71) |
| Random forest | 0.66 (0.65-0.68) | 0.65 (0.62-0.67) | 0.68 (0.66-0.71) | 0.66 (0.64-0.69) | 0.73 (0.72-0.74) |
| ExtraTrees | 0.67 (0.65-0.70) | 0.65 (0.63-0.67) | 0.66 (0.65-0.68) | 0.66 (0.65-0.67) | 0.74 (0.72-0.76) |
| XGBoost^c | 0.67 (0.66-0.69) | 0.65 (0.63-0.67) | 0.73 (0.71-0.74) | 0.68 (0.66-0.70) | 0.72 (0.70-0.73) |
| AdaBoost | 0.67 (0.66-0.69) | 0.67 (0.65-0.68) | 0.66 (0.64-0.68) | 0.66 (0.65-0.67) | 0.75 (0.74-0.77) |
| Logistic | 0.64 (0.63-0.65) | 0.63 (0.62-0.64) | 0.63 (0.61-0.64) | 0.63 (0.62-0.63) | 0.69 (0.67-0.70) |

^a AUROC: area under the receiver operating characteristic curve.

^b SVM: support vector machine.

^c XGBoost: Extreme Gradient Boosting.

Interpretability Analysis Using TreeSHAP

Individual Prediction Explanation

The TreeSHAP method was used to generate local explanations for individual cases to elucidate the contribution of dynamic features to model predictions. Figure 4 illustrates the interpretability analysis for a patient predicted as having sepsis, where features are color-coded to reflect their impact: red denotes a positive contribution (increasing sepsis risk) and blue indicates a negative contribution (reducing sepsis risk). The baseline value E[f(x)], representing the model’s average prediction across the dataset, serves as the reference point. Key findings include: SP fluctuation coefficient (var-SP=12.257) exerted the strongest positive influence (+1.18), followed by SpO2 fluctuation (var-SpO2=2.867), mean RES (mean-RES=3.667), mean DP (mean-DP=68.667), TEM (TEM=35.833), and MP fluctuation (var-MP=6.164). Negative contributors included SP (SP=15), RES (RES=11), and DP (DP=75). These results broadly align with clinical intuition, where elevated variability in hemodynamic parameters (eg, var-SP or var-MP), hypothermia, and reduced RES correlate with sepsis pathophysiology. The interpretability framework enables clinicians to validate model logic against established diagnostic criteria and identify anomalous predictions.

Figure 4. Individual prediction explanation. DP: diastolic blood pressure; MP: mean arterial pressure; RES: respiratory rate; SP: systolic blood pressure; SpO2: blood oxygen saturation; TEM: body temperature.

Global Interpretation

Global interpretability analysis aggregated Shapley values across all samples to reveal the model’s overarching decision logic and feature importance rankings (Figure 5). As illustrated in Figure 5A, the model demonstrated associations between decreased SP and elevated sepsis risk (row 9), increased SP fluctuation (var-SP) and elevated sepsis risk (row 6), and increased MP fluctuation (var-MP) and elevated sepsis risk (row 2). These interpretations align closely with existing clinical evidence and provide clinically meaningful insights that are of particular interest to clinicians. For instance, increased var-SP is more likely to elevate the probability of sepsis compared to increased var-MP. The Surviving Sepsis Campaign 2021 guidelines emphasize the importance of SP in the diagnostic criteria for sepsis and recommend using SP<90 mm Hg after adequate fluid resuscitation as one of the indicators for evaluating septic shock []. Additionally, studies have shown that blood pressure variability in sepsis patients correlates with disease severity [].

Figure 5. Global interpretation of the model and feature importance based on mean Shapley values. DP: diastolic blood pressure; GLU: blood glucose; HR: heart rate; MP: mean arterial pressure; RES: respiratory rate; SHAP: Shapley Additive Explanations; SP: systolic blood pressure; SpO2: blood oxygen saturation; TEM: body temperature.

The connection between the parasympathetic nervous system and inflammation suggests interdependence between autonomic nervous system function and inflammatory responses []. Furthermore, physiological systems exhibit nonlinear patterns of complexity, including fractal self-similarity across time scales []. Previous research has documented an association between the complexity of autonomic nervous system control and hemodynamic instability, indicating that such complexity may serve as a potential window into understanding hemodynamic confusion. However, the relevance of these measures of complexity in sepsis remains unclear [].

High temperature fluctuations (var-TEM) were strongly associated with increased sepsis probability, while stable temperatures reduced the risk (row 1). Notably, extreme TEMs, whether hyperthermia or hypothermia, were linked to elevated risks, whereas normal-range temperatures exhibited protective effects (next-to-last row). This interpretation of the model aligns with the clinical presentation of sepsis.

Both hypothermia and hyperthermia are generally associated with elevated lactate levels, and patients with severe sepsis often develop either hypothermia or, more commonly, a febrile response. Hypothermia in some patients with sepsis is well-documented and forms part of the definition of systemic inflammatory response syndrome []. There is variable thermoregulatory response in sepsis, and the definition of systemic inflammatory response syndrome in sepsis-1 includes both fever and hypothermia. The impact of thermoregulatory response on sepsis prognosis remains controversial. Studies have shown that hypothermia or fever can have either protective or detrimental effects in animal models of severe infection or inflammation []. Fever is a physiological response to infection that inhibits bacterial growth, prevents fungal proliferation, and enhances immune cell activity against pathogens []. Pathogens detected in blood cultures and elevated procalcitonin levels, both associated with high fever, indicate robust immune resistance to pathogen challenges [].

The model revealed an association between increased variability in oxygen saturation (var-SpO2) and elevated sepsis risk (row 5). This finding underscores the necessity of maintaining SpO2 levels within a reasonable range for ICU patients during hospitalization, which is consistent with existing clinical evidence. Studies have demonstrated a U-shaped relationship between SpO2 levels and in-hospital all-cause mortality in patients with sepsis, where both hyperoxia and hypoxia are associated with increased mortality risk. The optimal SpO2 range was reported to be 96%-98% []. Under normal physiological conditions, oxygen supply and consumption remain relatively stable, with SpO2 fluctuating within a normal range without extreme variation. During shock, however, the imbalance between oxygen supply and consumption leads to deviations in SpO2 values, resulting in increased var-SpO2. A multicenter observational study involving over 600 patients with sepsis confirmed that abnormal SpO2 levels (either abnormally high or low) were associated with increased mortality [].

Figure 5B ranks the features by mean absolute Shapley value, identifying the top 6 determinants of sepsis risk: temperature fluctuation (var-TEM), MP fluctuation (var-MP), mean HR (mean-HR), mean RES (mean-RES), SpO2 fluctuation (var-SpO2), and SP fluctuation (var-SP). These findings underscore the critical role of hemodynamic and thermoregulatory instability in sepsis onset, consistent with clinical biomarkers. The global analysis also highlights potential outliers, guiding model refinement and clinical vigilance.
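The ranking by mean absolute Shapley value amounts to a simple aggregation over per-patient explanations, sketched below. The Shapley values in the example are invented for illustration and are not outputs of the study's model; the function name is hypothetical.

```python
def global_importance(shap_matrix, feature_names):
    """Rank features by mean |Shapley value| across all samples (rows)."""
    n_samples = len(shap_matrix)
    scores = {
        name: sum(abs(row[j]) for row in shap_matrix) / n_samples
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

feature_names = ["mean-HR", "var-TEM", "var-MP"]
# One row of local Shapley values per patient (made-up numbers).
shap_matrix = [
    [0.1, 0.5, -0.2],
    [-0.1, -0.7, 0.4],
]
ranking = global_importance(shap_matrix, feature_names)
print(ranking)
```

Taking absolute values before averaging matters: a feature that pushes risk up for some patients and down for others would otherwise cancel out and appear unimportant.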

Implementation of the Practical and Efficient Sepsis Risk Prediction Platform

We developed a practical and intelligent sepsis risk prediction platform using a web-based tool [] (Figure 6). By monitoring 3-hour dynamic temporal features of ICU patients, including HR, SP, DP, MP, RES, TEM, SpO2, and GLU, the platform automatically generates a graphical analysis report comprising:

Statistical summary: quantitative analysis of dynamic temporal features (mean, SD, and endpoint values).

Vital signs trend analysis: visualization of physiological trends via time-series curves.

Risk assessment: sepsis risk prediction using the trained model, based on calculated linear parameters.

Clinical actions: automated recommendations (for reference only) tailored to risk stratification.

Prediction explanation: visual display of key influencing factors using Shapley values to interpret model decisions.

Figure 6. Practical and efficient sepsis risk prediction platform. BP: blood pressure.

This platform, built on real-time dynamic temporal features, enables personalized sepsis risk prediction for ICU patients, enhances clinical utility by integrating interpretable AI insights, and streamlines decision-making through intuitive data visualization and actionable outputs.


Discussion

Principal Findings

Our study developed a real-time sepsis prediction model that combines timeliness with clinical interpretability, based on 3-hour dynamic temporal sequences of 8 rapidly accessible physiological indicators. The best-performing model reached an AUROC of 0.76 with high-frequency glucose monitoring and 0.75 with routine vital signs alone. Explainable artificial intelligence (XAI) enhanced model transparency through both individual and global explanations and linked the model outputs to their potential physiological or pathophysiological significance, including the patient’s hemodynamics, thermoregulatory response, and the balance between oxygen delivery and oxygen consumption. Although current consensus emphasizes early intervention for sepsis management, the optimal predictors guiding intervention and markers of early sepsis severity remain unclear. In this study, we further elucidated the model’s output results, explored the potential relationships between relevant sign parameters and sepsis, and discussed their potential physiological or pathophysiological significance. This enhances the interpretability and credibility of the XAI method, supporting the model’s applicability in real-world clinical practice. Finally, the web-based platform enhanced clinical utility by providing real-time risk assessment, statistical summaries, trend analysis, and actionable insights.

Limitations

As a retrospective study, this work may be subject to bias. Future efforts should prioritize multicenter validation and large-scale prospective studies to strengthen the robustness of these results. XAI holds immense promise in sepsis diagnosis and treatment, yet its development and clinical application face significant challenges []. Data quality remains a critical bottleneck. Heterogeneous hospital databases, inconsistent data collection and storage standards, and poor interoperability between health care information systems have led to fragmented “data silos,” hindering the application of large-scale clinical feature datasets. Additionally, the reliance on limited public or institution-specific databases in research settings restricts generalizability. In clinical practice, clinicians must integrate XAI predictions with laboratory findings (eg, lactate or procalcitonin), imaging results, and patient-specific symptoms to ensure accurate diagnosis and treatment planning.

Conclusions

In ICU settings, real-time physiological indicators—such as HR, RES, and SpO2—are typically used to monitor symptom fluctuations rather than generate objective diagnostic reports. This limitation stems from the significant variability in individual physiological data, the complexity of multiparameter interactions, and the diverse clinical implications of these metrics. This study demonstrates that XAI can bridge this gap by synthesizing real-time physiological data into actionable insights. By analyzing dynamic trends and providing interpretable explanations, XAI uncovers the diagnostic potential of these data. Our model, focused on sepsis risk prediction, leverages real-time physiological features to generate predictions while emphasizing the interpretability of results. In the future, XAI systems could deliver intelligent diagnostic reports integrating disease prediction, anomaly detection, and causal analysis of abnormal indicators, empowering clinicians to navigate complex physiological data with precision.

Acknowledgments

This work was supported by Shanghai Engineering Technology Research Center (18DZ2250900).

Data Availability

The data used in this study are publicly available from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database. As a result of patient privacy concerns, the raw electronic health record (EHR) text data cannot be shared publicly but can be accessed through a compliant application to MIMIC-IV.

Authors’ Contributions

M Zhang conceptualized the study and carried out the formal analysis, investigation, and writing and editing of the original draft. M Zhong conceptualized the study and carried out the formal analysis, investigation, and validation. YC carried out the formal analysis, investigation, and validation. TZ assisted with the conceptualization, investigation, analysis, project administration, and paper review and editing.

Conflicts of Interest

None declared.

Edited by M Focsa; submitted 25.03.25; peer-reviewed by E Kawamoto, Z Peng; comments to author 25.04.25; revised version received 15.05.25; accepted 17.05.25; published 30.05.25.

©Mingwei Zhang, Ming Zhong, Yunzhang Cheng, Tianyi Zhang. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 30.05.2025.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
