Acute pancreatitis (AP) is a common and potentially life-threatening gastrointestinal disease that places a substantial burden on healthcare systems worldwide. ICU readmissions among patients with AP remain frequent, especially in severe or recurrent cases, with rates exceeding 40%. Timely identification of patients at high risk for readmission is critical for guiding clinical decision-making and improving outcomes. In this study, we used the MIMIC-III database to identify ICU admissions for AP based on standardized diagnostic codes.
We implemented a structured preprocessing pipeline that included missing data imputation, correlation analysis, and hybrid feature selection. Specifically, we applied Recursive Feature Elimination with Cross-Validation (RFECV) and LASSO regression, supported by clinical expert review, to reduce an initial set of over 50 variables to 20 key predictors encompassing demographics, comorbidities, laboratory tests, and interventions. To address class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was incorporated within a stratified five-fold cross-validation framework to maintain balanced training and unbiased evaluation.
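The interaction between oversampling and cross-validation described above can be sketched in a few lines. The following is a minimal, standard-library illustration, not the authors' code: `stratified_folds`, `smote_like`, and `resampled_training_folds` are hypothetical helper names, the interpolation step is a simplified stand-in for SMOTE, and the two-feature toy data are invented. It assumes that synthetic minority samples are generated from the training portion of each fold only, so the held-out fold keeps its original class ratio.

```python
import random

def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds so each fold keeps a similar class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def smote_like(minority, n_new, rng, k_neighbors=3):
    """Simplified SMOTE-style step: interpolate between a minority sample
    and one of its nearest minority neighbours (assumption, not the library)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k_neighbors]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([a + gap * (b - a) for a, b in zip(base, mate)])
    return synthetic

def resampled_training_folds(X, y, k=5, seed=0):
    """Yield (X_train, y_train, test_idx); only the training part is oversampled.
    Label 1 is assumed to be the minority (readmitted) class."""
    rng = random.Random(seed)
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        minority = [x for x, lab in zip(X_train, y_train) if lab == 1]
        n_majority = len(y_train) - len(minority)
        extra = smote_like(minority, n_majority - len(minority), rng)
        yield X_train + extra, y_train + [1] * len(extra), test_idx

# Invented toy data: 40 majority-class and 10 minority-class samples.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(40)]
X += [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(10)]
y = [0] * 40 + [1] * 10

splits = list(resampled_training_folds(X, y))
```

In practice the same pattern is usually obtained by placing the oversampler inside a resampling-aware pipeline, so that it is refit on each training fold; whether the study did this with a particular library is not stated here.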
Six machine learning models—Logistic Regression, k-Nearest Neighbors, Naive Bayes, Random Forest, LightGBM, and XGBoost—were developed and optimized through grid search. Model performance was assessed using standard metrics including AUROC, accuracy, F1 score, sensitivity, specificity, Positive Predictive Value (PPV), and Negative Predictive Value (NPV). XGBoost achieved the best performance, with an AUROC of 0.862 (95% CI: 0.800–0.920) and accuracy of 0.889 (95% CI: 0.858–0.923) on the test set.
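For readers less familiar with the metrics listed above, the sketch below shows how each one follows from the confusion matrix, and how AUROC can be computed as the fraction of positive-negative pairs that the model ranks correctly. It is a standard-library illustration only: the function names, the 0.5 decision threshold, and the ten toy labels and scores are assumptions, not values from the study.

```python
import math

def safe_ratio(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else math.nan

def classification_metrics(y_true, y_pred):
    """Threshold-based metrics derived from the 2x2 confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": safe_ratio(tp + tn, len(pairs)),
        "sensitivity": safe_ratio(tp, tp + fn),   # recall for the positive class
        "specificity": safe_ratio(tn, tn + fp),
        "ppv": safe_ratio(tp, tp + fp),           # precision
        "npv": safe_ratio(tn, tn + fn),
        "f1": safe_ratio(2 * tp, 2 * tp + fp + fn),
    }

def auroc(y_true, scores):
    """AUROC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return safe_ratio(wins, len(pos) * len(neg))

# Invented toy example: 3 readmitted (1) and 7 not readmitted (0) patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = auroc(y_true, scores)
```

With an imbalanced outcome, accuracy can stay high even when sensitivity is modest, which is one reason to read the threshold-based metrics alongside AUROC.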
An ablation study demonstrated the importance of each selected feature, as removing any one led to a reduction in model performance. Furthermore, SHAP (SHapley Additive exPlanations) analysis was conducted to enhance interpretability. Platelet count, age, and peripheral oxygen saturation (SpO2) were identified as major contributors to readmission prediction. Overall, this study shows that ensemble learning, informed feature selection, and class imbalance handling can improve prediction of ICU readmission risk in patients with AP. These findings may support the development of more targeted post-discharge interventions to reduce preventable readmissions.
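The idea behind the SHAP attributions mentioned above can be illustrated with an exact Shapley computation on a tiny model. This is a hedged sketch, not the SHAP library and not the study's model: `shapley_values` is a hypothetical helper that enumerates every feature subset (feasible only for a handful of features), "absent" features are assumed to take a baseline value, and the three-feature scoring function is an invented toy.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline) to each feature.
    Features outside a coalition are replaced by their baseline value (assumption)."""
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(f):
    """Invented score with one interaction term; not a clinical model."""
    return 0.5 * f[0] + 0.2 * f[1] + 0.1 * f[0] * f[2]

x = [1.0, 2.0, 3.0]          # hypothetical patient
baseline = [0.0, 0.0, 0.0]   # hypothetical reference values

phi = shapley_values(toy_model, x, baseline)
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction between the first and third features is split equally between them. Libraries such as SHAP rely on model-specific algorithms or sampling rather than this full enumeration.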
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study did not receive any funding.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This study uses data from the Medical Information Mart for Intensive Care (MIMIC) database, which contains de-identified health-related data associated with patients admitted to the critical care units of the Beth Israel Deaconess Medical Center in Boston, Massachusetts. The use of this dataset was approved through completion of the required Collaborative Institutional Training Initiative (CITI) program and acceptance of the data use agreement. All data were fully de-identified in compliance with the Health Insurance Portability and Accountability Act (HIPAA) regulations, and no further institutional review board (IRB) approval was required.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
The data used in this study are available from the MIMIC-III database, which is accessible at https://physionet.org/content/mimiciii/1.4/ to credentialed researchers who complete the required training and data use agreement. All data preprocessing steps, model code, and analysis scripts used in this study are available from the corresponding author upon reasonable request.