Currently submitted to: JMIR Formative Research

Date Submitted: Apr 1, 2025
Open Peer Review Period: Apr 1, 2025 - May 27, 2025
(currently open for review)

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Machine Learning-Augmented Surveillance of Surgical Site Infections

  • Ugur Celik
  • Feifan Liu
  • Kimiyoshi J Kobayashi
  • Richard T Ellison 3rd
  • Yurima Guilarte-Walker
  • Deborah A Mack
  • Qiming Shi
  • Adrian Zai

ABSTRACT

Background:

Surgical site infection (SSI) is one of the most common healthcare-associated infections (HAIs), accounting for nearly 20% of all HAIs in hospitalized patients. SSIs contribute to increased hospital length of stay, readmission, and healthcare costs, and are associated with a mortality rate twice that of non-infected patients.

Objective:

To develop machine learning models to predict potential SSI cases following colon surgeries, thereby improving the efficiency of SSI surveillance and enabling healthcare professionals to better prioritize patient care.

Methods:

Data were extracted from the Epic electronic health record system at a single academic center for patients who underwent colon surgery between 2018 and 2023. The dataset included structured features such as demographics, medications, and vital signs, as well as unstructured clinical notes processed using natural language processing techniques. Logistic Regression, Random Forest, and XGBoost models were trained to predict SSI risk. Cost-sensitive learning and the Synthetic Minority Over-sampling Technique (SMOTE) were applied to handle the imbalanced dataset. Models were evaluated using accuracy, precision, recall, and the Area Under the Receiver Operating Characteristic curve (AUC-ROC).
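As an illustration of the two imbalance-handling techniques named above, the following minimal sketch shows an inverse-frequency class weight and SMOTE-style interpolation in plain Python. It is not the authors' pipeline: the function names, the toy two-feature points, and k=2 are assumptions made for illustration. The class counts (66 SSI cases among 1508 patients) come from the Results, but the abstract does not state which weighting scheme was actually used.

```python
import math
import random


def positive_class_weight(n_negative, n_positive):
    """Inverse-frequency weight for the minority class (a common heuristic)."""
    return n_negative / n_positive


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE's core idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Class counts from the cohort in this abstract: 1508 patients, 66 with SSI.
weight = positive_class_weight(1508 - 66, 66)

# Hypothetical two-feature minority samples, for illustration only.
toy_minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like(toy_minority, n_new=5)

print(f"positive-class weight = {weight:.2f}")
print(new_points)
```

Oversampling of this kind is normally applied to the training split only, so that the evaluation data keep the original class balance.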

Results:

From a cohort of 1508 patients, 66 (4.4%) developed SSIs. The XGBoost model demonstrated the best overall performance, with a precision of 50%, a recall of 38%, and an AUC-ROC of 0.788. The Random Forest model achieved the highest precision (100%) but a lower recall (23%), while Logistic Regression showed a higher recall (46%) but a lower precision (10%). Patients who developed SSIs were significantly older (mean age 61.1 vs. 58.5 years), had higher American Society of Anesthesiologists (ASA) scores (78.8% vs. 45.3% with an ASA score of 3), more frequently had contaminated wounds (40.9% vs. 27.5%), and more commonly received steroids (83.3% vs. 58.7%).
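One way to compare the three precision/recall trade-offs on a single scale is the F1 score, the harmonic mean of precision and recall. The abstract does not report F1; the short sketch below derives it from the precision and recall figures given above, and the variable names are chosen here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in this abstract.
reported = {
    "XGBoost": (0.50, 0.38),
    "Random Forest": (1.00, 0.23),
    "Logistic Regression": (0.10, 0.46),
}

f1_by_model = {name: f1_score(p, r) for name, (p, r) in reported.items()}
best_model = max(f1_by_model, key=f1_by_model.get)
prevalence = 66 / 1508  # SSI cases / cohort size

for name, f1 in f1_by_model.items():
    print(f"{name}: F1 = {f1:.3f}")
print(f"Highest F1: {best_model}")
print(f"SSI prevalence = {prevalence:.1%}")
```

Under this derived measure, XGBoost scores highest (about 0.43, against about 0.37 for Random Forest and about 0.16 for Logistic Regression), which is consistent with the description of XGBoost as the best overall performer.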

Conclusions:

Machine learning has shown promising results in enhancing the efficiency of SSI surveillance by enabling healthcare professionals to prioritize high-risk patients. While the models demonstrated strong overall accuracy, challenges related to imbalanced datasets remain. The XGBoost model provided the best balance between precision and recall, making it potentially the most clinically useful approach for SSI surveillance augmentation.


 Citation

Please cite as:

Celik U, Liu F, Kobayashi KJ, Ellison RT 3rd, Guilarte-Walker Y, Mack DA, Shi Q, Zai A

Machine Learning-Augmented Surveillance of Surgical Site Infections

JMIR Preprints. 01/04/2025:75121

DOI: 10.2196/preprints.75121

URL: https://preprints.jmir.org/preprint/75121


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.