Intensive Care Unit (ICU) Readmission Prediction

Unrecognized precursors of clinical deterioration can lead to preventable mortality and morbidity, in addition to the extra costs patients must bear. A large share of in-hospital mortality is associated with patients who require an unplanned transfer to the ICU; a few studies report mortality rates above 50% in these circumstances. Detecting clinical deterioration early helps quantify the risk associated with each patient and can therefore act as a warning system to prevent mortality in some cases.

Beyond reducing mortality, such automation could also help in campaign management and be valuable from an insurance perspective. Owing to these factors, automated identification of clinical deterioration has become highly pertinent. Several studies have addressed patient risk identification, and such models have frequently been used to predict mortality, length of stay, readmission, and related outcomes.

Readmission prediction matters because healthcare treatment in the US is expensive, and predicting ICU readmission well in advance improves both patient care and hospital operational efficiency. The Agency for Healthcare Research and Quality (AHRQ), after a dedicated study on readmission, concluded that the average cost of a readmission for any diagnosis in 2016 was approximately $14,400, a figure likely to rise significantly in the near future. Robust machine learning models applied to patients' EHR data can identify subtle patterns and potentially provide an early warning system for clinicians, letting them intervene before an ICU transfer or readmission is needed.

Objectives

  • Develop machine learning models that predict unplanned ICU readmission from EHR records.
  • Identify features that commonly indicate patient risk.



Data Source

The dataset used for this study was sourced from the MIMIC-III database, an open-source collection of patients' EHR data from six different ICUs, gathered over many years. The dataset has information on 61,532 ICU stays: 53,432 adults and 8,100 newborns. The data was prepared by merging the following tables into a single source of truth: Admissions, ICUStays, Diagnoses, and Procedures. The male:female gender ratio was 56:44, and the mean patient age was 61.96 ± 16.47 years. The variables present in the tables included Admission Type, Admission Location, Admit Time, Discharge Time, Marital Status, Ethnicity, Diagnosis, and Procedures.
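The table merge can be sketched with pandas as below. The column and table names follow MIMIC-III conventions, but the tiny inline rows are illustrative assumptions, not real MIMIC data; in practice each table would be loaded from its CSV export.

```python
import pandas as pd

# Tiny illustrative rows; real MIMIC-III tables are loaded from CSV exports.
admissions = pd.DataFrame({
    "SUBJECT_ID": [1, 1], "HADM_ID": [100, 101],
    "ADMISSION_TYPE": ["EMERGENCY", "EMERGENCY"],
})
icustays = pd.DataFrame({
    "SUBJECT_ID": [1, 1], "HADM_ID": [100, 101], "ICUSTAY_ID": [9000, 9001],
})
diagnoses = pd.DataFrame({
    "SUBJECT_ID": [1, 1], "HADM_ID": [100, 101], "ICD9_CODE": ["4280", "0389"],
})

# Left-join on the shared identifiers to build one flat record per ICU stay.
flat = (
    icustays
    .merge(admissions, on=["SUBJECT_ID", "HADM_ID"], how="left")
    .merge(diagnoses, on=["SUBJECT_ID", "HADM_ID"], how="left")
)
print(flat.shape)  # (2, 5)
```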

Labeling the Data

A lag variable of ICU admit time was created, and the time between consecutive admission dates was calculated. If the gap between two admissions for a patient was less than 30 days, the later stay was labeled as a readmission. Only adults were considered, and only unplanned emergency admissions were included in the readmission prediction.
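A minimal sketch of this labeling step, assuming pandas and illustrative column names (`SUBJECT_ID`, `ADMITTIME`, `READMIT_30D` are assumptions, not names stated in the post):

```python
import pandas as pd

# Illustrative admissions for two patients; dates are synthetic.
df = pd.DataFrame({
    "SUBJECT_ID": [1, 1, 2],
    "ADMITTIME": pd.to_datetime(["2130-01-01", "2130-01-20", "2131-05-05"]),
})
df = df.sort_values(["SUBJECT_ID", "ADMITTIME"])

# Lag the previous admit time within each patient, then flag stays
# that begin within 30 days of the prior admission.
prev_admit = df.groupby("SUBJECT_ID")["ADMITTIME"].shift(1)
gap_days = (df["ADMITTIME"] - prev_admit).dt.days
df["READMIT_30D"] = (gap_days < 30).astype(int)

print(df["READMIT_30D"].tolist())  # [0, 1, 0]
```

A first admission has no lag value, so its gap is NaN and it is labeled 0.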

Exploratory Data Analysis on the Source

The following distributions were examined (charts omitted here):

  • Output label
  • Length of stay
  • Top 20 primary diagnoses
  • Admission type
  • Admission location
  • Discharge location
  • Top 20 languages spoken by patients
  • Marital status
  • Top 20 diagnoses

Clinical Features Computation

Patients’ parameters were present in the chart events table, which records vital signs and other body measurements taken at regular intervals. These time-stamped measurements were used in the current study to determine the risk associated with each patient at various points in time. Clinical research papers report risk scores such as LACE (Length of Stay, Acuity of Admission, Charlson Comorbidity Score, Emergency Score), SIRS (Systemic Inflammatory Response Syndrome score), SOFA (Sequential Organ Failure Assessment), qSOFA (Quick Sequential Organ Failure Assessment), SAPS II (Simplified Acute Physiology Score II), and the Elixhauser score as significantly related to predicting mortality, length of stay, and readmission. The current study therefore used these scores to compute a risk score for each patient at different time intervals. These clinical features were computed for every patient’s ICU stay, the data was aggregated by Patient ID, Admission ID, and ICU Stay ID, and summary measures were derived for each clinical feature: min, max, mean, standard deviation, skewness, and the 5th and 95th percentiles.
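The per-stay aggregation can be expressed as a single pandas group-by. The data below is a synthetic stand-in (one SOFA series per stay, with assumed column names), but the aggregation set matches the measures listed above:

```python
import pandas as pd

# Synthetic time-stamped SOFA scores for two ICU stays; names are assumptions.
scores = pd.DataFrame({
    "ICUSTAY_ID": [9000] * 5 + [9001] * 5,
    "SOFA": [2, 3, 5, 4, 6, 1, 1, 2, 3, 2],
})

# Aggregate each clinical feature per stay: min, max, mean, standard
# deviation, skewness, and the 5th/95th percentiles.
agg = scores.groupby("ICUSTAY_ID")["SOFA"].agg(
    min="min", max="max", mean="mean", std="std", skew="skew",
    p05=lambda s: s.quantile(0.05),
    p95=lambda s: s.quantile(0.95),
)
print(agg.loc[9000, "mean"])  # 4.0
```

In the real pipeline the same aggregation would run over every risk score (LACE, SIRS, SOFA, qSOFA, SAPS II, Elixhauser) rather than SOFA alone.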


Model Building

The clinical features and patient demographics were merged into a flat file. The sample was split into training and testing sets at a 75%–25% ratio. Because the dependent variable had a Yes:No ratio of 6:94, the dataset was highly imbalanced; the training set was therefore rebalanced to an equal Yes:No ratio. A master model combining the clinical and non-clinical features was built using a random forest of 100 trees.
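A sketch of the split, balancing, and model fit with scikit-learn. The features here are synthetic placeholders for the real flat file, and the balancing method is an assumption (the post does not name one); random oversampling of the minority class is used for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the merged clinical + demographic flat file.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = (rng.random(2000) < 0.06).astype(int)  # roughly a 6:94 Yes:No ratio

# 75%-25% stratified train/test split, as described in the post.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Rebalance the training set to an equal Yes:No ratio by randomly
# oversampling the minority class (one possible choice, not confirmed).
pos = np.where(y_train == 1)[0]
neg = np.where(y_train == 0)[0]
pos_up = rng.choice(pos, size=len(neg), replace=True)
idx = np.concatenate([neg, pos_up])
X_bal, y_bal = X_train[idx], y_train[idx]

# Master model: a 100-tree random forest, evaluated on the untouched test set.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_bal, y_bal)
print(balanced_accuracy_score(y_test, clf.predict(X_test)))
```

Note that only the training set is rebalanced; the test set keeps the natural class ratio so the reported balanced accuracy reflects real-world prevalence.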

ML Results of the Master Model

Classification Report

AUC Curve

Feature Importance Plot


The sample used for the study was limited to adults’ unplanned ICU visits. The random forest model predicted ICU readmission with a balanced accuracy of 75%. Models were also built on each individual risk score, but those yielded lower accuracy. LACE, age, and the Elixhauser score appear more important than the other scores for this sample. It must be noted, however, that some measurements were missing from chart events and others were taken at irregular intervals, creating discrepancies in the scores computed at those times; this may partly explain the lower accuracy.
