2020 Rbme Fs

Machine Learning for Clinical Outcome Prediction

Farah Shamout, Tingting Zhu, and David A. Clifton

Abstract: Clinical decision-making in healthcare is already being influenced by predictions or recommendations made by data-driven machines. Numerous machine learning applications have appeared in the latest clinical literature, especially for outcome prediction models, with outcomes ranging from mortality and cardiac arrest to acute kidney injury and arrhythmia. In this review article, we summarize the state-of-the-art in related works covering data processing, inference, and model evaluation, in the context of outcome prediction models developed using data extracted from electronic health records. We also discuss limitations of prominent modeling assumptions and highlight opportunities for future research.

Recent artificial intelligence (AI) developments seek to positively impact medicine and clinical practice [1]. Machine learning (ML), an application of AI, recognizes patterns within large quantities of medical data to make future predictions, ranging from natural language processing to computer vision applications [2], [3]. Several ML frameworks have been proposed to predict clinical outcomes within a certain time period in the future, such as cardiac arrest, mortality, or intensive care unit (ICU) admission [4], [5], [6], [7].

In general, designing an ML system involves a multi-disciplinary effort that extends from data engineering to training and evaluating a predictive model. We consider the general model as a mapping of an input to an output:

$$f : X \rightarrow y \qquad (1)$$

where f(.) is a function consisting of parameters Θ, X is the input and y is the output. For example, X can consist of vital signs measurements of the patient, such as heart rate, blood pressure, and respiratory rate, and y can represent a binary label indicating the occurrence of ICU transfer or cardiac arrest during the patient’s hospital stay [7].

Fig. 1 depicts the typical pipeline of an ML application, starting from the input X, and ending with the corresponding output represented by y. The first task learns to extract intermediary features (Section III) while the second task learns from patterns in the data to produce the predicted label (Section IV). Such models are usually assessed based on clinical utility and interpretability (Section V).

Fig. 1. General ML pipeline that maps an input to a label. The two main steps of the pipeline are (i) extraction of an intermediary feature space and (ii) label prediction using a classification or clustering algorithm.

As we discuss related works throughout this review, we also provide an intuitive explanation of the ML techniques used for feature extraction or predictive inference. In general, ‘learning’ how to map the input to the output involves approximating the parameters of the model f(.), a loss function L(y, ŷ|Θ), and an optimization algorithm. The loss function L(y, ŷ|Θ), also known as the cost function, measures the dissimilarity between the true labels y and the values ŷ predicted by the approximated model (e.g., mean square error, binary cross-entropy, etc.). An optimization algorithm, such as gradient descent [8], minimizes L(y, ŷ|Θ) in an iterative manner based on the examples present in the dataset.

I. Clinical Context & Frameworks of Outcome Prediction Models

Care pathways within hospitals vary widely due to the diversity of admitted patients. Thus, an understanding of the clinical context is key for developing machine learning models that can be incorporated within existing medical processes. As shown in Fig. 2, a patient may be hospitalized as an emergency or elective admission, where the latter constitutes a routine procedure. During hospitalization, different types of data are routinely collected from the patient for monitoring purposes.

Patient monitoring tools, such as early warning systems [9], are widespread across different hospital wards to continuously assess for patient deterioration. The definition of what exactly constitutes clinical deterioration has evolved over time based on the data collection and processing techniques. Early attempts to define clinical deterioration focused on medical neglect and its end result of clinical complications [10]. Subsequent studies focused on more discrete clinical events, such as severe sepsis, unexpected cardiac arrest, ICU admission or mortality [11], [12], and tend to select one or more end-point measures of clinical deterioration. Such events incur high costs of prolonged hospital stays, litigation, staff time, impact on patients and staff, and broader economic consequences [13]. The latter definition is the most popular one, as it enables researchers to group patients into discrete classes, such as deteriorating (i.e., those who experience an outcome) and non-deteriorating (i.e., those who are discharged without experiencing any outcomes), and as such infer the y labels.

The framework of outcome prediction models also varies across the literature. Some studies predict the risk of an outcome only once using the patient’s first N hours of data after admission, such as 24 or 48 hours [14]. Others choose to predict the risk of an outcome, such as ICU readmission, using the patient’s last N hours of data prior to discharge. Another common methodology is to develop a real-time alerting score, which computes the risk of deterioration every time a set of clinical

observations is collected [15], as in clinical early warning systems [16].

Fig. 2. Visualization of the patient flow in a hospital: Patient is either admitted as an elective or emergency admission, monitored in ward stay(s) during consultant episode(s). Patient may transfer from one ward to another, or may change the consultant during the in-hospital stay. * Accident & Emergency patients may be admitted as inpatients or just discharged.

II. Electronic Health Records

Various types of data can be used to develop outcome prediction models, such as imaging, speech, or claims data [17]. Here, we focus on data extracted from electronic health records (EHR), which are being increasingly deployed in hospitals worldwide. EHRs are used in hospitals to store longitudinal information of patients collected in a care delivery setting. Such information includes patient demographics, vital signs, medications, laboratory data, and description of any outcomes that may have occurred to the patient during hospitalization, or shortly after discharge.

Data extracted from an EHR database can be used to develop and evaluate ML models. The dataset is typically split into a training set and a testing set¹, either by a random or a nonrandom split based on location or time. According to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement, the nonrandom split by time is the strongest evaluation technique as it avoids random variations between the training and testing sets [18]. During model learning, the training set is used to optimize the parameters Θ of the model. The trained model is then evaluated on the held-out test set using various performance metrics.

¹ In clinical studies, the test set is usually termed the validation set, not to be confused with the portion of the training set used for ML-oriented tasks, such as hyperparameter selection.

Fig. 3 shows the overall dataset sizes, in terms of number of patient admissions, reported in studies published in the last decade (arranged in chronological order from left to right), extracted from EHRs. There is an increase of six orders of magnitude between 2008 and 2019, which highlights the increased accessibility to EHR data for research purposes. Most datasets are reported to be private, and there have only been a few notable efforts to release open access datasets, such as the MIMIC-III database [19]. Data and resource sharing is important for the advancement of the field.

It is also commonly agreed that data in EHRs may reflect the recording process present in the hospital rather than being a direct reflection of patient physiology [20]. First, EHRs are complex as they may include structured and unstructured data; an example of the latter is textual information, which could require natural language processing techniques to process [21]. Additionally, categorical data, such as diagnostic coding, may adopt different coding systems across different institutions.

Another important dimension is data completeness, which may be defined as “the proportion of observations that are actually recorded in the system” [22]. Incompleteness of EHRs can be a result of health service fragmentation due to inefficient communication following patient transfer among institutions, the recording of data taking place only during healthcare episodes that correspond to illness, or the increased personalisation of attributes per patient [20], [23]. Completeness may also vary across institutions based on adopted protocols.

The third challenge is the accuracy of the data, or “the proportion of recorded observations in the system that are correct” [22]. Errors can occur while clinical staff observe a patient or record data. Such errors may be random or systematic, with systematic errors influenced by factors such as billing requirements or avoidance of liability [20]. The accuracy of EHRs can be assessed by checking agreement between different elements within the EHR (such as assigned diagnosis and supplied medications), or by verifying whether values are within expected ranges [24].

Finally, it is important to verify whether the data was recorded within a reasonable period of time [24]. For example, the recorded collection time of vital signs may precede the time of admission. Although this aspect of

data quality is highly dependent on the efficiency of the clinical staff, it also depends on the workflow protocols adopted at different institutions. Timeliness of data must be assessed to evaluate the chronology of data elements in relation to admission or discharge decisions; for example, laboratory results prior to admission may be considered as part of the subsequent admission, or death within 24 hours of discharge can be considered as in-hospital mortality.

This imposes challenges on the usability of the data, which usually requires preliminary data pre-processing as shown in Fig. 4. The first step is to define inclusion and exclusion criteria to extract the patient cohort of interest. The second step involves setting assumptions to aid the analysis of the heterogeneous data, such as defining a minimum length of stay. Finally, meaningful features as input variables to the ML model can be extracted using a variety of techniques.

III. Feature Extraction

The performance of clinical predictive models relies on the feature representation of the data, as in other domains [44]. As reported in related works, feature extraction generally involves at least one of domain expertise for hand-crafted features (Section III-A), data standardization (Section III-B), or representation learning (Section III-C).

A. Hand-crafted Features

Domain expertise is commonly used to provide guidance on the design of the data pre-processing pipeline. This involves (i) preliminary feature selection from the input space, (ii) designing hand-crafted features, and (iii) incorporating prior knowledge of the structure of the data in the model design.

Examples of hand-crafted features in related works are pulse pressure [38], [26], shock index [25], [34], [38], mean arterial pressure [27], [38], oxygen delivery index [34], absolute successive difference of heart rate, estimated cardiac output, slope of fitted regression lines, or slope projections [25]. Statistical measures can be obtained from the distributions of the raw data, such as minimum and maximum extremes, moments (mean, standard deviation, and skewness), percentiles or the difference between two percentiles [25], [45], [32].

Previous research also computed time series features from waveform data [28], [46], [26], [5]. Those features can be categorized into four types: data adaptive, non-data adaptive, model-based and data-dictated approaches [47]. Fourier and wavelet transforms, for instance, decompose raw signals into frequency components and wavelets, respectively. Time domain, Poincaré nonlinear, cross-correlation analysis and geometric measures have also been used to investigate variability of vital signs [5], [26].

Deriving hand-crafted features is a powerful tool in the design of ML models and has been used extensively over the years. However, it is a time-consuming and labor-intensive process, requires expert knowledge, and may not scale well to new problems.

B. Data Standardization

ML algorithms require further data preparation steps to ensure stability of learning. Here, related works reduce the noise, sparsity and irregularity of the clinical data, as well as align the scales of the various predictor variables.

1) Time-series Modeling: Time-series modeling is widely used in studies pertaining to early warning models [29], [40]. It is often used either (i) to infer a pattern of the physiological trajectory or (ii) as an interpolation technique to overcome the sparsity and irregularity of physiological data.

Linear dynamic systems have been previously used to model physiological variables for ICU monitoring [48] and detection of sepsis [49]. Hidden Markov Models (HMMs) were also used to model health trajectories of patients [31], [50]. However, such models cannot easily adapt to irregularly sampled vital-sign data. Additionally, each hidden state in an HMM only depends on the previous state [51]. Another approach for modeling similar data is the kernel-based support vector regression [29].
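To make the hand-crafted features of Section III-A more concrete, the following is a minimal sketch of how a few of them might be computed from vital signs. The function names, the example values, and the choice of the common bedside approximation for mean arterial pressure (diastolic pressure plus one third of the pulse pressure) are assumptions made for illustration; the reviewed studies may define or compute these quantities differently.

```python
from statistics import mean, stdev


def pulse_pressure(sbp, dbp):
    """Pulse pressure: systolic minus diastolic blood pressure (mmHg)."""
    return sbp - dbp


def shock_index(hr, sbp):
    """Shock index: heart rate (bpm) divided by systolic blood pressure (mmHg)."""
    return hr / sbp


def mean_arterial_pressure(sbp, dbp):
    """Approximate MAP as DBP + (SBP - DBP) / 3 (an assumed, commonly used formula)."""
    return dbp + (sbp - dbp) / 3.0


def abs_successive_differences(values):
    """Absolute difference between each pair of consecutive observations."""
    return [abs(b - a) for a, b in zip(values, values[1:])]


def summarize(values):
    """Simple distribution statistics of a raw vital-sign series."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "std": stdev(values),  # sample standard deviation
    }


# Hypothetical observations for one patient (illustrative values only).
heart_rate = [80, 84, 90, 100]

features = {
    "pulse_pressure": pulse_pressure(120, 80),
    "shock_index": shock_index(90, 120),
    "map": mean_arterial_pressure(120, 80),
    "hr_abs_diff": abs_successive_differences(heart_rate),
    "hr_summary": summarize(heart_rate),
}
```

Each function encodes a small piece of clinical knowledge, which is why such features are easy to interpret but must be redesigned by hand for each new problem.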
Fig. 3. Dataset sizes reported in the literature in chronological order from left (2008) to right (2019). The vertical axis represents the dataset size, in terms of the number of patient admissions (axis scale ×10^7), and the horizontal axis represents the reference number. References shown from left to right: [25], [26], [27], [28], [29], [30], [31], [32], [33], [34], [35], [7], [36], [37], [38], [6], [39], [40], [41], [42], [43], [4], [15].

Fig. 4. Clinical outcome prediction models first extract a cohort of interest based on different characteristics, and then prepare the data for further feature extraction.

One of the most popular techniques for time series modeling within the clinical domain is Gaussian Process Regression (GPR). GPR is based on a non-parametric stochastic process that offers a probabilistic approach for time-series modeling by providing confidence intervals for estimated values at unobserved time instances. A comprehensive introduction to GPR can be found in [52]. Previous studies illustrate the robustness of the single-task GPR [29], [53], [54] in modeling a single physiological time-series variable. Others focus on multi-task GPR [55], [40], [35], which learns similarities across several time series and models them simultaneously. The use of GPR relies heavily on the choice of the kernel that encodes prior knowledge of any nonlinear time-series dynamics that might be hypothesized to exist in the data. Most recently, neural processes, a class of neural latent variable models, were also introduced as a probabilistic regression approach [56], which generalizes GPR through the use of generative models from deep learning.

Modeling the physiological trajectory of patients has become increasingly popular for further use in classification [40] or clustering applications [31], [53].

2) Feature Scaling: Empirical studies show that the performance of predictive models relies on the statistical normalisation of the input space [57]. Z-score normalisation with zero mean and unit standard deviation is a widely used tool in feature scaling of numeric clinical variables [58], [59], [42], [6], [60]. Min-max normalisation performs a scaling of the feature values to lie within a range, such as [0,1] in [4]. A rigorous comparison of the different normalisation techniques in the context of clinical deterioration does not exist. The current practice is to choose the normalisation technique based on its effect on the performance of the respective classifier. This presents an opportunity for future research.

C. Representation Learning

Learning a suitable lower-dimensional embedding or representation of a high-dimensional input space is a fundamental component of ML research [44]. The embedding can represent a medical concept [61] or summarize a patient’s hospital visit [62]. It often performs better than the raw input for learning subsequent tasks [63], [64], [65]. We now provide an overview of the techniques for obtaining embeddings in related medical applications: (i) standard dimensionality reduction techniques, (ii) distributed representations used in language modelling, (iii) using embedding layers as part of a larger model, or (iv) through the latent space of autoencoders and their variants. Such compact representations are then further used as inputs for classification or clustering purposes (covered in Section IV).

1) Standard Dimensionality Reduction Techniques: One of the most popular statistical dimensionality reduction techniques is principal component analysis (PCA) [66]. PCA transforms a set of possibly correlated variables to a set of linearly uncorrelated components. It
has been used to extract features for various clinical applications [67], [46], [68], such as for the detection of hypotensive episodes [26], mortality prediction across stroke patients [69], or prediction of hospital readmission [70]. The main limitation of PCA is that it extracts linear features that may not represent well the non-linear relationships present in complex clinical data [44]. Another popular technique is independent component analysis (ICA) [71], [37], which transforms the variables to a set of independent components.

2) Distributed Concept Representations: Patient records may contain discrete categorical codes, such as diagnosis, medication, or treatment codes. Several studies [41], [39], [72] propose learning from such variables using embedding techniques derived from the distributional hypothesis in semantic modeling. The distributional hypothesis states that words that appear in similar contexts in large samples of language data are semantically similar [73]. The skip-gram algorithm learns the co-occurrence of information inside a context window of a fixed size [74]. It has been used to convert medical codes to dense representations in [33], [61], [41]. Similar to skip-gram, the Global Vectors (GloVe) algorithm was also used to learn the global co-occurrence matrix of medical codes [75].

3) Embedding Layers: Embedding layers can also be integrated as part of a larger model to transform high-dimensional features into a lower-dimensional space. The embedding can consist of a simple linear transformation [76], [77] or a fully-connected (deep) network [4], [76], [72]. One study projected the input into a higher-dimensional space using a convolutional layer [39].

4) Autoencoders and their variants: An autoencoder is a neural network architecture that is often used for dimensionality reduction or feature extraction [78]. It first transforms the input space to a (typically noise-free) lower-dimensional representation using an encoder, and then reconstructs the input from this compact representation. The sparse autoencoder (SAE) enforces a sparsity constraint on the learned representation, and it has been used to learn latent representations of clinical data [30], [62]. The denoising autoencoder (DAE) reconstructs the input from a partially corrupted version of the input. The stacked DAE, which consists of several autoencoders that are initially pre-trained independently then connected in one network, has also been used for clinical applications [79], [37], [58], [80]. Another popular variant of autoencoders is the variational autoencoder [81], which is a generative model that learns a probabilistic latent space, unlike the previously mentioned discriminative autoencoders.

In Table I, we summarize the feature extraction techniques in related outcome prediction studies. In terms of variable selection, we observe that free clinical text is the least-used input. That may be due to the limited availability of datasets. We also note that representation learning has gained popularity from approximately 2013 onwards, and we expect it to continue to be an active area of research in the near future. The consistent use of hand-crafted features over the years indicates their effectiveness in training ML models. Additionally, time-series modeling may not be widely used as it requires hyperparameter tuning and high computational resources. It also limits end-to-end training of the pipeline, since some operations cannot be differentiated for gradient descent.

TABLE I
Overview of feature representation techniques adopted in related works using a variety of predictor variables: vital signs (VS), laboratory tests (LT), demographic information (DI), diagnostic codes (DC), interventions (INT) such as procedures and medications, and free text (TEX).

Columns: Ref | Year | Predictor Variables (VS, LT, DI, DC, INT, TEX) | Feature Representation (Hand-crafted Features, Time-series Modelling, Representation Learning)

[25] 2008 X X
[26] 2010 X X X X
[27] 2012 X X X
[28] 2012 X X X
[50] 2012 X X
[29] 2013 X X
[30] 2013 X X X
[32] 2014 X X X
[34] 2015 X X X
[35] 2015 X X X X
[33] 2015 X X X
[7] 2016 X X X X
[37] 2016 X X X X X X
[62] 2016 X X X X
[61] 2016 X X X X
[6] 2017 X X X X X X
[40] 2017 X X X X X X
[41] 2017 X X X X
[82] 2017 X X X
[38] 2018 X X X
[42] 2018 X X X X X X X
[4] 2019 X X X X X X X X

IV. Predictive Inference

The extracted features can then be used to train an outcome prediction model. The task can be posed either as a classification (Section IV-A) or clustering (Section IV-B) problem.

A. Outcome Classification Framework

Table II summarizes the different classification models that have been used to predict various clinical outcomes, as presented in recent papers. Most papers compare the performance of their models to those of simple ML techniques, such as regression [42], [77], which have been useful statistical techniques since long before the rise of ML. We also observe that predictions are often defined within a particular future time-frame, ranging from short-term 48-hour prediction windows [4] to 6 months. The varying definitions in the literature of what exactly constitutes an outcome make it challenging to compare methods directly. Additionally, some studies tend to focus on specific patient subgroups, such as pediatrics [38].

1) Regression Models: Logistic regression is one of the simplest linear classifiers [83] and is often considered as a standard benchmark for sophisticated clinical models [84]. Previous studies used logistic regression to predict hemodynamic instability [25], imminent mortality [85], or the composite outcome of cardiac arrest, unplanned ICU admission, and mortality [12]. However, logistic regression cannot learn non-linear relationships and assumes independence across the input variables.

Decision tree learning involves the stratification of the feature space based on a criterion defined by information theory, such as entropy. One study developed an early warning score based on decision trees, using seven routinely-collected laboratory tests [86], while another constructed an ensemble model with gradient tree boosting and adaptive boosting to predict the likelihood of transfer to pediatric ICU [38]. Despite the high interpretability of the aforementioned studies, they heavily rely on task-specific hand-engineered features and do not learn complex patterns in the data.

2) Kernel Methods: Kernel methods rely on a user-defined kernel function that estimates the ‘similarity’ between pairs of data [87]. The support vector machine is a popular example of kernel methods. It projects data into a higher-dimensional space and finds the optimal discriminatory hyper-planes between classes [88]. The

use of support vector machines heavily relies on the choice of the kernel and regularization, and they have shown strong performance in recent clinical applications [28], [89], [90], [34]. Computing the kernel matrix for all pairs of data may be computationally expensive for large clinical datasets, especially when a non-linear kernel is used. Further work must investigate approximation techniques for applications involving large-scale medical data.

3) Deep Learning: Deep learning models are also becoming increasingly popular for outcome prediction tasks [91], [7], [5], [27], [40]. The simplest form of neural networks is the multi-layer perceptron (MLP), which consists of fully-connected perceptrons. The main limitation of the MLP is its inability to account for temporal dependencies. Recurrent neural networks and their variants seek to model temporal behaviour through feedback connections. Both Long Short Term Memory (LSTM) networks [92], [93], [40] and Gated Recurrent Units (GRU) [76], [41] were constructed to predict (and alert in advance of) clinical outcomes. There is also a growing interest in developing ‘end-to-end’ architectures that can jointly extract features and perform classification [77], [82], [94]. Although deep learning techniques are typically characterized by strong performance, their decision-making process lacks interpretability.

TABLE II
Overview of classifiers used for outcome prediction in related works.

Model | Outcome | References
Novelty detection | ICU readmission | [29]
Logistic regression | Hemodynamic instability 2 hours in advance | [25]
Logistic regression | Gout vs. acute leukaemia | [30]
Logistic regression | Mortality on the same or next day | [85]
Support vector machine | Cardiac arrest within 72 hours | [28], [34]
Support vector machine | Mortality within 72 hours | [28], [35], [31]
Random forest classifier | Diseases within one-year interval | [37]
Support ensemble boosting | Paediatric transfer to ICU | [38]
Gaussian process classifier | Cardiac arrhythmia | [46]
Multi-layer perceptron | ICU transfer and cardiac arrest | [7]
Multi-layer perceptron | Hypotensive episodes | [26], [27]
Multi-layer perceptron | Ventricular tachycardia | [5]
Multi-layer perceptron | In-hospital mortality | [43]
Convolutional neural network | Congestive heart failure after 6 months | [36]
Convolutional neural network | COPD after 6 months | [36]
Convolutional neural network | Hospital readmission after discharge | [82]
Recurrent neural networks | Mortality | [6]
Recurrent neural networks | Acute kidney injury | [4]
Recurrent neural networks | Diagnosis & medication codes for next visit | [33]
Gated recurrent units | Heart failure | [41]
Gated recurrent units | Multi-label diagnoses | [33]
Long short term memory networks | Sepsis at least 4 hours in advance | [40]

B. Clustering for Abnormality Detection

Clustering algorithms are unsupervised learning techniques that group data based on similarity measures. Within the context of predicting adverse clinical outcomes, this can involve creating a ‘dictionary’ or cluster of healthy patients and computing a similarity metric for a new patient [45], [53], [95]. Popular similarity metrics are the Kullback-Leibler (KL) divergence [96] and the Mahalanobis distance [45]. Clustering analysis has also been useful for patient phenotyping [30]. The concept of creating patient dictionaries is a subset of novelty detection. An example of such approaches is ‘one-class classification’ [97], [48].

V. Performance Evaluation

The performance of supervised outcome prediction models on the testing set is evaluated using various statistical methods. Those statistical methods mainly assess the performance of the model in terms of accuracy metrics. In recent years, model interpretability has also become an area of interest as it directly reflects how we translate technologies into clinical practice [98].

A. Performance Metrics

Model discrimination refers to the model’s ability to separate classes of interest. In the context of outcome prediction models, we will here refer to patients who experience an adverse outcome as the positive class, and those who do not as the negative class. Many ML models are trained to compute the probability of the positive class, which is then converted to a binary value by fixing a decision threshold. The predictions are then compared to the true labels and can be classified into one of four categories: (1) True Positives (TP): model correctly

predicts the positive class, (2) True Negatives (TN): model correctly predicts the negative class, (3) False Positives (FP): model incorrectly predicts the positive class, and (4) False Negatives (FN): model incorrectly predicts the negative class.

Accuracy, which summarizes the proportion of correctly classified samples across all samples, is highly biased when using highly imbalanced datasets. Therefore, other metrics are usually considered. Sensitivity, or the True Positive Rate (TPR), assesses the model’s ability to correctly predict the positive class.

$$\mathrm{TPR} = \frac{TP}{TP + FN} \qquad (2)$$

Specificity, also known as the True Negative Rate (TNR), assesses the model’s ability to correctly predict the negative class.

$$\mathrm{TNR} = \frac{TN}{TN + FP} \qquad (3)$$

The receiver operating characteristic (ROC) curve plots the TPR on the vertical axis and (1-TNR), also known as the False Positive Rate (FPR), on the horizontal axis. The integral under the curve is the Area Under the Receiver Operating Characteristic Curve (AUROC) [99].² The AUROC assesses the model’s overall diagnostic ability as the decision threshold is varied. An AUROC of 0.5 means that the model is making predictions at random in a two-class setting. An AUROC higher than 0.8 implies that the model has good diagnostic ability. An AUROC higher than 0.9 means that the model has excellent diagnostic ability [100].

² Some studies refer to the AUROC as the ‘concordance-statistic’ (C-statistic).

Precision, also known as the Positive Predictive Value (PPV), assesses the proportion of positive predictions that are correct, i.e., the true positives among all samples predicted as positive.

$$\mathrm{PPV} = \frac{TP}{TP + FP} \qquad (4)$$

The Precision-Recall curve, where recall is equivalent to sensitivity, plots the TPR on the horizontal axis and the precision on the vertical axis. The integral under the curve is the Area Under the Precision-Recall Curve (AUPRC). Unlike the AUROC, the AUPRC and PPV are highly sensitive to class imbalance. Outcome prediction models are generally characterized by low AUPRC and PPV values [101]. Due to low PPV values, such systems should be considered as risk stratifiers rather than predictors [26].

There are other commonly assessed metrics, such as the F1-score [102], [91] and the likelihood ratio [103]. Some studies also report the false positives to true positives ratio [4] and the inverse of the PPV known as the work-up-to-detection ratio [104], [42]. The efficiency curve [105], [86] is a qualitative summary that plots the number of positives generated at different decision thresholds against the sensitivity of the model. This tool is essential to evaluate the trade-off between the total number of positives and the number of false positives.
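The metrics in Eqs. (2) to (4) and the AUROC can be illustrated with a short, dependency-free sketch. The labels, predicted probabilities, function names, and the 0.5 decision threshold below are hypothetical choices made for illustration. The AUROC is computed here through its rank-based interpretation (the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half), which is equivalent to the area under the ROC curve; the sketch assumes that both classes are present and that at least one positive prediction is made.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count TP, TN, FP, FN after thresholding the predicted probabilities."""
    tp = tn = fp = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 0 and y == 0:
            tn += 1
        elif pred == 1 and y == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def rates(tp, tn, fp, fn):
    """Sensitivity (Eq. 2), specificity (Eq. 3) and precision (Eq. 4)."""
    return {
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
        "PPV": tp / (tp + fp),
    }


def auroc(y_true, y_prob):
    """AUROC as the fraction of positive-negative pairs ranked correctly."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = adverse outcome) and model outputs.
y_true = [1, 1, 0, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.4, 0.6, 0.2, 0.1, 0.3, 0.8, 0.05]

counts = confusion_counts(y_true, y_prob, threshold=0.5)
metrics = rates(*counts)
auc = auroc(y_true, y_prob)
```

In this toy example the threshold yields 2 true positives, 4 true negatives, 1 false positive and 1 false negative. Changing the threshold alters TPR, TNR and PPV, whereas the AUROC depends only on how the scores rank positives relative to negatives.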
B. Interpretability

Despite the good performance of recently introduced ML models, interpretability remains a challenge for their clinical utility [98]. There are various definitions of interpretability in existing literature and they refer to several distinct ideas [106], [107]. Most of these ideas pertaining to the clinical domain revolve around trustworthiness of the results and transparency of the model. In the context of this review, we summarize the efforts of outcome prediction models that considered interpretability as a key component of model assessment.

Mimic learning assumes that shallow models, such as linear models, are interpretable. It aims to identify the features that are potentially relevant to the prediction. It involves first training a deep learning model for a specific clinical task. It then trains a shallow model, such as gradient boosting trees, to mimic the behaviour of the deep learning model [80], [108]. The local interpretable model-agnostic explanation (LIME) [109] generates a local explanation of the model behaviour using a shallow model. It has even been used to explain ML models for the prediction of in-hospital mortality [110]. However, it has also been argued that linear models, rule-based models, and decision trees are not intrinsically interpretable [106]. Other post-hoc interpretability techniques such as saliency maps rely on qualitative visual interpretations commonly used in computer vision applications.

It is often argued that deep learning models compromise interpretability for high accuracy [111]. Thus, there have been recent breakthroughs in developing inherently interpretable deep learning models instead of performing post-hoc interpretation [112]. For instance, attention mechanisms are incorporated within deep learning models and assign normalised weights to a set of features. The weights indicate the feature importance for the prediction of a future diagnosis [94], [39], [75] or high risk vascular diseases [102]. Other works impose non-negativity [62] or sparsity [30] constraints on the learned embedding space of medical data.

VI. Moving Forward

The prediction of clinical outcomes is essential to detect deterioration in a timely manner and to ease the burden on clinical staff. The development of the ML pipelines and their subsequent performance can also be improved by accounting for a few considerations.

A. Noisy Outcome Labels

To train outcome prediction models, outcome labels are currently being defined based on the occurrence of discrete clinical events. However, such labels may be noisy or inaccurate since EHRs only reflect parts of the hospital experience. For example, while a patient may experience cardiac arrest, the patient may be on terminal care pathways with ‘do not resuscitate’ orders, and such information may not be present in the available dataset.

Additionally, outcome labels are defined based on a specific time-window, where the features are associated with a positive outcome label only if they are within N hours of an outcome. This creates a strict cut-off where data collected prior to this N-hour window is not associated with a future outcome. Realistically speaking, deterioration is likely to develop gradually over time, yet this is the state-of-the-art approach in developing outcome prediction models within clinical practice. Future work should consider time-to-event analysis, which focuses on predicting the time until the occurrence of an outcome, rather than predicting a binary label.

B. Personalized Predictive Models

Most of the outcome prediction models are developed and evaluated population-wide, and recent advances show only marginal improvements. As more data is collected per patient, we hypothesize that the predictive power of such models could improve by developing patient-specific models that account for individual-, disease-, and organizational-based factors [113]. On an individual level, factors may include demographics, lifestyle, coexisting medical conditions, or genetic information. Disease-related factors may include degree of severity, medications and therapy, rate of progression, interventions, surgeries, and procedures. Organizational factors may include type of hospital, time of the day, staff ratio, or staff training. This also motivates the advancement of internet of things in healthcare to enhance the collection of integrated data, and would allow us to move towards ‘precision medicine’.

Additionally, in the development of machine learning and deep learning models, it is assumed that the data samples are independent and identically distributed (i.i.d.) random sets. However, this may not be the case in practice, since some data samples may belong to the same patient and spatio-temporal patterns may be indicative of deterioration prior to an outcome.

C. General Learning Models

Deep neural networks are powerful processing techniques. However, most of the state-of-the-art models seek to learn how to predict a specific outcome or a particular task, which can generally be referred to as ‘narrow AI’. While some of the motivation behind using representation learning has been to learn general patient representations in order to perform a variety of predictive tasks, more work needs to be done on developing generalized models that can automatically learn from heterogeneous EHR data to perform diverse tasks.

While recently developed ML models perform well within retrospective studies, validating their success in practice requires prospective analysis. The progress of the field relies on increased multidisciplinary collaborations between ML research scientists and clinicians. While it will take time for both parties to speak the same language, we hope that this review would demystify the
overall ML pipeline and summarize the assumptions and techniques of the state-of-the-art.

References

[1] Kun Hsing Yu, Andrew L. Beam, and Isaac S. Kohane. Artificial intelligence in healthcare, 2018.
[2] Naveed Afzal, Vishnu Priya, Sunghwan Sohn, Hongfang Liu, Rajeev Chaudhry, Christopher G Scott, Iftikhar J Kullo, and Adelaide M Arruda-Olson. Natural language processing of clinical notes for identification of critical limb ischemia. International Journal of Medical Informatics, 111(September 2017):83–89, 2018.
[3] Varun Gulshan, Lily Peng, Marc Coram, Martin C Stumpe, Derek Wu, Arunachalam Narayanaswamy, Subhashini Venugopalan, Kasumi Widner, Tom Madams, Jorge Cuadros, Ramasamy Kim, Rajiv Raman, Philip C Nelson, Jessica L Mega, and Dale R Webster. Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. JAMA : the journal of the American Medical Association, 316(22):2402–2410, 2019.
[4] Nenad Tomašev, Xavier Glorot, Jack W. Rae, Michal Zielinski, Harry Askham, Andre Saraiva, Anne Mottram, Clemens Meyer, Suman Ravuri, Ivan Protsyuk, Alistair Connell, Cían O. Hughes, Alan Karthikesalingam, Julien Cornebise, Hugh Montgomery, Geraint Rees, Chris Laing, Clifton R. Baker, Kelly Peterson, Ruth Reeves, Demis Hassabis, Dominic King, Mustafa Suleyman, Trevor Back, Christopher Nielson, Joseph R. Ledsam, and Shakir Mohamed. A clinically applicable approach to continuous prediction of future acute kidney injury. Nature, 572(7767):116–119, 2019.
[5] Hyojeong Lee, Soo-Yong Shin, Myeongsook Seo, Gi-Byoung Nam, and Segyeong Joo. Prediction of Ventricular Tachycardia One Hour before Occurrence Using Artificial Neural Networks. Scientific Reports, 6(August):32390, 2016.
[6] M Aczon, D Ledbetter, L Ho, A Gunny, A Flynn, J Williams, and R Wetzel. Dynamic Mortality Risk Predictions in Pediatric Critical Care Using Recurrent Neural Networks. arXiv, pages 1–18, 2017.
[7] Scott B. Hu, Deborah J L Wong, Aditi Correa, Ning Li, and Jane C. Deng. Prediction of clinical deterioration in hospitalized adult patients with hematologic malignancies using a neural network model. PLoS ONE, 11(8):1–12, 2016.
[8] Sebastian Ruder. An overview of gradient descent optimization algorithms. 2016.
[9] M. E. Beth Smith, Joseph C. Chiovaro, Maya O’Neil, Devan Kansagara, Ana R. Quiñones, Michele Freeman, Makalapua L. Motu’apuaka, and Christopher G. Slatore. Early warning system scores for clinical deterioration in hospitalized patients: A systematic review. Annals of the American Thoracic Society, 11(9):1454–1465, 2014.
[10] Lucian L Leape, Troyen A Brennan, Nan Laird, Ann G Lawthers, Russel Localio, Benjamin A Barnes, Leisi Herbert, Joseph P Newhouse, Paul C Weiler, and Howard Hiatt. The Nature of Adverse Events in Hospitalized Patients: Results of the Harvard Medical Practice Study II. The New England Journal of Medicine, 324(6):377–384, 1991.
[11] Daryl Jones, Imogen Mitchell, Ken Hillman, and David Story. Defining clinical deterioration. Resuscitation, 84(8):1029–1034, 2013.
[12] Matthew M. Churpek, Trevor C. Yuen, and Dana P. Edelson. Predicting clinical deterioration in the hospital: The impact of outcome selection. Resuscitation, 84(5):564–568, 2013.
[13] G Neale, M Woloshynowych, and C Vincent. Exploring the causes of adverse events in NHS hospital practice. Journal of the Royal Society of Medicine, 94(7):322–30, 2001.
[14] Sanjay Purushotham, Chuizheng Meng, Zhengping Che, and Yan Liu. Benchmark of Deep Learning Models on Large Healthcare MIMIC Datasets. 2017.
[15] Farah E. Shamout, Tingting Zhu, Pulkit Sharma, Peter J. Watkinson, and David A. Clifton. Deep Interpretable Early Warning System for the Detection of Clinical Deterioration. IEEE Journal of Biomedical and Health Informatics, 2019.
[16] Royal College of Physicians. National Early Warning Score (NEWS) 2: Standardising the assessment of acute-illness severity in the NHS. Technical report, 2017.
[17] Maggie Makar, Marzyeh Ghassemi, David M. Cutler, and Ziad Obermeyer. Short-Term Mortality Prediction for Elderly Patients Using Medicare Claims Data. International Journal of Machine Learning and Computing, 2015.
[18] Karel G.M. Moons, Douglas G. Altman, Johannes B. Reitsma, John P.A. Ioannidis, Petra Macaskill, Ewout W. Steyerberg, Andrew J. Vickers, David Ransohoff, and Gary S. Collins. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): Explanation and Elaboration. Annals of Internal Medicine, 162(1):W1–W74, 2015.
[19] Alistair E.W. Johnson, Tom J. Pollard, Lu Shen, Li Wei H. Lehman, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G. Mark. MIMIC-III, a freely accessible critical care database. Scientific Data, 2016.
[20] George Hripcsak and David J Albers. Next-generation phenotyping of electronic health records. Journal of the American Medical Informatics Association : JAMIA, 20(1):117–21, 2013.
[21] Jon D Patrick, Dung H M Nguyen, Yefeng Wang, and Min Li. A knowledge discovery and reuse pipeline for information extraction in clinical notes. Journal of the American Medical Informatics Association : JAMIA, 18(5):574–579, 2011.
[22] William R Hogan and Michael M Wagner. Accuracy of data in computer-based patient records. Journal of the American Medical Informatics Association, 4(5):342–355, 1997.
[23] E. M. Mirkes, T. J. Coats, J. Levesley, and A. N. Gorban. Handling missing data in large healthcare dataset: A case study of unknown trauma outcomes. Computers in Biology and Medicine, 75:203–216, 2016.
[24] Nicole Gray Weiskopf and Chunhua Weng. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. Journal of the American Medical Informatics Association : JAMIA, 20:144–151, 2012.
[25] Hanqing Cao, Larry Eshelman, Nicolas Chbat, Larry Nielsen, Brian Gross, and Mohammed Saeed. Predicting ICU hemodynamic instability using continuous multiparameter trends. In Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual Conference, volume 2008, pages 3803–6, 2008.
[26] Joon Lee and Roger G Mark. An investigation of patterns in hemodynamic data indicative of impending hypotension in intensive care. BioMedical Engineering OnLine, 9(1):62, 2010.
[27] Rob Donald, Tim Howells, Ian Piper, I. Chambers, G. Citerio, P. Enblad, B. Gregson, K. Kiening, J. Mattern, P. Nilsson, A. Ragauskas, Juan Sahuquillo, R. Sinnot, and A. Stell. Early Warning of EUSIG-Defined Hypotensive Events Using a Bayesian Artificial Neural Network Article. Acta Neurochirurgica Supplementum, 114(January 2012):87–91, 2012.
[28] Marcus Eng Hock Ong, Christina Hui Lee Ng, Ken Goh, Nan Liu, Zhi Xiong Koh, Nur Shahidah, Tong Tong Zhang, Stephanie Fook-Chong, and Zhiping Lin. Prediction of cardiac arrest in critically ill patients presenting to the emergency department using a machine learning score incorporating heart rate variability compared with the modified early warning score. Critical care (London, England), 16(3):R108, 2012.
[29] David A Clifton and Marco Pimentel. Gaussian Processes for Personalized e-Health Monitoring With Wearable Sensors. IEEE Transactions on Biomedical Engineering, 60(March 2013):193–197, 2013.
[30] Thomas A. Lasko, Joshua C. Denny, and Mia A. Levy. Computational Phenotype Discovery Using Unsupervised Feature Learning over Noisy, Sparse, and Irregular Clinical Data. PLoS ONE, 8(6), 2013.
[31] Shima Ghassempour, Federico Girosi, and Anthony Maeder. Clustering multivariate time series using Hidden Markov
Models. International Journal of Environmental Research and Public Health, 11(3):2741–2763, 2014.
[32] Marzyeh Ghassemi, Tristan Naumann, Finale Doshi-Velez, Nicole Brimmer, Rohit Joshi, Anna Rumshisky, and Peter Szolovits. Unfolding Physiological State: Mortality Modelling in Intensive Care Units. Bone, 23(1):1–7, 2014.
[33] Edward Choi, Mohammad Taha Bahadori, Andy Schuetz, Walter F. Stewart, and Jimeng Sun. Doctor AI: Predicting Clinical Events via Recurrent Neural Networks. In Machine Learning for Healthcare Conference, pages 301–318, 2015.
[34] Curtis E Kennedy, Noriaki Aoki, Michele Mariscalco, and James P Turley. Using Time Series Analysis to Predict Cardiac Arrest in a PICU. Pediatric critical care medicine : a journal of the Society of Critical Care Medicine and the World Federation of Pediatric Intensive and Critical Care Societies, 16(9):332–9, 2015.
[35] Marzyeh Ghassemi, Tristan Naumann, Thomas Brennan, David A Clifton, and Peter Szolovits. A Multivariate Timeseries Modeling Approach to Severity of Illness Assessment and Forecasting in ICU with Sparse, Heterogeneous Clinical Data. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, pages 446–453, 2015.
[36] Yu Cheng, Fei Wang, Ping Zhang, and Jianying Hu. Risk Prediction with Electronic Health Records: A Deep Learning Approach. In Proceedings of the 2016 SIAM International Conference on Data Mining, pages 432–440. Society for Industrial and Applied Mathematics, 2016.
[37] Riccardo Miotto, Li Li, Brian A. Kidd, and Joel T. Dudley. Deep Patient: An Unsupervised Representation to Predict the Future of Patients from the Electronic Health Records. Scientific Reports, 6(April):26094, 2016.
[38] Jonathan Rubin, Cristhian Potes, Minnan Xu-Wilson, Junzi Dong, Asif Rahman, Hiep Nguyen, and David Moromisato. An ensemble boosting model for predicting transfer to the pediatric intensive care unit. International Journal of Medical Informatics, 112(January):15–20, 2018.
[39] Huan Song, Deepta Rajan, Jayaraman J. Thiagarajan, and Andreas Spanias. Attend and Diagnose: Clinical Time Series Analysis using Attention Models. arXiv, 2017.
[40] Joseph Futoma, Sanjay Hariharan, and Katherine Heller. Learning to Detect Sepsis with a Multitask Gaussian Process RNN Classifier. In Proceedings of the 34th International Conference on Machine Learning, 2017.
[41] Edward Choi, Andy Schuetz, Walter F. Stewart, and Jimeng Sun. Using recurrent neural network models for early detection of heart failure onset. Journal of the American Medical Informatics Association, 24(2):361–370, 2017.
[42] Alvin Rajkomar and Others. Scalable and accurate deep learning for electronic health records. Nature Digital Medicine, 1(1):1–10, 2018.
[43] Joon myoung Kwon, Youngnam Lee, Yeha Lee, Seungwoo Lee, Hyunho Park, and Jinsik Park. Validation of deep-learning-based triage and acuity score using a large national dataset. PLoS ONE, 2018.
[44] Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, 2013.
[45] Jimeng Sun, Fei Wang, Jianying Hu, and Shahram Edabollahi. Clustering Overly-Specific Features in Electronic Medical Records. ACM SIGKDD Explorations Newsletter, 14(1):16, 2012.
[46] G. Skolidis, R. H. Clayton, and G. Sanguinetti. Automatic Classification of Arrhythmic Beats Using Gaussian Processes. Computers in Cardiology, 35:921–924, 2008.
[47] Saeed Aghabozorgi, Ali Seyed Shirkhorshidi, and Teh Ying Wah. Time-series clustering - A decade review. Information Systems, 53(October 2016):16–38, 2015.
[48] John A. Quinn, Christopher K.I. Williams, and Neil McIntosh. Factorial switching linear dynamical systems applied to physiological condition monitoring. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(9):1537–1551, 2009.
[49] Ioan Stanculescu, Christopher K I Williams, and Yvonne Freer. A Hierarchical Switching Linear Dynamical System Applied to the Detection of Sepsis in Neonatal Condition Monitoring. UAI’14 Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence, pages 752–761, 2014.
[50] Li Wei H. Lehman, Shamim Nemati, Ryan P. Adams, and Roger G. Mark. Discovering shared dynamics in physiological signals: Application to patient monitoring in ICU. Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS, pages 5939–5942, 2012.
[51] Zachary C. Lipton, John Berkowitz, and Charles Elkan. A Critical Review of Recurrent Neural Networks for Sequence Learning. 2015.
[52] Rasmussen and Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006.
[53] Marco A.F. Pimentel, David A. Clifton, and Lionel Tarassenko. Gaussian process clustering for the functional characterisation of vital-sign trajectories. In IEEE International Workshop on Machine Learning for Signal Processing, MLSP, 2013.
[54] Glen Wright Colopy, Stephen J. Roberts, and David A. Clifton. Gaussian Processes for Personalized Interpretable Volatility Metrics in the Step-Down Ward. IEEE Journal of Biomedical and Health Informatics, 2019.
[55] Robert Dürichen, Marco A F Pimentel, Lei Clifton, Achim Schweikard, and David A. Clifton. Multitask Gaussian processes for multivariate physiological time-series analysis. IEEE Transactions on Biomedical Engineering, 62(1):314–322, 2015.
[56] Marta Garnelo, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola, Danilo J. Rezende, S. M. Ali Eslami, and Yee Whye Teh. Neural Processes. 2018.
[57] T Jayalakshmi and A. Santhakumaran. Statistical Normalization and Back Propagation for Classification. International Journal of Computer Theory and Engineering, 3(1):89–93, 2011.
[58] Patrick Schwab, Gaetano Scebba, Jia Zhang, Marco Delai, and Walter Karlen. Beat by Beat: Classifying Cardiac Arrhythmias with Recurrent Neural Networks. arXiv, 2017.
[59] Harini Suresh, Nathan Hunt, Alistair Johnson, Leo Anthony Celi, Peter Szolovits, and Marzyeh Ghassemi. Clinical Intervention Prediction and Understanding using Deep Networks. arXiv, pages 1–16, 2017.
[60] Narges Razavian, Jake Marcus, and David Sontag. Multi-task Prediction of Disease Onsets from Longitudinal Lab Tests. In Proceedings of the 1st Machine Learning for Healthcare Conference, pages 1–27, 2016.
[61] Youngduck Choi, Chill Yi-i Chiu Ms, and David Sontag. Learning Low-Dimensional Representations of Medical Concepts. AMIA Joint Summits on Translational Science proceedings, pages 41–50, 2016.
[62] Edward Choi, Mohammad Taha Bahadori, Elizabeth Searles, Catherine Coffey, Michael Thompson, James Bost, Javier Tejedor-Sojo, and Jimeng Sun. Multi-layer Representation Learning for Medical Concepts. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’16, pages 1495–1504, 2016.
[63] Hugo Larochelle, Yoshua Bengio, Jérôme Louradour, and Pascal Lamblin. Exploring strategies for training deep neural networks. Journal of Machine Learning Research, 2009.
[64] George E. Dahl, Dong Yu, Li Deng, and Alex Acero. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Transactions on Audio, Speech and Language Processing, 2012.
[65] MD. Zakir Hossain, Ferdous Sohel, Mohd Fairuz Shiratuddin, and Hamid Laga. A Comprehensive Survey of Deep Learning for Image Captioning. ACM Computing Surveys, 2019.
[66] Lindsay I Smith. A tutorial on Principal Components Analysis Introduction. Statistics, 2002.
[67] Paul Sajda. Machine Learning for Detection and Diagnosis of Disease. Annual Review of Biomedical Engineering, 8:537–65, 2006.
[68] Hayden Wimmer and Loreen Powell. Principle Component Analysis for Feature Reduction and Data Preprocessing in
Data Science. In Proceedings of the Conference on Information Systems Applied Research, pages 1–6, 2016.
[69] Songhee Cheon, Jungyoon Kim, and Jihye Lim. The Use of Deep Learning to Predict Stroke Patient Mortality. International Journal of Environmental Research and Public Health, 16(11), 2019.
[70] Denis Krompaß, Cristóbal Esteban, Volker Tresp, Martin Sedlmayr, and Thomas Ganslandt. Exploiting Latent Embeddings of Nominal Clinical Data for Predicting Hospital Readmission. KI - Künstliche Intelligenz, 29(2):153–159, 2015.
[71] A. Hyvärinen and E. Oja. Independent component analysis: Algorithms and applications. Neural Networks, 2000.
[72] Cristóbal Esteban, Oliver Staeck, Yinchong Yang, and Volker Tresp. Predicting Clinical Events by Combining Static and Dynamic Information Using Recurrent Neural Networks. In IEEE International Conference on Healthcare Informatics (ICHI), pages 93–101, 2016.
[73] Magnus Sahlgren. The distributional hypothesis. Italian Journal of Linguistics, 20(1):33–53, 2008.
[74] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. arXiv preprint arXiv:1310.4546, 2013.
[75] Edward Choi, Mohammad Taha Bahadori, Le Song, Walter F. Stewart, and Jimeng Sun. GRAM: Graph-based Attention Model for Healthcare Representation Learning. arXiv, pages 1–15, 2016.
[76] Cristóbal Esteban, Danilo Schmidt, Denis Krompaß, and Volker Tresp. Predicting sequences of clinical events by using a personalized temporal latent embedding model. Proceedings - 2015 IEEE International Conference on Healthcare Informatics, ICHI 2015, pages 130–139, 2015.
[77] Edward Choi, Mohammad Taha Bahadori, Joshua A. Kulas, Andy Schuetz, Walter F. Stewart, and Jimeng Sun. RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism. In NIPS Proceedings, 2016.
[78] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning Book. Deep Learning, 2015.
[79] Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, and Pierre Antoine Manzagol. Stacked denoising autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion. Journal of Machine Learning Research, 2010.
[80] Zhengping Che, Sanjay Purushotham, Robinder Khemani, and Yan Liu. Distilling Knowledge from Deep Networks with Applications to Healthcare Domain. 2015.
[81] Carl Doersch. Tutorial on Variational Autoencoders. 2016.
[82] Phuoc Nguyen, Truyen Tran, Nilmini Wickramasinghe, and Svetha Venkatesh. Deepr: A Convolutional Net for Medical Records. IEEE Journal of Biomedical and Health Informatics, 21(1):22–30, 2017.
[83] A. J. Scott, D. W. Hosmer, and S. Lemeshow. Applied Logistic Regression. Biometrics, 2006.
[84] Evangelia Christodoulou, Jie Ma, Gary S. Collins, Ewout W. Steyerberg, Jan Y. Verbakel, and Ben Van Calster. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. Journal of Clinical Epidemiology, 110:12–22, 2019.
[85] Elsa Loekito, James Bailey, Rinaldo Bellomo, Graeme K. Hart, Colin Hegarty, Peter Davey, Christopher Bain, David Pilcher, and Hans Schneider. Common laboratory tests predict imminent death in ward patients. Resuscitation, 84(3):280–285, 2013.
[86] Stuart W. Jarvis, Caroline Kovacs, Tessy Badriyah, Jim Briggs, Mohammed A. Mohammed, Paul Meredith, Paul E. Schmidt, Peter I. Featherstone, David R. Prytherch, and Gary B. Smith. Development and validation of a decision tree early warning score based on routine laboratory test results for the discrimination of hospital mortality in emergency medical admissions. Resuscitation, 84(11):1494–1499, 2013.
[87] Thomas Hofmann, Bernhard Schölkopf, and Alexander J. Smola. Kernel methods in machine learning. Annals of Statistics, 36(3):1171–1220, 2008.
[88] Christopher J.C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 1998.
[89] Anneleen Daemen, Dirk Timmerman, Thierry Van den Bosch, Cecilia Bottomley, Emma Kirk, Caroline Van Holsbeke, Lil Valentin, Tom Bourne, and Bart De Moor. Improved modeling of clinical data with kernel methods. Artificial Intelligence in Medicine, 54(2):103–114, 2012.
[90] Yukun Chen, Robert J Carroll, Eugenia R McPeek Hinz, Anushi Shah, Anne E Eyler, Joshua C Denny, and Hua Xu. Applying active learning to high-throughput phenotyping algorithms for electronic health records data. Journal of the American Medical Informatics Association : JAMIA, 20(e2):253–9, 2013.
[91] Benjamin Shickel, Patrick James Tighe, Azra Bihorac, and Parisa Rashidi. Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis. IEEE Journal of Biomedical and Health Informatics, pages 1–16, 2017.
[92] Sepp Hochreiter and Jürgen Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735–1780, 1997.
[93] Zachary C. Lipton, David C. Kale, Charles Elkan, and Randall Wetzel. Learning to Diagnose with LSTM Recurrent Neural Networks. In Proceedings of ICLR 2016, pages 1–18, 2015.
[94] Fenglong Ma, Radha Chitta, Jing Zhou, Quanzeng You, Tong Sun, and Jing Gao. Dipole: Diagnosis Prediction in Healthcare via Attention-based Bidirectional Recurrent Neural Networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017.
[95] Tingting Zhu, Glen Wright Colopy, Clare MacEwen, Katherine Niehaus, Yang Yang, Chris W. Pugh, and David A. Clifton. Patient-Specific Physiological Monitoring and Prediction Using Structured Gaussian Processes. IEEE Access, 7:58094–58103, 2019.
[96] S. Kullback and R. A. Leibler. On Information and Sufficiency. The Annals of Mathematical Statistics, 2007.
[97] Marco A.F. Pimentel, David A. Clifton, Lei Clifton, and Lionel Tarassenko. A review of novelty detection. Signal Processing, 99:215–249, 2014.
[98] Muhammad Aurangzeb Ahmad, Ankur Teredesai, and Carly Eckert. Interpretable machine learning in healthcare. In Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018, 2018.
[99] Tom Fawcett. An introduction to ROC analysis. Pattern Recognition Letters, 27(8):861–874, 2006.
[100] Gary B. Smith, David R. Prytherch, Paul E. Schmidt, and Peter I. Featherstone. Review and performance evaluation of aggregate weighted ’track and trigger’ systems. Resuscitation, 77(2):170–179, 2008.
[101] Peter J. Watkinson, Marco A.F. Pimentel, David A. Clifton, and Lionel Tarassenko. Manual centile-based early warning scores derived from statistical distributions of observational vital-sign data. Resuscitation, 129(June):55–60, 2018.
[102] You Jin Kim, Yun-Geun Lee, Jeong Whun Kim, Jin Joo Park, Borim Ryu, and Jung-Woo Ha. High Risk Prediction from Electronic Medical Records via Deep Attention Networks. In NIPS Proceedings, 2017.
[103] Marko Hoikka, Tom Silfvast, and Tero I. Ala-Kokko. Does the prehospital National Early Warning Score predict the short-term mortality of unselected emergency patients? Scandinavian Journal of Trauma, Resuscitation and Emergency Medicine, 2018.
[104] Santiago Romero-Brufau, Jeanne M. Huddleston, Gabriel J. Escobar, and Mark Liebow. Why the C-statistic is not informative to evaluate early warning scores and what metrics to use. Critical Care, 19(1):19–24, 2015.
[105] David R. Prytherch, Gary B. Smith, Paul E. Schmidt, and Peter I. Featherstone. ViEWS-Towards a national early warning score for detecting adult inpatient deterioration. Resuscitation, 81(8):932–937, 2010.
[106] Zachary C. Lipton. The mythos of model interpretability. Communications of the ACM, 61(10):35–43, 2018.
[107] Finale Doshi-Velez and Been Kim. Towards A Rigorous Science of Interpretable Machine Learning. 2017.
[108] Zhengping Che, Sanjay Purushotham, Robinder Khemani, and Yan Liu. Interpretable Deep Models for ICU Outcome Prediction. AMIA ... Annual Symposium proceedings. AMIA Symposium, 2016:371–380, 2016.
[109] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.
[110] Shane Nanayakkara, Sam Fogarty, Michael Tremeer, Kelvin Ross, Brent Richards, Christoph Bergmeir, Sheng Xu, Dion Stub, Karen Smith, Mark Tacey, Danny Liew, David Pilcher, and David M. Kaye. Characterising risk of in-hospital mortality following cardiac arrest using machine learning: A retrospective international registry study. PLoS Medicine, 15(11):1–16, 2018.
[111] Yin Lou, Rich Caruana, and Johannes Gehrke. Intelligible models for classification and regression. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 150–158, 2012.
[112] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 2019.
[113] Daryl Jones, Imogen Mitchell, Ken Hillman, and David Story. Defining clinical deterioration. Resuscitation, 84(8):1029–1034, 2013.


