Data from the Telecare Nord Heart Failure study were made available for this study, including both the collected monitoring data and self-reported sociodemographic data. The recruitment, data collection and study protocol have been described elsewhere [18, 19]. The data have previously been used in studies reporting the potential health [18], health-economic [20] and equity-in-health [21] benefits of telemonitoring heart failure patients. However, the exploratory nature of these previous studies necessitates a proper model development study to confirm the findings.
Data from the 126 patients with chronic heart failure who received telemonitoring were included in this study. The patients were followed from inclusion in the Telecare Nord Heart Failure project until censoring or 17 December 2018. For each patient, all at-home measurements of weight, blood pressure and pulse were collected for the period in which the patient received the telemedicine intervention. The Danish National Patient Registry, a high-quality registry containing all hospital contacts for Danish citizens [22], was used to collect all recorded hospitalizations during each included patient's telemedicine intervention period.
To facilitate efficient decision support, our model development emphasized continuous prediction of hospitalization risk, using a sliding window approach [23] in which each full set of measurements corresponds to a new window.
An observation was defined as a full set of four biometric measurements; observations missing any of the four measurements were therefore excluded from this study. Additionally, the first four weeks after inclusion in the telemedicine study were excluded for each patient. In total, 11,575 observations were included in this study.
The study is reported in accordance with the “Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD)” guidelines [24]. In accordance with guidance from the Danish Ethics Committee, no ethical approval was needed for this study.
2.2 Outcome
We aimed to develop a model that predicts the risk of clinically significant deterioration with enough lead time for clinicians to act upon the warning. To this end, data from the Danish National Patient Registry were used to identify all non-elective hospitalizations occurring in the patient group. For each observation of a full set of biometric values, the admission time and non-elective indicator from the registry were used to construct a label indicating the presence or absence of a non-elective hospitalization within the 14 days following the measurements. Figure 1 illustrates the labelling process, with the green bars representing hospitalization-absent observations and the red bars representing hospitalization-present observations.
Fig. 1 The figure illustrates the sliding window approach used to define observations. Each observation of a full set of measurements from patients with no hospitalizations in the prediction horizon (the following 14 days) was labelled "No Event" (green), while measurements preceding a hospitalization (red) were labelled "Hospitalization"
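As an illustration, the 14-day labelling rule can be sketched as follows. This is a minimal Python sketch of the logic described above, not the authors' R implementation; the function name and example dates are hypothetical.

```python
from datetime import date, timedelta

def label_observations(obs_dates, admission_dates, horizon_days=14):
    """Label each observation by whether a non-elective admission
    occurs within the prediction horizon after the measurement."""
    labels = []
    for obs in obs_dates:
        window_end = obs + timedelta(days=horizon_days)
        # the observation is positive if any admission falls in (obs, obs + 14 days]
        hit = any(obs < adm <= window_end for adm in admission_dates)
        labels.append("Hospitalization" if hit else "No Event")
    return labels

obs = [date(2018, 1, 1), date(2018, 1, 8), date(2018, 1, 15)]
admissions = [date(2018, 1, 20)]
print(label_observations(obs, admissions))
# -> ['No Event', 'Hospitalization', 'Hospitalization']
```

Only the two measurements taken within 14 days of the admission are labelled positive; the earlier measurement stays "No Event".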
2.3 Predictors
To train the machine learning models, a feature set based on the biometric measurements was constructed. Two sets of features were evaluated: a basic feature set and a derived feature set. In the basic feature set, only the immediate weight, pulse, systolic and diastolic blood pressure measurements were included.
In the derived feature set, aggregate features based on longer time periods were derived for each observation to capture long-term trends. These features, quantifying the time-dependent patterns of each of the biometric measurements, were based on the eight weeks prior to the observation of a full set of measurements. The features included were the average and the rate of change in the four weeks preceding the observation, the average and the rate of change in the eight to four weeks prior to the observation, and the difference in the average and the rate of change between the two four-week periods. The derived features were included in addition to the raw measurements from the basic feature set. The formulas used for each feature are shown in Table 1.
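The window-based aggregation can be sketched as follows; this is an illustrative Python approximation of per-window averages and rates of change (the authoritative formulas are those in Table 1, and the helper names and window boundaries here are assumptions for illustration).

```python
def mean(xs):
    return sum(xs) / len(xs)

def slope(days, vals):
    """Ordinary least-squares slope: rate of change per day."""
    mx, my = mean(days), mean(vals)
    num = sum((d - mx) * (v - my) for d, v in zip(days, vals))
    den = sum((d - mx) ** 2 for d in days)
    return num / den if den else 0.0

def derived_features(history):
    """history: list of (days_before_observation, value), 0 < days <= 56."""
    recent = [(d, v) for d, v in history if d <= 28]         # weeks 1-4
    earlier = [(d, v) for d, v in history if 28 < d <= 56]   # weeks 5-8
    f = {
        "mean_recent": mean([v for _, v in recent]),
        "mean_earlier": mean([v for _, v in earlier]),
        "slope_recent": slope(*zip(*recent)),
        "slope_earlier": slope(*zip(*earlier)),
    }
    # differences between the two four-week periods
    f["mean_diff"] = f["mean_recent"] - f["mean_earlier"]
    f["slope_diff"] = f["slope_recent"] - f["slope_earlier"]
    return f
```

For example, a patient whose weekly weights were constant at 78 kg in weeks 5-8 and constant at 80 kg in weeks 1-4 would get `mean_diff = 2.0` and zero slopes.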
Table 1 Overview of the included features
Sociodemographic characteristics and the current month were included as features in both models; these features have previously been shown to be important predictors of hospitalization in chronic heart failure patients [25]. Based on self-reported surveys, patient age, gender and "New York Heart Association (NYHA)" class were included. Four patients did not report NYHA class in the baseline survey; these patients were defined as NYHA class 1 for the purposes of this study. All sociodemographic variables were fixed to their value at baseline. Both models included a dummy-coded feature for the current month to account for potential seasonality in chronic heart failure hospitalization.
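Dummy coding of the month can be illustrated as below (a minimal sketch; the reference level and column naming are arbitrary choices for illustration, not necessarily those used in the study).

```python
def month_dummies(month):
    """Dummy-code a month (1-12) as 11 binary indicators,
    with January (month 1) as the implicit reference level."""
    return {f"month_{m}": int(month == m) for m in range(2, 13)}

print(month_dummies(3))  # only month_3 is 1; all indicators are 0 for January
```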
2.4 Software
All data management and model development were performed in R (R version 4.2.1, [26]) using the RStudio IDE (RStudio 2023.06.0 + 421 "Mountain Hydrangea", [27]).
Data management was handled using the “Tidyverse” package compilation (version 1.3.2, [28]) and the “Tsibble” package (version 1.1.1, [29]). The “Tidymodels” package was used to develop and validate individual models (version 1.00, [30]). The RuleFit model was implemented from the “rules” package (version 1.00, [31]).
2.5 Models
The choice of machine learning model framework is a crucial element of any machine learning project. To assess the potential of weekly measurements in developing decision support systems for telemonitored heart failure patients, several different models were trained, and their performance was compared. To enhance the clinical acceptance and use of the decision support system, machine learning model frameworks with a higher degree of interpretability were chosen over a "black box" oriented approach such as a neural network. All models were implemented using the Tidymodels environment and the associated Parsnip package (version 1.0.0, [30]).
2.5.1 Logistic regression
A lasso-regularized logistic regression was used to estimate the performance of a highly interpretable model. The "glmnet" engine was used with the mixture parameter fixed at 1 (indicating a lasso regression) and the size of the penalty "penalty" set as a tunable hyperparameter (range [1 × 10−10; 1]).
2.5.2 Random forest
A random forest model was included as a less interpretable model that has shown good performance in similar studies. The "ranger" engine was used with the number of randomly selected predictor variables "mtry", the number of constructed trees "trees" and the minimum node size "min_n" set as tunable hyperparameters (ranges [1;63], [20;2000], [2;40], respectively).
2.5.3 RuleFit
A RuleFit model was included as a possible compromise between the interpretability of the logistic regression model and the possible performance improvement of the random forest model. RuleFit has been suggested as a machine learning model that combines the interpretability of simple decision trees with the improved performance of more complex models [32].
Training a RuleFit model consists of two sequential steps: the rule-generating step and the regularized regression step.
During the rule-generating step, a tree-based algorithm is used to construct a large number of shallow decision trees. Rules are then collected from each tree by following each possible path through the tree. This generates a large number of rules that are then added as dichotomous variables to the dataset containing the original features.
In the second step, a regularized regression model is used to estimate coefficients for all variables in the rule-enhanced dataset. By using regularized regression with a lambda above 0, only a small subset of the rules generated in the first step will have a nonzero coefficient, which improves interpretability and model performance [32].
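The rule-generating step can be illustrated with a toy example: each root-to-leaf path of a shallow tree becomes a conjunction of conditions, added as a dichotomous column alongside the original features. This is a hand-coded Python sketch with hypothetical rule thresholds; in the actual model the trees are grown by XGBoost and the coefficients are then fitted by regularized regression.

```python
# each rule is a conjunction of (feature, op, threshold) conditions,
# i.e. one root-to-leaf path through a shallow decision tree
RULES = [
    [("weight_slope", ">", 0.1)],
    [("weight_slope", ">", 0.1), ("pulse", ">", 90)],
    [("weight_slope", "<=", 0.1), ("systolic", "<=", 100)],
]

def rule_fires(rule, row):
    """A rule fires only if every condition on the path holds."""
    return all(
        row[feat] > thr if op == ">" else row[feat] <= thr
        for feat, op, thr in rule
    )

def augment_with_rules(rows, rules):
    """Add one dichotomous 0/1 column per rule to the original features."""
    return [
        {**row, **{f"rule_{i}": int(rule_fires(r, row))
                   for i, r in enumerate(rules)}}
        for row in rows
    ]

row = {"weight_slope": 0.2, "pulse": 95, "systolic": 120}
print(augment_with_rules([row], RULES)[0])
# rule_0 and rule_1 fire for this row; rule_2 does not
```

The lasso in the second step then shrinks most of these rule coefficients to exactly zero, leaving a short, readable list of active rules.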
The model was implemented using the xrf engine from the "rules" package, which uses XGBoost to construct the trees in the first step. The proportion of randomly selected predictors "mtry", the number of constructed trees "trees", the maximum tree depth "tree_depth" and the size of the penalty "penalty" were included as tunable hyperparameters (ranges [0.1;1], [5;100], [1;10], [−10;0], respectively).
2.6 Feature preprocessing
All features were normalized to a mean of zero and a standard deviation of one to facilitate proper lasso penalization of variables with differing scales [33].
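This normalization is the usual z-score transform; a minimal sketch:

```python
def standardize(values):
    """Rescale to mean 0 and standard deviation 1 (population SD),
    so the lasso penalty treats features on different scales fairly."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]
```

Without this step, a feature measured in large units (e.g. weight in kilograms) would be penalized more heavily per unit of coefficient than one in small units.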
2.7 Training
A grid of potential values for the hyperparameters of each respective model was evaluated in a fivefold grouped cross-validation framework. In accordance with best practice for cross-validation on hierarchical, repeated-measures data [34], we used the patient ID as the grouping variable for the observations. To account for the imbalanced class distribution, downsampling was used, with the sampling ratio included as a tunable hyperparameter (range [0.5;3]).
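Grouped cross-validation assigns whole patients to folds, so that no patient contributes observations to both the training and validation sides of a split. A minimal sketch of the grouping logic (illustrative Python, not the resampling code used in the study):

```python
import random

def grouped_fold_assignment(patient_ids, k=5, seed=1):
    """Assign each observation the fold of its patient, so all of a
    patient's repeated measurements land in the same fold."""
    patients = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(patients)
    fold_of = {p: i % k for i, p in enumerate(patients)}
    return [fold_of[p] for p in patient_ids]

ids = ["p1", "p1", "p2", "p3", "p3", "p3"]
folds = grouped_fold_assignment(ids, k=2)
# every observation from the same patient shares a fold
```

Splitting at the observation level instead would leak a patient's other measurements into training and inflate the validation scores.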
The best-performing set of hyperparameters for each respective model was selected based on its F-measure.
2.8 Evaluation metrics
All models were evaluated on their predictive performance in a fivefold grouped cross-validation framework. The F-measure, receiver operating characteristic area under the curve (ROC-AUC), precision-recall AUC, accuracy, sensitivity and specificity of each model are presented in the results section. Due to the imbalanced data, the model with the highest F-measure was defined as the best model.
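For reference, the threshold-based metrics above all derive from confusion-matrix counts; a minimal sketch (the AUC-type metrics are omitted, since they require the full distribution of predicted scores):

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = hospitalization)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)              # recall
    specificity = tn / (tn + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / len(y_true)
    return {"f_measure": f_measure, "accuracy": accuracy,
            "sensitivity": sensitivity, "specificity": specificity}
```

Because the F-measure balances precision against sensitivity on the positive (hospitalization) class, it is less easily inflated by the majority "No Event" class than raw accuracy.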