Introduction: No mortality risk prediction model has previously been validated for cardiac surgery in Indonesia. This study aimed to validate the EuroSCORE II and the Age, Creatinine, Ejection Fraction (ACEF) score as predictors of in-hospital mortality after cardiac surgery in a tertiary center and, if necessary, to recalibrate the EuroSCORE II model to our population.
Methods: This study was a single-center observational study from prospectively collected data on adult patients undergoing cardiac surgery from January 2006 to December 2011 (n = 1833). EuroSCORE II and ACEF scores were calculated for all patients to predict in-hospital mortality. Discrimination was assessed using the area under the curve (AUC) with a 95% confidence interval. Calibration was assessed with the Hosmer–Lemeshow test (HL test). Multivariable analysis was performed to recalibrate the EuroSCORE II; variables with P < 0.2 entered the final model.
Results: The in-hospital mortality rate was 3.8%, which was underestimated by the EuroSCORE II (2.1%) and the ACEF score (2.4%). EuroSCORE II (AUC 0.774 [0.714–0.834]) showed good discrimination, whereas the ACEF score (AUC 0.638 [0.561–0.718]) showed poor discrimination. The differences in AUC were significant (P = 0.002). Both scores were poorly calibrated (EuroSCORE II: HL test P < 0.001, ACEF score: HL test P < 0.001) and underestimated mortality in all risk groups. After recalibration, EuroSCORE II showed good discrimination (AUC 0.776 [0.714–0.840]) and calibration (HL test P = 0.79).
Conclusions: EuroSCORE II and the ACEF score were unsuitable for risk prediction of in-hospital mortality after cardiac surgery in our center. Following recalibration, the calibration of the EuroSCORE II was greatly improved.
Keywords: ACEF score, cardiac surgery, EuroSCORE II, in-hospital mortality, recalibration, risk prediction
Mortality risk prediction is useful before cardiac surgery. Accurate prediction enables clinicians to provide meaningful information before patient consent, helps identify patients needing preoperative optimization, assists in good allocation of resources, and is an adequate measure for quality assurance.[1]
Our center is a tertiary referral hospital, so the cases are usually complex. However, the human resources and facilities are limited relative to the number of patients, and a tool for mortality risk prediction is needed. No locally generated risk prediction model for mortality after cardiac surgery is available in our country. The EuroSCORE II, which was launched in 2011, was developed from a multicenter database in Europe and Asia.[2] It has been validated in many countries, including Asian countries,[3],[4],[5],[6],[7],[8],[9],[10],[11] and has generally shown good accuracy. A drawback is that many variables are needed for the calculation. The ACEF score is a simple model based only on age, kidney function, and left ventricular ejection fraction (LVEF).[12] Its accuracy was comparable to that of more complex scores in some studies.[6],[13],[14],[15] The simplicity of the ACEF score helps avoid “overfitting”, a problem that occurs when too many independent variables are applied in populations with few events.[13],[16] Calculation is also easier and multicollinearity is avoided.[12],[13]
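To illustrate how little input the ACEF score needs, the following is a minimal sketch that assumes the commonly cited definition of the score (age in years divided by LVEF in percent, plus one point when serum creatinine exceeds 2.0 mg/dL). The function name and argument names are illustrative and are not taken from this study.

```python
def acef_score(age_years: float, lvef_percent: float, creatinine_mg_dl: float) -> float:
    """Sketch of the ACEF score under the assumed definition:
    age / LVEF, plus 1 point if serum creatinine exceeds 2.0 mg/dL."""
    if lvef_percent <= 0:
        raise ValueError("LVEF must be a positive percentage")
    score = age_years / lvef_percent
    if creatinine_mg_dl > 2.0:
        # Assumed creatinine threshold for the extra point.
        score += 1.0
    return score


# Hypothetical patient: 60 years old, LVEF 50%, creatinine 1.1 mg/dL.
example_score = acef_score(age_years=60, lvef_percent=50, creatinine_mg_dl=1.1)
```

A missing LVEF makes the score undefined, which is why a score of this form cannot be calculated for patients without an ejection fraction measurement.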
The present study had two aims: first, to test whether the EuroSCORE II and ACEF score can be used to predict in-hospital mortality after cardiac surgery in an Indonesian population; and second, if necessary to achieve sufficient discrimination and calibration, to recalibrate the EuroSCORE II to obtain a more suitable model for our population.
Materials and Methods
The study was performed as per the Helsinki Declaration. After approval from the institutional research ethics committee, prospectively collected data on patients undergoing cardiac surgery at our center from January 2006 to December 2011 were used. In-hospital mortality was defined as death occurring at any time after surgery during the primary hospital stay. In total, 1,833 patients aged 19–95 years with complete data were included. Patients who underwent the maze procedure or cardiac tumor resection were excluded because of too few cases. The ejection fraction was not available in 35 patients, so the ACEF score could not be calculated in those patients. The EuroSCORE II and ACEF score were calculated according to the original publications,[2],[12] but the variable “poor mobility” was missing in our database and could not be included. Receiver operating characteristic (ROC) curves producing an area under the curve (AUC) with 95% confidence intervals (CIs) were used to assess discrimination. Differences in AUC between the two scores were assessed with DeLong's method. Calibration was evaluated using the Hosmer–Lemeshow test (HL test), which compares expected and observed mortality in each decile of risk. For the HL test, significant results indicate poor calibration.
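The two performance measures described above can be sketched in a few lines. This is an illustrative, standard-library sketch and not the software used in the study: the AUC is computed as the proportion of non-survivor/survivor pairs in which the non-survivor has the higher predicted risk (ties count half), and the HL statistic groups patients into equal-sized risk groups. The function names are assumptions, the P value uses a closed form that is valid only for an even number of groups, and the DeLong comparison of two AUCs is not implemented here.

```python
import math


def auc(risks, died):
    """AUC as the share of (non-survivor, survivor) pairs ranked correctly."""
    pos = [r for r, d in zip(risks, died) if d == 1]
    neg = [r for r, d in zip(risks, died) if d == 0]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow(risks, died, groups=10):
    """Return (chi-square, P value) comparing observed and expected deaths
    in equal-sized groups of increasing predicted risk (df = groups - 2)."""
    df = groups - 2
    if df < 2 or df % 2:
        raise ValueError("this sketch needs an even number of groups >= 4")
    pairs = sorted(zip(risks, died))
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        if size == 0:
            raise ValueError("fewer patients than groups")
        expected = sum(r for r, _ in chunk)
        observed = sum(d for _, d in chunk)
        variance = expected * (1 - expected / size)
        if variance <= 0:
            raise ValueError("a group has zero expected variance")
        chi2 += (observed - expected) ** 2 / variance
    # Chi-square survival function, closed form for even degrees of freedom.
    half = chi2 / 2
    p_value = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    return chi2, p_value
```

With `groups=10` the statistic corresponds to the decile-based comparison described in the text; a small P value indicates that predicted and observed mortality disagree.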
Due to small numbers, thoracic surgery was excluded. Several variables were recoded before recalibration of the EuroSCORE II. New York Heart Association (NYHA) groups were recoded into two groups: NYHA I and II patients versus NYHA III and IV patients. For renal impairment, patients on dialysis were grouped with those with severe renal impairment. For pulmonary hypertension, patients with severe and moderate forms were placed in the same group. The weight of intervention was reduced to three categories, placing “two procedures” and “three procedures” in the same group. The urgency of the procedure was reduced to two categories, combining patients with “urgent”, “emergency”, and “salvage” surgery. Very severely reduced LVEF was merged with poor LVEF, resulting in three LVEF categories.
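The recoding rules above can be written as a single mapping step. The sketch below is illustrative only: the dictionary keys and category labels (for example `"two_procedures"` or `"very_poor"`) are hypothetical names chosen for this example and do not come from the study database.

```python
def recode_for_recalibration(patient: dict) -> dict:
    """Collapse EuroSCORE II categories as described in the text.
    All keys and labels are hypothetical placeholders."""
    renal = patient["renal_impairment"]
    weight = patient["weight_of_intervention"]
    lvef = patient["lvef"]
    return {
        # NYHA I-II versus NYHA III-IV
        "nyha_3_or_4": patient["nyha_class"] >= 3,
        # Dialysis grouped with severe renal impairment
        "renal_impairment": "severe" if renal in ("severe", "dialysis") else renal,
        # Moderate and severe pulmonary hypertension in one group
        "pulmonary_hypertension": patient["pulmonary_hypertension"] in ("moderate", "severe"),
        # Two and three procedures in one group (three categories remain)
        "weight_of_intervention": (
            "two_or_more_procedures"
            if weight in ("two_procedures", "three_procedures")
            else weight
        ),
        # Urgent, emergency, and salvage surgery combined
        "non_elective": patient["urgency"] in ("urgent", "emergency", "salvage"),
        # Very poor LVEF merged with poor LVEF (three categories remain)
        "lvef": "poor" if lvef in ("poor", "very_poor") else lvef,
    }
```

Collapsing sparse categories in this way trades some granularity for more stable estimates when few patients fall into the original categories.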
SPSS (version 20.0 SPSS Inc., Chicago, IL, USA) was used for descriptive statistics. Continuous data are presented as mean ± standard deviation (SD) and categorical data as numbers with percentages. Multivariable logistic regression including the EuroSCORE II variables was used to generate new odds ratios (ORs) for a recalibrated EuroSCORE II in our population. Variables with P value < 0.2 were included to achieve a simplified, recalibrated model. This threshold was chosen in order not to omit clinically important variables that were non-significant in the present study because of the study size. Logistic regression and AUC comparison were performed in STATA (version 13, StataCorp LLC, Texas, USA). P values < 0.05 were considered statistically significant.
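The role of the P < 0.2 threshold can be illustrated with a deliberately simplified, univariable stand-in for the multivariable model: an unadjusted odds ratio with a Wald test computed from a 2×2 table. This is not the analysis performed in STATA for this study; the function name is an assumption and the counts in the usage line are hypothetical.

```python
import math


def odds_ratio_wald(died_exposed, survived_exposed, died_unexposed, survived_unexposed):
    """Unadjusted OR, 95% CI, and two-sided Wald P value from a 2x2 table."""
    a, b, c, d = died_exposed, survived_exposed, died_unexposed, survived_unexposed
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive in this sketch")
    log_or = math.log((a * d) / (b * c))
    # Standard error of ln(OR); small cells inflate it and widen the CI.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    ci = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))
    return math.exp(log_or), ci, p_value


# Hypothetical counts: 10 deaths / 90 survivors with the risk factor,
# 5 deaths / 95 survivors without it.
odds_ratio, ci_95, p = odds_ratio_wald(10, 90, 5, 95)
keep_in_model = p < 0.2  # screening threshold used in the text
```

Because the standard error depends on the reciprocal of each cell count, risk factors present in only a handful of patients produce wide confidence intervals even when the point estimate is large.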
Results
The baseline characteristics of the study patients (n = 1,833) are shown in [Table 1]. Mean age was 53 ± 12 years (range: 19–79 years), and 367 patients (20%) were female. Most patients (71%) underwent coronary artery bypass grafting (CABG).
In-hospital mortality occurred in 3.8% of the patients and was considerably underestimated both by the EuroSCORE II (2.1%) and ACEF score (2.4%). A total of 23 (33.3%) patients who died underwent emergency procedures. ROC curves are shown in [Figure 1]. The EuroSCORE II had good discrimination with AUC 0.774 (0.714–0.834), whereas the ACEF score had poor discrimination with AUC 0.638 (0.561–0.718). The differences in AUC were significant (P = 0.007).
Figure 1: Discrimination by risk prediction models for in-hospital mortality in cardiac surgery. Statistical comparisons: EuroSCORE II vs. ACEF score: P = 0.006, recalibrated EuroSCORE II vs. ACEF score: P = 0.002, EuroSCORE II vs. recalibrated EuroSCORE II: P = 0.91
The HL test showed that both the EuroSCORE II (P < 0.001) and the ACEF score (P < 0.001) were poorly calibrated. [Figure 2] compares the mean observed and predicted probabilities of in-hospital mortality in each decile of risk.
Figure 2: Observed mortality vs. predicted probability of in-hospital mortality
The following variables from the EuroSCORE II were considered informative in the recalibrated score (P < 0.2) [Table 2]: age (OR: 1.02), diabetes on insulin (OR: 8.44), chronic lung disease (OR: 4.66), endocarditis (OR: 11.43), NYHA class III or IV (OR: 1.74), pulmonary hypertension (OR: 1.87), poor or very poor LVEF (OR: 2.68), urgent, emergency, or salvage procedure (OR: 11.84), and one procedure other than CABG (OR: 1.53).
Thus, the following variables were omitted from the recalibrated EuroSCORE II: gender, renal impairment, extracardiac arteriopathy, previous cardiac surgery, critical preoperative state, Canadian Cardiovascular Society (CCS) class 4 angina, and recent myocardial infarction. The discrimination of the recalibrated EuroSCORE II was good, with an AUC of 0.777 (0.714–0.840) [Figure 1]. Predicted probabilities for mortality in risk deciles 1–7 were close to the observed mortality [Figure 2], whereas risk was still underestimated in deciles 8–10. The difference in AUC between the EuroSCORE II and the recalibrated EuroSCORE II was not significant (P = 0.92), but the difference between the ACEF score and the recalibrated EuroSCORE II was significant (P = 0.006). The total mortality rate was slightly underestimated by the recalibrated EuroSCORE II model (3.2% vs. observed mortality 3.8%). However, calibration was still good (HL test P = 0.79, [Figure 2]).
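As a rough illustration of how odds ratios from a logistic model translate into a predicted probability, the sketch below combines the ORs reported for the recalibrated score with an intercept. The intercept is not given in this text, so the value used here is a purely hypothetical placeholder and the resulting probabilities are not study results; the dictionary keys are likewise illustrative names, and age is assumed to be modeled per year.

```python
import math

# Odds ratios as reported in the text for the recalibrated score.
BINARY_ODDS_RATIOS = {
    "diabetes_on_insulin": 8.44,
    "chronic_lung_disease": 4.66,
    "endocarditis": 11.43,
    "nyha_3_or_4": 1.74,
    "pulmonary_hypertension": 1.87,
    "poor_or_very_poor_lvef": 2.68,
    "non_elective": 11.84,
    "single_non_cabg_procedure": 1.53,
}
AGE_ODDS_RATIO_PER_YEAR = 1.02  # assumed to apply per year of age

# HYPOTHETICAL placeholder: the model intercept is not reported here.
HYPOTHETICAL_INTERCEPT = -6.0


def predicted_mortality(patient: dict, intercept: float = HYPOTHETICAL_INTERCEPT) -> float:
    """Logistic prediction: each coefficient is the natural log of its OR."""
    logit = intercept + math.log(AGE_ODDS_RATIO_PER_YEAR) * patient["age"]
    for name, odds_ratio in BINARY_ODDS_RATIOS.items():
        if patient.get(name, False):
            logit += math.log(odds_ratio)
    return 1.0 / (1.0 + math.exp(-logit))


# Two hypothetical patients who differ only in endocarditis status.
without = predicted_mortality({"age": 53})
with_endocarditis = predicted_mortality({"age": 53, "endocarditis": True})
```

Whatever intercept is used, the odds of death for the two hypothetical patients differ by exactly the endocarditis OR, which is what an odds ratio in a logistic model expresses.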
Because few patients had endocarditis (n = 5), a sensitivity analysis was performed in which the EuroSCORE II was recalibrated after omitting these patients. This model showed worse calibration (HL test P = 0.17) and similar discrimination (AUC 0.767 [0.701–0.832]) and was, therefore, discarded.
Discussion
This study provides the first validation of the EuroSCORE II and ACEF score for in-hospital mortality prediction in an Indonesian population, showing that the EuroSCORE II gave good discrimination, whereas the ACEF score gave poor discrimination. The EuroSCORE II and ACEF score underestimated mortality in all risk groups, and both scores were poorly calibrated. Thus, neither score could be used without modification. The recalibrated EuroSCORE II was better calibrated. It consisted of only nine variables, which made it simpler without loss of predictive ability.
Risk prediction scores often perform worse in other populations with different characteristics and comorbidity, where their accuracy rarely exceeds an AUC of 0.7 (considered just acceptable) and calibration is usually poor in the lowest-risk and highest-risk patients.[7],[9],[14],[17] Recalibration of an existing validated risk score may be more useful and less complicated than developing an entirely new risk prediction model. However, although the recalibrated EuroSCORE II showed good calibration in our population, discrimination was not improved.
The mortality rate was 3.8%, comparable to that of other Asian countries such as China (2.5%)[18] and India (5.7%).[19] Other studies have also found varying results with the EuroSCORE II. It has given good discrimination, especially in CABG patients (AUC >0.8),[3],[6],[8],[14] but in valve surgery[7] and high-risk patients,[20] it was less accurate. The EuroSCORE II also tends to underestimate risk in high-risk patients, possibly because it does not consider interactions between variables, especially between continuous variables and comorbidity. Including such interactions would allow for a more continuously increasing risk that corresponds better with observed mortality rates in high-risk patients with multiple comorbidities.[21]
Other factors that may explain the risk underestimation by both the EuroSCORE II and the ACEF score are differences in the perioperative setting. Delayed preoperative screening, caused by the small number of cardiac centers in Indonesia, may allow patients' conditions to worsen by the time they undergo surgery. Moreover, access to percutaneous coronary interventions is still limited relative to the number of patients. Other resources that are limited relative to the number of patients, such as cardiac surgeons, equipment, intensive care unit capacity, and devices for hemodynamic support (extracorporeal membrane oxygenation, ventricular assist devices, and intra-aortic balloon pumps), may also have influenced mortality.
The miscalibration of the original EuroSCORE II in our population may have several reasons:
Our population had different characteristics with respect to several variables. The patients were younger (53 vs. 65 years), had lower creatinine clearance (74.97 mL/min vs. 83.6 mL/min), were more often in critical preoperative state (9.8% vs. 4.1%), and a higher proportion had elective surgery (94% vs. 77%) than in the EuroSCORE II population.[2]
In our center, the proportion of CABG operations was higher (70.7%) than in the EuroSCORE II population that had a balanced proportion between CABG and valve procedures (46.3%). Fewer women need cardiac operations in Indonesia because they rarely smoke, and the proportions of patients with acute myocardial infarctions, chronic lung disease and renal failure are significantly higher in men.
Western risk scores often have limited predictive abilities in Eastern populations because of differences in racial, ethnic, geographic, and genetic factors. It also has been suggested that South Asian populations tend to have smaller coronary arteries, potentially resulting in poorer outcomes after CABG. Even if the underlying physiologic properties are similar in our population compared to other Asian populations, interactions between age, sex, smoking, and lipids in our country could be different from other Asian countries.[22]
Our findings confirmed that the accuracy of the EuroSCORE II was better than the ACEF score, as has been found in other studies.[6],[23],[24] The poor calibration of the ACEF score may be caused by the three risk factors, namely age, serum creatinine concentration, and LVEF not being the strongest predictors in our population.
With the recalibrated EuroSCORE II, underestimation of risk was smaller in risk deciles 1–7, and the HL test indicated good calibration. Thus, the recalibrated score better represented the local study population. Clinical judgment and experience will often help to identify the high-risk patients despite underestimation of the predicted risk by scoring.
Some variables in the EuroSCORE II were identified as important predictors in the recalibrated score [Table 2], including age. The low mean age in our population may reflect a high proportion of cardiovascular disease in middle age attributable to tobacco use (26%).[25] Age is also included in the ACEF score, partly to control for the duration of risk factor exposure. Furthermore, comorbidities often become more complex as age advances. Post-operative complications such as cardiac, pulmonary, and renal complications are more frequent with older age.[26]
The wide CIs for the ORs of diabetes and endocarditis were caused by the small number of patients with these comorbidities. Insulin-treated diabetes has been identified as an independent risk factor for mortality.[27] The quality of the vasculature in diabetic patients is poor, making grafting more difficult. Another study found that diabetic patients have more complex comorbidity, such as myocardial infarction, stroke, and peripheral artery disease.[27] In our study, diabetic patients had more post-operative complications, such as renal failure requiring dialysis.
Chronic lung disease had a larger OR in the recalibrated version of the EuroSCORE II, which may reflect its strong association with smoking.[28] Chronic lung disease has previously been identified as a prominent independent predictor of mortality in cardiac surgery.[29] Chronic lung disease may lead to cardiovascular sequelae, including right ventricular dysfunction, pulmonary hypertension, coronary artery disease, and arrhythmias.[30] In the present study, patients in NYHA class III or IV more often had severe pulmonary hypertension or were in a critical state. Patients undergoing emergency procedures had high mortality (32.8%), as they were mostly in a critical condition. The operation category “single procedure other than CABG” included more cases undergoing thoracic and ascending aorta procedures with high mortality, so the mortality rate was higher than in patients undergoing two or more procedures.
Limitations of the study
The present study has some limitations. This was an observational single-center study carried out in a limited number of patients. Further studies with larger cohorts and multicenter analyses are required to confirm our results. The incompleteness of variables for the EuroSCORE II could not be avoided since this study was based on data from 2006 to 2011, whereas the EuroSCORE II was launched in 2011 and contained variables that were not used in the previous versions.
Conclusion
In-hospital mortality in our population was underestimated by the EuroSCORE II and ACEF score. The EuroSCORE II, despite good discrimination, showed poor calibration. The ACEF score showed poor discrimination and calibration and could not be used for the prediction of in-hospital mortality. The recalibrated EuroSCORE II had substantially better calibration, especially for patients with low to medium risk, but discrimination did not change significantly. Recalibration of existing validated risk scores may be a more useful approach than developing local, novel risk scores for cardiac surgery.
Financial support and sponsorship
This work was supported by the Ministry of Research, Technology, and Higher Education, Indonesia under the PUPT grant (635/UN1-P.III/LT/DIT-LIT/2016).
Conflicts of interest
There are no conflicts of interest.
References
Correspondence Address:
Yunita Widyastuti
Department of Anesthesiology and Intensive Therapy Dr. Sardjito Hospital, Faculty of Medicine, Public Health and Nursing, Universitas Gadjah Mada, Jl Kesehatan No 1 Sekip Sinduadi Mlati Sleman Yogyakarta
Indonesia
DOI: 10.4103/aca.aca_297_20