Risk prediction models for discrete ordinal outcomes: Calibration and the impact of the proportional odds assumption

1 INTRODUCTION

Risk prediction modeling is ubiquitous in the medical literature. Most of these prediction models are developed for dichotomous outcomes, estimating the risk that a condition is present (diagnostic) or will develop within a certain time horizon (prognostic). However, several clinically important outcomes are ordinal in nature, with a finite and often limited number of ordered categories. One example is the extent of coronary artery disease in symptomatic patients, for which Edlinger et al recently developed a risk prediction model.1 The diagnosis can be any of five increasingly severe conditions: no coronary artery disease, nonobstructive stenosis, one-vessel disease, two-vessel disease, or three-vessel disease. Another example is the modified Rankin scale to assess functional recovery after stroke, as in a model by Risselada et al.2 This scale has seven ordered categories: death, severe disability, moderately severe disability, moderate disability, slight disability, no significant disability despite symptoms, or no symptoms at all. Such outcomes are often dichotomized, although we would generally not recommend that for the following reasons: (1) it leads to a loss of information, (2) the merged categories may require different clinical management, and (3) merging categories may result in an extremely heterogeneous “supercategory”.

The default statistical model for ordinal outcomes is the cumulative logit model with proportional odds (CL-PO), which is commonly referred to as “ordinal logistic regression”. Several alternative logistic models for ordered categories exist, such as the adjacent category, continuation ratio, and stereotype models, which make different assumptions about the structure of ordinality.3-10 Alternatively, multinomial logistic regression (MLR) can be used for modeling ordinal and other multicategory outcomes, ignoring the ordinality of the outcome.

For dichotomous outcomes, there is a large body of methodological literature and guidance on how prediction models should be constructed and how their performance should be evaluated in terms of discrimination and calibration.11-17 Methods to assess discrimination and calibration have been extended to models for nominal outcomes.18-20 For ordinal outcomes, discrimination measures have been proposed, but calibration has barely been addressed.21-24 Harrell and colleagues discussed the development of a risk prediction model for an ordinal outcome using CL-PO.21 Calibration was assessed for a dichotomized version of the outcome, such that the standard methods for binary outcomes could be applied. More research on calibration is required, in particular because calibration is the Achilles heel of prediction modeling.25

In this article, we study the performance of a variety of regression algorithms to develop prediction models for discrete ordinal outcomes. We (1) evaluate different approaches to investigate calibration, (2) study the impact of the proportional odds assumption on risk estimates and calibration statistics, and (3) explore the impact on overfitting of using simpler models that assume proportional odds versus more complex models that do not.

This article is structured as follows. Regression models for discrete ordinal outcomes are described in Section 2, and measures for predictive performance in Section 3. Section 4 presents a simulation study to assess the impact of model choice on estimated risks, model calibration, and overfitting, and to compare approaches to quantify model calibration. Section 5 presents a case study, and in Section 6 we discuss our findings.

2 REGRESSION MODELS FOR DISCRETE ORDINAL OUTCOMES

2.1 Regression models

We predict an outcome $Y$ with $K$ categories ($k=1,\ldots,K$) using $Q$ predictors $X_q$ ($q=1,\ldots,Q$), with $X = (X_1,\ldots,X_Q)^T$. For simplicity, we will assume in notation that the models include the predictors as linear and additive effects, but our work can easily be generalized to allow for alternative functional forms and interaction terms.

2.1.1 Multinomial logistic regression

A generic model for categorical outcomes is multinomial logistic regression (MLR), which ignores the ordinality of the outcome. MLR models the outcome as follows10:

$$\log \frac{P(Y=k)}{P(Y=1)} = \alpha_{MLR,k} + \beta_{MLR,k}^T X = L_{MLR,k}, \quad (1)$$

for $k=2,\ldots,K$, where $\beta_{MLR,k}^T = (\beta_{MLR,k,1},\ldots,\beta_{MLR,k,Q})$ and where $L$ is called a linear predictor. One outcome category is used as the reference, and all other categories are contrasted with this reference category. We use $Y=1$ as the reference, but the choice does not affect the estimated risks.

2.1.2 Cumulative logit models

Probably the most commonly used regression model for ordinal outcomes is the cumulative logit model with proportional odds (CL-PO)10:

$$\log \frac{P(Y \geq k)}{P(Y < k)} = \alpha_{CLPO,k} + \beta_{CLPO}^T X = L_{CLPO,k}, \quad (2)$$

for $k=2,\ldots,K$ and where $\beta_{CLPO}^T = (\beta_{CLPO,1},\ldots,\beta_{CLPO,Q})$. Due to the proportional odds assumption, every predictor effect is modeled using only one parameter, irrespective of $k$. This means that predictor effects are assumed constant over $k$ on the log-odds scale. The model has $K-1$ intercepts. The cumulative logit model can also be formulated without the proportional odds assumption, leading to the CL-NP model10:

$$\log \frac{P(Y \geq k)}{P(Y < k)} = \alpha_{CLNP,k} + \beta_{CLNP,k}^T X = L_{CLNP,k}, \quad (3)$$

for $k=2,\ldots,K$ and where $\beta_{CLNP,k}^T = (\beta_{CLNP,k,1},\ldots,\beta_{CLNP,k,Q})$. Here, the predictor effects depend on $k$, such that $K-1$ parameters are estimated for each predictor. Note that CL-NP may lead to invalid models where the estimated risk that $Y \geq k$ is higher than the estimated risk that $Y \geq k-1$.8, 10

2.1.3 Adjacent category models

An alternative method to model ordinality is to target pairwise probabilities of adjacent categories, rather than cumulative probabilities. Assuming proportional odds, the adjacent category model with proportional odds (AC-PO) is10

$$\log \frac{P(Y=k+1)}{P(Y=k)} = \alpha_{ACPO,k} + \beta_{ACPO}^T X = L_{ACPO,k}, \quad (4)$$

for $k=1,\ldots,K-1$ and where $\beta_{ACPO}^T = (\beta_{ACPO,1},\ldots,\beta_{ACPO,Q})$. Proportional odds in this setup refers to identical effects for moving up one category, instead of identical effects for every dichotomization of $Y$. The adjacent category setup can also be applied without the proportional odds assumption, leading to the adjacent category without proportional odds model (AC-NP):

$$\log \frac{P(Y=k+1)}{P(Y=k)} = \alpha_{ACNP,k} + \beta_{ACNP,k}^T X = L_{ACNP,k}, \quad (5)$$

for $k=1,\ldots,K-1$ and where $\beta_{ACNP,k}^T = (\beta_{ACNP,k,1},\ldots,\beta_{ACNP,k,Q})$. This model is equivalent to MLR.

2.1.4 Continuation ratio models

Instead of cumulative or pairwise probabilities, conditional probabilities can be targeted. Continuation ratio models estimate the probability of a given outcome category conditional on the outcome being at least that category. The continuation ratio model with the proportional odds assumption (CR-PO) is10

$$\log \frac{P(Y>k)}{P(Y \geq k)} = \alpha_{CRPO,k} + \beta_{CRPO}^T X = L_{CRPO,k}, \quad (6)$$

for $k=1,\ldots,K-1$ and where $\beta_{CRPO}^T = (\beta_{CRPO,1},\ldots,\beta_{CRPO,Q})$. Without proportional odds, the continuation ratio model (CR-NP) is:

$$\log \frac{P(Y>k)}{P(Y \geq k)} = \alpha_{CRNP,k} + \beta_{CRNP,k}^T X = L_{CRNP,k}, \quad (7)$$

for $k=1,\ldots,K-1$ and where $\beta_{CRNP,k}^T = (\beta_{CRNP,k,1},\ldots,\beta_{CRNP,k,Q})$.

2.1.5 Stereotype logistic model

Anderson introduced a model that finds a compromise between MLR and AC-PO, by relaxing the proportional odds assumption on the level of the $K-1$ equations rather than on the level of each predictor separately.7 The stereotype logistic model (SLM) is written as:

$$\log \frac{P(Y=k)}{P(Y=1)} = \alpha_{SLM,k} + \phi_k \beta_{SLM}^T X = L_{SLM,k}, \quad (8)$$

for $k=2,\ldots,K$, where $\beta_{SLM}^T = (\beta_{SLM,1},\ldots,\beta_{SLM,Q})$ and $Y=1$ is used as the reference. The model estimates one coefficient per predictor, but estimates $K-1$ scaling factors $\phi_k$. Every predictor coefficient is multiplied by $\phi_k$. To avoid identifiability problems, a constraint has to be imposed on the scaling factors, which typically is that $\phi_2=1$. In principle, the model is an ordered model if the scaling factors are monotonically increasing or decreasing. While this could be imposed as an additional constraint during model fitting, it is not necessary and may cause computational problems.3, 7

2.2 A comparison of the number of parameters

For any particular application, the number of parameters (regression model coefficients including intercepts) of the above defined models varies. The models without a proportional odds assumption (MLR, CL-NP, AC-NP, CR-NP) require $(Q+1)(K-1)$ parameters, whereas models with proportional odds (CL-PO, AC-PO, CR-PO) require $Q+K-1$ parameters. SLM falls in between with $Q+2K-3$ parameters. Table 1 presents the number of parameters for illustrative values of $Q$ and $K$.
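As a minimal sketch, the three parameter-count formulas can be expressed as a small function; the function name and the illustrative $(K, Q)$ values below are ours, chosen to match rows of Table 1.

```python
def n_params(K: int, Q: int, family: str) -> int:
    """Number of parameters (coefficients including intercepts) for a model with
    K outcome categories and Q predictors, by model family."""
    if family == "PO":    # CL-PO, AC-PO, CR-PO: one slope per predictor, K-1 intercepts
        return Q + K - 1
    if family == "SLM":   # stereotype: Q slopes, K-1 intercepts, K-2 free scaling factors
        return Q + 2 * K - 3
    if family == "NP":    # MLR, CL-NP, AC-NP, CR-NP: K-1 intercepts and K-1 slopes per predictor
        return (Q + 1) * (K - 1)
    raise ValueError(f"unknown model family: {family}")

# Reproduce a few rows of Table 1
for K, Q in [(3, 3), (5, 10), (10, 10)]:
    print(K, Q, n_params(K, Q, "PO"), n_params(K, Q, "SLM"), n_params(K, Q, "NP"))
# 3 3 5 6 8
# 5 10 14 17 44
# 10 10 19 27 99
```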

TABLE 1. Number of parameters to be estimated for some values of K and Q

K    Q    Proportional odds models    Stereotype logistic model    Nonproportional odds models
3    3    5                           6                            8
3    5    7                           8                            12
3    10   12                          13                           22
5    3    7                           10                           16
5    5    9                           12                           24
5    10   14                          17                           44
10   3    12                          20                           36
10   5    14                          22                           54
10   10   19                          27                           99

Abbreviations: K, number of outcome categories; Q, number of predictors.

3 PREDICTIVE PERFORMANCE MEASURES FOR DISCRETE ORDINAL OUTCOMES MODELS

The estimated risk of category $k$ is denoted by $\hat{P}_k$, with the estimated risk for individual $i$ in a data set of size $N$ ($i=1,\ldots,N$) denoted as $\hat{p}_{i,k}$. These risks are model-specific, conditional on $X$ and the estimated model parameters. Hence $\hat{P}_k = P(Y=k \mid X, \hat{\theta}_\cdot)$, where $\hat{\theta}_\cdot$ includes all parameters estimated from the model of choice (Equations (1)-(8)). For example, $\hat{\theta}_{SLM}$ includes all $Q+2K-3$ estimated intercepts, model coefficients, and scaling factors. The Appendix provides more details on how to calculate the risks for the different types of models. Analogously, the estimated risk that the outcome category has at least value $k$ is denoted as $\hat{V}_k = \sum_{j=k}^{K} \hat{P}_j$, with the estimated risk for individual $i$ denoted as $\hat{v}_{i,k}$.
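As an illustration of this mapping for the CL-PO model, the cumulative risks are $\hat{V}_k = \operatorname{expit}(L_{CLPO,k})$ and the category risks $\hat{P}_k$ follow by differencing. A minimal sketch; the intercepts, coefficients, and predictor values below are hypothetical, not taken from the article:

```python
import numpy as np

def clpo_risks(alphas, beta, x):
    """Category risks P(Y = k), k = 1..K, from a fitted CL-PO model (Equation (2)).
    alphas: the K-1 intercepts for k = 2..K (decreasing in k); beta, x: the Q
    coefficients and predictor values. All numbers used below are hypothetical."""
    L = np.asarray(alphas, float) + np.dot(beta, x)   # linear predictors L_k, k = 2..K
    V = 1.0 / (1.0 + np.exp(-L))                      # cumulative risks V_k = P(Y >= k)
    V = np.concatenate(([1.0], V))                    # prepend V_1 = P(Y >= 1) = 1
    return V - np.concatenate((V[1:], [0.0]))         # P_k = V_k - V_{k+1}

# Hypothetical model with K = 4 categories and Q = 2 predictors
p = clpo_risks(alphas=[1.0, -0.5, -2.0], beta=[0.8, -0.3], x=[1.2, 0.4])
print(p, p.sum())  # the four category risks sum to 1
```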

3.1 Calibration of risk models for ordinal outcomes

3.1.1 Calibration intercepts and slopes per outcome category or dichotomy

A simple approach that capitalizes on the well-known calibration tools for binary outcomes is to evaluate risk model calibration for every outcome category separately by defining a binary outcome $Y_k$ that equals 1 if $Y=k$ and 0 otherwise.26 The calibration intercept and calibration slope can be computed with the following binary logistic calibration model:

$$\log \frac{P(Y_k=1)}{P(Y_k=0)} = a_c + b_c \times \operatorname{logit}(\hat{P}_k). \quad (9)$$

The calibration slope equals $b_c$; the calibration intercept equals $a_c$ when $b_c$ is fixed to 1. Alternatively, the outcome can be dichotomized as $Y \geq k$ (1 if $Y \geq k$ and 0 otherwise), and a calibration model for the dichotomized outcome can be defined as:

$$\log \frac{P(Y \geq k)}{P(Y < k)} = a_d + b_d \times \operatorname{logit}(\hat{V}_k). \quad (10)$$

Calibration intercepts and slopes can be obtained as for $Y_k$.
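Both calibration models above are plain binary logistic regressions with the logit-transformed risk as the only covariate. A minimal sketch of Equations (9) and (10), fit by Newton-Raphson maximum likelihood; the function names and the simulated data are ours, purely illustrative:

```python
import numpy as np

def logit_fit(x, y, n_iter=30):
    """ML binary logistic regression of y on x (with intercept), via Newton-Raphson."""
    X = np.column_stack([np.ones_like(x), x])
    b = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        b += np.linalg.solve((X * (p * (1 - p))[:, None]).T @ X, X.T @ (y - p))
    return b  # (intercept, slope)

def calibration_intercept_slope(p_hat, y_bin):
    """Calibration slope b_c as in Equation (9); intercept a_c refit with b_c fixed
    to 1, ie, an intercept-only logistic model with logit(p_hat) as offset."""
    lp = np.log(p_hat / (1.0 - p_hat))   # logit of the estimated risks
    slope = logit_fit(lp, y_bin)[1]
    a = 0.0                              # Newton steps for the offset model
    for _ in range(30):
        p = 1.0 / (1.0 + np.exp(-(a + lp)))
        a += np.sum(y_bin - p) / np.sum(p * (1 - p))
    return a, slope

# Simulated check: outcomes are drawn from the stated risks, so calibration is perfect
rng = np.random.default_rng(42)
p_hat = rng.uniform(0.05, 0.95, size=5000)
y = rng.binomial(1, p_hat)
a_c, b_c = calibration_intercept_slope(p_hat, y)
print(a_c, b_c)  # close to 0 and 1, respectively
```

The same code applies to the dichotomy $Y \geq k$ of Equation (10) by passing the binary indicator of $Y \geq k$ and the cumulative risks $\hat{V}_k$.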

Due to the ordinal nature of the outcome, $Y \geq k$ may appear more sensible than $Y_k$, although this may depend on the actual clinical decisions that the model is intended to support.

3.1.2 Model-specific calibration intercepts and calibration slopes

When making a prediction model for a binary outcome using standard maximum likelihood logistic regression, the calibration intercept and calibration slope are by definition 0 and 1 when evaluated on the development dataset (ie, the exact same dataset that was used to develop the prediction model).26 A model with a calibration intercept of 0 and a calibration slope of 1 has been defined as having “weak calibration”.26 Thus, maximum likelihood logistic regression for a binary outcome is by definition weakly calibrated on the development dataset. When making a prediction model for an ordinal outcome, and assessing calibration per outcome category ($Y_k$) or per outcome dichotomy ($Y \geq k$) (Equations (9) and (10)), calibration intercepts and slopes are no longer 0 and 1 on the development dataset. Procedures that do yield intercept 0 and slope 1 on the development dataset are possible, but depend on the regression model used to develop the prediction model for the ordinal outcome. Such procedures are therefore not generic, and we describe them for each ordinal regression model separately.

For MLR, the model-specific calibration model is of the following form18:

$$\log \frac{P(Y=k)}{P(Y=1)} = a_{MLR,k} + \sum_{j=2}^{K} b_{MLR,k,j} \hat{L}_{MLR,j}, \quad (11)$$

for $k=2,\ldots,K$, where $\hat{L}_{MLR,j}$ are the linear predictors from the fitted MLR prediction model (Equation (1)). The calibration intercepts equal $a_{MLR,k}$ when fixing the corresponding calibration slopes $b_{MLR,k,j=k}$ to 1 and the remaining slopes $b_{MLR,k,j \neq k}$ to 0. The calibration slopes equal $b_{MLR,k,j=k}$ when fixing the remaining slopes $b_{MLR,k,j \neq k}$ to 0. When this model is used to evaluate calibration of the MLR model on the development dataset, weak calibration holds: the calibration slopes $b_{MLR,k,j=k}$ equal 1 and the calibration intercepts $a_{MLR,k}$ equal 0. See Van Hoorde and colleagues for further elaboration in the context of prediction models for nominal outcomes.18

For CL-PO, the $K-1$ linear predictors are identical except for the intercepts (Equation (2)). Hence, for each linear predictor $\hat{L}_{CLPO,j}$, $j=2,\ldots,K$, a separate CL-PO calibration model is fit as follows:

$$\log \frac{P(Y \geq k)}{P(Y < k)} = a_{CLPO,k} + b_{CLPO,j} \hat{L}_{CLPO,j}, \quad \text{with } k=2,\ldots,K. \quad (12)$$

The calibration slopes equal $b_{CLPO,j}$, and the calibration intercepts equal $a_{CLPO,k=j}$ when $b_{CLPO,j}$ is fixed to 1. Similarly, for fitted AC-PO and CR-PO prediction models (Equations (4) and (6)), $K-1$ separate AC-PO or CR-PO calibration models are fit, one for each linear predictor $\hat{L}_{\cdot,j}$, $j=1,\ldots,K-1$:

$$\text{AC-PO:} \quad \log \frac{P(Y=k+1)}{P(Y=k)} = a_{ACPO,k} + b_{ACPO,j} \hat{L}_{ACPO,j}, \quad \text{with } k=1,\ldots,K-1, \quad (13)$$

$$\text{CR-PO:} \quad \log \frac{P(Y>k)}{P(Y \geq k)} = a_{CRPO,k} + b_{CRPO,j} \hat{L}_{CRPO,j}, \quad \text{with } k=1,\ldots,K-1. \quad (14)$$
