Humans and other animals exhibit a near-universal tendency to subjectively devalue future rewards as a function of their delay, a phenomenon known as “delay discounting”. The steepness with which future rewards are discounted has been implicated in a range of psychiatric disorders (Amlung et al., 2019) and associated with addictive (Amlung et al., 2017, MacKillop et al., 2011) and otherwise unhealthy (Story, Vlaev, Seymour, Darzi, & Dolan, 2014) behaviours.
A long-standing problem has been to determine the form of the “discount function” describing the decline in a reward’s subjective value with respect to its delay, that is, to determine the function δ satisfying the following equation:

uD = vD δ(t; p)

where uD, the subjective value of a delayed reward D, is equal to its face value vD multiplied by a function δ of its delay t, parameterized by some parameter vector p.1 For example, if t is equal to 1 year and δ(t; p) = 0.5, then $100 in 1 year has a subjective value of only $50. This means that an agent conforming to this discount function would be indifferent between $50 available immediately and $100 available in 1 year. Any function δ should satisfy the following constraints:
1. δ(0; p) = 1 (i.e., no delay implies no discounting)
2. if t2 > t1, then δ(t2; p) < δ(t1; p) (i.e., δ should be monotonically decreasing with respect to t)
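To make the definition concrete, the following minimal sketch checks both constraints numerically. The discount function δ(t) = 0.5^t is purely hypothetical (not one of the functions discussed in this article), chosen only because it reproduces the $100-in-1-year example above.

```python
# A hypothetical discount function, used purely for illustration:
# delta(t) = 0.5 ** t halves subjective value with every year of delay.
def delta(t):
    return 0.5 ** t

def subjective_value(v_delayed, t):
    # u_D = v_D * delta(t; p)
    return v_delayed * delta(t)

# Constraint 1: no delay implies no discounting.
assert delta(0) == 1.0

# Constraint 2: delta is monotonically decreasing in t.
ts = [0.0, 0.5, 1.0, 2.0, 5.0]
assert all(delta(a) > delta(b) for a, b in zip(ts, ts[1:]))

# The worked example: $100 in 1 year is subjectively worth $50, so the
# agent is indifferent between $50 now and $100 in a year.
print(subjective_value(100, 1))  # 50.0
```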
Many functions satisfying these constraints have been proposed. The classical model of Samuelson (1937) describes exponential discounting:

δ(t; k) = e^(−kt)

where k controls the rate at which subjective value declines with delay. In the exponential model, discounting compounds at a constant rate, namely e^(−k) per unit time. However, empirically, people exhibit declining discount rates with increasing delays (Thaler, 1981). For example, the subjective value of a reward may decline 10% with a delay of 1 month but only 50% with a delay of 12 months, whereas a constant compound discount rate of 10% per month would mean that after 12 months the subjective value of the reward is only 0.9^12 ≈ 28% of its face value.
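The compounding claim can be checked in a few lines; this is an illustrative sketch rather than anything from the article, with k set to −ln(0.9) so that subjective value declines by 10% per month.

```python
import math

def exponential_delta(t, k):
    # delta(t; k) = exp(-k t): discounting compounds at a constant
    # rate of exp(-k) per unit time.
    return math.exp(-k * t)

# A 10%-per-month decline corresponds to a per-month factor of 0.9,
# i.e. k = -ln(0.9) (illustrative value).
k = -math.log(0.9)

print(round(exponential_delta(1, k), 3))   # one month: 0.9
print(round(exponential_delta(12, k), 3))  # twelve months: 0.282, i.e. ~28%
```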
An alternative function that can account for this declining rate of discounting is the following, describing hyperbolic discounting (Mazur, 1987):

δ(t; k) = 1 / (1 + kt)

where k is again a parameter controlling the rate of discounting. It can be rewritten as

δ(t; k) = e^(−(log(1 + kt)/t)·t)

which describes exponential discounting with a discount rate log(1 + kt)/t that declines with increasing t. Hyperbolic discounting is widely used to quantify individual differences in delay discounting (Odum, 2011). However, it does not always provide an optimal fit to individual-level data (Franck, Koffarnus, House, & Bickel, 2015), and a wide range of alternative discount functions has been proposed. Table 1 provides a list of commonly used functions.
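The equivalence of the two forms, and the slowing of discounting with increasing delay, can be verified numerically; the value k = 0.1 below is an arbitrary illustrative choice.

```python
import math

def hyperbolic_delta(t, k):
    # delta(t; k) = 1 / (1 + k t)
    return 1.0 / (1.0 + k * t)

k = 0.1  # illustrative discount parameter
for t in [1.0, 6.0, 12.0]:
    d = hyperbolic_delta(t, k)
    # Equivalent exponential form: exp(-log(1 + k t))
    assert math.isclose(d, math.exp(-math.log(1.0 + k * t)))
    # Implied per-unit-time discount factor exp(-log(1 + k t) / t):
    # it rises toward 1 as t grows, i.e. the discount rate
    # log(1 + k t) / t declines with increasing delay.
    factor = math.exp(-math.log(1.0 + k * t) / t)
    print(t, round(d, 3), round(factor, 3))
```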
Given a discount function, it still remains to be specified how an individual will arrive at a given choice. A discount function specifies the “indifference point” at a certain delay (that is, the relative values of immediate and delayed rewards between which an individual is indifferent), but does not constrain the decision probabilities for reward values other than at the indifference point. For example, if δ(t; p) evaluates to 0.3 at t = 50 for a given individual, we would expect this individual to be indifferent between (i.e., equally likely to choose) $30 now and $100 in 50 days, but we would have no precise expectation as to their probability of choosing, say, $40 now over $100 in 50 days. More precise expectations require a probabilistic model that specifies not only the degree to which a delayed reward is discounted, but also how the discounted value of the delayed reward is weighed against the value of an immediate reward to produce the relative propensities of selecting each of these. In the Fechner model (Becker, DeGroot, & Marschak, 1963), decisions are assumed to be subject to some processing error:

P(I) = Pr(ɛ < uI − uD)

where P(I) is the probability of selecting an immediate reward I, uI and uD are the subjective values of I and D respectively, and ɛ is a random variable. When ɛ follows a logistic distribution with location parameter 0 and scale parameter 1/γ, we arrive at the well-known logistic choice rule:

P(I) = σ[γ(uI − uD)]

where σ[η] = (1 + e^(−η))^(−1) is the logistic function. More generally, choice rules of this type can be written as

P(I) = Fɛ[C(uI, uD); γ]

where Fɛ[η; γ] is the cumulative distribution function (CDF) of the random variable ɛ parameterized by some “sharpness” parameter γ (such that the CDF becomes steeper at the median for higher values of γ), and C(uI, uD) is a function comparing uI with uD.
When C(uI, uD) = uI − uD and Fɛ[η; γ] is the CDF Φ[γη] of the normal distribution with mean 0 and standard deviation 1/γ, we arrive at the probit choice rule:

P(I) = Φ[γ(uI − uD)]

Alternatively, when C(uI, uD) = uI/uD and Fɛ[η; γ] is the CDF of the log–logistic distribution with scale parameter 1 and shape parameter γ, we arrive at the power choice rule (Luce, 1959):

P(I) = 1 / (1 + (uI/uD)^(−γ))

which is usually written as

P(I) = uI^γ / (uI^γ + uD^γ)

Note that, for the power choice rule, the decision maker is assumed to consider the relative rather than the absolute values of the rewards. Given a discount function parameterized by p and a choice rule parameterized by γ, maximum likelihood estimation can be used to estimate optimal p̂ and γ̂ from an individual’s decisions.
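The three choice rules can be written down compactly. The sketch below (illustrative values only) confirms that each rule gives P(I) = 1/2 at indifference and that γ controls the sharpness of the choice function.

```python
import math

def logistic_choice(u_i, u_d, gamma):
    # P(I) = sigma[gamma (u_I - u_D)]
    return 1.0 / (1.0 + math.exp(-gamma * (u_i - u_d)))

def probit_choice(u_i, u_d, gamma):
    # P(I) = Phi[gamma (u_I - u_D)], standard normal CDF via erf
    return 0.5 * (1.0 + math.erf(gamma * (u_i - u_d) / math.sqrt(2.0)))

def power_choice(u_i, u_d, gamma):
    # P(I) = u_I^gamma / (u_I^gamma + u_D^gamma): compares relative values
    return u_i ** gamma / (u_i ** gamma + u_d ** gamma)

# At indifference (u_I == u_D) every rule gives P(I) = 1/2.
for rule in (logistic_choice, probit_choice, power_choice):
    assert math.isclose(rule(30.0, 30.0, 1.0), 0.5)

# Higher gamma sharpens the choice function around indifference.
print(round(logistic_choice(40.0, 30.0, 0.1), 3))  # ~0.731
print(round(logistic_choice(40.0, 30.0, 1.0), 3))  # ~1.0
```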
There are special cases where we can recover a discount function2 by manipulating the regression coefficients from a generalized linear model of decision behaviour, thereby avoiding nonlinear regression. Wileyto, Audrain-McGovern, Epstein, and Lerman (2004) introduce this approach for the hyperbolic discounting function, using the following logistic regression equation to model the probability of selecting the immediate reward:

P(I) = σ[β1(1 − vD/vI) + β2·t]

When P(I) = 1/2, an individual is indifferent between the immediate and delayed rewards, meaning their subjective values are equal:

uI = uD
vI = vD δ(t; p)

Thus, we can set P(I) = 1/2 and replace vI by vD δ(t; p):

1/2 = σ[β1(1 − 1/δ(t; p)) + β2·t]

Then we can rearrange the equation to obtain a discount function parameterized by the regression coefficients. Because σ[η] = 1/2 only when η = 0, we can write

β1(1 − 1/δ(t; p)) + β2·t = 0

Solving for δ(t; p), we arrive at

δ(t; p) = 1 / (1 + (β2/β1)·t)

which describes hyperbolic discounting with k = β2/β1. As shown in Table 2, we can extend this approach to both exponential discounting and scaled exponential discounting, again estimating the parameters of these discount functions in terms of logistic regression coefficients. Unfortunately, this approach is not possible for all discount functions. For example, the inverse q-exponential discount function (Green & Myerson, 2004) cannot be written in a form that is linear in its parameters (which is necessary for a discount function to be recoverable from a linear model).
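The algebra above can be checked numerically. In the sketch below the coefficients β1 = −1.5 and β2 = −0.15 are hypothetical, chosen so that the recovered discount rate is k = β2/β1 = 0.1; at every indifference point implied by the recovered hyperbolic function, the regression model indeed predicts P(I) = 1/2.

```python
import math

def sigma(eta):
    # logistic function
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical regression coefficients, chosen only for illustration.
b1, b2 = -1.5, -0.15
k = b2 / b1  # recovered hyperbolic discount rate: k = beta2 / beta1 = 0.1

def p_immediate(v_i, v_d, t):
    # Wileyto et al. (2004) model: P(I) = sigma[b1 (1 - vD/vI) + b2 t]
    return sigma(b1 * (1.0 - v_d / v_i) + b2 * t)

# At each indifference point implied by delta(t) = 1 / (1 + k t),
# i.e. v_I = v_D / (1 + k t), the regression predicts P(I) = 1/2.
v_d = 100.0
for t in [1.0, 10.0, 50.0]:
    v_i = v_d / (1.0 + k * t)
    assert math.isclose(p_immediate(v_i, v_d, t), 0.5)
print(round(k, 6))  # 0.1
```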
All of the choice rules we have considered (logistic, probit, power), as well as the special cases using generalized linear regression (Table 2), suffer from a shortcoming: they make implausible allowances for edge cases in which the immediate reward is nothing (vI = 0) and/or the immediate and delayed rewards have equal face values (vI = vD). For these cases, common sense would suggest that a probabilistic model of discounting should satisfy the following desiderata:
D1. Something rather than nothing: when the immediate reward has a value of 0, it should never be chosen; i.e., vI = 0 implies P(I) = 0. For example, we would expect someone never to choose $0 now over $100 in a year.
D2. Sooner rather than later: when the immediate and delayed rewards have equal dollar values, the immediate reward should always be chosen; i.e., vI = vD implies P(I) = 1. For example, we would expect someone to always choose $100 now over $100 in a year.
Contrary to these desiderata, the logistic and probit choice rules always give values of P(I) strictly greater than 0 and less than 1 (Fig. 1). Similarly, for the power choice rule and the special linear cases listed in Table 2, P(I) is always less than 1, even when vI = vD (Figs. 1 & 2), contrary to D2.3 A goal of the current study is to offer probabilistic models of delay discounting that satisfy both D1 and D2, and to determine whether these provide a better description of discounting behaviour than those that do not. In these models, at the endpoint values of vI/vD = 0 and vI/vD = 1, the probability of selecting the immediate reward will be fixed at 0 and 1, respectively. Thus we will refer to models satisfying both desiderata as “fixed-endpoint” models and to those satisfying neither or only one as “free-endpoint” models.
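A small sketch makes the endpoint problem concrete (the discount factor 0.7 and the γ values are arbitrary illustrative choices): the logistic rule violates both desiderata, while the power rule satisfies D1 but not D2.

```python
import math

def logistic_choice(u_i, u_d, gamma):
    # P(I) = sigma[gamma (u_I - u_D)]
    return 1.0 / (1.0 + math.exp(-gamma * (u_i - u_d)))

def power_choice(u_i, u_d, gamma):
    # P(I) = u_I^gamma / (u_I^gamma + u_D^gamma)
    return u_i ** gamma / (u_i ** gamma + u_d ** gamma)

delta = 0.7          # hypothetical discount factor at some delay
v_d = 100.0
u_d = v_d * delta    # discounted value of the delayed reward

# D1 (v_I = 0): the logistic rule still assigns P(I) > 0;
# the power rule happens to satisfy D1.
print(round(logistic_choice(0.0, u_d, 0.05), 3))  # 0.029
print(power_choice(0.0, u_d, 1.0))                # 0.0

# D2 (v_I = v_D): both rules give P(I) < 1.
print(round(logistic_choice(v_d, u_d, 0.05), 3))  # 0.818
print(round(power_choice(v_d, u_d, 1.0), 3))      # 0.588
```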
In practice, individuals will sometimes make decisions that do not adhere to D1 and D2. This is generally assumed to indicate inattention rather than genuine preference, and opportunities to deviate from these desiderata are sometimes built into experiments as attention checks (Athamneh et al., 2021, Craft et al., 2022, Lin et al., 2018, Pope et al., 2019, Stein et al., 2018). However, it is important to critically evaluate this assumption before building D1 and D2 into a model of discounting. Indeed, one can conceive of rationally intelligible reasons not to adhere to D1 and D2: for example, perhaps I know I would likely spend $100 received in a year on something harmful that only becomes available then, or perhaps the disutility of waiting for 1 year more than outweighs the utility of $100, and so I instead opt for nothing. Conversely, perhaps I want something that costs $100 and only becomes available in a year, and I know I would likely spend the $100 too early if I received it now, or perhaps I know I will enjoy anticipating the receipt of $100, and so I opt to postpone the reward (i.e., I choose $100 in a year over $100 now). Thus, another goal of this study is to examine the validity of D1 and D2. If they are indeed valid desiderata for a descriptive model of discounting, participants who do not adhere to them should tend to show signs of inattention.