

Journal of Environmental Economics and Management journal homepage: www.elsevier.com/locate/jeem

Risk aversion and adaptive management: Insights from a multi-armed bandit model of invasive species risk☆

Michael R. Springborn*

Department of Environmental Science and Policy, University of California Davis, 2104 Wickson Hall, Davis, CA 95616, United States

Article info

Article history: Received 4 March 2013; available online 10 July 2014

Abstract

This article explores adaptive management (AM) for decision-making under environmental uncertainty. In the context of targeting invasive species inspections of agricultural imports, I find that risk aversion increases the relative value of AM and can increase the rate of exploratory action. While calls for AM in natural resource management are common, many analyses have identified modest gains from this approach. I analytically and numerically examine the distribution of outcomes from AM under risk neutrality and risk aversion. The inspection decision is framed as a multi-armed bandit problem and solved using the Lagrangian decomposition method. Results show that even when expected gains are modest, asymmetry in the distribution of outcomes has important implications. Notably, AM can serve to buffer against large losses, even if the most likely outcome is a small loss.

© 2014 Elsevier Inc. All rights reserved.

Keywords: Adaptive management; Approximate dynamic programming; Multi-armed bandit; Decision-making under uncertainty; Bayesian learning; Invasive species; Risk aversion

Introduction

Most environmental management problems involve taking action under uncertainty. Doremus (2007) argues that uncertainty is the “signature challenge” of resource policy. The role of parametric or model uncertainty in particular is a “dominant” issue in the economics of climate change (Kelly and Kolstad, 1999). It features prominently in the dynamic management of many planning problems such as water resources (Freeze et al., 1990), fisheries (Sainsbury et al., 1997), and invasive species (Eiswerth and van Kooten, 2002). In contrast to stochasticity or noise, model uncertainty can be reduced over time through learning. Learning in environmental management may occur in several ways. Engaging in controlled R&D, separated from the system under management, is one way to gather information. For example, Kaplan et al. (2003) consider a dynamic learning situation in which pollution abatement effort can be diverted to collecting information about the contribution of different sources to the problem. Information acquisition and the resulting management (abatement) actions are treated as separate activities; opportunities to learn while undertaking abatement are not considered. However, hands-on management experience itself is often the primary way in which uncertainty in environmental models is reduced (Walters, 1986). This reasoning grounds the concept of adaptive management (AM) as outlined in foundational discussions by Holling (1978) and Walters and Hilborn (1978), which emphasized the need to “break out of the passive mode and learn to treat the acquisition of functional information” as deliberately “experimental” (Walters and Hilborn, 1978, p. 183).

☆ This research was supported by NSF/IGERT Program Grant DGE-0114437 and the U.S. Department of Agriculture's (USDA) Economic Research Service (ERS). The research benefited from discussion and data assistance from Donna Roberts and Peyton Ferrier of the USDA's ERS.

* Fax: +1 530 752 3350.

http://dx.doi.org/10.1016/j.jeem.2014.05.004
0095-0696/© 2014 Elsevier Inc. All rights reserved.

M.R. Springborn / Journal of Environmental Economics and Management 68 (2014) 226–242


Despite many calls for AM in natural resource management (Doremus, 2010), in the rare cases where such optimal endogenous learning is explicitly modeled the returns have been modest when compared to a “passive” AM approach in which there is no experimentation but policies are adjusted as information is incidentally revealed. For example, in the context of species translocation, Rout et al. (2009) find a relative expected gain from AM of 1–3%. Bond and Loomis (2009) explore the use of AM for water pollution management and estimate a 0.1% improvement. However, consideration of the dynamics of choice and learning under AM raises the question of whether a focus on the expected value of AM might be missing important elements of the story. The immediate opportunity cost of AM is given by the difference in reward between the best non-learning choice and the exploratory AM choice. Such AM opportunity costs are limited in the long run since exploration typically attenuates with learning. The benefit of AM lies in reducing future management error from imperfect understanding of a system, which is potentially large and ongoing. This asymmetry in upside risk versus downside risk suggests that the distribution of possible returns from AM might also be skewed. Furthermore, if beliefs about system mechanics are unbiased, then the most likely outcome might be a modest net loss, e.g. when exploration confirms initial notions of the best option. In this article, I explore the determinants and distribution of returns to AM, first through a simple two-period analytical model, and second through a fully dynamic numerical case study. The analysis is motivated by research questions relating to the existence and form of asymmetry in the distribution of outcomes under AM. If outcomes are indeed asymmetric, (1) what are the implications for the expected versus most likely outcome, and (2) what is the potential for AM to buffer against large losses? 
The latter prospect leads to further questions: what role does risk aversion play in determining both the extent of exploration and the relative gains from AM? A misconception among some economists and others is that an increase in risk aversion will increase the value of information (Eeckhoudt and Godfroid, 2000). However, Hilton (1981) shows using a simple example that increased risk aversion can result in a lower value of information. Hilton and those who have reiterated the result focus on the value of perfect information. In contrast, AM processes typically involve incremental learning. Furthermore, optimal AM involves consideration of the opportunity cost of any exploratory action. In the context of fisheries management, Hilborn and Walters (1992) have argued that risk aversion “will make experimental or probing adaptive policies look less worthwhile” (p. 498). This argument follows from the observation that risk aversion enhances the opportunity cost of an exploratory policy. However, if learning reduces the likelihood of large losses, one might expect the potential benefits of AM (the value of information) to also increase. While the AM literature is lacking in explicit analysis of risk aversion, it is acknowledged qualitatively that this element could play an important role (e.g., Groeneveld et al., 2013). One possible explanation for the modest expected returns from AM summarized above (relative to a passive learning approach) is that AM is not differentially informative. As explained by Walters (1986), it is possible in certain cases for a passively adaptive policy to “produce essentially as informative a sequence” as AM (p. 249). Springborn and Sanchirico (2013) illustrate this in a fisheries context, finding that returns to an exploratory policy are small unless a departure from the passively adaptive policy is required to facilitate learning. 
I explore conditions under which AM fails or succeeds in producing a differentially informative path and thus net gains from exploration. To assess the research questions above, the value of AM is explored in the context of adaptive allocation of scarce trade inspection resources for invasive species. Agricultural and environmental systems are vulnerable to the introduction of pests and diseases which arrive mainly via the movements of traded goods. Despite a large scale effort to mitigate risk through border inspections in the U.S., resources for inspection are constrained. Thus the need to gather and use information to target inspections is widely acknowledged. Addressing the decision problem of targeting, for example based on commodity and origin, is complicated by imperfect information over the actual hazard posed by any particular source. In general terms, the dynamic management problem of interest here involves repeatedly allocating a given pool of management resources across a set of alternatives, each with an uncertain probability of providing a reward. Examples include allocating land parcels to different treatments, water resources to various demands, and compliance inspections across potential targets. Each of these involves apportioning units of constrained resources (parcels of land, units of water, inspections) to a particular management option (land use, water use, inspection target) generating both an immediate payoff and information for future decisions. Fully adaptive management has an intuitive appeal but can be difficult to implement in reality (Lee, 1999; Doremus, 2007). For example, when there exists only a rough sense that one management option has greater learning value than another, the value of learning may be only subjectively assessed and not immediately comparable with the opportunity cost of exploration (e.g. McDaniels, 1995). Another typical limitation is to largely constrain the potential true states of nature. 
For example, Bond and Loomis (2009) advance the typical treatment of a shallow lake pollution model to incorporate learning, but limit uncertainty over a nutrient threshold to two possible levels. Other examples of using a limited set of alternatives include applications to waterfowl (Johnson, 2011) and fisheries (Sainsbury et al., 1997). Another common simplifying constraint imposed on AM problems is to limit the information-gathering phase, either to occur exclusively in the first period (e.g. Costello and Karp, 2004) or to continue until a subjective level of confidence is achieved (e.g. Sainsbury et al., 1997). The stochastic dynamic optimization framework illustrated here contributes to the adaptive resource management literature by allowing for learning about multiple parameters that may take any value in a continuous range while explicitly estimating the value of information from exploration. A useful framework for addressing the exploitation–exploration tradeoff in this context is the multi-armed bandit (MAB) model, in which a decision maker chooses between alternatives with reward processes that operate like slot machines or “one-armed bandits” (Weber, 1992). The true probability of receiving a reward from each alternative arm is not perfectly known. However, observing successive pulls of various levers reduces uncertainty over parameters characterizing the chance of success. The MAB model has been used in many settings,¹ but this article is, to my knowledge, the first operational application to an environmental AM problem. While MAB problems are often intractable given the curse of dimensionality, methods such as approximate dynamic programming, which identify an optimal solution to an approximation of the problem, can offer a way forward. The approximate dynamic programming technique used here is Lagrangian decomposition (Fisher, 2004). The technique decomposes a computationally prohibitive optimization problem into manageable subproblems in which alternatives are solved independently and then recombined.

Analytical and numerical results show that even when expected returns to AM are modest, AM can indeed play an important role in buffering against large losses given asymmetry in outcomes. At the same time, expectations for performance should be tempered by the observation that the most likely outcome from AM is a small increase in losses, for example when limited but costly exploration confirms initial expectations. Further, even though risk aversion increases the relative value of AM, the effect on the extent of exploration is mixed. While numerical results show that risk aversion drives faster exploration in the case study explored here, results from the analytical model suggest that this may not hold in general.

To illustrate the AM allocation problem in the context of invasive species mitigation, I use an import inspections data set provided by the Animal and Plant Health Inspection Service (APHIS) of the U.S. Department of Agriculture (USDA). The records include the outcomes of approximately a decade of fruit and vegetable inspections for exotic pests at U.S. ports (1996–2006).
During that time, over seven million shipments were inspected at 144 U.S. ports of entry. Chayote, olives, oranges and tomatoes are examples of the 284 commodities arriving from over 190 different countries. Almost 62 thousand shipments (0.8%) were found to be infested with a pest species of concern.

In the next section, I formally describe the integrated mitigation–learning decision problem in a basic MAB framework and present results from a simple example to develop intuition. The setup for the decision problem closely follows Springborn et al. (2010). However, the optimal endogenous learning solution illustrated here departs significantly from the simple, randomized learning strategy examined by Springborn et al. (2010).

A Bayesian learning model and adaptive control framework

Let $p_j$ represent the probability that the production techniques of a particular source $j$ (commodity–country pair) result in the infestation of a shipment.² Since this hazard rate is not known with certainty, let beliefs about the true value of $p_j$ be represented by a beta distribution. This distribution flexibly characterizes densities over the interval $[0, 1]$, is easily updated in a Bayesian way as observations accrue, and is commonly used for similar processes. Each inspection of a shipment is modeled as a Bernoulli trial where $p_j$ is the probability that a shipment is infested. For simplicity, it is assumed that inspections always identify infestations when they are present. Let $s_j^t$ and $f_j^t$ represent the so-called “hyperparameters” of the beta distribution, describing beliefs over the hazard parameter $p_j$ at time $t$: $p_j \sim \text{Beta}(s_j^t, f_j^t)$. Applying Bayes' rule to update this distribution with observed inspection results yields very simple information dynamics. If the Bernoulli trial leads to a “success” (infestation present), then the failure parameter is unchanged ($f_j^{t+1} = f_j^t$) and the success parameter is incremented by one: $s_j^{t+1} = s_j^t + 1$.
Conversely, given a “failure” (clean shipment), the success parameter is unchanged ($s_j^{t+1} = s_j^t$) and the failure parameter is incremented by one: $f_j^{t+1} = f_j^t + 1$. As the sum of the two hyperparameters increases with observations, the variance of beliefs over $p_j$ generally falls (uncertainty is reduced). The sum of the beta hyperparameters ($s_j^t + f_j^t$) is often loosely referred to as the “sample size” because each observation of a Bernoulli trial augments the sum by one, typically increasing the concentration of beliefs. The initial framework, with simple Bayesian learning, is similar to that of Bertsimas and Mersereau (2007), who consider the adaptive learning of marketing message effectiveness. Below I formally describe the Bayesian adaptive control problem with the following notation:

$J$: The number of import sources (country–commodity pairs). Individual import sources will be indexed by $j$.

$K$: The total number of inspections allocated simultaneously in a period. The classic MAB problem restricts the number of pulls to one ($K = 1$).

$\tilde{s}_j, \tilde{f}_j$: The initial hyperparameters of the beta distribution over $p_j$. These values fully specify the subjective distribution of prior beliefs and can be established based on any information available at the outset (e.g. risk analyses and pre-existing inspection data).

¹ Examples include experiment choice in clinical trials, path selection in network routing, setting prices to uncover market demand, job selection and others. See Bergemann and Välimäki (2006) for an overview.

² Although the unit of inspections at U.S. ports is typically a bill of lading or manifest which may include multiple commodities, the focus in this article is on inspections of single-commodity shipments. When port inspections for possible pest risk are conducted, imports may be identified as problematic in several different ways. Containers or products may be found to be contaminated with unwanted material, including leaves and soil. Import documents may be found to have discrepancies with the accompanying shipments. Attention here is restricted to the single category of finding an actionable pest in, on, or with the product as the primary direct measure of infestation hazard.


$s_j^t, f_j^t$: The updated hyperparameters of the beta distribution, incorporating all observations up to time $t$.

$x_j^t$: The control or decision variable indicating the number of shipments from source $j$ that are inspected in period $t$.

The vector of controls at period $t$ is given by $\mathbf{x}^t = (x_1^t, \dots, x_J^t)$. The level of the decision variable is naturally constrained by the number of shipments received from $j$ at time $t$: $x_j^t \le \bar{x}_j^t,\ \forall j = 1, \dots, J$. The classic MAB problem imposes the constraint $\bar{x}_j^t = 1$ (i.e. each arm is available only once per period).

$y_j^t$: The number of shipments from source $j$ in time $t$ which were inspected and found to be infested.
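The beta-Bernoulli updating rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the article's code; the class name `BetaBelief` and its members are hypothetical. It shows that each inspection adds exactly one to the “sample size” $s_j^t + f_j^t$, and that the variance of beliefs usually falls but can rise after a surprising infestation from a source believed to be fairly clean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Hypothetical container for Beta(s, f) beliefs over one source's hazard p_j."""
    s: float  # "success" (infested) hyperparameter
    f: float  # "failure" (clean) hyperparameter

    def update(self, infested: bool) -> "BetaBelief":
        # Bayes' rule for a single Bernoulli inspection outcome
        if infested:
            return BetaBelief(self.s + 1, self.f)
        return BetaBelief(self.s, self.f + 1)

    @property
    def mean(self) -> float:
        return self.s / (self.s + self.f)

    @property
    def variance(self) -> float:
        n = self.s + self.f
        return self.s * self.f / (n * n * (n + 1))

    @property
    def sample_size(self) -> float:
        return self.s + self.f


prior = BetaBelief(1, 10)           # prior mean 1/11, i.e. a source believed fairly clean
after_clean = prior.update(False)   # Beta(1, 11): variance falls
after_hit = prior.update(True)      # Beta(2, 10): variance rises
print(prior.variance, after_clean.variance, after_hit.variance)
```

The counter-example in the last lines is why the concentration of beliefs increases only “typically”: one unexpected infestation can widen the posterior before further inspections narrow it again.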

The dynamic programming framework includes the state, control variables, randomness, dynamics and rewards (Hawkins, 2003; Bertsimas and Mersereau, 2007):

State: The state of the system in period $t$ is fully described by the vector of hyperparameters (of the beta distribution) for all sources: $(\mathbf{s}^t, \mathbf{f}^t) = (s_1^t, \dots, s_J^t, f_1^t, \dots, f_J^t)$.

Control variables: The decision or control is given by $\mathbf{x}^t = (x_1^t, \dots, x_J^t)$. The total number of inspections in time $t$ is constrained by the number of available inspections: $\sum_{j=1}^{J} x_j^t = K$.

Randomness: Inspecting $x_j^t$ shipments from source $j$ results in the discovery of $y_j^t$ infested shipments. Thus $y_j^t$ is a beta-binomial random variable with parameters $x_j^t$ and $p_j$, where beliefs over $p_j$ follow a beta distribution with parameters $s_j^t$ and $f_j^t$.

Dynamics: Updating of the state of the system according to Bayes' rule is given by $\mathbf{s}^{t+1} = \mathbf{s}^t + \mathbf{y}^t$ and $\mathbf{f}^{t+1} = \mathbf{f}^t + \mathbf{x}^t - \mathbf{y}^t$.

Reward: The expected payoff in a period is equal to the expected averted damages, $d$. Let $c_j$ represent the constant expected damage from failing to intercept an infested shipment from source $j$. Note that the expected value of a beta random variable with parameters $s_j^t$ and $f_j^t$ is $s_j^t / (s_j^t + f_j^t)$. Given a risk-neutral decision maker, the expected averted damage for period $t$ is therefore given by

$$E\left[\sum_{j=1}^{J} d_j^t\right] = E\left[\sum_{j=1}^{J} c_j y_j^t\right] = \sum_{j=1}^{J} c_j x_j^t\, E^t[p_j] = \sum_{j=1}^{J} c_j x_j^t \left(\frac{s_j^t}{s_j^t + f_j^t}\right), \qquad (1)$$

where $E^t[p_j]$ denotes the expected value of $p_j$ given beliefs at time $t$. Only the relative values of $c_j$ are needed as input to the decision model. While the expected damage from an infestation is an important uncertain decision parameter in its own right, I will set $c_j = 1$ in order to concentrate on learning about the hazard parameters, $p_j$, for each source. Modeling and implications of a risk-averse decision maker are discussed in Section “Analytical insights from a two-source, two-period adaptive management problem” below.³
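The Randomness, Dynamics and Reward components can be illustrated with a short Python sketch. The helper names (`beta_binomial_pmf`, `update_state`, `expected_averted_damages`) and the three-source numbers are hypothetical; $c_j = 1$ by default, as in the text. The sketch computes the beta-binomial probability of finding $y$ infested shipments among $x$ inspected, applies the batch Bayes update, and evaluates Eq. (1). The usage lines also check that the beta-binomial mean $x_j^t s_j^t/(s_j^t + f_j^t)$ equals source $j$'s term in Eq. (1).

```python
from math import comb, exp, lgamma


def log_beta(a: float, b: float) -> float:
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(y: int, x: int, s: float, f: float) -> float:
    """P(y infested among x inspected) when beliefs are p ~ Beta(s, f)."""
    return comb(x, y) * exp(log_beta(y + s, x - y + f) - log_beta(s, f))


def update_state(s, f, x, y):
    """Bayes update for all sources: s' = s + y and f' = f + x - y."""
    s_next = [sj + yj for sj, yj in zip(s, y)]
    f_next = [fj + xj - yj for fj, xj, yj in zip(f, x, y)]
    return s_next, f_next


def expected_averted_damages(x, s, f, c=None):
    """Eq. (1): sum over sources of c_j * x_j * s_j / (s_j + f_j)."""
    if c is None:
        c = [1.0] * len(x)  # the text sets c_j = 1
    return sum(cj * xj * sj / (sj + fj) for cj, xj, sj, fj in zip(c, x, s, f))


# Three hypothetical sources; K = 3 inspections allocated as (2, 1, 0).
s, f, x = [1, 2, 1], [9, 8, 19], [2, 1, 0]
print(expected_averted_damages(x, s, f))  # 2*0.1 + 1*0.2 + 0*0.05

# The beta-binomial mean for source 0 matches its term in Eq. (1).
mean_y0 = sum(y * beta_binomial_pmf(y, x[0], s[0], f[0]) for y in range(x[0] + 1))
print(mean_y0)

# Suppose the inspections found y = (1, 0, 0) infested shipments.
print(update_state(s, f, x, [1, 0, 0]))  # ([2, 2, 1], [10, 9, 19])
```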

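In each period $t$, the decision maker chooses how many shipments from each source $j$ to inspect (i.e. a vector of controls, $\mathbf{x}^t$) to maximize the total expected payoffs over an infinite horizon:

$$EU = \max_{\mathbf{x}^t} \left\{ \sum_{j=1}^{J} x_j^t \left(\frac{s_j^t}{s_j^t + f_j^t}\right) + \sum_{w=t+1}^{\infty} \beta^{w-t} \sum_{j=1}^{J} E\left[ x_j^w \left(\frac{s_j^w}{s_j^w + f_j^w}\right) \right] \right\}$$

$$\text{s.t.} \quad \sum_{j=1}^{J} x_j^t = K, \qquad x_j^t \in \{0, 1, 2, \dots, \min\{\bar{x}_j^t, K\}\} \quad \forall j = 1, \dots, J, \qquad (2)$$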

where the expectation is taken over all possible future paths for the state and control variables. Inspections are limited by the inspection budget ($K$) and by the number of available shipments from a given source ($\bar{x}_j^t$). Note that the first summand in Eq. (2) could be combined with the second for a more concise expression which sums over time $w = t, \dots, \infty$. The current period $t$ expected payoff appears separately in Eq. (2) to highlight two ideas. First, it makes explicit the distinction between immediate and future returns to the current choice of inspections ($\mathbf{x}^t$). Second, the separation makes clear that the time-$t$ state $(\mathbf{s}^t, \mathbf{f}^t)$ is known while future states (and hence the control vectors they motivate) enter in expectation. The programming problem above is essentially $J$ independent dynamic optimization problems which are linked by the constraint over the number of available inspections ($K$) in a period (Hawkins, 2003, p. 28). This problem may, in principle, be solved over a finite horizon using backwards induction. In the final period $T$, the optimal action is simply to exhaust the inspection budget by inspecting the shipments with the greatest expected probability of infestation. Then one can back up through time and solve each period conditional on the state $(\mathbf{s}^t, \mathbf{f}^t)$. For most problems of practical importance this direct solution method is computationally prohibitive.

³ In this model it is assumed that payoffs are independent. The application of a treatment (inspection) to a particular target does not affect payoffs from other concurrent or future decisions, i.e. the true underlying hazard parameters are fixed. A related literature considers pollution control as a game between polluters and a regulator with incomplete information regarding the cost of abatement and thus the propensity for non-compliance (e.g. Harford and Harrington, 1991; Garvie and Keeler, 1994).
Game theoretic extensions are reserved for further discussion in the conclusion to allow focus on the informational model and dynamics of learning without strategic response.
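For very small instances, the backwards-induction approach to Eq. (2) can be carried out directly. The Python sketch below is illustrative only and assumes one inspection per period ($K = 1$, the classic MAB case), ample shipments from every source, $c_j = 1$, and a finite horizon; the function name `solve_finite_horizon` is hypothetical. In the last period it simply picks the source with the highest posterior mean, and it backs up through time over every reachable belief state. The number of such states grows quickly with $J$ and the horizon, which is why the direct method becomes prohibitive.

```python
from functools import lru_cache


def solve_finite_horizon(priors, horizon, beta=0.95):
    """Exact backward induction for a J-source Bernoulli bandit with K = 1.

    priors: list of (s, f) Beta hyperparameters, one pair per source.
    Returns (optimal expected discounted interceptions, index of first source to inspect).
    """

    @lru_cache(maxsize=None)
    def value(state, periods_left):
        # state: tuple of (s, f) pairs; payoff per period is the chance of an interception
        if periods_left == 0:
            return 0.0, None
        best_val, best_arm = float("-inf"), None
        for j, (s, f) in enumerate(state):
            mean = s / (s + f)
            hit = state[:j] + ((s + 1, f),) + state[j + 1:]    # infested: s + 1
            miss = state[:j] + ((s, f + 1),) + state[j + 1:]   # clean: f + 1
            future = (mean * value(hit, periods_left - 1)[0]
                      + (1 - mean) * value(miss, periods_left - 1)[0])
            total = mean + beta * future
            if total > best_val:
                best_val, best_arm = total, j
        return best_val, best_arm

    return value(tuple(priors), horizon)


# A well-studied source (mean 0.30) versus a barely studied one (mean 0.25).
best_value, first_arm = solve_finite_horizon([(30, 70), (1, 3)], horizon=8, beta=0.95)
print(best_value, first_arm)
```

With one period left, the continuation term is zero, so the choice reduces to the highest posterior mean, matching the final-period rule stated in the text.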


The multi-armed bandit framework

Initial traction for the MAB problem came from two key results by Gittins and Jones (1972). The first result is that there exists an index value for each arm which informs an optimal priority-index policy; in each period it is optimal to pull the arm with the highest index. The second result is that the index value for a particular arm depends only on the state of that arm (e.g. $s_j^t$ and $f_j^t$) and no other arm. The overall multi-armed bandit problem can therefore be decomposed into multiple independent single-armed problems. This dynamic allocation index is commonly called the Gittins Index (GI). While the priority-index result vastly simplifies the problem to one of calculating the individual GIs, it is still typically impossible to obtain an analytical solution (Bergemann and Välimäki, 2006). However, recent advances in approximate dynamic programming provide a way forward. Before describing the implementation of this numerical method, I explore analytically, in the next section, a simplified two-source, two-period version of the problem to develop intuition.

Analytical insights from a two-source, two-period adaptive management problem

The objective of this section is to develop insight into the question of whether important richness in AM outcomes is lost when focusing on the expected value. I explore asymmetry in outcomes and the role of risk aversion in determining both the extent of exploration and the relative returns to AM. Consider a two-source, two-period problem ($j \in \{A, B\}$, $t \in \{i, ii\}$) in which a single inspection is allocated in each time step. I assume that $p_B$, the probability of success (infestation) for source B, is known with certainty. Beliefs on the unknown probability of success for source A are given by $p_A \sim \text{Beta}(s^t, f^t)$, where the subscript A on the belief parameters has been suppressed. In period $t$, let $x^t = 1$ represent the decision to inspect A and $x^t = 0$ denote inspection of B.
When damages are linear in the number of accepted infested shipments (risk neutrality), maximizing averted damages via interceptions results in the same optimal choice as minimizing damages from accepted infested shipments. However, when the damage function reflects risk aversion, nonlinearity in payoffs requires focusing on the latter (minimization) form of the problem. To incorporate risk aversion, linear expected damages (given simply by $p_j$) are extended to be quadratic. If A is inspected, then B is not inspected and expected damages are given by $d(x^t = 1; m) = m p_B^2 + p_B$, where the quadratic coefficient, $m \ge 0$, indicates the relative importance of the nonlinearity. If B is inspected, then expected damages from the A shipment are given by $d(x^t = 0; m \mid s^t, f^t) = m E(p_A^2 \mid s^t, f^t) + E(p_A \mid s^t, f^t)$. This functional form incorporates aversion to reducible uncertainty around the parameter $p_j$ but not aversion to randomness in the Bernoulli trial (infestation outcome) conditional on $p_j$. This form was chosen since it captures aversion to the type of uncertainty of most interest (reducible) while maintaining tractability for both the analytical model and the numerical case study.⁴

Given initial beliefs $(s^i, f^i)$, the optimal decision in period i, and the role played by the expected value of information (VOI) gleaned from inspecting A ($x^i = 1$), will depend on the level of $p_B$:

$$x^{i*} = \begin{cases} 1 & \text{even without incorporating the VOI if } p_B \in [0, \underline{p}_B], \\ 1 & \text{only when incorporating the VOI if } p_B \in (\underline{p}_B, \bar{p}_B), \\ 0 & \text{even when incorporating the VOI if } p_B \in [\bar{p}_B, 1]. \end{cases} \qquad (3)$$

The region over which incorporating the benefits of learning (VOI) serves to change the first-period optimal decision from inspecting B ($x^{i*} = 0$) to inspecting A ($x^{i*} = 1$) is given by $p_B \in (\underline{p}_B, \bar{p}_B)$. Below this range, $p_B$ is low enough that even the immediate (non-learning) incentive to choose A is sufficient for it to dominate. Above the range $(\underline{p}_B, \bar{p}_B)$, $p_B$ is sufficiently high that B still dominates even if the VOI is acknowledged. The boundaries of this range can be derived using the insight that, at these points, the decision maker is indifferent between A and B (i.e. their payoffs are equal), conditional on (1) ignoring learning when identifying the lower bound ($\underline{p}_B$), and (2) incorporating learning when identifying the upper bound ($\bar{p}_B$). Expressions for these threshold values under both risk neutrality ($\underline{p}_B^{Neu}, \bar{p}_B^{Neu}$) and risk aversion ($\underline{p}_B^{Avr}, \bar{p}_B^{Avr}$) are derived in Appendix “Derivation of the adaptive management exploratory range”. Given the model specification above, we can now consider the first question of whether an exclusive focus on expected returns from AM is warranted, i.e. whether there are insights to be gained from considering the distribution of potential outcomes. Within the exploratory range $(\underline{p}_B, \bar{p}_B)$, let $D(x^i, m \mid s^i, f^i, p_B)$ represent expected two-period damages under a decision in the first period that is either exploratory ($x^i = 1$) or not ($x^i = 0$), and under risk neutrality ($m = 0$) or risk aversion ($m > 0$). Explicit expressions for $D$ are derived in Appendix “Derivation of the adaptive management exploratory range” (Eqs. (12) and (13)). Expected (optimal) AM gains are given by $\Delta D(s^i, f^i, p_B, m) = D(x^i = 0, m \mid s^i, f^i, p_B) - D(x^i = 1, m \mid s^i, f^i, p_B)$. The expected reduction in losses from adopting optimal AM is strictly positive when $p_B$ lies within the exploratory range, i.e. $\Delta D(s^i, f^i, p_B, m) > 0$ if $p_B \in (\underline{p}_B, \bar{p}_B)$.
However, as shown here in the baseline case of risk neutrality, we are not assured that the distribution of changes in losses from AM is symmetric, nor that hoped-for reductions are even the most likely outcome.

⁴ Another reasonable form for nonlinear damages is $\tilde{d}(x^t; m) = m E(y^2 \mid s^t, f^t) + E(y \mid s^t, f^t)$, where $y$ represents the number of accepted infested shipments. When inspecting A and accruing damages from B, $\tilde{d}(x^t = 1; m)$ is equivalent to $d(x^t = 1; m)$ since $p_B$ is fixed. When inspecting B, if $n$ shipments from A are accepted, then $\tilde{d}(x^t = 0; m) = m n p_A (1 - p_A + n p_A) + n p_A = n[(1 + m) p_A + (n - 1) m p_A^2]$. However, under the current scenario of one shipment ($n = 1$) the function $\tilde{d}(\cdot)$ simplifies to a linear form: $(1 + m) p_A$. $\tilde{d}(\cdot)$ is a quadratic function of $p_j$ for all shipment levels above one. Risk aversion is modeled here using $d(\cdot)$ instead, in part to maintain the quadratic form of the function under the simplified setting of one shipment.
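The asymmetry can be made concrete by brute-force enumeration. The Python sketch below is illustrative only: it assumes risk neutrality, a hypothetical prior $p_A \sim \text{Beta}(1, 3)$ (mean 0.25), a known $p_B = 0.26$, and $\beta = 1$. The greedy policy inspects B in both periods, so both A shipments are accepted; the AM policy inspects A first and inspects A again in period ii only if the first A shipment was infested (the better period-ii choice here, since the posterior mean after an infestation, 0.4, exceeds $p_B$, while after a clean shipment it falls to 0.2). The function names are hypothetical. For these values the expected change in losses from AM is negative (exactly $-0.025$), yet an increase in losses is more probable than a decrease (about 0.205 versus 0.175), and the largest possible reduction (2 units) is twice the largest possible increase (1 unit).

```python
from itertools import product


def joint_a_outcomes(s, f):
    """P(z_A^i, z_A^ii) when p_A ~ Beta(s, f) and both A shipments are
    conditionally independent Bernoulli(p_A) draws."""
    n = (s + f) * (s + f + 1)
    return {(1, 1): s * (s + 1) / n, (1, 0): s * f / n,
            (0, 1): f * s / n, (0, 0): f * (f + 1) / n}


def loss_change_distribution(s, f, p_b, beta=1.0):
    """Distribution of (AM losses - greedy losses) over all infestation outcomes."""
    dist = {}
    for (za1, za2), prob_a in joint_a_outcomes(s, f).items():
        for zb1, zb2 in product((0, 1), repeat=2):
            prob_b = (p_b if zb1 else 1 - p_b) * (p_b if zb2 else 1 - p_b)
            # Greedy: inspect B twice, so both A shipments are accepted.
            greedy = za1 + beta * za2
            # AM: inspect A; in period ii re-inspect A only if z_A^i = 1.
            am = zb1 + beta * (za1 * zb2 + (1 - za1) * za2)
            delta = am - greedy
            dist[delta] = dist.get(delta, 0.0) + prob_a * prob_b
    return dist


dist = loss_change_distribution(s=1, f=3, p_b=0.26)
expected = sum(k * v for k, v in dist.items())
p_increase = sum(v for k, v in dist.items() if k > 0)
p_decrease = sum(v for k, v in dist.items() if k < 0)
print(sorted(dist), expected, p_increase, p_decrease)
```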


Proposition I. Conditional on the two-source, two-period problem, suppose the decision maker is risk neutral ($m = 0$), $f^i > 1$, and discounting is negligible ($\beta = 1$). Then, when adopting AM leads to a strict decrease in expected losses, the most likely change in losses is strictly positive for all $p_B \in (\underline{p}_B, \bar{p}_B)$.

Proof. Let the true infestation state for shipments from both A and B in the two periods be given by the random vector $Z = \{z_A^i, z_A^{ii}, z_B^i, z_B^{ii}\}$, where $z_j^t = 1$ indicates infested and $z_j^t = 0$ indicates not infested. Under risk neutrality ($m = 0$) and for any realization of $Z$, greedy strategy losses are $z_A^i + \beta z_A^{ii}$, and AM strategy losses are $z_B^i + \beta\left(z_A^i z_B^{ii} + (1 - z_A^i) z_A^{ii}\right)$. Subtracting greedy from AM losses and simplifying provides

$$\Delta D(Z) = (z_B^i - z_A^i) + \beta z_A^i (z_B^{ii} - z_A^{ii}). \qquad (4)$$

Under the assumptions given, the set of possible outcomes is given by $\Delta D \in \{-2, -1, 0, 1\}$. A complete proof of the proposition requires showing that $\Pr(\Delta D > 0) - \Pr(\Delta D < 0) > 0$, which is provided in Appendix “Proof of Proposition I”.⁵ □

The setting above also illustrates that AM can help to buffer against large losses. While AM carries a maximum downside risk of a 1 unit increase in losses, it sometimes results in a 2 unit decrease in losses (i.e. when $Z = \{z_A^i = 1, z_A^{ii} = 1, z_B^i = 0, z_B^{ii} = 0\}$). It is not possible for losses to increase by 2 under AM since observing a clean shipment in period i induces identical actions under AM and non-AM in the second period (inspect B). Given that AM buffers against large losses in the risk-neutral case above, a natural question is whether gains from AM are greater under risk aversion. Mathematically, the question is whether $\Delta D(s^i, f^i, p_B, m > 0) > \Delta D(s^i, f^i, p_B, m = 0)$. This comparison is complicated by the fact that the exploratory range $(\underline{p}_B, \bar{p}_B)$ varies depending on whether the decision maker is risk neutral or risk averse, to the extent that the ranges may not even overlap. Thus, comparing relative gains from AM at a single, fixed value of $p_B$ is not informative. To enable a comparison, first note that gains from AM are zero (by definition) at the top of the exploratory region, $\bar{p}_B$, and greatest at the bottom of the exploratory region, $\underline{p}_B$.⁶ Gains from AM ($\Delta D$) across the exploratory region can thus be visualized as a wedge, with maximum potential gains at $\underline{p}_B$ that decline to zero by $\bar{p}_B$. The result of interest is then stated in the following terms.

Proposition II. The maximum potential gains from AM under risk aversion are greater than those under risk neutrality, i.e. $\Delta D(s^i, f^i, p_B = \underline{p}_B^{Avr}, m > 0) > \Delta D(s^i, f^i, p_B = \underline{p}_B^{Neu}, m = 0)$.
Furthermore this relative dominance in maximum potential gains under B B risk aversion is: (1) increasing in the weight placed on the non-linear component of the damage function, m; (2) increasing in the i discount factor, β; and (3) decreasing in the confidence of beliefs as described by the “sample size”, si þf . ; m 4 0Þ and ΔDNeu ¼ ΔDðsi ; f ; pB ¼ p Neu ; m ¼ 0Þ. Using the expressions for damages in Proof. Let ΔDAvr ¼ ΔDðsi ; f ; pB ¼ p Avr B B Eqs. (12) and (13) from Appendix “Derivation of the adaptive management exploratory range”, the difference in maximum potential gain is given by i

$$\Delta D^{Avr} - \Delta D^{Neu} = m\beta\,\left[1 - E(p_A \mid s^i, f^i)\right]\left[E(p_A^2 \mid s^i, f^i) - E(p_A^2 \mid s^i, f^i + 1)\right] > 0. \qquad (5)$$

The components of the last factor in brackets in the product above are given by the second moment of a beta random variable: $E(p_A^2 \mid s^i, f^i) - E(p_A^2 \mid s^i, f^i + 1) = \frac{s^i}{s^i + f^i}\cdot\frac{s^i + 1}{s^i + f^i + 1} - \frac{s^i}{s^i + f^i + 1}\cdot\frac{s^i + 1}{s^i + f^i + 2} > 0$.⁷ Since $m\beta > 0$ and $E(p_A \mid s^i, f^i) \in (0, 1)$ for positive beta parameters, we know that $\Delta D^{Avr} - \Delta D^{Neu} > 0$. Eq. (5) is clearly increasing in $m$ and $\beta$. Finally, as $s^i + f^i$ approaches infinity, the last factor in brackets, $E(p_A^2 \mid s^i, f^i) - E(p_A^2 \mid s^i, f^i + 1)$, approaches zero. □

It is intuitive that the additional maximum potential gains from AM under risk aversion relative to risk neutrality are decreasing in the discount rate and in confidence. This is because the same is true for non-relative gains from AM ($\Delta D$) in general: the less uncertain and the less forward looking the decision maker, the lower the expected value of information. This finding is in contrast to the typical conclusion reached in fisheries stock assessment, as noted in the introduction, that risk aversion will make exploration policies “look less worthwhile” (Hilborn and Walters, 1992, p. 498). Such an argument follows from a setting in which risk aversion enhances the opportunity cost of an experimental policy. This effect is also at work in the present setting, where risk aversion enhances the opportunity cost of the experimental choice of inspecting A instead of B.⁸ However, the results here show that this opportunity cost effect is outweighed by the increase in benefits (averted damages) from learning.

Given that the maximum potential gains from AM are greater under risk aversion, we might expect that risk aversion is also associated with a greater propensity to take exploratory action. However, this is not necessarily the case, as formalized in the following proposition.

⁵ This result (that the most likely outcome of a shift to AM can be an increase in losses even when expected losses decrease) will still hold in part when some of the assumptions are relaxed. For example, relaxing the constraint on $f^i$, when $(s^i, f^i) = (1, 1)$ the most likely outcome is still an increase in losses for all $p_B \in (\underline{p}_B, \overline{p}_B) = (0.500, 0.555)$, except for the very bottom of the range where $p_B \in (0.500, 0.501)$.

⁶ This latter observation follows from the fact that $D$ under the non-exploratory decision ($x^i = 0$) is flat over $p_B$, while under the exploratory decision it is upward sloping. Mathematically, this observation is shown by the sign of the derivative of Eqs. (12) and (13): $dD(x^i = 0, m \mid s^i, f^i, p_B)/dp_B = 0$ and $dD(x^i = 1, m \mid s^i, f^i, p_B)/dp_B > 0$.

⁷ The positive sign on this component follows from the fact that $s^i/(s^i + f^i) > s^i/(s^i + f^i + 1)$ and $(s^i + 1)/(s^i + f^i + 1) > (s^i + 1)/(s^i + f^i + 2)$.

⁸ Mathematically, this can be expressed as $d(x_t = 1, m > 0) = m p_B^2 + p_B > d(x_t = 1, m = 0) = p_B$.
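As a quick numerical check on Eq. (5), the sketch below evaluates its right-hand side using the beta second moment $E(p^2 \mid s, f) = s(s+1)/[(s+f)(s+f+1)]$. The function names are illustrative, and the discount factor $\beta = 1/1.03$ is an assumption made only for the example. Holding the prior mean at 0.25 while scaling up the “sample size” $s^i + f^i$ shows the gap shrinking toward zero, in line with part (3) of Proposition II.

```python
def beta_second_moment(s, f):
    """E[p^2] for p ~ Beta(s, f)."""
    return s * (s + 1) / ((s + f) * (s + f + 1))


def eq5_gap(s, f, m, beta):
    """Right-hand side of Eq. (5):
    m * beta * [1 - E(p_A|s,f)] * [E(p_A^2|s,f) - E(p_A^2|s,f+1)]."""
    mean_a = s / (s + f)
    second_moment_drop = beta_second_moment(s, f) - beta_second_moment(s, f + 1)
    return m * beta * (1 - mean_a) * second_moment_drop


if __name__ == "__main__":
    beta = 1 / 1.03  # assumed: 3% discount rate
    # Keep the prior mean fixed at 0.25 while the "sample size" s + f grows.
    for k in (1, 2, 4, 8):
        print(k, round(eq5_gap(1 * k, 3 * k, m=1.0, beta=beta), 5))
```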


M.R. Springborn / Journal of Environmental Economics and Management 68 (2014) 226–242

Proposition III. Relative to a baseline setting of risk neutrality, risk aversion will not always increase the state range over which the VOI induces exploration.

Proof. To show that risk aversion expands the range of states over which it is optimal to explore (i.e. switch from the “greedy” option to an alternative based on the VOI), it would be necessary to show that $\overline{p}_B^{Avr} - \underline{p}_B^{Avr} > \overline{p}_B^{Neu} - \underline{p}_B^{Neu}$. However, using the expressions for these terms specified in Appendix “Derivation of the adaptive management exploratory range” (Eqs. (10), (11), (15), and (16)), it can instead be shown by simple numerical example that this exploratory range can increase or decrease when shifting from risk neutrality to risk aversion. For example, suppose the quadratic coefficient is $m = 1$, the discount rate is 3%, and the success parameter is $s^i = 1$. Then, if the failure parameter is $f^i = 3$, the range of $p_B$ over which it is optimal to explore is larger under risk aversion ($0.306 - 0.275 = 0.031$) than under risk neutrality ($0.279 - 0.250 = 0.029$). However, if $f^i = 1$, the exploratory range is smaller under risk aversion ($0.592 - 0.541 = 0.051$) than under risk neutrality ($0.554 - 0.500 = 0.054$).⁹ □

While the two-period, two-source problem examined above is useful for basic insights, the full problem involves many sources and a longer time horizon. It should also allow for any expected variability of imports across time and between sources. This is important because the value of learning depends on the degree to which one expects to use that information in future management decisions. A flexible approach should also allow for any reasonable number of inspections to occur in a given time period. In the next section, a flexible solution method using approximate dynamic programming is described that incorporates each of these extensions.
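The numerical example in the proof of Proposition III can be reproduced approximately with a small sketch. It assumes one inspection per period, a per-inspection reward of $E[m p^2 + p]$ for the inspected source (so $m = 0$ is risk neutral and $m = 1$ is the risk averse proxy used in this article), a known hazard $p_B$ for source B, and $\beta = 1/1.03$ for the 3% discount rate. These modeling choices and all function names are assumptions for illustration; the sketch does not implement Eqs. (10), (11), (15), and (16) of the appendix. The lower bound is where the myopic rewards of A and B are equal, and the upper bound is where the two-period value of inspecting A first falls to that of inspecting B twice.

```python
def beta_reward(s, f, m):
    """Expected reward E[m*p^2 + p] for p ~ Beta(s, f)."""
    mean = s / (s + f)
    second = s * (s + 1) / ((s + f) * (s + f + 1))
    return m * second + mean


def known_reward(p_b, m):
    """Reward m*p_B^2 + p_B from inspecting the known source B."""
    return m * p_b ** 2 + p_b


def explore_advantage(p_b, s, f, m, beta):
    """Two-period value of inspecting A first minus inspecting B twice
    (valid for p_B at or above the myopic indifference point)."""
    r_b = known_reward(p_b, m)
    q = s / (s + f)  # predictive probability that A's period-1 shipment is infested
    after_infested = max(beta_reward(s + 1, f, m), r_b)
    after_clean = max(beta_reward(s, f + 1, m), r_b)
    explore = beta_reward(s, f, m) + beta * (q * after_infested + (1 - q) * after_clean)
    stay = r_b * (1 + beta)
    return explore - stay


def bisect_root(fn, lo, hi, tol=1e-12):
    """Root of fn on [lo, hi], assuming fn(lo) > 0 > fn(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fn(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exploratory_range(s, f, m, beta):
    """(lower, upper) bounds on p_B over which inspecting A first is optimal."""
    r_a = beta_reward(s, f, m)
    lower = bisect_root(lambda p: r_a - known_reward(p, m), 0.0, 1.0)
    upper = bisect_root(lambda p: explore_advantage(p, s, f, m, beta), lower, 1.0)
    return lower, upper


if __name__ == "__main__":
    beta = 1 / 1.03  # assumed: 3% discount rate
    for f in (3, 1):
        for label, m in (("risk neutral", 0.0), ("risk averse", 1.0)):
            lo, hi = exploratory_range(1, f, m, beta)
            print(f"f={f} {label}: ({lo:.4f}, {hi:.4f}) width {hi - lo:.4f}")
```

Under these assumptions the computed bounds fall within about 0.001 of the values reported in the proof, and the widths reproduce the reversal: the range widens under risk aversion for $f^i = 3$ and narrows for $f^i = 1$.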
Approximate dynamic programming: the Lagrange multiplier solution method

Dynamic programming suffers from the curse of dimensionality when brought to bear on problems like the MAB. Approximate dynamic programming can provide traction by identifying an algorithm that yields an approximately optimal solution.¹⁰ Here I approximate the value function via Lagrangian decomposition (LD). A review of this method appears in Fisher (1981) and is updated in Fisher (2004). The intuition for LD follows from the insight that many difficult problems can be recast as a set of easier individual problems, tied together by a linking constraint (Fisher, 2004). The LD method relaxes the exact dynamic program by decoupling the system, in the present case by separating the multiple arms of the MAB optimization problem (as was done above in the classical MAB problem). LD has been applied to bandit problems by Castañon (1997) and Hawkins (2003) and specifically to a beta-Bernoulli bandit problem by Bertsimas and Mersereau (2007) in the context of directing marketing messages. The approximation approach described below closely follows the formulation of the latter. The Bellman equation for the problem specified in Eq. (2) is given by

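$$V_t(s^t, f^t) = \max_{x^t}\ \sum_{j=1}^{J} x_j^t\, E_t[p_j] + \beta\, E_t\!\left[V_{t+1}(s^t + y^t, f^t + x^t - y^t)\right]$$
$$\text{s.t.}\quad x_j^t \in \{0, 1, 2, \ldots, \min\{\bar{x}_j^t, K\}\}\ \ \forall\, j = 1, \ldots, J, \qquad \sum_{j=1}^{J} x_j^t = K, \qquad (6)$$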

where $E_t[p_j] = s_j^t/(s_j^t + f_j^t)$ under the Bayesian learning model described in Section “A Bayesian learning model and adaptive control framework”. Expectation of the value function is taken with respect to $p_j$ and $y_j$ for all sources. The problem above would be separable by each arm $j$ except for the coupling constraint, $\sum_{j=1}^{J} x_j^t = K$. The first step in the LD method involves replacing this constraint with a Lagrangian term in the objective function:

$$V_t^{\lambda}(s^t, f^t) = \max_{x^t}\ \lambda^0\left(K - \sum_{j=1}^{J} x_j^t\right) + \sum_{j=1}^{J} x_j^t\, E_t[p_j] + \beta\, E_t\!\left[V_{t+1}^{\lambda}(s^t + y^t, f^t + x^t - y^t)\right]$$
$$\text{s.t.}\quad x_j^t \in \{0, 1, 2, \ldots, \min\{\bar{x}_j^t, K\}\}\ \ \forall\, j = 1, \ldots, J. \qquad (7)$$

Here the simplifying assumption is made that the Lagrange multiplier, $\lambda^0$, is constant over all states and future periods.¹¹ The system is now decoupled and we can write the relaxed value function for each arm $j$ individually as follows:

$$\hat{V}_{t,j}^{\lambda}(s_j^t, f_j^t) = \max_{x_j^t}\ x_j^t\left(E_t[p_j] - \lambda^0\right) + \beta\, E_t\!\left[\hat{V}_{t+1,j}^{\lambda}(s_j^t + y_j^t, f_j^t + x_j^t - y_j^t)\right]$$
$$\text{s.t.}\quad x_j^t \in \{0, 1, 2, \ldots, \min\{\bar{x}_j^t, K\}\}. \qquad (8)$$

Let the decision period be represented by $t = 0$. The objective now is to select the Lagrange multiplier $\lambda^0$ and solve the subproblem for each arm given by Eq. (8). While there are multiple methods for selecting $\lambda^0$ (Hawkins, 2003), an intuitively appealing approach is to solve (8) for the level of $\lambda^0$ that induces a feasible decision, i.e. such that the original constraint $\sum_{j=1}^{J} x_j^0 = K$ is satisfied (Bertsimas and Mersereau, 2007). Practically, the numerical solution involves: (1) specifying a vector of possible Lagrange multiplier values given by $\boldsymbol{\lambda}$, (2) identifying $x_{j0}^{*}(\lambda)$ for each $\lambda \in \boldsymbol{\lambda}$, and (3) identifying the minimum feasible multiplier, $\lambda^0 \in \boldsymbol{\lambda}$. An example providing graphical intuition for this approach is given in Appendix “A graphical depiction of the Lagrangian decomposition method”.

⁹ The exploratory ranges for $p_B$ in this example are relatively small because there is only a single period in which to benefit from any learning.

¹⁰ See Si et al. (2004) for an overview.

¹¹ In reality there is likely to be a time-varying sequence of Lagrange multipliers, but it is currently computationally prohibitive to allow for that flexibility.

While the LD approach simplifies the allocation problem by decomposing it, simply estimating $x_{j0}^{*}(\lambda)$ by itself is computationally prohibitive. In principle one could use a backward induction approach over a desired time horizon $T$ to explicitly calculate the value function in the decision period. Even for a single source, however, the state space for this problem grows quickly. Given $K = 8$ and $T = 5$, the final state space includes $2^{KT} = 2^{40}$ possibilities, more than one trillion. For adaptive inspections (and for all but the smallest such problems), an approximate solution to the problem of estimating individual demand is required. Following Bertsimas and Mersereau (2007), the individual demand problem is relaxed by using a limited lookahead horizon, $H$, beyond which the decision maker temporarily ignores the potential for further learning. At time $H$, the approximate value function for source $j$ is estimated as

$$\hat{V}_{H,j}^{\lambda}(s_j^H, f_j^H) = \left(\max\{0, E^H[p_j] - \lambda\}\right)\left(\min\{K, E[\bar{x}_j^H]\}\right)(T - H). \qquad (9)$$
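The multiplier search in steps (1)–(3), combined with backward induction on Eq. (8) from the truncated value in Eq. (9), can be sketched as follows. This is a minimal illustration, not the implementation used in the article. It assumes the simple beta-Bernoulli beliefs (so the predictive distribution of $y_j^t$ is beta-binomial), a constant shipment forecast $\bar{x}_j$ per source, and $H \ge 1$, and it reads “feasible” on a discrete grid as total demand no greater than $K$. The functions `arm_demand` and `min_feasible_multiplier` and the example sources are hypothetical.

```python
import math
from functools import lru_cache


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(y, n, s, f):
    """P(y infested among n inspections) when p ~ Beta(s, f)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return math.exp(log_choose + log_beta(s + y, f + n - y) - log_beta(s, f))


def arm_demand(s, f, lam, K, x_bar, H, T, beta):
    """Approximate decision-period demand x*_j0(lambda) for one source.

    Backward induction on Eq. (8) over t = 0, ..., H - 1 (assumes H >= 1),
    using the truncated value of Eq. (9) at t = H; x_bar is a constant
    (hypothetical) forecast of available shipments.
    """
    cap = min(x_bar, K)

    def action_value(t, s_t, f_t, x):
        mean = s_t / (s_t + f_t)
        future = sum(beta_binomial_pmf(y, x, s_t, f_t) * value(t + 1, s_t + y, f_t + x - y)
                     for y in range(x + 1))
        return x * (mean - lam) + beta * future

    @lru_cache(maxsize=None)
    def value(t, s_t, f_t):
        if t == H:  # Eq. (9): no further learning beyond the lookahead horizon
            return max(0.0, s_t / (s_t + f_t) - lam) * min(K, x_bar) * (T - H)
        return max(action_value(t, s_t, f_t, x) for x in range(cap + 1))

    values = [action_value(0, s, f, x) for x in range(cap + 1)]
    return values.index(max(values))  # smallest maximizing number of inspections


def min_feasible_multiplier(arms, K, lam_grid, H, T, beta):
    """Steps (1)-(3): smallest grid multiplier whose total demand is <= K."""
    for lam in sorted(lam_grid):
        demands = [arm_demand(s, f, lam, K, x_bar, H, T, beta) for s, f, x_bar in arms]
        if sum(demands) <= K:
            return lam, demands
    raise ValueError("no multiplier on the grid satisfies the inspection budget")


if __name__ == "__main__":
    # Hypothetical sources: (s_j, f_j, forecast shipments per period).
    arms = [(2.0, 8.0, 4), (1.0, 9.0, 4), (1.0, 1.0, 2)]
    grid = [i / 100 for i in range(101)]
    lam, demands = min_feasible_multiplier(arms, K=4, lam_grid=grid, H=2, T=12, beta=0.97)
    print(f"lambda^0 = {lam:.2f}, demands = {demands}")
```

Because each source's demand is computed independently for a given multiplier, the work grows linearly in the number of sources, which is the point of the decomposition; the lookahead $H$ controls the size of each per-source tree.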

Thus, $H$ periods ahead of the decision point, the expected value of the system is the benefit of an inspection net of opportunity cost ($E^H[p_j] - \lambda$), multiplied by the lesser of the available inspections ($K$) and the expected number of shipments available ($E[\bar{x}_j^H]$),¹² and by the number of periods remaining before $T$. The option not to inspect is exercised when $E^H[p_j] < \lambda$, in which case the value is zero. Starting from this approximation, backward induction can be used over periods $t = H - 1, \ldots, 0$, based on Eq. (8), to arrive at an approximate solution for individual demand, $x_{j0}^{*}(\lambda)$.¹³

The dynamic programming problem above reflects a risk neutral decision maker. It is not feasible to fully incorporate a decision maker who is risk averse over cumulative outcomes across all sources: such a payoff structure would introduce an additional dependency in rewards across sources, making the problem intractable. As a tractable proxy for risk aversion, I consider the reducible-risk averse decision maker discussed in the simplified theoretical problem in Section “Analytical insights from a two-source, two-period adaptive management problem”. Thus in the risk averse formulation, the risk neutral reward $E_t[p_j]$ is replaced with $E_t[p_j^2] + E_t[p_j]$, where the weight on the squared term has been set to $m = 1$.

Numerical application

Analysis of the data shows that assuming each inspection is an independent Bernoulli trial with a fixed underlying parameter, $p_j$, as outlined in the simple Bayesian model from Section “A Bayesian learning model and adaptive control framework”, is highly unrealistic. This assumption leads to inflated certainty about a given hazard estimate. For example, for one particular commodity–country pair in the sample, the beta-Bernoulli infestation hazard estimate based on observations from 1996 through 2001 is 0.0001. However, in 2002 the proportion of shipments inspected and found infested was over 50 times greater.
Given the distribution of beliefs over pj by the end of 2001, the probability of the 2002 outcome is less than one in one million. In this instance, ignoring temporal correlation in the data leads to overfitting and poor predictive performance. Correlated or clustered observations arise in many fields of study, typically due to repeated longitudinal sampling of subjects (Neuhaus et al., 1991) or spatial relationships between sites (e.g. McCarthy and Lindenmayer, 2000). In the present numerical context, it is reasonable to suppose that fluctuating ecological or production dynamics might lead to correlation in the infestation hazard rate for shipments from a particular source arriving near each other in time.
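To illustrate the overconfidence described above, the sketch below computes the beta-binomial predictive probability of a high-infestation year under the simple model. The history is hypothetical (a Beta(1, 1) prior updated with 1 infested and 19,997 clean inspections, giving a posterior mean of exactly 0.0001) and does not reproduce the commodity–country pair in the data; the function name and the number of future inspections are illustrative.

```python
import math


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def predictive_tail(k_min, n, s, f):
    """P(Y >= k_min) when Y | p ~ Binomial(n, p) and p ~ Beta(s, f)."""
    total = 0.0
    for y in range(k_min, n + 1):
        log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
        total += math.exp(log_choose + log_beta(s + y, f + n - y) - log_beta(s, f))
    return total


if __name__ == "__main__":
    # Hypothetical pooled history -> Beta(2, 19998) posterior, mean 0.0001.
    s, f = 2.0, 19998.0
    n_next = 2000           # hypothetical inspections in the following year
    k_min = n_next // 200   # 0.5% infested, i.e. 50 times the posterior mean
    print(f"P(Y >= {k_min}) = {predictive_tail(k_min, n_next, s, f):.2e}")
```

Because the pooled posterior treats every past inspection as an independent draw on the same hazard, a year at 50 times the estimated rate receives a predictive probability far below one in a million, which is the kind of inflated certainty a model of temporal correlation is meant to avoid.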

Extended Bayesian learning model

A natural method to handle correlated observations is to specify a hierarchical model (Gelman et al., 2004). To provide a concrete analogy to bridge between the simple and hierarchical models, it is useful to think of an infestation as the result of flipping a weighted coin, where the weight corresponds to the infestation hazard rate. If the outcome is heads, the shipment is infested; if tails, it is clean. The simple Bayesian learning model of Section “A Bayesian learning model and adaptive control framework” posits that every random trial from source $j$ involves flipping the same single coin from $j$. In doing so, we learn about a single probability, $p_j$. The hierarchical model, depicted in Fig. 1, involves not a single coin but rather a jar of coins for source $j$. Each period, a new coin is selected from the distribution of coins in the jar, here assumed to follow a beta distribution. During that period we can learn about the weighting of that particular coin ($p_j^t$). Over time, learning about $p_j^t$ from multiple periods allows us to learn about the true (fixed) distribution of coins for source $j$ (specified by Beta($s_j$, $f_j$)).

¹² Note that the time superscript, $t = H$, is attached to the term $\bar{x}_j^H$ since this quantity may be time varying, while for $p_j$ the superscript is not attached since here beliefs may vary but $p_j$ itself is fixed over time.

¹³ To further reduce the dimensionality of the problem, attention is restricted to future states which, given current beliefs, have a probability of occurring of greater than $10^{-6}$.


Fig. 1. The extended hierarchical model of infestation hazard and inspection outcomes. [Figure: “Hierarchical infestation and inspection model”, with infestation hazard rate $p_j^t \sim \text{Beta}(s_j, f_j)$ and inspection outcome $y_j^t \sim \text{Binomial}(x_j^t, p_j^t)$.]
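The two-level process in Fig. 1 is straightforward to simulate. The sketch below, which uses only Python's standard `random` module, draws a new hazard $p_j^t \sim \text{Beta}(s_j, f_j)$ each month (a new coin from the jar) and then treats each of the $x_j^t$ shipments that month as an independent Bernoulli trial under that month's hazard. The function name and the example hyperparameters are hypothetical.

```python
import random


def simulate_source(s_j, f_j, shipments, rng):
    """Draw one hypothetical 'true' record for a source under the Fig. 1 hierarchy.

    shipments[t] is x_j^t, the number of shipments in month t. Each month a new
    hazard p_j^t ~ Beta(s_j, f_j) is drawn (a new coin from the jar); each
    shipment that month is then infested with probability p_j^t.
    """
    hazards, infested = [], []
    for x_t in shipments:
        p_t = rng.betavariate(s_j, f_j)
        y_t = sum(rng.random() < p_t for _ in range(x_t))  # Binomial(x_t, p_t)
        hazards.append(p_t)
        infested.append(y_t)
    return hazards, infested


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical hyperparameters with mean s/(s+f) = 0.1 and 8 shipments a month.
    hazards, infested = simulate_source(2.0, 18.0, [8] * 12, rng)
    print([round(p, 3) for p in hazards])
    print(infested)
```

Because all shipments in a month share one draw of $p_j^t$, infestations cluster within months, which is the temporal correlation the single-coin model ignores.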

The challenge then is to model beliefs over the true values of $s_j$ and $f_j$, which are updated as we accrue observations on outcomes from $j$. Thus the assertion of a true or fixed underlying hazard, $p_j$, is replaced by the assumption that $p_j^t$, the probability of infestation in period $t$, is itself determined by a draw from a unique latent population distribution (jar) for each source $j$. The model is hierarchical in the sense that the observed outcome ($y_j^t$ infested shipments from $x_j^t$ inspections) is modeled conditionally on the hazard rate parameter, $p_j^t$, which itself is drawn from the population distribution. A monthly time step for the index $t$ is used because the percentage of infested shipments in the data shows significant variation down to that level. The hierarchical model allows for learning at two time scales. Within a given month $t$, observations $y_j^t$ provide information that tightens beliefs about the particular draw of $p_j^t$. This information about $p_j^t$ then reduces uncertainty about the population distribution from which the $p_j^t$'s are drawn, that is, the true underlying beta hyperparameters, $s_j$ and $f_j$. Let $\pi(s_j, f_j \mid y_j^1 \ldots y_j^t)$ represent the decision maker's beliefs about $s_j$ and $f_j$ given all observations through month $t$. The added flexibility of the hierarchical model comes at some cost in additional complexity in the information dynamics. Construction of an initial prior and calculation of the posterior distribution are described in detail in Appendix “A hierarchical Bayesian model for temporally correlated observations” and largely follow the empirical Bayesian approach of Gelman et al. (2004, p. 128). To economize on notation below, $\pi^t(s_j, f_j)$ will represent updated beliefs, $\pi(s_j, f_j \mid y_j^1 \ldots y_j^t)$.

Solving for inspection demand

To characterize the import pest risk process, monthly inspections data (described in the introduction) are merged with monthly measures of total imported shipments published by the U.S. Department of Commerce (USDOC, 2006).
Because this matching of records is complicated by the different methods for identifying commodities employed by the USDOC and APHIS, I focus attention in this application on inspections of one commodity, namely tomatoes.¹⁴ I restrict the data set to 23 exporting countries with at least one U.S. border inspection on record. Because Monte Carlo simulation is used to test the efficacy of the AM learning model, it is necessary to generate many plausible time series of “true” infested shipments for each source. To do this, updated beliefs over the distribution of the true hyperparameters for each source are calculated based on $T = 129$ months of inspections data, $\pi^T(s_j, f_j)$. For each source, the mean of this posterior distribution ($\hat{s}_j, \hat{f}_j$) is then taken as the estimate of the “true” set of hyperparameters. Next, for Monte Carlo analysis, multiple data samples were generated in two steps. For each simulation, first a series of $p_j^t$'s for $t = 1, \ldots, T$ is drawn to cover each month in the shipping record for each source $j$. Then an infestation process is created by conducting $\bar{x}_j^t$ Bernoulli random trials under parameter $p_j^t$, where $\bar{x}_j^t$ is the number of imported shipments in the data record for source $j$ in month $t$. When optimal, the adaptive allocation method outlined in Section “Approximate dynamic programming: the Lagrange multiplier solution method” directs exploration beyond the most hazardous source in the decision period to enhance long-run performance. Many months might be necessary to demonstrate the advantages of this method. While implementing the adaptive decision algorithm for a single decision period is not computationally taxing, testing performance over both an extended time horizon and a large number of Monte Carlo simulations can be prohibitively time consuming. Thus, to facilitate the analysis, the inspection and trade processes are adjusted in two ways.
First, import shipments are scaled down such that in each period only a limited number of shipments are inspected (e.g. $K = 8$) versus the actual typical range of hundreds per month. Second, because the number of inspections is constrained to be small, learning about a low rate of infestation (e.g. 1.5%) takes a particularly large number of periods. The empirical individual infestation rates are therefore scaled to bring the overall mean to 10%. Given this stylization (and the focus on a particular commodity), the objective of the numerical application is to draw out general insights rather than to generate precisely calibrated empirical estimates for the system.

Solving for inspection demand requires a forecast of the future availability of shipments from each source. As a proxy for this shipping forecast, the future availability of shipments for inspection, $E[\bar{x}_j^t]$, over the 3-month lookahead horizon is set equal to the average shipping rate over the 12-month period centered around $t$.¹⁵ Since, under the hierarchical model, infestations in a given period follow from a period-specific hazard level ($p_j^t$), inspections are particularly informative with respect to subsequent shipments in the same period. To take advantage of this correlated risk, each period is divided into four decision subperiods (weeks). In each subperiod, $K/4$ inspections are allocated over the various sources, observations on the discovery of infestations are recorded, and beliefs ($\pi^t(s_j, f_j)$) are updated. The practical advantage of this approach is that results from inspections earlier in the period improve the targeting of inspections later in the period, for example allowing effort to be concentrated on a source found to be particularly likely to be infested in a given period. Finally, the limited lookahead horizon is set to $H = 3$ months.

¹⁴ I wish to acknowledge the work of the Economic Research Service at USDA in relating the two databases, in particular the efforts of Donna Roberts and collaborators.

¹⁵ Of course, managers would not know future import levels (months $t + 1, \ldots, t + 6$) with certainty. The proxy used here is meant to generate a rough, constant projection that is informative but imperfect.

To assess the relative gains from the approximate dynamic programming (ADP) approach with optimal endogenous learning, the performance of two alternative management policies is also assessed. Both alternatives are “greedy” in the sense that they allocate inspections based only on the immediate returns in the decision period, a myopic approach that ignores the potential value of learning. The two alternatives differ only in how observations are assimilated by the decision maker. The first alternative is a greedy maximum likelihood estimate approach (GMLE), in which all observations from a source are treated as independent and the estimate of the hazard rate is given by the maximum likelihood estimate of $p_j$ (simply the historical proportion found infested). The second, the greedy Bayesian approach (GBayes), takes advantage of the hierarchical learning model to estimate $p_j^t$. Since the two alternatives still involve updating, they are also adaptive, but only in a passive sense.
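A minimal sketch of the GMLE benchmark, including the weekly subperiods, follows. It assumes each source offers a fixed number of shipments per subperiod, assigns never-inspected sources an estimate of zero, breaks ties by source order, and simulates outcomes from known true hazards; these choices and the function names are illustrative assumptions. The GBayes benchmark would replace `gmle_estimates` with posterior estimates of $p_j^t$ from the hierarchical model, which is not shown.

```python
import random


def gmle_estimates(infested, inspected):
    """GMLE hazard estimate per source: pooled historical proportion infested.
    Sources never inspected get 0.0 (an illustrative assumption)."""
    return [y / n if n > 0 else 0.0 for y, n in zip(infested, inspected)]


def greedy_allocation(estimates, available, budget):
    """Myopic rule: fill the budget from the highest-estimate sources first.
    sorted() is stable, so ties are broken by source order."""
    order = sorted(range(len(estimates)), key=lambda j: estimates[j], reverse=True)
    alloc = [0] * len(estimates)
    remaining = budget
    for j in order:
        take = min(available[j], remaining)
        alloc[j] = take
        remaining -= take
        if remaining == 0:
            break
    return alloc


def gmle_month(hist_y, hist_n, available, K, true_hazards, rng, subperiods=4):
    """One month of the GMLE policy with K // subperiods inspections per
    subperiod; hist_y and hist_n are updated in place after each subperiod.
    Returns the number of infested shipments intercepted."""
    per_sub = K // subperiods
    caught = 0
    for _ in range(subperiods):
        alloc = greedy_allocation(gmle_estimates(hist_y, hist_n), available, per_sub)
        for j, x in enumerate(alloc):
            y = sum(rng.random() < true_hazards[j] for _ in range(x))
            hist_y[j] += y
            hist_n[j] += x
            caught += y
    return caught


if __name__ == "__main__":
    rng = random.Random(42)
    hist_y, hist_n = [1, 0, 2], [20, 5, 30]  # hypothetical past counts
    caught = gmle_month(hist_y, hist_n, available=[3, 3, 3], K=8,
                        true_hazards=[0.05, 0.30, 0.08], rng=rng)
    print(caught, hist_n)
```

In the usage example, the source with the highest true hazard starts with zero recorded infestations and is never inspected during the month, a small instance of the myopic fixation on a suboptimal source that an active approach is meant to guard against.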

Results

We might expect to observe in the short run that the performance of the ADP algorithm (in terms of cumulative interceptions of infested shipments) is worse than that of the alternative approaches, since some portion of effort could be directed towards actions with higher learning value and lower expected hazard. In turn, we might then expect the ADP algorithm to outperform the GMLE and GBayes alternatives once sufficient time has passed for the ADP approach to take advantage of the early investment made in learning. However, results show that this prediction will not necessarily hold. In a direct application of the three allocation approaches to inspections as described in the previous section, no significant difference is found in performance (average cumulative interceptions). In short, when there is little chance for myopic policies to mistakenly fixate on suboptimal alternatives, a more sophisticated optimal endogenous learning approach will not generate an improvement. This dynamic becomes clear after assessing the interaction between the distribution of hazard levels across sources and the shipping records. The empirical posterior expected hyperparameters based on the actual data are presented in the left panel of Fig. 2. Each point in the plot represents a particular source. Note that there are four relatively “rewarding” sources with an expected hazard level, $E[p_j^0]$, greater than 0.01. In general, we might expect an active adaptive approach (ADP) to perform better to the extent that alternative approaches (GMLE or GBayes) mistake lower hazard sources for a preferred option. In this case, the availability of shipments for inspection prevents such confusion. Shipments from the two most rewarding sources are available for inspection in the vast majority of periods. Shipments from the next two sources are available in only a small portion of periods.
The remaining sources each have too small an expected hazard to be mistaken by the simpler myopic approaches for a top option. Essentially, we see little difference between approaches due to idiosyncratic patterns of availability and hazard variation. For more insight into when an active adaptive approach would be of use, the analysis above is repeated with a stylized, hypothetical set of five sources, as depicted in the right panel of Fig. 2, under both a risk neutral and a risk averse decision maker. Shipping levels are assumed to be constant such that each source is always available for inspection. As above, in each period a total of 8 shipments may be inspected.

Fig. 2. Source parameters used to generate an underlying “true” data record. The left panel shows empirical posterior expected levels of the summed hyperparameters ($\hat{s}_j + \hat{f}_j$) and hazard rate ($E[p_j]$) for various sources of tomatoes. The right panel shows the same values for a hypothetical set of sources. [Panels: “Empirical source parameters” (left) and “Stylized source parameters” (right).]

Results under risk neutrality

Outcomes averaged over 500 Monte Carlo simulations are shown in Fig. 3 for the risk neutral case. The change in the cumulative present value of damages (PVD) of ADP and GBayes relative to a GMLE baseline is plotted in Fig. 3A as a function of the time horizon. When the percentage change curve falls below zero, this indicates that, relative to the GMLE baseline, the decision maker has achieved a reduction in the PVD using either the ADP or GBayes approach. Under risk neutrality, the ADP approach lags initially as exploration across different sources is relatively strong. In this example, after costly investment in learning over about 6 months, ADP performance begins to recover, eventually recouping the investment and surpassing both alternatives after 23 months. The GBayes performance eventually parallels that of ADP, indicating that the performance rate per period has equalized between the two. However, this per-period parity emerges only after a lag of around 150 months. While the relative gains of ADP and GBayes over GMLE are statistically significant, the practical difference in average performance is modest. After 200 months, ADP confers an advantage of just 0.6% relative to GBayes and 1% relative to GMLE. As discussed in the introduction, such a modest relative expected gain from taking an active AM approach under risk neutrality is consistent with typical results from the existing literature (Springborn and Sanchirico, 2013; Bond and Loomis, 2009; Rout et al., 2009).

Although the average performance advantage of ADP over GMLE is modest, as suggested by the analytical results in Section “Analytical insights from a two-source, two-period adaptive management problem”, a more nuanced story emerges from considering the distribution of outcomes. In Fig. 3B, the frequency distribution across simulations of changes in PVD under ADP relative to GBayes shows substantial asymmetry. Recall that one finding that emerged from the analytical model (under risk neutrality) was that even when an AM approach generates a strict decrease in expected losses, the most likely change in losses may be strictly positive (Proposition I). Results confirm that this remains the case in the full numerical model. While on average there is a decrease in PVD under ADP, the median outcome is actually a small increase in losses (0.3% relative to both GMLE and GBayes). Fig. 3B further illustrates that the most common outcome under ADP is a small increase in PVD (losses).

However, the most compelling story is in the extremes. Results from the two-period model showed that AM holds the potential for buffering against large losses with modest opportunity costs. This is borne out in the numerical application. Relative to GMLE, ADP results in substantial increases in losses (more than 5%) for only a very small fraction of cases (1%). However, ADP leads to substantial reductions in losses (more than 5%) in 11.0% of simulations. Thus in this case, implementing the optimal endogenous learning approach can be thought of as a form of insurance. In most instances, a minor investment in learning confirms the desirability of choices made under less sophisticated but still informative strategies (e.g. GMLE). However, in approximately one of every nine realizations, ADP serves to protect against very poor performance (a change in PVD below −5%).

Fig. 3. Results under risk neutrality: (A) the average percent change in PVD under ADP and GBayes over time relative to a GMLE baseline, and (B) the frequency distribution of the percent change in PVD under ADP relative to a GBayes baseline (500 Monte Carlo simulations). [Panel A: avg. % change in PVD from GMLE baseline against month, for ADP and GBayes; panel B: frequency against % change in PVD under ADP.]

Results under risk aversion

The asymmetry of outcomes under risk neutrality suggests that risk aversion may have a substantial impact on the value of ADP and the degree of exploratory action. A related analytical result from Section “Analytical insights from a two-source, two-period adaptive management problem” was that the reduction in losses from AM will be more substantial under risk aversion than under risk neutrality (Proposition II). Numerical results confirm that this is indeed the case in both level and percentage terms. For example, in percentage terms, the average ADP advantage over GMLE is seven times greater under risk aversion (7% versus 1%). The average ADP advantage over GBayes also climbs, though more modestly (2.5% versus 0.6%).

Fig. 4. Results under risk aversion: (A) the average percent change in PVD under ADP and GBayes over time relative to a GMLE baseline, and (B) the frequency distribution of the percent change in PVD under ADP relative to a GBayes baseline (500 Monte Carlo simulations). [Panel A: avg. % change in PVD from GMLE baseline against month, for ADP and GBayes; panel B: frequency against % change in PVD under ADP.]

Results under risk aversion are presented in Fig. 4. Fig. 4A shows that the costly period of investment in exploration under ADP that is apparent under risk neutrality (Fig. 3A) no longer occurs under risk aversion: a reduction in the PVD is achieved even over a short time horizon. In the risk neutral case, there was substantial asymmetry in the frequency distribution across simulations of changes in PVD under ADP relative to GBayes. This result also holds under risk aversion, as is apparent in Fig. 4B. However, the average and median outcomes are now of the same sign; both the expected and most likely changes are a decrease in the PVD. In addition, the insurance result is strengthened. Relative to GBayes, ADP results in a substantial increase in losses (more than 5%) in 10% of cases, while leading to a substantial decrease in losses (more than 5%) in 21% of cases. This asymmetry is even stronger in the tails of the distribution: the chance of a large increase in PVD (more than 10%) is much smaller (3%) than the chance of a large decrease in PVD (more than 10%), which is 12%.

Finally, recall that the analytical results from Section “Analytical insights from a two-source, two-period adaptive management problem” regarding the impact of risk aversion on the rate of exploration identified an ambiguity: Proposition III showed that, relative to a baseline setting of risk neutrality, a risk averse decision maker would not necessarily increase the rate of exploration.¹⁶ This ambiguity arises from two competing effects: risk aversion enhances the VOI but also the opportunity cost of exploration.
Numerical results show that, across both specifications of risk preferences, the expected level of exploratory action initially lies in the range of 10–15% through the first several years before tapering off over time. However, exploration under risk aversion is approximately 5% higher than under risk neutrality over the early periods (the first 20 months). After that, risk averse exploration is first almost equal (1% higher through month 100) and then lower (4% lower from month 101 to month 200). Overall, the net effect of risk aversion in this setting is to increase the initial net returns to exploration, so that exploration is concentrated in earlier periods. Risk aversion speeds up learning, though whether this results in a net increase in cumulative exploration over time will be sensitive to the time horizon considered.
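The distributional summaries used in this section (mean versus median change in PVD and the shares of simulations beyond ±5%) can be computed from per-simulation PVD values as in the sketch below. The input values in the example are made up for illustration and are not results from the article; the function name and the threshold argument are illustrative.

```python
import statistics


def summarize_pct_change(pvd_policy, pvd_baseline, threshold=5.0):
    """Summaries of the per-simulation % change in PVD of a policy vs. a baseline.

    Negative changes are reductions in losses; threshold is in percent.
    """
    pct = [100.0 * (a - b) / b for a, b in zip(pvd_policy, pvd_baseline)]
    n = len(pct)
    return {
        "mean": statistics.fmean(pct),
        "median": statistics.median(pct),
        "share_large_increase": sum(c > threshold for c in pct) / n,
        "share_large_decrease": sum(c < -threshold for c in pct) / n,
    }


if __name__ == "__main__":
    baseline = [100.0] * 10
    policy = [100.3] * 7 + [90.0, 92.0, 101.0]  # hypothetical PVD values
    print(summarize_pct_change(policy, baseline))
```

In this made-up example the mean change is negative while the median is a small increase, the same sign pattern the risk neutral results describe.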

Discussion

In this article, I examine the distribution of the value of AM under risk neutrality and risk aversion. Results show that even when expected returns to AM are modest, asymmetry in the distribution of outcomes has important implications. First, AM can serve as a form of insurance: it can be rational for the decision maker to accept that the most likely outcome is a small net loss, because the exploratory approach buffers against potentially large losses. Second, when heightened concern over large losses is explicitly incorporated through risk aversion, the relative value of an AM approach increases. Third, numerical results show that risk aversion incentivizes more initial exploration in the case study explored here. However, analytical results illustrate that this will not necessarily hold in general.

¹⁶ An exploratory choice is defined as one in which a different source is chosen for inspection than the one that is myopically optimal.


M.R. Springborn / Journal of Environmental Economics and Management 68 (2014) 226–242

These findings suggest that the strongest argument for adopting an optimal endogenous learning approach may not be found in the expected outcome but rather in the way in which AM serves as insurance against extremely poor outcomes that can result without some level of active experimentation. However, if management actions under AM are not differentially informative (relative to a passive policy), as in the numerical case of empirically informed sources first examined, then AM will not confer a substantial advantage of any kind. The generality of the results with respect to risk aversion rests largely on the nature of the improvements from the use of AM. In this article the benefits of AM take the form of reducing a bad rather than increasing a good. Thus, large reductions in infested shipments are of relatively greater value under risk aversion than under risk neutrality. Alternatively, if the aim of AM is to eke out increases in a good, the value of large improvements is not enhanced under risk aversion. This article also demonstrates how recently developed computational methods from stochastic and approximate dynamic programming can address a practical AM problem. An extended Bayesian learning model is adapted to show how this flexible approach can incorporate non-trivial stochastic processes. Future extensions could involve relaxing simplifying assumptions made here, including the assumption of a stationary infestation process. While the hierarchical model used allowed for variation in the infestation process over time, this variation was random (conditional on the hyperparameters). An argument could be made for allowing the infestation probability to move in a particular direction over time, either up or down as sanitary effort and environmental conditions change. This could be modeled as a “restless bandit” problem, e.g.
as discussed by Whittle (1988) in the context of surveilling submarines or developing treatments for a virus which continues to evolve. Finally, it may also be the case that source countries respond strategically to inspection, thus resulting in an “adversarial bandit” problem (Auer et al., 1995).

Acknowledgments

I am grateful to Christopher Costello, S. Rao Jammalamadaka and an anonymous reviewer for insightful feedback on earlier versions of this article. Helpful comments were provided by participants in the BESTNet/DIVERSITAS ecoSERVICES Workshop on the Economic and Ecological Science and Management of Invasive Species at Arizona State University.

Appendix

Derivation of the adaptive management exploratory range

To solve for $\underline{p}_B$ we must find the point at which the decision maker is indifferent between selecting A or B in the absence of any learning, i.e. where damages from inspecting each source for a single period are equal (since learning values for the future are ignored):

$$m\,E\big(p_A^2 \mid s^i, f^i\big) + E\big(p_A \mid s^i, f^i\big) = m\big(\underline{p}_B\big)^2 + \underline{p}_B.$$

In the risk neutral case ($m = 0$) the condition simplifies to

$$\underline{p}_B^{Neu} = E\big(p_A \mid s^i, f^i\big) = \frac{s^i}{s^i + f^i}. \qquad (10)$$

In the risk averse case ($m > 0$) we use the quadratic formula to solve for $\underline{p}_B$ (eliminating the non-feasible negative root):

$$\underline{p}_B^{Avr} = \frac{-1 + \sqrt{1 + 4m\,\dfrac{s^i}{s^i + f^i}\left(1 + m\,\dfrac{s^i + 1}{s^i + f^i + 1}\right)}}{2m}. \qquad (11)$$

To solve for $\bar{p}_B$, we seek the point where expected damages over the two periods are equal whether or not the learning option is chosen. Given the definition of $\underline{p}_B$, we know that while B is believed to be riskier in the first period ($\bar{p}_B > E(p_A \mid s^i, f^i)$), what is learned from an inspection of A in the first period can change the updated risk ordering for the second period: $\bar{p}_B < E(p_A \mid s^i, f^i, x^i = 1, y^i = 1)$, where $y^i = 1$ indicates the shipment was found to be infested. Note that Bayes' rule implies that a success or failure leads to simple updating of either the $s$ or the $f$ parameter: $E(p_A \mid s^i, f^i, x^i = 1, y^i = 1) = E(p_A \mid s^i + 1, f^i)$ and $E(p_A \mid s^i, f^i, x^i = 1, y^i = 0) = E(p_A \mid s^i, f^i + 1)$. This implies that two-period expected damages from the learning option are given by damages from B (the non-inspected shipment) in period i and discounted damages from the less risky source in period ii, conditional on expectations over learning:

$$\begin{aligned} D(x^i = 1, m \mid s^i, f^i, p_B) &= d(x^i = 1) + \beta E\big(d(x^{ii} \mid y^i)\big) \\ &= m p_B^2 + p_B + \beta\Big[E(p_A \mid s^i, f^i)\big[m p_B^2 + p_B\big] + \big[1 - E(p_A \mid s^i, f^i)\big]\big[m E(p_A^2 \mid s^i, f^i + 1) + E(p_A \mid s^i, f^i + 1)\big]\Big]. \end{aligned} \qquad (12)$$

At $\bar{p}_B$, when B (the non-learning option) is chosen in the first period, B will also be the optimal choice in the second period. Two-period expected damages from the non-learning option are given by the discounted sum of damages from A in both periods (with no updating of beliefs):

$$D(x^i = 0, m \mid s^i, f^i) = d(x^i = 0 \mid s^i, f^i) + \beta\, d(x^i = 0 \mid s^i, f^i) = \big[m E(p_A^2 \mid s^i, f^i) + E(p_A \mid s^i, f^i)\big](1 + \beta). \qquad (13)$$

To identify $\bar{p}_B$ we set $D(x^i = 1, m \mid s^i, f^i, p_B = \bar{p}_B) = D(x^i = 0, m \mid s^i, f^i)$ and solve. As before with $\underline{p}_B$, we solve for the risk neutral case ($m = 0$) directly and for the risk averse case ($m > 0$) using the quadratic formula (eliminating the non-feasible negative root). To specify these two results concisely it is useful to define the following expression:

$$\kappa(m) = \frac{(1 + \beta)\dfrac{s^i}{s^i + f^i}\left(1 + m\,\dfrac{s^i + 1}{s^i + f^i + 1}\right) - \beta\,\dfrac{f^i}{s^i + f^i}\,\dfrac{s^i}{s^i + f^i + 1}\left(1 + m\,\dfrac{s^i + 1}{s^i + f^i + 2}\right)}{1 + \beta\,\dfrac{s^i}{s^i + f^i}}. \qquad (14)$$

The threshold conditions are then given by

$$\bar{p}_B^{Neu} = \kappa(m = 0), \qquad (15)$$

and

$$\bar{p}_B^{Avr} = \frac{-1 + \sqrt{1 + 4m\,\kappa(m)}}{2m}. \qquad (16)$$
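The two thresholds can be checked numerically. A minimal Python sketch of Eqs. (10), (11) and (14)–(16), assuming a beta prior on $p_A$ with parameters $(s^i, f^i)$, a discount factor β and a risk aversion weight m as defined above; the function names and the example values ($s^i = 1$, $f^i = 9$, β = 0.95, m ∈ {0, 0.5}) are hypothetical. The sketch also evaluates the two-period damages of Eqs. (12) and (13), so the indifference condition that defines $\bar{p}_B$ can be verified directly:

```python
import math
from typing import Tuple


def beta_moments(s: float, f: float) -> Tuple[float, float]:
    """First and second moments of p_A ~ Beta(s, f)."""
    mean = s / (s + f)
    second = s * (s + 1) / ((s + f) * (s + f + 1))
    return mean, second


def positive_root(m: float, c: float) -> float:
    """Non-negative solution of m*p**2 + p = c (p = c when m = 0)."""
    if m == 0:
        return c
    return (-1 + math.sqrt(1 + 4 * m * c)) / (2 * m)


def lower_threshold(s: float, f: float, m: float) -> float:
    """Eqs. (10)-(11): one-period indifference point, no learning."""
    mean, second = beta_moments(s, f)
    return positive_root(m, m * second + mean)


def kappa(s: float, f: float, m: float, beta: float) -> float:
    """Eq. (14)."""
    mean = s / (s + f)
    num = ((1 + beta) * mean * (1 + m * (s + 1) / (s + f + 1))
           - beta * (f / (s + f)) * (s / (s + f + 1)) * (1 + m * (s + 1) / (s + f + 2)))
    return num / (1 + beta * mean)


def upper_threshold(s: float, f: float, m: float, beta: float) -> float:
    """Eqs. (15)-(16): two-period indifference point, with learning."""
    return positive_root(m, kappa(s, f, m, beta))


def damages_learning(p_b: float, s: float, f: float, m: float, beta: float) -> float:
    """Eq. (12): inspect A in period i, then the riskier source in period ii."""
    mean = s / (s + f)
    # Expected damage from A after a clean inspection (beliefs updated to (s, f + 1)).
    after_clean = m * s * (s + 1) / ((s + f + 1) * (s + f + 2)) + s / (s + f + 1)
    own = m * p_b ** 2 + p_b
    return own + beta * (mean * own + (1 - mean) * after_clean)


def damages_no_learning(s: float, f: float, m: float, beta: float) -> float:
    """Eq. (13): inspect B in both periods, no updating of beliefs about A."""
    mean, second = beta_moments(s, f)
    return (m * second + mean) * (1 + beta)


s, f, beta = 1.0, 9.0, 0.95
for m in (0.0, 0.5):
    lo, hi = lower_threshold(s, f, m), upper_threshold(s, f, m, beta)
    gap = damages_learning(hi, s, f, m, beta) - damages_no_learning(s, f, m, beta)
    print(f"m={m}: exploratory range ({lo:.4f}, {hi:.4f}), damage gap at upper bound {gap:.1e}")
```

Writing both thresholds through one `positive_root` helper mirrors the structure of Eqs. (11) and (16), which differ only in the constant on the right-hand side of the quadratic.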

Proof of Proposition I

Proving the proposition requires showing that $\Pr(\Delta D > 0) - \Pr(\Delta D < 0) > 0$. There is no known closed form expression for the distribution of $\Delta D(Z)$ given the combination of non-identically distributed Bernoulli trials. However, it is feasible to specify the set of possible payoff outcomes and calculate the likelihood of each one. The condition on the difference in likelihood of positive and negative outcomes is given by

$$\begin{aligned} \Pr(\Delta D > 0) - \Pr(\Delta D < 0) &= \Big(\big(1 - E(p_A^-)\big)p_B + E(p_A^-)\big(1 - E(p_A^+)\big)p_B^2\Big) - \Big(E(p_A^-)(1 - p_B)^2 + 2E(p_A^-)E(p_A^+)(1 - p_B)p_B\Big) \\ &= E(p_A^-)E(p_A^+)\,p_B^2 + \Big(E(p_A^-)\big[1 - 2E(p_A^+)\big] + 1\Big)p_B - E(p_A^-) > 0. \end{aligned} \qquad (17)$$

Using the quadratic formula there is one positive and one negative root for $p_B$ for the condition in (17). The negative root condition is always satisfied since $p_B \ge 0$. The positive root condition is satisfied when

$$p_B > \frac{-b + \sqrt{b^2 + 4E(p_A^-)^2 E(p_A^+)}}{2E(p_A^-)E(p_A^+)} \equiv \tilde{p}_B, \qquad (18)$$

where $b = E(p_A^-)\big[1 - 2E(p_A^+)\big] + 1$. It is straightforward to prove by contradiction that this condition holds for $f^i > 1$ and all $p_B \in (\underline{p}_B, \bar{p}_B)$, since $\tilde{p}_B \ge \underline{p}_B$ implies that $f^i \le 1$.

A graphical depiction of the Lagrangian decomposition method

For graphical intuition into the Lagrange multiplier decomposition method, the inspection allocation process is depicted in Fig. 5 for two hypothetical sources, A and B, and an inspection budget of $K = 7$. To begin, individual demand for inspections for each source $j \in \{A, B\}$ is determined for a range of potential values of the marginal opportunity cost of an observation ($\lambda \ge 0$). Then the individual demand curves (left and middle panels) are aggregated and the “market clearing” price (minimum feasible $\lambda$, solid horizontal line) is determined from the aggregate demand curve. The demand curves are drawn as convex, consistent with numerical results described below.17

[Figure: three panels plotting the Lagrange multiplier, λ (vertical axis, 0 to 1), against inspections (horizontal axis, 0 to 10): “Individual demand: Source A” ($E[p_A] = 0.20$, $x^*_A(\lambda)$), “Individual demand: Source B” ($E[p_B] = 0.15$, $x^*_B(\lambda)$) and “Aggregate demand: A and B” ($x^*_A(\lambda) + x^*_B(\lambda)$).]

Fig. 5. Individual and aggregate demand for inspections (horizontal axis) for two hypothetical sources, A and B, over a range of Lagrange multiplier values (vertical axis). A solid horizontal line is drawn at $\lambda_0$, the minimal level of λ which induces a feasible solution for an inspection budget of 7. Dotted horizontal lines show expected hazard levels.

Numerical estimates of individual demand for inspections under the ADP approach (see Section “Solving for inspection demand”) for a particular decision subperiod are presented in Fig. 6 for six countries, labeled A through F. This is the numerical analog of the stylized presentation of demand in Fig. 5. The output graphically shows how up to three inspections should be allocated over six alternative sources (for the sake of illustration), ordered in the figure from lowest to highest expected hazard, $E[p_j^t]$. Note that individual demand always lies on or above $E[p_j^t]$ (dotted line) given a non-negative value of information (which attenuates with each inspection). The expected immediate reward $E[p_j^t]$ is constant across inspections. The GMLE policy (see Section “Solving for inspection demand”) in this example would be to simply allocate all inspections to the source with the highest expected hazard, country F. The ADP policy is identified by the dash-dot line, drawn in both rows at the minimum feasible level of the shadow value of inspections, $\lambda_0$. To use a market analogy, the dash-dot line represents the market clearing (shadow) price of inspections. At the highest Lagrange multiplier value which exactly allocates the inspection budget, two inspections are allocated to the most hazardous source, F, while the value of exploration shifts the final inspection to a shipment from country E. In the example presented in Fig. 6, the ADP policy generates a lower immediate expected reward ($1 \times 0.27 + 2 \times 0.33 \approx 0.94$) than the GMLE policy ($3 \times 0.33 \approx 0.99$). However, incurring this immediate opportunity cost is dynamically optimal given the expected value of information gathered.

[Figure: six panels in two rows, one per source, plotting the Lagrange multiplier, λ (vertical axis, 0 to 0.5), against individual demand for inspections (horizontal axis, 0 to 3); panel titles give the expected hazards $E[p_A] = 0.05$, $E[p_B] = 0.06$, $E[p_C] = 0.09$, $E[p_D] = 0.13$, $E[p_E] = 0.27$ and $E[p_F] = 0.33$.]

Fig. 6. An example of numerical demand estimation for three inspections over six sources. A dash-dot line is drawn at 0.35, the highest Lagrange multiplier value which fully allocates the inspection budget. Individual dotted lines are drawn at $E[p_j^t]$ (where the time superscript is suppressed).
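The allocation rule illustrated in Figs. 5 and 6 amounts to finding the market clearing multiplier of step-function demands. A minimal Python sketch, assuming each source's demand is summarized by a non-increasing list of marginal values of successive inspections (expected immediate reward plus an attenuating value of information); the function names and the marginal values are hypothetical, chosen only so the outcome mirrors the Fig. 6 example (two inspections to F, one to E, $\lambda_0 = 0.35$), and are not values from the article:

```python
from typing import Dict, List, Tuple


def demand(values: List[float], lam: float) -> int:
    """Individual demand: number of inspections whose marginal value is at least lam."""
    return sum(1 for v in values if v >= lam)


def allocate(marginal: Dict[str, List[float]], budget: int) -> Tuple[float, Dict[str, int]]:
    """Highest multiplier at which aggregate demand covers the budget, and the allocation.

    Demand is a step function of lam, so only the marginal values themselves need
    to be checked as candidate multipliers. Ties at the clearing value could
    over-allocate; this sketch does not break them.
    """
    candidates = sorted({v for vals in marginal.values() for v in vals}, reverse=True)
    lam0 = 0.0  # fallback: budget exceeds total demand, so every listed inspection is used
    for lam in candidates:
        if sum(demand(vals, lam) for vals in marginal.values()) >= budget:
            lam0 = lam
            break
    return lam0, {j: demand(vals, lam0) for j, vals in marginal.items()}


# Hypothetical marginal values for the first, second and third inspection of each source.
marginal = {
    "A": [0.08, 0.06, 0.055],
    "B": [0.09, 0.07, 0.065],
    "C": [0.13, 0.10, 0.095],
    "D": [0.18, 0.15, 0.14],
    "E": [0.35, 0.30, 0.28],
    "F": [0.38, 0.36, 0.34],
}
lam0, alloc = allocate(marginal, budget=3)
print(lam0, alloc)  # 0.35 {'A': 0, 'B': 0, 'C': 0, 'D': 0, 'E': 1, 'F': 2}
```

A GMLE-style rule would rank sources by expected immediate reward alone and give all three inspections to F; in this sketch the shift of one inspection to E comes entirely from the value-of-information component built into the hypothetical marginal values.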
17 While in practice these demand curves will be step functions given the integer nature of inspections, they are depicted here in stylized form for simplicity. Numerically informed demand curves which do take the form of step functions are presented later in this section.

A hierarchical Bayesian model for temporally correlated observations

The simple (non-hierarchical) Bayesian learning model takes advantage of the fact that the posterior (beta) distribution on $p_j$ follows the same probability distribution as the prior, a property known as conjugacy. Unfortunately the same property does not hold under the hierarchical model. That is, there does not exist a sensible distribution for modeling beliefs on the hyperparameters $(s_j, f_j)$ that updates in a straightforward fashion as Bernoulli trials on $p_j^t$ are observed. We can, however, specify an equation using Bayes' rule which captures updated beliefs on these hyperparameters after observing a series of outcomes. Let $\pi(s_j, f_j)$ represent the prior distribution on the hyperparameters. The marginal posterior distribution of beliefs over $(s_j, f_j)$ is characterized by$^{18}$

$$\pi(s_j, f_j \mid y_j^1, \ldots, y_j^t) \propto \pi(s_j, f_j) \prod_{u=1}^{t} \frac{\Gamma(s_j + f_j)}{\Gamma(s_j)\,\Gamma(f_j)}\, \frac{\Gamma(s_j + y_j^u)\,\Gamma(f_j + x_j^u - y_j^u)}{\Gamma(s_j + f_j + x_j^u)}. \qquad (19)$$
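Because Eq. (19) has no closed form, it can be evaluated pointwise and normalized over a grid, as described in the remainder of this appendix. A minimal Python sketch, assuming the hyperpriors of footnote 19 (a beta prior on the mean hazard with α = 0.015 and α + β = 1, and a log-normal prior on the sample size with μ = 5, σ = 1), a grid over $(\mathrm{logit}(\bar{p}_j), \log \eta_j)$, and data recorded as (inspected, infested) pairs $(x_j^u, y_j^u)$ per period; the function names, grid bounds and toy data are hypothetical. On the logit/log scale the prior density picks up the Jacobian $\bar{p}_j(1 - \bar{p}_j)\eta_j$, and constant binomial coefficients are dropped along with the normalizing constant:

```python
import math
from typing import List, Sequence, Tuple

# Hyperprior settings from footnote 19.
A0, B0 = 0.015, 0.985   # beta prior on the mean hazard: mean 0.015, A0 + B0 = 1
MU, SIGMA = 5.0, 1.0    # log-normal prior on the sample size eta


def log_beta_pdf(p: float, a: float, b: float) -> float:
    return ((a - 1) * math.log(p) + (b - 1) * math.log(1 - p)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))


def log_lognormal_pdf(x: float, mu: float, sigma: float) -> float:
    z = (math.log(x) - mu) / sigma
    return -math.log(x * sigma * math.sqrt(2 * math.pi)) - 0.5 * z * z


def log_posterior(u: float, v: float, data: Sequence[Tuple[int, int]]) -> float:
    """Unnormalized log posterior at u = logit(mean hazard), v = log(sample size)."""
    p_bar = 1.0 / (1.0 + math.exp(-u))
    eta = math.exp(v)
    s, f = p_bar * eta, (1.0 - p_bar) * eta
    # Prior g(p_bar) h(eta) plus the Jacobian of the logit/log change of variables.
    lp = log_beta_pdf(p_bar, A0, B0) + log_lognormal_pdf(eta, MU, SIGMA)
    lp += math.log(p_bar * (1.0 - p_bar)) + v
    # One beta-binomial term per period, as in Eq. (19).
    for x, y in data:
        lp += (math.lgamma(s + f) - math.lgamma(s) - math.lgamma(f)
               + math.lgamma(s + y) + math.lgamma(f + x - y) - math.lgamma(s + f + x))
    return lp


def grid_posterior(data: Sequence[Tuple[int, int]], us: Sequence[float],
                   vs: Sequence[float]) -> List[List[float]]:
    """Posterior probabilities on the (u, v) grid, normalized to sum to one."""
    logs = [[log_posterior(u, v, data) for v in vs] for u in us]
    top = max(max(row) for row in logs)  # subtract the maximum for numerical stability
    weights = [[math.exp(val - top) for val in row] for row in logs]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


us = [-7.0 + 0.25 * i for i in range(25)]  # logit mean hazard from -7 to -1
vs = [1.0 + 0.25 * k for k in range(29)]   # log sample size from 1 to 8
data = [(10, 0), (12, 1), (8, 0)]          # toy (inspected, infested) counts by period
post = grid_posterior(data, us, vs)
```

Posterior summaries, such as the expected mean hazard, are then weighted sums over the grid cells; a finer or wider grid trades computation for accuracy.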

Because the posterior cannot be simplified analytically to match a known probability distribution (i.e. conjugacy does not hold), the posterior density in Eq. (19) is computed numerically. Completing the extended Bayesian learning framework requires specifying subjective prior beliefs over the hyperparameters $(s_j, f_j)$. When constructing this prior it is more intuitive to consider reasonable initial beliefs over the mean hazard, $\bar{p}_j = s_j/(s_j + f_j)$, and the “sample size” (or concentration) of the distribution, $\eta_j = s_j + f_j$, rather than over $s_j$ and $f_j$ directly (Gelman et al., 2004). For the commodity group of interest here, I assume that initially the decision maker has only an understanding of average outcomes in general and has low confidence in initial identical beliefs about each source. Initial beliefs regarding the mean hazard are given by $g(\bar{p}_j)$. Initial beliefs about the variability of the dynamic hazard rates are given by $h(\eta_j)$.19 These beliefs are diffuse in the sense that they are quickly dominated by observed data for a given source. Given the two independent prior distributions assigned over $\bar{p}_j$ and $\eta_j$, a transformation to $s_j$ and $f_j$ and multiplication by the appropriate Jacobian provides the specification for initial prior beliefs over the hyperparameters:

$$\pi(s_j, f_j) \propto (s_j + f_j)^{-1}\, g\!\left(\frac{s_j}{s_j + f_j}\right) h(s_j + f_j). \qquad (20)$$

Following the empirical Bayesian approach of Gelman et al. (2004, p. 128), to facilitate numerical calculation of the posterior, Eq. (20) is reparameterized in terms of the logit of the mean and the natural log of the sample size of the beta distribution, respectively $\mathrm{logit}(\bar{p}_j)$ and $\log(\eta_j)$. The posterior density is then computed over a grid of values covering the effective range of $(\bar{p}_j, \eta_j)$.

References

Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R., 1995. Gambling in a rigged casino: the adversarial multi-armed bandit problem. In: 36th Annual Symposium on Foundations of Computer Science, 1995. Proceedings. IEEE, New York, NY, pp. 322–331.
Bergemann, D., Välimäki, J., 2006. Bandit problems. Discussion Paper No. 1551, Cowles Foundation, Yale University, New Haven, Connecticut.
Bertsimas, D., Mersereau, A.J., 2007. A learning approach for interactive marketing to a customer segment. Oper. Res. 55 (6), 1120–1135.
Bond, C., Loomis, J., 2009. Using numerical dynamic programming to compare passive and active learning in the adaptive management of nutrients in shallow lakes. Can. J. Agric. Econ./Rev. Can. Agroecon. 57 (4), 555–573.
Castañon, D., 1997. Approximate dynamic programming for sensor management. In: Decision and Control, Proceedings of the 36th IEEE Conference on Decision and Control, vol. 2, pp. 1202–1207.
Costello, C., Karp, L., 2004. Dynamic taxes and quotas with learning. J. Econ. Dyn. Control 28, 161–180.
Doremus, H., 2007. Precaution, science, and learning while doing in natural resource management. Wash. Law Rev. 82, 547–579.
Doremus, H., 2010. Adaptive management as an information problem. NCL Rev. 89, 1455.
Eeckhoudt, L., Godfroid, P., 2000. Risk aversion and the value of information. J. Econ. Educ. 31 (4), 382–388.
Eiswerth, M., van Kooten, G., 2002. The economics of invasive species management: uncertainty, economics, and the spread of an invasive plant species. Am. J. Agric. Econ. 84 (5), 1317–1322.
Fisher, M., 1981. The Lagrangian relaxation method for solving integer programming problems. Manag. Sci. 27 (1), 1–18.
Fisher, M.L., 2004. Comments on “The Lagrangian relaxation method for solving integer programming problems”. Manag. Sci. 50 (Suppl. 12), S1872–S1874.
Freeze, R., Massmann, J., Smith, L., Sperling, T., James, B., 1990. Hydrogeological decision analysis. 1. A framework. Ground Water 28 (5), 738–766.
Garvie, D., Keeler, A., 1994. Incomplete enforcement with endogenous regulatory choice. J. Public Econ. 55, 141–162.
Gelman, A., Carlin, J., Stern, H., Rubin, D.B., 2004. Bayesian Data Analysis, second ed. Chapman and Hall, CRC, Washington, District of Columbia.
Gittins, J.C., Jones, D.M., 1972. A dynamic allocation index for the sequential design of experiments. In: Gani, J. (Ed.), Progress in Statistics, North-Holland, Amsterdam.
Groeneveld, R.A., Springborn, M., Costello, C., 2013. Repeated experimentation to learn about a flow-pollutant threshold. Environ. Resour. Econ., published early online, 1–21.
Harford, J., Harrington, W., 1991. A reconsideration of enforcement leverage when penalties are restricted. J. Public Econ. 45 (3), 391–395.
Hawkins, J., 2003. A Lagrangian Decomposition Approach to Weakly Coupled Dynamic Optimization Problems and its Applications (Ph.D. thesis). Operations Research Center, MIT.
Hilborn, R., Walters, C.J., 1992. Quantitative Fisheries Stock Assessment: Choice, Dynamics and Uncertainty. Chapman & Hall, London.
Hilton, R.W., 1981. The determinants of information value: synthesizing some general results. Manag. Sci. 27 (1), 57–64.
Holling, C. (Ed.), 1978. Adaptive Environmental Assessment and Management, Wiley, Toronto, Ontario, Canada.
Johnson, F., 2011. Learning and adaptation in the management of waterfowl harvests. J. Environ. Manag. 92 (5), 1385–1394.
Kaplan, J.D., Howitt, R.E., Farzin, Y.H., 2003. An information-theoretical analysis of budget-constrained nonpoint source pollution control. J. Environ. Econ. Manag. 46, 106–130.
Kelly, D., Kolstad, C., 1999. Bayesian learning, growth, and pollution—a comparison of alternative solution methods. J. Econ. Dyn. Control 23 (4), 491–518.
Lee, K., 1999. Appraising adaptive management. Conserv. Ecol. 3 (2), 3.

18 See Gelman et al. (2004, Section 5.3) for a derivation. As is common in Bayesian applications, the denominator of the posterior, a normalizing constant, is omitted.

19 Specifically, a beta pdf, $g(\bar{p}_j)$, is specified with parameters α and β, where $E[\bar{p}_j] = \alpha/(\alpha + \beta) = 0.015$ and $\alpha + \beta = 1$. In addition, a log-normal pdf, $h(\eta_j)$, is specified with $\mu = 5$ and $\sigma = 1$.


McCarthy, M., Lindenmayer, D., 2000. Spatially-correlated extinction in a metapopulation model of Leadbeater's Possum. Biodivers. Conserv. 9 (1), 47–63.
McDaniels, T., 1995. Using judgment in resource management: an analysis of a fisheries management decision. Oper. Res. 43 (3), 415–426.
Neuhaus, J., Kalbfleisch, J., Hauck, W., 1991. A comparison of cluster-specific and population-averaged approaches for analyzing correlated binary data. Int. Stat. Rev./Rev. Int. Stat. 59 (1), 25–35.
Rout, T., Hauser, C., Possingham, H., 2009. Optimal adaptive management for the translocation of a threatened species. Ecol. Appl. 19 (2), 515–526.
Sainsbury, K., Campbell, R., Lindholm, R., Whitelaw, A., 1997. Experimental management of an Australian multispecies fishery: examining the possibility of trawl-induced habitat modification. Global Trends: Fish. Manag. 20, 107–112.
Si, J., Barto, A., Powell, W., Wunsch, D., Lendaris, G., Neidhoefer, J. (Eds.), 2004. Handbook of Learning and Approximate Dynamic Programming, Wiley-IEEE, Hoboken, NJ.
Springborn, M., Costello, C., Ferrier, P., 2010. Optimal random exploration for trade-related non-indigenous species risk. In: Perrings, C., Mooney, H., Williamson, M. (Eds.), Bioinvasions and Globalization: Ecology, Economics, Management, and Policy, Oxford University Press, Oxford, UK, pp. 127–144.
Springborn, M., Sanchirico, J.N., 2013. A density projection approach for non-trivial information dynamics: adaptive management of stochastic natural resources. J. Environ. Econ. Manag. 66 (3), 609–624.
U.S. Department of Commerce, 2006. Bureau of the Census, Data User Services Division, U.S. Imports of Merchandise (monthly). CD-ROM.
Walters, C., 1986. Adaptive Management of Renewable Resources. MacMillan Pub. Co, New York, NY.
Walters, C., Hilborn, R., 1978. Ecological optimization and adaptive management. Annu. Rev. Ecol. Syst. 9 (1), 157–188.
Weber, R., 1992. On the Gittins index for multiarmed bandits. Ann. Appl. Probab., 1024–1033.
Whittle, P., 1988. Restless bandits: activity allocation in a changing world. J. Appl. Probab. 25, 287–298. (A Celebration of Applied Probability).