Measuring the Validity and Performance of Energy Models


W. C. Labys and C.-W. Yang
College of Mineral and Energy Resources, West Virginia University, USA

This paper is concerned with approaches to validating and judging the performance of energy models. Recently there has been a proliferation of energy models of a number of different types. These models have not only been specified to deal with different aspects of industry and market behavior but also have involved a wide range of commodity modeling methodologies. Several attempts have been made to classify and to compare the various policy modeling areas as well as the methodological approaches. This paper does not retrace this ground. Rather it points to criteria capable of determining the validity or performance of the various models that have been built. Different validation techniques are offered for econometric time-series models as compared to input-output or mathematical programming models. Finally, suggestions are made for including validation measures in future energy modeling studies.

I. VALIDATION RATIONALE

There are several reasons why the validation of energy models should become a matter of increasing importance. The first concerns the proliferation of energy models and of attempts to survey and compare them. Among the more recent attempts are those of House (1979), Manne et al. (1979), Roberts (1979), Parikh and Gordon (1978), UK Department of Energy (1978), Epple (1978), Gordon (1976), Charpentier (1974), Searl (1973), Energy Modeling (1973), and Limaye (1973). While most of these surveys have compared energy models on the basis of their characteristics and capabilities, only one or two have attempted to contrast the models according to their validity or performance. It would seem that the availability and use of validation criteria would provide a rigorous standard for comparing the quality of energy models. A second reason for validating energy models is to determine the trust that a policy maker, planner, or forecaster can place in a particular model. A model that closely reproduces data on the observed past behavior of a reference system or economic structure gains credibility and wins the confidence of potential users. In this respect, Greenberger et al. (1976, p. 70) make a distinction between verification and validation.


"Verification is a test of whether the model has been synthesized exactly as intended. Verification of a model indicates that it has been faithful to its conception, irrespective of whether or not it and its conception are valid. Validation, in contrast, is a test of whether the model is an adequate representation of the elements and relationships of the reference system that are important to experiments planned with the model. Validation is not a general seal of approval; it is an indication of a level of confidence in the model's behavior under limited conditions and for specific purposes - a check on its operational agreement with the reference system."

That validation of energy models has not been widespread appears to stem partly from professional cynicism and partly from the lack of stringent tests available to anyone except perhaps econometricians. Concerning the former attitude, House (1979, p. 168) states that anyone who says that they are able to predict turning points accurately with a policy model is "either naive, wrong or a charlatan." Such a promise would be equivalent to foretelling the future. He also adds, "even a model that is so designed that it 'apes' the past completely has nothing in its making that gives one any confidence that it will be able to forecast the future." Attempts to validate models, according to House, are falsely based on experience deriving from the natural and physical sciences. Discovered natural laws are such that validation can easily occur just by structuring the modeling methodology so as to ensure that it will produce results that conform to the measured data. But in the social sciences, the discovered laws are neither natural nor exact. Thus, replication of the past says little about the replication of occurrences that deviate from historical circumstances, either in the past or in the future.
Concerning the availability of stringent validation tests, Greenberger et al. (1976) stress that there are no uniform procedures for model validation. The authors emphasize that since all models are simplifications of the reference system or economic structure, they are never entirely valid in the sense of being fully supported by objective truth. The most appropriate descriptions of a model's performance are that it is "useful," "illuminating," "convincing," or "inspires confidence," rather than that it is "valid." Such confidence can be increased by having a model reproduce past behavior of the reference system, by exploring its sensitivity or response to perturbations, by critically examining the premises and theories on which it is based, and finally by putting it to use. This point of view, however, is not so much one of performing validation as of supplanting it. Such an approach to increasing confidence exists because modeling methodologies such as system dynamics, engineering-programming and operations research may not rest on formal data or statistically estimated coefficients. The econometrician, in turn, insists on validation utilizing statistical tests. Fromm (1973) emphasizes that models can only improve policy decision making if the premises on which the models are constructed are made explicit and the systems are subjected to a battery of validation tests. "All too frequently what purports to be a model based on theory and evidence is nothing more than a scholar's preconception and prejudices cloaked in the guise of scientific rigor and embellished with extensive computer simulations. Without empirical verification there is little assurance that the results of such exercises are reliable, or that they should be applied for normative policy purposes." Apart from the econometrician's reasons for validation, we seem to have forgotten the rationale that arises from the economic choice of models.
That is, modeling methodologies vary in their costs as well as in their scope and accuracy. As stated by Chambers et al. (1971, p. 46), the policy maker "must fix the level of inaccuracy he can tolerate - in other words, decide how his decision will vary, depending on the range and accuracy of the forecast." This allows him to trade off cost against the value of accuracy in choosing the modeling methodology. Figure 1 shows how the cost of a model and its accuracy are likely to vary, and graphs this against the corresponding cost of the model's errors, given some general assumptions. The figure is intended to show that the policy maker must weigh the cost of a more accurate and more expensive model against that of a less elaborate model or a different modeling methodology. The most sophisticated model that can be economically justified is one that falls in the region where the sum of the two costs is minimal. There may be some debate as to the extent to which this approach applies to the vast array of energy models already in existence. However, without some notion of a model's validity or accuracy we cannot begin to think of applying it.
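The trade-off between the cost of a model and the cost of its errors can be illustrated numerically. The Python sketch below is purely illustrative: the two cost curves, their functional forms and the numbers in them are assumptions made for this example and are not taken from Chambers et al. (1971). It searches a grid of accuracy levels for the one at which the sum of the two costs is smallest.

```python
# Hypothetical cost curves; functional forms and constants are illustrative
# assumptions, not values from the paper or from Chambers et al. (1971).

def model_cost(accuracy):
    """Cost of building and running a model; rises steeply as accuracy nears 1."""
    return 8.0 / (1.0 - accuracy)


def error_cost(accuracy):
    """Cost of acting on inaccurate forecasts; falls as accuracy improves."""
    return 200.0 * (1.0 - accuracy)


def total_cost(accuracy):
    """Sum of the two costs that the policy maker has to weigh."""
    return model_cost(accuracy) + error_cost(accuracy)


def cheapest_accuracy(levels):
    """Return the accuracy level on the grid with the smallest total cost."""
    return min(levels, key=total_cost)


# Candidate accuracy levels 0.50, 0.51, ..., 0.99.
levels = [i / 100 for i in range(50, 100)]
best = cheapest_accuracy(levels)
print(f"accuracy = {best:.2f}, total cost = {total_cost(best):.2f}")
```

Under these assumed curves the minimum lies in the interior of the grid: a simpler model is cheaper to build but costlier in errors, and a more elaborate one reverses that balance.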

[Figure: cost curves plotted against declining accuracy; labeled regions: sophisticated model systems, more elaborate models, simple models; optimal region marked.]

Fig. 1. Cost of forecasting versus cost of inaccuracy.

II. INTERFACE BETWEEN MODELING METHODOLOGIES AND VALIDATION TECHNIQUES

Let us now review the difficulties involved in establishing validation criteria that apply to the numerous methodologies employed in energy modeling. Table 1, adapted from Labys (1975), lists the range of modeling methodologies employed thus far. These range from econometric models and input-output models to optimization models and systems models. The range of validation techniques applicable to the different methodologies is reported in Table 2. The techniques are not defined here; definitions follow in the next section. Among the implications to be drawn from Table 2, the most significant is that econometric models have been the only class of energy models for which statistical validation techniques have been of concern. Validation of input-output and programming models has proven more difficult and perhaps has received less attention as a consequence. Systems dynamics and general systems models have been involved in even fewer validation experiments. In this section, we examine approaches that have been taken to validation in the first three major forms of modeling.

TABLE 1. Energy Modeling and the Range of Modeling Methodologies

Model: Econometric market model
Methodology: Dynamic microeconometric system composed of difference or differential equations
Example of energy application: Labys et al. (1979). Simultaneous determination of coal supply, demand, inventories and prices through simulation of econometric model of the US steam coal market, 1961-1975

Model: Econometric process model
Methodology: Dynamic microeconomic difference equation system suitable for integrating linear programming on production side
Example of energy application: Adams and Griffin (1975). Demand and production determined within the petroleum refining industry, focusing on transformation from product demand to input requirements. Includes linear programming model of petroleum refining process, 1955-1968

Model: Input-output model
Methodology: Input-output model normally combined with macroeconomic framework or disaggregated raw materials balance framework
Example of energy application: Hudson and Jorgenson (1974). Energy demand and supply and economic activity determined within an input-output model structure that is integrated with a macroeconometric growth model of the US economy, 1975-2000

Optimization and programming models

Model: 1. Spatial equilibrium-linear programming
Methodology: Linear programming model dealing with activities and/or regions
Example of energy application: ERDA-BESOM (1974). Linear programming model for energy services delivered in 1985 and 2000. New supply technologies based on engineering process analysis. Fixed future demands for "energy services." Interfuel substitution based on engineering process analysis

Model: 2. Spatial equilibrium-quadratic programming
Methodology: Linear activity analysis with quadratic objective function. Incorporates economic demand and supply functions. Can also be intertemporal
Example of energy application: Kennedy (1974). International equilibrium through quadratic programming model of price-responsive oil demands and supplies in 1980. Cartel-determined Persian Gulf crude oil price. Competitive supplies and demands for crude oil and products elsewhere

Model: 3. Recursive programming
Methodology: Activity analysis involving a sequence of constrained maximization problems in which objective function, limitation coefficients depend on optimal primal/dual solutions attained earlier in the sequence
Example of energy application: Day and Tabb (1972). Dynamic equilibrium of US coal sector in which production is explained in terms of shifts among major forms of mining technology and equipment

Model: Systems dynamics model
Methodology: Dynamic microeconometric differential equation system which features lagged feedback relations and variables in rates of change
Example of energy application: Dartmouth (1977). Dynamic equilibrium of US energy supply and demand featuring adjustment of capital stock due to changing resource availabilities and shifts in demand patterns

Model: Systems model
Methodology: Dynamic microeconometric equation system which, when formed into a simulation framework, is coupled with programming models and/or decision rules
Example of energy application: PIES (1974). Interregional equilibrium through iterations between linear programming model of energy supplies and econometrically estimated model of energy demands in 1985 and 1990. Demands based on regulated or competitive prices, depending on policy scenario

TABLE 2. Energy Modeling Methodologies and the Range of Validation Techniques

[Table body not recoverable. Rows: econometric (market model, process model); input-output (input-output, regional input-output); programming (linear, quadratic, recursive programming); systems (systems dynamics, general). Columns: parametric techniques and nonparametric techniques. An X marks each technique applicable to a methodology.]

Econometric Models

The validation techniques available for econometric models are normally divided into parametric techniques (those that derive from distribution theory) and nonparametric techniques (those that can be applied independently of distribution theory). Examples of the parametric criteria listed in Table 2 include coefficient tests, single point criteria and interval criteria. Nonparametric criteria include turning points and comparative errors. These techniques are normally concerned with the statistical significance of the coefficients in equations, of the equations themselves, and of the generated or estimated values of the variables (over time and space). It should be noted that such techniques are fairly robust, are applied frequently, and are accepted as a standard by the profession. Sources of information on the nature and use of these techniques include Theil (1971), Labys (1974), National Bureau of Economic Research (1975), Cooper and Jorgenson (1970), and Pindyck and Rubinfeld (1976).

Input-Output Models

Input-output models can be evaluated on two fronts, as econometric models can: the coefficients employed and the values of the variables generated. However, the principal concern has been with the quality of the technical coefficients $a_{ij}$, which can be fixed or changing. It goes without saying that such coefficients do change in the long run. The variations in the $a_{ij}$'s are related to changing trade patterns as well as to changing technology. Consider the case of the regional input-output model. If regional output reaches a level of technological advancement or a scale effect that will support local production of its own inputs, the result is a shift from imported to locally produced inputs that makes the previously used coefficients obsolete (Miernyk, 1973). Even within the framework of fixed coefficients, there are problems as to whether national coefficients are good substitutes for regional coefficients. Five criteria have been used to assess the effectiveness of the tested nonsurvey methods of coefficient estimation:

(i) Paelinck and Waelbroeck (1963) plotted the frequency distribution of the deviations $a_{ij} - a^*_{ij}$, where $a_{ij}$ is the national input-output coefficient and $a^*_{ij}$ denotes an estimated technical coefficient.

(ii) Isard and Romanoff (1963) used the Leontief index to measure the relative effectiveness of the coefficients. The index of relative change

$$RC_{ij} = \frac{|a_{ij} - a^*_{ij}|}{\tfrac{1}{2}(a_{ij} + a^*_{ij})}$$

has its value ranging from zero to 2. To normalize the index, Isard and Romanoff modified its range to run from zero to 1:

$$S_{ij} = 1 - \frac{|a_{ij} - a^*_{ij}|}{a_{ij} + a^*_{ij}}$$

(iii) The chi-square statistic

$$\chi^2_j = \sum_i \frac{(a_{ij} - a^*_{ij})^2}{a_{ij}}$$

has been calculated by Schaffer and Chu (1969) to identify those sectors with statistically significant estimates of the coefficients.

(iv) Czamanski and Malizia (1969) use a test of information content. That is, the closer the estimates are, the lower the value of the information content will be:

$$I(A^*:A) = \sum_i \sum_j \left| a^*_{ij} \log_2 (a^*_{ij} / a_{ij}) \right|$$

(v) Schaffer and Chu (1971) employed regression analysis for the comparison of nonsurvey estimated input-output tables. The $a_{ij}$'s of the survey table are the dependent variable and those of the nonsurvey tables are the independent variable.

The relative effectiveness of each criterion above has been discussed by Morrison and Smith (1974).
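The criteria listed above are simple enough to compute directly. The following Python sketch applies criteria (ii) to (v) to two small coefficient tables. The tables and all function names are hypothetical and chosen for illustration; the formulas assume strictly positive coefficients, and the chi-square statistic is computed by column (sector) with the reference coefficient in the denominator, which is one reading of the statistic rather than a confirmed reproduction of the cited authors' procedure.

```python
import math


def relative_change(a, a_est):
    """Leontief index of relative change RC_ij; ranges from 0 to 2."""
    return abs(a - a_est) / (0.5 * (a + a_est))


def similarity(a, a_est):
    """Normalized index S_ij = 1 - |a - a*| / (a + a*); ranges from 0 to 1."""
    return 1.0 - abs(a - a_est) / (a + a_est)


def chi_square_by_sector(A, A_est):
    """For each column j: sum over i of (a_ij - a*_ij)^2 / a_ij."""
    return [
        sum((row[j] - row_est[j]) ** 2 / row[j] for row, row_est in zip(A, A_est))
        for j in range(len(A[0]))
    ]


def information_content(A, A_est):
    """I(A*:A) = sum over i, j of |a*_ij * log2(a*_ij / a_ij)|."""
    return sum(
        abs(x_est * math.log2(x_est / x))
        for row, row_est in zip(A, A_est)
        for x, x_est in zip(row, row_est)
    )


def regress_survey_on_nonsurvey(A, A_est):
    """OLS (intercept, slope) with survey coefficients as the dependent variable."""
    y = [x for row in A for x in row]
    x = [x for row in A_est for x in row]
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical 2x2 coefficient tables (all entries strictly positive).
A_survey = [[0.20, 0.10], [0.30, 0.40]]
A_nonsurvey = [[0.25, 0.10], [0.30, 0.30]]

print("RC:", relative_change(0.20, 0.25), "S:", similarity(0.20, 0.25))
print("chi-square by sector:", chi_square_by_sector(A_survey, A_nonsurvey))
print("information content:", information_content(A_survey, A_nonsurvey))
print("regression (intercept, slope):", regress_survey_on_nonsurvey(A_survey, A_nonsurvey))
```

An intercept near zero and a slope near one in the regression, a similarity index near one, and small chi-square and information values all point to nonsurvey estimates that track the reference table closely.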

With regard to applying the criteria to the interregional input-output models developed by Leontief (1953), Isard (1951), Moses (1955, 1966), Chenery (1953), and Leontief and Strout (1963), most spatial allocation models assume the stability of trade patterns within a given period. The estimated interregional flows are then crucially dependent on the estimated values of the technical coefficients and the trade coefficients. In the Leontief-Strout (L-S) model, the trade coefficients are determined to a large extent by share parameters and true transportation costs. Unfortunately, this is the area where the information necessary for determining trade coefficients is scarce, especially in applying the model to the US energy market. The L-S model allows for cross-hauling among regions through the formulation of its structural equations. However, mathematical programming approaches, notably the Henderson (1958) type of allocation model and the Takayama-Judge (T-J) type of spatial model (1971), do not permit an analysis of cross-hauling with positive transportation rates (Yang, 1979). This is the critical point that separates the interregional input-output approach from the empirically more demanding mathematical programming approach. One should bear in mind that neither approach is suitable for large scale energy modeling purposes. Yang (1979) has shown that the T-J model transmits variations in optimal flows via a system of Kuhn-Tucker conditions even for a small change in individual transportation costs. As the dimension of the model becomes larger, so will the fluctuations in the optimal flows. In the case of L-S model simulation, the computation cost is extremely high and errors are transmitted from the original data to the final inverse (Miernyk, 1973). At this point, it is clear that each approach has its empirical limitations. In addition, validating the models becomes more complex as their size increases.

Programming Models

The basic validation problem associated with mathematical programming models, even those of moderate size, is the limited number of positive basic commodity flows. It is a well-known result (Gass, 1969; Silberberg, 1970) that both linear and quadratic programming transportation models yield no more than $2n-1$ positive flows. In empirical studies, there can be as many as $n^2$ possible flows. Even when the number of flows increases, the deviation between optimal and observed flows does not convey very meaningful results; each deviation reflects not only the errors due to data, estimation, specification and the performance of industry but also those due to the properties of the models. It is possible to apply Monte Carlo analysis to the linear programming model since such a model can always be transformed into a Kuhn-Tucker system of linear equations. As in the econometric approach, the probability distributions of the parameters and the error terms (usually normally distributed) are known and, therefore, the confidence intervals of the dependent variables (in the case of the econometric approach) and the dual-primal variables (in the case of the linear programming approach) can be established. However, in the case of the quadratic programming model, the Kuhn-Tucker system is no longer linear with respect to model parameters, e.g., transportation costs, slopes and intercepts of regional equations. Hence the normality of the probability distribution is not preserved in the T-J model. The only remaining workable statistical criterion in this case is that of factorial design analysis (Yang and Labys, 1979; Naylor, 1971). In conclusion, it would appear that sensitivity analysis of the above type based on parametric tests can be applied fruitfully to programming types of energy models.
Where such parametric tests are not applicable, Labys and Yang (1979) have shown that the deterministic response surface can be appropriately applied to the sensitivity analysis of energy models. The size of the model, indeed, serves as a limitation to validating mathematical programming and input-output models, both analytically and computationally. The quality of the energy models to be adopted, therefore, rests with the objectives of the research as well as the economic budget of the modeling effort.
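The bound on the number of positive flows noted at the start of this subsection can be seen in a small example. The Python sketch below builds a basic feasible solution of a balanced transportation problem with the northwest-corner rule, a standard textbook construction that is not taken from the sources cited here. It does not optimize anything; it only illustrates that a basic solution with n supply and n demand regions uses at most 2n-1 of the n^2 possible routes, which is also the form an optimal basic solution of the linear programming model takes. The supply and demand figures are hypothetical.

```python
def northwest_corner(supply, demand):
    """Basic feasible solution of a balanced transportation problem.

    Starts in the top-left cell, ships as much as possible, then moves
    down when a supply is exhausted and right otherwise.
    """
    supply, demand = list(supply), list(demand)  # work on copies
    flows = [[0] * len(demand) for _ in supply]
    i = j = 0
    while i < len(supply) and j < len(demand):
        shipped = min(supply[i], demand[j])
        flows[i][j] = shipped
        supply[i] -= shipped
        demand[j] -= shipped
        if supply[i] == 0:
            i += 1  # region i has nothing left to ship
        else:
            j += 1  # region j is fully served
    return flows


def count_positive(flows):
    """Number of routes that carry a positive shipment."""
    return sum(1 for row in flows for x in row if x > 0)


# Hypothetical balanced problem with n = 3 supply and 3 demand regions.
supply = [30, 40, 30]
demand = [20, 50, 30]
flows = northwest_corner(supply, demand)

n = len(supply)
print("flows:", flows)
print("positive flows:", count_positive(flows), "of", n * n, "possible; bound:", 2 * n - 1)
```

Observed trade data, by contrast, may show shipments on most or all routes, which is why a cell-by-cell comparison of optimal and observed flows is hard to interpret.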

III. SOME VALIDATION CRITERIA DEFINED

The approach taken here in defining validation criteria is to separate the structure of the model from the estimates generated by computer simulations with the model. In the first case, the criteria presented apply primarily to models in which the coefficients of the structural equations have been estimated statistically. Because it is nearly impossible to validate "guesstimates" of coefficients used in engineering and systems types of models, these criteria are mostly applicable to models employing regression equations. In the second case, the performance of the model is determined on the basis of the generated values of the model's variables by comparing these values to their corresponding actual or assumed values. Generated values are assumed to be the result of a sequence of computer simulation exercises, usually employing numerical methods. The definitions of the various criteria are drawn for the most part from Labys (1973).

Parametric Tests

Significance of Coefficients

The parametric evaluation of an energy model can begin with the statistical significance of the coefficients in the structural equations, normally obtained with single-equation or multiple-equation estimation methods, e.g. OLS, 2SLS, FIML. These are based on the principle of least squares. In the following simplified example (OLS), one begins with the linear model specification

$$Y_i = \alpha + \beta X_i + u_i \qquad (i = 1, 2, \ldots, n) \tag{1}$$

The assumptions underlying the model are:

(a) the relationship between $Y$ and $X$ is linear;

(b) the $X_i$'s are nonstochastic and fixed;

(c) $E(u_i) = 0$ $(i = 1, 2, \ldots, n)$ and

$$E(u_i u_j) = \begin{cases} 0 & (i \neq j;\ i, j = 1, 2, \ldots, n) \\ \sigma^2 & (i = j;\ i, j = 1, 2, \ldots, n) \end{cases}$$

The parameters $\alpha$ and $\beta$ are chosen so as to minimize the sum of the squares of the residuals

$$\sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \tag{2}$$

such that

$$\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}, \qquad \hat{\beta} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \tag{3}$$

If we make the additional assumption that $u_i \sim N(0, \sigma^2)$, then hypothesis tests of the coefficients are based on the probability distributions of the estimators

$$\hat{\beta} \sim N\left(\beta,\ \frac{\sigma^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}\right) \tag{4}$$

$$\hat{\alpha} \sim N\left(\alpha,\ \frac{\sigma^2 \sum_{i=1}^{n} X_i^2}{n \sum_{i=1}^{n} (X_i - \bar{X})^2}\right) \tag{5}$$

The unbiased estimator of the population variance $\sigma^2$ is given by

$$s^2 = \frac{\sum_{i=1}^{n} \hat{u}_i^2}{n - 2} \tag{6}$$

and the estimates of the variances of $\hat{\alpha}$ and $\hat{\beta}$ by

$$s_{\hat{\alpha}}^2 = \frac{s^2 \sum_{i=1}^{n} X_i^2}{n \sum_{i=1}^{n} (X_i - \bar{X})^2} \tag{7}$$

$$s_{\hat{\beta}}^2 = \frac{s^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \tag{8}$$

The test of the statistical significance of the coefficients $\alpha$, $\beta$ is based on the $t$ distribution with $n-2$ degrees of freedom as follows:

$$t = \frac{\hat{\beta} - \beta}{s_{\hat{\beta}}} = \frac{\hat{\beta} - \beta}{s \Big/ \sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}} \tag{9}$$

Related confidence intervals are given by

$$\hat{\beta} \pm t_c\, s_{\hat{\beta}} \tag{10}$$

where $t_c$ is the critical value from the $t$ distribution with $n-2$ degrees of freedom.
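The estimation and testing steps for the simple linear model can be traced in a few lines of code. The Python sketch below is a minimal illustration using only the standard library; the function name, the data and the critical value are assumptions made for the example. The critical value 3.182 is the two-sided 5 percent point of the t distribution with 3 degrees of freedom, taken from standard tables and supplied by hand because the standard library has no t distribution.

```python
import math


def ols_fit(x, y):
    """Least-squares estimates and standard errors for Y = alpha + beta*X + u."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    residuals = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
    s2 = sum(u * u for u in residuals) / (n - 2)  # unbiased estimate of sigma^2
    se_beta = math.sqrt(s2 / sxx)
    se_alpha = math.sqrt(s2 * sum(xi * xi for xi in x) / (n * sxx))
    return {"alpha": alpha, "beta": beta, "s2": s2,
            "se_alpha": se_alpha, "se_beta": se_beta, "residuals": residuals}


# Hypothetical data, n = 5 observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = ols_fit(x, y)

# t statistic for the null hypothesis beta = 0.
t_beta = fit["beta"] / fit["se_beta"]

# 95 percent confidence interval; t_c is read from a t table (n - 2 = 3 d.f.).
t_c = 3.182
interval = (fit["beta"] - t_c * fit["se_beta"], fit["beta"] + t_c * fit["se_beta"])

print("alpha =", fit["alpha"], "beta =", fit["beta"])
print("t statistic for beta:", t_beta)
print("95% confidence interval for beta:", interval)
```

The coefficient is judged significant when the absolute t statistic exceeds the critical value, which is equivalent to zero lying outside the confidence interval.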

Goodness-of-Fit

In econometric work, goodness-of-fit statistics such as the coefficient of determination and the standard error of the estimate are normally computed for each equation. The coefficient of determination is the proportion of the variance of $Y$ explained by the linear influence of $X$. It is defined as the square of the correlation coefficient

$$r^2 = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} = 1 - \frac{\sum_{i=1}^{n} \hat{u}_i^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} \qquad (0 \le r^2 \le 1) \tag{11}$$

The standard error of the estimate is the square root of $s^2$ in relation (6). Neither of these measures, however, extends to the goodness-of-fit of an entire system of equations. One can only look to an analysis of simulation results using single-point criteria to obtain a better perspective on overall model performance.

Single Point Criteria

The tests traditionally applied to measure the forecast error between actual and estimated or simulated observations are single point criteria, such as the mean absolute percentage error and the mean squared error

$$\%E = \frac{1}{n} \sum_{t=1}^{n} \frac{|\hat{y}_t - y_t|}{y_t} \times 100\% \tag{12}$$

$$\text{M.S.E.} = \frac{1}{n} \sum_{t=1}^{n} (\hat{y}_t - y_t)^2 \tag{13}$$

where $y_t$ is the actual observed value and $\hat{y}_t$ is the forecast or estimated value. Similarly useful is the Theil inequality or $U$ coefficient, which derives from a regression of actual on forecast values and which can also provide information as to sources of error (see Theil, 1971).

$$U = \frac{\sqrt{\frac{1}{n} \sum_{t=1}^{n} (\hat{y}_t - y_t)^2}}{\sqrt{\frac{1}{n} \sum_{t=1}^{n} \hat{y}_t^2} + \sqrt{\frac{1}{n} \sum_{t=1}^{n} y_t^2}} \tag{14}$$

While the error criteria relate to year-by-year error, they can be applied to multi-period forecast analysis. However, little is presently known about the use of multi-period forecast error to compare the validity of alternative models.

Interval Criteria

If the modeler is more interested in predicting an interval or range of values, then the test of confidence intervals (10) could be developed and applied for this purpose.
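The single point criteria are straightforward to compute once actual and simulated series are available. The Python sketch below implements the mean absolute percentage error, the mean squared error, and the Theil U coefficient in the bounded form whose denominator is the sum of the root mean squares of the two series. The function names and the two series are hypothetical, and the percentage error assumes that all actual values are positive.

```python
import math


def mean_abs_pct_error(actual, forecast):
    """Mean absolute percentage error; assumes all actual values are positive."""
    n = len(actual)
    return 100.0 / n * sum(abs(f - a) / a for a, f in zip(actual, forecast))


def mean_squared_error(actual, forecast):
    """Average squared difference between forecast and actual values."""
    n = len(actual)
    return sum((f - a) ** 2 for a, f in zip(actual, forecast)) / n


def theil_u(actual, forecast):
    """Theil inequality coefficient; 0 for a perfect forecast, at most 1."""
    n = len(actual)
    rms_error = math.sqrt(sum((f - a) ** 2 for a, f in zip(actual, forecast)) / n)
    rms_forecast = math.sqrt(sum(f * f for f in forecast) / n)
    rms_actual = math.sqrt(sum(a * a for a in actual) / n)
    return rms_error / (rms_forecast + rms_actual)


# Hypothetical actual and simulated values for four periods.
actual = [100.0, 110.0, 120.0, 130.0]
forecast = [98.0, 112.0, 121.0, 127.0]

print("MAPE (%):", mean_abs_pct_error(actual, forecast))
print("MSE:", mean_squared_error(actual, forecast))
print("Theil U:", theil_u(actual, forecast))
```

The mean squared error is expressed in squared units of the variable, so it is suited to comparing models of the same variable, whereas the percentage error and the U coefficient are unit free and can be compared across variables.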


Error-Cost Analysis

While the above criteria are probably the most practical for evaluating performance with a historical sample or a small post-sample data set, they fail to define the surrounding probabilistic conditions in a way that would be useful with a large post-sample data set. There are two ways to improve this situation. First, an informative forecast could accompany the point forecasts, based on some mathematical statement regarding the probability distributions surrounding these forecasts. This amounts to an interval forecast in which the point forecast is presented along with an appropriate confidence interval. Second, a decision forecast could be prepared which recommends that the forecast be accepted in relation to some alternative consequence. For example, Leamer and Stern (1970) consider the case where the policy maker must decide on a certain policy which depends upon the future value $y_t^*$ of the endogenous variable $y_t$. The future value, of course, is not known and the policy maker could make an incorrect decision. The loss or consequence the policy maker must undergo in such a case is given by the loss function $L(D_i, y_t)$, which describes the loss of selecting decision $D_i$ when $y_t$ turns out to be the true value of $y_t^*$. One could then form the following rule, which would lead to the decision $D_i$ that minimizes the expected loss

$$E[L] = \sum_{y_t} L(D_i, y_t)\, P(y_t^* = y_t) \tag{15}$$

where $P(y_t^* = y_t)$ is the probability that the future value $y_t^*$ will be $y_t$. In this case the policy maker would select the value $y_t$ for which his loss will be minimal. This can also assume the form of an error-cost function in which a policy maker decides to act by not following the best possible forecast. Rather, he selects one forecast from alternative forecasts such that the cost of making forecast errors is minimized.

Sensitivity Analysis

It is also important to observe the sensitivity of model solutions to variations in the parameters of the model. In the case of econometric models, parameters can simply be varied systematically and the variations in the solutions evaluated in terms of deviations from a base solution. A more careful analysis would involve statistical experimental design methods, an approach which is not yet fully developed. In the case of input-output and programming models, there generally are mechanical relations between model parameters (or exogenous variables) and optimal solutions, i.e., the Leontief inverse for the input-output approach and the Kuhn-Tucker theorems for the mathematical programming technique. Consequently, the validity of the models depends to a large extent on their sensitivity, defined as the variation of the optimal solutions once the system is slightly perturbed. As was shown in the previous section, solutions of the quadratic programming model are very sensitive to some parameters, especially those describing transportation costs and supply intercepts. The variations in the optimal solution reflected in the interregional flows are transmitted by the Kuhn-Tucker conditions. As the size of the model becomes larger, so does the instability of the optimal flows generated by its transmission mechanism. In the light of this, one must be very careful about two aspects of energy modeling. First, the modelers should avoid changing the intercepts of the regional equations or the transportation cost parameters too rapidly. But in the case of the intertemporal spatial equilibrium model, such changes are inevitable for forecasting purposes. For example, income, population and climate conditions are often merged into the intercept terms. Hence, in order to avoid model instability, a reduction in the dimensions of the modeling problem may become necessary. Second, the modelers should avoid a model framework with excessive regional disaggregation. Such a practice usually generates a set of optimal solutions that are highly sensitive to a set of exogenous shocks, and the resulting errors are transmitted through the specified "built-in" mechanism. In the case of a large-scale Leontief-Strout interregional input-output model, the errors are compounded through the inversion of the large matrices of technical coefficients. The cost of inverting a matrix by conventional methods is roughly proportional to the cube of the order of the matrix (Luft, 1969). In any event, some form of sensitivity analysis would seem to be the only practical approach to verifying the input-output and programming models. The approach presently recommended is described below as a nonparametric test.

Nonparametric Tests

Turning Point Errors

A major characterization of a model's performance is its ability to explain the turning points of fluctuations in the values of the endogenous variables. There are a number of descriptive variables or statistics which can be used to describe turning point errors. Most often they pertain to the number of turning points missed, the number of turning points falsely predicted, the number of under and over predictions, rank correlations of predicted and actual changes, and various tests of randomness in prediction.

Comparative Errors

Validation also can take the form of comparing the forecast errors obtained from the equation of interest to the forecast errors achieved from a naive or statistical equation or from judgmental or other noneconometric forecasts.
As shown by Labys and Granger (1971) naive and statistical equations can provide a rigorous standard for most commodity forecast comparisons, especially when using short run data. As a starting point one could begin by comparing forecasts from an equation to forecasts generated by naive methods such as "no change", where literally no change takes place in the endogenous variables from one period to the next, y^ = y^\, or "same change," where the endogenous variables will continue to change in the same direction and by the same magnitude as the previous change y^ - £/-f_i = lét~\ ~ Ut-?' Expressed in levels, it becomes a weighted sum y+ = 2i/-_, - £/£_o· auch comparisons are made for each endogenous variable in the overall model. Cooper and Jorgenson (1970) have shown that a good comparative statistical equation which utilizes all available information is an autoregressive process of no more than three lags, y^ - b^ + b,y^_, + b~y+_2 + b-u/t-i* T n e v a l s o show how one can standardize the period of fit and technique of estimation across alternative equations prior to comparison. The same can be done using Box-Jenkins or autoregressive integrated moving average (ARMA) equations. One other approach to comparative errors is to contrast a model's estimates with those of other models of the same size or complexity. For example, the Electric Power Research Institute has sponsored the Energy Modeling Forum project (EMF, 1977) for comparing energy model outcomes. Model builders and users run a group of models independently, based on a set of common assumptions, e.g., GNP growth notes, energy costs, etc. Results of the simulations are used to identify model deficiencies and inconsistencies. Similarly, a "Model Verification and Assessment" project has been

established at the Massachusetts Institute of Technology (MIT Model Assessment Laboratory, 1978). Complementing the EMF project, it serves to develop procedures and methodologies for in-depth model assessment and attempts to apply these assessment procedures to individual energy models.

Error Decomposition

Haitovsky and Treyz (1972) propose that forecast error be decomposed into components involving (1) the particular equation explaining the variable of interest, (2) the rest of the model, (3) incorrect values of the lagged endogenous variables, (4) incorrect forecasts of one or more of the exogenous variables, and (5) lack of serial correlation adjustments for observed errors. Such a decomposition would enable us to improve a commodity model in a way that facilitates policy analysis, since it would identify the model subsectors that transmit error to other sectors. Only simple statistical measures, such as mean bias and variance, are needed to describe the errors.

Sensitivity Analysis

One drawback of sensitivity analysis based on parametric tests is that such tests are costly. A substantial number of stochastic solutions of the model are necessary, normally employing numerical methods. Such tests have normally been suggested for dynamic econometric models of a linear or nonlinear type. In the case of spatial equilibrium models, notably linear and quadratic programming models, the stochastic programming approach has occasionally been employed in the linear case. However, such a validation procedure does not apply to the quadratic case. What a modeler can do is reduce the quadratic programming model to a system of linear Kuhn-Tucker equations. From there, one can derive the properties of the response surface of the regional and interregional economic activities.
Such an approach has been advocated by Labys and Yang (1979) and Yang (1979), who show that the response surfaces of regional production, consumption and interregional flow are piecewise linear, whereas the value of the objective function is piecewise quadratic. Hence, within a given set of basic variables in the simplex tableau, the sensitivity of a model is entirely predictable. The advantage of this technique lies in the fact that once the response interval is known, a modeler can forecast the decision variables without resimulating the model.
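The turning point and comparative error checks described in this section can be illustrated with a short sketch. It is not taken from any of the studies cited; the function names and the sample series are hypothetical, and a turning point is assumed here to be a period at which the direction of change reverses, with predicted and actual turning points matched only when they fall in the same period.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root-mean-square error over paired observations."""
    errors = [p - a for a, p in zip(actual, predicted)]
    return sqrt(sum(e * e for e in errors) / len(errors))


def no_change_forecast(y):
    """Naive 'no change' forecast y_t = y_{t-1}; element i forecasts y[i + 1]."""
    return y[:-1]


def same_change_forecast(y):
    """Naive 'same change' forecast y_t = 2*y_{t-1} - y_{t-2}; element i forecasts y[i + 2]."""
    return [2 * y[t - 1] - y[t - 2] for t in range(2, len(y))]


def turning_points(y):
    """Periods t at which the direction of change reverses (assumed definition)."""
    points = set()
    for t in range(1, len(y) - 1):
        before = y[t] - y[t - 1]
        after = y[t + 1] - y[t]
        if before * after < 0:
            points.add(t)
    return points


def turning_point_errors(actual, predicted):
    """Count turning points missed and turning points falsely predicted."""
    tp_actual = turning_points(actual)
    tp_predicted = turning_points(predicted)
    return {
        "missed": len(tp_actual - tp_predicted),
        "false": len(tp_predicted - tp_actual),
    }


# Hypothetical data: an observed series and a model's predictions of it.
actual = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]
model = [10.5, 11.8, 11.2, 12.9, 12.5, 12.0]

tp = turning_point_errors(actual, model)

# Compare all forecasts over the same periods (t >= 2), the first period
# for which the 'same change' forecast is defined.
scores = {
    "model": rmse(actual[2:], model[2:]),
    "no_change": rmse(actual[2:], no_change_forecast(actual)[1:]),
    "same_change": rmse(actual[2:], same_change_forecast(actual)),
}

print(tp)
print(scores)
```

A model whose error is not smaller than that of the "no change" forecast over the same periods adds little forecasting value for that variable, which is the sense in which the naive equations serve as a standard of comparison.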

IV.

VALIDATION IN THE ENERGY MODELING PROCESS

Energy modeling can be generally described as a process consisting of three interrelated activities: model formulation, parameter estimation, and model validation. In addition to being sequential, the interrelation can be thought of as involving feedback. During the validation process, additional shortcomings of a model may be detected and reformulation and estimation may again be necessary. The nature of energy modeling is such, however, that some types of modeling activities may not fit into this scheme. What are some of the exceptions that we face? First of all, validation is difficult to imagine for large-scale energy models with many technically determined coefficients such as those including programming methodologies. In addition, most of these models cannot be run as a total machine model. Often required is the intervention of analysts between the running of the submodels to "massage" the data in order to get a full run of the system. Finally, many of these planning models forecast a substantial number of years into the future, the year 2000 or beyond.

The economic criterion in the choice of energy models was mentioned earlier. This criterion places pressure on energy modelers to undertake validation even under the most stringent of conditions. For example, Fromm et al. (1979) point to the possible development of Federal standards for model validation. In that context validation was seen as a means of avoiding or discarding bad models and enhancing the credibility of good ones for policy applications. Whether or not such government intervention is realistic depends, of course, on the establishment of uniform validation criteria and quality standards. If we are going to pay more than lip service to validation, what should it involve? In the case of large-scale, long-term models, we should examine the internal consistency of a model as well as its consistency with theory. Also of interest is a comparison of the results of a particular model with the results of other models or of other surveys that purport to measure the same factors. This comparison need not relate only to the level of the variables; it can also embody a sensitivity analysis of the response of the alternative models to similar perturbations. This validation process implies not only comparing results but also taking a serious look at the data and documentation to see whether variations can be explained reasonably. For medium-term and short-term models of a moderate scale, increased use can be made of the parametric and nonparametric validation techniques described above. Where temporal energy activities are being described, techniques such as goodness of fit, single point criteria, turning point analysis and comparative errors can be employed. Where possible, validation in the ex post or sample period should be carried out separately from validation in the ex ante or prediction period. Such a suggestion obviously applies better to econometric models than to input-output or programming models.
For the latter two methodologies, greater development and application of sensitivity analysis would seem necessary. One hesitates to suggest the need for additional research into developing validation criteria for models involving programming algorithms, but the economic costs of models relative to the benefits achieved imply that such research may be necessary. Otherwise, we have no other guides for assuring us that energy policy modeling will lead to optimal energy policy making.
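The suggestion that validation be carried out separately for the ex post and ex ante periods can be sketched as follows. This is only an illustration under stated assumptions: the function names, the choice of summary measures (mean bias, error variance and root-mean-square error) and the sample data are hypothetical, and the boundary between the two periods is assumed to be known.

```python
from math import sqrt
from statistics import mean, pvariance


def error_summary(actual, predicted):
    """Mean bias, error variance and RMSE of the prediction errors."""
    errors = [p - a for a, p in zip(actual, predicted)]
    return {
        "mean_bias": mean(errors),
        "error_variance": pvariance(errors),
        "rmse": sqrt(mean(e * e for e in errors)),
    }


def split_validation(actual, predicted, sample_end):
    """Summarize errors separately for the sample period (observations
    before index sample_end) and the prediction period (the remainder)."""
    return {
        "ex_post": error_summary(actual[:sample_end], predicted[:sample_end]),
        "ex_ante": error_summary(actual[sample_end:], predicted[sample_end:]),
    }


# Hypothetical data: the first four observations are assumed to lie in the
# estimation sample, the last two in the forecast period.
actual = [100.0, 104.0, 108.0, 112.0, 116.0, 120.0]
predicted = [101.0, 103.0, 109.0, 111.0, 119.0, 124.0]

report = split_validation(actual, predicted, sample_end=4)
for period, stats in report.items():
    print(period, stats)
```

Reporting the two periods side by side makes it apparent when a model that fits the sample period closely nevertheless develops a bias once it is used for prediction.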

APPENDIX A SURVEY OF VALIDATION PROCEDURES IN ENERGY MODELS

This appendix will feature a summary and analysis of the results obtained from conducting a survey of validation procedures in energy models. It will feature a tabulation of the nature and extent of the methods used as well as attitudes towards validating energy models. Such a survey could act as a stimulus to generating interest in this area. Attached is a simple questionnaire which could be sent to conference members and the results presented at the conference.

ENERGY MODEL VALIDATION SURVEY

Please circle the answer or answers which best describe your model and the validation methods employed. On the second page, please provide further description (if necessary) and list your attitudes towards model validation. Please complete one form per model. Thank you.

Energy Commodity(s):
Author(s):
Institution and address:
Model presently operated: Yes / No

I. Basic Model Characteristics

1. Methodology: Econometric market model/econometric process model/input-output model/spatial equilibrium-LP/spatial equilibrium-QP/recursive programming/systems dynamics/systems (describe)/other (describe)
2. Form: Recursive/simultaneous/both; open/closed; static/dynamic; linear/nonlinear; stochastic/nonstochastic
3. Frequency of observations: annual/quarterly/monthly
4. Sample period: Begins ....; Ends ....
5. Forecast period: Begins ....; Ends ....
6. Scope: Domestic-Country: ....; International-Countries: ....
7. Number of variables: Endogenous ....; Exogenous ....; Lagged Endogenous ....
8. Number of equations:
9. Estimated man-months of construction:
10. Model citation in the literature:

II. Validation Process (Indicate yes/no and type)

1. Internal consistency of the model
2. Consistency of the model with theory
3. Parametric techniques
   a. Coefficient tests (type ....)
   b. Goodness of fit (type ....)
   c. Single point criteria (type ....)
   d. Interval criteria (type ....)
   e. Error-cost analysis (type ....)
   f. Sensitivity analysis (type ....)
4. Nonparametric techniques
   a. Turning points
   b. Error decomposition
   c. Comparative errors (other qualitative forecasts/other equation forecasts/other model forecasts/other)
   d. Sensitivity analysis (type ....)
5. Comparison of model accuracy with model costs

III. Please provide a brief description of the validation process if further detail is needed regarding the above questions.

IV. Please indicate your attitudes towards the need for and feasibility of energy model validation.

REFERENCES

1. Adams, F. G. and Griffin, J. (1975) An econometric linear programming model of the US petroleum refining industry, in: Quantitative Models of Commodity Markets (W. C. Labys, ed.), Ballinger, Cambridge.
2. Brock, H. and Nesbitt, D. (1977) Large-Scale Energy Planning Models: A Methodological Analysis, Stanford Research Institute, Menlo Park, CA.
3. Chambers, J. C., Mullick, S. K., and Smith, D. D. (1971) How to choose the right forecasting technique, Harvard Business Rev., 1971, 45-74.
4. Charles River Associates (CRA) (1978) Review and Evaluation of Selected Large-scale Energy Models, CRA Report No. 231, Cambridge, MA.
5. Charpentier, J. P. (1974) A Review of Energy Models, Vols. I and II, RR 74-10, International Institute for Applied Systems Analysis, Laxenburg, Austria.
6. Chenery, Hollis B., Clark, Paul G., and Cao Pinna, Vera (1953) The Structure and Growth of the Italian Economy, United States Mutual Security Agency, Rome.
7. Cherniavsky, E. A., Juang, L. L., and Abilock, H. (1977) Dynamic Energy System Optimization Model, Brookhaven National Laboratory, Upton, New York.
8. Cooper, R. L. and Jorgenson, D. W. (1970) The Predictive Performance of Quarterly Econometric Models of the US, Working Paper No. 113, Institute of Business and Economic Research, University of California, Berkeley.
9. Cox, D. R. (1962) Tests of separate families of hypotheses, Jl. R. Statist. Soc. Series B 24.
10. Czamanski, S. and Malizia, E. E. (1969) Applicability and limitations in the use of national input-output tables for regional studies, Regional Sci. 23, 65-77.
11. Dartmouth Systems Dynamics Group (1977) Fossil I: Introduction to the Model, Thayer School of Engineering, Dartmouth College.
12. Day, R. H. and Tabb, W. K. (1972) A Dynamic Microeconomic Model of the U.S. Coal Mining Industry, SSRI Research Paper, University of Wisconsin, Madison.
13. Dhrymes, P. J. et al. (1972) Criteria for evaluation of econometric models, Ann. Econ. Social Measurement 1, 291-324.
14. Eckstein, A. J. and Heien, D. M. (1978) A Review of Energy Models with Particular Reference to Employment in Manpower Analysis, Report for the Employment and Training Administration, US Department of Labor, Washington, DC.
15. Energy Modeling Forum (1977) Energy and the Economy, EMF Report 1, Stanford University.
16. Energy Modeling Forum (1978) Coal in Transition: 1980-2000, EMF Report 2, Stanford University.
17. Energy Research and Development Administration (1974) A National Plan for Energy Research Development and Demonstration: Creating Energy Choices for the Future, ERDA-48, US Government Printing Office, Washington, DC.
18. Epple, D. (1978) Studies of US primary energy supply: a review, Energy Systems and Policy 2, 245-65.

19. Federal Energy Administration (1974) Project Independence Report, US Government Printing Office.
20. Fromm, G. (1973) Policy decisions and econometric models, paper presented at an NSF Seminar, Washington, DC.
21. Fromm, G., Hamilton, W. L., and Hamilton, D. E. (1974) Federally Supported Mathematical Models, Survey and Analysis, Research Applied to National Needs (RANN), National Science Foundation, Washington, DC.
22. Gass, S. I. (1969) Linear Programming, 3rd edn., McGraw-Hill, New York.
23. Gordon, R. L. (1976) Economic Analysis of Coal Supply: An Assessment of Existing Studies, prepared for the Electric Power Research Institute, Report No. EA-496, Palo Alto, CA.
24. Greenberger, M., Crenson, M. A., and Crissey, B. L. (1976) Models in the Policy Process, Russell Sage Foundation, New York.
25. Haitovsky, Y. and Treyz, G. (1972) The decomposition of econometric forecast error, in: P. J. Dhrymes et al., Criteria for evaluation of econometric models, Ann. Econ. Social Measurement 1, 291-324.
26. Hausman, J. (1975) Project independence report: an appraisal of US energy needs to 1985, Bell J. Econ. Management Sci. 6, 517-51.
27. Henderson, J. C. (1958) The Efficiency of the Coal Industry, Harvard University Press, Cambridge, MA.
28. House, P. (1979) What's really new about modeling?, J. Policy Modeling 1, 159-78.
29. Hudson, E. A. and Jorgenson, D. W. (1974) US energy policy and economic growth, 1975-2000, Bell J. Econ. Management Sci. 5, 461-514.
30. Isard, W. (1951) Interregional and regional input-output analysis: a model of a space economy, Rev. Econ. Statist. 33, 318-28.
31. Isard, W. and Romanoff, E. (1968) The Printing and Publishing Industries of Boston SMSA 1963, Technical Paper No. 7, Regional Science Research Institute, Cambridge, MA.
32. Kennedy, M. (1974) An economic model of the world oil market, Bell J. Econ. Management Sci. 5, 540-77.
33. Labys, W. C. (1973) Dynamic Commodity Models, Heath Lexington Books, Lexington.
34. Labys, W. C. (1975) Quantitative Models of Commodity Markets, Ballinger, Cambridge.
35. Labys, W. C. and Granger, C. W. J. (1970) Speculation, Hedging and Commodity Price Forecasts, Heath Lexington Books, Lexington.
36. Labys, W. C. and Yang, C. W. (1979) Modeling the eastern US steam coal market, paper presented at the SME-AIME 1979 Conference, Tucson.
37. Labys, W. C. and Yang, C. W. (forthcoming) A quadratic programming model of the Appalachian steam coal market, Energy Econ.
38. Labys, W. C., Paik, S., and Liebenthal, A. (1979) An econometric simulation model of the US market for steam coal, Energy Econ. 1.
39. Leamer, E. E. and Stern, R. M. (1970) Quantitative International Economics, Allyn & Bacon, Boston.
40. Leontief, W. (1953) Interregional theory, in: Studies in the Structure of the American Economy, Oxford University Press, New York, 93-115.
41. Leontief, W. and Strout, A. (1963) Multiregional input-output analysis, in: Structural Interdependence and Economic Development (Tibor Barna, ed.), St. Martin's Press, London, 189-250.
42. Limaye, D. R. (1976) Energy Planning and Policy, Heath Lexington Books, Lexington.
43. Luft, Harold S. (1969) Computational Procedure for the Multiregional Model, Report No. 16, Harvard Economic Research Project, Cambridge, MA.
44. Manne, A. S., Richels, R. G., and Weyant, J. P. (1979) Energy modeling: a survey, Operations Res., 1-36.
45. Miernyk, William H. (1973) Regional and interregional input-output models: a reappraisal, in: Spatial, Regional and Population Economics (Perlman, Leven, and Chinitz, eds.), Gordon & Breach, London, 263-92.

46. MIT Model Assessment Group (1979) Independent Assessment of Energy Models, prepared for the Electric Power Research Institute, Report No. EA-1071, Palo Alto, CA.
47. Moses, L. (1955) The stability of interregional trading patterns and input-output analysis, Am. Econ. Rev. 45, 803-32.
48. Moses, L. (1966) A general equilibrium model of production, interregional trade and location of industry, Rev. Econ. Statist. 42, 373-97.
49. National Bureau of Economic Research (1975) Conference on Model Formulation, Validation and Improvement, Summary Paper, NBER, Cambridge, MA.
50. National Science Foundation and Energy Research Unit (1974) Energy Modeling, IPC Science and Technology Press, London.
51. Naylor, T. H. (ed.) (1971) Computer Simulation Experiments with Models of Economic Systems, J. Wiley, New York.
52. Neri, J. A. (1975) An evaluation of two alternative supply models of natural gas, Bell J. Econ. Management Sci. 6, 280-302.
53. Paelinck, J. and Waelbroeck, J. (1963) Etude empirique sur l'évolution des coefficients input-output, Economie Appliquée 16, 81-111.
54. Parikh, S. C. and Gordon, R. L. (1978) A comparison of models used in coal in transition: 1980-2000, presented at the Energy Modeling Forum, Stanford University.
55. Pindyck, R. S. (1974) The regulatory implications of three alternative econometric supply models of natural gas, Bell J. Econ. Management Sci. 5, 633-45.
56. Pindyck, R. S. and Rubinfeld, D. L. (1971) Econometric Models and Economic Forecasts, J. Wiley, New York.
57. Ramsey, J. B. (1969) Tests for specification errors in classical least squares regression models, Jl. R. Statist. Soc. Series B 31, 350-71.
58. Roberts, P. C. (1980) The modeling of national energy demand, prepared for the ECE Seminar on Energy Modeling, Washington, DC.
59. Schafer, W. and Chu, K. (1969) Nonsurvey techniques for constructing regional interindustry models, Regional Sci. 23, 83-101.
60. Schafer, W. and Chu, K. (1971) Simulating regional interindustry models for Western states, Papers and Proceedings of the First Pacific Regional Science Conference 1, 123-63.
61. Searl, M. (ed.) (1973) Energy Modeling, Working Paper of Resources for the Future, Washington, DC.
62. Silberberg, E. (1970) A theory of spatially separated markets, Internat. Econ. Rev., 341-48.
63. Takayama, T. and Judge, G. (1971) Spatial and Temporal Price and Allocation Models, North-Holland, Amsterdam.
64. Taylor, L. D. (1975) The demand for electricity: a survey, Bell J. Econ. Management Sci., 74-110.
65. Theil, H. (1971) Principles of Econometrics, J. Wiley, New York.
66. Tietenberg, T. H. (1970) Energy Planning and Policy, Heath Lexington Books, Lexington.
67. UK Department of Energy (1978) Energy Forecasting Methodology, Energy Paper No. 29, Economics and Statistics Division, Department of Energy, HMSO, London.
68. Wood, D. O. (1979) Model assessment and the policy research process: current practice and future promise, in: Proceedings of the DOE/NBS Workshop on Validation and Assessment Issues of Energy Models (Saul Gass, ed.), National Bureau of Standards, Washington, DC.
69. Yang, C. W. (1979) A critical analysis of spatial commodity modeling: the case of coal, unpublished PhD dissertation, West Virginia University.
70. Yang, C. W. (1980) The stability of the interregional trade model: the case of the Takayama-Judge model, in: Modeling and Simulation (G. Vogely and H. Mickle, eds.), Instrument Society of America, Pittsburgh.
71. Yang, C. W. and Labys, W. C. (1979) Sensitivity analysis of the quadratic spatial equilibrium model, Working paper, College of Mineral and Energy Resources, West Virginia University.