Flow relevant covariance localization during dynamic data assimilation using EnKF

Flow relevant covariance localization during dynamic data assimilation using EnKF

Advances in Water Resources 33 (2010) 129–145 Contents lists available at ScienceDirect Advances in Water Resources journal homepage: www.elsevier.c...

6MB Sizes 0 Downloads 28 Views

Advances in Water Resources 33 (2010) 129–145

Contents lists available at ScienceDirect

Advances in Water Resources journal homepage: www.elsevier.com/locate/advwatres

Flow relevant covariance localization during dynamic data assimilation using EnKF Deepak Devegowda a,*, Elkin Arroyo-Negrete b, Akhil Datta-Gupta c,1 a

University of Oklahoma, Mewbourne School of Petroleum and Geological Engineering, T-301, 100 E. Boyd Street, Norman, 73019 OK, USA Oxy Oil and Gas Corp., Houston, TX, USA c Texas A&M University, Department of Petroleum Engineering, College Station, TX 77843-3116, USA b

a r t i c l e

i n f o

Article history: Received 20 April 2009 Received in revised form 29 September 2009 Accepted 2 October 2009 Available online 9 October 2009 Keywords: Sequential data assimilation Ensemble Kalman filter Covariance localization Subsurface characterization Streamlines

a b s t r a c t Multiphase dynamic data integration into high resolution subsurface models is an integral aspect of reservoir and groundwater management strategies and uncertainty assessment. Over the past two decades, advances in computing and the development and implementation of robust algorithms for automatic history matching have considerably reduced the time and effort associated with subsurface characterization and reduced the subjectivity associated with manual model calibration. However, reliable and accurate subsurface characterization continues to be challenging due to the large number of model unknowns to be estimated using a relatively smaller set of measurements. For ensemble-based methods in particular, the difficulties are compounded by the need for a large number of model replicates to estimate sample-based statistical measures, specifically the covariances and cross-covariances that directly impact the spread of information from the measurement locations to the model parameters. Statistical noise resulting from modest ensemble sizes can overwhelm and degrade the model updates leading to geologically inconsistent subsurface models. In this work we propose to address the difficulties in the implementation of the ensemble Kalman filter (EnKF) for operational data integration problems. The methods described here use streamline-derived information to identify regions within the reservoir that will have a maximum impact on the dynamic response. This is achieved through spatial localization of the sample-based cross-covariance estimates between the measurements and the model unknowns using streamline trajectories. We illustrate the approach with a synthetic example and a large field-study that demonstrate the difficulties with the traditional EnKF implementation. In both the numerical experiments, it is shown that these challenges are addressed using flow relevant conditioning of the cross-covariance matrix. By mitigating sampling error in the cross-covariance estimates, the proposed approach provides significant computational savings through the use of modest ensemble sizes, and consequently offers the opportunity for use with large field-scale groundwater and reservoir characterization studies. Ó 2009 Elsevier Ltd. All rights reserved.

1. Introduction The adverse impact of inaccurate forecasts in groundwater hydrology and in petroleum reservoir applications can be substantial. Thus, proper characterization of the subsurface heterogeneity and quantification of uncertainty are critical aspects of subsurface resource management strategies. This is achieved through the use of model calibration algorithms that integrate observations to improve the estimates of the model unknowns, thereby improving the predictive capability of the resulting model realizations. The recent trend in acquiring real-time performance data using down-hole well monitors and permanent sensors has driven the need for recursive and continuous model updating. This has led

* Corresponding author. Tel.: +1 405 325 3081; fax: +1 405 325 7477. E-mail addresses: [email protected] (D. Devegowda), Elkin_ArroyoNe [email protected] (E. Arroyo-Negrete), [email protected] (A. Datta-Gupta). 1 Tel.: +1 979 847 9030. 0309-1708/$ - see front matter Ó 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.advwatres.2009.10.001

to the increased interest in the ensemble Kalman filter (EnKF) as a method for data assimilation [15,22]. In the past few years, the EnKF has been implemented for dynamic data assimilation and parameter estimation in a diverse set of disciplines [2,4,15,19,22,26]. For petroleum reservoir engineering, the primary focus has been to update geologic models via history matching or inverse modeling of production response. Starting with an ensemble of model realizations (typically permeability distributions) conditioned to all previously obtained data, the updating process utilizes the ensemble-derived statistics to calibrate the model parameters whenever measurements become available. Consequently, it is possible to maintain an ensemble of model realizations that are ‘current’ with field observations and can be used for reservoir performance forecasting. The ease of implementation, the ability for uncertainty quantification and the versatility in handling diverse data types make the EnKF a very appealing approach for data assimilation problems.


D. Devegowda et al. / Advances in Water Resources 33 (2010) 129–145

However, for field-scale subsurface characterization problems, the EnKF can be computationally challenging because it requires flow simulation for every ensemble member. In particular, for multiphase flow problems and high resolution geologic models, these flow simulations can be computationally expensive. A crucial aspect of the EnKF update is the ensemble-based estimation of various covariances and cross-covariances that relate the data to model parameters. These estimates can be severely compromised when using small ensemble sizes because of sampling errors [4,17,27]. The difficulties are exacerbated when we need to estimate a large number of model unknowns. The underlying non-linearity of the problem, for e.g. in multiphase flow applications, can further degrade the EnKF update and lead to geologically inconsistent changes to the models [5,12,19]. A common observation with the use of the EnKF is the lack of geological and physical consistency between the pre- and post-assimilation model realizations. In addition to these challenges, the loss of ensemble variability can severely underestimate the underlying uncertainties [12,13]. The loss of variability causes the EnKF to ignore newer observations because the uncertainty in the prior models as reflected by the sample-derived covariances becomes negligible [13,20]. These difficulties can be somewhat mitigated by using larger ensemble sizes and hence, more reliable representations of the ensemble-derived statistics. However, this choice can be computationally infeasible for large problems. The use of improved variants of the EnKF for practical data assimilation is common throughout the published literature [2,6,23,25,36]. These methods rely on addressing either the issue of the loss of variance or the sampling related difficulties or both. Covariance inflation schemes [2,3] target the variance deficiency of the ensemble observed following a sequence of updates. On the other hand, covariance localization schemes [20] utilize prior or ad-hoc knowledge about the nature of the parameter and data correlations to obtain improved estimates of the various covariance matrices. Distance-based covariance localization assumes that correlations between variables at different locations become insignificant beyond a certain cut-off distance [17,20]. Within the selected zone centered at the observation location and extending out to the cut-off radius, the ensemble-derived correlations are assumed to decrease with distance. The application of distance-based localization, however, may not be effective for 3-dimensional and highly heterogeneous reservoir and groundwater models [5]. To adequately capture the relationship between the data and the model unknowns, the localizing function should be closely related to the underlying flow field. In this paper, we present two novel covariance localization schemes using streamline-derived information. The streamlines are readily obtained from the underlying flow field by integrating the fluid fluxes [9,10,24,30]. The first method simply utilizes streamline trajectories to retain relevant spatial correlations and eliminate spurious terms in the cross-covariance estimates. The second approach utilizes the same underlying principles, but goes a step further by weighting the streamline trajectory based estimates using a localizing function based on the parameter sensitivities. The streamline trajectories and sensitivities can be computed easily and efficiently from the underlying velocity field and adds very little computational overhead [10,24,32,33]. Because the streamline-based localizing schemes are tied to the underlying geology and the accompanying flow field, the primary benefits of our approach are improved geologic realism and reduced loss of variance during assimilation. It also greatly enhances the computational efficiency because the streamline-based localization allows for the use of a smaller ensemble size compared to the traditional ensemble Kalman filter. In the next section, we provide a brief introduction to the EnKF for groundwater and petroleum engineering applications. We fol-

low the discussion with a section devoted to data assimilation for a synthetic test example and highlight the difficulties with the EnKF for non-linear multiphase problems and non-Gaussian prior models. Next, we examine the distance-based localization and address its limitations for subsurface characterization. Finally, we discuss the two proposed flow relevant localization approaches and illustrate their benefits using a synthetic example and a fieldstudy. 2. EnKF: background There have been a variety of applications of the EnKF for a diverse set of problems since the original implementation by Evensen in 2003. Consequently, this section will provide only the mathematical details of the EnKF in relation to subsurface modeling. The basic derivation remains the same and the optimal estimates of the model unknowns can be derived from the Bayesian formalism or from the standpoint of minimizing the posterior uncertainty of the updated model variables [1,15,16,19]. It should be noted that the EnKF is optimal only for Gaussian model statistics and linear model dynamics. In groundwater and petroleum related applications, the augmented EnKF state vector, y at time, k, is defined by a vector of the model unknowns (state and parameters) and the observed data. The model unknowns, comprising, for example, the grid block permeabilities, porosities, phase saturations and pressures, are denoted by the vector, m of length Nm and the data is denoted by the vector, d of size Nd. Each model realization within the ensemble of size, Ne, is represented by the associated state vector. The goal of the EnKF is to modify the model state and parameters so that the individual ensemble members honor the observations, dobs. Additionally, the observed data is assumed to have a Gaussian noise with zero mean and variance, R. Rather than assimilating the data, we now assimilate ‘realizations’ of the data with the associated noise. This implies that any sensor bias is removed from the observations in a pre-processing step before they are available for assimilation

dk;j ¼ dobs;k þ ej ¼ Hytrue;k;j þ ej ¼ H

mk;j dobs;k

þ ej ;

j ¼ 1; N e


The trivial matrix H that relates the true, but unknown, state vector, ytrue, to the observed data is expressed as

H ¼ ½0 I


I is the identity matrix of dimensions Nd  Nd and 0 has the dimensions Nm  Nd, so the action of H on the state vector, y is to simply select the data component, d. The initial suite of model realizations (permeability distributions in our case) that constitutes the ensemble is constructed using geostatistical interpolation methods [11] that utilize core measurements, well log-derived rock properties and even seismic data in conjunction with apriori geologic information. These data types are classified as static data to reflect the time-invariant nature of the information. The spatial relationships between the model parameters are assumed to be known through a covariance model-derived from the statistics of the data. In a probabilistic sense, the initial samples can be said to be drawn from a prior probability distribution that reflects our uncertainty in the initial estimates of the spatially varying properties. Starting with an ensemble of subsurface models that already incorporates prior information, the EnKF recursively integrates additional data when it becomes available. At assimilation step, k, if a new measurement is obtained, the update equation in the EnKF for each realization within the ensemble can be expressed as

D. Devegowda et al. / Advances in Water Resources 33 (2010) 129–145 apriori ypost þ Kðdk;j  Hyapriori Þ; k;j ¼ yk;j k;j

j ¼ 1; N e


where K is known as the Kalman gain and directly relates the data mismatch between model predictions and observations to the required changes in the model unknowns. The updated models now represent our best knowledge of the state of the system at assimilation step, k, given all previously obtained data. A numerical flow and transport simulation then provides the dynamic responses and the forecast state at the next assimilation step, k + 1. Successive steps of prediction and model corrections consequently lead to the ensemble of models that are consistent with observed dynamic data and prior static information. The best estimate of the current state should be optimal in some pre-defined sense, either as a minimum variance estimate or as the estimate that maximizes the posterior probability [1,16]. Therefore, the Kalman gain can be expressed as,

K ¼ Cy HT ðHCy HT þ RÞ1


In the above equation, Cy is the state covariance matrix and by virtue of our definition of the augmented state vector, y, we have

Cy ¼






In Eq. (4), R is a covariance matrix that reflects our uncertainty in the measurements. The model covariance, CM provides the spatial correlation between the model parameters. The physical relationships between the data and the model parameters are embedded in the cross-covariance estimate, CM,d and the ensemble-derived product CyHT in the expression for the Kalman gain can be seen to be dependent primarily on the cross-covariance. The model updates via the EnKF, in effect, are therefore strongly dependent on the accuracy of the ensemble-derived cross-covariance calculations. Sampling related error arising from small ensemble sizes can significantly degrade EnKF performance. In addition, model variability, represented by Cy, is a key component of the update expression. In particular, when there is a significant loss in variance during data assimilation (Cy  0), the Kalman gain becomes negligible and therefore, the EnKF completely disregards future observations. Both the loss of variance and the degraded filter performance are closely tied to the accuracy of the ensemblebased cross-covariance calculations. In the next section, we illustrate these effects using a test case and examine the role of covariance localization in mitigating these effects.


2.1. An illustration: parameter estimation in multiphase flow In this example we illustrate the application of the EnKF for integrating dynamic data during multiphase flow in an oil reservoir undergoing water injection. We also highlight some of the problems encountered by EnKF for non-linear problems and nonGaussian priors. The reservoir is defined on a 50  50 grid with four producers located at the corners and a water injection well in the center. The observations are generated from a reference model shown in Fig. 1 and comprise of water-cut measurements at each of the four producing wells. The water-cut is simply the ratio of the water flow rate to the total (oil and water) flow rate. The wells are constrained to produce a certain volume of total fluid per day and the injector is constrained to inject a pre-specified volume of water. An initial ensemble of model realizations (permeabilities) was generated using the Sequential Gaussian Simulation [11] with an assumed prior covariance model. Also, the individual realizations are chosen to have a bi-modal log-permeability histogram in order to highlight some of the difficulties encountered by the EnKF with non-Gaussian priors. We chose to assimilate data for 3000 days and forecast for an additional 1000 days. The plot in Fig. 2 shows the initial ensemble predictions for water-cut and the corresponding matches to the observed history after assimilation for an ensemble size of 50. As expected, a significant reduction in the spread of the forecast is seen because of the water-cut data assimilation. Next, we performed the same experiment using different ensemble sizes, starting from an ensemble size of 25, then 50 and finally with a 100-member ensemble. In all the cases studied, the EnKF was able to satisfactorily reproduce the production history. However, a closer examination of the updated permeability fields reveals some of the shortcomings of the EnKF for this nonlinear and non-Gaussian example. In Fig. 3, we have shown four arbitrarily chosen realizations from the ensemble of 25 models for the pre- and post-assimilation steps. The apparent resemblance of each of the updated realizations is a consequence of the dramatic loss of variability during assimilation, partly caused by the small ensemble size. Although our confidence in the posterior estimates would improve through data fusion, it is unrealistic to expect complete resolution of all the subsurface unknowns; firstly, because of the inherent low resolution of the production data and secondly, because the number of model unknowns is significantly larger than the amount of available data. Additionally, artifacts such as localized regions with extreme values of permeability in the updated models indicate a significant departure from prior spatial distribution resulting in a

Fig. 1. The reference permeability for the synthetic example.


D. Devegowda et al. / Advances in Water Resources 33 (2010) 129–145

Fig. 2. The matches to the water-cut measurements (in red) at each of the three producing wells in the bottom row. The initial model predictions are in the top row. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 3. For an ensemble size of 25, this figure shows a comparison of four randomly selected realizations updated with water-cut measurements via the EnKF with the corresponding initial permeability fields.

loss of geologic realism. The log-permeability histogram from one randomly selected realization is shown in Fig. 4. Clearly it shows a significant departure from the prior histogram and the relative proportion of rock facies specified in the initial model. The lack of geologic consistency can adversely impact the predictive capability of the resulting model realizations. This is especially true for a variety of applications, e.g. when trying to identify suitable locations for additional wells, for the design of enhanced oil recovery processes or for inferring the predominant pathways for tracer and contaminant transport. The sensitivity to sample size is best illustrated with Figs. 5 and 6. It is evident that some of the difficulties with the EnKF are some-

what mitigated as the ensemble size is increased [12,13,20]. As expected, the estimates based on sample statistics become more reliable as the sample size is increased. The inaccuracy of the cross-covariance estimates adversely affects the performance of the EnKF when using a small number of model replicates. This conclusion is reinforced by Fig. 7 which plots the total variance contained in the ensemble as a function of assimilation time. However, our ability to increase the ensemble size is limited by the computational requirements which can become prohibitive for large-scale field studies. For practical applications, it is important to keep the ensemble size relatively small and at the same time improve the cross-covariance estimates. This has been the

D. Devegowda et al. / Advances in Water Resources 33 (2010) 129–145


Fig. 4. (a) The log-permeability histogram for one arbitrarily chosen realization. (b) The Gaussian characteristics of the corresponding histogram post-assimilation.

Fig. 5. For an ensemble size of 100, this figure shows a comparison of four randomly selected realizations updated with water-cut measurements via the EnKF with the corresponding initial permeability fields.

motivation behind the concept of spatially localizing the effects of measurements, otherwise known as covariance localization [5,12,13,20,27]. 3. Covariance localization for ensemble-based data assimilation The goal of covariance localization for ensemble-based estimation problems is to eliminate or dampen the effects of spurious correlations between the model parameter, state and the measurements, typically arising from the limited ensemble size. Mathematically, this can be accomplished using the Schur product (element-by-element multiplication) of a localizing function, q, with the sample-based Kalman gain estimate [20,34],

ypost ¼ yapriori þ q  Kðdobs  Hyapriori Þ k k k


The resulting expression for the Kalman gain can be written as,

KLocalized ¼ q  K ¼ q  Cy HT ðHCy HT þ RÞ1


The justification for a particular form of the localizing function will be dictated by the type of application and to some extent by the ease of implementation. The following section discusses the effectiveness of covariance localization for data assimilation using ensemble methods. Specifically, we compare the commonly used distance-based localization scheme with two proposed flow relevant localizations schemes based on the physics of multiphase flow. 3.1. The need for covariance localization: an illustration To further motivate the need for covariance localization, let us consider an ensemble of two-dimensional reservoir models with spatially uncorrelated permeability distribution. Essentially, these represent samples from a multi-Gaussian distribution with a fixed mean and negligible spatial correlation length. As before, we have four producing wells at each of the corners and one injection well at the center of the field. At an arbitrarily chosen assimilation step, the cross-covariance between the permeability values and the


D. Devegowda et al. / Advances in Water Resources 33 (2010) 129–145

Fig. 6. Sensitivity to ensemble size for the synthetic example viewed in terms of the log-permeability histograms. In spite of increasing ensemble size to a 100 members, we do not maintain prior model statistics.

Fig. 7. Total variance versus time during assimilation. Increasing the ensemble size reduces the loss of variance.

based on a 50-member ensemble are severely degraded by the small sample size. Spurious correlations at distant grid locations dominate the structure of the estimates. Additionally, the covariance structure compared to the 1000-member ensemble is significantly corrupted by noise. The inaccuracy in the cross-covariance computations will adversely impact parameter estimates leading to geologically inconsistent posterior models as seen in our previous examples. Previous studies [19,16] have indicated that for bounded error growth in the model covariance estimation, the ensemble size should vary as the square of the number of variables to be estimated. This is clearly an impractical proposition for largescale systems. This underscores the need for a robust and efficient covariance localization method that minimizes spurious correlations arising from modest ensemble sizes dictated by practical applications. In the sections below, we first discuss a previously proposed distance-based covariance localization scheme [17,20,34] and some of its difficulties and limitations. We then propose two streamline-based localization schemes and demonstrate their effectiveness using synthetic and field examples. 3.2. Distance-dependent covariance localization

water-cut response can be estimated from the ensemble and the results are shown in Fig. 8 for one of the producing wells (top left corner). We compare the estimates from a 50-member ensemble and a much larger 1000-member ensemble. This particular example with no spatial correlation was deliberately chosen to highlight the effects of ensemble size on cross-covariance estimates. In fact, the ensemble-based estimate is expected to be similar to that of a homogeneous permeability field and consequently, will have prominent highs along the diagonal leading from the injection well to the selected producer with reduced values elsewhere. The cross-covariance estimates from the 1000-member ensemble which can be considered closer to the truth, shows the expected behavior as shown in Fig. 8. However, the calculations

Intuitively, the influence of a measurement on the state and parameter estimation is expected to decrease with increasing distance. This assumption forms the basis for the distance-dependent covariance localization. The impact of a measurement is assumed to be negligible beyond a pre-specified radius of influence [20]. Within this region of influence, typically a monotonically decreasing correlation function centered at the observation is used for covariance localization [18]. An example of a localizing function based on this concept is shown in Fig. 9. In this section, we examine the application of the distancebased localization to the waterflood example introduced in Section 2.1 with the reference permeability shown in Fig. 1. An ensemble

D. Devegowda et al. / Advances in Water Resources 33 (2010) 129–145


Fig. 8. Estimates of cross-covariance between water-cut data and permeability at a certain assimilation step for differing ensemble sizes.

localization is additionally, a function of the ensemble size. For smaller sample sizes, even short-range correlations can be significantly degraded by spurious and random noise and the correlations may need to be damped at a shorter distance. Furthermore, the simple notion of a distance-dependent localization may not be consistent with the subsurface heterogeneity that can be anisotropic and might include high conductivity channels that contribute to long-range correlations in specific directions [5]. For 3dimensional applications, these problems can be compounded by the requirements of different localizing distance in the horizontal and vertical directions because of the differences in the correlation lengths. In the light of these drawbacks, distance-based localization generally have limited applicability, particularly for subsurface characterization using EnKF. There is a need for more reliable localization techniques that are tied to the underlying geology and represent the flow field more realistically. 3.3. Streamline trajectory based covariance localization

Fig. 9. The distance-dependent localizing function for the well (black dot) showing decreasing levels of influence with increasing distance from the well.

size of 50 is chosen for this example. An initial cut-off radius that bounds the region of influence for each well is assumed to be 200 feet. For comparison, the field dimensions are 500 feet on each side of the square-shaped reservoir. The measurements are the water-cut information at each of the three production wells. The water-cut data for the updated models are shown along with the initial ensemble predictions in Fig. 10. Although, there is a considerable reduction in the spread of the prediction envelope, there are still significant differences between the observations and modelderived water-cut performance data. The reason for a poor match is related to the inappropriate and arbitrary choice of a cut-off radius. This underscores the need for a proper choice of a radius of influence for localizing the effects of measurements. For this example case, a cut-off distance of 400 feet seems to be more appropriate as seen in Fig. 11. In general, the choice of a suitable characteristic length-scale associated with the observations is subjective and requires some trial-and-error. Although there are some guidelines in the literatures [3], the optimal radius of influence is likely to be application dependent and may be difficult to determine a priori. Hamill and Whitaker [20] observed that the length-scale chosen to achieve

In this section we introduce the streamline trajectory based covariance localization. The primary goal here is to eliminate erroneous fluctuations in the cross-covariance matrix because of sampling error, thereby allowing for parameter updates that are consistent with the underlying geology and the flow field. The use of streamline-based methods is physically intuitive and the streamline time of flight quantitatively defines the region of influence for measurements accounting for the interactions between the heterogeneity and the flow field. Adjusting the cross-covariance by the use of streamlines moreover keeps the changes targeted to areas that have the greatest impact on the fluid flow and transport. Conceptually, this is analogous to streamline-assisted history matching which has been successfully used to identify regions within the reservoir that have significant impact on the observations [14]. The streamline-based approach does not make any a priori assumptions about the strength of correlations at different locations and thus, naturally retains relevant short and long-range correlations. The procedure is simple and computationally inexpensive to implement. Fundamentally, the Kalman update expression remains the same as shown in Eq. (1); however, the localizing function now becomes a function of the underlying flow field. At any given assimilation step, to effectively localize regions corresponding to each observation, we establish whether or not a grid cell within the region is intersected by a streamline originating from the source of the observation. If the grid cell is intersected


D. Devegowda et al. / Advances in Water Resources 33 (2010) 129–145

Fig. 10. Using a cut-off radius of 200 feet, the matches to the observations are not satisfactory. This results from the use of a very short radius of influence which eliminates relevant correlations.

Fig. 11. By increasing the cut-off radius to 400 feet, the matches to the observed water-cut data is significantly better.

by the relevant streamline, it is assigned a value of 1, otherwise 0. These results form the column corresponding to the observation in the matrix, q, where a value of one refers to a location relevant to the observation. By successively accounting for each observation, we populate the localizing function, q. Within this framework, noisy terms arising due to sampling errors are eliminated if they

are not compatible with the underlying flow field. This is repeated for each model replicate within the ensemble. Consequently, this results in an ensemble of regions of influence that represents potential target zones for each realization. To implement this form of covariance localization to the entire ensemble, an aggregate zone common to the ensemble is selected

D. Devegowda et al. / Advances in Water Resources 33 (2010) 129–145


Fig. 12. The spatial localization using streamline trajectories shows a complicated but flow relevant pattern that cannot be replicated by a notion of distance. On the left (a and b) are the streamline patterns at a specified assimilation step and the corresponding grid blocks for two randomly chosen realizations. On the right in ‘c’ is the region selected by the streamline trajectories over all the realizations which is referred to as the ‘region of influence’ in the text.

by stacking the regions of influence of each realization. Potentially, additional constraints may be applied to restrict the targeted zone of influence. For example in multiphase flow, the region may be restricted to areas behind a flood front. For recursive data assimilation, this implies that the target area is redrawn at every assimilation step to account for changing subsurface flow conditions, additional wells or simply to reflect modified well configurations. A visual representation of the spatial localization region for one randomly selected well in the bottom right of a synthetic reservoir is shown in Fig. 12 at a certain assimilation step. The grid blocks selected by the streamlines highlight the complicated flow pattern that is a function of the underlying heterogeneities and the well locations. Clearly, the information provided by the streamline trajectories cannot be replicated by a Naïve notion of distance. Some segments of the reservoir adjacent to the wells are not covered by streamlines and on the other hand, distant locations are included in the area of influence as dictated by the flow pattern. This implies

that the EnKF-predicted changes will be targeted and restricted to areas that are strongly correlated with the observations. Mathematically, the localizing function q now consists of a matrix with columns of 0’s and 1’s designed to select the appropriate grid cells relevant to the EnKF update. To examine the benefits of this approach, streamline trajectory based localization is implemented for this synthetic example. Fig. 13 shows four randomly selected permeability realizations that are calibrated with water-cut data using EnKF and streamline-based localization. Although an ensemble size of 50 was used here, the results seem to outperform the traditional EnKF with 100 members as shown in Figs. 5 and 6. The updated permeability realizations show no overshooting or undershooting and appear to preserve all the important features of the prior geologic model including the bi-modality of the permeability histogram (Fig. 14). This is a consequence of restricting the model adjustments to areas relevant to the flow through the use of streamline trajectories. Another important benefit of flow relevant covariance localization is

Fig. 13. Four selected realizations resulting from the use of streamline trajectory based covariance localization with the EnKF in comparison with the prior models.


D. Devegowda et al. / Advances in Water Resources 33 (2010) 129–145

Fig. 14. The prior model statistics in (a) are better preserved with the use of streamline trajectory based covariance localization. (b) Plots the corresponding log-permeability histogram for one selected realization.

to prevent the loss of ensemble variability and thus, avoiding filter divergence. This is shown in Fig. 15 and compared to the standard EnKF, a significantly improved result is obtained with only half the number of ensemble members. Implementation of this form of covariance localization using streamline trajectories entails very little additional computational burden. The streamlines are traced using the fluid fluxes that are readily available from the solution of the flow problem numerically [10,24,34]. Consequently, streamline trajectory based spatial localization forms the basis of an efficient means to enhance state and parameter estimation using the EnKF.

Sensitivities define the relationship between the model parameters and the response and are routinely employed for dynamic data integration using inverse modeling algorithms to update the spatial distribution of porosity and permeability based on the field response [7,8,10,28,29,31,32,35]. In this section, we first review the basic methodology behind the streamline-derived sensitivities for covariance localization in the EnKF. We illustrate the technique with a simple example to demonstrate improved estimates of the cross-covariance. We then provide the mathematical details behind the sensitivity computations and the application of the methodology in conjunction with the EnKF for the purposes of covariance localization.

3.4. Streamline-derived sensitivity-based covariance localization Distance-dependent localization attempts to taper the crosscovariance on the assumption that the correlations are a function of the spatial distance only. With streamline trajectories, the cross-covariance calculations become more accurate because spurious terms unrelated to the physical solution are eliminated. In using streamline-derived sensitivity-based covariance localization, we attempt to achieve the goal of tapering the cross-covariance calculations within the area of influence defined by streamline trajectories by the use of a localizing function derived from the parameter sensitivities [12,13]. The sensitivities are computed efficiently from the streamlines.

Fig. 15. Plot of the total variance versus time during assimilation. For the same performance, the standard EnKF implementation requires a much larger ensemble in comparison with trajectory based localization of the cross-covariance.

3.4.1. Sensitivity-based localization: an illustration To illustrate the effects of finite sample size on the cross-covariance estimation and the benefits of streamline-derived sensitivitybased covariance localization, we consider the example introduced in Section 3.1. For two arbitrarily chosen permeability realizations, the streamline patterns for two wells at a selected step during assimilation are shown in Fig. 16. Barring minor deviations because of the random nature of the underlying permeability values, the streamline patterns are similar to that of a homogeneous permeability field. This is, of course, the consequence of the uncorrelated nature of the permeability distribution in this case. The cross-covariance calculations between water-cut data at one of the wells and the permeabilities for 50 and 1000 members discussed in Section 3.1 are reproduced in Fig. 17. Also shown are the localized cross-covariance calculations using streamline-based sensitivities. The 1000-member ensemble has a sample size roughly on the order of the model parameter dimensions (= 2500) and consequently, will be closely related to the true value of the cross-covariance. Additionally, the relationship between the well data and the permeabilities is seen to have the expected structure with the highs located along the diagonal and the lows further away from the diagonal. This is physically intuitive but is not apparent for the 50-member ensemble. However, spatially localizing the cross-covariance based on streamline-derived sensitivities described in the next section for the 50-member ensemble results in the cross-covariance structure very similar to the estimate from the much larger 1000 member ensemble. We now discuss the mathematical background behind the sensitivity-based localization. 3.4.2. Mathematical background and formulation Our approach relies on inexpensive streamline-based sensitivity calculations that can be used to modify the ensemble-derived cross-covariance estimates. If we denote the sensitivity matrix by

D. Devegowda et al. / Advances in Water Resources 33 (2010) 129–145


Fig. 16. Two randomly selected realizations with the associated streamline trajectories at two wells clearly showing the resemblance to the patterns obtained from homogenous parameter fields.

Fig. 17. Sensitivity-based localization offers an order reduction in computational effort. With just 50 members, the cross-covariance estimates resemble the unconditioned estimate from a 1000-member ensemble. Moreover, the unconditioned estimate is largely noise with smaller ensemble sizes.

G and the corresponding model covariance matrix by CM, the crosscovariance can be represented as the product

CM;d ¼ CM GT


From Fig. 17, we can see that for the case with ensemble size of 50, the cross-covariance estimate is dominated by spurious correlations and a simple screening of the sample-derived correlations may not be sufficient. In fact, it might be necessary to include a weighting factor to account for the relative significance of various

spatial locations to the production response. This can be efficiently achieved with the use of parameter sensitivities. To impose the structure of the sensitivity matrix G on the crosscovariance, we simply multiply the sample-based correlations with the normalized sensitivities. This is illustrated in Fig. 18 which shows the localizing function for the grid cells selected by streamlines. The localizing function, q, modifies the ensemble-derived cross-covariance using a Schur product in the following manner,


D. Devegowda et al. / Advances in Water Resources 33 (2010) 129–145

Fig. 18. The locations selected by the streamlines in (a) enter into the cross-covariance estimate and the weighting function shown in (b) tapers the resulting crosscovariances. This is the basis of streamline-derived sensitivity-based covariance localization.

Fig. 19. The changes to the models are kept targeted and minimal and the matches to the observations are reasonable.

CM;d ¼ q  CM;d;Ensemble ¼ GTN  CM;d;Ensemble ¼ CM;d;Ensemble  GTN


The last equality is a consequence of the interchangeability of the Schur product operator which simply is an element-by-element multiplication and GN refers to the normalized streamlinederived sensitivities. The elements of the localization matrix q are obtained by summation of the sensitivity coefficient at each

grid location obtained from each realization within the ensemble and arranging the resulting values in column vectors corresponding to each measurement. The values within each column vector are now normalized on a scale of 0–1 using the maximum absolute sensitivity within each column. This procedure essentially forms a matrix of localizing coefficients that quantify the relative importance of each grid parameter to the individual production

D. Devegowda et al. / Advances in Water Resources 33 (2010) 129–145


Fig. 20. The updated model realizations shown on the right for four randomly selected members clearly show a high correlation with the prior geological models.

Fig. 21. The log-permeability histogram (b) for one selected realization in comparison with the prior in (a).

Fig. 22. Streamline-derived sensitivity-based covariance localization restricts model updates to flow relevant regions and the loss of variance in the updated members is very minimal.

Fig. 23. Without localization, the posterior ensemble-derived covariance matrix shows a loss of variance in the trailing eigen-directions. However, by using flow relevant conditioning, the eigen-spectrum shows minimal loss of variance.


D. Devegowda et al. / Advances in Water Resources 33 (2010) 129–145

Fig. 24. The Goldsmith pilot study area showing the location of the producers in yellow and the injectors in blue. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 25. (a) The initial log-permeability distribution and the corresponding histogram for one randomly selected realization. (b) The posterior estimate following data assimilation using EnKF while and (c) The posterior estimate using EnKF with streamline-based covariance localization.

responses. In all cases studied here, the sensitivity to permeability variation is used to update the parameter field (the permeability). The computation of the matrix q is performed at each update step to reflect changing field conditions or updated well configurations.

While there are several approaches to compute sensitivity coefficients, streamline models are particularly well suited for this purpose [7,8,10,28,29,35]. By taking advantage of the analytic relationship between streamline characteristics such as the time-

D. Devegowda et al. / Advances in Water Resources 33 (2010) 129–145


of-flight and reservoir parameters like permeability and porosity, sensitivity estimation can be obtained simultaneously with the streamline simulation with very little additional computational burden [8,10]. The efficiency of the computation is a direct result of the fact that the parameter sensitivities can be expressed as 1D integrals along the streamline trajectories [7,8,10,32,35].

based localization appears to be promising and offers the possibility of dramatically reducing the associated computational burden by allowing the use of modest ensemble size in practical applications.

3.4.3. Application: synthetic example We now discuss the performance of the streamline-sensitivitybased covariance localization scheme using the synthetic example previously introduced in Section 2.1. We use 50 ensemble members here. The matches to the water-cut performance data at each well are shown in Fig. 19. Fig. 20 shows a randomly selected set of four updated permeability fields. For comparison purposes, we have also shown the corresponding initial permeability. In contrast to the traditional EnKF, the changes here are more localized and much of the initial geologic features are preserved. Also, parameter overshooting is largely eliminated. There are no localized zones of high and low permeability in the updated realizations of Fig. 20. Additionally, without covariance localization, the prior multimodal parameter histograms tend to become Gaussian. However, in this case, prior model statistics are preserved and this is reflected in the respective histograms in Fig. 21. It is important to note that all of this was achieved using a modest ensemble size of 50 members. The tendency of the EnKF to create a variance-deficient ensemble is also mitigated to a large extent. This is indicated in Fig. 22, which plots the total variance versus assimilation time with and without localization. The loss of variability is reflected by the eigen-spectrum of the model parameter covariance matrices which is an indicator of the energy contained in different eigen-directions. Dramatic decay rates of the eigen-spectrum can be viewed as insufficient projection of the variance along certain axes of the hyper-ellipse that represents the parameter covariance structure. To illustrate this, the eigen-value decomposition of the ensemble-derived covariance matrix is computed at each assimilation step with and without localization. This is shown in Fig. 23. The degraded covariance structure associated with the lower energy trailing eigen-values implies a smaller effective ensemble size. In conclusion, for efficient data fusion for practical hydrologic and reservoir characterization problems, the use of sensitivity-

In this section, we discuss the application of streamline-derived sensitivities for covariance localization to a field-scale example. The study area is the Goldsmith CO2 pilot project, a predominantly dolomitic reservoir in West Texas. The pilot area comprises of nine inverted 5-spot patterns covering around 320 acres and the average formation thickness is 100 feet. Significant water-cut data for 20 years of production prior to the start of the CO2 injection is available at nine production wells and these were used to condition the permeability fields. The detailed production rates, well scheduling, well conversions and shut-in can be found elsewhere [21]. An ensemble of 50 permeability realizations was constructed using Sequential Gaussian co-simulation with well and seismic data [10]. Fig. 24 is an areal plot of the location of the Goldsmith CO2 pilot area within an extended study area showing the location of the surrounding producers and injectors. In Fig. 25a, we plot the spatial distribution of the log-permeability field and the corresponding histogram for one randomly selected realization prior to data assimilation. The multimodal histogram of the initial log-permeability field reflects the presence of multiple geologic facies with a significant peak at lower permeability values indicating a fairly large proportion of poor reservoir quality rock. Using the standard EnKF, the ensemble of model realizations are conditioned to water-cut history and Fig. 25b shows the updated model after a sequence of updates. The loss of structure in the permeability field is quite apparent here and considerable parameter overshooting is observed following data assimilation resulting in localized zones of extreme values of permeability. The log-permeability histogram also tends to become Gaussian and differs significantly from the initial multimodal distribution. Repeating the water-cut assimilation with spatial localization using streamline-derived sensitivities, we can significantly improve the quality of the solution. This is shown in Fig. 25c. The

3.5. Streamline-based covariance localization: field-scale example

Fig. 26. Un-localized cross-covariance estimates between water-cut data (at the indicated well) and grid block permeabilities are plotted at a selected assimilation step in (a) and the corresponding values using streamline-based covariance localization are shown in (b).


D. Devegowda et al. / Advances in Water Resources 33 (2010) 129–145

Fig. 27. Matches to water-cut information at three selected wells with the initial predictions on top and the final matches in the bottom row.

figure demonstrates that both the spatial continuity of the predominant geological features and the multimodal nature of the permeability histogram are maintained. This is achieved without creating physically inconsistent regions of extreme permeability values. The adjustments to the initial models are minimal and localized to the water swept zones which are physically intuitive and reasonable. This is the preferred mode of comparison to check the validity of the updates in the absence of a reference or ‘truth’ model. Additionally, the log-permeability histogram of the updated model clearly indicates that the relative proportion of various reservoir rock types is preserved. We can also interpret these results by analyzing the ensemblederived cross-covariance estimates obtained from the un-localized and the localized EnKF variants. In Fig. 26, we plot the cross-covariance at a selected assimilation step between water-cut at one of the wells and the permeabilities. The un-localized estimate shows large values even for distant locations where it is reasonable to expect that there is no correlation between the permeability and the water-cut data. However, with streamline-derived sensitivitybased localization, distant data points are not included in the analysis and the cross-covariance is shown to be more localized in the vicinity of the well. This is to be expected because the water arriving at producers travels mostly across the reservoir volume surrounded by the injectors and the producing wells and the crosscovariance estimates should be restricted to these regions of the reservoir. Consequently, the predicted changes should also be preferentially found in this area. This is clearly not the case with the un-localized cross-covariance estimate used in the standard EnKF formulation. Therefore, although the producers are completed in the center of the field, the largely indiscriminate updates contribute to the loss of structure in the permeability field during data assimilation when no localization is applied as shown in Fig. 25b. In Fig. 27 we show the matches to production history for three arbitrarily selected wells. Because of new wells and changing well conditions in the field, the streamlines as well as the regions of localization at each assimilation step can vary and this is easily accounted in our proposed approach. At each assimilation step, we simply repeat the procedure of tracing the streamlines and then identify the corresponding region of influence for each well in order to localize the crosscovariance estimates [5].

4. Summary and conclusions The adverse impacts of inaccurate reservoir or groundwater model forecasts can compromise the economic feasibility of a project or significantly alter the conclusions of comprehensive engineering studies. Accurate subsurface characterization therefore becomes vital for evaluating various resource management strategies. The EnKF is a viable and promising approach to sequential model updating to reconcile geologic models to dynamic data. The primary advantages of the EnKF are computational ease, portability across diverse numerical flow simulation software and the ability to simultaneously assimilate data and quantify model uncertainties. However, the EnKF is very sensitive to ensemble size and the noise resulting from random statistical variability is shown to lead to degraded model updates and geologically implausible subsurface models thereby severely compromising model forecasts, especially with modest ensemble sizes. In this paper, we propose eliminating or reducing the severity of the statistical noise by conditioning the cross-covariance matrix between the data and the model unknowns and this is achieved through the use of flow relevant streamline-derived information. We compare the proposed approach with the traditional implementation of the EnKF and demonstrate that the use of covariance localization via the use of the streamlines significantly improves EnKF performance and leads to significant savings in computational effort by reducing the need for large ensemble sizes. The primary conclusions to be drawn from this work are:  Spatial localization of the ensemble-derived correlations is a necessary step prior to the process of assimilation. The selected covariance localization methodology should be related to the underlying flow patterns and the geologic structure so that updates are consistent with the prior static information.  Streamline trajectory based and streamline-derived sensitivitybased localization schemes are efficient and effective methods to regionalize the cross-covariance to restrict the influence of the observations on the parameters. Distance-dependent localization may not be appropriate for highly heterogeneous reservoir mapping and the effectiveness is likely to be applicationdependent.

D. Devegowda et al. / Advances in Water Resources 33 (2010) 129–145

 Considerable speed-up of the assimilation process is achievable using information derived from streamlines, because the accuracy of the solution is similar to an EnKF assimilation sequence using a much larger ensemble. The extra step of conditioning the cross-covariance far outweighs the need to maintain and update a much larger ensemble thereby making flow relevant conditioning techniques practical enough to use for operational problems.  In general, the use of sensitivities allows us to reduce the ensemble sizes further. However, for some data types, for e.g. head measurements, the sensitivity-based approach is difficult to implement because of the formulation of the pressure sensitivities requires significantly extra effort.

References [1] Akella S, Efendiev Y, Datta-Gupta A. A coarse-scale constrained ensemble Kalman filter for subsurface characterization. Water Resour Res; submitted for publication. [2] Anderson JL. An ensemble adjustment filter for data assimilation. Monthly Weather Review 2001;129(12):2903–94. [3] Anderson JL. An adaptive covariance inflation error correction algorithm for ensemble filters. Tellus 2007;59(2):210–24. [4] Anderson JL, Anderson SL. A Monte-Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. Mon Weather Rev 1999;127(12):2741–58. [5] Arroyo E, Devegowda D, Datta-Gupta A, Choe J. Streamline assisted ensemble Kalman filter for rapid and continuous reservoir model updating. SPERE 2008;11(6):1046–60. [6] Bishop CH, Etherton BJ, Majumdar SJ. Adaptive sampling with the ensemble transform filter. Part I: Theoretical aspects. Mon Weather Rev 2001;129(3):420–36. [7] Cheng H, Wen XH, Milliken WJ, Datta-Gupta A. Field experiences with assisted and automatic history matching using streamline models. In: Paper SPE 89857 presented at the SPE annual technical conference and exhibition, Houston, 24– 27 September; 2004. [8] Cheng H, Khargoria A, He Z, Datta-Gupta A. Fast history-matching of finite difference models using streamline-derived sensitivities. SPERE 2005;8(5):426–36. [9] Cheng H, Oyerinde AS, Datta-Gupta A. Compressible streamlines and threephase history matching. SPEJ 2007;12(4):375–85. [10] Datta-Gupta A, King MJ. 2007. Streamline simulation: theory and practice. Textbook series #11, Society of Petroleum Engineers, Richardson, TX. [11] Deutsch CV, Journel A. GSLIB geostatistical software library and user’s guide. Oxford University Press; 1992. [12] Devegowda D, Arroyo E, Datta-Gupta A, Douma SG. Efficient and robust reservoir model updating using ensemble Kalman filter with sensitivity based covariance localization. In: Paper SPE 106144 presented at the SPE reservoir simulation symposium, Woodlands, Texas, 24–26 February; 2007. [13] Devegowda D. Streamline assisted ensemble Kalman filter – formulation and field application. PhD dissertation, Texas A&M University; 2008.


[14] Emanuel AS, Milliken WJ. History matching finite-difference models with 3D streamlines. In: Paper SPE 4900 presented at the SPE annual technical conference and exhibition, New Orleans, 27–30 September; 1998. [15] Evensen G. The ensemble Kalman filter: theoretical formulation and practical implementation. Ocean Dyn 2003;53:353–67. [16] Evensen G. Data assimilation, the ensemble Kalman filter. Springer Publishers; 2006. [17] Furrer R, Bengtsson T. Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants. J Multivariate Anal 2004;98:227–55. [18] Gaspari G, Cohn SE. Construction of correlation functions in two and three dimensions. Office Note Series on Global Modeling and Data Assimilation, DAO Office Note 96-03R1; 1996. [19] Gu Y, Oliver DS. The ensemble Kalman filter for continuous reservoir model updating. J Energy Resour Technol 2006;128:79–91. [20] Hamill TM, Whitaker JS. Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. Mon Weather Rev 2001;129:2776–93. [21] He Z, Yoon S, Datta-Gupta A. Streamline based production data integration under changing field conditions. SPEJ 2002;7(4):423–36. [22] Houtekamer PL, Mitchell HL. Data assimilation using ensemble Kalman technique. Mon Weather Rev 1998;126(3):796–811. [23] Houtekamer PL, Mitchell HL. A sequential ensemble Kalman filter for atmospheric data assimilation. Mon Weather Rev 2001;129(1):123–37. [24] Jimenez E, Sabir K, Datta-Gupta A, King MJ. Spatial error and convergence in streamline simulation. SPERE 2007;10(3):221–32. [25] Johns CJ, Mandel J. A two-stage ensemble Kalman filter for smooth data assimilation. Environ Ecol Stat; in print. [26] Nævdal G, Johnsen LM, Aanonsen NI, Vefring EH. Reservoir monitoring and continuous model updating using ensemble Kalman filter. SPEJ 2005;10(1):66–74. [27] Oke PR, Sakov P, Corney SP. Impacts of localization in the EnKF and EnOI: experiments with a small model. Ocean Dyn 2007;57(1):32–45. [28] Oyerinde AS. A composite tracer analysis approach to reservoir characterization. MS thesis, Texas A&M University; 2004. [29] Oyerinde AS. Streamline based three phase history matching. PhD dissertation, Texas A&M University; 2008. [30] Pollock DW. Semi-analytical computation of path lines for finite difference models. Ground Water 1998;26(6):743–52. [31] Tarantola A. Inverse problem theory. SIAM; 2004. [32] Vasco DW, Yoon S, Datta-Gupta A. Integrating dynamic data into high resolution reservoir models using streamline-based analytic sensitivity coefficients. SPEJ 1999;4(3):389–99. [33] Vega L, Rojas D, Datta-Gupta A. Scalability of the deterministic and bayesian approaches to production-data integration into reservoir models. SPEJ 2004;9(3):330–8. [34] Whitaker JS, Hamill TM. Ensemble data assimilation without perturbed observations. Mon Weather Rev 2002;130(7):1913–24. [35] Yoon S, Barman I, Datta-Gupta A, Pope GA. In-situ characterization of residual NAPL distribution using streamline based inversion of partitioning tracer tests. In: Paper SPE 52729 presented at the SPE/EPA exploration and production environmental conference, Austin, 1–3 April; 1999. [36] Zhang F, Zhang M, Hansen JA. Coupling ensemble Kalman filter with four dimensional variational data assimilation. Mon Weather Rev; submitted for publication.