Robust dynamic process monitoring based on sparse representation preserving embedding

Robust dynamic process monitoring based on sparse representation preserving embedding

Journal of Process Control 40 (2016) 119–133 Contents lists available at ScienceDirect Journal of Process Control journal homepage: www.elsevier.com...

2MB Sizes 1 Downloads 39 Views

Journal of Process Control 40 (2016) 119–133

Contents lists available at ScienceDirect

Journal of Process Control journal homepage: www.elsevier.com/locate/jprocont

Robust dynamic process monitoring based on sparse representation preserving embedding Zhibo Xiao a , Huangang Wang a,∗ , Junwu Zhou b a b

Institute of Control Theory and Technology, Department of Automation, Tsinghua University, Beijing 100084, China State Key Laboratory of Process Automation in Mining & Metallurgy, Beijing 100160, China

a r t i c l e

i n f o

Article history: Received 11 March 2015 Received in revised form 16 January 2016 Accepted 25 January 2016 Keywords: Dynamic process monitoring Robust fault detection Manifold learning Neighborhood preserving embedding Robust sparse representation

a b s t r a c t In this paper, a novel dimensionality reduction technique, named sparse representation preserving embedding (SRPE), is proposed by utilizing the sparse reconstruction weights and noise-removed data recovered from robust sparse representation. And a new dynamic process monitoring scheme is designed based on SRPE. Different from traditional manifold learning methods, which construct an adjacency graph from K-nearest neighbors or ε-ball method, the SRPE algorithm constructs the adjacency graph by solving a robust sparse representation problem through convex optimization. The delicate dynamic relationships between samples are well captured in the sparse reconstructive weights and the error-free data are recovered at the same time. By preserving the sparse weights through linear projection in the clean data space, SRPE is very efficient in detecting dynamic faults and very robust to outliers. Finally, through the case studies of a dynamic numerical example and the Tennessee Eastman (TE) benchmark problem, the superiority of SRPE is verified. © 2016 Elsevier Ltd. All rights reserved.

1. Introduction As an important technique to enhance process safety and to ensure product quality, process monitoring has received tremendous attention in both academia and industry during the last two decades [1–3]. In modern industrial systems, owing to the wide use of sensors and implementation of advanced computer and information technology, process data, which can be efficiently collected and stored at low costs, usually have high dimensions and reflect rich information about the process characteristics [2,4]. As a result, these process data can be utilized for process modeling and monitoring. Multivariate statistical process monitoring (MSPM) is one of the most popular data-driven techniques and has been widely applied in real industrial processes for its simplicity. Principal component analysis (PCA) and partial least squares (PLS) are two of the most widely used models in MSPM [5–7], both of which project high dimensional, highly correlated process data onto a low dimensional latent subspace to retain the most data variation through the first few principle components. In the low dimensional subspace, the global Euclidean structure of the data is preserved. However, PCA assumes that the process data are Gaussian distributed, which is

∗ Corresponding author. Tel.: +86 10 62781993; fax: +86 10 62786911. E-mail addresses: [email protected] (Z. Xiao), [email protected] (H. Wang). http://dx.doi.org/10.1016/j.jprocont.2016.01.009 0959-1524/© 2016 Elsevier Ltd. All rights reserved.

seldom the case in real industry. When the Gaussian assumption is not met, the local geometrical structure of the original data set can be lost in the low dimensional subspace, and thus the monitoring performance will be deteriorated. Besides, MSPM methods assume that the process variables are sampled statistically independently over time. However this assumption is only valid over long sampling intervals. Because of the closed-loop control systems in industrial processes, the process samples are temporally correlated. When a fault only influences the dynamic characteristics of the process, the monitoring performance will be seriously hampered. To extend the PCA application to dynamic process monitoring, Ku et al. proposed a time lagged version of PCA called dynamic PCA (DPCA) [8]. In the DPCA model, time lagged measurements are added to the data matrix and traditional PCA is performed on the augmented data. Although DPCA can handle dynamic process better than PCA, within the DPCA model the cross-correlation among variables and auto-correlation over time are mixed, making the model parameters overly large and making it difficult to interpret the dynamic relationships. Besides, the score variables of the DPCA model are auto-correlated. To remove this autocorrelation, the univariate ARMA filters were proposed [9]. Besides the DPCA method, there are also some attempts based on the subspace modeling [10,11]. These attempts mainly focused on the subspace identification methods, e.g. numerical subspace state space system identification (N4SID), error in variable (EIV) identification, and canonical variate analysis (CVA) [12–14]. In the CVA

120

Z. Xiao et al. / Journal of Process Control 40 (2016) 119–133

model, both the past information vectors and the future information vectors are defined. Afterwards, the dynamics are modeled by capturing the relationships between the two vectors. The CVA model can be calculated based on the singular value decomposition of the covariance matrix of the two vectors. Last but not the least, all PCA-based methods are based on the L2-norm objective function, which is prone to the presence of outliers, because the effect of the outliers with a large magnitude is exaggerated by the use of the L2-norm [15–17]. In modern industry, due to the failure of instrument and experiment errors, outliers are inevitable in historical database and costly to detect manually. When a PCA model is trained from this corrupted data set, the monitoring performance will be far from satisfactory. Recently, a number of nonlinear manifold-learning based dimension reduction (DR) techniques have shown promising results and proven to extract more meaningful information than PCA, such as IsoMap [18], Local Linear Embedding (LLE) [19,20] and Laplacian eigenmap [21]. All of these methods construct the DR model based on a weighted adjacency graph. In this graph, each vertex represents a training sample and each weighted-edge linking two vertices represents some proximity measure between samples. The objective of these manifold learning methods is to maximally preserve the local geometric structure of the data hidden in the underlying adjacency graph after projection onto a low dimensional subspace. As these nonlinear methods are only defined on the training data set and they cannot explicitly give low dimensional representation for a novel test sample, in order to solve the out-of-sample problem there has been extensive work on linear approximation algorithms of these manifold-learning methods, such as neighborhood preserving embedding (NPE) [22], locality preserving projection (LPP) [23–26], and isometric projection [27]. Among these linear projection methods, NPE [28], LPP [29] and various improvement methods based on them [30–33] have been successfully applied to process monitoring and achieved some promising results. These manifold based projection methods can handle the nonlinear and non-Gaussian characteristics of process data better than PCA as they can preserve the local geometric structure of the process data which is totally lost for PCA. In order to deal with the dynamic process monitoring problem, Miao et al. [30] proposed a method to construct the adjacency graph based on a time window, which shows better results than NPE and traditional PCA-based methods. However, these manifold based projection algorithms still depend on an adjacency graph. This adjacency graph is usually constructed by the K-nearest neighbors or the ε-ball method, where two samples link with each other if and only if they reside among each others K nearest neighbors or within the ball of Euclidean distance ε. The graph construction procedure heavily depends on pair-wise sample Euclidean distance and can be easily affected by noise and outliers which are very common in process data. This means that the graph structure obtained above can be invalid and fail to express the true underlying manifold and relationships between samples. DR models derived based on this graph thus fail to characterize the real process or to give reliable monitoring results. Besides, the parameter selection of the adjacency graph (K or ε) is an open problem, which can only be set according to experience. Once the parameter has been set, all the samples in the adjacency graph will have a fixed size neighborhood. But in reality, the process data may have diverse probability density in different areas of the feature space. The adjacency graph thus needs to offer a datam-adaptive neighborhood. All these drawbacks somewhat limit the application of manifold-learning methods in process monitoring. Sparse representation (SR) is a recently proposed statistical modeling tool and has shown to be very powerful in various application areas [34,35]. However, SR has not received enough attention in the field of process monitoring. To our best knowledge, there has

been little work applying SR to model industrial processes. SR can select a subset of the training data to most compactly express a given data point, reconstructing this point by a linear combination of this subset plus an error term by solving an L1-norm minimization problem. After modeling the training data set through SR, a sparse weighted adjacency graph and the error-free reconstructed data can be obtained at the same time [36]. SR explicitly penalizes the dense Gaussian error through L2-norm and sparse large magnitude outliers through L1-norm [35]. As a result, the obtained adjacency graph is robust to outliers and noise, thus can better reflect the delicate relationships among training samples compared with traditional Euclidean distance based graph [37,38]. For process data, this relationship among samples not only represents the geometric topological structure of the data, but also reflects the dynamic characteristics of the data (auto-correlation). In other words, the graph constructed from SR can encode much more useful and meaningful information about the dynamic characteristics and geometric manifold structure of the process data. Besides, SR graph has a datam-adaptive neighborhood structure, which is automatically determined and needs not to specify a fixed neighborhood parameter beforehand. Lastly, the noise-removed data can be further utilized for robust process modeling and monitoring. Inspired by the work of NPE which linearized the classic LLE algorithm, this paper proposes a novel robust DR technique, named sparse representation preserving embedding (SRPE), based on the sparse reconstructive weights and noise-removed data recovered from robust sparse representation. Compared with traditional PCA and manifold learning methods, SRPE has the following advantages. Firstly, SRPE performs DR in a clean data space, making it much more robust to noise and outliers. Secondly, SRPE tries to preserve the SR graph which encodes rich process dynamics and local geometric structure of the process data in the projected latent subspace, making it much more efficient in detecting faults associated with abnormal dynamic characteristics. Finally, SRPE is also a linear DR method, monitoring statistics T2 and SPE can be easily extended from PCA. The rest of this paper is organized as follows. First, some preliminary knowledge on NPE is briefly introduced. Then, the SRPE algorithm and the corresponding novel process monitoring scheme are detailed. To illustrate the effectiveness and superiority of the proposed SRPE process monitoring method, extensive comparative experiments are conducted on a dynamic numerical example and the Tennessee Eastman (TE) benchmark problem. Conclusions are drawn in the end. 2. Preliminary knowledge on neighborhood preserving embedding NPE is a linear approximation algorithm to the classic LLE manifold learning method. In the NPE model, a training sample is first reconstructed by a linear combination of a few of its nearest neighbors. The objective of NPE is to preserve this reconstruction relationship in the projected low dimensional subspace. Because of its elegancy and effectiveness, NPE has attracted attention in both the computer vision and process monitoring communities. Suppose the training data set X = {x1 , x2 , . . ., xn } ⊆ Rm consists of n sample vectors in an m-dimensional real-valued feature space. NPE aims at learning a linear projection matrix A ∈ Rm×p (p < m), where p is the dimension of the latent subspace. The NPE algorithm procedure can be summarized as follows. (1) Constructing the adjacency graph: The K nearest neighbors or the ε-ball method can be adopted for graph construction: Two samples are linked if one is among the other’s K nearest neighbors or one is within the other’s Euclidean distance ε-ball.

Z. Xiao et al. / Journal of Process Control 40 (2016) 119–133

(2) Computing the graph weights: Let W denote the weight matrix, whose element Wi,j represents the reconstruction coefficient of node i from node j. Wi,j is nonzero if and only if node i link with node j. The weights can be computed by minimizing the following sum of squares Euclidean distances (i1 , . . ., ik are determined based on the adjacency graph):

 2  ik n      xi −  min W x i,j j   W  j=i1 i=1  s.t.

ik 

Wi,j = 1,

(1)

C,E

(3) Computing the projection: The projection matrix A can be computed by solving the following objective function:

min

 2  ik n     T  A xi − AT  W x i,j j    i=1  j=i1

(2)

The column vector of A can be obtained by solving the following generalized eigenvector problem, the details of the theoretical justification can be found in the reference paper [22]: (3)

M = (I − W )T (I − W )

NPE has already been successfully applied to process monitoring with some success. T2 and SPE statistics can be derived in the same way as PCA. The details can be found in the reference paper [28]. 3. Methodology 3.1. Robust sparse representation SR theory originally stems from signal compression and representation. Consider the problem of learning sparse linear representations with respect to an overcomplete dictionary, in the case of process monitoring this dictionary can be simply chosen as the training data set. The basic rationale behind SR is that if the optimal representation is sufficiently sparse, then it can be efficiently recovered through convex optimization. For a dynamic process, each sample point is dynamically correlated with its nearby points in the time sequence, making the optimal representation sparse in nature. As a result, SR fits in dynamic process monitoring just well. In the following, the details of SR theory will be given. Given a set of data vectors X = {x1 , x2 , . . ., xn } ⊆ Rm , SR represents each xi by a small subset of other points in X the most



T

be the compactly. Let ci = ci,1 , ci,2 , . . ., ci,i−1 , 0, ci,i+1 , . . ., ci,n n-dimensional coefficient vector for sample xi , whose element ci,j (i = / j) denotes the contribution of xj for reconstructing xi . SR calculates the sparse representation of all the training samples jointly from the following optimization problem: C,E

    C  + E 0 0

diag(C) = 0

eT C = eT where L1-norm denotes the sum of absolute entries of a matrix,  is a positive regularization parameter, E denotes large sparse outlying errors. To further discriminate small dense Gaussian noise from large sparse outliers, the objective can be further adjusted to the following form: min

C,E,Z

      C  + e E + z Z 2 1 F 1 2

(6)

diag(C) = 0 eT C = eT

The column eigenvectors a1 , a2 , . . ., ap corresponding to the bottom p eigenvalues 1 ≤ 2 ≤ · · · ≤ p form the projection matrix A.

s.t. X = X · C + E

(5)

diag(C) = 0

s.t. X = X · C + E + Z

XMX T a = XX T a

min

    C  + E 1 1

s.t. X = X · C + E

j=i1

A ∈ Rm×d

trivial solution C = I, where I denotes the identity matrix. The affine constraint eT C = eT , where e ∈ Rn denotes a column vector of all ones, is enforced so that the data need not to be centered. The sparsest representation can be effectively computed by the above objective function. However, the problem above is NP-hard in general as it involves discrete L0-norm. Based on recent development in the theory of compressed sensing, if the solution C is sparse enough, which is indeed the case for process data, the L0-norm minimization problem above can be relaxed by the following L1-norm minimization problem: min

j = 1, 2, . . ., m

121

(4)

eT C = eT where  · 0 is the L0-norm, which is the number of nonzero entries of a matrix. Diagonal entries are forced to be zeros to avoid the

where Z is added to represent small and dense Gaussian noise and Frobenius norm is enforced on Z. e and z are both positive regularization parameters to balance the three terms in the objective function, which can be determined by the following formula:





z =

˛z , z

˛z > 1,

z = minmax xTi xj 

e =

˛e , e

˛e > 1,

e = minmaxxj 

i

i

j= / i

j= / i

  1

(7) (8)

The proof of the above formula can be found in the reference paper [35]. Note that the above optimization program is convex with respect to the optimization variables (C, E, Z), thus can be efficiently solved using convex programing tools like CVX [39] or alternating direction method of multipliers (ADMM) [40]. 3.2. Sparse representation preserving embedding Although SR is suitable to model the process data, this optimal representation is only defined on the training data set, thus cannot be directly applied to process monitoring. In this section, we propose to extend SR algorithm with manifold learning techniques and call this method the sparse representation preserving embedding. In the proposed SRPE algorithm, SR program is first conducted on the whole training data set X. After the sparse representation C = [c1 , c2 , . . ., cn ] ∈ Rn×n , where ci is the sparse coefficient for sample xi , has been obtained through the convex program, a sparse weighted adjacency graph W can be constructed using the sparse weights C, where Wi,j = Ci,j . This underlying sparse graph has many good properties compared with traditional Euclidean distance based nearest neighbor graph. Firstly, SR-based graph has a datam-adaptive neighborhood. The sparsity is automatically determined by solving the SR problem rather than a specified fixed neighborhood parameter. Secondly, SR-based graph is robust to noise and outliers thus can better reveal the dynamic characteristics and manifold structure hidden in the process data. Aside from the SR-graph, the noise-removed data can also be obtained from X* = X · C. The error-free data X* has already removed all the Gaussian noises (Z) and outliers (E) from the original data set X. SRPE aims

122

Z. Xiao et al. / Journal of Process Control 40 (2016) 119–133

NPE 8

3

6

2

4

1

2

PC2

PC2

SRPE 4

0

0

−1

−2 Normal Outlier

−2

Normal Outlier

−4

2

2

T 99% −3 −3

−2

−1

0

1

2

3

4

5

T 99% −6 −3

6

−2

−1

PC1

PC1

(a)

(b)

PCA

1

2

8

3

6

2

4

1

2

0 −1

0 −2

−2

−4 Normal Outlier

−3

Normal Outlier

−6

2

T2 99%

T 99% −4 −5

3

DPCA

4

PC2

PC2

0

0

5

−8 −8

−6

−4

−2

0

2

PC1

PC1

(c)

(d)

4

6

8

10

Fig. 1. 2D projection of 500 numerical dynamic process test data: (a) SRPE, (b) NPE, (c) PCA, (d) DPCA. Each blue circle in these figures represents a normal sample in the test data set, while each red asterisk represents a faulty sample. The blue line represents the T2 control limit of significance level 99%. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Table 1 Monitoring variables in the Tennessee Eastman process. No.

Measured variables

No.

Measured variables

No.

Manipulated variables

1 2 3 4 5 6 7 8 9 10 11

A feed D feed E feed Total feed Recycle flow Reactor feed rate Reactor pressure Reactor level Reactor temperature Purge rate Product separator temperature

12 13 14 15 16 17 18 19 20 21 22

Product separator level Product separator pressure Product separator underflow Stripper level Stripper pressure Stripper underflow Stripper temperature Stripper steam flow Compressor work Reactor cooling water outlet temperature Separator cooling water outlet temperature

23 24 25 26 27 28 29 30 31 32 33

D feed flow valve E feed flow valve A feed flow valve Total feed flow valve Compressor recycle valve Purge valve Separator pot liquid flow valve Stripper liquid product flow valve Stripper steam valve Reactor cooling water flow Condenser cooling water flow

at computing the projection matrix A ∈ Rm×p (p < m) by preserving the SR-based graph in the error-free clean data space X* . Inspired by the work of NPE, the SRPE objective function has the following form:

(A) JSRPE

n   T ∗  A xi − AT X ∗ · ci 2 = min m×p A∈R

projection is much more sensible and robust. The above objective can be further reduced by simple algebraic formulations: (A)

JSRPE = min

A ∈ Rm×p

(9)

i=1

N   T ∗  A xi − AT X ∗ · ci  i=1

= min AT (XC)(I − C)T (I − C)(XC)T A A ∈ Rm×p

(10)

= min AT XM ∗ X T A A ∈ Rm×p

where xi * and X* represent the noise-removed data obtained from SR. According to the objective function above, the delicate relationships among data samples in the error-free clean data space is preserved in the projected latent low dimensional subspace. In error-free clean data space, the dynamic relationships and local geometrical structure can be better modeled, thus the computed

M ∗ = C(I − C)T (I − C)C T In order to remove arbitrary scaling factor in the projection, an additional constraint is enforced as follows: AT (XC)(XC)T A = 1

(11)

Z. Xiao et al. / Journal of Process Control 40 (2016) 119–133

123

SRPE

NPE

20

10

15 5 10 0

PC2

PC2

5 0

−5

−5 −10

−10

Normal Outlier

−15

Normal Outlier T2 99%

2

−20 −30

T 99% −25

−20

−15

−10

−5

0

5

−15 −12

10

−10

−8

−6

−4

PC1

−2

0

2

4

6

PC1

(a)

(b)

PCA

DPCA

10

8

8

6 4

6

2

4

PC2

PC2

0 2 0

−2 −4

−2

−6

−4

−8

Normal Outlier

−6

Normal Outlier

−10

2

T2 99%

T 99% −8 −15

−10

−5

0

5

10

15

−12 −6

−4

−2

PC1

0

2

4

6

8

PC1

(c)

(d)

Fig. 2. Projection results of the 500 corrupted training data onto the 2-D latent subspace: (a) SRPE, (b) NPE, (c) PCA, (d) DPCA. In each of the 4 sub-figures, a green circle represents a normal training sample, while a red asterisk represents an outlier (10 in all) in the training set. The blue line represents the T2 control limit of significance level 99%. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

It is easy to check that the matrices XM* XT and (XC)(XC)T are symmetric and positive semi-definite. The above problem can be finally reduced to the following form: min

AT XCC T X T A=1

AT XM ∗ X T A

X = Xˆ + X˜ = YBT + Error Y = XB(BT B)

(13)

where ai corresponds to the eigenvector of the bottom ith eigenvalue. For a new test sample xnew ∈ Rm , the p-dimensional projection ynew can be computed by: ynew = xnew · A

(15)

(12)

The solution to the above reduced problem A = [a1 , a2 , . . ., ap ] ∈ Rm×p can be efficiently solved by a generalized eigenvalue problem below: XM ∗ X T a = XCC T X T a

low dimensional coordinates are denoted as Y ∈ Rn×p . The relationships between them can be formulated as follows:

(14)

3.3. Process monitoring based on SRPE As SRPE is a linear DR method like PCA, monitoring statistics T2 and SPE can be easily derived in a PCA style. Process monitoring based on SRPE can be divided into two phases: The first phase is the off-line modeling phase, where normal training data are collected and the SRPE model is trained on it. The second phase is the on-line monitoring phase. The monitoring scheme will be detailed in this section. Let X ∈ Rn×m denote the training data set of n process samples with m measurements. SRPE is first performed on the training data X, and the dimension of the latent subspace is chosen as p ≤ m. The

−1

= XA

(16) −1

where B ∈ Rm×p and A = B(BT B) ∈ Rm×p which can be computed ˜ through SRPE. X represents the residual space while Xˆ represents the principle space where the most process dynamics and manifold structure are preserved. For a new test sample xnew the projection into the two subspaces can be obtained in the following way: ˆnew + x ˜new = ynew BT + (xnew − ynew BT ) xnew = x = xnew A(AT A)

−1 T

A + xnew (I − A(AT A)

−1 T

A )

(17)

Based on the SRPE model, T2 and SPE statistics, which aim to quantify the variations within the model and variations outside the model, are utilized for fault detection. The T2 statistics can be defined as follows: T 2 = yT −1 y = xT AT (cov(Y ))−1 Ax

(18)

where y denotes the projection of the original sample x in the SRPE principle subspace, which can be calculated by y = x · A.  is the covariance matrix of the projection matrix Y, which can be computed by  = YT Y/(n − 1). Under the assumption that in the SRPE principle subspace the projections are Gaussian distributed,

124

Z. Xiao et al. / Journal of Process Control 40 (2016) 119–133

NPE 4

4

3

3

2

2

1

PC2

PC2

SRPE 5

1 0

0 −1

−1

−2 Normal Outlier

−2

Normal Outlier

−3

T2 99% −3 −2

−1

0

1

2

3

T2 99% −4 −4

4

−3

−2

−1

PC1

0

1

2

3

4

PC1

(a)

(b)

PCA

DPCA

4

5 4

3

3 2 2 1

PC2

PC2

1 0

0 −1

−1

−2 −2 −3

Normal Outlier

−3

Normal Outlier

−4

2

2

T 99% −4 −4

−3

−2

−1

0

1

2

3

T 99% 4

−5 −6

−4

−2

0

PC1

2

4

6

PC1

(c)

(d)

Fig. 3. 2-D projection of test data under training data corruption: (a) SRPE, (b) NPE, (c) PCA, (d) DPCA. Each blue circle in these figures represents a normal sample in the test data set, while each red asterisk represents a faulty sample. The blue line represents the T2 control limit of significance level 99%. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

the control limits for T2 can be approximated by means of an F-distribution: 2 Tlim =

p(n − 1) Fp,n−p;˛ n−p

(19)

where the parameter ˛ denotes the significance level. In the residual subspace, the SPE statistics are applied to measure how well a test sample conforms to the SRPE model. By reconstructing x from the projection y in the SRPE principle subspace, SPE is defined as follows:



2

 

ˆ = x(I − A(AT A) SPE = x − x

−1 T

2 

A )

(20)

According to the Gaussian assumption, the SPE statistic follows a weighted chi-square distribution. Thus the control limit for SPE can be determined from the 2 distribution function as follows: SPElim = g · 2h,˛ ,

g=

v 2m

,

h=

2m2

v

illustrates that SRPE is very effective in discriminating the normal operating condition from dynamic faulty conditions. To further demonstrate the ability of SRPE in dealing with outliers, an additional experiment is conducted, where 10 large magnitude outliers are added to the training data set. Then, the Tennessee Eastman (TE) benchmark problem is used to exhibit the fault detection ability of SRPE method. An additional experiment where faulty training samples are added to the training data set is also conducted to show the SRPE’s ability in dealing with outliers. In all the experiments, the performance of SRPE is compared with NPE, PCA and DPCA methods.

(21)

4.1. Numerical example The dynamic multivariate numerical system used in this section is first suggested by Ku et al. [8]. It is described by the following state-space functions:

where m and v denote the mean and variance of the training data set SPE statistics and ˛ denotes the significance level.

z(t) = Az(t − 1) + Bu(t − 1)

4. Case study

where A and B are constant matrices as follows:

In this section, two case studies are conducted to evaluate the performance of the proposed SRPE method. Firstly, a multivariate dynamic numerical system is adopted to show the superiority of SRPE process monitoring scheme in dealing with dynamic fault detection. The 2-D feature extraction in the latent subspace

(22)

y(t) = z(t − 1) + v(t − 1)

 A=

0.188 −0.191 0.847

0.264



 ,

B=

1

2

3 −4

 (23)

In the state-space functions above, u(t), y(t), z(t) represents inputs, outputs and state variables respectively. v(t) is random noise

Z. Xiao et al. / Journal of Process Control 40 (2016) 119–133

125

Fig. 4. Diagram of the Tennessee Eastman process.

Table 2 TE process programed faults (IDV1–IDV21). Fault

Description

Type

1

A/C feed ratio, B composition constant (stream 4) B composition, A/C ratio constant (stream 4) D feed temperature (stream 2) Reactor cooling water inlet temperature Condenser cooling water inlet temperature (stream 2) A feed loss (stream 1) C header pressure loss reduced availability (stream 4) A, B, C feed composition (stream 4) D feed temperature (stream 2) C feed temperature (stream 4) Reactor cooling water inlet temperature Condenser cooling water inlet temperature Reaction kinetics Reactor cooling water valve Condenser cooling water valve Unknown Valve (stream 4)

Step

2 3 4 5 6 7 8 9 10 11 12 13 14 15 16–20 21

where w(t) is a N(0, 1) distributed white noise. The variables used for process monitoring are chosen as x(t) = [y(t), u(t)]. First, 500 normal samples are collected to constitute the training data set and the four methods, namely SRPE, NPE, PCA, DPCA, are all performed on it. The number of nearest neighbors K for NPE is chosen as 4 by experience. DPCA time lag is chosen as 3. The regularization parameter ˛e and ˛z are both loosely set as 14. For comparison purposes, a test data set with faults is also generated. The test data set consists of 500 samples, the first 250 samples being normal and a dynamic fault introduced from the 251st sample. This fault only influences the dynamic relationship of the state space function and is characterized by a slight change in the constant matrix A:

Step Step Step Step Step Step Random variation Random variation Random variation Random variation Random variation Slow drift Sticking Sticking Unknown Constant position

following the Gaussian distribution N(0, 0.1). The input variables u(t) satisfies the following dynamic relationship: u(t) = C · u(t − 1) + D · w(t − 1)



C=

 D=

0.811 −0.226 0.477 0.193



0.415 0.689

−0.320 −0.749



(24)

 A Fault =

0.18 −0.1 0.8

0.5

 (25)

Although this fault causes no significant changes in the distribution of process data in the feature space, it does influence the dynamical characteristics of the process variables, which can be effective captured by the SRPE principle subspace. To illustrate the superiority of SRPE in capturing dynamic abnormality of the process, the dimension of the principle subspaces of all the four methods are chosen as 2. All four methods thus project the original 500 four-dimensional process test data onto a latent 2-D plane. The T2 statistics control limits corresponding to the 99% significance level are also plotted. The results are shown in Fig. 1. In Fig. 1, the blue circles represent the first 250 normal samples while the red asterisks represent the last 250 faulty samples. The blue line is the corresponding 99% T2 control limit. As Fig. 1 shows, the SRPE method can discriminate normal from faulty samples much better than the other three methods. In the projected 2-D SRPE principle subspace, the overlapping area

126

Z. Xiao et al. / Journal of Process Control 40 (2016) 119–133

Fig. 5. TE process monitoring results of fault 10: (a) SRPE, (b) NPE, (c) PCA, (d) DPCA.

between normal data and faulty data is much smaller than the other methods. Besides, much fewer faulty samples are located within the T2 control limit. This is because SRPE algorithm can extract much more useful information of the process dynamics in the principle subspace, making SRPE especially suitable to detect dynamic faults of the underlying process. To further demonstrate the robustness of the SRPE method in the presence of outliers, 10 generated outliers are randomly mixed into the 500 training samples (2%). These outliers have some of their measurements faraway deviating from the normal operating region, which are very common from instrument failures and experiment errors in real industry. All the four methods are retrained on this corrupted data set, and the projection of the 500 training data onto the underlying 2-D latent subspace is given in Fig. 2.

In this experiment, the parameter setting is the same as the uncorrupted case above. In Fig. 2, green circles represent the 490 normal training data while red asterisks represent the 10 outliers mixed into the training data set. The blue line is the 99% T2 control limit. As can be spotted, the SRPE T2 control limit is almost not affected by outliers at all, excluding all ten outliers correctly. In the PCA and DPCA cases, outliers are even mixed with the normal data, indicating severe model distortion. Besides, the T2 control limits of PCA and DPCA are also severely affected, covering large areas of possible faulty samples. In the NPE case, although outliers are separated from normal data, the T2 control limit is apparently much looser than reality, causing a potential low fault detection rate. Based on these models trained from the corrupted training data, tests are conducted and the corresponding results are given in Fig. 3.

Z. Xiao et al. / Journal of Process Control 40 (2016) 119–133

127

Fig. 6. TE process monitoring results of fault 16: (a) SRPE, (b) NPE, (c) PCA, (d) DPCA.

From the results in Fig. 3, it can be seen that SRPE is much more robust in the presence of outliers. The projection matrix and the underlying latent SRPE principle subspace accurately capture the dynamic characteristics in the presence of outliers in the training set. At the same time, the control limits of the SRPE method is also not affected by the outliers while the other three methods are severely affected. In summary, SRPE performs much better in dynamic process monitoring in the presence of outliers than the other traditional methods. This superiority will be further illustrated through the following case study of the TE process. 4.2. Tennessee Eastman benchmark problem The Tennessee Eastman (TE) process model is a well-known realistic simulation program of a complex chemical plant in Tennessee Eastman Chemical Company, which is widely used

as a benchmark for evaluating and comparing the effectiveness of different monitoring techniques. The prototype problem was originally presented by Downs and Vogel [41]. This benchmark process consists of five unit operations, namely a reactor, a partial condenser, a recycle compressor, a stripper and a vapor/liquid separator. The schematic diagram of the process is illustrated in Fig. 4. The process involves a total of 8 components including two desired products (G and H), which are produced by two simultaneous gas/liquid exothermic reactions, and a byproduct F, which is produced by two additional reactions, from four reactants A, C, D and E. There is also an inert B in the process. There are in total 53 measurements in the process including 41 process variables (19 discrete composition measurements and 22 continuous process measurements) and 12 manipulated variables (one being constant). In this paper, 33 continuous variables (22 process variables and 11 manipulated variables) are

128

Z. Xiao et al. / Journal of Process Control 40 (2016) 119–133

Fig. 7. TE process monitoring results of fault 19: (a) SRPE, (b) NPE, (c) PCA, (d) DPCA.

selected for process monitoring, as listed in Table 1. There are 21 programed faults in TE process, and the details are given in Table 2. The process variables in TE are nonlinear and highly crosscorrelated and auto-correlated. Various monitoring techniques have been proposed to tackle the problem. To construct the normal process monitoring model, 500 training samples are collected. For each of the 21 programed faults, a test data set consisting of 960 samples is generated, where faults are introduced from the 161st sample. All the process variables are collected at a sampling interval of 3 minutes. As in the numerical example before, NPE, PCA and DPCA methods are also conducted for a comparison. We have also conducted an additional experiment of the one-class support vector machine (OCSVM) method. The parameter setting for TE process is as follows: The principle component (PC) number is determined by the cumulative

percentage of variance rule to explain 80% of the data variance. The PCA retains 13 PC and DPCA retains 16 PC with a time-lag of 3. For the SRPE and NPE methods, the PC numbers are both chosen as 16 for comparison purposes. The neighborhood parameter K of NPE is set as 14 while the regularization parameter ˛e and ˛z are both set as 6. The OCSVM parameters  and  are set as 0.01 and 2−8 . All five methods share the same significance level of 99%. The monitoring performances of the 21 faults are given in detail in Table 3. As can be seen from Table 3, faults 1, 2, 6, 7, 8, 13 and 14 are easy to detect and all five methods give similarly good performances. However faults 5, 10, 16, 19 and 20 are difficult to detect for the other three DR methods, while SRPE gives much better performance. These faults are associated with dynamic characteristic changes and SRPE can effectively capture this dynamic abnormality in its principle subspace. For SRPE, the T2 statistic is

Z. Xiao et al. / Journal of Process Control 40 (2016) 119–133

129

Fig. 8. TE process monitoring results of fault 20: (a) SRPE, (b) NPE, (c) PCA, (d) DPCA. Corrupted Data Distribution Normal Outlier IDV1 Outlier IDV2 Outlier IDV6 Outlier IDV7 Outlier IDV8

60 40

PC3

more sensitive than the SPE statistic in fault detection. This indicates that useful and meaning process information is effectively captured in the SRPE principle subspace. For faults 3, 9, 15, as the process variables are not significantly influenced by the faults, all five methods perform poorly and SRPE gives a slightly better monitoring results. The detailed monitoring results of faults 10, 16, 19 and 20 are illustrated in Figs. 5–8. As the results are shown in these figures, the superiority of the proposed SRPE algorithm is once again proved. In all these four faults which are difficult to detect for the other three PCA-based methods, SRPE gives much higher fault detection rates and much faster fault detection speeds, while maintaining a lower false alarm rate at the same time. Here, we select fault 10 for a comparative study. As shown in Fig. 5, only the T2 statistic of SRPE is able to promptly detect this fault when its introduced from the 161st sample. All the other methods detect this fault with much longer time latency. Besides, the fault detection rates of all the other methods are much lower than the T2 of SRPE method. This implies that the T2 statistic of

20 0 −20 −40 60

−20 40

0 20

20 0

40 60

PC1

−20

PC2

Fig. 9. PCA dimension reduction result of the corrupted training set: Blue dots represent normal training data. Different red markers represent outliers in the training data set coming from 5 different faulty training data sets. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

130

Z. Xiao et al. / Journal of Process Control 40 (2016) 119–133

Table 3 TE process monitoring results. (The leading fault detection rates are marked in boldface in each row.) Fault

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 FAR

SRPE

NPE

PCA

DPCA

OCSVM

T2

SPE

T2

SPE

T2

SPE

T2

SPE

100.00 99.00 9.63 100.00 100.00 100.00 100.00 97.88 6.50 91.00 76.63 99.88 95.50 100.00 18.75 94.13 97.25 90.63 92.63 91.50 68.13 2.57

99.25 98.25 16.13 44.75 35.88 99.63 100.00 98.50 13.25 56.50 55.75 99.38 94.38 99.63 21.00 46.88 82.13 90.50 9.88 53.63 43.00 6.40

99.63 98.63 5.88 59.88 99.88 100.00 100.00 97.63 6.25 51.00 60.50 99.13 94.63 100.00 9.00 36.00 86.75 90.13 17.25 52.63 51.00 2.56

99.75 98.38 11.63 94.75 31.13 100.00 100.00 98.13 7.75 54.38 67.25 98.63 94.50 100.00 15.25 45.38 90.75 91.13 10.63 57.25 49.25 3.78

99.25 98.38 6.50 32.75 28.88 99.38 100.00 97.12 4.62 46.50 49.25 98.62 94.13 99.50 8.13 31.37 80.87 89.88 8.25 43.25 38.75 1.96

100.00 98.88 5.37 100.00 28.13 100.00 100.00 95.13 5.50 45.00 77.25 96.63 95.25 100.00 7.88 44.50 95.63 90.38 38.25 58.37 56.37 3.07

99.13 98.13 0.25 3.13 22.50 99.13 45.50 97.13 0.63 36.25 19.25 98.63 94.00 93.88 2.63 17.13 76.25 88.38 0.25 33.88 31.75 0.36

99.63 99.25 4.25 99.63 32.50 99.63 99.63 97.88 4.13 51.63 90.25 98.38 95.13 99.63 7.63 48.00 96.75 90.50 76.88 64.63 54.88 2.74

99.80 98.60 6.75 99.60 100.00 100.00 100.00 97.90 9.13 87.62 69.81 99.80 95.50 100.00 15.60 89.80 95.30 90.00 83.9 90.00 52.80 5.23

Table 4 TE process monitoring results under corrupted training data set. (The leading fault detection rates are marked in boldface in each row.) Fault

SRPE 2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 FAR

NPE 2

PCA 2

DPCA 2

OCSVM

T

SPE

T

SPE

T

SPE

T

SPE

100.00 97.50 9.00 100.00 100.00 100.00 100.00 98.00 6.25 87.123 72.88 99.88 95.88 100.00 8.63 93.38 97.13 91.13 93.25 88.13 51.00 3.33

5.13 16.13 0 0 0 90.38 1.00 0 0 0 0 9.63 16.88 0 0 0 32.38 80.25 0 0 0 0

95.25 95.13 0 0.25 11.50 98.75 16.38 35.00 0.13 0.25 36.38 56.38 59.63 100.00 0.13 0.75 82.25 86.88 6.25 19.88 28.25 0

0 0 0 0 0 82.75 0 0 0 0 0 0.50 0 0 0 0 33.63 77.50 0 0 0 0

97.75 95.00 0 0.13 12.25 96.13 22.50 59.25 0.13 1.00 18.38 65.63 75.38 90.13 0 0.13 68.38 86.75 5.88 1.13 4.75 0.03

1.38 0.13 0 0.13 1.50 94.63 26.13 8.88 0 0.25 21.75 59.75 63.25 100.00 0 0.13 82.00 88.13 0 3.13 26.00 0

95.13 94.25 0 0 9.25 93.50 21.88 43.38 0 1.63 0.50 60.00 74.38 71.50 0 0 58.13 86.63 0 0.50 0 0.03

97.13 94.63 0 0.13 6.13 98.63 22.25 44.75 0 0 27.63 67.63 65.88 99.63 0 0 80.13 88.25 0 1.25 18.38 0.18

SRPE is very sensitive to faulty behavior in the process variables. This is because the rich process information is effectively captured by the SRPE principle subspace, while for the other three methods this information is lost, causing detection latency and low fault detection rates. To further illustrate the robustness of SRPE to outliers, an additional experiment is conducted. Note that for each of the 21 programed faults (IDV1–IDV21), there is a training data set consisting of 480 faulty samples respectively. We observe that faults 1, 2, 5, 7, 8 will cause the process variables deviate greatly from the normal operating region, resembling the behavior of outliers.

14.37 63.12 0 0.13 2.13 94.87 18.75 21.12 0 0 7.75 43.25 68.13 94.00 0 0 67.75 87.62 0 0.13 2.38 0

Thus, we randomly replace 50 samples from the 500 normal training data with 10 faulty samples chosen randomly from each of the 5 above faulty training data set. These added faulty samples constitute 10% of all the training samples and are placed randomly in the time sequence, imitating the presence of outliers in the historical database. The distribution of the 450 normal samples and 50 outliers are illustrated in the PCA 3-dimensional subspace as Fig. 9 shows. The blue markers represent normal data, while the red markers represent 5 different kinds of outliers. The parameter setting is the same as the previous uncorrupted case, except for the OCSVM parameters , which is re-tuned

Z. Xiao et al. / Journal of Process Control 40 (2016) 119–133

131

Fig. 10. TE process monitoring results of fault 4 under training data set corruption: (a) SRPE, (b) NPE, (c) PCA, (d) DPCA.

to be 0.10. The monitoring performance of the five methods are compared in Table 4. It can be seen that the outliers make the PCA, DPCA, NPE and OCSVM methods deteriorate greatly, causing very low fault detection rates. However, for the SRPE method, the T2 statistic is still able to detect faults at high rates, showing few negative influences from the outliers. This robustness is achieved by performing DR computation in the error-free reconstructed clean data space, where the effects of outliers are already eliminated. Note that the error-free data are automatically determined by solving the SR program along with the sparse weights. This SR program is highly efficiently solved through convex optimization, which will introduce no additional workload.

The fault 4 and fault 10 are selected for a comparative study. In the previous uncorrupted experiment, fault 4 is easy to detect for all the four methods. However, when outliers are present in the training data set, which is inevitable in real industrial processes, the monitoring performance of the other three methods quickly deteriorate as shown in Fig. 10. While for the SRPE T2 statistic, the fault detection speed and accuracy remain very high. For fault 10, which is originally difficult to detect, the performance of the other three methods remain very poor as shown in Fig. 11, while the SRPE T2 statistic still has a big advantage in fault detection rate and speed. This extended experiment under training data set contamination confirms the robustness of the proposed SRPE method, showing promising application values in real industrial process

132

Z. Xiao et al. / Journal of Process Control 40 (2016) 119–133

Fig. 11. TE process monitoring results of fault 10 under training data set corruption: (a) SRPE, (b) NPE, (c) PCA, (d) DPCA.

monitoring, where outliers are very common in the historical database.

the SRPE method was much more robust in the presence of outliers because of the data cleansing effects from SR. Through the two extensive case studies, the superiority of the SRPE method was fully demonstrated.

5. Conclusions References This paper proposed a novel linear unsupervised dimension reduction method called SRPE and applied it to dynamic process monitoring. The proposed method was based on the sparse reconstruction weights and error-free clean data, which were simultaneously obtained by solving a convex optimization problem. SRPE aimed at preserving the sparse weights in finding projection directions onto a latent low dimensional subspace. As the delicate relationships among process data were better revealed in the SR weighted graph and error-free data space, the SRPE method was able to capture much more meaningful and useful process information in the SRPE principle subspace, thus resulting in early detection of faults and high fault detection rates. Besides,

[1] Z.Q. Ge, Z.H. Song, F.R. Gao, Review of recent research on data-based process monitoring, Ind. Eng. Chem. Res. 52 (2013) 3543–3562. [2] S.J. Qin, Statistical process monitoring: basics and beyond, J. Chemometr. 17 (2003) 480–502. [3] S.J. Qin, Survey on data-driven industrial process monitoring and diagnosis, Annu. Rev. Control 36 (2012) 220–234. [4] S. Yin, S.X. Ding, A. Haghani, H.Y. Hao, P. Zhang, A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process, J. Process Control 22 (2012) 1567–1581. [5] Y.W. Zhang, S. Li, Z.Y. Hu, C.H. Song, Dynamical process monitoring using dynamical hierarchical kernel partial least squares, Chemometr. Intell. Lab. Syst. 118 (2012) 150–158. [6] Z.B. Zhu, Z.H. Song, A novel fault diagnosis system using pattern classification on kernel FDA subspace, Expert Syst. Appl. 38 (2011) 6895–6905.

Z. Xiao et al. / Journal of Process Control 40 (2016) 119–133 [7] Z.Q. Ge, C.J. Yang, Z.H. Song, Improved kernel PCA-based monitoring approach for nonlinear processes, Chem. Eng. Sci. 64 (2009) 2245–2255. [8] W.F. Ku, R.H. Storer, C. Georgakis, Disturbance detection and isolation by dynamic principal component analysis, Chemometr. Intell. Lab. Syst. 30 (1995) 179–196. [9] U. Kruger, Y.Q. Zhou, G.W. Irwin, Improved principal component monitoring of large-scale processes, J. Process Control 14 (2004) 879–888. [10] S.X. Ding, Data-driven design of monitoring and diagnosis systems for dynamic processes: a review of subspace technique based schemes and some recent results, J. Process Control 24 (2014) 431–449. [11] U. Schubert, U. Kruger, H. Arellano-Garcia, T.D. Feital, G. Wozny, Unified modelbased fault diagnosis for three industrial application studies, Control Eng. Pract. 19 (2011) 479–490. [12] C. Lee, S.W. Choi, I.B. Lee, Variable reconstruction and sensor fault identification using canonical variate analysis, J. Process Control 16 (2006) 747–761. [13] A. Simoglou, E.B. Martin, A.J. Morris, Dynamic multivariate statistical process control using partial least squares and canonical variate analysis, Comput. Chem. Eng. 23 (1999) S277–S280. [14] R.J. Treasure, U. Kruger, J.E. Cooper, Dynamic multivariate statistical process control using subspace identification, J. Process Control 14 (2004) 279–292. [15] S.F. Moller, J. von Frese, R. Bro, Robust methods for multivariate data analysis, J. Chemometr. 19 (2005) 549–563. [16] M. Daszykowski, K. Kaczmarek, Y.V. Heyden, B. Walczak, Robust statistics in data analysis – a review basic concepts, Chemometr. Intell. Lab. Syst. 85 (2007) 203–219. [17] N. Kwak, Principal component analysis based on L1-norm maximization, IEEE Trans. Pattern Anal. Mach. Intell. 30 (2008) 1672–1680. [18] J.B. Tenenbaum, V. de Silva, J.C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science 290 (2000) 2319. [19] L.K. Saul, S.T. Roweis, Think globally, fit locally: unsupervised learning of low dimensional manifolds, J. Mach. Learn. Res. 4 (2004) 119–155. [20] S.T. Roweis, L.K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290 (2000) 2323. [21] M. Belkin, P. Niyogi, Laplacian eigenmaps and spectral techniques for embedding and clustering, in: Advances in Neural Information Processing Systems 14, vols. 1 and 2, 2002, pp. 585–591. [22] X.F. He, D. Cai, S.C. Yan, H.J. Zhang, Neighborhood preserving embedding, in: Tenth IEEE International Conference on Computer Vision, vols. 1 and 2, Proceedings, 2005, pp. 1208–1213. [23] X.F. He, P. Niyogi, Locality preserving projections, Adv. Neural Inform. Process. Syst. 16 (16) (2004) 153–160.

133

[24] X.F. He, S.C. Yan, Y.X. Hu, P. Niyogi, H.J. Zhang, Face recognition using Laplacian faces, IEEE Trans. Pattern Anal. Mach. Intell. 27 (2005) 328–340. [25] D. Cai, X.F. He, J.W. Han, H.J. Zhang, Orthogonal Laplacian faces for face recognition, IEEE Trans. Image Process. 15 (2006) 3608–3614. [26] X. He, D. Cai, P. Niyogi, Tensor Subspace Analysis, 2005. [27] D. Cai, X. He, J. Han, Isometric Projection, 2007. [28] K.L. Hu, J.Q. Yuan, Statistical monitoring of fed-batch process using dynamic multiway neighborhood preserving embedding, Chemometr. Intell. Lab. Syst. 90 (2008) 195–203. [29] K.L. Hu, J.Q. Yuan, Multivariate statistical process control based on multiway locality preserving projections, J. Process Control 18 (2008) 797–807. [30] A.M. Miao, Z.Q. Ge, Z.H. Song, L. Zhou, Time neighborhood preserving embedding model and its application for fault detection, Ind. Eng. Chem. Res. 52 (2013) 13717–13729. [31] M.G. Zhang, Z.Q. Ge, Z.H. Song, R.W. Fu, Global–local structure analysis model and its application for fault detection and identification, Ind. Eng. Chem. Res. 50 (2011) 6837–6848. [32] J.B. Yu, Local and global principal component analysis for process monitoring, J. Process Control 22 (2012) 1358–1373. [33] L.J. Luo, S.Y. Bao, Z.L. Gao, J.Q. Yuan, Batch process monitoring with tensor global–local structure analysis, Ind. Eng. Chem. Res. 52 (2013) 18031–18042. [34] J. Wright, A.Y. Yang, A. Ganesh, S.S. Sastry, Y. Ma, Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell. 31 (2009) 210–227. [35] E. Elhamifar, R. Vidal, Sparse subspace clustering: algorithm, theory, and applications, IEEE Trans. Pattern Anal. Mach. Intell. 35 (2013) 2765–2781. [36] Z. Zhang, S.C. Yan, M.B. Zhao, Pairwise sparsity preserving embedding for unsupervised subspace learning and classification, IEEE Trans. Image Process. 22 (2013) 4640–4651. [37] B. Cheng, J.C. Yang, S.C. Yan, Y. Fu, T.S. Huang, Learning with L(1)-graph for image analysis, IEEE Trans. Image Process. 19 (2010) 858–866. [38] L.S. Qiao, S.C. Chen, X.Y. Tan, Sparsity preserving projections with applications to face recognition, Pattern Recogn. 43 (2010) 331–341. [39] M. Grant, S. Boyd, Y. Ye, CVX: Matlab Software for Disciplined Convex Programming, 2008. [40] S.P. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. Trends Mach. Learn. 3 (2011) 1–122. [41] J.J. Downs, E.F. Vogel, A plant-wide industrial-process control problem, Comput. Chem. Eng. 17 (1993) 245–255.