Mechanical Systems and Signal Processing 95 (2017) 158–171
Discriminative nonnegative matrix factorization (DNMF) and its application to the fault diagnosis of diesel engine

Yang Yongsheng a,b, Ming Anbo c,*, Zhang Youyun a, Zhu Yongsheng a

a Department of Mechanical Engineering, Xi'an Jiaotong University, Xi'an 710049, China
b Department of Computer Engineering, Shaanxi Academy of Governance, Xi'an 710068, China
c High-Tech Research Institute of Xi'an, Xi'an 710025, China
Article history: Received 11 October 2016; Received in revised form 9 January 2017; Accepted 17 March 2017; Available online 1 April 2017

Keywords: Discriminative nonnegative matrix factorization (DNMF); K-nearest neighbor method; Time-frequency distribution; Diesel engine; Fault diagnosis
Abstract: Diesel engines are widely used in engineering, their running condition is vital to the operation of equipment, and their fault diagnosis has attracted much attention. Over the past several decades, image-based methods have provided efficient ways to diagnose diesel engine faults. By introducing class information into the traditional nonnegative matrix factorization (NMF), an improved NMF algorithm named discriminative NMF (DNMF) was developed, and a novel image-based fault diagnosis method was proposed by combining DNMF with the KNN classifier. Experiments on the fault diagnosis of a diesel engine were used to validate the efficacy of the proposed method. It is shown that the fault conditions of the diesel engine can be efficiently classified by the proposed method using the coefficient matrix obtained by DNMF. Compared with the original NMF (ONMF) and principal component analysis (PCA), DNMF represents the class information more efficiently, because the class characteristics of the basis matrices obtained by DNMF are more visible than those of the basis matrices obtained by ONMF and PCA. © 2017 Elsevier Ltd. All rights reserved.
1. Introduction

Diesel engines are widely used power sources in engineering, and their running conditions are closely related to the performance of the apparatus they drive. However, owing to their complex structure and harsh working environment, diesel engines are prone to break down, and the corresponding fault diagnosis technologies have attracted much attention over the past several decades. Many methods have been developed to diagnose the condition of diesel engines [1–6]. Among them, vibration-based methods have proved effective, because data acquisition is convenient and nonintrusive, and a variety of information, including reciprocating motions, rotating motions, mechanical impacts and the high-speed flow of gas, is carried by the vibration signal. Traditional vibration analysis methods are generally performed in either the time domain or the frequency domain [7,8]. With the development of signal processing, nonstationary and nonlinear characteristics have attracted more and more attention, and many time-frequency analysis methods have been introduced to diesel engine fault diagnosis [5]. Several image features, such as moment invariants, gray-level statistics, textural features, and the differential box-counting fractal dimension, are usually selected to describe the characteristics of the time-frequency image and used as fault features to diagnose diesel engine faults [9–11]. (Corresponding author: A.b. Ming, [email protected].) However, similar to the statistical indices in the time domain,
these characteristics are nonfigurative and inconvenient for intuitive understanding in fault diagnosis. Comparatively, the time-frequency image, a two-dimensional representation, is very convenient for intuitive human understanding and contains much more information than a statistical index. However, the high-dimensional representation of an image requires heavy computation and large memory space, which makes the direct application of image-based methods to fault diagnosis very difficult. Fortunately, the development of dimension reduction technology has provided efficient ways to apply fault diagnosis methods to such high-dimensional indices. The most popular dimension reduction methods include principal component analysis (PCA) [12–15], linear discriminant analysis (LDA) [16,17], independent component analysis (ICA) [18–21], manifold learning [22–25] and nonnegative matrix factorization (NMF) [26,27], etc. These methods provide compact low-dimensional representations of the original data in different ways for further processing such as learning and visualization. PCA computes a set of orthogonal basis vectors that have a statistical interpretation, as they correspond to the directions of largest variance. In 2009, He et al. [15] introduced the PCA technique to extract principal component representations from the time- and frequency-domain statistical features of the measured signals and used a mean correlation rule to select the most representative principal components to classify machine fault patterns; experimental results from an internal-combustion engine sound analysis and an automobile gearbox vibration analysis were used to validate the efficacy of the method. LDA searches for those vectors in the underlying space that best discriminate among classes rather than those that best describe the data.
In 2016, Mbo'o and Hameyer [17] developed a diagnostic system based on current features evaluated by means of linear discriminant analysis on frequencies selected from the stator current spectrum; two types of bearing damage at different loads were distinguished from the healthy bearing. ICA seeks a basis under the assumption that the latent vectors are mutually independent. In 2012, He et al. [19] proposed a new feature extraction method based on ICA and the reconstructed phase space for machine fault diagnosis; experiments on an automobile transmission gearbox were used to validate its effectiveness. Being able to identify low-dimensional nonlinear structure hidden in high-dimensional data, manifold learning has emerged in machinery fault diagnosis. In 2013, He [24] proposed a novel time-frequency manifold (TFM) technique for machine fault signature analysis and health diagnosis by combining the nonstationary and nonlinear properties of the time-frequency distribution (TFD) of vibration; moreover, an efficient low-dimensional feature, called the TFM feature, was developed. The effectiveness and merits of the TFM feature were confirmed by applications to gear wear diagnosis, bearing defect identification and defect severity evaluation. However, none of these methods has an obvious visual interpretation, because their bases and coefficients are allowed to take arbitrary signs. Comparatively, by allowing only additive, not subtractive, combinations of the original data, NMF leads to a parts-based representation through its nonnegativity constraints and is widely used in applications such as face recognition, clustering, music enhancement, sparse coding, graph matching and biological data mining [27]. However, the original NMF (ONMF) is an unsupervised learning method, and the coefficients derived by NMF do not perform well in supervised tasks where the data are provided with class labels. In 2011, Cai et al.
[28] introduced the Laplacian of the neighborhood graph into the regularization term to preserve the locality of the data. In 2013, Liu et al. [29] proposed a semi-supervised NMF approach, namely constrained NMF (CNMF), by introducing the label information into the main objective function of NMF. In the same year, Li et al. [30] proposed a semi-supervised NMF method using the tricks of GNMF and CNMF, which increased the locality preservation and the discrimination of the data representation respectively. However, these semi-supervised NMF variants reduce the data according to the number of available labeled data points, and points sharing the same label are represented as a single point. In 2016, Babaee et al. [31] argued that data points belonging to the same class should be very close or aligned on the same axis in the new representation, but not merged into a single point, and developed a new label-constrained NMF by coupling a discriminative regularizer to the main objective function of NMF. Generally, all these variants of NMF have mainly been applied to face recognition. However, mechanical fault diagnosis differs from face recognition because the research objects differ greatly, and few NMF methods have been reported to work well in mechanical fault diagnosis. In this paper, an improved NMF, called discriminative nonnegative matrix factorization (DNMF), is investigated for application to diesel engine fault diagnosis. Using the time-frequency image as the original input feature, the proposed method is an image-based fault diagnosis method. As a key point, DNMF plays an important role in the procedure owing to its dimension reduction effect. By feeding the coefficients of DNMF to the K-nearest neighbor (KNN) classifier, the proposed fault diagnosis method combines the powerful discriminative capability of DNMF with the conciseness of the KNN method.
The efficiency of the proposed method is validated by its application to the fault diagnosis of a diesel engine.

2. Discriminative nonnegative matrix factorization

2.1. Nonnegative matrix factorization

Consider a nonnegative data matrix $V_{n\times m}=\{v_1,v_2,\ldots,v_m\}\in\mathbb{R}_+^{n\times m}$, where $v_i$ denotes a feature vector of dimension $n$ and $m$ denotes the number of samples. Given a reduced dimension $r$, the matrix $V_{n\times m}$ can be approximated by a product of two nonnegative matrices $W\in\mathbb{R}_+^{n\times r}$ and $H\in\mathbb{R}_+^{r\times m}$, namely

$$V_{n\times m}\approx W_{n\times r}H_{r\times m} \qquad (1)$$
This equation is called the NMF of the matrix $V_{n\times m}$. $W_{n\times r}$ can be considered a set of basis vectors and is called the basis matrix; $H_{r\times m}$ can be considered the coordinates of each sample with respect to these basis vectors and is called the coefficient matrix. Obviously, when $r(m+n)<mn$, the new representation of $V_{n\times m}$ needs less memory space than the original one. From Eq. (1), it can be seen that the essence of NMF is the calculation of the matrices $W$ and $H$. In Ref. [27], Lee and Seung designed an iterative algorithm based on maximum likelihood estimation under the hypothesis that all vectors of $V_{n\times m}$ are statistically independent and disturbed by Gaussian noise. In this case, the objective function of NMF is the square of the Frobenius norm of the matrix difference, namely
$$\min_{W,H}\ \|V-WH\|_F^2=\sum_{ij}\left[v_{ij}-(WH)_{ij}\right]^2,\qquad \text{s.t.}\ W,H\ge 0 \qquad (2)$$

Later, in order to increase the approximation accuracy and account for Poisson noise, the objective function was also constructed with the Kullback-Leibler divergence, namely
$$\min_{W,H}\ D(V\|WH)=\sum_{ij}\left[v_{ij}\log\frac{v_{ij}}{(WH)_{ij}}-v_{ij}+(WH)_{ij}\right],\qquad \text{s.t.}\ W,H\ge 0 \qquad (3)$$

It is well known that both objective functions are nonconvex when $W$ and $H$ vary together; however, the minimization problems are convex with respect to $W$ and $H$ separately, so there are many local minima. In Ref. [27], Lee and Seung presented multiplicative update algorithms for these two minimization problems and proved their convergence. The update rules for the Frobenius norm are given by
$$w_{ik}\leftarrow w_{ik}\,\frac{(VH^T)_{ik}}{(WHH^T)_{ik}},\qquad h_{kj}\leftarrow h_{kj}\,\frac{(W^TV)_{kj}}{(W^TWH)_{kj}} \qquad (4)$$
while the update rules for the Kullback-Leibler divergence are determined by
$$w_{ik}\leftarrow w_{ik}\,\frac{\sum_j h_{kj}\,v_{ij}/(WH)_{ij}}{\sum_u h_{ku}},\qquad h_{kj}\leftarrow h_{kj}\,\frac{\sum_i w_{ik}\,v_{ij}/(WH)_{ij}}{\sum_v w_{vk}} \qquad (5)$$
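As a concrete illustration, the Frobenius-norm update rules of Eq. (4) can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' code; the small constant `eps` (an added safeguard) prevents division by zero.

```python
import numpy as np

def nmf_frobenius(V, r, n_iter=200, eps=1e-9, seed=0):
    """Multiplicative-update NMF minimizing ||V - WH||_F^2 (Eq. (4))."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update rule for h_kj
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update rule for w_ik
    return W, H

# toy usage: factorize a 6 x 8 nonnegative matrix with rank r = 3
V = np.random.default_rng(1).random((6, 8))
W, H = nmf_frobenius(V, r=3)
```

Because the updates are multiplicative, nonnegative initial values stay nonnegative throughout, which is how the constraints of Eq. (2) are maintained without any projection step.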
2.2. Discriminative nonnegative matrix factorization

From the previous section, it can be seen that no class label information is explicitly considered in the ONMF, so the coefficient matrix may not perform well in supervised tasks where the data carry class labels. Hence, it is desirable to develop a nonnegative matrix factorization for the supervised setting. To this end, the DNMF method is developed by involving the class label information in the objective function.

2.2.1. Acquisition of the basis matrix

To achieve discriminative power, the data are factorized class by class and the basis matrices of the different classes are obtained first. Furthermore, to keep the basis matrices as separate from each other as possible, the orthogonality of different basis matrices is encouraged in the objective function by minimizing the cosine between basis vectors of different classes. Mathematically, given a nonnegative data matrix $V=[V^{(1)},V^{(2)},\ldots,V^{(L)}]\in\mathbb{R}_+^{N\times M}$ concatenated from $L$ classes of nonnegative data, $N$ denotes the dimension of the data and $M$ denotes the total number of samples over the $L$ classes. For each class, let $R$ denote the reduced dimension with $R<N$; then the objective function of DNMF is given by
$$\min_{W^{(l)},H^{(l)}}\ \sum_{l=1}^{L}\left[\frac{1}{2}\left\|V^{(l)}-W^{(l)}H^{(l)}\right\|_F^2+\alpha\sum_{j\neq l}\sum_{r_1=1}^{R}\sum_{r_2=1}^{R}\cos\!\left(W^{(j)}(:,r_1),W^{(l)}(:,r_2)\right)\right],\qquad \text{s.t.}\ W^{(l)},H^{(l)}\ge 0 \qquad (6)$$

where $V^{(l)}\in\mathbb{R}_+^{N\times M_l}$ denotes the nonnegative matrix concatenated from the data of the $l$-th class, $W^{(l)}\in\mathbb{R}_+^{N\times R}$ denotes the discriminative basis matrix corresponding to the $l$-th class, $H^{(l)}\in\mathbb{R}_+^{R\times M_l}$ denotes the coefficient matrix of the $l$-th class, $(:,r)$ denotes the $r$-th column of a matrix, and $\alpha>0$ denotes the regularization parameter. $\cos(x,y)$ is the cosine of the included angle between the vectors $x$ and $y$. Obviously, the smaller the included angle between $x$ and $y$, the bigger the cosine value; for nonnegative vectors, the cosine reaches its minimum of zero when the two vectors are orthogonal. Namely, the minimization of the cosine drives the basis vectors toward orthogonality and enhances the discriminative power. Compared with Eq. (1), it can be seen that the class label information is involved in Eq. (6) under the nonnegativity constraints: the basis matrices are obtained from the data of the individual classes, and their discriminative capability is enhanced by minimizing the cosine between basis vectors of different classes.
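The orthogonality regularizer in Eq. (6), i.e. the sum of cosines between basis columns of different classes, can be computed as follows. This is a hypothetical helper for illustration only; columns are normalized inside the function so that the cosine formula $\cos(x,y)=x^Ty/(\|x\|\,\|y\|)$ holds for non-unit vectors.

```python
import numpy as np

def cosine_penalty(bases):
    """Second term of Eq. (6): sum of cos(W^(j)(:,r1), W^(l)(:,r2)) over j != l."""
    total = 0.0
    for l, Wl in enumerate(bases):
        for j, Wj in enumerate(bases):
            if j == l:
                continue
            # normalize columns so that the inner product equals the cosine
            A = Wj / np.linalg.norm(Wj, axis=0)
            B = Wl / np.linalg.norm(Wl, axis=0)
            total += (A.T @ B).sum()   # all cosines between the two bases
    return total
```

For nonnegative bases the penalty is zero exactly when every pair of columns from different classes is orthogonal, which is the behavior Eq. (6) rewards.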
2.2.2. Acquisition of the coefficient matrix

Generally speaking, the basis matrix of NMF is equivalent to a sub-dictionary in dictionary learning. Therefore, the overall basis matrix can be obtained by concatenating the basis matrices of the different classes, namely

$$W=[W^{(1)},W^{(2)},\ldots,W^{(L)}] \qquad (7)$$
In this instance, the final basis matrix involves all class information. The coefficient of each sample, as well as the test samples, is calculated with respect to the final basis matrix by nonnegative least squares (NNLS) method, which is popular in the NMF and dictionary learning. The objective function of NNLS method is given by
minx
ky Wxk2F ;
s:t: x P 0
ð8Þ
where $y$ is the vector representation of a sample and $x$ is the coefficient of the sample on the final basis matrix. Since the class label information is involved in the basis matrix, the coefficient matrix, i.e., the projections of the original samples on the basis matrix, can be used for classification. Moreover, because the basis matrix tends to be orthogonal, the coefficient matrix is inclined to be orthogonal too.

2.2.3. Update of the basis and coefficient matrices

Similar to the ONMF, it is very difficult to obtain the optimal solution of Eq. (6). Naturally, the general multiplicative update rules used in the ONMF can be extended to find a local optimum here by alternately updating the basis matrix $W$ and the coefficient matrix $H$ [32], namely
$$w_{ij}\leftarrow w_{ij}\left(\frac{\left[\frac{\partial J(W,H)}{\partial w_{ij}}\right]_-}{\left[\frac{\partial J(W,H)}{\partial w_{ij}}\right]_+}\right)^{\omega},\qquad h_{ij}\leftarrow h_{ij}\left(\frac{\left[\frac{\partial J(W,H)}{\partial h_{ij}}\right]_-}{\left[\frac{\partial J(W,H)}{\partial h_{ij}}\right]_+}\right)^{\omega} \qquad (9)$$
where $J(W,H)$ denotes the cost function with the variables $W$ and $H$, and $\left[\partial J(W,H)/\partial w_{ij}\right]_{\mp}$ denote the negative and positive terms of the partial derivative with respect to the elements of the basis matrix $W$. $\omega$ denotes an over-relaxation parameter, usually located in the range $[0.5,2]$; in this paper, $\omega=1$. For the ONMF, when the objective function is the square of the Frobenius norm of the matrix difference, the cost function can be written as
$$J(W,H)=\frac{1}{2}\|V-WH\|_F^2 \qquad (10)$$
Then the derivatives of $J(W,H)$ with respect to the basis matrix $W$ and the coefficient matrix $H$ are given by

$$\frac{\partial J(W,H)}{\partial w_{ij}}=\left((WH-V)H^T\right)_{ij},\qquad \frac{\partial J(W,H)}{\partial h_{ij}}=\left(W^T(WH-V)\right)_{ij} \qquad (11)$$
In this case, the update rules of the ONMF can be obtained by substituting the positive and negative terms of the above derivatives into Eq. (9).

(1) Fix $W$ and optimize $H$

Apparently, the update of the coefficient matrix $H$ can be obtained by referring to the multiplicative update rules and the cost function:
$$H^{(l)}\leftarrow H^{(l)}\circ\left(W^{(l)T}V^{(l)}\right)\oslash\left(W^{(l)T}W^{(l)}H^{(l)}\right) \qquad (12)$$
where $\circ$ denotes the Hadamard (element-wise) product and $\oslash$ denotes element-wise division. It can be seen that the update of the coefficient matrix is the same as in the ONMF, because the second term of the objective function depends only on the basis matrices.

(2) Fix $H$ and optimize $W$

Similar to the optimization of the coefficient matrix $H$, the calculation of the basis matrix $W$ also follows the multiplicative update rules. Referring to Eqs. (6) and (9), the update of the basis matrix is given by
$$W^{(l)}(:,r)\leftarrow W^{(l)}(:,r)\circ\left(\left[\nabla_{W^{(l)}(:,r)}\right]_-\oslash\left[\nabla_{W^{(l)}(:,r)}\right]_+\right) \qquad (13)$$
where

$$\left[\nabla_{W^{(l)}(:,r)}\right]_-\oslash\left[\nabla_{W^{(l)}(:,r)}\right]_+=\frac{V^{(l)}H^{(l)}(r,:)^T+\alpha\left(\Big(\sum_{j\neq l}\sum_{i=1}^{R}W^{(j)}(:,i)\Big)^TW^{(l)}(:,r)\right)W^{(l)}(:,r)}{W^{(l)}H^{(l)}H^{(l)}(r,:)^T+\alpha(L-1)R\,W^{(l)}(:,r)} \qquad (14)$$
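A minimal sketch of the alternating updates (12) and (13) is given below. It is not the authors' implementation: as labeled in the comments, the positive part of the cosine-penalty gradient is written in its exact form (the sum of the other classes' basis columns, assuming unit-norm columns) rather than the closed-form denominator of Eq. (14), and the column normalization of STEP 3 of the procedure, with compensating row scaling of H, is folded into the loop.

```python
import numpy as np

EPS = 1e-9

def dnmf(classes, R, alpha=0.08, n_iter=200, seed=0):
    """Sketch of the per-class DNMF updates; `classes` is a list of L arrays V^(l) (N x M_l)."""
    rng = np.random.default_rng(seed)
    N, L = classes[0].shape[0], len(classes)
    W = [rng.random((N, R)) + EPS for _ in range(L)]
    H = [rng.random((R, V.shape[1])) + EPS for V in classes]
    for _ in range(n_iter):
        for l, V in enumerate(classes):
            # Eq. (12): coefficient update, same form as the ONMF
            H[l] *= (W[l].T @ V) / (W[l].T @ W[l] @ H[l] + EPS)
            # Eq. (13): column-wise basis update with the cosine penalty
            u = sum(W[j].sum(axis=1) for j in range(L) if j != l)  # other-class basis columns summed
            for r in range(R):
                w = W[l][:, r]
                num = V @ H[l][r, :] + alpha * (u @ w) * w           # negative gradient part
                den = W[l] @ H[l] @ H[l][r, :] + alpha * u + EPS     # positive part (exact form, an assumption)
                W[l][:, r] = w * num / den
            # STEP 3: normalize columns of W^(l), scaling rows of H^(l)
            norms = np.linalg.norm(W[l], axis=0) + EPS
            W[l] /= norms
            H[l] *= norms[:, None]
    return W, H

# toy usage: two classes of 5-dimensional samples, reduced dimension R = 2
rng = np.random.default_rng(1)
classes = [rng.random((5, 4)) + EPS, rng.random((5, 6)) + EPS]
W, H = dnmf(classes, R=2, n_iter=50)
```

The final coefficients of Eq. (8) can then be obtained, for example, by running `scipy.optimize.nnls` column by column against the concatenated basis $W=[W^{(1)},\ldots,W^{(L)}]$.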
In this case, both the basis matrix and the coefficient matrix can be calculated iteratively. The procedure of the calculation can be summarized as follows.
Given the nonnegative data samples $V=[V^{(1)},V^{(2)},\ldots,V^{(L)}]\in\mathbb{R}_+^{N\times M}$ composed of $L$ classes, set the reduced dimension $R$ of each class and the weight value $\alpha$.

STEP 1: Initialize the basis matrices $W^{(l)}$ and the coefficient matrices $H^{(l)}$.
STEP 2: Normalize the rows of $H^{(l)}$, scaling the columns of $W^{(l)}$ accordingly. Then update $H^{(l)}$ according to Eq. (12).
STEP 3: Normalize the columns of $W^{(l)}$, scaling the rows of $H^{(l)}$ accordingly. Then update $W^{(l)}$ according to Eq. (13).
STEP 4: Terminate the iteration if the calculation converges or the maximum number of iterations is reached, and go to STEP 5; otherwise, go back to STEP 2.
STEP 5: Output the final basis matrix by concatenating the sub-basis matrices of the classes, namely $W=[W^{(1)},W^{(2)},\ldots,W^{(L)}]$, and normalize it.
STEP 6: Calculate the final coefficient matrix $H$ by Eq. (8).

Finally, both the final basis matrix and the coefficient matrix are obtained by the iterative calculation. Both are discriminative, since the class information is involved in the dimension reduction.

3. K nearest neighbors

K-nearest neighbors (KNN) classifies an unknown sample by the votes of several neighboring samples [33]. Mathematically, given a sample set $S_N=\{(x_1,\omega_1),(x_2,\omega_2),\ldots,(x_N,\omega_N)\}$, where $x_i$ denotes the feature vector of the $i$-th sample and $\omega_i$ denotes the corresponding label, suppose that all samples belong to $c$ classes, namely $\omega_i\in\{1,2,\ldots,c\}$. The distance between two samples is given by $d(x_i,x_j)=\|x_i-x_j\|$. For a new sample $x$, check its first $k$ nearest neighbors among the $N$ training samples. If $k_i$ of them belong to class $i$, the corresponding voting function is given by
$$g_i(x)=k_i\qquad(i=1,2,\ldots,c) \qquad (15)$$

Then the decision rule for the new sample $x$ is

$$x\in\omega_k,\quad \text{if}\ \ g_k(x)=\max_{i=1,2,\ldots,c}g_i(x) \qquad (16)$$
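The voting rule of Eqs. (15) and (16) amounts to a majority vote among the $k$ nearest training samples. A minimal NumPy sketch, assuming Euclidean distance and integer class labels:

```python
import numpy as np

def knn_classify(x, X_train, y_train, k=1):
    """Majority vote among the k nearest training samples (Eqs. (15)-(16))."""
    d = np.linalg.norm(X_train - x, axis=1)   # distances d(x, x_i)
    nearest = np.argsort(d)[:k]               # indices of the k nearest neighbors
    votes = np.bincount(y_train[nearest])     # g_i(x) = k_i, votes per class
    return int(np.argmax(votes))              # class with the maximum vote

# toy usage: two well-separated classes
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
y = np.array([0, 0, 1, 1])
label = knn_classify(np.array([4.9, 5.1]), X, y, k=3)   # -> 1
```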
In application, the number of nearest neighbors $k$ is usually much smaller than the total number of samples and should be determined according to the actual problem. Generally, $k$ is odd in two-class classification to avoid ties in the voting.

4. Fault diagnosis method of diesel engine based on DNMF and KNN

In this section, a novel fault diagnosis method for diesel engines is proposed by combining DNMF and KNN. Using the time-frequency distribution as the input feature, the proposed method is an image-based fault diagnosis method and is intended to improve diagnosis efficiency by exploiting the rich information in the image. The main procedure of the proposed method is as follows. Given a sample set with $L$ classes, where each sample is an image feature:

(1) Randomly divide the samples into a training set and a test set. Transform the images into vectors. Initialize the parameters of DNMF and KNN, such as the maximum number of iterations MaxIter, the weight $\alpha$, the reduced dimension $R$ and the number of nearest neighbors $k$.
(2) Reduce the dimension of the training set by DNMF and obtain the sub-basis matrices $W^{(l)}$. Then concatenate all sub-basis matrices to form the final basis matrix $W=[W^{(1)},W^{(2)},\ldots,W^{(L)}]$ and normalize it.
(3) Calculate the coefficient matrices of both the training and test sets by Eq. (8).
(4) Use the coefficient matrices as the input features of the KNN classifier and obtain the classification results on the test set.

Finally, the diagnosis procedure is illustrated in Fig. 1.

5. Experimental study

5.1. Test apparatus and data preprocessing

Fig. 2 illustrates the diesel engine and the installation of the transducers. The diesel engine is a type 6135G engine produced by the Diesel Engine Company of Shanghai. The electromagnetic pulse transducer is installed near the transmission shaft of the oil injection pump.
At the same time, a piece of sheet iron is vertically fixed on the transmission shaft to mark the compression top dead center of the second cylinder. It is worth noting that the collection of vibration begins at the compression top dead center, which makes all vibration records share the same initial phase, so no phase rectification is needed before the time-frequency
Fig. 1. Sketch of the proposed diagnosis procedure. [Flowchart: the sample set of time-frequency distributions V is split into a training set Vtr and a test set Vte; DNMF of the training set yields the final basis matrix W; projection gives the coefficient matrices Htr and Hte, which feed the KNN classifier to produce the test accuracy. Parameters MaxIter, α, R, k are initialized beforehand.]

Fig. 2. Diesel engine and the installation of the transducers: (a) global view of the diesel engine; (b) installation of the electromagnetic pulse transducer (with the sheet iron on the transmission shaft of the oil injection pump); (c) installation of the accelerometer on the intake and exhaust valve spring seats.
analysis. The vibrations collected on the surface of the cylinder were first amplified by a charge amplifier and then digitized at a sampling frequency of 24 kHz. In the test, seven common valve faults were artificially introduced on the second cylinder. All vibrations were collected with no load at a speed of 1500 rpm. The parameters of the eight valve conditions, including the normal condition, are listed in Table 1. It is worth noting that, in the normal condition, the gaps of the intake and exhaust valves lie in the ranges [0.25, 0.30] mm and [0.30, 0.35] mm respectively; the values 0.30 mm, 0.50 mm and 0.06 mm denote valve gaps that are normal, too large and too small respectively. Fig. 3 illustrates the waveform and the corresponding power spectrum of the normal-condition vibration collected over one working cycle of the second cylinder. Lasting about 0.08 s, the waveform presents an obvious nonstationary property, since the transient amplitudes change sharply. The energy of the vibration is concentrated in the range [6.0, 9.0] kHz. However, the power spectrum cannot reveal the occurrence moments of the cyclic impulses, while the waveform cannot indicate the oscillation energy. Fig. 4 illustrates the gray image and the waterfall image of the short-time Fourier transform (STFT) of the vibration collected in the normal condition. A Hamming window of length 64 was used in the STFT, the time shift was 1 sample, and the length of the frequency sampling was 256.
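With these settings, the gray STFT image can be reproduced approximately as follows. This is a minimal NumPy sketch, not the authors' code; the window length, hop and FFT length follow the values stated above.

```python
import numpy as np

def stft_gray_image(x, win_len=64, hop=1, n_fft=256):
    """Magnitude STFT: Hamming window of length 64, time shift of 1 sample, 256-point FFT."""
    win = np.hamming(win_len)
    frames = [x[i:i + win_len] * win
              for i in range(0, len(x) - win_len + 1, hop)]
    # rows: one-sided frequency bins, columns: time instants
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1)).T

# toy usage: a 6 kHz tone sampled at 24 kHz, 0.01 s long
fs = 24_000
t = np.arange(int(0.01 * fs)) / fs
img = stft_gray_image(np.sin(2 * np.pi * 6_000 * t))
```

With a hop of 1 sample the time resolution is maximal, which is what makes the resulting image large (630 × 330 pixels in the paper) and motivates the dimension reduction that follows.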
Table 1. Parameters of valves in different conditions (unit: mm).

| No. | Gap of the intake valve | Gap of the exhaust valve | Notes |
| 1 | 0.30 | 0.30 | Normal condition |
| 2 | 0.30 | 0.06 | Exhaust valve gap is too small |
| 3 | 0.30 | 0.50 | Exhaust valve gap is too large |
| 4 | 0.30 | 0.30, with a crack of 4 × 1 | Exhaust valve experiences heavy air leakage |
| 5 | 0.30 | 0.30, with slight wear | Exhaust valve experiences slight air leakage |
| 6 | 0.06 | 0.06 | Both valve gaps are too small |
| 7 | 0.06 | 0.50 | Gap of intake valve is too small, while that of exhaust valve is too large |
| 8 | 0.50 | 0.50 | Both valve gaps are too large |
Fig. 3. Waveform and power spectrum of the normal condition. [Panels: (a) acceleration waveform versus crank angle (rad); (b) power spectrum over 0–12 kHz.]

Fig. 4. STFT of the vibration: (a) gray image; (b) waterfall image. [Axes: crank angle (deg) versus frequency (kHz).]
It can be seen that, although both representations can exactly illustrate the occurrence moments and frequency components of the impulses, the waterfall image is obviously more understandable than the gray image, because the three-dimensional visualization is more suitable for human cognition. However, the waterfall image needs more memory space than the gray image and may increase the calculation load of the subsequent fault diagnosis procedure. Therefore, to save memory space and lighten the calculation load, the gray image is used instead of the waterfall image in the following procedures.
Fig. 5 illustrates the STFT distributions of the vibrations obtained in the different conditions. All images share the same size of 630 × 330 pixels. It can be seen that all images are sparse, because only a few energy concentration locations can be noticed on the time-frequency plane; therefore, the sparse representation of these images is very important in the following analysis. Moreover, the images are distinguishable from each other, since the locations and energy concentrations differ. It is therefore very suitable to use the time-frequency distribution as the input feature for fault classification.

5.2. Preprocessing

It is worth noting that all time-frequency images share the same size of 630 × 330 pixels. Apparently, the original images are so large that their direct utilization would heavily burden any fault diagnosis method, so dimension reduction plays an important role in the fault diagnosis. In order to validate the efficacy of the proposed method, the thirteen kinds of time-frequency images listed in Table 2 are used as input features. Furthermore, to save computing time, all images are downsampled to a size of 200 × 100 pixels. Finally, each image is transformed into a vector of size 20,000 × 1 for the classification or dimension reduction. In total, 322 gray images are used in the experiments. Four fifths of the images are randomly selected as the training set, and the remaining one fifth are used as the test set.

5.3. Results and analysis

In this section, the proposed procedure is applied to the fault diagnosis of the diesel engine. In order to validate its efficacy, the KNN method alone, the combination of ONMF and KNN (denoted ONMF + KNN) and the combination of PCA and KNN (denoted PCA + KNN) are also applied.

5.3.1. Accuracy and test time

Fig. 6 illustrates the accuracies of the different methods obtained with the different time-frequency representations when the reduced dimension of each class equals 2 and $\alpha$ equals 0.08. The number of nearest neighbors is one.
Fig. 5. STFT distributions of the vibrations obtained in the different conditions. Sub-images (a–h) correspond in order to conditions No. 1 to No. 8 in Table 1. As in Fig. 4a, the horizontal and vertical coordinates are crank angle and frequency respectively.
Table 2. Time-frequency distributions used in the test.

| No. | Time-frequency analysis method | Abbreviation |
| 1 | S transform | ST |
| 2 | Radius Gauss kernel based distribution | RGKBD |
| 3 | Improved radius Gauss kernel based distribution | IRGKBD |
| 4 | Short-time Fourier transform | STFT |
| 5 | Pseudo Wigner-Ville distribution | PWVD |
| 6 | Wigner-Ville distribution | WVD |
| 7 | Adaptive optimal kernel based distribution | AOKBD |
| 8 | Choi-Williams distribution | CWD |
| 9 | Smoothed pseudo Wigner-Ville distribution | SPWVD |
| 10 | Radius cosine kernel based distribution | RCKBD |
| 11 | Conical kernel distribution | CKD |
| 12 | Rearranged smoothed pseudo Wigner-Ville distribution | RSPWVD |
| 13 | Ambiguity function | AF |
Fig. 6. Accuracy of the different methods on the different time-frequency representations. [Bar chart: accuracy of KNN, DNMF + KNN, ONMF + KNN and PCA + KNN for time-frequency distributions No. 1–13 of Table 2.]
Fig. 7. Diagnosis times of the different methods applied on the test set. [Bar chart: consuming time (s) of KNN, DNMF + KNN, ONMF + KNN and PCA + KNN for time-frequency distributions No. 1–13 of Table 2.]
For the PCA, a total of 16 principal components are selected in order to make the reduced dimension equal to that of the NMF methods. The horizontal axis denotes the time-frequency distribution numbers corresponding to those in Table 2. The average accuracies over these thirteen time-frequency distributions obtained by KNN, DNMF + KNN, ONMF + KNN and PCA + KNN are 92.6197, 93.3815, 83.1275 and 87.2579 respectively. It can be seen that the proposed method reaches an accuracy similar to that of the KNN method using the original image as the input feature. In most cases, the accuracies of the DNMF + KNN method are even better than those of the KNN method, and its average accuracy is the highest. The accuracies obtained by the ONMF + KNN and PCA + KNN methods are much worse than those obtained by the other two methods.

Fig. 7 illustrates the diagnosis times of the different methods applied on the test sets. It can be seen that the KNN method consumes the longest time in the classification of the test set, because the original images, with very high dimension, are used as the input vectors. Comparatively, the other three methods (DNMF + KNN, ONMF + KNN and PCA + KNN) consume much less time in the diagnosis procedure, because the dimension of the input features is greatly reduced. This is reasonable, since the KNN method operates on vectors of dimension 20,000 × 1, while the classification in the other methods is performed on vectors of dimension 16 × 1.

5.3.2. Basis matrices

Figs. 8 and 9 illustrate the basis matrices obtained by DNMF and ONMF applied on the STFT representation when the reduced dimension of each class equals 2. Therein, a1 and b1 denote the basis matrices of condition No. 1, c1 and d1 those of condition No. 2, a2 and b2 those of condition No. 3, c2 and d2 those of condition No. 4, and so on. Fig. 10 illustrates the first 16 principal components obtained by PCA. Compared with Fig. 5, it can be seen that the basis matrices obtained by DNMF maintain the primary characteristics of the different classes, and the basis matrices of different classes differ from each other. For each class, the two basis matrices obtained by DNMF are very similar. Comparatively, the basis matrices obtained by ONMF are very abstract, because little difference can be noticed from
167
Y.s. Yang et al. / Mechanical Systems and Signal Processing 95 (2017) 158–171
1
2
3
4 11
a
10 9 8
b 7 6 5
c 4 3 2
d
1
Fig. 8. Basis matrices obtained by DNMF.
1
2
3
4 12
a 10
8
b
6
c
4
2
d Fig. 9. Basis matrices obtained by ONMF.
1
2
3
4 150
a 100
b 50
c
0
d
50
Fig. 10. Principle components obtained by PCA.
168
Y.s. Yang et al. / Mechanical Systems and Signal Processing 95 (2017) 158–171
the image of basis matrices. In Fig. 10, the features of original images are reserved in the first several principle components more efficiently than those in the basis matrices obtained by ONMF. However, the differences between principle components are not very apparent compared with the basis matrices obtained by DNMF. Obviously, the discriminative capability of the basis matrices obtained by DNMF is more powerful than that obtained by ONMF and principle components obtained by PCA. However, it is worth noting that the ‘‘noise” in the basis image is originated from the coding style since all images are transformed to be vectors to start the dimension reduction and the neighbor relationship in the image has been broken. 5.3.3. Training time Fig. 11 illustrates the times consumed by dimension reduction process of DNMF, NMF and PCA on the training set. It can be seen that the consuming times of DNMF for each representation are different and are longest. Comparatively, the learning times of ONMF and PCA for each representation are nearly equivalent respectively and those of PCA are the shortest. Such phenomenon is corresponding to the mechanism of these three dimension reduction methods. By considering the discriminative information of classes, DNMF may consume different time for the reason that the timefrequency distributions of different representations are different. Moreover, both DNMF and ONMF need iteration calculation and PCA can calculate the reduced feature directly. In this instance, it is reasonable that learning times of both DNMF and ONMF are longer than that of PCA. Finally, it is worth noting the diagnosis time of DNMF + KNN, ONMF + KNN and PCA + KNN methods are the summation of the time consumed by the dimension reduction (shown in Fig. 11) and the time consumed by the classification of KNN classifier (shown in Fig. 10). 
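The reduction-plus-KNN pipeline compared above can be sketched in a few lines of numpy. The sketch below uses plain (unsupervised) NMF with Lee-Seung multiplicative updates and a majority-vote KNN on the coefficient columns; the synthetic two-class ''images'', the rank r = 4, and all function names are illustrative assumptions, not the authors' data or code.

```python
import numpy as np

rng = np.random.default_rng(0)

def nmf(V, r, n_iter=200, eps=1e-9):
    """Plain NMF via Lee-Seung multiplicative updates: V (m x n) ~ W (m x r) @ H (r x n)."""
    W = rng.random((V.shape[0], r)) + eps
    H = rng.random((r, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def encode(V, W, n_iter=200, eps=1e-9):
    """Project new samples (columns of V) onto a fixed basis W by updating H only."""
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

def knn_predict(Htr, ytr, Hte, k=3):
    """Majority-vote k-nearest-neighbour classification on coefficient columns."""
    preds = []
    for j in range(Hte.shape[1]):
        d = np.linalg.norm(Htr - Hte[:, [j]], axis=0)
        nn = np.argsort(d)[:k]
        vals, counts = np.unique(ytr[nn], return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return np.array(preds)

# Synthetic stand-in for flattened time-frequency images: two classes with
# different nonnegative templates plus noise (clipped to stay nonnegative).
m = 100
t1, t2 = np.zeros(m), np.zeros(m)
t1[10:30], t2[60:80] = 1.0, 1.0

def sample(t, n):
    return np.clip(t[:, None] + 0.1 * rng.standard_normal((m, n)), 0.0, None)

Vtr = np.hstack([sample(t1, 20), sample(t2, 20)])
ytr = np.array([0] * 20 + [1] * 20)
Vte = np.hstack([sample(t1, 10), sample(t2, 10)])
yte = np.array([0] * 10 + [1] * 10)

W, Htr = nmf(Vtr, r=4)   # learn basis and training coefficients
Hte = encode(Vte, W)     # coefficients of unseen samples under the fixed basis
acc = float(np.mean(knn_predict(Htr, ytr, Hte) == yte))
print("test accuracy:", acc)
```

Classifying the 4 × 1 coefficient columns instead of the 100 × 1 raw vectors mirrors the timing argument above: the KNN distance computations shrink in proportion to the reduced dimension.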
Fig. 11. Time consumed by the learning of basis matrices on the training set.

In this sense, the overall time of the plain KNN method is obviously the shortest, because no dimension-reduction process is involved when the original images are used. However, it is worth noting that the significance of the dimension reduction is to save memory and computational load, and the dimension-reduction process can be performed offline. Therefore, the diagnosis time need not be seriously affected by the training time consumed by the learning of the basis matrices.

5.3.4. Influence of the reduced dimension
In order to observe the influence of the reduced dimension on the results, the procedure was run with different reduced dimensions, and the results are illustrated in Figs. 12 and 13. Without loss of generality, only four time-frequency distributions, AOK, STFT, WVD and CKBD, are considered in this section. Fig. 12 illustrates the convergence behavior at different reduced dimensions. All calculation procedures with different input distributions share a similar convergence trend when the reduced dimension is the same, and the smaller the reduced dimension, the faster the calculation converges. Fig. 13 illustrates the influence of the reduced dimension on the accuracy. When the reduced dimension is smaller than 12, the accuracy fluctuates with the variation of the reduced dimension; however, increasing the reduced dimension does not improve the classification accuracy. In this instance, a smaller reduced dimension may be preferable for the classification, because an acceptable accuracy can be obtained with a faster calculation.

Fig. 12. Convergence of different time-frequency distributions with different reduced dimensions (R = 2 to 16): (a) AOK, (b) STFT, (c) WVD, (d) CKBD.

Fig. 13. Influence of the reduced dimension on accuracy.

5.3.5. Influence of the weight a
The influence of the weight a on the calculation procedure is investigated in this section, and the results are illustrated in Figs. 14 and 15. As in the previous section, only the four time-frequency distributions AOK, STFT, WVD and CKBD are considered. Fig. 14 illustrates the convergence behavior of the DNMF with different values of the weight a. The convergence curves of the different distributions share the same trend when the weight a is the same: the bigger the weight a, the faster the training procedure converges and the bigger the reconstruction error of the DNMF becomes. However, as shown in Fig. 15, the accuracies of the classifiers using features extracted from the AOK, WVD and CKBD distributions fluctuate only slightly with increasing weight a, while that of STFT decreases as the weight a increases. This indicates that a suitable value of the weight a may balance the convergence speed, the reconstruction error and the classification accuracy. Nevertheless, the value of the weight a can be selected from a relatively large range for most time-frequency distributions, because the classifications remain accurate and fluctuate within an acceptable range.
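The convergence curves discussed above (Figs. 12 and 14) can be reproduced in spirit by logging the Frobenius reconstruction error after each multiplicative update. The following numpy sketch is not the authors' code: plain NMF stands in for DNMF, and the matrix sizes and ranks are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def nmf_error_trace(V, r, n_iter=150, eps=1e-9):
    """Run Lee-Seung multiplicative updates and record ||V - WH||_F per iteration."""
    W = rng.random((V.shape[0], r)) + eps
    H = rng.random((r, V.shape[1])) + eps
    errs = []
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errs.append(float(np.linalg.norm(V - W @ H)))
    return errs

V = rng.random((60, 40))  # stand-in for a batch of flattened time-frequency images
traces = {r: nmf_error_trace(V, r) for r in (2, 8, 16)}
for r, errs in traces.items():
    print(f"R={r}: first error {errs[0]:.3f}, final error {errs[-1]:.3f}")
```

Plotting each trace against the iteration index gives curves analogous to Fig. 12, one per reduced dimension R; the multiplicative updates make each trace non-increasing, and a larger rank ends at a lower reconstruction error.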
Fig. 14. Convergence of different time-frequency distributions with different weights a: (a) AOK, (b) STFT, (c) WVD, (d) CKBD.
Fig. 15. Influence of the weight a on accuracy.
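The update rules of the DNMF are not restated in this section, so purely as an illustration of what the weight a controls, the following numpy sketch adds a within-class-scatter penalty (weighted by a) to the Frobenius NMF objective and minimizes it by projected gradient descent. The penalty form, the step size lr, and all names are assumptions made for illustration, not the authors' exact formulation.

```python
import numpy as np

def dnmf_sketch(V, y, r, a=0.0, n_iter=400, lr=1e-3, eps=1e-9, seed=0):
    """Illustrative discriminative NMF:
       minimize ||V - WH||_F^2 + a * sum_c sum_{j in c} ||h_j - mean_c||^2
       over nonnegative W, H, by projected gradient descent."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], r))
    H = rng.random((r, V.shape[1]))
    for _ in range(n_iter):
        # M holds, in each column j, the mean coefficient vector of sample j's class.
        M = np.empty_like(H)
        for c in np.unique(y):
            M[:, y == c] = H[:, y == c].mean(axis=1, keepdims=True)
        R = W @ H - V
        H = np.clip(H - lr * (2 * W.T @ R + 2 * a * (H - M)), eps, None)
        W = np.clip(W - lr * (2 * (W @ H - V) @ H.T), eps, None)
    return W, H

def within_class_scatter(H, y):
    """Sum of squared deviations of the coefficient columns from their class means."""
    return float(sum(((H[:, y == c] - H[:, y == c].mean(axis=1, keepdims=True)) ** 2).sum()
                     for c in np.unique(y)))

# Two-class toy data: noisy nonnegative templates, flattened as columns.
rng = np.random.default_rng(3)
m = 50
t1, t2 = np.zeros(m), np.zeros(m)
t1[5:20], t2[30:45] = 1.0, 1.0
V = np.clip(np.hstack([t1[:, None] + 0.3 * rng.standard_normal((m, 15)),
                       t2[:, None] + 0.3 * rng.standard_normal((m, 15))]), 0.0, None)
y = np.array([0] * 15 + [1] * 15)

_, H0 = dnmf_sketch(V, y, r=4, a=0.0)  # plain NMF-style fit
_, H1 = dnmf_sketch(V, y, r=4, a=3.0)  # same init, discriminative penalty on
print("scatter a=0:", within_class_scatter(H0, y))
print("scatter a=3:", within_class_scatter(H1, y))
```

Increasing a pulls the coefficient columns of each class toward their class mean, which is the qualitative trade-off seen in Figs. 14 and 15: more compact, more discriminative codes at the price of a larger reconstruction error.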
6. Conclusions
In order to deal with the heavy calculation burden in the fault diagnosis of diesel engines using time-frequency images, a novel diagnosis method was proposed by combining the DNMF and the KNN classifier. Firstly, by introducing class information into the NMF, the DNMF was developed so that the discriminative property is embedded in both the basis matrix and the coefficient matrix. Then the coefficient matrix is input to a KNN classifier to perform the classification. Finally, experiments were used to validate the efficacy of the proposed method in comparison with the KNN, ONMF + KNN and PCA + KNN methods. It is shown that the proposed fault diagnosis method can efficiently classify the fault classes by using the coefficient matrix of the DNMF as the input vector of the KNN classifier. Moreover, compared with the ONMF and PCA, the DNMF represents the class information more efficiently, because the class characters of the basis matrix obtained by the DNMF are more visible than those of the basis matrix obtained by the ONMF and of the principal components obtained by PCA. When the weight a is fixed, the error of the DNMF and the classification accuracy do not seem to be seriously influenced by the reduced dimension, although the smaller the reduced dimension, the faster the DNMF converges. Conversely, when the reduced dimension is fixed, the classification accuracy is hardly affected by the weight a; the bigger the weight a, the faster the DNMF converges and the bigger the error becomes. In summary, a relatively small reduced dimension and weight a can be used to balance the consumed time and the classification accuracy.

Acknowledgments
The research work described in this paper was supported by the National Natural Science Foundation of China under Grant No. 51505486 and the Shaanxi Natural Science Foundation under Grant No. 2016JQ5009.