Discriminative non-negative matrix factorization (DNMF) and its application to the fault diagnosis of diesel engine


Mechanical Systems and Signal Processing 95 (2017) 158–171


Yang Yong-sheng a,b, Ming An-bo c,*, Zhang You-yun a, Zhu Yong-sheng a

a Department of Mechanical Engineering, Xi'an Jiaotong University, Xi'an 710049, China
b Department of Computer Engineering, Shaanxi Academy of Governance, Xi'an 710068, China
c High-Tech Research Institute of Xi'an, Xi'an 710025, China

Article history: Received 11 October 2016; Received in revised form 9 January 2017; Accepted 17 March 2017; Available online 1 April 2017

Keywords: Discriminative non-negative matrix factorization (DNMF); K-nearest neighborhood method; Time-frequency distribution; Diesel engine; Fault diagnosis

Abstract: Diesel engines are widely used in engineering, are critical to the operation of much equipment, and their fault diagnosis has attracted much attention. Over the past several decades, image-based fault diagnosis methods have provided efficient ways to diagnose diesel engine faults. By introducing class information into the traditional non-negative matrix factorization (NMF), an improved NMF algorithm named discriminative NMF (DNMF) was developed, and a novel image-based fault diagnosis method was proposed by combining the DNMF with the KNN classifier. Experiments on the fault diagnosis of a diesel engine were used to validate the efficacy of the proposed method. It is shown that the fault conditions of the diesel engine can be efficiently classified by the proposed method using the coefficient matrix obtained by DNMF. Compared with the original NMF (ONMF) and principal component analysis (PCA), the DNMF represents the class information more efficiently because the class characteristics of the basis matrices obtained by the DNMF are more visible than those in the basis matrices obtained by the ONMF and PCA. © 2017 Elsevier Ltd. All rights reserved.

* Corresponding author. E-mail address: [email protected] (A.-b. Ming).
http://dx.doi.org/10.1016/j.ymssp.2017.03.026

1. Introduction

Diesel engines are widely used power sources in engineering, and their running conditions are closely related to the performance of the apparatus they drive. However, owing to their complex structure and harsh working environment, diesel engines are prone to break down, and the corresponding fault diagnosis technologies have attracted much attention over the past several decades. Many methods have been developed to diagnose the condition of diesel engines [1-6]. Among them, vibration-based methods have proved effective because the acquisition is convenient and non-intrusive, and a variety of information, including the reciprocating motions, rotating motions, mechanical impacts and high-speed flow of gas, is contained in the vibration signal. Generally, traditional vibration analysis methods are performed in either the time domain or the frequency domain [7,8]. With the development of signal processing, non-stationary and non-linear characteristics have attracted more and more attention, and many time-frequency analysis methods have been introduced to the fault diagnosis of diesel engines [5]. Several image features, such as moment invariants, gray statistical characteristics, textural features, and the differential box-counting fractal dimension, are usually selected to describe the characteristics of the time-frequency image and used as fault features to diagnose diesel engine faults [9-11]. However, similar to the statistical indices in the time domain,


these characteristics are abstract and inconvenient for intuitive understanding in fault diagnosis. Comparatively, the time-frequency image, a two-dimensional representation, is very convenient for human intuitive understanding and contains much more information than a statistical index. However, the high-dimensional representation of an image requires much calculation and a large memory space, which hinders the direct application of image-based methods to fault diagnosis. Fortunately, the development of dimension reduction technology has provided efficient ways to apply fault diagnosis methods to high-dimensional indices. The most popular dimension reduction methods include principal component analysis (PCA) [12-15], linear discriminant analysis (LDA) [16,17], independent component analysis (ICA) [18-21], manifold learning [22-25] and non-negative matrix factorization (NMF) [26,27], etc. These methods provide compact low-dimensional representations of the original data in different ways for further processing such as learning and visualization.

PCA computes a set of orthogonal bases that have a statistical interpretation: the leading bases correspond to the directions of largest variance. In 2009, He et al. [15] introduced the PCA technique to extract principal component representations from the time- and frequency-domain statistical features of measured signals and used a mean correlation rule to select the most representative principal components to classify machine fault patterns; experimental results from an internal-combustion engine sound analysis and an automobile gearbox vibration analysis validated the efficacy of the method. LDA searches for those vectors in the underlying space that best discriminate among classes rather than those that best describe the data. In 2016, Mbo'o and Hameyer [17] developed a diagnostic system based on current features evaluated by means of linear discriminant analysis of selected frequencies in the stator current spectrum; two types of bearing damage at different loads were distinguished from the healthy bearing. ICA seeks a basis under the assumption that the latent vectors are mutually independent. In 2012, He et al. [19] proposed a new feature extraction method based on ICA and the reconstructed phase space for machine fault diagnosis; experiments on an automobile transmission gearbox validated its effectiveness. Being able to identify low-dimensional non-linear structure hidden in high-dimensional data, manifold learning has also emerged in machinery fault diagnosis. In 2013, He [24] proposed a novel time-frequency manifold (TFM) technique for machine fault signature analysis and health diagnosis by combining the non-stationary and non-linear properties of the time-frequency distribution (TFD) of vibration; an efficient low-dimensional feature, called the TFM feature, was developed, and its effectiveness and merits were confirmed in gear wear diagnosis, bearing defect identification and defect severity evaluation.

However, none of these methods has an obvious visual interpretation, because their bases and coefficients are allowed to have arbitrary sign. Comparatively, by allowing only additive, not subtractive, combinations of the original data, NMF leads to a parts-based representation through its non-negativity constraints and is widely used in applications such as face recognition, clustering, music enhancement, sparse coding, graph matching and biological data mining [27]. However, the original NMF (ONMF) is an unsupervised learning method, and the coefficients derived by NMF do not perform well in supervised tasks where the data are provided with class labels. In 2011, Cai et al. [28] introduced the Laplacian of the neighborhood graph as a regularization term to preserve the locality of the data (graph-regularized NMF, GNMF). In 2013, Liu et al. [29] proposed a semi-supervised NMF approach, namely constrained NMF (CNMF), by introducing the label information into the main objective function of NMF. In the same year, Li et al. [30] proposed a semi-supervised NMF method using the techniques of GNMF and CNMF, improving the locality preservation and the discrimination of the data representation, respectively. However, in these semi-supervised NMF variants the reduction depends on the number of available labeled data points, and points that share the same label are represented as a single point. In 2016, Babaee et al. [31] argued that data points belonging to the same class should be very close or aligned on the same axis in the new representation, but not merged into a single point, and developed a new label-constrained NMF by coupling a discriminative regularizer to the main objective function of NMF.

Generally, all these variants of NMF have mainly been applied to face recognition. Mechanical fault diagnosis differs from face recognition because the research objects differ greatly, and few NMF methods have been reported to work well in mechanical fault diagnosis. In this paper, an improved NMF, called discriminative non-negative matrix factorization (DNMF), is investigated for the application of diesel engine fault diagnosis. Using the time-frequency image as the original input feature, the proposed method is an image-based fault diagnosis method. As a key point, the DNMF plays an important role in the procedure owing to its dimension reduction effect. Feeding the coefficients of DNMF to the K-nearest neighbors (KNN) classifier, the proposed fault diagnosis method combines the powerful discriminative capability of the DNMF with the simplicity of the KNN method.
The efficiency of the proposed method is validated by its application to the fault diagnosis of a diesel engine.

2. Discriminative non-negative matrix factorization

2.1. Non-negative matrix factorization

Consider a non-negative data matrix $V^{n\times m} = \{v_1, v_2, \ldots, v_m\} \in \mathbb{R}_+^{n\times m}$, where $v_i$ denotes a feature vector of dimension $n$ and $m$ denotes the number of samples. Given a reduced dimension $r$, the matrix $V^{n\times m}$ may be approximated by a product of two non-negative matrices $W \in \mathbb{R}_+^{n\times r}$ and $H \in \mathbb{R}_+^{r\times m}$, namely

$$V^{n\times m} \approx W^{n\times r} H^{r\times m} \quad (1)$$


This equation is called the NMF of matrix $V^{n\times m}$. $W^{n\times r}$ can be considered a set of basis vectors and is called the basis matrix; $H^{r\times m}$ can be considered the coordinates of each sample with respect to these basis vectors and is called the coefficient matrix. Obviously, when $r(m+n) < mn$, the new representation of matrix $V^{n\times m}$ needs less memory space than the original one. From Eq. (1), it can be seen that the essence of NMF is the calculation of the matrices $W^{n\times r}$ and $H^{r\times m}$. In Ref. [27], Lee and Seung designed an iterative algorithm based on maximum likelihood estimation under the hypothesis that all vectors of $V^{n\times m}$ are statistically independent and disturbed by Gaussian noise. In this instance, the objective function of NMF is the square of the Frobenius norm of the matrix difference, namely

$$\min_{W,H} \|V - WH\|_F^2 = \sum_{ij}\left[v_{ij} - (WH)_{ij}\right]^2, \quad \text{s.t. } W, H \geq 0 \quad (2)$$

Later, in order to increase the approximation accuracy and account for Poisson noise, the objective function was constructed from the Kullback-Leibler divergence, namely

$$\min_{W,H} D(V\|WH) = \sum_{ij}\left[v_{ij}\log\frac{v_{ij}}{(WH)_{ij}} - v_{ij} + (WH)_{ij}\right], \quad \text{s.t. } W, H \geq 0 \quad (3)$$

It is well known that both objective functions are non-convex when the matrices $W$ and $H$ vary together, although the minimization problems are convex with respect to $W$ and $H$ separately. Therefore, there are many local minima. In Ref. [27], Lee and Seung presented multiplicative update algorithms for these two minimization problems and proved their convergence. The update rules for the Frobenius norm are given by

$$w_{ik} \leftarrow w_{ik}\,\frac{(VH^T)_{ik}}{(WHH^T)_{ik}}, \qquad h_{kj} \leftarrow h_{kj}\,\frac{(W^TV)_{kj}}{(W^TWH)_{kj}} \quad (4)$$

while the update rules for the Kullback-Leibler divergence are determined by

$$w_{ik} \leftarrow w_{ik}\,\frac{\sum_j h_{kj}\, v_{ij}/(WH)_{ij}}{\sum_u h_{ku}}, \qquad h_{kj} \leftarrow h_{kj}\,\frac{\sum_i w_{ik}\, v_{ij}/(WH)_{ij}}{\sum_v w_{vk}} \quad (5)$$
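As a concrete illustration, the Frobenius-norm updates of Eq. (4) can be sketched in a few lines of NumPy. This is a minimal sketch: the function name `nmf_frobenius` and the small `eps` guard against division by zero are our own additions, not part of the original algorithm description.

```python
import numpy as np

def nmf_frobenius(V, r, n_iter=200, seed=0, eps=1e-9):
    """Multiplicative-update NMF minimising ||V - WH||_F^2 (Lee & Seung).

    V : (n, m) non-negative data matrix; r : reduced dimension.
    Returns non-negative W (n, r) and H (r, m).
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    for _ in range(n_iter):
        # Eq. (4): element-wise multiplicative updates
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Because each step multiplies a non-negative matrix element-wise by a non-negative ratio, the non-negativity of $W$ and $H$ is preserved automatically, which is the essence of Lee and Seung's scheme.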

2.2. Discriminative non-negative matrix factorization

From the previous section, it can be seen that no class label information is explicitly considered in the ONMF, so the coefficient matrix may not perform well in supervised tasks where the data are provided with class labels. Hence, it is desirable to develop a non-negative solution for matrix factorization in the supervised setting. In this instance, the DNMF method is investigated by involving the class label information in the objective function.

2.2.1. Acquisition of the basis matrix

In order to achieve discriminative power, the data are factorized class by class and the basis matrices of the different classes are obtained first. Furthermore, to separate the basis matrices from each other as much as possible, the orthogonality of different basis matrices is encouraged in the objective function by minimizing the cosine between basis vectors of different classes. Mathematically, given a non-negative data matrix $V = [V^{(1)}, V^{(2)}, \ldots, V^{(L)}] \in \mathbb{R}_+^{N\times M}$ concatenated from $L$ classes of non-negative data, $N$ denotes the dimension of the data and $M$ denotes the total number of data over the $L$ classes. For each class of data, let $R$ denote the reduced dimension with $R < N$; then the objective function of DNMF is given by

$$\min_{W^{(l)},H^{(l)}} \sum_{l=1}^{L}\left[\frac{1}{2}\left\|V^{(l)} - W^{(l)}H^{(l)}\right\|_F^2 + \alpha\sum_{j\neq l}\sum_{r_1=1}^{R}\sum_{r_2=1}^{R}\cos\!\left(W^{(j)}(:,r_1),\, W^{(l)}(:,r_2)\right)\right], \quad \text{s.t. } W^{(l)}, H^{(l)} \geq 0 \quad (6)$$

where $V^{(l)} \in \mathbb{R}_+^{N\times M_l}$ denotes the non-negative matrix concatenated from the data of the $l$th class, $W^{(l)} \in \mathbb{R}_+^{N\times R}$ denotes the discriminative basis matrix corresponding to the $l$th class, $H^{(l)} \in \mathbb{R}_+^{R\times M_l}$ denotes the coefficient matrix of the $l$th class, $(:,r)$ denotes the $r$th column of a matrix, and $\alpha > 0$ denotes the regularization parameter. $\cos(x, y)$ is the cosine of the included angle between the vectors $x$ and $y$. Obviously, the smaller the included angle between $x$ and $y$, the bigger the cosine value; when the two (non-negative) vectors are orthogonal, the cosine reaches its minimum of zero. That is, minimizing the cosine drives the basis vectors towards orthogonality and enhances the discriminative power. Compared with Eq. (1), the class label information is involved in Eq. (6) under the non-negative constraints: the basis matrices are obtained from the data of the different classes, and their discriminative capability is enhanced by minimizing the cosine between basis vectors corresponding to different classes.
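To make the role of the penalty term concrete, the objective of Eq. (6) can be evaluated directly for a candidate factorization. The following is a minimal sketch, not an optimizer; the function name `dnmf_objective` is our own, and non-zero basis columns are assumed.

```python
import numpy as np

def dnmf_objective(V_list, W_list, H_list, alpha):
    """Eq. (6): per-class reconstruction error plus the pairwise cosine
    penalty between basis columns of different classes."""
    recon = sum(0.5 * np.linalg.norm(V - W @ H, 'fro') ** 2
                for V, W, H in zip(V_list, W_list, H_list))
    penalty = 0.0
    L = len(W_list)
    for l in range(L):
        for j in range(L):
            if j == l:
                continue
            for u in W_list[j].T:        # basis columns of class j
                for v in W_list[l].T:    # basis columns of class l
                    penalty += (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return recon + alpha * penalty
```

For two single-column bases, the penalty contributes zero when the columns are orthogonal and its maximum when they coincide, so an optimizer driven by Eq. (6) is pushed towards class-wise orthogonal bases.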


2.2.2. Acquisition of the coefficient matrix

Generally speaking, the basis matrix of NMF is equivalent to a sub-dictionary in dictionary learning. Therefore, the basis matrix over all classes can be obtained by concatenating the basis matrices of the individual classes, namely

$$W = \left[W^{(1)}, W^{(2)}, \ldots, W^{(L)}\right] \quad (7)$$

In this instance, the final basis matrix involves the information of all classes. The coefficient of each sample, including the test samples, is calculated with respect to the final basis matrix by the non-negative least squares (NNLS) method, which is popular in NMF and dictionary learning. The objective function of the NNLS method is given by

$$\min_{x} \|y - Wx\|_2^2, \quad \text{s.t. } x \geq 0 \quad (8)$$

where $y$ is the vector representation of a sample and $x$ is the coefficient of the sample on the final basis matrix. Since the class label information is involved in the basis matrix, the coefficient matrix, i.e. the projections of the original samples on the basis matrix, can be used for classification. Moreover, because the basis matrix tends to be orthogonal, the coefficient matrix is inclined to be orthogonal too.

2.2.3. Update of the basis and coefficient matrices

Similar to the ONMF, it is very difficult to obtain the global optimum of Eq. (6). Naturally, the general multiplicative update rules used in the ONMF can be extended to find a local optimum here by alternately updating the basis matrix $W$ and the coefficient matrix $H$ [32], namely

$$w_{ij} \leftarrow w_{ij}\left(\frac{\left[\partial J(W,H)/\partial w_{ij}\right]_{-}}{\left[\partial J(W,H)/\partial w_{ij}\right]_{+}}\right)^{\!\omega}, \qquad h_{ij} \leftarrow h_{ij}\left(\frac{\left[\partial J(W,H)/\partial h_{ij}\right]_{-}}{\left[\partial J(W,H)/\partial h_{ij}\right]_{+}}\right)^{\!\omega} \quad (9)$$

where $J(W,H)$ denotes the cost function with the variables of basis matrix $W$ and coefficient matrix $H$, and $\left[\partial J(W,H)/\partial w_{ij}\right]_{\pm}$ denote the positive and negative terms of the partial derivative with respect to the elements of $W$ (and similarly for $H$). $\omega$ denotes an over-relaxation parameter, usually located in the range $[0.5, 2]$; in this paper, $\omega = 1$. For the ONMF, when the objective function is the square of the Frobenius norm of the matrix difference, the cost function can be denoted by

$$J(W,H) = \frac{1}{2}\|V - WH\|_F^2 \quad (10)$$

Then the derivatives of $J(W,H)$ with respect to the basis matrix $W$ and the coefficient matrix $H$ are given by

$$\frac{\partial J(W,H)}{\partial w_{ij}} = \left((WH - V)H^T\right)_{ij}, \qquad \frac{\partial J(W,H)}{\partial h_{ij}} = \left(W^T(WH - V)\right)_{ij} \quad (11)$$
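The gradients of Eq. (11) are easy to verify numerically with a forward finite difference; a sanity check of this kind (the matrix sizes and variable names below are purely illustrative) is useful before wiring the derivatives into the update rule of Eq. (9).

```python
import numpy as np

# Random test matrices (illustrative sizes)
rng = np.random.default_rng(0)
V = rng.random((5, 6))
W = rng.random((5, 3))
H = rng.random((3, 6))

def J(W, H):
    """Cost function of Eq. (10)."""
    return 0.5 * np.linalg.norm(V - W @ H, 'fro') ** 2

# Analytic gradients from Eq. (11)
gW = (W @ H - V) @ H.T
gH = W.T @ (W @ H - V)

# Forward finite differences on one entry of each matrix
h = 1e-6
Wp = W.copy(); Wp[1, 2] += h
Hp = H.copy(); Hp[2, 3] += h
num_w = (J(Wp, H) - J(W, H)) / h
num_h = (J(W, Hp) - J(W, H)) / h
```

The finite-difference quotients should agree with the corresponding entries of the analytic gradients to within the truncation error of the forward difference.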

In this instance, the update rules of the ONMF can be obtained by substituting the positive and negative terms of the above derivatives into Eq. (9).

(1) Fix $W$ to optimize $H$. The update of the coefficient matrix $H$ follows from the multiplicative update rules and the cost function:

$$H^{(l)} \leftarrow H^{(l)} \odot \left(W^{(l)T}V^{(l)} \oslash W^{(l)T}W^{(l)}H^{(l)}\right) \quad (12)$$

where $\odot$ denotes the Hadamard (element-wise) product and $\oslash$ denotes element-wise division. The update of the coefficient matrix is the same as in the ONMF, because the second term of the objective function depends only on the basis matrices.

(2) Fix $H$ to optimize $W$. Similar to the optimization of the coefficient matrix $H$, the calculation of the basis matrix $W$ also follows the multiplicative update rules. Referring to Eqs. (6) and (9), the update of the basis matrix is given by

$$W^{(l)}(:,r) \leftarrow W^{(l)}(:,r) \odot \left(\left[\nabla W^{(l)}(:,r)\right]_{-} \oslash \left[\nabla W^{(l)}(:,r)\right]_{+}\right) \quad (13)$$

where

$$\left[\nabla W^{(l)}(:,r)\right]_{-} \oslash \left[\nabla W^{(l)}(:,r)\right]_{+} = \left(V^{(l)}H^{(l)}(r,:)^{T} + \alpha\Big(\sum_{j\neq l}\sum_{i=1}^{R}W^{(j)}(:,i)^{T}W^{(l)}(:,r)\Big)W^{(l)}(:,r)\right) \oslash \left(W^{(l)}H^{(l)}H^{(l)}(r,:)^{T} + \alpha\sum_{j\neq l}\sum_{i=1}^{R}W^{(j)}(:,i)\right) \quad (14)$$

In this instance, both the basis matrix and the coefficient matrix can be calculated iteratively. The procedure of calculation can be summarized as follows.

162

Y.-s. Yang et al. / Mechanical Systems and Signal Processing 95 (2017) 158–171

Given the non-negative data samples $V = [V^{(1)}, V^{(2)}, \ldots, V^{(L)}] \in \mathbb{R}_+^{N\times M}$ composed of $L$ classes, set the reduced dimension $R$ of each class and the weight $\alpha$.

STEP 1: Initialize the basis matrices $W^{(l)}$ and coefficient matrices $H^{(l)}$.
STEP 2: Normalize the rows of $H^{(l)}$, scaling the columns of $W^{(l)}$ accordingly. Then update $H^{(l)}$ according to Eq. (12).
STEP 3: Normalize the columns of $W^{(l)}$, scaling the rows of $H^{(l)}$ accordingly. Then update $W^{(l)}$ according to Eq. (13).
STEP 4: If the calculation reaches convergence or the maximum number of iterations, terminate and go to STEP 5; otherwise, go back to STEP 2.
STEP 5: Output the final basis matrix by concatenating the sub-basis matrices of all classes, namely $W = [W^{(1)}, W^{(2)}, \ldots, W^{(L)}]$, and normalize the final basis matrix $W$.
STEP 6: Calculate the final coefficient matrix $H$ by Eq. (8).

Finally, both the final basis matrix and the coefficient matrix are obtained by the iterative calculation. Both the basis matrix and the coefficient matrix are discriminative, since the class information is involved in the dimension reduction.

3. K nearest neighbors

K-nearest neighbors (KNN) classifies unknown samples by the voting result of several neighboring samples [33]. Mathematically, given a sample set $S_N = \{(x_1, \omega_1), (x_2, \omega_2), \ldots, (x_N, \omega_N)\}$, where $x_i$ denotes the feature vector of the $i$th sample and $\omega_i$ its corresponding label, suppose that all samples belong to $c$ classes, namely $\omega_i \in \{1, 2, \ldots, c\}$. The distance between two samples is given by $d(x_i, x_j) = \|x_i - x_j\|$. For a new sample $x$, find its $k$ nearest neighbors among the $N$ training samples. If $k_i$ of these neighbors belong to class $i$, the corresponding judging function is given by

$$g_i(x) = k_i \quad (i = 1, 2, \ldots, c) \quad (15)$$

Then, the decision rule for the new sample $x$ is

$$\omega = \omega_k, \quad \text{if } g_k(x) = \max_{i=1,2,\ldots,c} g_i(x) \quad (16)$$
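The voting rule of Eqs. (15) and (16) amounts to a few lines of NumPy; `knn_classify` is a hypothetical helper name, and the Euclidean norm is used for $d(x_i, x_j)$ as in the text.

```python
import numpy as np

def knn_classify(x, X_train, y_train, k=1):
    """Vote of Eqs. (15)-(16): count the labels of the k closest
    training samples and return the majority class."""
    d = np.linalg.norm(X_train - x, axis=1)           # Euclidean distances
    nearest = np.argsort(d)[:k]                       # indices of k neighbours
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                  # g_i(x) = k_i, take the max
```

Ties in the vote are resolved here by `np.argmax`, which returns the first maximal class; using an odd $k$ for two-class problems, as recommended below, avoids ties altogether.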

In application, the value of $k$, the number of nearest neighbors, is usually much smaller than the total number of samples and should be determined according to the actual problem. Generally, the value of $k$ is odd in two-class classification to avoid tied votes between the two classes.

4. Fault diagnosis method of diesel engine based on DNMF and KNN

In this section, a novel fault diagnosis method for diesel engines is proposed by combining DNMF and KNN. Using the time-frequency distribution as the input feature, the proposed method is an image-based fault diagnosis method intended to improve diagnosis efficiency by exploiting the rich information in the image. The main procedure of the proposed method is as follows, given a sample set with $L$ classes in which each sample is an image feature.

(1) Divide the samples into a training set and a test set randomly. Transform the images into vectors. Initialize the parameters of DNMF and KNN, such as the maximum number of iterations MaxIter, the weight $\alpha$, the reduced dimension $R$ and the number of nearest neighbors $k$.
(2) Reduce the dimension of the training set by the DNMF and obtain the sub-basis matrices $W^{(l)}$. Then concatenate all sub-basis matrices to form the final basis matrix $W = [W^{(1)}, W^{(2)}, \ldots, W^{(L)}]$ and normalize it.
(3) Calculate the coefficient matrices of both the training and test sets based on Eq. (8).
(4) Use the coefficient matrices as the input features of the KNN classifier and obtain the classification results on the test set.

Finally, the diagnosis procedure is illustrated in Fig. 1.

5. Experimental study

5.1. Test apparatus and data preprocessing

Fig. 2 illustrates the diesel engine and the installation of the transducers. The diesel engine, type 6135G, is produced by the Diesel Engine Company of Shanghai. The electromagnetic pulse transducer is installed near the transmission shaft of the oil injection pump.
At the same time, a piece of iron is vertically fixed on the transmission shaft to record the compression top dead point of the second cylinder. It is worth noting that the collection of vibration begins at the compression top dead point, which makes all vibrations share the same initial phase, so no phase rectification is needed before the time-frequency


Fig. 1. Sketch of the proposed diagnosis procedure: the sample set of time-frequency distributions V is divided into a training set Vtr and a test set Vte; after initializing the parameters of DNMF and the KNN classifier (MaxIter, α, R, k), DNMF on Vtr yields the final basis matrix W; projection gives the coefficient matrices Htr and Hte, which feed the KNN classifier to obtain the test accuracy.
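The training half of the procedure in Fig. 1 can be sketched as follows. This is a minimal sketch under our own reading of Eqs. (12)-(14) (the cosine-penalty gradient assumes column-normalised bases), and the NNLS projection of Eq. (8) is solved by fixed-basis multiplicative updates instead of a dedicated solver; all function names are illustrative, not from the paper.

```python
import numpy as np

def train_dnmf(V_by_class, R, alpha=0.08, n_iter=200, eps=1e-9, seed=0):
    """Per-class multiplicative updates with the cosine penalty of Eq. (6).
    Returns the concatenated, column-normalised basis W = [W^(1), ..., W^(L)]."""
    rng = np.random.default_rng(seed)
    N, L = V_by_class[0].shape[0], len(V_by_class)
    Ws = [rng.random((N, R)) + eps for _ in range(L)]
    Hs = [rng.random((R, V.shape[1])) + eps for V in V_by_class]
    for _ in range(n_iter):
        for l, V in enumerate(V_by_class):
            W, H = Ws[l], Hs[l]
            H *= (W.T @ V) / (W.T @ W @ H + eps)              # Eq. (12)
            other = sum(Ws[j].sum(axis=1) for j in range(L) if j != l)
            num = V @ H.T + alpha * W * (other @ W)           # Eq. (14), numerator
            den = W @ H @ H.T + alpha * other[:, None] + eps  # Eq. (14), denominator
            W *= num / den                                    # Eq. (13)
            s = np.linalg.norm(W, axis=0) + eps               # STEP 3: normalise W,
            W /= s                                            # rescale rows of H
            H *= s[:, None]
    return np.hstack(Ws)

def project_nnls(W, Y, n_iter=500, eps=1e-9):
    """NNLS projection of Eq. (8), solved by multiplicative updates
    with the basis W held fixed (avoids an external solver)."""
    X = np.full((W.shape[1], Y.shape[1]), 0.5)
    for _ in range(n_iter):
        X *= (W.T @ Y) / (W.T @ W @ X + eps)
    return X
```

The coefficient matrices `Htr = project_nnls(W, Vtr)` and `Hte = project_nnls(W, Vte)` would then feed the KNN classifier of Section 3 to produce the test accuracy.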

Fig. 2. Diesel engine and the installation of the transducers: (a) global view of the diesel engine; (b) installation of the electromagnetic pulse transducer, with the sheet iron on the transmission shaft of the oil injection pump; (c) installation of the accelerometer, near the intake and exhaust valve spring seats.

analysis. Vibrations collected on the surface of the cylinder were first amplified by a charge amplifier and then digitized at a sampling frequency of 24 kHz. In the test, seven common valve faults were introduced artificially on the second cylinder. All vibrations were collected with no load at a speed of 1500 rpm. The parameters of the eight valve conditions, including the normal condition, are listed in Table 1. It is worth noting that, in the normal condition, the gaps of the intake and exhaust valves lie in the ranges of [0.25, 0.30] mm and [0.30, 0.35] mm, respectively. The values 0.30 mm, 0.50 mm and 0.06 mm denote valve gaps that are normal, too big and too small, respectively.

Fig. 3 illustrates the waveform and corresponding power spectrum of the normal-condition vibration collected in one working cycle of the second cylinder. Lasting about 0.08 s, the waveform presents an obvious non-stationary property, since the transient amplitudes change sharply. The energy of the vibration is concentrated in the range of [6.0, 9.0] kHz. However, it is obvious that the power spectrum cannot reveal the occurrence moments of the cyclic impulses, while the waveform cannot indicate the oscillation energy.

Fig. 4 illustrates the gray image and the waterfall image of the short-time Fourier transform (STFT) of the vibration collected in the normal condition. A Hamming window of length 64 was used in the STFT, the time shift is 1 sample, and the length of the frequency sampling is 256. It can be seen that, although both representations can exactly illustrate the occurrence


Table 1. Parameters of valves in different conditions (unit: mm).

No. | Gap of the intake valve | Gap of the exhaust valve | Notes
1 | 0.30 | 0.30 | Normal condition
2 | 0.30 | 0.06 | Exhaust valve gap is too small
3 | 0.30 | 0.50 | Exhaust valve gap is too large
4 | 0.30 | 0.30, with a crack of 4 x 1 | Exhaust valve experiences a heavy air leak
5 | 0.30 | 0.30, with slight wear | Exhaust valve experiences a little air leak
6 | 0.06 | 0.06 | Both valve gaps are too small
7 | 0.06 | 0.50 | Gap of intake valve is too small, while that of exhaust valve is too large
8 | 0.50 | 0.50 | Both valve gaps are too large

Fig. 3. Waveform and power spectrum of the normal condition: (a) waveform (acceleration versus crank angle); (b) power spectrum (acceleration versus frequency in kHz).

Fig. 4. STFT of the vibration (frequency in kHz versus crank angle in degrees): (a) gray image; (b) waterfall image.

moments and frequency components of the impulses, the waterfall image is obviously more understandable than the gray image, because a three-dimensional visualization is more suitable for human cognition. However, the waterfall image needs more memory space than the gray image and may increase the calculation load of the subsequent fault diagnosis procedure. Therefore, to save memory space and lighten the calculation load, the gray image is used in place of the waterfall image in the following procedures.
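The gray image can be computed directly from the STFT settings stated above (Hamming window of length 64, time shift of 1 sample, 256 frequency bins). The following is a sketch in plain NumPy; the function name is our own.

```python
import numpy as np

def stft_gray_image(x, win_len=64, hop=1, nfft=256):
    """Magnitude STFT as a gray image: rows are frequency bins,
    columns are time (i.e. crank angle) positions."""
    win = np.hamming(win_len)
    n_frames = (len(x) - win_len) // hop + 1
    frames = np.stack([x[i * hop : i * hop + win_len] * win
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, n=nfft, axis=1))  # one-sided spectrum
    return spec.T
```

At the 24 kHz sampling rate used in the test, row k of the image corresponds to the frequency k * 24000 / 256 Hz.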


Fig. 5 illustrates the STFT distributions of vibrations obtained in the different conditions. All images share the same size of 630 x 330 pixels. It can be seen that all images are sparse, because only a few energy concentrations can be noticed on the time-frequency plane. Therefore, the sparse representation of these images is very important in the following analysis. Moreover, these images are distinguishable from each other, since the locations and energy concentrations differ. It is therefore very suitable to use the time-frequency distribution as the input feature for fault classification.

5.2. Preprocessing

It is worth noting that all time-frequency images share the same size of 630 x 330 pixels. Apparently, the original images are so large that their direct utilization would heavily burden any fault diagnosis method. In this instance, the dimension reduction method plays an important role in the fault diagnosis. In order to validate the efficacy of the proposed method, thirteen classes of time-frequency images, listed in Table 2, are used as the input features. Furthermore, in order to save computing time, all images are down-sampled to a size of 200 x 100 pixels. Finally, each image is reshaped into a vector of 20,000 x 1 elements before classification or dimension reduction. In total, 322 gray images are used in the experiments; four fifths of the images are randomly selected as the training set, and the remaining one fifth is used as the test set.

5.3. Results and analysis

In this section, the proposed procedure is applied to the fault diagnosis of the diesel engine. For comparison, the KNN method alone, the combination of ONMF and KNN (denoted ONMF + KNN) and the combination of PCA and KNN (denoted PCA + KNN) are also applied to the fault diagnosis of the diesel engine.

5.3.1. Accuracy and test time

Fig. 6 illustrates the accuracies of the different methods obtained with different time-frequency representations when the reduced dimension of each class equals 2 and α equals 0.08. The number of nearest neighbors is one. For the PCA, a total of

Fig. 5. STFT distributions of vibrations obtained in the different conditions. Sub-images (a-h) correspond in order to conditions No. 1 to No. 8 in Table 1. As in Fig. 4a, the horizontal and vertical axes are crank angle and frequency, respectively.

Table 2. Time-frequency distributions used in the test.

No. | Time-frequency analysis method | Abbreviation
1 | S transform | ST
2 | Radius Gauss kernel based distribution | RGKBD
3 | Improved radius Gauss kernel based distribution | IRGKBD
4 | Short-time Fourier transform | STFT
5 | Pseudo Wigner-Ville distribution | PWVD
6 | Wigner-Ville distribution | WVD
7 | Adaptive optimal kernel based distribution | AOKBD
8 | Choi-Williams distribution | CWD
9 | Smoothed pseudo Wigner-Ville distribution | SPWVD
10 | Radius cosine kernel based distribution | RCKBD
11 | Conical kernel distribution | CKD
12 | Reassigned smoothed pseudo Wigner-Ville distribution | RSPWVD
13 | Ambiguity function | AF


Fig. 6. Accuracy of the different methods (KNN, DNMF + KNN, ONMF + KNN, PCA + KNN) on the different time-frequency representations (horizontal axis: number of the time-frequency distribution).

Fig. 7. Diagnosis times (s) of the different methods applied on the test set (horizontal axis: number of the time-frequency distribution).

16 principle components are selected in order to make the reduced dimension equal to those of NMF methods. The horizontal axis denotes the time-frequency distribution numbers corresponding to those in Table 2. The average accuracies of these thirteen time-frequency distribution obtained by KNN, DNMF + KNN, ONMF + KNN and PCA + KNN are 92.6197, 93.3815, 83.1275 and 87.2579 orderly. It can be seen that the proposed method reaches similar accuracy to that of the KNN method using the original image as the input feature. At most cases, the accuracies of DNMF + KNN methods even better than those of KNN method and the average accuracy of DNMF + KNN method is the highest. The accuracies obtained by the ONMF + KNN method and PCA + KNN method are much worse than those obtained by the other two methods. Fig. 7 illustrates the diagnosis times of different methods applied on the test sets. It can be seen that the KNN method consumed the longest time in the classification of the test set for the reason that original images with very high dimensions are used as the input vectors. Comparatively, the other three methods, including DNMF + KNN, ONMF + KNN and PCA + KNN, consume much less times in the diagnosis procedure because the dimension of input features are reduced a lot. It is reasonable that the KNN method is performed by the calculation of vectors with 20,000  1 dimensions while the classifications of other methods were performed by calculation of the vector with 16  1 dimension. 5.3.2. Basis matrices Figs. 8 and 9 illustrate the basis matrices obtained by DNMF and ONMF applied on the STFT representation when the reduced dimension of each class equals 2. Therein, a1 and b1 denote the basis matrices of condition No. 1, c1 and d1 denote the basis matrices of condition No. 2, a2 and b2 denote the basis matrices of condition No. 3, c2 and d2 denote the basis matrices of condition No. 4, etc. Fig. 10 illustrates the first 16 principle components obtained by PCA. 
Compared with Fig. 5, it can be seen that the basis matrices obtained by DNMF maintain the primary characters of the different classes, and the basis matrices of different classes are distinct from each other. For each class, the two basis matrices obtained by DNMF are very similar. Comparatively, the basis matrices obtained by ONMF are very abstract, because little difference can be noticed from


Y.-s. Yang et al. / Mechanical Systems and Signal Processing 95 (2017) 158–171

Fig. 8. Basis matrices obtained by DNMF.

Fig. 9. Basis matrices obtained by ONMF.

Fig. 10. Principal components obtained by PCA.


the image of basis matrices. In Fig. 10, the features of the original images are preserved in the first several principal components more efficiently than in the basis matrices obtained by ONMF. However, the differences between the principal components are not very apparent compared with the basis matrices obtained by DNMF. Obviously, the discriminative capability of the basis matrices obtained by DNMF is more powerful than that of the basis matrices obtained by ONMF and the principal components obtained by PCA. It is worth noting that the ''noise" in the basis images originates from the coding style: all images are transformed into vectors before the dimension reduction starts, so the neighborhood relationships within each image are broken.

5.3.3. Training time

Fig. 11 illustrates the times consumed by the dimension reduction processes of DNMF, ONMF and PCA on the training set. It can be seen that the times consumed by DNMF differ between representations and are the longest. Comparatively, the learning times of ONMF and PCA are nearly constant across representations, and those of PCA are the shortest. This phenomenon corresponds to the mechanisms of the three dimension reduction methods. Because it takes the discriminative information of the classes into account, DNMF may consume a different time for each representation, since the time-frequency distributions of the representations differ. Moreover, both DNMF and ONMF need iterative calculation, whereas PCA calculates the reduced features directly. It is therefore reasonable that the learning times of both DNMF and ONMF are longer than that of PCA. Finally, it is worth noting that the diagnosis times of the DNMF + KNN, ONMF + KNN and PCA + KNN methods are the summation of the time consumed by the dimension reduction (shown in Fig. 11) and the time consumed by the classification of the KNN classifier (shown in Fig. 7).
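The cost gap between the iterative factorizations and PCA can be seen in a compact sketch (illustrative only: the multiplicative updates follow Lee and Seung's original NMF, not the DNMF variant, and all shapes are hypothetical). NMF loops over update rules for many iterations, while PCA is obtained from a single direct SVD.

```python
import numpy as np

def nmf(V, r, n_iter=200, seed=0):
    """Original NMF via multiplicative updates (Lee & Seung): V ~= W @ H."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    eps = 1e-12  # guard against division by zero
    for _ in range(n_iter):
        # Each update keeps the factors non-negative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(1)
V = rng.random((60, 40))             # stand-in for vectorized training images
W, H = nmf(V, r=8)                   # iterative: cost grows with n_iter
U, s, Vt = np.linalg.svd(V - V.mean(axis=0), full_matrices=False)
pcs = Vt[:8]                         # PCA: leading components from one direct SVD
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

The 200-iteration loop, each pass costing several matrix products, is what makes the NMF-family training times in Fig. 11 longer than the single-SVD cost of PCA.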
In this instance, the training time of the KNN method is obviously the shortest, because no dimension reduction process is involved when the original images are used. However, it is worth noting that the purpose of the dimension reduction is to save memory and calculation load, and that the dimension reduction can be performed off-line. Therefore, the diagnosis time may not be seriously affected by the training time consumed by the learning of the basis matrices.

5.3.4. Influence of the reduced dimension

In order to observe the influence of the reduced dimension on the results, the procedure was repeated with different reduced dimensions and the results are illustrated in Figs. 12 and 13. Without loss of generality, only four time-frequency distributions, namely AOK, STFT, WVD and CKBD, are considered in this section. Fig. 12 illustrates the convergence behavior at different reduced dimensions. It can be seen that all calculation procedures with different input distributions share a similar convergence trend when the reduced dimension is the same. The smaller the reduced dimension, the faster the calculation converges and the more accurate the calculation is. Fig. 13 illustrates the influence of the reduced dimension on the accuracy. It can be seen that when the reduced dimension is smaller than 12, the accuracy fluctuates with the variation of the reduced dimension. However, increasing the reduced dimension does not improve the classification accuracy. In this instance, a smaller reduced dimension may be preferable for the classification, because an acceptable accuracy can be obtained with a faster calculation.

5.3.5. Influence of the weight α

The influence of the weight α on the calculation procedure is investigated in this section, and the results are illustrated in Figs. 14 and 15. Similar to the previous section, only the four time-frequency distributions AOK, STFT, WVD and CKBD are considered. Fig. 14 illustrates the convergence of the DNMF with different values of the weight α. It can be seen that the convergence curves of the different distributions share the same trend when the weight α is the same. The bigger the weight α, the faster the training procedure converges and the bigger the reconstruction error of the DNMF is. However, as shown in Fig. 15, the accuracies of the classifiers using the features extracted from the AOK, WVD and CKBD distributions fluctuate only slightly with the increase of the weight α, while that of the STFT decreases as the weight α increases. It is indicated that a suitable value of the

Fig. 11. Time consumed by the learning of basis matrices on the training set.


Fig. 12. Convergence of different time-frequency distributions with different reduced dimensions: (a) AOK, (b) STFT, (c) WVD, (d) CKBD.

Fig. 13. Influence of the reduced dimension on accuracy.

weight α may balance the convergence speed, the reconstruction error and the classification accuracy. However, the value of the weight α can be selected within a relatively large range for most time-frequency distributions, because the classification accuracies remain high and fluctuate within an acceptable range.
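The trade-off controlled by the weight can be written out explicitly. As a schematic sketch (not the exact objective of the DNMF above; the penalty term, shapes and data are illustrative assumptions), the weight blends the reconstruction error with a discriminative term that pulls coefficient vectors of the same class together:

```python
import numpy as np

def dnmf_objective(V, W, H, labels, alpha):
    """Schematic DNMF-style cost: reconstruction error plus an
    alpha-weighted within-class scatter of the coefficient columns."""
    recon = np.linalg.norm(V - W @ H) ** 2
    scatter = 0.0
    for c in np.unique(labels):
        Hc = H[:, labels == c]             # coefficient vectors of one class
        mu = Hc.mean(axis=1, keepdims=True)
        scatter += ((Hc - mu) ** 2).sum()  # squared distance to the class mean
    return recon + alpha * scatter

rng = np.random.default_rng(0)
V = rng.random((30, 12))                   # toy data: 12 samples from 3 classes
W = rng.random((30, 4)); H = rng.random((4, 12))
labels = np.repeat([0, 1, 2], 4)
# A larger alpha emphasizes class compactness over reconstruction accuracy.
j0 = dnmf_objective(V, W, H, labels, alpha=0.0)
j1 = dnmf_objective(V, W, H, labels, alpha=1.0)
```

With alpha = 0 the cost reduces to plain NMF; increasing alpha trades reconstruction fidelity for class separation, consistent with the trends reported for Figs. 14 and 15.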


Fig. 14. Convergence of different time-frequency distributions with different weights α: (a) AOK, (b) STFT, (c) WVD, (d) CKBD.

Fig. 15. Influence of the weight α on accuracy.

6. Conclusions

In order to deal with the heavy calculation burden in the fault diagnosis of diesel engines using time-frequency images, a novel diagnosis method was proposed by combining the DNMF and the KNN classifier. Firstly, by introducing the class information into the NMF, the DNMF was developed so that the discriminative property is involved in both the basis matrix and the coefficient matrix. Then the coefficient matrix was input to a KNN classifier to perform the classification. Finally, experiments were used to validate the efficacy of the proposed method by comparing it with the methods of KNN, ONMF


+ KNN and PCA + KNN. It is shown that the proposed fault diagnosis method can efficiently classify the fault classes by using the coefficient matrix of the DNMF as the input vector of the KNN classifier. Moreover, compared with the ONMF and PCA, the DNMF can represent the class information more efficiently, because the class characters of the basis matrices obtained by the DNMF are more visible than those of the basis matrices obtained by the ONMF and the principal components obtained by PCA. When the weight α is fixed, the error of the DNMF and the classification accuracy seem not to be seriously influenced by the reduced dimension; however, the smaller the reduced dimension, the faster the DNMF converges. Comparatively, when the reduced dimension is fixed, the classification accuracy is hardly affected by the weight α; the bigger the weight α, the faster the DNMF converges and the bigger the reconstruction error is. In summary, a relatively small reduced dimension and weight α can be used to balance the consumed time and the classification accuracy.

Acknowledgments

The research work described in this paper was supported by the National Natural Science Foundation of China under Grant No. 51505486 and the Shaanxi Natural Science Foundation under Grant No. 2016JQ5009.