Automated emotion recognition based on higher order statistics and deep learning algorithm

Biomedical Signal Processing and Control 58 (2020) 101867

Biomedical Signal Processing and Control journal homepage: www.elsevier.com/locate/bspc

Rahul Sharma a,∗, Ram Bilas Pachori b, Pradip Sircar a

a Department of Electrical Engineering, Indian Institute of Technology Kanpur, Kanpur 208016, India
b Discipline of Electrical Engineering, Indian Institute of Technology Indore, Indore 453552, India

Article info

Article history: Received 5 August 2019; received in revised form 13 January 2020; accepted 20 January 2020.

Keywords: Emotion recognition; EEG; Higher order statistics; Long short-term memory; Deep learning

Abstract

The objective of this paper is the online recognition of human emotions from electroencephalogram (EEG) signals. Emotions originate in the central and peripheral nervous systems; hence, they can be adequately characterized by the EEG signal, which directly reflects changes in the human emotional state. This paper describes an automated classification of emotion-labeled EEG signals using nonlinear higher order statistics and a deep learning algorithm. The discrete wavelet transform is used to decompose the studied signal into sub-bands, known as the rhythms of the EEG signal. The third-order cumulants (ToC) are used to explore the nonlinear dynamics of each sub-band signal in a higher dimensional space. The data in this space contain repeated and redundant information because of the symmetries of the ToC. Hence, an evolutionary data reduction technique, namely particle swarm optimization, is employed to discard the irrelevant information. A long short-term memory based deep learning technique is used to retrieve the emotion variation from the optimized data corresponding to the labeled EEG signals. The study is carried out on the web-available DEAP dataset and yields 82.01% average classification accuracy with 10-fold cross-validation for four labeled emotion classes. The results indicate that the proposed algorithm has the potential for accurate and rapid recognition of human emotions. © 2020 Elsevier Ltd. All rights reserved.

1. Introduction

Emotion is a psychological experience characterized by intense mental activity. It entails synchronized features including knowledge, expression, response, and action tendencies [1,2]. Emotions can be recognized from facial expressions, speech, body posture, physiological activities, etc. These techniques rely on externally expressed emotions, which may not capture innermost feelings, whereas electroencephalogram (EEG) signals reveal this hidden information and provide emotional patterns [3,4]. Emotions originate in the central and peripheral nervous systems, which cause a transient agitation due to the synchronized firing of neurons [5]. Visual analysis of long-term EEG data is monotonous and time-consuming, and it may lead to false detection of key points. Therefore, various feature-extraction algorithms have been proposed to quantify the information of the EEG signal. These features are able to capture the underlying complexity and nonlinearity of the EEG signals.

∗ Corresponding author. E-mail addresses: [email protected] (R. Sharma), [email protected] (R.B. Pachori), [email protected] (P. Sircar). https://doi.org/10.1016/j.bspc.2020.101867 1746-8094/© 2020 Elsevier Ltd. All rights reserved.

2. Related works

Many human emotion recognition algorithms have been proposed using different time-domain and frequency-domain signal processing approaches [6–11]. Kroupi et al. [12] proposed a correlation-based emotion classification algorithm. They measured the Spearman correlation coefficient among the normalized length density (NLD), non-stationarity index (NSI), and power spectral density (PSD) feature vectors to discriminate three emotion classes, namely valence, arousal, and like/dislike. Other authors used various signal decomposition algorithms to decompose the studied signals into rhythms, and computed parameters such as the spatial pattern of the  band and the ratio of β to α band PSD from these rhythms to discriminate human emotions [13,14]. Hadjidimitriou and Hadjileontiadis [15] measured various mixed features, namely the spectrogram, the Hilbert-Huang spectra (HHS), and the Zhao-Atlas-Marks distribution, from the studied signals. These features were classified with different classifiers such as the k-nearest neighbor (kNN), quadratic discriminant analysis (QDA), and Mahalanobis discriminant analysis (MDA) classifiers. They achieved 86.52 ± 0.76% classification accuracy with nine HHS-based features classified with the kNN classifier.

In the literature, various nonlinear parameters are used to capture the underlying characteristics of emotion-labeled EEG signals [16–21]. The Hjorth parameters [16,17], the fractal and multifractal dimensions [18,16,20], and different variants of differential entropy [21,19] are used to characterize human emotions. Frantzidis et al. [22] proposed a two-level emotion classification algorithm. Event-related potentials (ERPs) and event-related oscillations (EROs) are computed as features and classified with the MDA and support vector machine (SVM) classifiers. The overall classification accuracies are 79.5% and 81.3% for the MDA and SVM classifiers, respectively. Zheng and Lu [21] achieved 86.65% average classification accuracy when differential entropy is used as a feature.

EEG signals exhibit non-stationary behavior that cannot be characterized by time-domain or spectral-domain methods alone. These approaches do not provide information about the frequency variations over time or the energy distribution over the different frequencies present in the signal. To address these limitations, various time-frequency (TF) analysis methods have been proposed to analyze non-stationary EEG signals. These methods decompose the studied signal along the time and frequency axes simultaneously, yielding better temporal localization of the signal's spectral components. The short-time Fourier transform (STFT) has been employed to recognize human emotions [23]. However, the fixed window length of the STFT leads to poor localization of the transient and oscillatory behavior of the EEG signal. The wavelet transform (WT) uses variable-length basis wavelet kernels that provide an excellent TF resolution.
The discrete wavelet transform (DWT) is widely used for automated human emotion recognition [24–26]. The DWT suffers from shift-variance and non-adjustable TF covering, which leads to reduced resolution at high frequencies. To mitigate this issue, wavelet variants such as the multiwavelet transform [27], the flexible analytic wavelet transform (FAWT) [28], and others have been introduced in the emotion recognition literature. The WT is a data-dependent algorithm, and the results may vary with the choice of basis wavelet. Further, the computational complexity increases with the number of decomposition levels. Like the WT, the empirical mode decomposition (EMD) is another nonlinear tool that is used for automated emotion recognition [29,30]. Petrantonakis and Hadjileontiadis [25,29] proposed the higher-order zero-crossing count as a feature. This feature is measured from sub-band signals obtained with the DWT and EMD algorithms. Convolutional neural network (CNN) based algorithms have also been proposed for automated human emotion recognition [31,32].

The algorithms described above are feature-dependent, and there is no standardized set of parameters that can perfectly capture the signal dynamics. Irrelevant features enlarge the feature space, which degrades the resulting accuracy and may increase the risk of overfitting. The proposed algorithm is a data-dependent algorithm that automatically learns time-varying signal characteristics from the data itself, which reduces the feature sustainability issues. Instead of a conventional data reduction algorithm, an evolutionary algorithm is used; it maintains multiple potential solutions at a time, which makes it fast and accurate. To classify the substantial labeled information, a deep learning algorithm containing a memory unit is used, which remembers the signal information for an extended period. Fig. 1 illustrates the block representation of the proposed algorithm.
The proposed method is studied on the open-source DEAP database, which is preprocessed and widely used for the analysis of human affective states. This paper is arranged as follows: a review of the studied dataset, followed by the problem formulation, is provided in Section 3. The details of the proposed methodology for feature extraction, selection, and classification are given in Section 4. Section 5 presents the results, and the discussion is given in Section 6. The conclusion is presented in Section 7.

Table 1. The number of samples in different labeled emotions of the DEAP dataset.

Emotion label    Number of samples
HaHv             348
LaHv             298
LaLv             282
HaLv             352

3. Dataset

A web-available database for emotion analysis using EEG, physiological, and video signals (DEAP) is used in the present study [33]. The dataset is a collection of 32 EEG and eight peripheral physiological signals recorded over 40 channels, with electrodes placed according to the international 10–20 electrode placement system [34]. The 32 volunteers watched 40 preselected one-minute excerpts of music videos, and each volunteer was asked to rate each video numerically in terms of arousal (1–9), valence (1–9), like/dislike (1–9), dominance (1–9), and familiarity (1–5). The recorded signals are downsampled to 128 Hz, and a bandpass filter (4–45 Hz) is used to remove artifacts. It is assumed that emotion labeling has not started during the early watching of the videos; thus, a 3-s duration is treated as the baseline recording.

In this article, we have also studied the SEED emotion database. This database consists of EEG signals with three emotional labels (positive, neutral, and negative). The signals were recorded from 15 volunteers (7 males and 8 females). Each subject took part in the experiment three times at an interval of one week, and each experiment comprised 15 trials. The signals were recorded with a 62-channel electrode cap and downsampled to 200 Hz. A band-pass filter (0–70 Hz) is used to remove artifacts. Detailed information on the SEED database can be found in [35,36].

3.1.
Classification problem formulation

In this study, a four-class classification problem is considered, corresponding to the emotions induced in the four quadrants of the valence-arousal space: low arousal and low valence (LaLv), high arousal and low valence (HaLv), low arousal and high valence (LaHv), and high arousal and high valence (HaHv). The criterion for discriminating between high and low arousal and valence (LaLv, HaLv, LaHv, and HaHv) is decided according to the mean-standard deviation limit given along with the dataset [33]. Fig. 2 shows the labeled EEG signals of the four considered classes of the DEAP dataset. Table 1 lists the number of samples associated with each category. Thus, the problem statement is: to classify these four emotion classes based on the EEG signals.

Fig. 1. Block diagram of the proposed algorithm.

Fig. 2. Labeled EEG signals for DEAP-dataset.

4. Methodology

4.1. Proposed algorithm

The proposed methodology is straightforward to implement. Initially, the DWT is used to decompose the studied signal into the five rhythms present in the EEG signal, namely alpha, beta, gamma, delta, and theta. This allows the signal to be analyzed in different frequency bands. The third-order cumulant (ToC) is used to map each decomposed sub-signal into a higher dimensional (2D) space, which not only preserves the original signal information but also contains harmonic information. Because of the symmetries of the ToC distribution, the analysis space contains redundant and repeated information. Hence, a dimension reduction algorithm is used to reduce the dimension of the feature matrix comprising the ToC coefficients. In this study, a particle swarm optimization algorithm is used to optimize the feature matrix, as it maintains multiple potential solutions at a time, which makes it fast and accurate. The z-score normalization technique is subsequently applied to the attributes of the optimized feature matrix. It scales the data to zero mean with unit variance, which brings the attributes into smaller, comparable ranges. The normalization is computed as follows:

$$\widehat{\text{features}} = \frac{\text{features} - \mu}{\sigma} \qquad (1)$$

where $\widehat{\text{features}}$ are the normalized features, and $\mu$ and $\sigma$ are the mean and standard deviation of the feature matrix, respectively. The resulting normalized attributes are the input to the deep learning network. The proposed deep learning network consists of different feature analysis layers along with a long short-term memory (LSTM) layer, whose memory-based learning capacity increases the probability of correct classification. Finally, the emotion-labeled EEG signals are discriminated with the softmax classifier using a ten-fold cross-validation technique.
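Eq. (1) can be illustrated with a minimal Python sketch. It assumes a flat list of feature values and a population standard deviation (division by N); the paper does not state which estimator is used, and the function name `zscore` and the sample values are purely illustrative.

```python
import math

def zscore(features):
    """Scale a flat list of feature values to zero mean and unit variance (Eq. 1)."""
    n = len(features)
    mu = sum(features) / n
    # Assumption: population standard deviation (divide by n, not n - 1).
    sigma = math.sqrt(sum((f - mu) ** 2 for f in features) / n)
    if sigma == 0.0:
        raise ValueError("a constant feature vector cannot be z-scored")
    return [(f - mu) / sigma for f in features]

# Illustrative values only, not data from the paper.
normalized = zscore([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

For these sample values the mean is 5 and the standard deviation is 2, so the value 9.0 maps to 2.0 and the value 2.0 maps to -1.5.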

4.2. Signal decomposition

In this work, a cascaded filtering technique is used to implement the DWT. The signals are decomposed into rhythms with a pair of high-pass and low-pass filters, which split the signal into approximation coefficients and detail coefficients of the sub-bands. The approximation coefficients are further divided into new approximation and detail coefficients, and this process is repeated up to the defined level of decomposition [37,38]. No wavelet selection criterion is applied in the simulation; the wavelet basis is selected on the basis of the literature [24]. The mother wavelet function $\psi_{a,b}(t)$ is given by:

$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t - b}{a}\right) \qquad (2)$$

where $a = 2^j$ is the scale parameter and $b = k \cdot 2^j$ is the shift parameter, and both $j$ and $k$ are integers. Five main rhythms are present in EEG signals at different frequency bands: delta (0.1–4 Hz), theta (4–8 Hz), alpha (8–13 Hz), beta (13–30 Hz), and gamma (36–40 Hz). Fig. 3 depicts the rhythms present in the four considered labeled EEG signals of the DEAP dataset. The size of the analyzed labeled EEG dataset is of the order of (number of volunteers (32) × number of channels (40) × number of rhythms (5) × data length (3840)) when the last 30-s duration is considered (30 s × 128 Hz = 3840 samples). It is observed that low frequencies are also present in the EEG signals of the DEAP dataset, which may arise from leakage of the filter response. These low frequencies are depicted as the delta rhythm in Fig. 3.

4.3. Higher order statistics

The studied signals are highly nonlinear and non-stationary. The ToC is the triple correlation of a signal. It is a function of two lag parameters and reflects the harmonics that indicate the non-linearity present in non-stationary brain signals. It maps the time-varying information into a higher dimensional space that not only preserves the original signal information but also contains the harmonic information, which may better reveal the dynamics of the nervous system. Higher order statistics (HOS) have been used in various biomedical signal processing applications [39–42]. They provide a nonlinear tool that allows the signal to be analyzed in a higher dimensional space. Let $x[n]$ be a zero-mean, $k$th order stationary discrete-time random process. The $k$th order cumulant of $x[n]$ is defined as follows [43]:

$$C_x^k[\tau_1, \tau_2, \ldots, \tau_{k-1}] = M_{k,x}[\tau_1, \tau_2, \ldots, \tau_{k-1}] - m_{k,G}[\tau_1, \tau_2, \ldots, \tau_{k-1}] \qquad (3)$$

where $M_{k,x}$ is the $k$th order moment function of the random process $x[n]$, and $m_{k,G}$ is the $k$th order moment function of an equivalent Gaussian random process. Note that the $k$th order cumulant is a function of only the $(k-1)$ lags $\tau_1, \tau_2, \ldots, \tau_{k-1}$. For orders $k = 1$, 2, and 3, the cumulants are defined as follows [44]:

First order cumulant:

$$C_x^1 = M_{1,x} = E\{x[n]\}$$

Second order cumulant:

$$C_x^2[\tau_1] = M_{2,x}[\tau_1] - M_{1,x}^2 = C_x^2[-\tau_1] = M_{2,x}[-\tau_1] - M_{1,x}^2 \qquad (4)$$

Third order cumulant (ToC):

$$C_x^3[\tau_1, \tau_2] = M_{3,x}[\tau_1, \tau_2] - M_{1,x}\left(M_{2,x}[\tau_1] + M_{2,x}[\tau_2] + M_{2,x}[\tau_1 - \tau_2]\right) + 2M_{1,x}^3 \qquad (5)$$

where $E$ is the expectation operator. The ToC thus analyzes the studied signal in 2D space. The six symmetries of the ToC introduce repeated and redundant information for signal analysis [44]:

$$C_x^3(\tau_1, \tau_2) = C_x^3(\tau_2, \tau_1) = C_x^3(-\tau_2, \tau_1 - \tau_2) = C_x^3(\tau_2 - \tau_1, -\tau_1) = C_x^3(\tau_1 - \tau_2, -\tau_2) = C_x^3(-\tau_1, \tau_2 - \tau_1) \qquad (6)$$
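The decomposition of Section 4.2 and the ToC of Eq. (5) can be sketched together in plain Python. This is only an illustration under stated assumptions: a Haar wavelet is swapped in for brevity (the paper takes its wavelet basis from [24] and does not name it here), the input is synthetic Gaussian noise rather than EEG, and the helper names `haar_dwt` and `third_order_cumulant` are hypothetical. For a zero-mean process, Eq. (5) reduces to $E\{x[n]\,x[n+\tau_1]\,x[n+\tau_2]\}$, which the sketch estimates with a biased sample average.

```python
import math
import random

def haar_step(signal):
    """One filter-bank level: low-pass/high-pass Haar pair, then downsample by 2."""
    s = 1.0 / math.sqrt(2.0)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal) - 1, 2)]
    approx = [s * (a + b) for a, b in pairs]
    detail = [s * (a - b) for a, b in pairs]
    return approx, detail

def haar_dwt(signal, levels):
    """Cascade the filter pair on the approximation; returns [D1, ..., D_levels, A_levels]."""
    bands, approx = [], list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        bands.append(detail)
    bands.append(approx)
    return bands

def third_order_cumulant(x, max_lag):
    """Biased estimate of C3[t1, t2] for lags in [-max_lag, max_lag], after mean removal."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    toc = {}
    for t1 in range(-max_lag, max_lag + 1):
        for t2 in range(-max_lag, max_lag + 1):
            lo = max(0, -t1, -t2)        # keep all three indices inside the signal
            hi = min(n, n - t1, n - t2)
            toc[(t1, t2)] = sum(z[i] * z[i + t1] * z[i + t2] for i in range(lo, hi)) / n
    return toc

random.seed(0)
signal = [random.gauss(0.0, 1.0) for _ in range(64)]  # synthetic stand-in for one EEG channel
bands = haar_dwt(signal, levels=4)                     # five sub-bands: D1..D4 and A4
toc = third_order_cumulant(bands[0], max_lag=3)        # 7 x 7 grid of lag pairs for band D1

# Two of the six symmetries of Eq. (6): C(t1, t2) = C(t2, t1) = C(-t2, t1 - t2)
sym_swap = abs(toc[(1, 2)] - toc[(2, 1)])
sym_shift = abs(toc[(1, 2)] - toc[(-2, -1)])
```

With a 128 Hz sampling rate, four dyadic levels would give bands of roughly 32–64, 16–32, 8–16, 4–8, and 0–4 Hz, which only approximate the rhythm ranges quoted in Section 4.2. The two differences computed at the end should vanish up to floating-point rounding, which is why only part of the lag plane carries independent information.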

The HOS have some unique properties that help to analyze non-Gaussian signals, as given below [45]:
a. The ToC is equal to zero for a symmetrically distributed random variable, i.e., $C_x^k[\tau_1, \tau_2, \ldots, \tau_{k-1}] = 0$. This makes it efficient for analyzing a non-Gaussian distributed signal in the presence of Gaussian distributed signals such as noise.
b. The ToC is infinitely differentiable and convex, which allows the analysis of non-minimum phase and phase-coupled signals.
c. It can recognize Gaussian/non-Gaussian signals, linear/nonlinear systems, phase coupling, and more.

The ToC is computed for each EEG sub-band signal. Fig. 4 shows the ToC of the EEG rhythms corresponding to the four emotion classes. The dimension of the data comprising the computed ToC coefficients is huge, i.e., (32 × 40 × 5 × length-of-ToC-data). It requires large storage memory and much computational time. A feature selection technique is employed in the next section to discard the redundant elements of the feature matrix.

4.4. Particle swarm optimization (PSO)

Various techniques are used to reduce the dimension of the analyzed data. Although they reduce the data dimension to the required limit, they suffer from certain limitations related to global optimization, prior knowledge, learning rate, number of iterations, gradient problems, etc. To overcome these limitations, various evolutionary algorithms are used in the literature. Nakisa et al. [46] used five evolutionary algorithms, namely ant colony optimization, simulated annealing, the genetic algorithm, particle swarm optimization (PSO), and differential evolution, to discriminate emotions based on EEG signals. They observed that the PSO achieved the optimum solution and obtained 65.31437 ± 3.22760% classification accuracy with the probabilistic neural network (PNN) classifier on the DEAP database. In the proposed study, the PSO algorithm is used to extract the relevant information from the ToC coefficients.
It has certain advantages over conventional optimization algorithms, given below:
• The PSO is based on one global and one local minimum associated with the best solution of the function being considered.
• It maintains multiple potential solutions at a time, which makes it fast and accurate.
• It does not require initial knowledge of functional parameters such as the starting point, learning rate, etc.
• Unlike classical optimization algorithms, it does not rely on gradient-based methods such as gradient descent or quasi-Newton methods.

Fig. 3. DEAP-dataset emotion label rhythms (OS: original signal, γ, β, α, θ, and δ): (a) HvHa, (b) HvLa, (c) LvHa, and (d) LvLa.

The PSO is modeled on the food-search behavior of a swarm, as proposed in [47]. The velocity and position of the kth particle of the swarm at time instant t are influenced by two best-known positions, pbest and gbest. The pbest is the best position found so far by the individual particle, while gbest is the global best position achieved so far by the whole swarm [48]. The velocity and position updates are defined as follows [49]:

$$v_k(t+1) = v_k(t) + a_1 \times \mathrm{rand}(\cdot) \times \left(p_k^{best} - p_k(t)\right) + a_2 \times \mathrm{rand}(\cdot) \times \left(p^{gbest} - p_k(t)\right) \qquad (7)$$

$$p_k(t+1) = p_k(t) + v_k(t) \qquad (8)$$

where $p$ is the position, $v$ is the velocity, $a_1$ and $a_2$ are two positive weights, and rand(·) is a random value that lies in the range [0,1]. The standard deviation of the ToC coefficients is considered as the fitness function. The optimized ToC data is normalized using the z-score normalization technique [50].
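The update rules of Eqs. (7) and (8) can be sketched in plain Python as follows. The sketch makes several assumptions that are not taken from the paper: the fitness is a toy quadratic (`sphere`) instead of the standard deviation of the ToC coefficients, the swarm size, iteration count, and weights `a1 = a2 = 1.5` are arbitrary, and the position step uses the freshly updated velocity, which is the common convention. Since Eq. (7) has no inertia weight or velocity limit, the swarm is not guaranteed to settle; the only property relied on here is that the stored global best never gets worse.

```python
import random

def pso_minimize(fitness, dim, n_particles=20, n_iter=100, a1=1.5, a2=1.5, seed=0):
    """Minimize `fitness` with the plain velocity/position updates of Eqs. (7)-(8)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # best position seen by each particle
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda k: pbest_val[k])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # best position seen by the swarm
    for _ in range(n_iter):
        for k in range(n_particles):
            for d in range(dim):
                vel[k][d] = (vel[k][d]
                             + a1 * rng.random() * (pbest[k][d] - pos[k][d])
                             + a2 * rng.random() * (gbest[d] - pos[k][d]))
                pos[k][d] += vel[k][d]               # assumption: move with the new velocity
            val = fitness(pos[k])
            if val < pbest_val[k]:
                pbest[k], pbest_val[k] = pos[k][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[k][:], val
    return gbest, gbest_val

def sphere(p):
    """Toy fitness used only for illustration."""
    return sum(c * c for c in p)

best, best_val = pso_minimize(sphere, dim=3)
```

Practical implementations often add an inertia weight or clamp the velocity to keep the particles from oscillating; that refinement is left out to stay close to the equations as written.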

Fig. 4. Contour plots of the ToC of the rhythms present in the EEG signals: (a) HvHa, (b) HvLa, (c) LvHa, and (d) LvLa, from the DEAP dataset.

4.5. Deep learning algorithm

In this study, a deep learning algorithm is used to classify the emotional EEG signals. The deep learning model contains two units, namely the computational unit and the classification unit. The computational unit consists of multiple layers that progressively extract higher-level attributes from the input data. Each layer performs a different task: it extracts subtle information from the analyzed data by eliminating redundancy from its input and keeping only those features that improve performance. Next, the classification unit assigns the output of the computational unit to the unknown labels. The proposed deep learning model consists of five layers, namely the sequence input layer, bidirectional LSTM layer, fully connected layer, softmax layer, and classification layer. These layers are connected in sequence. The sequence input layer takes its input from the data sequence in the first row of the data matrix, which is then analyzed by the bidirectional LSTM layer. More details about the LSTM layer are given below. The output of the LSTM layer is passed to the fully connected layer. This layer works as a normal feed-forward network and passes the sequence of significant information to the next layer. The softmax layer converts the received data into class probabilities, and the classification layer classifies the data on that basis. Fig. 5 depicts the block representation of the deep learning algorithm.

Fig. 5. Architecture of deep learning.

The LSTM remembers the long-term dependencies between the time steps of sequential data, which yields a higher probability of correct classification in a short time [51,52]. Fig. 6 illustrates the basic LSTM computational unit. The input gate $i_t \in \mathbb{R}^h$ processes the input and the previous output at time instant $t$ and updates the current internal state $y_t$, which is fed to the next unit. The forget gate $f_t \in \mathbb{R}^h$, as the name indicates, controls the flow of previous information. It is the output of a sigmoid activation function, which varies between 0 and 1 and multiplies the previous internal state. If $f_t = 1$, the previous internal state is retained and processed further; as $f_t$ approaches 0, it is driven to zero, i.e., forgotten. Similarly, the output gate $o_t \in \mathbb{R}^h$ controls the value of the internal state using the hyperbolic tangent activation function and generates the output at time instant $t$ [53,54].

Fig. 6. Basic unit of LSTM.

5. Results

In this biomedical signal analysis study, a deep learning algorithm is employed because its automatic feature learning capability makes it fast and accurate. The proposed network starts with the sequence input layer, followed by two bidirectional LSTM layers that can learn long-term dependencies between the time steps of the input data in both the forward and backward directions. To predict the label of unseen emotional EEG data, the network finishes with the fully connected layer, the softmax layer, and the classification output layer. The proposed network processes the input data and extracts the significant information that captures the nonlinear dynamics of each studied EEG signal. The tanh function is used as the state activation function and the sigmoid as the gate activation function. The LSTM layer learns the input weights, recurrent weights, and biases that maximize the probability of correct classification. The size of the sequence input layer is equal to the number of normalized ToC attributes. The first bidirectional LSTM block takes the initial state of the network and the first time step of the sequence, $x_1$, and computes the first output $h_1$ and the updated cell state $y_1$. Similarly, at time step $t$, the block takes the current state of the network $(y_{t-1}, h_{t-1})$ and the next element of the sequence, $x_t$, to compute the output $h_t$ and the updated cell state $y_t$. The second bidirectional LSTM layer, which has fewer units, follows the same process. The fully connected layer combines the labeled output of the second LSTM layer.

The number of labeled classes specifies the size of the fully connected layer; in this case, the size is four, the number of considered emotion classes. The softmax classifier is used to classify the labeled data into four classes. The network is trained with the following settings: the maximum number of epochs is 100, which allows the network to make 100 passes through the training data; the initial learning rate is 0.001; and the mini-batch size is 50. The normalized ToC attribute signal is broken into smaller pieces so that the machine does not run out of memory by processing too much data at one time. The normalized ToC data is divided into training (70%) and testing (30%) parts. Fig. 7 depicts the variation of the training accuracy with each iteration, while the training cross-entropy loss is displayed in Fig. 8. It can be noticed that the training accuracy stays close to 90% and the loss close to 0.2. In the training accuracy and loss figures, two color shades are observed. The light color (blue in the accuracy curve and orange in the loss curve) shows the change at every iteration, while the dark color represents the smoothed variation. The cross-entropy loss function is used during the simulation.

5.1. Performance measures

The performance of the proposed emotion classification algorithm has been validated with various measures, i.e., accuracy (Acc), sensitivity (Sen), specificity (Spe), positive predictive value (PPV), negative predictive value (NPV), kappa value ($\kappa$), random accuracy (RA), and Matthews correlation coefficient (MCC). The $\kappa$ value measures the amount of agreement, i.e., inter-rater reliability, while the MCC measures the quality of the intra-class correlation for imbalanced data. These measures are defined in terms of the confusion matrix parameters: true positive (TP), true negative (TN), false positive (FP), and false negative (FN).

$$\mathrm{Sen} = \frac{TP}{TP + FN} \qquad (9)$$

$$\mathrm{Spe} = \frac{TN}{TN + FP} \qquad (10)$$

$$\mathrm{PPV} = \frac{TP}{TP + FP} \qquad (11)$$

$$\mathrm{NPV} = \frac{TN}{TN + FN} \qquad (12)$$

$$\mathrm{Acc} = \frac{TP + TN}{TP + FP + TN + FN} \qquad (13)$$

$$\kappa = \frac{\mathrm{Acc} - \mathrm{RA}}{1 - \mathrm{RA}} \qquad (14)$$

$$\mathrm{RA} = \frac{(TN + FP) \times (TN + FN) + (FN + TP) \times (FP + TP)}{(TP + FP + TN + FN)^2}$$

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP) \times (FN + TP) \times (FP + TN) \times (TN + FN)}} \qquad (15)$$
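As an illustration of Eqs. (9)–(15), the following Python sketch evaluates the measures for a two-class confusion matrix. The counts are hypothetical and are not results from the paper, the function name `binary_metrics` is illustrative, and degenerate matrices (an empty row or column) are not handled.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Return the measures of Eqs. (9)-(15) as fractions (kappa and MCC may be negative)."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total
    # Random accuracy: agreement expected by chance from the row and column totals.
    ra = ((tn + fp) * (tn + fn) + (fn + tp) * (fp + tp)) / total ** 2
    return {
        "Sen": tp / (tp + fn),
        "Spe": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "Acc": acc,
        "RA": ra,
        "kappa": (acc - ra) / (1.0 - ra),
        "MCC": (tp * tn - fp * fn)
               / math.sqrt((tp + fp) * (fn + tp) * (fp + tn) * (tn + fn)),
    }

# Hypothetical counts chosen only to make the arithmetic easy to follow.
metrics = binary_metrics(tp=40, tn=45, fp=5, fn=10)
```

For these counts the sketch gives Acc = 0.85, RA = 0.5, and κ = 0.7: the classifier is right 85% of the time, half of that agreement would be expected by chance, and κ rescales the remainder.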

The confusion matrices of two-class and three-class emotion recognition are depicted in Fig. 9. Table 2 lists the measures achieved with the proposed algorithm. In the case of four-class emotion classification, each individual class has its own parameters that discriminate it from the remaining classes. The proposed algorithm achieved a classification accuracy of 82.01% for four classes (HaHv/LaLv/LaHv/HaLv), and 85.21% and 84.16% for the two-class problems (Ha/La and Hv/Lv), respectively, with the ten-fold cross-validation technique.

Fig. 7. Accuracy with each iteration. (For interpretation of the references to color in the text, the reader is referred to the web version of this article.)

Fig. 8. Estimated loss variation with each iteration. (For interpretation of the references to color in the text, the reader is referred to the web version of this article.)

Fig. 9. Confusion matrix of four-classes classification: (a) Two-classes, (b) four-classes.

Table 2. Measured performance parameters.

Measurements           Acc     Sen     Spe     PPV     NPV     RA       κ        MCC
Ha/La                  85.21   87.57   82.25   85.62   84.56   0.5056   0.7070   0.6635
Hv/Lv                  84.16   91.64   76.50   79.89   89.98   0.5008   0.7829   0.7009
HaHv/LaHv/LaLv/HaLv    82.01   83.91   93.99   83.91   93.99   0.6041   0.7790   0.7790
                               89.93   93.08   79.76   96.82   0.6269   0.7949   0.7973
                               71.99   96.29   84.58   92.40   0.6749   0.7213   0.7250
                               81.54   92.63   80.62   93.01   0.6010   0.7388   0.7389

6. Discussion

The proposed algorithm attains better classification accuracy than the other existing algorithms listed in Table 3. The literature shows that an HOS-based algorithm has already been used to recognize human emotions [41]. A set of features such as the mean of the bispectrum, the bicoherence, and Hinich's tests for Gaussianity and linearity were measured directly from each emotion-labeled EEG signal. The authors achieved 82% binary classification accuracy with these labeled attributes when the SVM classifier was used with radial basis function (RBF) kernels. Their algorithm is feature-dependent, and the measured features may not be able to reveal the dynamics of the EEG signals. In our approach, an HOS-based feature-independent algorithm is proposed that provides enhanced classification accuracy. Instead of a few selected HOS parameters, the whole HOS-transformed feature matrix is used to classify the labeled human emotions. The deep learning algorithm automatically extracts the relevant long-term dependency features from the HOS feature matrix, which capture the underlying signal variations corresponding to emotion generation.

A hybrid deep learning method for automated human emotion recognition is proposed in [32]. The authors combined a CNN with a recurrent neural network (RNN) based on the LSTM learning method for automatic emotion discrimination from multi-channel EEG signals. They proposed EEG multidimensional feature images (MFIs), which are 9 × 9-dimensional feature matrices of the power spectral density (PSD) of the EEG signals. The feature matrices of these MFIs are the input to the convolutional and LSTM recurrent neural networks (CLRNN) to recognize human emotions. They reported 75.21% classification accuracy for four emotion classes with 32 volunteers. Generally, the CNN is used in image processing to extract structural information from images. The MFIs are pseudo-images that do not have element-wise correlation like natural images. Natural image pixels are highly correlated, and they respond to various types of structural filters, which is not feasible with the MFIs.


Table 3 Comparison with existing emotion recognition techniques.

| Authors (year) | Methods | Classifier | Classes | No. of subjects | Accuracy (%) |
|---|---|---|---|---|---|
| Hosseini et al. [41] | HOS | SVM | 2 | 15 | 82 |
| Li et al. [32] | CLRNN | – | 4 | 32 | 75.21 |
| Gupta et al. [28] | FAWT | Random forest | 4 | 32 | 71.43 |
| | | | 2 (Ha/La) | 32 | 79.95 |
| | | | 2 (Hv/Lv) | 32 | 79.99 |
| Murugappan [24] | DWT | k-NN | 2 | – | 71.3 |
| Nakisa et al. [46] | Time, frequency, and TF features | PNN | 4 | 32 | 67.47 ± 3.38 |
| Yin et al. [11] | Time and frequency features | LS-SVM | 2 | – | 78 |
| Proposed method (DEAP dataset) | HOS+LSTM | Softmax | 4 | 32 | 82.01 |
| | | | 2 (Ha/La) | 32 | 85.21 |
| | | | 2 (Hv/Lv) | 32 | 84.16 |
| Proposed method (SEED dataset) | HOS+LSTM | Softmax | 3 | 15 | 90.81 |

Gupta et al. [28] employed the FAWT to recognize human emotions based on EEG signal classification. The studied EEG signals are decomposed into several levels of decomposition, and the information potential is measured as a feature from each decomposed sub-band. They obtained 71.43% classification accuracy for four emotion classes and 79.99% and 79.95% accuracy for the two binary classifications, respectively, with the random forest classifier. Murugappan et al. [24] computed the power spectral density from the DWT coefficients. They obtained 71.3% classification accuracy with the k-NN classifier in discriminating two valence emotion classes. That algorithm suffers from the scale variance shift problem, which reduces the resolution at high frequencies. Nakisa et al. [46] analyzed the EEG signals in the time, frequency, and TF domains and measured various features from each domain. They employed various evolutionary algorithms, such as differential evolution and the genetic algorithm, to select the significant attributes from the mixed feature matrix. They achieved 67.47 ± 3.38% classification accuracy in discriminating four emotion classes with the probabilistic neural network (PNN) classifier. In the same vein, Yin et al. [11] computed sixteen features from the raw EEG signal in the time and frequency domains. The relevant features are selected by the transfer recursive feature elimination algorithm and classified with the least squares SVM (LS-SVM) classifier. They obtained 78% classification accuracy for binary classification. The above-defined algorithms are feature-dependent, which gives rise to feature sustainability and computational complexity issues. The classifier performance may be reduced by processing irrelevant features that are not able to track the dynamics of the studied signal.

The proposed algorithm is also applied to the web-available SEED emotion dataset. It is a collection of EEG signals labeled with three emotions (positive, neutral, and negative) from 15 experiments. Each experiment is repeated thrice at an interval of one week. Hence, the database contains the EEG signals of a total of 45 experiments. Initially, the database is manually divided into two parts, i.e., training and testing parts. The training data contain the EEG signals of the first two repetitions of every experiment (i.e., the EEG signals of 30 experiments), while the EEG signals of the third repetition of every experiment (i.e., the EEG signals of 15 experiments) are used as testing data. These signals are explored by the ToC into higher dimensional space and passed through the steps of the proposed algorithm. We achieved 90.81% discrimination accuracy in classifying the three emotion labels, i.e., positive, neutral, and negative emotions.

In this paper, a nonlinear HOS-based method is proposed for the classification of multi-channel emotion-labeled EEG data. Instead of manually selected feature algorithms, an automated feature learning algorithm is proposed that captures the nonlinear dynamics of the non-stationary EEG signals and reduces feature dependency. The proposed method shows its effectiveness in terms of classification accuracy along with all 40 channels for each of the 32 subjects considered. The algorithm has to evaluate the ToC of a huge dataset, and the parameter constraints of the deep learning algorithm make the process slightly slow.

7. Conclusion

The LSTM-based deep learning algorithm is used for the emotion classification of EEG signals. The EEG signal database is labeled according to the emotional criteria attached to the database benchmark. The emotion-labeled EEG signals are pre-processed and split into the rhythms present in the studied signals. The nonlinear ToC maps each rhythm of the EEG into a higher dimensional space. To reduce the redundant information that symmetries introduce into the ToC coefficients, the PSO-based feature reduction technique is used to optimize the dimension of the data matrix. The resultant information is input to the LSTM-based deep learning network to classify the labeled ToC coefficients using the softmax classifier. The results show that the proposed algorithm achieves 82.01% classification accuracy with the PSO-based data reduction technique. The proposed algorithm is fully automated and attains state-of-the-art classification accuracy.

Authors' contributions

Mr. R. Sharma (corresponding author) generated the idea and wrote the manuscript of the present article. Prof. R.B. Pachori and Prof. P. Sircar provided guidance and participated in valuable discussions leading to the mathematical formulations, methodology, simulations, and corrections for improving the manuscript. Mr. Sharma has tried his best to act upon them and incorporated all of them into the initial and revised manuscripts.

Conflicts of interest

The authors declare no conflicts of interest.

References

[1] M. Cabanac, What is emotion, Behav. Process. 60 (2) (2002) 69–82.
[2] P.J. Lang, M.M. Bradley, B.N. Cuthbert, Emotion, motivation, and anxiety: brain mechanisms and psychophysiology, Biol. Psychiatry 44 (12) (1998) 1248–1263.
[3] G.G. Knyazev, J.Y. Slobodskoj-Plusnin, A.V. Bocharov, Gender differences in implicit and explicit processing of emotional facial expressions as revealed by event-related theta synchronization, Emotion 10 (5) (2010) 678.
[4] M. Cabanac, Physiological role of pleasure, Science 173 (4002) (1971) 1103–1107.


[5] D.S. Bassett, O. Sporns, Network neuroscience, Nat. Neurosci. 20 (3) (2017) 353.
[6] Z. Khalili, M. Moradi, Emotion detection using brain and peripheral signals, 2008 Cairo International Biomedical Engineering Conference (2008) 1–4.
[7] V. Rozgić, S.N. Vitaladevuni, R. Prasad, Robust EEG emotion classification using segment level decision fusion, IEEE International Conference on Acoustics, Speech and Signal Processing (2013) 1286–1290.
[8] W.-L. Zheng, B.-N. Dong, B.-L. Lu, Multimodal emotion recognition using EEG and eye tracking data, 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (2014) 5040–5043.
[9] Y.-Y. Lee, S. Hsieh, Classifying different emotional states by means of EEG-based functional connectivity patterns, PLoS One 9 (4) (2014) e95415.
[10] C.T. Yuen, W. San San, T.C. Seong, M. Rizon, Classification of human emotions from EEG signals using statistical features and neural network, Int. J. Integr. Eng. 1 (3) (2009).
[11] Z. Yin, Y. Wang, L. Liu, W. Zhang, J. Zhang, Cross-subject EEG feature selection for emotion recognition using transfer recursive feature elimination, Front. Neurorobot. 11 (2017) 19.
[12] E. Kroupi, A. Yazdani, T. Ebrahimi, EEG correlates of different emotional states elicited during watching music videos, in: Affective Computing and Intelligent Interaction, Springer, 2011, pp. 457–466.
[13] M. Li, B.-L. Lu, Emotion classification based on gamma-band EEG, Annual International Conference of the IEEE Engineering in Medicine and Biology Society (2009) 1223–1226.
[14] Y. Liu, O. Sourina, EEG-based dominance level recognition for emotion-enabled interaction, IEEE International Conference on Multimedia and Expo (ICME) (2012) 1039–1044.
[15] S.K. Hadjidimitriou, L.J. Hadjileontiadis, Toward an EEG-based recognition of music liking using time-frequency analysis, IEEE Trans. Biomed. Eng. 59 (12) (2012) 3498–3510.
[16] K. Ansari-Asl, G. Chanel, T. Pun, A channel selection method for EEG classification in emotion assessment based on synchronization likelihood, 15th European Signal Processing Conference (2007) 1241–1245.
[17] R. Horlings, D. Datcu, L.J. Rothkrantz, Emotion recognition using brain activity, Proceedings of the 9th International Conference on Computer Systems and Technologies and Workshop for PhD students in Computing (2008), 6.
[18] N. Thammasan, K. Moriyama, K.-I. Fukui, M. Numao, Continuous music-emotion recognition based on electroencephalogram, IEICE Trans. Inf. Syst. 99 (4) (2016) 1234–1241.
[19] W.-L. Zheng, J.-Y. Zhu, B.-L. Lu, Identifying stable patterns over time for emotion recognition from EEG, IEEE Trans. Affect. Comput. 10 (3) (2017) 417–429.
[20] S. Paul, A. Mazumder, P. Ghosh, D. Tibarewala, G. Vimalarani, EEG based emotion recognition system using MFDFA as feature extractor, International Conference on Robotics, Automation, Control and Embedded Systems (RACE) (2015) 1–5.
[21] W.-L. Zheng, B.-L. Lu, Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks, IEEE Trans. Auton. Mental Dev. 7 (3) (2015) 162–175.
[22] C.A. Frantzidis, C. Bratsas, C.L. Papadelis, E. Konstantinidis, C. Pappas, P.D. Bamidis, Toward emotion aware computing: an integrated approach using multichannel neurophysiological recordings and affective visual stimuli, IEEE Trans. Inf. Technol. Biomed. 14 (3) (2010) 589–597.
[23] D. Nie, X.-W. Wang, L.-C. Shi, B.-L. Lu, EEG-based emotion recognition during watching movies, 5th International IEEE/EMBS Conference on Neural Engineering (NER) (2011) 667–670.
[24] M. Murugappan, N. Ramachandran, Y. Sazali, Classification of human emotion from EEG using discrete wavelet transform, J. Biomed. Sci. Eng. 3 (04) (2010) 390.
[25] P.C. Petrantonakis, L.J. Hadjileontiadis, Emotion recognition from brain signals using hybrid adaptive filtering and higher order crossings analysis, IEEE Trans. Affect. Comput. 1 (2) (2010) 81–97.
[26] M. Murugappan, M. Rizon, R. Nagarajan, S. Yaacob, Inferring of human emotional states using multichannel EEG, Eur. J. Sci. Res. 48 (2) (2010) 281–299.
[27] V. Bajaj, R.B. Pachori, Detection of human emotions using features based on the multiwavelet transform of EEG signals, in: Brain–Computer Interfaces, Springer, 2015, pp. 215–240.
[28] V. Gupta, M.D. Chopda, R.B. Pachori, Cross-subject emotion recognition using flexible analytic wavelet transform from EEG signals, IEEE Sens. J. 19 (6) (2018) 2266–2274.

[29] P.C. Petrantonakis, L.J. Hadjileontiadis, Emotion recognition from EEG using higher order crossings, IEEE Trans. Inf. Technol. Biomed. 14 (2) (2010) 186–197.
[30] L. Xin, Q. Xiaoying, S. Xiaoqi, X. Jiali, F. Mengdi, K. Jiannan, An improved multi-scale entropy algorithm in emotion EEG features extraction, J. Med. Imaging Health Inform. 7 (2) (2017) 436–439.
[31] X. Li, D. Song, P. Zhang, G. Yu, Y. Hou, B. Hu, Emotion recognition from multi-channel EEG data through convolutional recurrent neural network, IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2016) 352–359.
[32] Y. Li, J. Huang, H. Zhou, N. Zhong, Human emotion recognition with electroencephalographic multidimensional features by hybrid deep neural networks, Appl. Sci. 7 (10) (2017) 1060.
[33] S. Koelstra, C. Muhl, M. Soleymani, J.-S. Lee, A. Yazdani, T. Ebrahimi, T. Pun, A. Nijholt, I. Patras, DEAP: a database for emotion analysis; using physiological signals, IEEE Trans. Affect. Comput. 3 (1) (2012) 18–31.
[34] G.H. Klem, H.O. Lüders, H. Jasper, C. Elger, et al., The ten-twenty electrode system of the international federation, Electroencephalogr. Clin. Neurophysiol. 52 (3) (1999) 3–6.
[35] W.-L. Zheng, B.-L. Lu, Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks, IEEE Trans. Auton. Mental Dev. 7 (3) (2015) 162–175, http://dx.doi.org/10.1109/TAMD.2015.2431497.
[36] R.-N. Duan, J.-Y. Zhu, B.-L. Lu, Differential entropy feature for EEG-based emotion classification, 6th International IEEE/EMBS Conference on Neural Engineering (NER) (2013) 81–84.
[37] S. Mallat, A Wavelet Tour of Signal Processing, Elsevier, 1999.
[38] G. Meurant, Wavelets: A Tutorial in Theory and Applications, vol. 2, Academic Press, 2012.
[39] R. Sharma, P. Sircar, R.B. Pachori, A new technique for classification of focal and nonfocal EEG signals using higher-order spectra, J. Mech. Med. Biol. 19 (01) (2019) 1940010.
[40] R. Sharma, P. Sircar, R.B. Pachori, S.V. Bhandary, U.R. Acharya, Automated glaucoma detection using center slice of higher order statistics, J. Mech. Med. Biol. 19 (01) (2019) 1940011.
[41] S.A. Hosseini, M.A. Khalilzadeh, M.B. Naghibi-Sistani, V. Niazmand, Higher order spectra analysis of EEG signals in emotional stress states, Second International Conference on Information Technology and Computer Science (ITCS) (2010) 60–63.
[42] R. Sharma, P. Sircar, R.B. Pachori, Computer-aided diagnosis of epilepsy using bispectrum of EEG signals, in: Application of Biomedical Engineering in Neuroscience, Springer, 2019, pp. 197–220.
[43] D.R. Brillinger, An introduction to polyspectra, Ann. Math. Stat. (1965) 1351–1374.
[44] C.L. Nikias, M.R. Raghuveer, Bispectrum estimation: a digital signal processing framework, Proc. IEEE 75 (7) (1987) 869–891.
[45] J. Fonollosa, C. Nikias, Wigner higher order moment spectra: definition, properties, computation and application to transient signal analysis, IEEE Trans. Signal Process. 41 (1) (1993) 245.
[46] B. Nakisa, M.N. Rastgoo, D. Tjondronegoro, V. Chandran, Evolutionary computation algorithms for feature selection of EEG-based emotion recognition using mobile sensors, Expert Syst. Appl. 93 (2018) 143–155.
[47] J. Kennedy, R. Eberhart, Particle swarm optimization, Proceedings of IEEE International Conference on Neural Networks (ICNN'95) (1995).
[48] R. Poli, Analysis of the publications on the applications of particle swarm optimisation, J. Artif. Evol. Appl. (2008).
[49] F. Van Den Bergh, et al., An Analysis of Particle Swarm Optimizers, Ph.D. Thesis, University of Pretoria South Africa, 2001.
[50] M.H. Dunham, Data Mining: Introductory and Advanced Topics, Prentice Hall PTR, Upper Saddle River, NJ, 2002.
[51] Y. Bengio, P. Simard, P. Frasconi, Learning long-term dependencies with gradient descent is difficult, IEEE Trans. Neural Netw. 5 (2) (1994) 157–166.
[52] Colah, Understanding LSTM Networks, 2015 http://colah.github.io/posts/2015-08-Understanding-LSTMs/.
[53] F.A. Gers, N.N. Schraudolph, J. Schmidhuber, Learning precise timing with LSTM recurrent networks, J. Mach. Learn. Res. 3 (2002) 115–143.
[54] Z.C. Lipton, J. Berkowitz, C. Elkan, A Critical Review of Recurrent Neural Networks for Sequence Learning, 2015 (arXiv preprint) arXiv:1506.00019.