A generalized least-squares approach regularized with graph embedding for dimensionality reduction


Pattern Recognition 98 (2019) 107023


Xiang-Jun Shen a,∗, Si-Xing Liu a, Bing-Kun Bao b, Chun-Hong Pan c, Zheng-Jun Zha d,∗, Jianping Fan e

a School of Computer Science and Communication Engineering, JiangSu University, JiangSu, 212013, China
b Nanjing University of Posts and Telecommunications, JiangSu, China
c National Lab of Pattern Recognition, Institute of Automation, CAS, Beijing 100190, China
d School of Information Science and Technology, University of Science and Technology of China, Hefei, China
e Department of Computer Science, University of North Carolina at Charlotte, NC 28223, USA

Article history: Received 19 April 2017; Revised 9 August 2019; Accepted 26 August 2019; Available online 5 September 2019

Keywords: Dimensionality reduction; Graph embedding; Subspace learning; Least-squares

Abstract

In current graph embedding methods, low-dimensional projections are obtained by preserving either the global or the local geometrical structure of the data. In this paper, the PCA (Principal Component Analysis) idea of minimizing least-squares reconstruction errors is regularized with graph embedding, unifying various local manifold embedding methods within a generalized framework that preserves both global and local low-dimensional subspaces. Different from the well-known PCA method, our proposed generalized least-squares approach considers data distributions together with an instance penalty on each data point. In this way, PCA is viewed as a special instance of the proposed generalized least-squares framework for preserving global projections. By applying a graph embedding regularization, we obtain projections that preserve both the intrinsic geometrical structure and the global structure of the data. Experimental results on a variety of face and handwritten digit recognition datasets show that the proposed method yields lower-dimensional subspaces and higher classification accuracy than state-of-the-art graph embedding methods. © 2019 Elsevier Ltd. All rights reserved.

1. Introduction

Manifold learning has been applied in many machine learning and pattern recognition tasks, such as, but not limited to, matrix decomposition [1], linear discriminant analysis [2], gait recognition [3], and image retrieval [4]. A manifold structure is modeled by a nearest-neighbor graph, so that each data point and its neighbors lie close to a locally linear patch of the manifold, which preserves the local structure of the data space [5,6]. Since the graph Laplacian operator is closely connected to the Laplace-Beltrami operator on an underlying Riemannian manifold, the graph Laplacian has been successfully applied in manifold-motivated studies for measuring the smoothness of loss functions [7]. The so-called "curse of dimensionality" is thereby alleviated by mapping the high-dimensional data to a compact low-dimensional subspace. In recent research, a number of methods have been proposed that can be classified as Dimensionality Reduction (DR) technologies. These methods can further be classified into two subcategories.



Corresponding authors. E-mail addresses: [email protected] (X.-J. Shen), [email protected] (Z.-J. Zha).

https://doi.org/10.1016/j.patcog.2019.107023

One subcategory consists of linear manifold learning techniques, with representative methods such as Principal Component Analysis (PCA) [8], Linear Discriminant Analysis (LDA) [9], Locality Preserving Projections (LPP) [10,11], Neighborhood Preserving Embedding (NPE) [12], Neighborhood Preserving Projections (NPP) [13], Regularized Coplanar Discriminant Analysis (RCDA) [14], and Sparsity Preserving Projection (SPP) [15]. Among them, PCA is the most widely used technique and has been applied throughout science and engineering. It seeks a global low-dimensional subspace of the data by minimizing the reconstruction error between the projected and original data points. The second subcategory consists of nonlinear manifold learning techniques, including Locally Linear Embedding (LLE) [16], ISOMAP [17], Laplacian Eigenmaps (LE) [18], Hessian Eigenmaps (HE) [19] and Local Tangent Space Alignment (LTSA) [20], which are among the most popular nonlinear manifold learning methods. For example, LLE not only uses linear coefficients to represent the local geometry, but also seeks a low-dimensional subspace for reconstruction. Kernel tricks have also been widely used in this subcategory by kernelizing the existing linear methods, as in Kernel PCA (KPCA) [21,22], Kernel LDA (KLDA) [23,24], Kernel LPP (KLPP) [25], and Kernel NPE (KNPE) [26].


It is worth noting that all the aforementioned linear and nonlinear DR algorithms can be unified under a graph embedding framework [7] or a patch alignment framework [27]. As pointed out by Yan et al. [7], algorithms such as PCA, LDA, LPP, LLE, LE and ISOMAP, as well as their kernel and tensor extensions, can all be reformulated within a common framework. The differences among dimensionality reduction methods in graph embedding lie in the design of their intrinsic and penalty graphs and in the types of embedding. Additionally, sparse coding [28,29] and deep learning [30–32] techniques have been introduced into graph embedding to learn optimal low-dimensional subspaces.

Notwithstanding the progress that graph embedding has achieved in dimensionality reduction, several concerns remain about how to improve such DR techniques. (1) There are still issues in adding orthogonality constraints to graph embedding methods. As Kokiopoulou and Saad [33] pointed out, it is more effective to impose an orthogonality constraint on the projection directions for preserving the intrinsic geometrical structure of data. For example, Cai et al. [34], in their Orthogonal LPP, added the orthogonality constraint to produce orthogonal basis functions and found that orthogonality constraints preserve more locality and discrimination structure than LPP. In the same vein, Liu et al. [35] proposed orthogonal NPE (ONPE), which also requires its basis functions to be orthogonal. (2) Another concern is achieving DR while keeping a balance between the local and global structures of data. Graph embedding methods construct graphs from local patches and rely on pair-wise distances, which makes them sensitive to noise in the data. Learning to keep a global structure in the graph is therefore a key problem. Low-Rank Representation (LRR) methods [36–39] have been used for learning both global and local structures of data. These methods are developed from PCA by introducing the nuclear norm to recover the lowest-rank representation of the data. In [40], low-rank representation was combined with graph embedding, referred to as Low-Rank Preserving Projections (LRPP), to keep the global structure of data during dimensionality reduction. In addition to LRR methods, graph ranking [41] was proposed to keep global and local structures among different relevance-degree sets. Also, to overcome the lack of global structure in single-view graph embedding methods, multi-view graph embedding [7,42–44] was introduced.

Based on the above discussions, we develop a different method, which keeps both the global and local structures of data and makes use of an orthogonality constraint to retain discriminative power in the low-dimensional subspace. Taking a critical look at the original idea of PCA and at the low-rank LRR methods, they all use a least-squares reconstruction error [45] to keep the global structure of data. In our proposed method, we develop a PCA-like idea in graph embedding by imposing the orthogonality constraint that projections must be orthogonal with respect to the data, while taking the data distribution and local structures of the data into consideration. Moreover, an instance penalty on each data point is introduced into our model to scale the importance of every data point. In this way, a generalized PCA framework is proposed and regularized with a graph embedding method for dimensionality reduction. The main characteristics of our proposed framework are outlined below.

(1) The technique proposed in this paper seeks a generalized orthogonality constraint based on the PCA idea of minimizing least-squares reconstruction errors. It imposes orthogonality with respect to the data while introducing a penalty factor to scale the influence of each data point. It is more effective than traditional DR methods, since it derives projection directions that preserve both the intrinsic geometry and the global structure of the data.

(2) The proposed generalized least-squares approach shares the advantages of both dimensionality reduction and least-squares reconstruction. By regularizing the reconstruction error with graph embedding methods, the "out-of-sample" problem [46] in graph embedding is naturally resolved. Moreover, our proposed method can achieve a balance between keeping the global structure through data reconstruction and the local structure through graph embedding.

(3) Our proposed framework can easily be extended to supervised and semi-supervised scenarios on existing DR frameworks. Extensive experiments on a variety of face and handwritten digit recognition datasets show that the proposed framework yields lower-dimensional subspaces and higher classification accuracy than other DR methods such as LPP, LSDA (Locality Sensitive Discriminant Analysis) [9], NPE and SPP.

The rest of this paper is structured as follows. Related research on linear DR is presented in Section 2. Section 3 formulates our proposed generalized least-squares approach, followed by experimental results in Section 4. Finally, we draw conclusions and discuss future work in Section 5.

2. Related works on linear dimensionality reduction

In this paper, we mainly focus on linear approaches, owing to their simplicity, mathematical tractability, efficiency and effectiveness. Nevertheless, our proposed framework can easily be extended to kernel and tensor versions, in the same vein as asserted by Yan et al. [46]. To keep GLSRGE concise and easy to follow, we review only PCA, LPP, LSDA, NPE and SPP in this section. Table 1 lists some important notations used in the rest of this paper.

Table 1. Some notations used in this paper.

m, n       Data dimensionality and the number of training points.
X          Data matrix of size m × n.
x_i        An m-dimensional column vector representing a data point.
W          Projection matrix.
H (LPP)    Weight coefficient matrix.
L          Laplacian matrix.
N_w        Subset of training samples containing the neighbors with the same label as x_i.
N_b        Subset of training samples containing the neighbors with labels different from x_i.
W_w        Weight matrix of the within-class graph G_w.
W_b        Weight matrix of the between-class graph G_b.
S          Sparse coefficient matrix.
N (NPE)    Weight matrix.
C          Instance penalty matrix.

2.1. Principal component analysis (PCA)

PCA is a very popular technique [8] which seeks a global low-dimensional subspace of the data by minimizing the reconstruction error with optimal orthogonal projections under the least-squares framework. The objective function of PCA is defined as follows:

\min_{w^T w = 1} \sum_{i=1}^{n} \| x_i - w w^T x_i \|^2 \quad (1)

Here \{w_i\}_{i=1}^{d} is a set of orthogonal directions in \mathbb{R}^m, and the data points \{x_i\}_{i=1}^{n} are zero-mean, m-dimensional column vectors. Eq. (1) can be rewritten as

\max_{W^T W = I} \operatorname{tr}(W^T \Sigma W) \quad (2)

Here \Sigma is the sample covariance matrix. The eigenvectors of \Sigma corresponding to the largest d eigenvalues span the optimal subspace of PCA. Defining W = [w_1, w_2, \ldots, w_d] \in \mathbb{R}^{m \times d}, we obtain the d projections Y = W^T X.
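To make the least-squares view concrete, the following minimal Python sketch (our own illustration using NumPy, not code from the paper; the function name and arguments are ours) computes the PCA projections of Eq. (2) by eigendecomposition of the sample covariance matrix.

```python
import numpy as np

def pca_projections(X, d):
    """X: m x n data matrix with samples as columns; returns W (m x d) and Y = W^T X."""
    Xc = X - X.mean(axis=1, keepdims=True)           # zero-mean data, as assumed by Eq. (1)
    Sigma = Xc @ Xc.T / Xc.shape[1]                  # sample covariance matrix (m x m)
    eigvals, eigvecs = np.linalg.eigh(Sigma)         # eigenvalues in ascending order
    W = eigvecs[:, np.argsort(eigvals)[::-1][:d]]    # d leading eigenvectors, Eq. (2)
    return W, W.T @ Xc
```

Minimizing the reconstruction error of Eq. (1) and maximizing the projected variance in Eq. (2) coincide because the total variance of the centered data is fixed.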

2.2. Locality preserving projections (LPP)

Similar to PCA, LPP [10] is an unsupervised graph embedding approach. The objective function of LPP is defined as follows:

\min_{w} \frac{1}{2} \sum_{i,j=1}^{n} (w^T x_i - w^T x_j)^2 h_{ij} = w^T X (D - H) X^T w = w^T X L X^T w \quad (3)

Here H = (h_{ij})_{n \times n} is a similarity matrix with h_{ij} = \exp(-\| x_i - x_j \|^2 / t) if x_i and x_j are neighbors and h_{ij} = 0 otherwise, where the weight coefficient h_{ij} is controlled by a parameter t. D is a diagonal matrix whose entries are the row sums of H, and L = D - H is the Laplacian matrix. By imposing the constraint w^T X D X^T w = 1, LPP reduces to

\min_{w} \frac{w^T X L X^T w}{w^T X D X^T w} \quad (4)

Comparatively, PCA seeks a global low-dimensional subspace by minimizing the reconstruction error of the data, while LPP reconstructs each data point from its neighbors (or from data points within some fixed radius) to keep the local geometry.

2.3. Locality sensitive discriminant analysis (LSDA)

Linear discriminant analysis (LDA) is a popular dimensionality reduction method for studying the class relationship between data points. A major disadvantage of LDA is that it fails to discover the local geometric structure of the data manifold. Locality Sensitive Discriminant Analysis (LSDA) [9] was therefore introduced to maximize the margin between data points from different classes while preserving the local structure of the data. The objective function of LSDA is defined as

\min_{w} (1 - \alpha) \sum_{i,j=1}^{n} (w^T x_i - w^T x_j)^2 W_{w,ij} - \alpha \sum_{i,j=1}^{n} (w^T x_i - w^T x_j)^2 W_{b,ij} = (1 - \alpha)\, w^T X (D_w - W_w) X^T w - \alpha\, w^T X (D_b - W_b) X^T w \quad (5)

Here

W_{b,ij} = 1 if x_i \in N_b(x_j) or x_j \in N_b(x_i), and 0 otherwise;
W_{w,ij} = 1 if x_i \in N_w(x_j) or x_j \in N_w(x_i), and 0 otherwise.

N_w(x_i) is the set of neighbors having the same label as x_i, while N_b(x_i) is the set of neighbors having different labels. The matrix D_w is diagonal, with entries given by the column sums of W_w, i.e. D_{w,ii} = \sum_j W_{w,ij}. Similarly, D_b is a diagonal matrix whose entries are the column sums of W_b, i.e. D_{b,ii} = \sum_j W_{b,ij}. Imposing the constraint w^T X D_w X^T w = 1, Eq. (5) reduces to the following model:

\max_{w} \; w^T X (\alpha L_b + (1 - \alpha) W_w) X^T w \quad \text{s.t.} \quad w^T X D_w X^T w = 1 \quad (6)

Here L_b = D_b - W_b.

2.4. Neighborhood preserving embedding (NPE)

Similar to LPP, NPE also aims at preserving the local neighborhood structure of the data. It minimizes the following local approximation error [12]:

\min_{w} \sum_{i=1}^{n} \Big( w^T x_i - \sum_{j=1}^{n} N_{ij} w^T x_j \Big)^2 \quad (7)

Here the x_j are the k neighbors of x_i, and the weights N_{ij} are the optimal solution of

\varepsilon(N) = \min \sum_{i=1}^{n} \Big\| x_i - \sum_{j=1}^{n} N_{ij} x_j \Big\|^2 \quad (8)

With the weights N obtained from Eq. (8), minimizing Eq. (7) reduces to

\min_{w} \frac{w^T X M X^T w}{w^T X X^T w} \quad (9)

Here M = (I - N)^T (I - N).

2.5. Sparsity preserving projection (SPP)

The main idea of SPP is to preserve the sparse reconstructive relationships of the data. The loss function of the sparse reconstruction error is defined as follows:

\min_{w} \sum_{i=1}^{n} (w^T x_i - w^T X s_i)^2 \quad (10)

Here s_i = [s_{i1}, \ldots, s_{i,i-1}, 0, s_{i,i+1}, \ldots, s_{in}]^T is an n-dimensional vector of sparse coefficients reconstructing x_i from X, with its ith element equal to zero (implying that x_i is removed from X). Eq. (10) is then equivalent to

\max_{w} \frac{w^T X S_{\beta} X^T w}{w^T X X^T w} \quad (11)

Here S_{\beta} = S + S^T - S^T S, and S is the matrix whose columns are the n sparse coefficient vectors \{s_i\}_{i=1}^{n}. From the above discussion, it is evident that SPP is also a local structure preserving method.
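Each of the criteria in Eqs. (4), (6), (9) and (11) is a (generalized) Rayleigh quotient, so the projection directions come from a generalized eigenvalue problem. As a minimal illustration (our own sketch, not the authors' code; the k-nearest-neighbor heat-kernel graph, the small ridge term and SciPy's solver are our assumptions), the following builds the LPP graph of Eq. (3) and solves Eq. (4):

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

def lpp(X, d, k=5, t=1.0):
    """X: m x n data matrix with samples as columns; returns the m x d projection matrix W."""
    n = X.shape[1]
    dist2 = cdist(X.T, X.T, "sqeuclidean")
    nbrs = np.argsort(dist2, axis=1)[:, 1:k + 1]        # k nearest neighbors of each sample
    H = np.zeros((n, n))
    for i in range(n):
        H[i, nbrs[i]] = np.exp(-dist2[i, nbrs[i]] / t)  # heat-kernel weights h_ij of Eq. (3)
    H = np.maximum(H, H.T)                              # symmetrize the adjacency
    D = np.diag(H.sum(axis=1))
    L = D - H                                           # graph Laplacian
    # Eq. (4): smallest generalized eigenvectors of X L X^T w = lambda X D X^T w.
    A, B = X @ L @ X.T, X @ D @ X.T
    _, vecs = eigh(A, B + 1e-6 * np.eye(B.shape[0]))    # ridge keeps B positive definite
    return vecs[:, :d]
```

NPE (Eq. (9)) uses the smallest eigenvectors of its quotient in the same way, while LSDA (Eq. (6)) and SPP (Eq. (11)) take the largest.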


3. Generalized least-squares regularization in graph embedding (GLSRGE)

Inspired by the original idea of PCA, which uses the least-squares reconstruction error [45] to keep the global structure of data, in this paper we expand this idea with the following proposed model:

\min_{p} \| X - X p p^T \|_F^2 \quad \text{s.t.} \quad p^T p = 1 \quad (12)

Here p = X^T w \in \mathbb{R}^n is a generalization of the orthogonal vector w \in \mathbb{R}^m that takes the data distribution into account in its orthogonal directions. In addition, we propose to include an instance penalty on each data point, which yields the following generalized least-squares framework:

\min_{p} \| X - X C^{1/2} p p^T C^{1/2} \|_F^2 \quad \text{s.t.} \quad p^T C p = 1 \quad (13)

The diagonal matrix C = \operatorname{diag}(c_1, \ldots, c_n) injects an instance penalty into each data point to adjust the local structure distribution within the training dataset. In this paper, we use the total sum of distances between a data point and the rest of the data points to measure the importance of that point within the training dataset: the greater the sum of distances from a point, the more likely the point is an outlier or a noisy data point. This instance penalty can be obtained from the discriminant graph construction of supervised or unsupervised local manifold embedding methods, such as LPP [10], LSDA [9], NPE [12] and SPP [15]. Fig. 1 illustrates the motivation for building the instance penalty c_i of a data point x_i. The triangular point x_i is an outlier, and its instance penalty c_i is built to represent the total sum of distances between x_i and the rest of the data points. Intuitively, the bigger the instance penalty c_i of a data point x_i, the higher the probability that it is an outlier in the local structure distribution, and the more its influence is scaled down. Under this penalty factor, the orthogonality constraint becomes (C^{1/2} p)^T (C^{1/2} p) = 1, and this local structure enters the global least-squares reconstruction term \| X - X C^{1/2} p p^T C^{1/2} \|_F^2.

Fig. 1. Illustration of the instance penalty c_i of an outlier x_i.

Meanwhile, when C is an identity matrix and p = w, our proposed generalized least-squares method degenerates to the PCA objective defined in Eq. (1). Furthermore, summarizing Eqs. (4)–(6) and (9), the existing manifold learning methods can be generalized into the following form:

\min_{w} \; w^T X G X^T w \quad \text{s.t.} \quad w^T X D X^T w = 1 \quad (14)

where G represents a generalized manifold learning graph matrix, as seen in SPP, LPP, NPE and LSDA, and D represents a generalized discriminant matrix. To show the connection between these manifold methods and PCA, it is worthwhile to point out the covariance matrix constructions in these graphs. Adding a Lagrange multiplier to Eq. (14) and setting the derivative to zero yields the eigenvalue problem

X L X^T w = \lambda\, X D X^T w \quad (15)

Multiplying both sides of Eq. (15) by (D^{1/2} X^T)(X D X^T)^{-1}, together with algebraic transformations, gives

(D^{1/2} X^T)(X D X^T)^{-1} (X L X^T) (X D X^T)^{-1} (X D^{1/2}) (D^{1/2} X^T) w = \lambda\, (D^{1/2} X^T) w \quad (16)

where (D^{1/2} X^T)(X D X^T)^{-1} (X L X^T) (X D X^T)^{-1} (X D^{1/2}) is a covariance-matrix construction. Finally, combining Eqs. (13) and (14), we propose this PCA-like method with graph embedding to achieve a trade-off between keeping the global and local structures of data, similarly to our previous work [47]. The method is formulated as follows:

\min_{p} \; p^T G p + \alpha \| X - X C^{1/2} p p^T C^{1/2} \|_F^2 \quad \text{s.t.} \quad p^T C p = 1 \quad (17)

Here α is a regularization parameter which controls the trade-off between the first (local manifold regularization) and the second (global reconstruction loss) terms. By applying this generalized least-squares method in graph embedding, we incorporate various graph embedding methods into a generalized form. Expanding the Frobenius norm under the constraint p^T C p = 1 reduces Eq. (17), up to an additive constant, to minimizing p^T (G - \alpha C^{1/2} X^T X C^{1/2}) p; forming the Lagrangian and setting its derivative to zero leads to the following generalized eigen-problem on p:

(G - \alpha\, C^{1/2} X^T X C^{1/2})\, p = \lambda\, C p \quad (18)

Finally, we obtain the projection vector w as the least-squares regression solution of X^T w = p:

w = (X X^T)^{-1} X p \quad (19)

The process of our proposed method is summarized in Algorithm 1.

Algorithm 1. Generalized least-squares approach regularized with graph embedding (GLSRGE).
1: Input: dataset X and regularization parameter α.
2: Construct the graph affinity matrix G from an existing manifold learning method, and construct the instance penalty matrix C.
3: Obtain p from Eq. (18).
4: Compute w from Eq. (19).
5: Output: the projection vector w.
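A minimal Python sketch of Algorithm 1 follows. It is our own illustration under stated assumptions rather than the authors' released code: the graph matrix G is assumed to be an n × n matrix such as the LPP Laplacian L = D − H from Section 2.2, the instance penalties c_i are built from total pairwise distances as described for Fig. 1 (their normalization is our choice), and the small ridge terms are added only for numerical stability.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

def glsrge(X, G, alpha, d):
    """X: m x n data matrix (columns are samples); G: n x n graph matrix from a manifold
    learning method; alpha: trade-off parameter of Eq. (17). Returns the m x d projection W."""
    # Instance penalty c_i: total sum of distances from x_i to all other points (Fig. 1).
    c = cdist(X.T, X.T).sum(axis=1)
    C = np.diag(c / c.sum())                          # penalty matrix C (normalization is ours)
    C_half = np.sqrt(C)                               # elementwise sqrt works since C is diagonal
    # Eq. (18): (G - alpha * C^{1/2} X^T X C^{1/2}) p = lambda * C p.
    A = G - alpha * C_half @ (X.T @ X) @ C_half
    A = (A + A.T) / 2                                 # symmetrize against round-off
    _, P = eigh(A, C + 1e-10 * np.eye(C.shape[0]))    # smallest eigenvectors minimize Eq. (17)
    P = P[:, :d]
    # Eq. (19): least-squares regression back to projection directions, w = (X X^T)^{-1} X p.
    W = np.linalg.solve(X @ X.T + 1e-6 * np.eye(X.shape[0]), X @ P)
    return W
```

For example, glsrge(X, D - H, 0.1, d) with the LPP affinity would play the role of the unsupervised LPP-regularized variant used in the experiments, under the above assumptions about how G and C are built.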

In the next section, extensive experiments are carried out on a variety of image collections to compare the classification performance and the dimensionality of the learned subspaces against other state-of-the-art DR methods, such as LPP, LSDA, NPE and SPP.


4. Experimental results

In this section, to evaluate the effectiveness of our proposed least-squares approach regularized with graph embedding in both unsupervised and supervised settings, we compare its performance with different graphs (SPP, LPP, NPE and LSDA) on several public datasets. We first compare the proposed model with its underlying graph embedding methods and with PCA on a toy dataset in Section 4.1. Then, to further demonstrate the advantages and robustness of our method on different datasets, we use three face recognition datasets, AR¹, Extended Yale B² and ORL³, as well as two handwritten digit recognition datasets, MNIST⁴ and USPS⁵. Their details are discussed in Sections 4.2 and 4.3, respectively.

Dataset sources:
1. http://www.cs.uiuc.edu/homes/dengcai2.
2. http://www.uk.research.att.com/facedatabase.html.
3. http://vision.uscd.edu/∼leekc/ExtYaleDatabase/ExtYaleB.html.
4. http://www.cad.zju.edu.cn/home/dengcai/Data/MLData.html.
5. http://www.cs.nyu.edu/∼roweis/data.html.

4.1. Toy data

In this experiment, we compare the projection planes of PCA, LPP and our proposed method in the context of subspace segmentation. We construct two independent data distributions \{S_i\}_{i=1}^{2} to form two classes of data points, sampled from two-dimensional Gaussian distributions. The mean and covariance of S_1 are set to [0, 0.9]^T and [0.9, 0.5; 0.5, 0.4], respectively, and those of S_2 are set to [0, -0.9]^T and [0.9, -0.5; -0.5, 0.4]. The distance is measured with the heat kernel, and the number of nearest neighbors in LPP is set to 5. The parameters G and C of our framework in Eq. (17) are constructed from the LPP graph affinity matrix, and the regularization parameter α is set to 0.1. For each algorithm, the projection direction is taken as the first principal component w_1 of the eigenvectors W to illustrate its discriminative ability; the projection line of each algorithm is perpendicular to its projection direction w_1.

As shown in Fig. 2, the two classes of data points and the three projection lines obtained by PCA, LPP and GLSRGE are distinguished by different colors and shapes. The data distributions overlap on one side, with some outliers toward the left edge. Intuitively, the best discriminatory line is vertical to these data distributions. The projection lines of PCA and LPP are influenced by the noisy data and slightly rotated, while that of GLSRGE is very close to vertical. This reflects two advantages of GLSRGE: it achieves a balance between global and local structures, and it weakens the influence of outlier data points. To obtain a quantitative comparison, we labeled the two classes of data points and randomly selected 60% of the data points for training and 40% for testing. The results are shown in Table 2.

Fig. 2. The projection plane of PCA, LPP and GLSRGE on two classes of a two-dimensional Gaussian distribution.

Table 2. Classification accuracy of PCA, LPP and GLSRGE on two classes of a two-dimensional Gaussian distribution.

Method      PCA      LPP      GLSRGE
Accuracy    0.9167   0.9208   0.9250
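The toy data of Fig. 2 can be regenerated in a few lines of NumPy. The sketch below follows the stated means and covariances; the random seed and the number of samples per class are our own choices, since the paper does not report them.

```python
import numpy as np

rng = np.random.default_rng(0)          # seed chosen only for reproducibility
n_per_class = 200                       # samples per class (assumption)

mean1, cov1 = np.array([0.0, 0.9]), np.array([[0.9, 0.5], [0.5, 0.4]])
mean2, cov2 = np.array([0.0, -0.9]), np.array([[0.9, -0.5], [-0.5, 0.4]])

S1 = rng.multivariate_normal(mean1, cov1, n_per_class)
S2 = rng.multivariate_normal(mean2, cov2, n_per_class)

X = np.vstack([S1, S2]).T               # 2 x n data matrix, columns are samples
y = np.repeat([0, 1], n_per_class)      # class labels
```

The first projection direction of PCA, LPP and GLSRGE (with the LPP graph and α = 0.1) can then be computed with the sketches given earlier and compared against a vertical discriminant line, as in Fig. 2.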

4.2. Face recognition

4.2.1. Data sets
– AR¹: We used the subset of the AR face database provided and preprocessed by Martinez [48]. This database contains 1400 face images of 100 persons (50 men and 50 women). Each person has 14 different images with varying illumination and expressions, at a resolution of 165 × 120. In our experiments, for convenient computation as in [15], we resized the images to 66 × 48 and rescaled the gray-level values to [0, 1].
– Extended YaleB²: The Extended YaleB database contains 2414 face images of 38 persons. Each person has 64 images taken under varied laboratory lighting conditions; the images were resized to a resolution of 32 × 32 for our experiments.
– ORL³: This dataset contains ten different images of each of 40 distinct individuals, taken at different times under variable lighting, facial details and expressions, and resized to the same resolution as Extended YaleB above.

4.2.2. Parameter setting
Each dataset was divided into two parts, training and testing. The training data were randomly selected from each class at rates of 50%, 60% and 70%, and the remaining samples were used for testing. Before running each algorithm, we applied PCA pre-processing, keeping the components that account for 98% of the variance. For each configuration, we repeated this process 20 times independently, using the simple nearest-neighbor (1-NN) classifier for classification.

In setting the parameters for each algorithm, we highlight two parts of the least-squares regularization in graph embedding: the graph construction models (supervised and unsupervised) and the least-squares approach. The parameter setting of the graph construction models refers to the graph type and its parameters. As mentioned above, four types (SPP, LPP, NPE and LSDA) were selected, with the following settings:

– KNN-graph (LPP, NPE, LSDA): Samples x_i and x_j are considered neighbors if x_j is among the k nearest neighbors of x_i or vice versa. In our experiments, in order to compare the KNN-graph models fairly, the graphs were constructed with the same neighbors; H in LPP corresponds to Eq. (3) and N in NPE to Eq. (7) as unsupervised constructions, while W_b and W_w in LSDA correspond to Eq. (6). LSDA is supervised, so class label information is also used.


Table 3. Classification accuracy on AR dataset ("Unsup."/"Sup." denote our method with unsupervised/supervised regularization).

                 SPP Construct             LPP Construct             NPE Construct             LSDA Construct
                 SPP     Unsup.  Sup.      LPP     Unsup.  Sup.      NPE     Unsup.  Sup.      LSDA    Unsup.  Sup.
α                -       0.1     0.1       -       0.1     0.1       -       0.1     0.1       -       0.1     0.1
50% Accuracy     0.8314  0.8486  0.8343    0.7886  0.8314  0.8363    0.8357  0.8414  0.8557    0.8257  0.8343  0.8471
    Dimension    119     104     100       138     135     130       147     139     137       95      83      91
60% Accuracy     0.9037  0.9497  0.9397    0.8717  0.9247  0.9350    0.9583  0.9608  0.9628    0.9147  0.9527  0.9530
    Dimension    125     108     104       138     136     134       139     136     134       97      88      88
70% Accuracy     0.9303  0.9710  0.9598    0.9198  0.9378  0.9470    0.9725  0.9758  0.9703    0.9358  0.9788  0.9717
    Dimension    123     108     111       141     139     136       143     139     138       98      85      97

Table 4. Classification accuracy on ORL dataset ("Unsup."/"Sup." denote our method with unsupervised/supervised regularization).

                 SPP Construct             LPP Construct             NPE Construct             LSDA Construct
                 SPP     Unsup.  Sup.      LPP     Unsup.  Sup.      NPE     Unsup.  Sup.      LSDA    Unsup.  Sup.
α                -       0.1     0.1       -       -10     -10       -       3       0.1       -       3       -0.1
50% Accuracy     0.8647  0.8758  0.8770    0.8534  0.8724  0.8875    0.8485  0.8569  0.8675    0.8449  0.8876  0.8706
    Dimension    83      80      74        108     128     128       129     128     128       37      46      46
60% Accuracy     0.8875  0.9056  0.8938    0.8637  0.8869  0.9206    0.8688  0.8714  0.8725    0.8794  0.8994  0.8875
    Dimension    85      111     86        149     91      148       147     149     148       53      55      57
70% Accuracy     0.9167  0.9250  0.9253    0.8967  0.9200  0.9333    0.8917  0.9042  0.9083    0.9225  0.9258  0.9250
    Dimension    80      93      99        165     92      146       53      49      87        82      81      43

Table 5. Classification accuracy on Extended YaleB dataset ("Unsup."/"Sup." denote our method with unsupervised/supervised regularization).

                 SPP Construct             LPP Construct             NPE Construct             LSDA Construct
                 SPP     Unsup.  Sup.      LPP     Unsup.  Sup.      NPE     Unsup.  Sup.      LSDA    Unsup.  Sup.
α                -       0.1     0.2       -       0.1     0.1       -       0.2     0.1       -       0.2     0.1
50% Accuracy     0.9166  0.9213  0.9207    0.8993  0.9037  0.9039    0.9285  0.9320  0.9376    0.9382  0.9404  0.9391
    Dimension    185     187     182       193     188     190       182     186     179       124     155     159
60% Accuracy     0.9283  0.9457  0.9481    0.9094  0.9490  0.9492    0.9504  0.9509  0.9452    0.9613  0.9649  0.9566
    Dimension    181     186     184       200     186     183       192     181     185       117     158     154
70% Accuracy     0.9450  0.9468  0.9491    0.9364  0.9584  0.9570    0.9591  0.9603  0.9533    0.9696  0.9708  0.9685
    Dimension    186     178     196       207     194     192       190     195     194       87      147     154

Table 6. Classification accuracy on MNIST dataset ("Unsup."/"Sup." denote our method with unsupervised/supervised regularization).

                 SPP Construct             LPP Construct             NPE Construct             LSDA Construct
                 SPP     Unsup.  Sup.      LPP     Unsup.  Sup.      NPE     Unsup.  Sup.      LSDA    Unsup.  Sup.
α                -       −0.8    −0.8      -       −0.8    −0.8      -       −0.9    −0.9      -       −0.9    −0.9
50% Accuracy     0.8427  0.9050  0.9082    0.8709  0.9064  0.9086    0.8405  0.9082  0.8991    0.8850  0.9077  0.9082
    Dimension    56      44      52        42      51      46        239     36      35        44      39      46
60% Accuracy     0.8470  0.9065  0.9113    0.8878  0.9076  0.9119    0.8465  0.9119  0.9065    0.8935  0.9054  0.9135
    Dimension    49      50      49        45      46      49        243     51      49        39      49      35
70% Accuracy     0.8675  0.9161  0.9215    0.8966  0.9154  0.9203    0.8638  0.9228  0.9161    0.9081  0.9154  0.9233
    Dimension    74      61      69        50      48      49        246     75      61        31      59      33

– SPP-graph: For constructing the adjacency weight matrix, we employed the idea of linear label propagation as in [15]. The strategy we used is MSR, which is unsupervised and parameter-free.

There are two main parameters in GLSRGE:
– C: The instance penalty, which corresponds to the type of least-squares approach and can be constructed in two ways, unsupervised and supervised. In our experiments, we selected the LPP constraint model as the unsupervised approach and the LSDA constraint model as the supervised approach. The strategy for parameter selection is similar to that of the KNN-graph mentioned above.
– α: We selected a value of α for each method through an incremental iterative process; the chosen values are indicated under the respective datasets in Tables 3–7.

4.2.3. Result analysis
Tables 3–5 record the mean classification accuracy and reduced dimension of each algorithm on the AR, Extended YaleB and ORL face recognition datasets, with 50%, 60% and 70% of the samples per class used for training; each configuration was evaluated with the protocol of Section 4.2.2, sketched in code below.
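As a concrete reference, one evaluation run of this protocol can be sketched as follows (our own illustration; the hypothetical fit_dr stands for any of the compared projection methods, and scikit-learn is assumed for the split, the PCA pre-processing and the 1-NN classifier).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

def evaluate_once(X, y, fit_dr, train_rate=0.7, seed=0):
    """X: n x m matrix with samples as rows, y: labels.
    fit_dr(X_train, y_train) -> projection matrix W (hypothetical interface)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=train_rate, stratify=y, random_state=seed)
    # PCA pre-processing keeping components for 98% of the variance (Section 4.2.2).
    pca = PCA(n_components=0.98, svd_solver="full").fit(X_tr)
    X_tr, X_te = pca.transform(X_tr), pca.transform(X_te)
    W = fit_dr(X_tr, y_tr)                     # learn the DR projection on training data only
    clf = KNeighborsClassifier(n_neighbors=1).fit(X_tr @ W, y_tr)
    return clf.score(X_te @ W, y_te)           # 1-NN accuracy in the reduced space
```

Averaging 20 such runs with different seeds corresponds to the reported protocol; the exact graph parameters and α values are those listed in the tables.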


Table 7. Classification accuracy on USPS dataset ("Unsup."/"Sup." denote our method with unsupervised/supervised regularization).

                 SPP Construct             LPP Construct             NPE Construct             LSDA Construct
                 SPP     Unsup.  Sup.      LPP     Unsup.  Sup.      NPE     Unsup.  Sup.      LSDA    Unsup.  Sup.
α                -       0.1     0.1       -       −0.4    −0.4      -       0.9     0.9       -       40      40
50% Accuracy     0.9528  0.9644  0.9647    0.9636  0.9642  0.9651    0.9630  0.9647  0.9649    0.9639  0.9644  0.9647
    Dimension    51      49      42        51      51      47        52      51      44        46      51      45
60% Accuracy     0.9584  0.9673  0.9663    0.9669  0.9675  0.9675    0.9669  0.9667  0.9663    0.9679  0.9669  0.9663
    Dimension    51      49      51        51      51      51        52      51      51        51      51      48
70% Accuracy     0.9654  0.9728  0.9726    0.9714  0.9723  0.9725    0.9682  0.9721  0.9728    0.9728  0.9728  0.9730
    Dimension    50      50      46        45      50      42        51      50      48        49      50      47

Fig. 3. The classification results obtained from competing graph construction methods by (a) SPP (b) LPP (c) NPE (d) LSDA, on AR dataset (the 70% category).

In the tables, each least-squares approach with the same type of graph embedding is grouped into one graph construction model. In each model, the original graph is compared with the least-squares approach under supervised and unsupervised regularization, and the comparison focuses on the highest mean classification accuracy within each group. Fig. 3 demonstrates the trend of classification accuracy as the reduced dimension changes.

From Table 3, the classification accuracy of our GLSRGE method is generally higher than that of the original method for all training proportions on the AR dataset. In the SPP, NPE and LSDA models, the unsupervised regularization proves more effective, with accuracy about 1–4% higher than the originals for SPP and LSDA and about 1% higher for NPE. In the LPP model, both the supervised and unsupervised regularizations perform strongly; the former improves classification accuracy by about 5% on average at the 50% and 60% training rates and by about 2% at the 70% rate.

In addition, from Fig. 3, the GLSRGE classification results obtained from the competing graph construction methods show similar trends. Particularly for NPE, all three graphs almost attain the same level at some point.


Fig. 4. Six basis vectors of graph construction calculated from the training set of ORL database.

In SPP, LPP and LSDA, the GLSRGE graphs lead with the highest accuracies in almost all cases. The lower dimension corresponding to the highest accuracy of each algorithm reveals the efficient convergence property of the proposed GLSRGE method.

Table 4 reports the classification performance on the ORL dataset. It can be seen that our unsupervised regularization performs slightly better than the others in the SPP and NPE models. The effectiveness of the unsupervised and supervised regularizations is very close in LSDA, with a difference of about 1% accuracy, and both show a minor improvement of about 1% over their original graphs. The ORL dataset has the fewest samples among the datasets in our experiments, which indicates that our proposed method has a particular advantage on small-sample datasets.

Fig. 4 illustrates the first six eigenvectors (eigen-faces) of the transformation matrix obtained by each method on the training set of the ORL database. These eigen-pictures were reshaped into 32 × 32 matrices, in conformity with the image size of the original objects, and the faces were projected onto the discriminant subspaces spanned by the eigenvectors of the different graph constructions. Observing Fig. 4 closely, the eigen-pictures of the original SPP, LPP, NPE and LSDA are noisy, because their local-structure-preserving nature retains much discriminant detail. The eigen-faces of GLSRGE are generally similar to their originals, but the facial outline is easier to distinguish in GLSRGE; the face is especially distinct in our LPP construction model. More detail reflects more local information, while a clearer outline reflects global structure information. In GLSRGE, the locality-preserving graph captures the local relationships of the data, and the least-squares regularization keeps the global structure of the data.

Table 5 reports the experimental results on the Extended YaleB dataset. Following a similar analysis as for Tables 3 and 4, there is a consistent improvement in performance for the unsupervised regularization in SPP and NPE, and for the supervised regularization in LPP.

Fig. 5. The classification results obtained from competing graph construction methods by (a) SPP (b) LPP (c) NPE (d) LSDA, on MNIST dataset (the 70% category).


Fig. 6. Ten basis vectors of graph construction calculated from the training set of USPS database.

In general, the performance of GLSRGE depends heavily on the choice of graph type. For any of the graphs, however, the classification accuracy of GLSRGE is always higher than that of the original, which suggests that GLSRGE is effective at depicting the data structure. With supervised and unsupervised regularization, GLSRGE adapts well to the various datasets. In addition, as the number of training samples increases, the classification accuracy of GLSRGE increases more markedly, which can be interpreted as effective utilization of both global and local structure information.

4.3. Hand-written digit recognition

4.3.1. Data sets
– MNIST⁴: This database contains 28 × 28 grayscale images of 10 hand-written digits from 0 to 9. We used a subset of 4000 samples, randomly and uniformly selected from the 70,000 samples in the original MNIST database.
– USPS⁵: The USPS database contains 11,000 hand-written digit images in 10 categories from 0 to 9, of size 16 × 16.

We maintained the same strategy for parameter setting and chart layout on the hand-written digit datasets as in Section 4.2.

4.3.2. Result analysis
As presented in Table 6, on the MNIST dataset the supervised regularization significantly outperforms the unsupervised regularization and the original within each type of graph model. The accuracy of our supervised regularization at each training rate is on average about 2–3% higher than the original, and less than 1% higher than the unsupervised regularization, throughout all the sampled experiments. Fig. 5 shows the classification results on the MNIST dataset obtained from the competing graph construction methods in the four graph construction models. It also illustrates that GLSRGE adequately utilizes global and local structure information to achieve the best classification performance on MNIST.

Table 7, which contains the classification results on the USPS dataset, offers another interesting observation. When the training sample size is large enough to sufficiently characterize the data distribution, as is the case for USPS, all the compared methods achieve similarly high classification accuracy. The supervised regularization achieves only a small improvement, despite fully utilizing the discriminant information of the data distribution. In Fig. 6, we illustrate the first 10 eigenvectors (eigen-pictures) of the digit '8' in the USPS dataset, obtained by both the original methods and our supervised regularization of GLSRGE. As in Fig. 4, the eigen-pictures are reshaped into square matrices matching the original image size. It can be seen that GLSRGE captures not only the local structure of the data but also its global structure, since its eigen-pictures have a clearer outline than those of the original methods.

5. Conclusions and future work

In this paper, we developed a different graph embedding method, which keeps both the global and local structures of data and makes use of an orthogonality constraint to retain discriminative power in the low-dimensional subspace. The PCA-like idea with graph embedding is applied in our work by imposing the orthogonality constraint that projections must be orthogonal with respect to the data, taking the distributions and local structures of the data into consideration. Thus, a generalized least-squares approach with graph embedding is proposed for dimensionality reduction. Both supervised and unsupervised graph embeddings are adopted in our proposed least-squares approach. With the graph embedding regularization and the penalty construction in this framework, our method achieves a balance between local and global structures. The experimental results on a variety of face and handwritten digit recognition datasets demonstrate that our proposed method yields lower-dimensional subspaces and higher classification accuracy than other state-of-the-art graph embedding methods. The performance of our model is affected by the type of penalty weight; in future work, we will therefore learn the construction of this scaling factor from the instance space. We will also extend our work to low-rank representation to further improve its performance.

Acknowledgements

This work was funded in part by the National Natural Science Foundation of China (Nos. 61572240, 61622211, 61620106009, 61572503, 61872424, 6193000388) and the Open Project Program of the National Laboratory of Pattern Recognition (NLPR) (No. 20160005).

References

[1] D. Cai, X. He, J. Han, T.S. Huang, Graph regularized nonnegative matrix factorization for data representation, IEEE Trans. Pattern Anal. Mach. Intell. 33 (8) (2011) 1548–1560.
[2] X. Shu, Y. Gao, H. Lu, Efficient linear discriminant analysis with locality preserving for face recognition, Pattern Recognit. 45 (5) (2012) 1892–1898.
[3] H. Yan, J. Lu, X. Zhou, Activity-based person identification using discriminative sparse projections and orthogonal ensemble metric learning, in: European Conference on Computer Vision, Springer, 2014, pp. 809–824.
[4] Y. Yang, Y. Li, L. Pan, N. Li, G. He, Image retrieval based on augmented relational graph representation, Appl. Intell. 38 (4) (2013) 489–501.
[5] J. Bo, C. Ding, B. Luo, Robust data representation using locally linear embedding guided PCA, Neurocomputing 275 (2018) 523–532.
[6] J. Wang, R.K.W. Wong, T.C.M. Lee, Locally linear embedding with additive noise, Pattern Recognit. Lett. 123 (2019), doi:10.1016/j.patrec.2019.02.030.
[7] L. Tao, H.H. Ip, Y. Wang, X. Shu, Low rank approximation with sparse integration of multiple manifolds for data representation, Appl. Intell. 42 (3) (2015) 430–446.


[8] A. Dutta, F. Hanzely, P. Richtarik, A nonconvex projection method for robust PCA, in: Thirty-Third AAAI Conference on Artificial Intelligence, 2019.
[9] D. Cai, X. He, K. Zhou, J. Han, H. Bao, Locality sensitive discriminant analysis, in: IJCAI, vol. 2007, 2007, pp. 1713–1726.
[10] X. He, S. Yan, Y. Hu, P. Niyogi, H. Zhang, Face recognition using Laplacianfaces, IEEE Trans. Pattern Anal. Mach. Intell. 27 (3) (2005) 328–340.
[11] D. Relan, L. Ballerini, E. Trucco, T. Macgillivray, Using orthogonal locality preserving projections to find dominant features for classifying retinal blood vessels, Multimed. Tools Appl. (1) (2018) 1–21.
[12] X. He, D. Cai, S. Yan, H. Zhang, Neighborhood preserving embedding, in: ICCV, 2005, pp. 1208–1213.
[13] Y. Pang, L. Zhang, Z. Liu, N. Yu, H. Li, Neighborhood preserving projections (NPP): a novel linear dimension reduction method, in: International Conference on Intelligent Computing, Springer, 2005, pp. 117–125.
[14] K.K. Huang, D.Q. Dai, C.X. Ren, Regularized coplanar discriminant analysis for dimensionality reduction, Pattern Recognit. 62 (Complete) (2017) 87–98.
[15] L. Qiao, S. Chen, X. Tan, Sparsity preserving projections with applications to face recognition, Pattern Recognit. 43 (1) (2010) 331–341.
[16] S. Roweis, L. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290 (5500) (2000) 2323–2326.
[17] J.B. Tenenbaum, V.D. Silva, J.C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science 290 (5500) (2000) 2319–2323.
[18] M. Belkin, P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput. 15 (6) (2003) 1373–1396.
[19] D.L. Donoho, C. Grimes, Hessian eigenmaps: locally linear embedding techniques for high-dimensional data, Proc. Natl. Acad. Sci. 100 (10) (2003) 5591–5596.
[20] Z. Zhang, H. Zha, Nonlinear dimension reduction via local tangent space alignment, in: International Conference on Intelligent Data Engineering and Automated Learning, Springer, 2003, pp. 477–481.
[21] J. Fan, T. Chow, Exactly robust kernel principal component analysis, IEEE Trans. Neural Netw. Learn. Syst. PP (99) (2018) 1–13.
[22] J. Li, H. Gao, Sparse data-dependent kernel principal component analysis based on least squares support vector machine for feature extraction and recognition, Neural Comput. Appl. 21 (8) (2012) 1971–1980.
[23] S. Mika, G. Ratsch, J. Weston, B. Scholkopf, K. Mullers, Fisher discriminant analysis with kernels, in: Neural Networks for Signal Processing IX, 1999, pp. 41–48.
[24] B. Zhang, Y. Qiao, Face recognition based on gradient gabor feature and efficient kernel fisher analysis, Neural Comput. Appl. 19 (4) (2010) 617–623.
[25] F. He, C. Wang, S. Fan, Nonlinear fault detection of batch processes based on functional kernel locality preserving projections, Chemom. Intell. Lab. Syst. 183 (2018) 79–89.
[26] G. Wang, N. Shi, Collaborative representation-based discriminant neighborhood projections for face recognition, Neural Comput. Appl. (1) (2019) 1–18.
[27] T. Zhang, D. Tao, X. Li, J. Yang, Patch alignment for dimensionality reduction, IEEE Trans. Knowl. Data Eng. 21 (9) (2009) 1299–1313.
[28] F. Yin, L. Jiao, F. Shang, S. Wang, B. Hou, Fast fisher sparsity preserving projections, Neural Comput. Appl. 23 (3–4) (2013) 691–705.
[29] B. Wang, Z. Tu, Sparse subspace denoising for image manifolds, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 468–475.
[30] J. Song, L. Gao, F. Zou, Y. Yan, N. Sebe, Deep and fast: deep learning hashing with semi-supervised graph construction, Image Vis. Comput. 55 (2016) 101–108.
[31] S. Cao, W. Lu, Q. Xu, Deep neural networks for learning graph representations, in: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016, pp. 1145–1152.
[32] J. Bruna, W. Zaremba, A. Szlam, Y. LeCun, Spectral networks and locally connected networks on graphs, in: International Conference on Learning Representations (ICLR), 2014.
[33] E. Kokiopoulou, Y. Saad, Orthogonal neighborhood preserving projections: a projection-based dimensionality reduction technique, IEEE Trans. Pattern Anal. Mach. Intell. 29 (12) (2007) 2143–2156.
[34] D. Cai, X. He, J. Han, H. Zhang, Orthogonal Laplacianfaces for face recognition, IEEE Trans. Image Process. 15 (11) (2006) 3608–3614.
[35] X. Liu, J. Yin, Z. Feng, J. Dong, L. Wang, Orthogonal neighborhood preserving embedding for face recognition, in: IEEE International Conference on Image Processing, 2007.
[36] X. Wei, X. Huang, J. Silva, S. Emrani, A. Chaudhuri, Online robust principal component analysis with change point detection, IEEE Trans. Multimed. (2019), doi:10.1109/TMM.2019.2923097.
[37] E.J. Candès, X. Li, Y. Ma, J. Wright, Robust principal component analysis? J. ACM 58 (3) (2011) 1–37.
[38] G. Liu, Z. Lin, Y. Yu, Robust subspace segmentation by low-rank representation, in: Proceedings of the 27th International Conference on Machine Learning, 2010, pp. 663–670.

[39] L. Zhuang, H. Gao, Z. Lin, Y. Ma, X. Zhang, N. Yu, Non-negative low rank and sparse graph for semi-supervised learning, in: IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 2328–2335.
[40] Y. Lu, Z. Lai, Y. Xu, X. Li, D. Zhang, C. Yuan, Low-rank preserving projections, IEEE Trans. Cybern. 46 (2016) 1900–1913.
[41] Y. Pang, Z. Ji, P. Jing, X. Li, Ranking graph embedding for learning to rerank, IEEE Trans. Neural Netw. Learn. Syst. 24 (8) (2013) 1292–1303.
[42] D. Zhai, H. Chang, S. Shan, X. Chen, W. Gao, Multiview metric learning with global consistency and local smoothness, ACM Trans. Intell. Syst. Technol. 3 (3) (2012) 1–22.
[43] L. Zhang, Q. Zhang, L. Zhang, D. Tao, X. Huang, B. Du, Ensemble manifold regularized sparse low-rank approximation for multiview feature embedding, Pattern Recognit. 48 (10) (2015) 3102–3112.
[44] Y. Wang, W. Zhang, L. Wu, X. Lin, M. Fang, S. Pan, Iterative views agreement: an iterative low-rank based structured optimization method to multi-view spectral clustering, in: IJCAI, 2016, pp. 2153–2159.
[45] F.D. la Torre, A least-squares framework for component analysis, IEEE Trans. Pattern Anal. Mach. Intell. 34 (6) (2012) 1041–1055.
[46] S. Yan, D. Xu, B. Zhang, H. Zhang, Q. Yang, S. Lin, Graph embedding and extensions: a general framework for dimensionality reduction, IEEE Trans. Pattern Anal. Mach. Intell. 29 (2007) 40–51.
[47] T.A. Abeo, X.-J. Shen, E.D. Ganaa, Q. Zhu, B.-K. Bao, Z.-J. Zha, Manifold alignment via global and local structures preserving PCA framework, IEEE Access 7 (2019) 38123–38134.
[48] A.M. Martínez, A.C. Kak, PCA versus LDA, IEEE Trans. Pattern Anal. Mach. Intell. 23 (2) (2001) 228–233.

Dr. Xiang-Jun Shen received the M.S. degree in computer science in 2003 from Jiangsu University, China, and the Ph.D. degree in automation in 2006 from the University of Science and Technology of China. He is currently a Professor with the Department of Software Engineering, School of Computer Science and Communication Engineering, Jiangsu University. His research interests are in the areas of pattern recognition, multimedia retrieval and machine learning. He is a member of the CCF.

Si-Xing Liu received his B.S. degree in communication engineering from Nanjing University of Posts and Telecommunications in 2015 and is currently pursuing a master's degree at Jiangsu University. His research interests are in the area of dimensionality reduction methods, with emphasis on mathematical modeling, simulation and performance analysis.

Dr. Bing-Kun Bao received the Ph.D. degree in control theory and control application from the University of Science and Technology of China, Hefei, China, in 2009. From 2009 to 2011, she was a Research Engineer of Electrical and Computer Engineering with the National University of Singapore, Singapore. She is currently a Professor with Nanjing University of Posts and Telecommunications, Jiangsu, China.

Dr. Chunhong Pan received the B.S. degree in automatic control from Tsinghua University, Beijing, China, in 1987, the M.S. degree from the Shanghai Institute of Optics and Fine Mechanics, Chinese Academy of Sciences, Beijing, in 1990, and the Ph.D. degree in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences, Beijing, in 2000. He is currently a Professor with the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences. His research interests include computer vision, image processing, computer graphics, and remote sensing.

Dr. Zheng-Jun Zha received the B.E. degree in automation and the Ph.D. degree in pattern recognition and intelligent systems from the University of Science and Technology of China (USTC), Hefei, China, in 2004 and 2009, respectively. His current research interests include multimedia content analysis and retrieval, computer vision, and social media analysis. He has published over 100 book chapters, international journal and conference papers in these areas, including TMM, TCSVT, TOMCCAP, TIP, ACM MM, CVPR and SIGIR. Dr. Zha received the Best Paper Award at the ACM International Conference on Multimedia (ACM MM) in 2009 and the Best Demo Runner-Up Award at ACM MM 2012. He has served as Program Co-Chair, Special Session Co-Chair, Workshop Co-Chair, Demo Co-Chair, and Area Chair for various prestigious multimedia conferences, has reviewed for more than 30 major international journals, and has served on the Program Committees of many prestigious conferences. He is a member of the IEEE and the ACM.

Dr. Jianping Fan is a professor at the University of North Carolina at Charlotte. He received his Ph.D. degree in computer science and technology from the Shanghai Institute of Optics and Fine Mechanics of CAS. His research interests include semantic image and video analysis, computer vision, cross-media analysis, and statistical machine learning.