Multi-modal dimensionality reduction using effective distance

Multi-modal dimensionality reduction using effective distance

Accepted Manuscript Multi-modal Dimensionality Reduction using Effective Distance Dan Zhang , Qi Zhu , Daoqiang Zhang PII: DOI: Reference: S0925-231...

3MB Sizes 4 Downloads 51 Views

Accepted Manuscript

Multi-modal Dimensionality Reduction using Effective Distance Dan Zhang , Qi Zhu , Daoqiang Zhang PII: DOI: Reference:

S0925-2312(17)30246-1 10.1016/j.neucom.2016.07.075 NEUCOM 18042

To appear in:

Neurocomputing

Received date: Revised date: Accepted date:

23 January 2016 2 July 2016 5 July 2016

Please cite this article as: Dan Zhang , Qi Zhu , Daoqiang Zhang , Multi-modal Dimensionality Reduction using Effective Distance, Neurocomputing (2017), doi: 10.1016/j.neucom.2016.07.075

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT

Multi-modal Dimensionality Reduction using Effective Distance Dan Zhang, Qi Zhu, and Daoqiang Zhang * College of Computer Science and Technology, Nanjing University of Aeronautics & Astronautics, Nanjing 211106, China *

CR IP T

E-mail Address: [email protected]

Abstract: By providing complementary information, multi-modal data is usually helpful for obtaining good performance in the identification or classification tasks. As

AN US

an important way to deal with high-dimensional features in multi-modal data, multi-modal dimensionality reduction has caused extensive concern in the machine learning domain. Most of the existing dimensionality reduction methods adopt a similarity matrix to capture the structure of data, and then this matrix is computed by

M

using conventional distances (e.g. Euclidean distance) in most cases. However, Euclidean distance can only model the static structure of data, and the intrinsic

ED

dynamic structure information is usually ignored. For overcoming this problem, we develop two novel dimensionality reduction methods based on effective distance for

PT

multi-modal data, by using a probabilistically motivated effective distance rather than conventional Euclidean distance. Specifically, we first develop two approaches to

CE

compute the effective distance. Then, we propose two novel effective distance-based dimensionality reduction methods, including Effective Distance-based Locality

AC

Preserving Projections (EDLPP) and Effective Distance-based Sparsity Preserving Projections (EDSPP). Experiments on varied data sets from UCI machine learning repository and the Alzheimer’s disease Neuroimaging Initiative (ADNI) database demonstrate that the effective distance-based dimensionality reduction methods are superior to other state-of-art methods which employ only Euclidean distance.

Keywords: Multi-modal; dimensionality reduction; dynamic structure; effective distance.

ACCEPTED MANUSCRIPT

1. Introduction In machine learning domain, learning using multi-modal data has attracted much attention due to its superior performance [1-5]. Multi-modal data in comparison with

CR IP T

single modal data can provide more complementary information, which could help boost the learning performance for various learning tasks [6-8]. For instance, for the diagnosis of disease, it is difficult to discover biomarkers related to the disease based on the single modal image. As a result, some methods of fusing multi-modal image have been proposed to improve the diagnosis accuracy [9-11]. For example, Zhang et

AN US

al. [10] proposed a multimodal classification method, where three modal images are combined, for the classification of Alzheimer’s disease (AD) and its early stage (i.e., mild cognitive impairment, MCI). However, the challenging small-sample-size problem often occurs in the learning process using multi-modal data, because image

M

data usually has high dimensional features while the number of subjects is limited [12-14]. To address this problem, dimensionality reduction is proved to be an effective

features [7, 15-17].

ED

approach, which boosts the learning performance by decreasing the number of

PT

In the literatures, a lot of dimensionality reduction methods are presented for dealing with multi-modal high-dimensional data [18-20].

As one of the most widely

CE

used approaches, canonical correlation analysis (CCA) aims to find two linear projections for each view by maximizing the correlations between new variables in a

AC

low dimensional subspace [21, 22]. In recent years, many CCA variants are proposed for dimensionality reduction [15-21]. For two-view dimensionality reduction, RCCA (i.e., regularized CCA) has been introduced [23]. By using discriminant information, DCC (i.e., discriminant canonical correlations) is proposed by Kim et al. [24], and GCCA (i.e., generalized CCA ) is introduced by Sun et al. [25]. As linear dimensionality reduction methods, CCA and its variants cannot discover the non-linear relationship between different views. To solve this problem, some

ACCEPTED MANUSCRIPT

non-linear CCA extensions are designed. Melzer et al. [26] presents KCCA (i.e., kernel CCA), which aims at discovering the non-linear structure in the original space by using kernel methods. Also, LPCCA (i.e., locality preserving CCA) [27] is developed to find the intrinsic manifold structure of each view for data by keeping the neighborhood structure of data. Moreover, PLS (i.e., partial least squares) [28] and SCCA (i.e., sparse CCA) [29, 30] are also non-linear extensions of conventional

CR IP T

CCA.

Except for CCA and its variants, another common strategy for multi-modal dimensionality reduction is that one first projects each modal data into the new feature space by using some dimensionality reduction methods, and then performs fusion on

AN US

multi-modal data. In recent years, researchers have proposed various methods for single-modal dimensionality reduction [31-33]. Based on the difference in using class labels, these methods can be classified into three categories: unsupervised, supervised and semi-supervised methods [34, 35]. Specifically, supervised methods need full

M

label information [36-38] and unsupervised methods do not require any label information [39-41], while semi-supervised ones only need partial label information

ED

[42, 43]. This paper develops more effective unsupervised dimensionality reduction methods by modifying Locality Preserving Projections (LPP) and Sparsity Preserving

PT

Projections (SPP).

Tenenbaum et al. [44] indicated that there exists manifold structure in some image

CE

data. Accordingly, both LPP and SPP aim to mine the structure information of data, by means of the graph embedding framework [34, 45]. Specifically, LPP constructs a

AC

graph so that the neighborhood structure of data can be preserved, in which the π‘˜ nearest neighbors (KNN) algorithm is utilized to achieve an adjacent matrix for such graph. Once this graph is constructed, we can assign different weights to those edges. Similarly, SPP constructs an adjacent graph according to the sparse reconstructive relationship, where a sparse reconstructive matrix is obtained based on a modified sparse representation framework [40]. It is worth noting that the graphs in LPP and SPP are usually computed using conventional distance (e.g., Euclidean distance). However, the Euclidean distance just reflects the magnitude of similarity between

ACCEPTED MANUSCRIPT

different samples [46-48]. That is, the Euclidean distance cannot reveal the dynamic structure information of original data, since it ignores the effect of ambient samples when computing the similarity between two samples [49]. To reflect the dynamic structure of original data, a probabilistically motivated distance measure called effective distance is introduced [50], which is proved reliable in predicting some disease arrival time. In order to reflect the complex structure of

CR IP T

data, effective distance preserves the dynamic structure based on the probable paths, which can be achieved by a bidirectional connectivity network. As shown in [50], effective distance can reveal hidden complex structure of data, which cannot be exhibited from the conventional geographic point of view.

AN US

Accordingly, we propose two novel unsupervised effective distance-based dimensionality reduction methods, including Effective Distance-based LPP (EDLPP) and Effective Distance-based SPP (EDSPP). Specifically, in order to compute effective distance, we first construct a bidirectional network by a sparse

M

representation-based algorithm or KNN algorithm. Then we propose two effective distance-based dimensionality reduction methods, including EDLPP and EDSPP. The

ED

main contributions of our proposed methods are listed as follows. 1) We compute the similarity between different samples via effective distance based on sparse

PT

representation algorithm or KNN algorithm, which can reflect the dynamic structure of data. 2) We introduce the effective distance to dimensionality reduction methods,

CE

and propose two new dimensionality reduction algorithms (i.e. EDLPP and EDSPP). 3) These two modified dimensionality reduction methods are applied for multi-modal

AC

imaging of Alzheimer’s disease, and the performance on classification tasks validates the effectiveness of our proposed methods. The remainder of this paper is arranged as follows. Firstly, we briefly introduce

some related work about dimensionality reduction and effective distance in Section 2. Next, we propose two methods to compute the effective distance, two novel effective distance-based dimensionality reduction methods, and the multi-modal classification framework based on our proposed dimensionality reduction methods in Section 3. Section 4 elaborates the experiments on ten data sets from UCI repository and the

ACCEPTED MANUSCRIPT

ADNI database with multi-modal data. We finally summarize this paper and elaborate possible future research direction in Section 5.

2. Related Work 2.1 Locality Preserving Projection (LPP)

CR IP T

LPP is a typical unsupervised dimensionality reduction method, which can deal with the manifold structure that PCA is not able to discover [39]. The π‘˜ nearest neighbors (KNN) algorithm is used to construct the adjacency graph 𝐆 for preserving the local structure. Here, if sample π’™π’Š and sample 𝒙𝒋 are π‘˜ nearest

AN US

neighbors each other, we add an edge between π’™π’Š and 𝒙𝒋 . Then if π’™π’Š and 𝒙𝒋 are connected, we compute the weight using Eq. (1). 𝑆𝑖𝑗 = 𝑒

βˆ’

βˆ₯π’™π’Š βˆ’π’™π’‹ βˆ₯2 𝑑

Here, we can obtain a symmetric similarity matrix 𝐒 = {𝑆𝑖𝑗 }

(1) 𝑁 𝑖,𝑗=1

that defines the

M

similarity among samples, 𝑁 denotes the number of samples. If two samples π’™π’Š and

ED

𝒙𝒋 are neighbors in original space, in new subspace, the new samples π’šπ’Š and π’šπ’‹ should be close as well. Therefore, we can achieve the projection vector 𝒂 by

PT

minimizing the following formula: 1 2

2

1

2

βˆ‘π‘–π‘—(π’šπ’Š βˆ’ π’šπ’‹ ) 𝑆𝑖𝑗 = βˆ‘π‘–π‘—(𝒂𝑇 π’™π’Š βˆ’ 𝒂𝑇 𝒙𝒋 ) 𝑆𝑖𝑗 2

CE

= βˆ‘π‘– 𝒂𝑇 π’™π’Š 𝐷𝑖𝑖 π’™π‘‡π’Š 𝒂 βˆ’ βˆ‘π‘–π‘— 𝒂𝑇 π’™π’Š 𝑆𝑖𝑗 𝒙𝑇𝒋 𝒂 = 𝒂𝑇 𝐗(𝐃 βˆ’ 𝐒)𝐗 𝑇 𝒂 = 𝒂𝑇 𝐗𝐋𝐗𝒂

(2)

AC

where π’šπ’Š = 𝒂𝑇 π’™π’Š , 𝐗 is the sample matrix, 𝑖 = 1,2, … , 𝑁, and 𝐃 is the diagonal matrix, i.e., 𝐷𝑖𝑖 = βˆ‘π‘— 𝑆𝑖𝑗 . 𝐋 = 𝐃 βˆ’ 𝐒 is the Laplacian matrix. By adding a constraint 𝒂𝑇 𝐗𝐃𝐗 𝑇 𝒂 = 1, the objective function of LPP can be defined as follows: arg min 𝒂𝑇 𝐗𝐋𝐗 𝑇 𝒂 𝒂

𝑠. 𝑑. 𝒂𝑇 𝐗𝐃𝐗 𝑇 𝒂 = 1

(3)

The optimal vector a can be obtained by solving the following generalized eigenvalue problem:

ACCEPTED MANUSCRIPT 𝐗𝐋𝐗 𝑇 𝒂 = πœ†π—πƒπ— 𝑇 𝒂

(4)

2.2 Sparse Preserving Projections (SPP) SPP combines sparse representation with LPP to obtain the projection vector [40]. Utilizing the sparse reconstructive relationship of data based on a modified sparse representation (MSR) framework, SPP can obtain a matrix preserving the local

CR IP T

information of data. And SPP can avoid choosing the model parameters when the similarity matrix is obtained. Besides, even if SPP is an unsupervised dimensionality reduction algorithm, it still preserves some discriminative information.

A sparse reconstructive weight vector π’”π’Š , where the element 𝑠𝑖𝑗 (𝑖≠𝑗) denotes the

AN US

contribution of each 𝒙𝒋 to reconstruct π’™π’Š , is obtained through minimizing a modified 𝑙1 -norm problem. Accordingly, the sparse reconstructive weight matrix 𝐒 is represented as 𝐒 = ,π’”πŸ , π’”πŸ , … , 𝒔𝒏 -T . The matrix 𝐒 preserves desirable structure of original data, such as some discriminative information and intrinsic geometric

M

properties. Therefore, we can compute the projection vector 𝒂 by the following objective function:

ED

min𝒂 βˆ‘π‘›π‘–=1‖𝒂𝑇 π’™π’Š βˆ’ 𝒂𝑇 π—π’”Μƒπ’Š β€–2

(5)

To avoid degenerate solutions, the constraint 𝒂𝑇 𝐗𝐗 𝑇 𝒂 = 1 is included in SPP. And

PT

thus, the objective function reduces to:

CE

arg min 𝒂𝑇 𝐗(𝐈 βˆ’ 𝐒 βˆ’ 𝐒 𝑇 + 𝐒 𝑇 𝐒)𝐗 𝑇 𝒂 𝒂

𝑠. 𝑑. 𝒂𝑇 𝐗𝐗 𝑇 𝒂 = 1

(6)

AC

The optimal vector 𝒂 is given by solving the following generalized eigenvalue problem:

π—π’πœ· 𝐗 𝑇 𝒂 = πœ†π—π— 𝑇 𝒂

(7)

where π’πœ· = 𝐒 + 𝐒 𝑇 βˆ’ 𝐒 𝑇 𝐒. 2.3 Effective Distance Euclidean distance has always been used to measure the similarity between different samples [39, 51]. However, the underlying dynamic structure and internal

ACCEPTED MANUSCRIPT

relation of data cannot be totally represented by the connectivity matrix 𝐖. Namely, before projected into new subspace, the information described in the similarity matrix has lost some potential structure of data. Motivated by probabilistically interpretation, effective distance has been proposed to discover the underlying structure. It has been proved that the effective distance is more reliable than the Euclidean distance in predicting disease arrival times [50].

CR IP T

In order to better explain what the effective distance is, we give an illustration of effective distance in Fig. 1. Fig. 1 (a) is a graph including four samples with equal weight each other quantified by blue arrows. Specifically, sample A with sample B, sample C and sample D all have equal Euclidean distance. In Fig. 1 (b), we compute

AN US

the probability 𝑃(𝑛|π‘š) of a random walker from sample π‘š to sample 𝑛. For example, the probability 𝑃(𝐴|𝐡) at sample B moving to sample A is 1/4, where the number 4 means the number of the path from sample B to the other samples. In the definition of effective distance, a small probability 𝑃(𝑛|π‘š) denotes that the effective

M

distance from sample π‘š to sample 𝑛 is large, and vice versa. By comparing Fig. 1(a) with Fig. 1(b), we can see that the effective distance between two samples is not

ED

exactly same due to the whole topological structure. In other words, if one sample has the same Euclidean distance with two other samples at the same time, the

PT

corresponding effective distance with those samples may be different. Accordingly, the effective distance is more sensitive to reflect the underlying internal relation of

AC

CE

data.

ACCEPTED MANUSCRIPT

B

B

𝐏(𝐀|𝐁)=1/4

𝐏(𝐁|𝐀) = 𝟏/πŸ‘ 𝐏(𝐃|𝐀) = 𝟏/πŸ‘

D

C

D

𝐏(𝐂|𝐀) = 𝟏/πŸ‘

𝐏(𝐀|𝐂) = 𝟏

C

CR IP T

A

A

P(𝐀|𝐃) = 𝟏/𝟐

(a)

(b)

Fig. 1 Graphical illustration of effective distance. (a) A graph with four samples with equal weight each other quantified by blue arrows. (b) A graph where the similarity between two samples is represented

AN US

by the probability 𝑃(𝑛|π‘š) of a random walker from sample π‘š to sample 𝑛. A small probability 𝑃(𝑛|π‘š) denotes that the effective distance from sample π‘š to sample 𝑛 is large, and vice versa.

From Fig. 1, it is easy to see that the effective distance can be derived from the connectivity matrix 𝐏 with element π‘ƒπ‘šπ‘› , where 0 ≀ π‘ƒπ‘šπ‘› ≀ 1 quantifies the fraction

πΉπ‘šπ‘› 𝐹𝑛

where 𝐹𝑛 = βˆ‘π‘š πΉπ‘šπ‘› . Next, we can obtain the effective distance πΈπ·π‘šπ‘›

ED

π‘ƒπ‘šπ‘› =

M

of the passengers that arrive at sample π‘š and leave from sample 𝑛 . That is,

from 𝑛 to π‘š, which is defined as [50]: (8)

PT

πΈπ·π‘šπ‘› = (1 βˆ’ log π‘ƒπ‘šπ‘› ) β‰₯ 1

Unlike Euclidean distance, effective distance can reveal hidden geometry pattern of

CE

data, by considering all samples in the computation of similarity between two samples. Intuitively, using effective distance in dimensionality reduction, one can preserve

AC

more structure information of raw data, and thus, the performance of dimensionality reduction can be promoted. As far as we know, no previous work utilizes effective distance as similarity measure in dimensionality reduction domain. In the following, we propose two effective distance based dimensionality reduction methods, including Effective Distance-based LPP (EDLPP) and Effective Distance-based SPP (EDSPP).

3. Proposed Method

ACCEPTED MANUSCRIPT

In the following section, we first present two approaches to compute effective distance, and then propose two effective distance based dimensionality reduction methods. 3.1 Sparse Representation-based Effective Distance From Section 2, we can see that it requires constructing a bidirectional network for

CR IP T

obtaining the effective distance, and an asymmetric connectivity matrix 𝐖 can represent such a bidirectional network. Motivated by [52], in this paper, we use sparse representation algorithm [53, 54] to construct such a connectivity matrix. Since sparse representation is proved to be robust to noisy data, it can help to improve the robustness of our algorithms [55].

AN US

Sparse representation has been extensively applied in machine learning and pattern recognition domain [56-58]. Let 𝐗 = ,π’™πŸ , π’™πŸ , … , 𝒙𝒏 - ∈ 𝑅 π‘šΓ—π‘› be the sample matrix, and π’™π’Š ∈ 𝑅 π‘š represents 𝑖-th sample with π‘š-dimensional features. The objective function of sparse representation is to seek a sparest reconstructive vector π’˜ for each

M

sample π’™π’Š , which can be expressed formally as follows: minπ’˜ β€–π’˜β€–0

𝑠. 𝑑. π’™π’Š = π—π’˜

ED

(9)

where π’˜ ∈ 𝑅 𝑛 is the coefficient vector. Because the sparsest solution of Eq. (9) is

CE

PT

NP-hard, an alternative objective function is given as follows: minπ’˜ β€–π’˜β€–1 𝑠. 𝑑. π’™π’Š = π—π’˜

(10)

where β€–π’˜β€–1 is the 𝑙1 norm of π’˜. From the perspective of mathematics, 𝑙1 norm is

AC

more reliable than 𝑙0 norm for some applications [59]. Generally, the minimization problem of 𝑙1 in Eq. (10) can be solved using standard linear programming [60]. Consider that the signal π’™π’Š is noisy, researchers proposed robust extensions to

overcome this constraint. Therefore, a sparse reconstructive weight matrix is constructed by MSR framework [40, 55]. Specifically, a weight vector π’˜π’Š for each π’™π’Š , which aims to reconstruct π’™π’Š using as fewer samples as possible, can be obtained through minimizing the following modified 𝑙1 norm [60]:

ACCEPTED MANUSCRIPT

minπ’˜π’Š β€–π’˜π’Š β€–1 𝑠. 𝑑. π’™π’Š = π—π’˜π’Š

(11)

1 = πŸπ‘‡ π’˜π’Š 𝑇

where π’˜π’Š = [𝑀𝑖1 , … , 𝑀𝑖,π‘–βˆ’1 , 0, 𝑀𝑖,𝑖+1 , … , 𝑀𝑖𝑛 ] is a vector, and it means that the π‘₯𝑖 is removed from 𝐗. The contribution of each 𝒙𝒋 to reconstruct π’™π’Š is denoted by the

CR IP T

element 𝑀𝑖𝑗 (𝑗 β‰  𝑖); 𝟏 ∈ 𝑅 𝑛 is a vector that all elements are ones.

For each sample π’™π’Š , a weight vector π’˜π’Š can be obtained. Therefore, the sparse reconstructive weight matrix 𝐖 ∈ 𝑅 𝑛×𝑛 can be defined as follows:

where π’˜π’Š is obtained from Eq. (11).

AN US

𝐖 = ,π’˜πŸ , π’˜πŸ , … , π’˜π’ -𝑇

(12)

In fact, the constraint π’™π’Š = π—π’˜π’Š does not always hold. Therefore, two modified versions are given. The following function is the first one [40, 55]: minπ’˜π’Š β€–π’˜π’Š β€–1

M

𝑠. 𝑑. β€–π’™π’Š βˆ’ π—π’˜π’Š β€– < πœ€

(13)

1 = πŸπ‘‡ π’˜π’Š

min

𝑇

𝑻 [π’˜π‘‡ π’Š π’•π’Š ]

β€–,π’˜π‘‡π’Š π’•π‘‡π’Š -𝑇 β€–1

𝒙 𝐗 𝑠. 𝑑. 0 π’Š 1 = 0 𝑇 1 𝟏

CE

PT

follows [40, 55]:

ED

where Ξ΅ is the error tolerance. And the second modified version can be expressed as

𝐈 π’˜π’Š 10 1 𝟎 𝑇 π’•π’Š

(14)

where π’•π’Š is an π‘š-dimensional vector. In this way, we can obtain the similarity

AC

matrix 𝐖 using Eq. (12). Therefore, the bidirectional network can be represented by the asymmetry matrix 𝐍𝒔 = {𝑁𝑖𝑗𝑠 }

𝑁 𝑖,𝑗=1

. And the element 𝑁𝑖𝑗𝑠 is denoted as follows: π‘Šπ‘–π‘—

𝑁𝑖𝑗𝑠 = βˆ‘π‘

𝑖=1 π‘Šπ‘–π‘—

(15)

3.2 KNN-based Effective Distance To reduce the feature dimensionality, LPP uses KNN algorithm to construct the graph, which may help find the manifold structure information of data [39]. Now a

ACCEPTED MANUSCRIPT

KNN-based method is proposed to compute the effective distance. Given a sample matrix including 𝑛 samples 𝐗 = ,π’™πŸ , π’™πŸ , … , 𝒙𝒏 - ∈ 𝑅 π‘šΓ—π‘› , and π’™π’Š ∈ 𝑅 π‘š represents 𝑖-th sample with π‘š-dimensional features. Based on the KNN algorithm, an adjacency graph 𝐆 with 𝑛 samples is constructed at first. Then a similarity matrix 𝐊 = 𝑁

{𝐾𝑖𝑗 }𝑖,𝑗=1 can be obtained, and the element 𝐾𝑖𝑗 is defined as follows: βˆ₯π’™π’Š βˆ’π’™π’‹ βˆ₯2 𝑑

, 𝑖𝑓 π’™π’Š π‘Žπ‘›π‘‘ 𝒙𝒋 π‘Žπ‘Ÿπ‘’ π‘π‘œπ‘›π‘›π‘’π‘π‘‘π‘’π‘‘ 𝑖𝑛 𝐆 0, π‘œπ‘‘π‘•π‘’π‘Ÿπ‘€π‘–π‘ π‘’

(16)

CR IP T

βˆ’ 𝐾𝑖𝑗 = { 𝑒

where 𝑑 can be determined empirically. Similar to sparse representation-based effective distance, the asymmetry matrix computing effective distance is denoted as ππ’Œ = {π‘π‘–π‘—π‘˜ }

𝑁

𝐾𝑖𝑗

𝑖,𝑗=1

with element π‘π‘–π‘—π‘˜ = βˆ‘π‘

𝑖=1 𝐾𝑖𝑗

.

AN US

By using the asymmetry matrix 𝐍𝒔 or ππ’Œ , the sparse representation-based effective distance and the KNN-based effective distance are denoted as 𝐄𝐃𝒔 = {𝐸𝐷𝑖𝑗𝑠 }

𝑁 𝑖,𝑗=1

with element 𝐸𝐷𝑖𝑗𝑠 = 1 βˆ’ π‘™π‘œπ‘”π‘π‘–π‘—π‘  and π„πƒπ’Œ = {πΈπ·π‘–π‘—π‘˜ }

𝑖,𝑗=1

with element

M

πΈπ·π‘–π‘—π‘˜ = 1 βˆ’ π‘™π‘œπ‘”π‘π‘–π‘—π‘˜ respectively.

𝑁

Then, we can get the new effective distance-based similarity matrices 𝐄𝐒 𝒔 = 𝑁 𝑖,𝑗=1

and 𝐄𝐒 π’Œ = {πΈπ‘†π‘–π‘—π‘˜ }

𝑁

, where their elements represent the similarity

ED

{𝐸𝑆𝑖𝑗𝑠 }

𝑖,𝑗=1

𝐸𝑆𝑖𝑗𝑠 πΈπ‘†π‘–π‘—π‘˜

=𝑒

βˆ’

𝐸𝐷𝑠𝑖𝑗 πœ†

(17)

=𝑒

βˆ’

πΈπ·π‘˜ 𝑖𝑗 πœ†

(18)

AC

CE

follows:

PT

between different samples. And the value of 𝐸𝑆𝑖𝑗𝑠 and πΈπ‘†π‘–π‘—π‘˜ can be denoted as

where Ξ» is a bandwidth parameter. With Eq. (17) or Eq. (18), we can see that the effective distance-based similarity

matrix 𝐄𝐒 𝒔 or 𝐄𝐒 π’Œ actually represents the graph structure of data, where a node represents a specific sample, and an entry in matrix 𝐄𝐒 𝒔 or 𝐄𝐒 π’Œ denotes a weighted edge between two samples. The bidirectional connectivity network for computing the effective distance can reveal the underlying dynamic structure of data. 3.3 Effective Distance-based LPP

ACCEPTED MANUSCRIPT

Now we propose our Effective Distance-based LPP (EDLPP) approach, including two variants (i.e., EDLPP1 and EDLPP2) using two effective distance-based similarity matrices 𝐄𝐒 π’Œ and 𝐄𝐒 𝒔 , respectively. After obtaining the effective distance-based similarity matrix, EDLPP1 and EDLPP2 both minimize the following objective function: 2

βˆ‘π‘–π‘—(π’šπ’Š βˆ’ π’šπ’‹ ) 𝐸𝑆𝑖𝑗

CR IP T

(19)

Suppose π’˜ is the projection vector, that is, π’šπ’Š = π’˜π‘‡ π’™π’Š , we can achieve π’˜ by minimizing the following function: 1 2

2

βˆ‘π‘–π‘—(π’šπ’Š βˆ’ π’šπ’‹ ) 𝐸𝑆𝑖𝑗 =

1 2

1

2

βˆ‘π‘–π‘—(π’˜π‘‡ π’™π’Š βˆ’ π’˜π‘‡ 𝒙𝒋 ) 𝐸𝑆𝑖𝑗 1

AN US

= π’˜π‘‡ 𝐗 .2 𝐄𝐒 𝐒 + 2 𝐄𝐒 𝐣 βˆ’ 𝐄𝐒/ 𝐗 𝑇 π’˜

(20)

where 𝐗 = *𝐱𝟏 , 𝐱𝟐 , … , 𝐱𝐧 + , both 𝐄𝐒 𝐒 and 𝐄𝐒 𝐣 are diagonal matrices, and they j

represent row sum and column sum of 𝐄𝐒 respectively, i.e., ESiii = βˆ‘j ESij , ESjj = 1

1

2

2

1

βˆ‘i ESij . Let 𝐄𝐃𝐋 = 𝐄𝐒 𝐒 + 𝐄𝐒 𝐣 βˆ’ 𝐄𝐒, and 𝐄𝐃𝐃 = (𝐄𝐒 𝐒 + 𝐄𝐒 𝐣 ), we can obtain the 2

M

optimal 𝐰 by solving the following generalized eigenvalue problem: (21)

ED

𝐗𝐄𝐒𝐋 𝐗 T 𝐰 = λ𝐗𝐄𝐒𝐃 𝐗 T 𝐰 3.4 Effective Distance-based SPP

PT

In this subsection, we introduce the effective distance-based SPP (EDSPP). We can obtain an effective distance-based similarity matrix 𝐄𝐒 𝒔 using Eq. (17), in which the

CE

element 𝐸𝑆𝑖𝑗𝑠 denotes the similarity between the 𝑖-th and the 𝑗-th samples. Similar

AC

to SPP, the optimal 𝐰 can be achieved by solving the following generalized eigenvalue problem: π—π„π’πœ· π’˜ = πœ†π—π— 𝑇 π’˜

(22)

where π„π’πœ· = 𝐄𝐒 𝒔 + (𝐄𝐒 𝒔 )𝑇 βˆ’ (𝐄𝐒 𝒔 )𝑇 𝐄𝐒 𝒔 . According to the above description, we list the learning algorithm for effective distance-based LPP and SPP as follows:

ACCEPTED MANUSCRIPT

Algorithm 1. Learning Algorithm for Effective Distance-based LPP and SPP Input: The training data 𝐗 = *π’™π’Š +𝑁 𝑖=1 ; Output: The optimal projection vector 𝐰; Step 1: Construct an asymmetric weight matrix 𝐍 𝒔 or 𝐍 π’Œ via sparse representation algorithm or π‘˜ nearest neighbors algorithm;

π‘˜ or 𝐸𝐷𝑖𝑗 = 1 βˆ’ π‘™π‘œπ‘”π‘π‘–π‘—π‘˜ ;

CR IP T

𝑠 Step 2: Compute the effective distance matrix 𝐄𝐃 with the element 𝐸𝐷𝑖𝑗 = 1 βˆ’ π‘™π‘œπ‘”π‘π‘–π‘—π‘ 

Step 3: Construct the effective distance-based similarity matrix 𝐄𝐒, with the element 𝐸𝑆 𝑠 or 𝐸𝑆 π‘˜ is computed according to Eq. (17) or Eq. (18);

AN US

Step 4: According to the matrix 𝐄𝐒, we can compute the projection vector 𝐰 of EDLPP1, EDLPP2 and EDSPP via Eq. (21) and Eq. (22);

3.5 Multi-modal classification based on EDLPP and EDSPP

M

In Fig. 2, we illustrate the proposed multi-modal classification framework based on EDLPP and EDSPP. As shown in Fig. 2, firstly we perform EDLPP or EDSPP on each

ED

modality, by which the samples are transformed into the new feature space. Then, we compute the kernel matrix using the reduced features in each modality space, where

PT

our proposed EDLPP and EDSPP methods are used to perform dimensionality

CE

reduction. Finally, we adopt a multi-kernel SVM [10, 61] to combine multiple modal

AC

data for classification.

ACCEPTED MANUSCRIPT

Calculate

EDLPP/EDSPP

Modality 2

Kernel matrix

EDLPP/EDSPP

Calculate

Kernel

Kernel matrix

Combination

… Calculate

EDLPP/EDSPP

Modality M

Kernel matrix

CR IP T

Modality 1

SVM

classification

Dimensionality Reduction

AN US

Effective distance-based

M

Fig. 2 Multi-modal classification framework based on the proposed dimensionality reduction methods

4. Experiments

ED

4.1 Data sets

To validate our proposed methods, we make two groups of experiments. The first

PT

experiment is conducted on ten data sets from UCI repository [62]. The detailed

CE

statistics of data sets used in our experiments are listed, which are shown in Table 1.

AC

Table 1. Detailed information of UCI data sets in the experiments

Data set

# Class

# Instance

# Dimension

Anneal

5

898

90

Breast Tissue

6

106

9

Colic

2

368

60

Hepatitis

2

155

19

House

2

232

16

Hypothyroid

2

368

60

Promoter

2

106

57

Sonar

2

208

60

Wdbc

2

569

30

Wine

3

178

13

ACCEPTED MANUSCRIPT

The other experiment is conducted on ADNI database to evaluate our methods on multi-modal data set. Alzheimer’s disease (AD) is quite common among the elderly people in the world. According to the research, the number of the people who are affected will increase rapidly in a few years [63]. Mild cognitive impairment (MCI) is an early status of AD. Recently, a number of studies indicate that neuroimaging is significantly helpful for the diagnosis of this disease. It has been proved that

CR IP T

combining multiple biomarkers can further improve the diagnosis of AD and MCI [64]. Many machine learning and pattern classification methods are used in imaging analysis of AD [65].

The used ADNI database has 202 samples, which is comprised of 51 AD patients,

AN US

99 MCI patients and 52 healthy controls. To be specific, MCI patients can be divided into two categories, which include 43 MCI converts who had developed to AD within 18 months and 56 MCI non-converts who had not developed to AD within 18 months. In the experiment, we combine two modal images (i.e., MRI and PET) for the

M

classification of AD, MCI and healthy controls. Each MRI image or PET image can be represented using a vector of 93-dimension. In other words, each sample has 93

ED

features where each feature is extracted from one region of interest (ROI). More details on the image pre-processing can be found in [10].

PT

4.2 Experiment Setting

CE

For classification tasks, a 10-fold cross validation strategy is used to compute the accuracy. In detail, we firstly divide all the data into ten subsets with roughly equal

AC

size. Then each time the samples within one subset are selected as test set and the remaining samples in nine subsets are viewed as the training set. We repeat this process for 10 times independently, which can avoid any bias of randomly partition. Finally, the 1-Nearest-Neighborhood (1-N-N) classifier is used to obtain the classification accuracy on ten UCI data sets. And the accuracy is the evaluation criterion for different methods. Besides, for the multi-modal data set from ADNI database, we adopt the framework illustrated in Fig. 2 to implement the classification tasks. Then, we can obtain four results to evaluate the efficacy of our proposed

ACCEPTED MANUSCRIPT

methods, including classification accuracy (i.e., ACC), area under receiver operating characteristic curve (i.e., AUC), sensitivity (SEN) and specificity (SPE). In our experiments, the range of selected dimensionality number is set according to the number of samples, and we report the highest classification result. In neighborhood preserving embedding (NPE) [66], isometric feature mapping (ISOMAP) [67], LPP and EDLPP1, they all have one common parameter, i.e., which is tuned in a large range of candidates. The bandwidth

CR IP T

neighborhood size π‘˜,

𝑑 in Eq. (1), Eq. (16) and Ξ» in Eq. (17), Eq. (18) are set empirically. The regulation parameter for sparse representation the effective distance relied on is chosen from 0.005 to 0.1. In addition, PCARatio is used to determine the number of principal

AN US

components in PCA [41]. PCARation is defined as the ratio of the sum of the eigenvalues corresponding to principal components to the sum of all the eigenvalues of the covariance matrix, and its value is searched from {0.96, 0.97, 0.98 and 0.99}. 4.3 Experimental Results

M

To validate the effectiveness of our methods, four well-known dimensionality reduction methods (including LPP, SPP, NPE and ISOMAP) are used to make contrast

ED

experiments. For the effective distance-based LPP, EDLPP1 computes the effective distance matrix based on KNN algorithm, and EDLPP2 uses sparse representation

PT

algorithm to obtain effective distance.

CE

4.3.1 Experiments on Single-modal Data Firstly, experiments are conducted on UCI data sets, with results shown in Table 2

AC

and Fig. 3. From the classification accuracy in Table 2, we can see that EDLPP1 and EDLPP2 outperform LPP. Also, EDSPP achieves better results than SPP in most cases. From all the results, it is obvious that our proposed EDLPP1, EDLPP2 and EDSPP usually achieve much better classification accuracies than other methods.

ACCEPTED MANUSCRIPT

Table 2. Classification accuracies achieved by different dimensionality reduction methods ISO

LPP

SPP

EDLPP1

EDLPP2

EDSPP

Anneal

0.9012

0.9193

0.9020

0.9223

0.9338

0.9352

0.9559

Breast Tissue

0.6451

0.5832

0.5849

0.6516

0.6596

0.6833

0.7370

Colic

0.6314

0.6682

0.6638

0.6535

0.7391

0.7269

0.7177

Hepatitis

0.7347

0.7455

0.7270

0.7398

0.8172

0.8175

0.8507

House

0.9208

0.9298

0.9247

0.9211

0.9513

0.9460

0.9483

Hypothyroid

0.6282

0.6673

0.6678

0.6535

0.7391

0.7333

0.7269

Promoter

0.6959

0.6524

0.6703

0.6519

0.7009

0.7688

0.7655

Sonar

0.8476

0.8409

0.8301

0.8534

0.8968

0.8900

0.8941

Wdbc

0.9047

0.9083

0.9078

0.9075

0.9415

0.9530

0.9508

Wine

0.8436

0.8234

0.8337

0.8598

0.9293

0.9277

0.9385

AN US

CR IP T

NPE

Besides, we plot the curves of the classification results to present the influence of the dimensionality number, with results on several UCI data sets shown in Fig. 3. It is obvious that the performance of proposed methods (i.e., EDLPP1, EDLPP2 and EDSPP) is consistently superior to those of NPE, ISOMAP, LPP and SPP. These

M

results show that the effective distance can improve the performance of

AC

CE

PT

ED

dimensionality reduction methods.

CR IP T

ACCEPTED MANUSCRIPT

(b) Hepatitis

AN US

(a) Anneal

(d) Promoter

CE

PT

ED

M

(c) Hypothyroid

AC

(e) Sonar

(f) Wdbc

Fig. 3 Classification results of different dimensions on (a) Anneal, (b) Hepatitis, (c) Hypothyroid, (d) Promoter, (e) Sonar, and (f) Wdbc data sets.

4.3.2 Experiments using Multi-modal Data In this part, we evaluate the proposed methods on multi-modal data from ADNI database including MRI and PET data. We perform three classification tasks, including AD versus NC, MCI-C (MCI converts) versus MCI-NC (MCI

ACCEPTED MANUSCRIPT

non-converts), and MCI versus NC classification. Firstly, we conduct experiments on MRI and PET separately, and the classification accuracies of our methods are compared with some classical methods including NPE, ISOMAP, LPP and SPP. Results using only the MRI data are shown in Fig. 4, where it is obvious that our proposed methods can boost the learning performance in three classification tasks. Also, Fig. 4 shows that our proposed EDLPP1 and EDLPP2 methods achieve similar

CR IP T

results in three tasks. From Fig. 5, which presents the results of different algorithms using only the PET data, one can observe that the effective distance based dimensionality reduction algorithms are superior to those methods using Euclidean

ED

M

AN US

distance.

AC

CE

PT

Fig. 4 Comparisons on several dimensionality reduction methods on ADNI dataset with MRI modality

Fig. 5 Comparisons on several dimensionality reduction methods on ADNI dataset with PET modality

ACCEPTED MANUSCRIPT

Table 3 and Table 4 list the comparison between multi-modalities and single modality based on our proposed methods. It is obvious that the results of combining two modalities are superior to the results of experiments on MRI or PET respectively. Specifically, our proposed methods achieve the best classification accuracies of 94.78%, 84.49% and 78.61% for three classification tasks (i.e., AD versus NC, MCI

CR IP T

versus NC, and MCI-C versus MCI-NC) by combining bi-modal data (i.e. MRI and PET), while the best classification accuracies on MRI are 89.21%, 76.39% and 69.44% and the classification accuracies on PET are 91.49%, 78.09% and 67.72%. In addition, as the important evaluation criterion for the diagnosis, the SEN, SPE and AUC on

AN US

multi-modal data are always higher than those on single modal data. These results indicate that our proposed methods based on effective distance are effective in multi-modal data for dimensionality reduction.

To further evaluate the performance on multi-modal data of our proposed methods, we compare our methods with several multi-modality methods (including CCA, LPP,

M

and SPP), and the results are reported in Table 5. Specifically, baseline represents the

ED

classification results on the test data using the original features, where the different modal data is combined by multi-kernel technique. As shown in Table 5, we can see

PT

that our proposed methods (i.e., EDLPP1, EDLPP2, and EDSPP) always achieve better classification performance than the conventional methods including LPP, SPP

CE

and CCA.

AC

Table 3. Performance of AD vs. NC and MCI vs. NC classification using multi-modal data

Method

EDLPP1

EDLPP2

Modal

AD vs. NC ACC

SEN

SPE

(%)

(%)

(%)

MCI vs. NC AUC

ACC

SEN

SPE

(%)

(%)

(%)

AUC

MRI

87.14

89.21

85.00

0.87

76.39

94.41

42.93

0.69

PET

90.15

90.59

89.23

0.90

78.09

89.69

55.96

0.73

MRI+PET

94.68

95.69

93.46

0.95

84.49

96.67

60.77

0.79

MRI

89.21

89.02

87.50

0.89

74.48

89.69

43.79

0.67

PET

91.49

88.43

94.23

0.91

77.25

89.09

54.62

0.71

MRI+PET

94.78

93.33

95.96

0.94

82.98

92.73

63.85

0.78

ACCEPTED MANUSCRIPT

EDSPP

MRI

88.27

84.90

88.46

0.88

71.65

92.85

32.96

0.63

PET

88.29

86.28

90.19

0.88

77.28

90.71

51.54

0.72

MRI+PET

93.97

91.76

95.77

0.93

82.29

93.54

58.85

0.77

Table 4. Performance of MCI-C vs. MCI-NC classification using multi-modal data MCI-C vs. MCI-NC

EDLPP1

EDLPP2

ACC

SEN

SPE

(%)

(%)

(%)

MRI

69.44

46.16

PET

65.61

50.47

MRI+PET

77.22

61.16

MRI

68.50

49.07

PET

67.72

51.63

MRI+PET

78.61

67.67

CR IP T

Modal

MRI EDSPP

PET MRI+PET

AUC

83.24

0.65

61.60

0.56

88.92

0.75

83.57

0.66

79.10

0.65

86.25

0.77

AN US

Method

69.17

51.39

80.98

0.66

65.50

50.47

76.60

0.64

78.56

66.51

87.14

0.76

Table 5. Comparison on classification accuracies of different multi-modality dimensionality reduction

Baseline

M

methods

CCA

LPP

SPP

EDLPP1

EDLPP2

EDSPP

84.61%

89.24%

84.03%

87.95%

94.68%

94.78%

93.97%

MCI vs. NC

72.00%

77.83%

79.87%

75.21%

84.49%

82.98%

82.29%

MCI-C vs. MCI-NC

61.11%

69.78%

71.67%

69.00%

77.22%

78.61%

78.56%

PT

ED

AD vs. NC

CE

5. Parameter Selection

This subsection discusses the performance of our methods under different

AC

parameter settings. Two parameters, which include the PCARatio and the coefficient Οƒ of sparse representation, can be tuned in our experiments. Specifically, the value of PCARatio varies from 0.96 to 0.99 with a step size of 0.01, and the coefficient Οƒ of sparse representation ranges from 0.005 to 1. Due to the special way to compute the similarity between different samples, EDLPP1 only has one parameter (i.e., PCARatio) to be discussed. As we can observe from Fig. 6, EDLPP1 achieves one stable curve under the different values of PCARatio. In addition, we discuss the influence of

ACCEPTED MANUSCRIPT

PCARatio and Οƒ on the performance of EDLPP2 and EDSPP. As can be seen in Fig. 7, when we fix one of the parameters, the variation of classification accuracy is not obvious with the change of the other parameter. These results show that our proposed

AN US

CR IP T

effective distance-based methods are not sensitive to the parameters.

CE

PT

ED

M

Fig. 6 The classification accuracy of EDLPP1 for AD and NC with different PCARatio

AC

Fig. 7 The classification results of EDLPP2 and EDSPP for AD and NC with respect to the selection of PCARation and coefficient

Οƒ

6. Conclusion This paper presented two new effective distance-based unsupervised dimensionality reduction methods, including Effective Distance-based LPP (including EDLPP1 and EDLPP2) and Effective Distance-based SPP (EDSPP). Specifically, we adopt the effective distance to measure the similarity of samples, to preserve the dynamic

ACCEPTED MANUSCRIPT

structure of samples. The experiments on UCI and the ADNI show that our proposed methods usually outperform those methods based on conventional distance (i.e., Euclidean distance). And our proposed method can be extended into supervised approaches using supervision information (e.g., class labels or pairwise constraints [12]), which is one of our future works. Also, one can develop different methods to compute the effective distance, which may further boost the learning performance of

CR IP T

our proposed method.

Acknowledgements

This work is supported in part by the National Natural Science Foundations of

AN US

China (Nos. 61422204, 61473149, 61501230), the Jiangsu Natural Science Foundation for Distinguished Young Scholar (No. BK20130034), the Jiangsu Natural Science Foundation (No. BK20150751), the NUAA Fundamental Research Funds (No. NE2013105) and the Foundation of Graduate Innovation Center in NUAA (No.

M

KFJJ20151605).

[1]

ED

References

D. Zhang and D. Shen, "Multi-modal multi-task learning for joint prediction of multiple regression and classification variables in Alzheimer's disease," Neuroimage, vol. 59, pp.

[2]

PT

895-907, 2012.

K. W. Bowyer, K. Chang, and P. Flynn, "A survey of approaches and challenges in 3D and multi-modal 3D+ 2D face recognition," Computer Vision and Image Understanding, vol. 101,

[3]

CE

pp. 1-15, 2006.

W. Shao, M. Liu, and D. Zhang, "Human cell structure-driven model construction for predicting protein subcellular location from biological images," Bioinformatics, pp. 8 pp.-120,

AC

2015.

[4]

M. Liu, D. Zhang, and D. Shen, "View‐centralized multi‐atlas classification for Alzheimer's disease diagnosis," Human Brain Mapping, vol. 36, pp. 1847-1865, 2015.

[5]

Y. Gao, M. Wang, Z. Zha, J. Shen, X. Li, and X. Wu, "Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search," IEEE Transactions on Image Processing, vol. 22, pp. 363-376, 2013.

[6]

L. G. Apostolova, K. S. Hwang, J. P. Andrawis, A. E. Green, S. Babakchanian, J. H. Morra, et al., "3D PIB and CSF biomarker associations with hippocampal atrophy in ADNI subjects," Neurobiology of Aging, vol. 31, pp. 1284-303, 2010.

[7]

M. Liu, D. Zhang, and S. Chen, "Attribute relation learning for zero-shot classification," Neurocomputing, vol. 139, pp. 34-46, 2014.

ACCEPTED MANUSCRIPT

[8]

B. Gu, V. S. Sheng, K. Y. Tay, W. Romano, and S. Li, "Incremental Support Vector Learning for Ordinal Regression," IEEE Transactions on Neural Networks & Learning Systems, vol. 26, pp. 1403-1416, 2015.

[9]

B. Cheng, M. Liu, H.-I. Suk, D. Shen, and D. Zhang, "Multimodal manifold-regularized transfer learning for MCI conversion prediction," Brain Imaging and Behavior, pp. 1-14, 2015.

[10]

D. Zhang, Y. Wang, L. Zhou, H. Yuan, and D. Shen, "Multimodal classification of Alzheimer's disease and mild cognitive impairment," Neuroimage, vol. 55, pp. 856-867, 2011.

[11]

M. Liu, D. Zhang, and D. Shen, "Relationship Induced Multi-template Learning for Diagnosis of Alzheimer’s Disease and Mild Cognitive Impairment," IEEE Transactions on Medical Imaging,

[12]

CR IP T

pp. 1-1, 2016.

L.-F. Chen, H.-Y. M. Liao, M.-T. Ko, J.-C. Lin, and G.-J. Yu, "A new LDA-based face recognition system which can solve the small sample size problem," Pattern Recognition, vol. 33, pp. 1713-1726, 2000.

[13]

S. J. Raudys and A. K. Jain, "Small sample size effects in statistical pattern recognition: Recommendations for practitioners," IEEE Transactions on Pattern Analysis & Machine Intelligence, pp. 252-264, 1991.

M. Liu, L. Miao, and D. Zhang, "Two-Stage Cost-Sensitive Learning for Software Defect

AN US

[14]

Prediction," IEEE Transactions on Reliability, vol. 63, pp. 676-686, 2014. [15]

W. Jiang and F.-l. Chung, "A trace ratio maximization approach to multiple kernel-based dimensionality reduction," Neural Networks, vol. 49, pp. 96-106, 2014.

[16]

B. Jie, D. Zhang, B. Cheng, and D. Shen, "Manifold regularized multi-task feature selection for multi-modality classification in Alzheimer’s disease," in Medical Image Computing and

[17]

M

Computer-Assisted Intervention, ed: Springer, 2013, pp. 275-283.

M. Liu and D. Zhang, "Sparsity Score: A Novel Graph-preserving Feature Selection Method," International Journal of Pattern Recognition & Artificial Intelligence, vol. 28, 2014. M. Liu, D. Zhang, E. Adeli-Mosabbeb, and D. Shen, "Inherent Structure Based Multi-view

ED

[18]

Learning with Multi-template Feature Representation for Alzheimer’s Disease Diagnosis," IEEE Transactions on Biomedical Engineering, 2015. Y. Luo, D. Tao, K. Ramamohanarao, C. Xu, and Y. Wen, "Tensor Canonical Correlation Analysis

PT

[19]

for Multi-View Dimension Reduction," IEEE Transactions on Knowledge & Data Engineering, [20]

CE

vol. 27, pp. 3111-3124, 2015. J. Chen and I. D. Schizas, "Online Distributed Sparsity-Aware Canonical Correlation Analysis," IEEE Transactions on Signal Processing, vol. 64, pp. 688-703, 2016. D. R. Hardoon, S. Szedmak, and J. Shawe-Taylor, "Canonical correlation analysis: An overview

AC

[21]

with application to learning methods," Neural Computation, vol. 16, pp. 2639-2664, 2004.

[22]

H. Hotelling, "Relations between two sets of variates," Biometrika, pp. 321-377, 1936.

[23]

L. Sun, S. Ji, and J. Ye, "Canonical correlation analysis for multilabel classification: A least-squares formulation, extensions, and analysis," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 33, pp. 194-200, 2011.

[24]

T.-K. Kim, J. Kittler, and R. Cipolla, "Discriminative learning and recognition of image set classes using canonical correlations," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 29, pp. 1005-1018, 2007.

[25]

Q.-S. Sun, Z.-d. Liu, P.-A. Heng, and D.-S. Xia, "A theorem on the generalized canonical projective vectors," Pattern Recognition, vol. 38, pp. 449-452, 2005.

ACCEPTED MANUSCRIPT

[26]

T. Melzer, M. Reiter, and H. Bischof, "Appearance models based on kernel canonical correlation analysis," Pattern Recognition, vol. 36, pp. 1961-1971, 2003.

[27]

T. Sun and S. Chen, "Locality preserving CCA with applications to data visualization and pose estimation," Image and Vision Computing, vol. 25, pp. 531-543, 2007.

[28]

Q.-S. Sun, Z. Jin, P.-A. Heng, and D.-S. Xia, "A novel feature fusion method based on partial least squares regression," in Pattern Recognition and Data Mining, ed: Springer, 2005, pp. 268-277.

[29]

D. R. Hardoon and J. Shawe-Taylor, "Sparse canonical correlation analysis," Machine Learning, vol. 83, pp. 331-353, 2011. C. Zu and D. Zhang, "Canonical sparse cross-view correlation analysis," Neurocomputing, 2016.

[31]

CR IP T

[30]

Q. Zhu, H. Sun, Q. Feng, and J. Wang, "CCEDA: building bridge between subspace projection learning and sparse representation-based classification," Electronics Letters, vol. 50, pp. 1919-1921, 2014.

[32]

M. Liu and D. Zhang, "Pairwise Constraint-Guided Sparse Learning for Feature Selection," IEEE Transactions on Cybernetics, vol. 46, pp. 298-310, 2016.

W. Yang, C. Sun, and W. Zheng, "A Regularized Least Square based Discriminative Projections

AN US

[33]

for Feature Extraction," Neurocomputing, vol. 175, 2015. [34]

S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, and S. Lin, "Graph embedding and extensions: a general framework for dimensionality reduction," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 29, pp. 40-51, 2007.

[35]

F. Nie, D. Xu, I. W.-H. Tsang, and C. Zhang, "Flexible manifold embedding: A framework for

M

semi-supervised and unsupervised dimension reduction," IEEE Transactions on Image Processing, vol. 19, pp. 1921-1932, 2010. [36]

P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. fisherfaces: Recognition

ED

using class specific linear projection," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 19, pp. 711-720, 1997. [37]

M. Li and B. Yuan, "2D-LDA: A statistical linear discriminant analysis for image matrix,"

[38]

PT

Pattern Recognition Letters, vol. 26, pp. 527-532, 2005. D. Xu, S. Yan, D. Tao, S. Lin, and H.-J. Zhang, "Marginal fisher analysis and its variants for

CE

human gait recognition and content-based image retrieval," IEEE Transactions on Image Processing, vol. 16, pp. 2811-2821, 2007.

[39]

X. He, "Locality preserving projections," Neural Information Processing Systems, vol. 45, pp.

AC

186-197, 2005.

[40]

L. Qiao, S. Chen, and X. Tan, "Sparsity preserving projections with applications to face recognition," Pattern Recognition, vol. 43, pp. 331–341, 2010.

[41]

M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of cognitive neuroscience, vol. 3, pp. 71-86, 1991.

[42]

D. Zhang, Z. H. Zhou, and S. Chen, "Semi-Supervised Dimensionality Reduction," in In: Siam International Conference on Data Mining, 2007, pp. 11--393.

[43]

D. Cai, X. He, and J. Han, "Semi-supervised Discriminant Analysis," in Proceedings / IEEE International Conference on Computer Vision. IEEE International Conference on Computer Vision, 2007, pp. 1-7.

[44]

J. B. Tenenbaum, V. De Silva, and J. C. Langford, "A global geometric framework for nonlinear

ACCEPTED MANUSCRIPT

dimensionality reduction," Science, vol. 290, pp. 2319-2323, 2000. [45]

W. Yang, Z. Wang, and C. Sun, "A collaborative representation based projections method for feature extraction," Pattern Recognition, vol. 48, pp. 20-27, 2015.

[46]

Y. Zheng, J. Byeungwoo, D. Xu, Q. M. J. Wu, and H. Zhang, "Image segmentation by generalized hierarchical fuzzy C-means algorithm," Journal of Intelligent & Fuzzy Systems, vol. 28, pp. 4024-4028, 2015.

[47]

J. Zhang, J. Liang, and H. Zhao, "Local energy pattern for texture classification using self-adaptive quantization thresholds," IEEE Transactions on Image Processing, vol. 22, pp. 31-42, 2013. J. Zhang, H. Zhao, and J. Liang, "Continuous rotation invariant local descriptors for texton

CR IP T

[48]

dictionary-based texture classification " Computer Vision & Image Understanding, vol. 117, pp. 56-75, 2013. [49]

Y. Gao, M. Wang, D. Tao, R. Ji, and Q. Dai, "3-D Object Retrieval and Recognition With Hypergraph Analysis," IEEE Transactions on Image Processing A Publication of the IEEE Signal Processing Society, vol. 21, pp. 4290-4303, 2012.

[50]

D. Brockmann and D. Helbing, "The hidden geometry of complex, network-driven contagion

[51]

AN US

phenomena," Science, vol. 342, pp. 1337-1342, 2013.

K. L. Elmore and M. B. Richman, "Euclidean distance as a similarity metric for principal component analysis," Monthly weather review, vol. 129, pp. 540-549, 2001.

[52]

M. Liu and D. Zhang, "Feature Selection with Effective Distance," Neurocomputing, 2016.

[53]

K. Huang and S. Aviyente, "Sparse representation for signal classification," in Neural Information Processing Systems, 2006, pp. 609-616.

J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. S. Huang, and S. Yan, "Sparse representation for

M

[54]

computer vision and pattern recognition," Proceedings of the IEEE, vol. 98, pp. 1031-1044, 2010.

J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust face recognition via sparse

ED

[55]

representation," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 31, pp. 210-227, 2009.

T. T. Cai and A. Zhang, "Sparse representation of a polytope and recovery of sparse signals

PT

[56]

and low-rank matrices," IEEE Transactions on Information Theory, vol. 60, pp. 122-132, 2014. G. Shao, Y. Wu, A. Yong, X. Liu, and T. Guo, "Fingerprint compression based on sparse

CE

[57]

representation," IEEE Transactions on Image Processing, vol. 23, pp. 489-501, 2014.

[58]

Y. Xie, W. Zhang, Y. Qu, and Y. Zhang, "Discriminative subspace learning with sparse

AC

representation view-based model for robust visual tracking," Pattern Recognition, vol. 47, pp. 1383-1394, 2014.

[59]

D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, pp. 1289-1306, 2006.

[60]

S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM Journal on Scientific Computing, vol. 20, pp. 33-61, 1998.

[61]

M. Liu, D. Zhang, S. Chen, and H. Xue, "Joint Binary Classifier Learning for ECOC-based Multi-class Classification," IEEE Transactions on Pattern Analysis & Machine Intelligence, pp. 1-1, 2015.

[62]

A. Asuncion and D. Newman, "UCI Machine Learning Repository," University of California Irvine School of Information Andcomputer Sciences, 2007.

ACCEPTED MANUSCRIPT

[63]

F. H. Bouwman, S. N. Schoonenboom, V. D. F. Wm, E. J. van Elk, A. Kok, F. Barkhof, et al., "CSF biomarkers and medial temporal lobe atrophy predict dementia in mild cognitive impairment. Neurobiol Aging," Neurobiology of Aging, vol. 28, pp. 1070-4, 2007.

[64]

A. M. Fjell, K. N. C. Walhovd, L. K. Mcevoy, D. J. Hagler, D. Holland, J. B. Brewer, et al., "CSF biomarkers in prediction of cerebral and clinical change in mild cognitive impairment and Alzheimer's disease," Journal of Neuroscience the Official Journal of the Society for Neuroscience, vol. 30, pp. 2088-101, 2010.

[65]

B. Jie, D. Zhang, W. Chong‐Yaw, and D. Shen, "Topological graph kernel on multiple Human Brain Mapping, vol. 35, p. 2876–2897, 2014.

[66]

CR IP T

thresholded functional connectivity networks for mild cognitive impairment classification," X. He, D. Cai, S. Yan, and H.-J. Zhang, "Neighborhood preserving embedding," in Tenth IEEE International Conference on Computer Vision, 2005. ICCV 2005. , 2005, pp. 1208-1213. [67]

J. B. Tenenbaum, "Mapping a manifold of perceptual observations," Neural Information

AC

CE

PT

ED

M

AN US

Processing Systems, pp. 682-688, 1998.

ACCEPTED MANUSCRIPT

Dan Zhang, born in 1991 and received her B.S degree in Computer Science and Technology from Henan University in 2014. Currently she is a master in Nanjing University of Aeronautics and Astronautics. Her research interests include machine learning, pattern

CR IP T

recognition, data mining and medical imaging analysis etc.

Qi Zhu, born in 1985 and received his B.S., M.S. and Ph.D. degrees in Computer Science from Harbin Institute of Technology in 2007, 2010 and 2014, respectively. Currently he is an assistant professor of College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics. His research interests include medical image analysis,

AN US

feature extraction, biometrics etc.

M

Daoqiang Zhang, born in 1978 and received his B.S. degree and Ph.D. degree in Computer Science from Nanjing University of Aeronautics and Astronautics in 1999 and 2004, respectively. And he is a Professor and a PhD supervisor in Nanjing University of

ED

Aeronautics and Astronautics at present. His research interests include machine learning, pattern recognition, data mining and medical imaging analysis etc. In these areas, he has published over 100 scientific articles in refereed international journals such as

PT

Neuroimage, Pattern Recognition, Artificial Intelligence in Medicine, IEEE Trans. Neural Networks; and conference proceedings such as IJCAI, AAAI, SDM, ICDM. He is a member

CE

of the Machine Learning Society of the Chinese Association of Artificial Intelligence (CAAI), and the Artificial

AC

Intelligence & Pattern Recognition Society of the China Computer Federation (CCF).