Nonlinear dimensionality reduction with relative distance comparison


Chunxia Zhang a, Shiming Xiang b,*, Feiping Nie c, Yangqiu Song c

a Software School, School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
b National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
c Department of Automation, Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing 100084, China

Article info

Article history: Received 26 October 2007; received in revised form 27 July 2008; accepted 12 August 2008; available online 11 September 2008. Communicated by T. Heskes.

Keywords: Nonlinear dimensionality reduction (NLDR); relative distance comparison; semi-definite programming

Abstract

This paper proposes a new algorithm for nonlinear dimensionality reduction. Our basic idea is to explore and exploit the local geometry of the manifold with relative distance comparisons. All such comparisons derived from local neighborhoods are enumerated to constrain the manifold to be learned. The task is formulated as a problem of quadratically constrained quadratic programming (QCQP). However, such a QCQP problem is not convex. We relax it to a problem of semi-definite programming (SDP), from which a globally optimal embedding is obtained. Experimental results illustrate the validity of our algorithm.

1. Introduction

In many areas of the natural and social sciences, one is often confronted with intrinsically low-dimensional data points which lie in a very high-dimensional observation space. An example might be a set of images of an individual's face observed under different poses. If each image has $n \times n$ grayscale pixels, then each face image yields a data point in an $n^2$-dimensional space, but the intrinsic dimensionality of the space of all these images equals only the number of pose parameters. Reducing the dimensionality of such data is needed in many applications, ranging from image compression [34] to data visualization [27,4] and other tasks including recognition [33,35], vision computing [16,11], and so on [26,5].

Generally, the motivation behind dimensionality reduction is to discover a lower-dimensional structure in high-dimensional data without significant loss of information. Linear methods, such as principal component analysis (PCA) [12] and classical multidimensional scaling [6], are popularly used to perform the task of dimensionality reduction. Such linear methods are able to discover the lower-dimensional structure of data lying on, or nearly lying on, a linear subspace. However, in many cases the lower-dimensional structure hidden in the data is nonlinear, and directly using those linear methods may generate unsatisfactory results.



Recently, many nonlinear dimensionality reduction (NLDR) algorithms [15] have been developed under the assumption that the data points are sampled from an underlying manifold embedded in a high-dimensional Euclidean space. Two well-known algorithms are Isomap [25] and locally linear embedding (LLE) [18]. Based on the multidimensional scaling algorithm, Isomap attempts to globally preserve the geodesic distances between all pairs of data points. LLE tries to discover the nonlinear structure by exploiting the local geometry of the data. Later, different manifold learning algorithms were proposed, such as manifold charting [3], Laplacian eigenmap (LE) [1], Hessian LLE (HLLE) [7], local tangent space alignment (LTSA) [36], maximum variance unfolding (MVU) [29,28], conformal eigenmap [21], and other extensive work [15,31,22,10,24,13,14].

Most NLDR algorithms can be cast into a common framework of thinking globally and fitting locally, in which the locally geometrical information is collected together to obtain a global optimum [19]. Locally geometrical information of the data is explored and exploited in different ways. In LLE [18], each data point is linearly reconstructed from its neighbors, and this linear representation is maintained in a lower-dimensional space. LE calculates the similarity of each pair of neighboring data points to define the graph Laplacian [1]. HLLE, LTSA and local spline embedding (LSE) [31] explore the local relations between neighboring data points in tangent spaces; the local coordinates are then mapped, linearly [36,24] or nonlinearly [31], to a global coordinate system with lower dimensionality. In contrast, MVU [29,28] and conformal eigenmap [21] utilize the locally geometrical relations in a straightforward way. In MVU, the Euclidean distances between neighboring data points are globally preserved in the lower-dimensional space.


Such an idea is extended in conformal eigenmap by preserving angle information, and similar results can be obtained.

This paper uses relative distance comparisons to explore the local geometrical relations between data points. As a kind of side information, employing relative comparisons is not a new idea in machine learning: they have been used to learn distance metrics for data clustering and classification, in the form "A is more similar to B than A is to C" [20,17]. In addition, as supervised information, such a formulation is translated as "the distance from A to B is not greater than that from A to C" in manifold alignment [32]. In this paper, all such relative comparisons derived in each neighborhood on the manifold are enumerated and maintained in the lower-dimensional manifold to be learned. The task is formulated as a problem of quadratically constrained quadratic programming (QCQP) [2]. However, such an optimization problem is non-convex. To remedy this drawback, we perform a semi-definite relaxation and convert the source QCQP problem into a problem of semi-definite programming (SDP) [2], from which a globally optimal embedding is finally obtained.

One advantage of relative distance comparison is that it can easily be specified not only in local neighborhoods but also over the global region. This yields a mechanism to integrate prior knowledge or supervised information into manifold learning [32]. Experiments show that adding a few relative comparisons about the global manifold structure may significantly change the learned shape. In contrast, most manifold learning algorithms cannot be directly extended to integrate such global information in a straightforward way without changing the nature of the optimization model. In the distance-preserving framework [29,28], the global structure can also be specified with distances between non-neighboring data points. However, supplying such distances is not an easy task. One reason is that the values in the equality constraints must be chosen carefully to avoid conflicts between equalities. Another reason is that geodesic (manifold) distances should be used for non-neighboring data points, since the commonly used Euclidean distance metric is only suitable for neighboring data points. On the one hand, calculating geodesic distances usually takes much time; on the other hand, the obtained geodesic distance may not reflect the true distance, especially when the data are sparsely sampled from the manifold or the topological structure of the manifold is not convex. Such drawbacks can easily be avoided with relative comparisons.

The remainder of this paper is organized as follows. Section 2 develops the optimization model for the NLDR problem. Section 3 discusses how to solve the model with SDP; the extension of the model to global relative distance comparisons is also presented in that section. Section 4 gives the algorithm. We report the experimental results in Section 5 and draw conclusions in Section 6.

2. The model

The NLDR problem can be formulated as follows. Given a set of $n$ scattered data points $x_i \in \mathbb{R}^m$ lying on a manifold $\mathcal{M}$ embedded in an $m$-dimensional Euclidean space, the goal is to invert an underlying generative model $x = f(y)$ to find the corresponding lower-dimensional parameters (embedding coordinates) $y_i \in \mathbb{R}^d$ such that $x_i = f(y_i)$, that is, to construct $Y = \{y_i\}_{i=1}^{n}$ from $X = \{x_i\}_{i=1}^{n}$.

Differing from methods based on locally linear reconstruction or tangent space representation [18,7,36,31], here we use purely geometrical representations to explore the relations between data points. For each data point $x_i \in X$ ($i = 1, \ldots, n$), denote its neighborhood by $N_i$, which contains the $k$ nearest neighbors of $x_i$ obtained with the Euclidean distance metric. Further let $N_i = \{x_{i_j}\}_{j=1}^{k}$, in which the subscript $i_j$ stands for an index and $i_j \in \{1, 2, \ldots, n\}$. Now we can enumerate all the triples $S_i = \{(i, i_j, i_k) \mid x_{i_j}, x_{i_k} \in N_i\}$, and each triple $(i, i_j, i_k)$ corresponds to a relative distance comparison:

    $\|x_i - x_{i_j}\| \le \|x_i - x_{i_k}\|$    (1)

Here $\|\cdot\|$ stands for the L2-norm. Eq. (1) indicates that the distance from $x_i$ to $x_{i_j}$ is not greater than that from $x_i$ to $x_{i_k}$. Such a distance comparison reflects a weak geometry between data points. We hope that this relative comparison is also maintained in the lower-dimensional intrinsic space. It follows that

    $\|y_i - y_{i_j}\| \le \|y_i - y_{i_k}\|$    (2)

Obviously, relative comparisons alone cannot yield a unique solution, and additional constraints should be introduced to develop the optimization model. First, the lower-dimensional coordinates $Y = \{y_i\}_{i=1}^{n}$ should be embedded into a definitely specified place. Such a place can be selected near the coordinate origin:

    $\sum_{i=1}^{n} y_i = 0$    (3)

Second, to avoid generating zero solutions, a distance guard should be introduced. It can be selected as the minimum distance among all pairs of neighboring data points. Without loss of generality, suppose $x_1$ and $x_2$ give this minimum distance. Then we have

    $\|y_1 - y_2\| = c$    (4)

Here $c$ equals the distance from $x_1$ to $x_2$. Note that there is only a scaling difference if $c$ is specified as any other positive number; therefore, we simply take $c = 1$ in this paper.

Unfortunately, satisfying the constraints in (2)-(4) alone still does not yield a unique solution. Fig. 1 shows an example. In Fig. 1(a), five source data points A, B, C, D and E are sampled from a circle. This is a one-dimensional manifold with angle parameters as the intrinsic parameters. Given the distance from A to B as a distance guard, the line segments "ABCDE", "ABC1D1E1" and "ABC2D2E2" in Fig. 1(b) are all feasible solutions. Among these feasible solutions, we select the one with maximum variance, since with this solution the manifold is largely unfolded. This is just the solution "ABCDE" in Fig. 1(b), which gives a perfect one-dimensional embedding.
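To make the construction of the triple sets S_i concrete, the following MATLAB sketch (our own illustration, not code from the paper; X, i and k are assumed to be given) enumerates the relative comparisons of Eq. (1) for a single neighborhood:

    % Sketch: enumerate the relative distance comparisons (Eq. (1)) induced by
    % the neighborhood N_i of the point x_i.  X is m-by-n, one data point per column.
    d2 = sum(bsxfun(@minus, X, X(:, i)).^2, 1);   % squared distances to x_i
    [~, order] = sort(d2, 'ascend');
    Ni = order(2:k+1);                            % the k nearest neighbors of x_i
    Si = zeros(0, 3);                             % triples (i, i_j, i_k)
    for a = 1:k
        for b = 1:k
            if a ~= b && d2(Ni(a)) <= d2(Ni(b))
                Si(end+1, :) = [i, Ni(a), Ni(b)]; % ||x_i - x_{i_j}|| <= ||x_i - x_{i_k}||
            end
        end
    end

Section 4 shows how most of these triples can be discarded as redundant, so that only k - 1 comparisons per neighborhood are kept.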

Fig. 1. (a) Five source data points sampled from a circle; (b) feasible solutions satisfying the constraints in Eqs. (2)–(4).


Based on the above discussions, we can give the following optimization model:

    $\max_{Y} \ \mathrm{tr}(YY^T)$
    s.t. $\|y_i - y_{i_j}\| \le \|y_i - y_{i_k}\|, \quad (i, i_j, i_k) \in S_i, \ i = 1, 2, \ldots, n$
         $\sum_{i=1}^{n} y_i = 0$
         $\|y_1 - y_2\| = 1$    (5)

where tr is the trace operator of a matrix and $Y = [y_1, y_2, \ldots, y_n] \in \mathbb{R}^{d \times n}$ collects all the lower-dimensional embedding coordinates to be learned.

The relative comparisons and the distance guard in Model (5) are nonlinear. Equivalently, they can be replaced by squared distances. Thus we get the following QCQP model [2]:

    $\max_{Y} \ \mathrm{tr}(YY^T)$
    s.t. $\|y_i - y_{i_j}\|^2 \le \|y_i - y_{i_k}\|^2, \quad (i, i_j, i_k) \in S_i, \ i = 1, 2, \ldots, n$
         $\sum_{i=1}^{n} y_i = 0$
         $\|y_1 - y_2\|^2 = 1$    (6)

Unfortunately, Model (6) cannot be solved with convex programming algorithms, since the relative comparison constraints are not convex. Deducing the solution with gradient ascent algorithms is intractable and usually yields only a local optimum. In the next section, Model (6) will be converted into an SDP [2], which makes the problem tractable and yields a global optimum.

3. Solving the model with SDP

In this section, we give the details of how to convert the QCQP problem into an SDP problem. For the relative comparison and the distance guard, we have

    $\|y_i - y_{i_j}\|^2 \le \|y_i - y_{i_k}\|^2 \iff -2 y_i^T y_{i_j} + y_{i_j}^T y_{i_j} + 2 y_i^T y_{i_k} - y_{i_k}^T y_{i_k} \le 0$    (7)

and

    $\|y_1 - y_2\|^2 = 1 \iff y_1^T y_1 - 2 y_1^T y_2 + y_2^T y_2 = 1$    (8)

In Eqs. (7) and (8), the constraints are formulated in terms of inner products of the lower-dimensional embedding coordinates to be learned. To be consistent with this formulation, using the property of the trace operator (here $\mathrm{tr}(AB) = \mathrm{tr}(BA)$), the objective function can be rewritten as follows:

    $\mathrm{tr}(YY^T) = \mathrm{tr}(Y^T Y)$    (9)

Denoting $K = Y^T Y \ (\in \mathbb{R}^{n \times n})$, we can see that (1) $K$ is a Gram matrix, since its element $K_{i,j}$ equals the inner product of $y_i$ and $y_j$, that is, $K_{i,j} = y_i^T y_j$; and (2) $K$ is positive semi-definite, denoted by $K \succeq 0$. Now the relative distance comparisons and the distance guard can be written as follows:

    $-2K_{i,i_j} + K_{i_j,i_j} + 2K_{i,i_k} - K_{i_k,i_k} \le 0$    (10)

and

    $K_{1,1} - 2K_{1,2} + K_{2,2} = 1$    (11)

Note that the zero-centralized constraint $\sum_{i=1}^{n} y_i = 0$ should be further treated. In matrix form, it can be rewritten as $Ye = 0$, in which $e = [1, 1, \ldots, 1]^T \in \mathbb{R}^n$. Then we have

    $Ye = 0 \Longrightarrow Y^T Y e = 0 \Longrightarrow Ke = 0$    (12)

Eq. (12) gives us $n$ equality constraints in $Ke = 0$. From the perspective of mathematical optimization, satisfying such $n$ equality constraints is not an easy task when $n$ is very large. Note that we use this constraint only to specify the embedding place; a reasonable compromise is to reduce them to a single equality constraint by summing them together [32]. Thus we have

    $\sum_{i,j} K_{i,j} = 0$    (13)

Now, according to Eqs. (9), (10), (11) and (13), Model (6) is developed into the following problem:

    $\max_{K} \ \mathrm{tr}(K)$
    s.t. $-2K_{i,i_j} + K_{i_j,i_j} + 2K_{i,i_k} - K_{i_k,i_k} \le 0, \quad (i, i_j, i_k) \in S_i, \ i = 1, 2, \ldots, n$
         $\sum_{i,j} K_{i,j} = 0$
         $K_{1,1} - 2K_{1,2} + K_{2,2} = 1$
         $K \succeq 0$    (14)

Note that the constraints and the objective function are all formulated as linear functions. Compared with standard linear programming, however, an additional positive semi-definite constraint is imposed on the matrix $K$. As a result, Problem (14) becomes a typical SDP problem [2]. It is convex and thus has a global optimum, which can be solved effectively with existing SDP algorithms.

In Problem (14), the relative comparison constraints are dominant. They force the SDP solver to faithfully maintain the local distance relations between data points. To obtain a solution when facing real-world data, we relax each relative comparison constraint by introducing a small positive variable $\varepsilon$. Namely,

    $-2K_{i,i_j} + K_{i_j,i_j} + 2K_{i,i_k} - K_{i_k,i_k} \le \varepsilon_{i,i_j,i_k}$

where $\varepsilon_{i,i_j,i_k}$ is a small positive number. Such a slack trick indicates that the relative comparison can be violated, but only within a small region.

Before reformulating Problem (14), we further consider introducing constraints about the global structure of the manifold, so that we can develop a general model for manifold learning. Actually, the local representations of relative distance comparisons can be naturally extended to specify the whole structure of the data with global relative distance comparisons. Global relative comparisons can be supplied according to prior knowledge about the manifold, or simply given by the user [32]. In this way, we can achieve the goal of performing supervised manifold learning within a unified model. Suppose the known relative comparisons about the global structure of the manifold are collected into $S = \{(i, j, k) \mid \|y_i - y_j\|^2 \le \|y_i - y_k\|^2\}$. By introducing the slack variables, Problem (14) can be extended as follows:

    $\min_{K, \varepsilon} \ \alpha \Big( \sum_{i=1}^{n} \sum_{(i, i_j, i_k) \in S_i} \varepsilon_{i,i_j,i_k} + \sum_{(i, j, k) \in S} \varepsilon_{i,j,k} \Big) - (1 - \alpha) \, \mathrm{tr}(K)$
    s.t. $-2K_{i,i_j} + K_{i_j,i_j} + 2K_{i,i_k} - K_{i_k,i_k} \le \varepsilon_{i,i_j,i_k}, \quad (i, i_j, i_k) \in S_i, \ i = 1, 2, \ldots, n$
         $-2K_{i,j} + K_{j,j} + 2K_{i,k} - K_{k,k} \le \varepsilon_{i,j,k}, \quad (i, j, k) \in S$
         $\sum_{i,j} K_{i,j} = 0$
         $K_{1,1} - 2K_{1,2} + K_{2,2} = 1$
         $K \succeq 0$
         $\forall (i, i_j, i_k) \in S_i: \ \varepsilon_{i,i_j,i_k} > 0; \quad \forall (i, j, k) \in S: \ \varepsilon_{i,j,k} > 0$    (15)

where $\alpha \in (0, 1)$ is a positive trade-off parameter. Note that the source maximum problem in (14) is now formulated as a minimum problem in (15). The problem in (15) is also an SDP problem.
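For concreteness, the following is a minimal sketch of Problem (15) written with the CVX modeling toolbox for Matlab. This is our own illustration under assumed inputs, not the authors' CSDP-based implementation: 'triples' holds the local triples (i, i_j, i_k) row by row, 'gtriples' holds the global set S (assumed non-empty here; drop the corresponding terms when S is empty), 'guard' holds the index pair of the distance guard, 'n' is the number of data points, and 'alpha' is the trade-off parameter.

    % Hedged sketch of Problem (15) using CVX (assumed installed); not the
    % authors' original CSDP-based implementation.  Building constraints in a
    % loop is slow for large problems but keeps the correspondence with (15) clear.
    T  = size(triples, 1);                 % number of local relative comparisons
    Tg = size(gtriples, 1);                % number of global relative comparisons
    cvx_begin sdp
        variable K(n, n) symmetric
        variable eloc(T, 1)  nonnegative   % slack for local comparisons (paper: eps > 0)
        variable eglb(Tg, 1) nonnegative   % slack for global comparisons
        minimize( alpha * (sum(eloc) + sum(eglb)) - (1 - alpha) * trace(K) )
        subject to
            for t = 1:T
                i = triples(t, 1); a = triples(t, 2); b = triples(t, 3);
                -2*K(i, a) + K(a, a) + 2*K(i, b) - K(b, b) <= eloc(t);
            end
            for t = 1:Tg
                i = gtriples(t, 1); a = gtriples(t, 2); b = gtriples(t, 3);
                -2*K(i, a) + K(a, a) + 2*K(i, b) - K(b, b) <= eglb(t);
            end
            sum(K(:)) == 0;                                      % Eq. (13)
            K(guard(1), guard(1)) - 2*K(guard(1), guard(2)) ...
                + K(guard(2), guard(2)) == 1;                    % Eq. (11)
            K >= 0;                                              % K is positive semi-definite
    cvx_end

The positive semi-definite constraint on K is what turns the otherwise linear objective and constraints into an SDP.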


Fig. 2. (a) Five hundred data points sampled from the Swiss roll surface; (b) the true 2D parameters; (c) the results learned by standard MVU; (d) the results learned by shrunk MVU; (e) the results learned by our method.

Fig. 3. Results learned from teapot images. (a) The results learned by standard MVU; (b) the results learned by shrunk MVU; (c) the results learned by our method. Representative images are illustrated near their positions with symbols of squares.

Fig. 4. Results learned from a half of the teapot images used in Fig. 3. (a) The results learned by standard MVU; (b) the results learned by shrunk MVU; (c) the results learned by our method. Representative images are illustrated near their positions with symbols of squares.

Thus, given an $\alpha$, according to the theory of convex optimization we can obtain a globally optimal $K$. Finally, we point out that $Y$ can be obtained by performing an eigenvalue decomposition of $K$, with $K = Y^T Y$. Here $Y \in \mathbb{R}^{d \times n}$, and the lower dimensionality $d$ can be determined from prior knowledge or deduced from the distribution of the eigenvalues of $K$ [21,8].

Some remarks. First, we point out that during the model development in Section 2 we did not consider how to solve the model; thus, no slack variables were introduced for the implicit constraints formulated in Eq. (2). In this section, we first perform the semi-definite relaxation to convert the source problem into an SDP problem, and finally, to avoid generating an empty solution, we add the slack variables and develop the general model formulated in (15). From the viewpoint of mathematical optimization, in the case of $S = \emptyset$ the solution to (15) is just one of the feasible solutions to (5).

Second, to perform unsupervised manifold learning, as Isomap, LLE, HLLE, LE, LTSA and local spline embedding (LSE) [31] do, we can simply set the constraint set $S$ to be empty. If some relative distance comparisons about the global structure of the manifold are available, we can simply add them to $S$; in this way, supervised manifold learning is performed. In contrast, such

Fig. 5. Sixty-five head pose images.

a uniform model for unsupervised and supervised manifold learning cannot be naturally obtained from most existing manifold learning algorithms without significantly changing the model description [25,18,1,7,36,31].

Third, in the experiments in this paper, $\alpha$ is supplied manually. Note that the objective function in (15) is a combination of the regularization term and the data term $\mathrm{tr}(K)$. Therefore, a


Fig. 6. Results learned from the head pose images. (a) The results learned by standard MVU; (b) the results learned by shrunk MVU; (c) the results learned by our method. Representative images are illustrated near their positions with symbols of squares.

larger $\alpha$ indicates that the regularization term should be smaller; accordingly, each slack variable should also be smaller. In this way, we hope that all the relative comparison constraints are satisfied. In addition, a larger $\alpha$ means that the coefficient $(1 - \alpha)$ of the data term is smaller; from the objective function we see that in this case the SDP optimizer may in turn encourage a larger $\mathrm{tr}(K)$. Note that finding a larger $\mathrm{tr}(K)$ is just the goal of Model (14). Thus, we recommend using a large $\alpha$ in real applications.
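As noted in the remarks above, once the SDP solver returns the Gram matrix K, the embedding Y is recovered by an eigenvalue decomposition. A minimal MATLAB sketch of our own (assuming K and the target dimensionality d are given; small negative eigenvalues caused by numerical error are clipped to zero):

    % Sketch: recover Y (d-by-n) from the learned Gram matrix K, so that K ~ Y'*Y.
    [V, D] = eig((K + K') / 2);                  % symmetrize against round-off
    [lam, order] = sort(diag(D), 'descend');     % eigenvalues, largest first
    V = V(:, order);
    Y = diag(sqrt(max(lam(1:d), 0))) * V(:, 1:d)';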

4. The algorithm

Before listing the steps of the algorithm, we further discuss the distance comparisons. Note that each relative comparison includes $x_i$; thus, for each $N_i$ with $k$ neighbors we can get $C_k^2$ triples in total. Actually, most of these triples are redundant. For example, suppose we have $(i, i_1, i_2) \Rightarrow \|x_i - x_{i_1}\| \le \|x_i - x_{i_2}\|$ and $(i, i_2, i_3) \Rightarrow \|x_i - x_{i_2}\| \le \|x_i - x_{i_3}\|$. Then $(i, i_1, i_3) \Rightarrow \|x_i - x_{i_1}\| \le \|x_i - x_{i_3}\|$ holds naturally. Actually, "$\le$" is a transitive relation, under which only $k - 1$ relative comparisons need to be derived from $N_i$. In this way, the number of constraints can be significantly reduced from $nC_k^2$ to $n(k - 1)$, which helps the SDP optimizer reduce the computation time. For example, in the case of $n = 200$ and $k = 7$, there are $200 \times 7 \times 3 = 4200$ constraints in total if we do not perform redundancy removal. With the CSDP package (available at http://infohost.nmt.edu/borchers/csdp.html), solving this problem takes about 46.23 s on a PC with a 2.4 GHz CPU and 1 GB RAM, using Matlab 7.0. After reducing the redundant constraints, there are $200 \times 6$ constraints, and solving this task takes only about 8.97 s.

To this end, we sort the $k$ distances $\|x_i - x_{i_j}\|$, $j = 1, 2, \ldots, k$, in ascending order. Without loss of generality, we still denote the


Fig. 7. The teapot images with 20% salt and pepper noise.

sorted indices by $i_1, i_2, \ldots, i_k$, indicating $\|x_i - x_{i_1}\| \le \|x_i - x_{i_2}\| \le \cdots \le \|x_i - x_{i_k}\|$. Finally, we get the $k - 1$ triples $(i, i_1, i_2), (i, i_2, i_3), \ldots, (i, i_{k-1}, i_k)$, and the $k - 1$ relative comparisons can then be written out.

Now we can list the steps of our algorithm as follows.

Algorithm. Manifold learning with relative distance comparisons.
Input: $n$ source data points $X = \{x_i\}_{i=1}^{n} \subset \mathbb{R}^m$, lower dimensionality $d$ ($d < m$), number of nearest neighbors $k$, and trade-off parameter $\alpha$.
Output: $n$ lower-dimensional coordinates $Y = \{y_i\}_{i=1}^{n} \subset \mathbb{R}^d$.
(1) For each $x_i$, $i = 1, 2, \ldots, n$, construct $N_i$ with its $k$ nearest neighbors.
(2) For each $N_i$, rank the distances in ascending order and then construct $k - 1$ triples: $S_i = \{(i, i_{j-1}, i_j) \mid j = 2, \ldots, k\}$. Each triple $(i, i_{j-1}, i_j)$ indicates a relative distance comparison: $\|x_i - x_{i_{j-1}}\| \le \|x_i - x_{i_j}\|$.


Fig. 8. The results learned from the teapot images with noise. (a) The results learned by standard MVU; (b) the results learned by shrunk MVU; (c) the results learned by our method. Representative images are illustrated near their positions with symbols of squares.

(3) Find the two data points in $X$ that give the minimum distance and construct the distance-guard constraint according to Eq. (11).
(4) Develop the model formulated in (15).
(5) Solve the model with an SDP algorithm and obtain the matrix $K \in \mathbb{R}^{n \times n}$.
(6) Perform an eigenvalue decomposition of $K$, namely $K = VDV^T$.
(7) $V = VD^{0.5}$.
(8) $Y = [V(1), \ldots, V(d)]^T \in \mathbb{R}^{d \times n}$.

In the decomposition of step (6), the column vectors of $V$ ($\in \mathbb{R}^{n \times n}$) are the $n$ eigenvectors of $K$, and $D$ is the diagonal matrix of eigenvalues sorted in descending order. In step (8), $V(i)$ denotes the $i$th column of $V$. When solving the SDP problem, we use the CSDP package to fulfill the task. Note that in (15), if $S$ is not empty, the global relative distance comparisons can simply be added to the constraint set; then we can perform the above steps to achieve supervised manifold learning.
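For steps (1)-(3), a minimal MATLAB sketch of our own (the variable names are ours) that produces the neighborhood triples and the distance-guard pair assumed as 'triples' and 'guard' in the earlier CVX sketch:

    % Sketch of steps (1)-(3): k-NN neighborhoods, the k-1 sorted triples per
    % neighborhood, and the distance-guard pair.  X is m-by-n, one point per column.
    n  = size(X, 2);
    sq = sum(X.^2, 1);
    D2 = bsxfun(@plus, sq', sq) - 2*(X'*X);   % squared pairwise distances
    D2(1:n+1:end) = inf;                      % exclude self-distances

    triples = zeros(n*(k-1), 3);              % rows: (i, i_{j-1}, i_j), step (2)
    row = 0;
    for i = 1:n
        [~, order] = sort(D2(i, :), 'ascend');
        nb = order(1:k);                      % k nearest neighbors, already sorted
        for j = 2:k
            row = row + 1;
            triples(row, :) = [i, nb(j-1), nb(j)];  % ||x_i - x_{i_{j-1}}|| <= ||x_i - x_{i_j}||
        end
    end

    % Step (3): the globally closest pair of points defines the distance guard, Eq. (11).
    [~, idx] = min(D2(:));
    [p, q] = ind2sub([n, n], idx);
    guard = [p, q];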

5. Experimental results

In this section, we validate our algorithm using toy data points and real-world images. Comparative experiments are reported, and details about parameter selection are discussed.

5.1. Learning with relative distance comparisons

In this subsection, we test our algorithm and compare it with MVU. Two versions of MVU are implemented. One is the model with distance equality constraints, and the other is the model with distance inequality constraints. Intrinsically, in the model with distance inequality constraints we allow all distances to shrink by turning the distance-preserving constraints into inequalities; thus the number of constraints is the same as in the model with equality constraints. For clarity, we call the former model standard MVU and the latter shrunk MVU (code available at: http://www.seas.upenn.edu/kilianw/Downloads/MVU.html).

In the experiments in this subsection, we set S in (15) to be empty; that is, we perform unsupervised manifold learning. Fig. 2 demonstrates an example. There, n = 500 source data points are uniformly sampled from a Swiss roll surface. Geometrically, the parameter domain of this surface is a two-dimensional (2D) rectangular segment; thus the intrinsic dimensionality d equals 2. For reference, the true 2D parameters are given in Fig. 2(b), and from left to right are the parameters


of the data points from the inner to the outer part of the surface (Fig. 2(a)). The true parameters are uniformly located in the rectangle. In this experiment we take α = 0.9999 and k = 6. Figs. 2(c) and (d) show the results learned by standard MVU and shrunk MVU, respectively. As can be seen, these two implementations of MVU yield similar results. Fig. 2(e) illustrates the results learned by our method; in contrast, the results are more uniformly distributed. For example, there is a big hole near the middle region in Fig. 2(c), but the true shape shown in Fig. 2(a) does not contain such a hole.

To test our algorithm with real-world data points, we run it on a data set of n = 200 color images taken of a teapot from different viewpoints along a circle. The size of the images is 76 × 101, so this manifold is embedded in R^23028. The intrinsic dimensionality d equals one, but we take d = 2 to explore the structure of the closed curve. The results are illustrated in Fig. 3. In this experiment we take k = 3 and set α = 0.9. Figs. 3(a) and (b) illustrate the results learned by standard MVU and shrunk MVU; Fig. 3(c) shows the results learned by our method. Representative images are illustrated near their positions, marked with squares. Globally, MVU yields the needed circle; however, not all of the embedded data points are faithfully located on the circle. In contrast, more accurate results are obtained with our method.

In another example, we consider embedding half of the data points used in Fig. 3. The intrinsic dimensionality d also equals one, and the intrinsic shape of the manifold now changes to a line segment of 100 uniformly located data points. In this experiment we set k = 3 and α = 0.98. Figs. 4(a) and (b) show the results learned by standard MVU and shrunk MVU, respectively; Fig. 4(c) shows the results learned by our method. Globally, all the methods generate a line segment. With standard MVU, however, many data points are separated from each other in the vertical direction. The data points learned by shrunk MVU are faithfully located on the line, but neighboring data points are dragged together near the same position. Some of the data points learned by our method are also slightly separated from the line, but generally the data points are uniformly located on it.

To further test our algorithm on more complex data, we conduct an experiment on 65 head pose images [9], shown in Fig. 5. They are taken with horizontal pose angles varying from -90° to 90° and vertical pose angles varying from -30° to 30°. The underlying intrinsic parameters are the horizontal and vertical pose angles of the head; thus it is a 2D manifold and the intrinsic shape is a rectangle. Figs. 6(a) and (b) show the results learned by standard MVU and shrunk MVU; Fig. 6(c) illustrates the results learned by our method. Globally, the shape obtained by our method is more compact and closer to a rectangle, compared with those obtained by MVU.


To investigate the performance of our algorithm on noisy data, we conduct two experiments on two data sets. In the first experiment, the data points used in Fig. 3 are considered. For each image I with 101 × 76 pixels, we use the following Matlab statements to add salt and pepper noise:

    dRatio = 0.2
    Inoised = imnoise(I, 'salt & pepper', dRatio)

Here dRatio = 0.2 means that 20% of the pixels have noise added, which indicates that a significant amount of noise is added to the image. Some examples are illustrated in Fig. 7. In this experiment we take k = 4 and set α = 0.9. Figs. 8(a) and (b) show the results learned by standard MVU and shrunk MVU; Fig. 8(c) shows the results learned by our method. We see that many data points obtained by standard MVU

Fig. 9. The head pose images with 10% salt and pepper noise.


are embedded far away from the circle. Such a phenomenon can also be witnessed in the results obtained by shrunk MVU. In contrast, our method still yields a circle, indicating that the results are acceptable. We also tested these images with more noise. Experimental results show that when dRatio is greater than 0.3 (namely, 30% of the pixels are noisy), the results learned by MVU and by our method may be distorted.

In the second experiment, the data points used in Fig. 6 are considered. For each of the 65 pose images, 10% salt and pepper noise is added; the noisy images are illustrated in Fig. 9. In this experiment we take k = 5 and set α = 0.9. Figs. 10(a) and (b) show the results learned by standard MVU and shrunk MVU; Fig. 10(c) shows the results learned by our method. Representative images are again illustrated near their positions, marked with squares. The pose variation is globally preserved by standard MVU and by our method. However, compared with those illustrated

Fig. 11. The results learned by our method from the teapot images, with different parameter k.

Fig. 10. The results learned from the head pose images with noise. (a) The results learned by standard MVU; (b) the results learned by shrunk MVU; (c) the results learned by our method. Representative images are illustrated near their positions with symbols of squares.


Fig. 12. The results learned by our method from the head pose images, with different parameter k.

in Fig. 6, some of the data points on the border are now embedded into the inner region of the shape. This indicates that the shapes are not well preserved. Additionally, with shrunk MVU some neighboring data points are dragged together; thus the transition between head poses is not well depicted. We also conducted experiments with more noise. They show that when dRatio is greater than 0.15, the results learned by both MVU and our method may be drastically distorted.

Finally, we report the computation time. For each neighborhood with k neighbors, MVU constructs C²_{k+1} equality constraints in total, whereas our method constructs only k - 1 relative comparisons (for k = 6, that is 21 versus 5 constraints per neighborhood). The more constraints the SDP problem contains, the more time the SDP solver takes to satisfy them. Additionally, from the viewpoint of mathematical optimization, it usually takes the optimizer more time to satisfy equality ("=") constraints than inequality ("≤") constraints. With the CSDP package, for n = 500 and k = 6 in Fig. 2, MVU takes about 579.8 s, while our method needs about 421.1 s on a PC with a 2.4 GHz CPU and 1 GB RAM, using Matlab 7.0. For the task in Fig. 3, MVU needs about 8.7 s, while our method takes about 5.9 s. For the task in Fig. 4, MVU needs about 3.6 s, while our method takes about 2.3 s. For the task in Fig. 6, MVU needs about 3.6 s, while our method takes about 1.4 s. The CSDP package proves to be much faster than the SeDuMi package [23]; for example, to solve the 500 data points in Fig. 2 with SeDuMi, MVU takes about 3188.6 s, while our method needs only about 932.5 s.

5.2. Parameter selection

In our algorithm there are three parameters: d, k and α. d is the lower dimensionality, and several algorithms have been developed for selecting a proper d from the data points to be learned [21,8]. Here we only discuss k and α.

The selection of the parameter k is a common issue in manifold learning algorithms. Fig. 11 shows the results learned by our method from the 200 teapot images with k = 3, 4, ..., 10, respectively. The k nearest neighbors are determined with the Euclidean distance metric. We can see that with a larger k the learned results may be unsatisfactory. Actually, the teapot manifold is intrinsically one-dimensional; it is sufficient to employ a small k to obtain a graph that faithfully represents the underlying manifold. If k is very large, the neighbors may not be true neighbors on the manifold; thus, increasing k would inevitably degrade the results. Fig. 12 shows the results learned by our method from the head pose images used in Fig. 6. The head pose manifold is intrinsically 2D. When k is less than 4, not all of the nearest neighbors (for example, the four surrounding left, right, up and down poses) are fully used; accordingly, unsatisfactory results are yielded with k = 3. When k ≥ 5, however, the shape is well unfolded.

Fig. 13. The results learned by our method from the teapot images, with different values of the parameter α (panels for α = 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95 and 0.99).

As mentioned in Section 3, for a fixed parameter α the problem in (15) is an SDP problem, which has a global optimum. Now we report the experimental results with different α. Fig. 13 illustrates the results learned from the teapot images with different α and with k = 3. Experiments show that satisfactory results can be obtained when α lies in [0.45, 0.99]. Fig. 14 illustrates another example, in which we run our method on the head pose images with different α and with k = 5. Experiments show that satisfactory results can be obtained when α lies in [0.3, 0.99]. From the results of these two experiments, we see that α can be selected from a large region. This advantage may facilitate real-world applications.

In real applications, when the data contain noise, some relative distance comparison constraints may be incorrectly identified. In this case, introducing the slack variables and weighting their effect on the objective function is necessary for the SDP optimizer to generate a solution. However, this motivates us to consider the ideal situation in which the data contain no noise and there are also no redundant constraints. We conducted an experiment on the 200 teapot images used in Fig. 3. Now one task is to identify the redundant constraints. It would be difficult to find them directly in the source R^23028 space; alternatively, we consider this task in the tangent subspace [36]. For each data point "A", PCA [12] is employed to project the four neighboring points (including "A" itself) onto an R^2 subspace. In total there are n = 200 neighborhoods, each containing four neighboring teapot images; thus we perform PCA 200 times. We observe that after performing PCA, each group of four neighboring images is linearly transformed into one of the two patterns in Fig. 15(a). There, "B", "C" and "D" are the three neighbors of "A". According to the method introduced in Section 4, for the left configuration in Fig. 15(a) we can derive two constraints, ||A - C|| ≤ ||A - B|| and ||A - B|| ≤ ||A - D||. Note that ||C - A|| ≤ ||C - D|| is also a constraint for "C". Thus, we can discard the constraint ||A - B|| ≤ ||A - D||. Similarly, for the right configuration in Fig. 15(a), we can discard


Fig. 14. The results learned by our method from the head pose images, with different parameter a.


Fig. 15. (a) Two basic kinds of configurations of groups of four neighboring teapot images in the R^2 subspace, after performing PCA; (b) the results learned from the teapot images after reducing redundant constraints, using the model formulated in (14).

Fig. 16. The results learned by Isomap, LLE, LE, HLLE, LTSA and LSE, with k = 6. The source data points are used in Fig. 2.

the constraint ||A - C|| ≤ ||A - D||. After the constraints are reduced, we no longer consider the regularization weight α in the objective function and directly solve Model (14). The results are shown in Fig. 15(b). Compared with the results reported in Fig. 3(c), we see that the data points are more uniformly located on the circle. This indicates that reducing the redundant constraints may help to improve the performance.
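For reference, a minimal sketch of the per-group projection described above (an SVD-based PCA of our own, not the authors' code); 'Xi' is assumed to hold the four neighboring images of one group, vectorized as columns:

    % Sketch: project a group of four neighboring images (columns of Xi, with the
    % point "A" among them) onto the 2D principal subspace of the group.
    mu = mean(Xi, 2);
    Xc = bsxfun(@minus, Xi, mu);        % center the four points
    [U, ~, ~] = svd(Xc, 'econ');        % principal directions of the group
    P2 = U(:, 1:2)' * Xc;               % 2-by-4 coordinates in the R^2 subspace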

5.3. Comparisons with other manifold learning algorithms

In the previous subsections we reported experimental results and compared our algorithm with MVU. In this subsection,

we compare our algorithm with six other manifold learning algorithms: Isomap [25], LLE [18], LE [1], HLLE [7], LTSA [36] and LSE [31]. Fig. 16 illustrates the results for the data points used in Fig. 2 with k = 6. For LE, we take σ = 5.5. The intrinsic structure is a long rectangle. The shapes learned by LLE, Isomap and LE are seriously distorted. Only Isomap generates a long rectangle; however, its inner data points are not uniformly located. Actually, data points are dragged together and large holes appear in the rectangle. In contrast, HLLE, LTSA and LSE generate more satisfactory results. Note that their learned results are all confined to a square; the reason is that these algorithms employ a constraint Y^T Y = I to avoid degenerate solutions. In our method,


Fig. 17. The results learned by Isomap, LLE, LE, HLLE, LTSA and LSE, with k = 4. The source data points are used in Fig. 3.

we do not need this constraint. Instead, we use the zero-centralized constraint (Eq. (13)) and the distance-guard constraint (Eq. (11)) to avoid degenerate solutions. When the relative comparison constraints are satisfied, the data points are unfolded into a long rectangle.

When we attempt to learn the teapot image data set with k = 3, only Isomap and LE run successfully in our implementation, so we report the results in Fig. 17 with k = 4. In this experiment we take σ = 400.0 for LE. Only Isomap and LE generate the true structure; the other four algorithms all fail to yield a circle. To further investigate the effect of the parameter k, we conducted another experiment with k = 5. The results are reported in Fig. 18; in this case, only LLE and HLLE fail to generate the true structure.

Fig. 19 illustrates the results obtained from the data used in Fig. 4. In these experiments we take k = 4 and set σ = 400.0 for LE. Compared with the results illustrated in Fig. 4, globally only Isomap generates the line shape, but the data points are not faithfully located on a line. Fig. 20 illustrates the results obtained with these six algorithms from the data used in Fig. 6. As can be seen, HLLE embeds most of the data points near the same positions. Isomap, LLE, LE, LTSA and LSE correctly capture the transitions of the head pose images, but globally the shapes learned by LLE, LE, LTSA and LSE are far from the true shape (a rectangle). In contrast, Isomap yields more compact results, which are similar to those reported in Fig. 6.

Compared with the results reported in Section 5.1, our method can generate the true structure with a small k, whereas most existing algorithms need a larger k to explore the local geometry [18,7,36,31]. This is related to the open issue of how to select a proper k in manifold learning. Generally, k may be determined by the intrinsic dimensionality d. For example, for a one-dimensional manifold, MVU, Isomap, LLE, LE and our method require k greater than or equal to two, while HLLE, LTSA and LSE require it greater than two, since the tangent line should be estimated in R^2. For a 2D manifold, with all of these algorithms k should be greater than two, since the neighbors of each data point may surround it. Thus, manifolds with larger dimensionality may need more neighbors to explore the local geometry. Besides the intrinsic dimensionality, the sparsity of the data may also influence the selection of k. Actually, to unfold the nonlinear structure hidden in the data, the data graph should be connected; consequently, k should be large enough to guarantee the connectivity of the data graph.

However, the drawback of our method is also obvious: its computational complexity is very high. When the

Fig. 18. The results learned by Isomap, LLE, LE, HLLE, LTSA and LSE, with k = 5. The source data points are used in Fig. 3.

Fig. 19. The results learned by Isomap, LLE, LE, HLLE, LTSA and LSE, with k = 4. The source data points are used in Fig. 4.

number of data points is greater than 400, even with the faster solver (CSDP) it takes tens of seconds to solve the model. In contrast, most of these six algorithms can fulfill the tasks in a few seconds, and about tens of seconds are enough for HLLE to generate an embedding.

5.4. Learning with global relative distance comparisons

In manifold learning, many algorithms are performed in an unsupervised way [25,18,1,7,36,31]; they lack a straightforward extension to embed global information about the manifold shape. In contrast, our method can easily be extended to consider global information. This goal is achieved by adding global relative distance comparisons to S in (15). Here we report two experiments in which prior knowledge about the global structure of the manifold is utilized in our method.

Fig. 21(a) shows an open circle spanning 1.5π rad. The 100 uniformly sampled data points are illustrated in Fig. 21(b). The intrinsic structure hidden in these data points is a line segment, as learned by our method in Fig. 21(d). As can be seen, the distance from "B" to "C" is greater than that from "A" to "B" and that from "A" to "C". The reason is that the geodesic distance is measured on the manifold: going from "B" to "C", the data point "A"


Fig. 20. The results learned by Isomap, LLE, LE, HLLE, LTSA and LSE, with k = 5. The source data points are used in Fig. 6.

Fig. 21. (a) An open circle spanning 1.5π rad; (b) 100 source data points on the circle in (a); (c) the results obtained with two specified relative comparisons derived from A, B and C; (d) the results obtained without global relative comparisons. In this experiment, k is set to be 2.

should be visited. This would not be true if the missing 0.5π rad arc were added: for data points on a closed circle, the distance from "B" to "C" is shorter than that from "A" to "B" and that from "A" to "C". Such global comparisons can be added to S in (15) to specify the global structure to be learned. The results shown in Fig. 21(c) are learned from (15) with only two additional global relative distance comparisons in S: ||y_B - y_C|| ≤ ||y_B - y_A|| and ||y_C - y_B|| ≤ ||y_C - y_A||. Compared with the results in Fig. 21(d), we see that with these two additional constraints the learned structure is significantly changed to the structure we need.

We conduct another experiment to further test the algorithm on a real data set. The data points to be learned are the teapot images used in Fig. 3. We cut off a subsequence of 20 images and construct a data set containing 180 images. These images are illustrated sequentially in Fig. 22(a), each corresponding to a camera direction (or simply a pose relative to the camera). The camera direction of the first image (labeled "001") is near that of the last image (labeled "180"); the angle between these two directions is about 36°. The angle between the first and the median image (labeled "090") is about 162°. The angle between

the last and the median image is also about 162°. As a result, the teapot poses in the first and last images are similar to each other; in other words, their distance is smaller compared with their distance to the median image. We hope this distance relation is also preserved in the intrinsic lower-dimensional space.

With no global relative comparisons (S = ∅ in (15)), we can only obtain an approximate line segment. Fig. 22(b) shows the result learned with k = 3. As can be seen, the first and the last images are embedded far away from each other. From the viewpoint of manifold learning this result is acceptable, since no prior knowledge is utilized during learning. However, such a result fails to reflect the true similarity between the images. With this result, difficulties may occur when dealing with out-of-sample (new) data points [31,30]. For example, suppose we are given a new teapot image whose camera direction is located in between those of the first and last images. Ideally, it should be embedded into some in-between position, but based on the learned shape in Fig. 22(b) it may be mapped near one of the two ends of the line segment.

To stipulate a learned structure that reflects the similarity between the first and last images, we add two relative


Fig. 22. (a) The 180 teapot images which are sequentially illustrated from the left to the right; (b) the results obtained without global relative comparisons; (c) the results obtained with two specified relative comparisons derived from the 1st, 90th and 180th data points. In this experiment, k is set to be 3.

distance comparisons to S in (15): ||y_1 - y_180|| ≤ ||y_1 - y_90|| and ||y_180 - y_1|| ≤ ||y_180 - y_90||. Fig. 22(c) shows the result learned with k = 3. Now the line segment is changed into a curved line segment; thus, the global information is faithfully utilized.
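In code, supplying these two global comparisons simply adds two rows to the global triple set used in Problem (15) (a sketch with our own variable names, matching the 'gtriples' input of the earlier CVX sketch; the indices follow the image labels above):

    % Sketch: the two global relative comparisons used above, encoded as rows
    % (i, j, k) meaning ||y_i - y_j|| <= ||y_i - y_k||.
    gtriples = [  1, 180, 90;     % ||y_1   - y_180|| <= ||y_1   - y_90||
                180,   1, 90 ];   % ||y_180 - y_1  || <= ||y_180 - y_90||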

6. Conclusion

This paper proposes a purely geometrical algorithm for nonlinear dimensionality reduction. The relative distance comparisons derived from local neighborhoods are used to explore the local geometry of the manifold, and each relative distance comparison is treated as a constraint on the manifold to be learned. The task is formulated as a quadratically constrained quadratic program (QCQP). The original QCQP problem is further relaxed to a semi-definite program (SDP), from which we can obtain a globally optimal embedding. Comparative experimental results illustrate the effectiveness of our method.

In our framework developed from relative distance comparisons, one advantage is that satisfactory results can be obtained with a small number of nearest neighbors. This will be very useful in applications where the data are sparsely sampled from the manifold. Another advantage is that relative distance comparisons can easily be used to specify the global structure of the manifold. This yields a mechanism for manifold learning with prior knowledge, or learning according to user-specified geometry. However, the computational complexity of our method is very high, because solving the SDP problem is time consuming. In the future, we would like to develop a coarse-to-fine framework to reduce the computation time.

Acknowledgments

This work is supported by the Project (Grant no. 60705022) of the National Natural Science Foundation of China. The anonymous reviewers have helped to improve the quality and presentation of this paper.

References

[1] M. Belkin, P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput. 15 (6) (2003) 1373-1396.
[2] S.P. Boyd, L. Vandenberghe, Convex Optimization, Cambridge University Press, UK, 2004.
[3] M. Brand, Charting a manifold, in: Advances in Neural Information Processing Systems, vol. 15, MIT Press, Cambridge, MA, USA, 2003, pp. 961-968.

[4] A. Brun, H.J. Park, H. Knutsson, C.F. Westin, Coloring of DT-MRI fiber traces using Laplacian eigenmaps, in: Proceedings of International Conference on Computer Aided Systems Theory, Las Palmas, Spain, 2003, pp. 518-529.
[5] C.J.C. Burges, Geometric methods for feature extraction and dimensional reduction, in: L. Rokach, O. Maimon (Eds.), Data Mining and Knowledge Discovery Handbook: A Complete Guide for Practitioners and Researchers, Kluwer Academic Publishers, Dordrecht, 2005.
[6] T.F. Cox, M.A.A. Cox, Multidimensional Scaling, Chapman & Hall, London, England, 1994.
[7] D.L. Donoho, C. Grimes, Hessian eigenmaps: locally linear embedding techniques for high-dimensional data, Proc. Natl. Acad. Sci. USA 100 (10) (2003) 5591-5596.
[8] A.M. Farahmand, C. Szepesvari, J.-Y. Audibert, Manifold-adaptive dimension estimation, in: Proceedings of International Conference on Machine Learning, Corvallis, OR, USA, 2007, pp. 265-272.
[9] N. Gourier, D. Hall, J. Crowley, Estimating face orientation from robust detection of salient facial features, in: ICPR Workshop on Visual Observation of Deictic Gestures, 2004.
[10] J. Ham, D.D. Lee, S. Mika, B. Schölkopf, A kernel view of the dimensionality reduction of manifolds, in: Proceedings of International Conference on Machine Learning, Banff, Canada, 2004, pp. 369-376.
[11] O.C. Jenkins, M.J. Mataric, A spatio-temporal extension to isomap nonlinear dimension reduction, in: ICML, Banff, Alberta, Canada, 2004, pp. 441-448.
[12] I.T. Jolliffe, Principal Component Analysis, Springer, NY, USA, 1986.
[13] J. Lee, A. Lendasse, M. Verleysen, Nonlinear projection with curvilinear distances: isomap versus curvilinear distance analysis, Neurocomputing 57 (1) (2004) 49-76.
[14] J. Lee, M. Verleysen, Nonlinear dimensionality reduction of data manifolds with essential loops, Neurocomputing 67 (1) (2005) 29-53.
[15] J. Lee, M. Verleysen, Nonlinear Dimensionality Reduction, Springer, Berlin, 2007.
[16] M. Niskanen, O. Silven, Comparison of dimensionality reduction methods for wood surface inspection, in: Proceedings of International Conference on Quality Control by Artificial Vision, Gatlinbura, TN, USA, 2003, pp. 178-188.
[17] R. Rosales, G. Fung, Learning sparse metrics via linear programming, in: SIGKDD, NY, USA, 2006, pp. 367-373.
[18] S. Roweis, L. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290 (2000) 2323-2326.
[19] L.K. Saul, S.T. Roweis, Think globally, fit locally: unsupervised learning of low dimensional manifolds, J. Mach. Learn. Res. 4 (2003) 119-155.
[20] M. Schultz, T. Joachims, Learning a distance metric from relative comparisons, in: Advances in NIPS, MIT Press, Cambridge, MA, USA, 2004.
[21] F. Sha, L.K. Saul, Analysis and extension of spectral methods for nonlinear dimensionality reduction, in: Proceedings of International Conference on Machine Learning, Bonn, Germany, 2005, pp. 784-791.
[22] V. de Silva, J.B. Tenenbaum, Global versus local methods in nonlinear dimensionality reduction, in: Advances in Neural Information Processing Systems, vol. 15, MIT Press, Cambridge, MA, USA, 2003, pp. 721-728.
[23] J. Sturm, Using SeDuMi 1.02, a Matlab toolbox for optimization over symmetric cones, Optim. Methods Software 11-12 (1999) 625-653.
[24] Y.W. Teh, S. Roweis, Automatic alignment of local representations, in: Advances in Neural Information Processing Systems, vol. 15, MIT Press, Cambridge, MA, USA, 2003, pp. 841-848.
[25] J.B. Tenenbaum, V. de Silva, J.C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science 290 (2000) 2319-2323.
[26] M. Verleysen, Machine learning of high-dimensional data: local artificial neural networks and the curse of dimensionality, Technical Report, UCL, available at: http://www.dice.ucl.ac.be/verleyse/publications.htm, 2001.


[27] M. Vlachos, C. Domeniconi, D. Gunopulos, Non-linear dimensionality reduction techniques for classification and visualization, in: International Conference on Knowledge Discovery and Data Mining, Edmonton, Canada, 2002, pp. 645-651.
[28] K.Q. Weinberger, L.K. Saul, Unsupervised learning of image manifolds by semidefinite programming, in: Proceedings of Computer Vision and Pattern Recognition, Washington, DC, USA, 2004, pp. 988-995.
[29] K.Q. Weinberger, F. Sha, L.K. Saul, Learning a kernel matrix for nonlinear dimensionality reduction, in: Proceedings of International Conference on Machine Learning, Banff, Canada, 2004, pp. 888-905.
[30] S.M. Xiang, F.P. Nie, Y.Q. Song, C.S. Zhang, C.X. Zhang, Embedding new data points for manifold learning via coordinate propagation, in: The 11th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Nanjing, China, 2007, pp. 332-343.
[31] S.M. Xiang, F.P. Nie, C.S. Zhang, C.X. Zhang, Spline embedding for nonlinear dimensionality reduction, in: European Conference on Machine Learning, Berlin, Germany, 2006, pp. 825-832.
[32] L. Xiong, F. Wang, C.S. Zhang, Semi-definite manifold alignment, in: European Conference on Machine Learning, Warsaw, Poland, 2007, pp. 773-781.
[33] M.-H. Yang, Face recognition using extended isomap, in: Proceedings of IEEE International Conference on Image Processing, Singapore, 2002, pp. 117-120.
[34] J. Ye, R. Janardan, Q. Li, GPCA: an efficient dimension reduction scheme for image compression and retrieval, in: International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA, 2004, pp. 354-363.
[35] J. Zhang, S.Z. Li, J. Wang, Nearest manifold approach for face recognition, in: Proceedings of International Conference on Automatic Face and Gesture Recognition, Seoul, Korea, 2004, pp. 223-228.
[36] Z. Zhang, H. Zha, Principal manifolds and nonlinear dimensionality reduction via tangent space alignment, SIAM J. Sci. Comput. 26 (1) (2004) 313-338.

Chunxia Zhang received her B.S. degree from Shanxi University, China, in 1996, an M.S. degree from Yunnan Normal University, China, in 2000, and a Ph.D. degree from the Institute of Computing Technology, Chinese Academy of Sciences, China, in 2005. She is currently a lecturer in the Software School, Beijing Institute of Technology. Her research interests include information extraction, machine learning, etc.


Shiming Xiang received his B.S. degree from Chongqing Normal University, China, in 1993, an M.S. degree from Chongqing University, China, in 1996, and a Ph.D. degree from the Institute of Computing Technology, Chinese Academy of Sciences, China, in 2004. From 1996 to 2001, he was a lecturer at Huazhong University of Science and Technology, Wuhan, China. He was a postdoctoral researcher in the Department of Automation, Tsinghua University, until 2006. He is currently an Associate Professor at the Institute of Automation, Chinese Academy of Sciences. His research interests include computer vision, pattern recognition, machine learning, etc.

Feiping Nie received his B.S. degree from North China University of Water Conservancy and Electric Power, China, in 2000, and M.S. degree from Lanzhou University, China, in 2003. He is currently a Ph.D. candidate in the Department of Automation, Tsinghua University. His research interests focus on machine learning and its applications.

Yangqiu Song received his B.S. degree from Tsinghua University, China, in 2003. He is currently a Ph.D. candidate in the Department of Automation, Tsinghua University. His research interests focus on machine learning and its applications.