- Email: [email protected]

Contents lists available at ScienceDirect

ISPRS Journal of Photogrammetry and Remote Sensing journal homepage: www.elsevier.com/locate/isprsjprs

UL-Isomap based nonlinear dimensionality reduction for hyperspectral imagery classiﬁcation Weiwei Sun a,⇑, Avner Halevy b, John J. Benedetto c, Wojciech Czaja c, Chun Liu d,e, Hangbin Wu d, Beiqi Shi d, Weiyue Li d a

College of Architectural Engineering, Civil Engineering and Environment, Ningbo University, Zhejiang 315211, China Department of Mathematics, Randolph-Macon College, Ashland, Virginia 23005, USA Department of Mathematics, University of Maryland, College Park, Maryland 20742, USA d College of Surveying and Geo-informatics, Tongji University, Shanghai 200092, China e Key Laboratory of Advanced Engineering Survey of NASMG, Shanghai 200092, China b c

a r t i c l e

i n f o

Article history: Received 22 June 2013 Received in revised form 7 December 2013 Accepted 17 December 2013

Keywords: Nonlinear dimensionality reduction UL-Isomap LIsomap Hyperspectral imagery classiﬁcation Vector quantization Landmark selection

a b s t r a c t The paper proposes an upgraded landmark-Isometric mapping (UL-Isomap) method to solve the two problems of landmark selection and computational complexity in dimensionality reduction using landmark Isometric mapping (LIsomap) for hyperspectral imagery (HSI) classiﬁcation. First, the vector quantization method is introduced to select proper landmarks for HSI data. The approach considers the variations in local density of pixels in the spectral space. It locates the unique landmarks representing the geometric structures of HSI data. Then, random projections are used to reduce the bands of HSI data. After that, the new method incorporates the Recursive Lanczos Bisection (RLB) algorithm to construct the fast approximate k-nearest neighbor graph. The RLB algorithm accompanied with random projections improves the speed of neighbor searching in UL-Isomap. After constructing the geodesic distance graph between landmarks and all pixels, the method uses a fast randomized low-rank approximate method to speed up the eigenvalue decomposition of the inner-product matrix in multidimensional scaling. Manifold coordinates of landmarks are then computed. Manifold coordinates of non-landmarks are computed through the pseudo inverse transformation of landmark coordinates. Five experiments on two different HSI datasets are run to test the new UL-Isomap method. Experimental results show that UL-Isomap surpasses LIsomap, both in the overall classiﬁcation accuracy (OCA) and in computational speed, with a speed over 5 times faster. Moreover, the UL-Isomap method, when compared against the Isometric mapping (Isomap) method, obtains only slightly lower OCAs. Ó 2013 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS) Published by Elsevier B.V. All rights reserved.

1. Introduction Since the invention of the imaging spectrometer, the classiﬁcation problem of hyperspectral imagery (HSI) has attracted continued research interest. HSI classiﬁcation techniques are used in a variety applications, such as environment monitoring (Cheng et al., 2008; Eillis and Scott, 2004), vegetation mapping (Adam et al., 2012; Belluco et al., 2006; Brantley et al., 2011), geological surveying (Chabrillat et al., 2002; Chen et al., 2007) and land use analysis (Jin et al., 2012; Mathieu et al., 2009; Thenkabail et al., ⇑ Corresponding author. Address: College of Architectural Engineering, Civil Engineering and Environment, Ningbo University, 818 Fenghua Rd, Ningbo, Zhejiang 315211, China. Tel.: +86 18258796120. E-mail addresses: [email protected] (W. Sun), [email protected] (A. Halevy), [email protected] (J.J. Benedetto), [email protected] (W. Czaja), [email protected] (C. Liu), [email protected] (H. Wu), [email protected] (B. Shi), [email protected] (W. Li).

2005). However, numerous bands as well as strong intra-band correlations cause HSI data to have the ‘‘curse of dimensionality’’ (Plaza et al., 2005; Scott, 1992). Therefore, reducing the dimensionality of HSI data before classiﬁcation can improve results (Chen and Qian, 2009). Dimensionality reduction methods can be divided into two main categories, feature selection and feature extraction (Jia et al., 2013). Feature selection selects the best band combinations and feature extraction preserves most important spectral features using mathematical transformations. Feature extraction includes linear methods and nonlinear methods. Linear methods such as principal component analysis and linear discriminant analysis behave well in HSI data and need short computational time. Nonlinear manifold learning methods extract features by reconstructing the underlying manifold from which the HSI data was sampled and require high computational complexity (Belkin and Niyogi, 2003; Tenenbaum et al., 2000). Manifold learning methods are

0924-2716/$ - see front matter Ó 2013 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS) Published by Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.isprsjprs.2013.12.003

26

W. Sun et al. / ISPRS Journal of Photogrammetry and Remote Sensing 89 (2014) 25–36

better suited to HSI data because of the nonlinear structure of HSI data that originates from multi-scattering and the heterogeneity of pixels (Bachmann et al., 2005). Therefore, we focus our research on dimensionality reduction using manifold learning methods. Currently, many manifold learning methods are available, such as Isometric mapping (Isomap) (Tenenbaum et al., 2000), locally linear embedding (LLE) (Roweis and Saul, 2000), local tangent space alignment (LTSA) (Zhang and Zha, 2003), laplacian eigenmaps (LE) (Belkin and Niyogi, 2003) and self-organizing maps (SOMs) (Tamayo et al., 1999). Manifold learning assumes that the high dimensional data is evenly sampled from a uniform manifold, and it tries to ﬁnd the underlying manifold by mapping the original dataset into a predeﬁned low-dimensional embedded space (Tenenbaum et al., 2000). Of the listed methods, Isometric mapping (Isomap) is representative of global methods, since it preserves the global geometric features of the initial data. Before its applications in HSI data, Isomap had been used in applications such as facial recognition (Yang, 2002), handwritten digit image classiﬁcation (Tenenbaum et al., 2000) and web image retrieval (Datta et al., 2008). Isomap has been used in a variety of ways with HSI data. Bachmann used Isomap to extract underlying manifold features in HSI ocean data (Bachmann et al., 2005, 2006, 2009). Chen and Crawford used Isomap to extract manifold features to achieve more accurate land cover classiﬁcation results (Chen et al., 2005, 2006; Crawford et al., 2011). Uto investigated the non-linear Isometric manifolds in spectral distribution of a set of leaves and estimated the lambert parameter of HSI data on leaf-scale (Uto and Kosugi, 2013). The above work greatly improved the applications of Isomap to HSI data. However, the Isomap method has a problem of computational cost, which increases polynomially with the number of pixels N at a rate of O(N3) (Tenenbaum et al., 2000). This brings about a long computational time even with a HSI dataset that has a small image scene. For example, on a computer with a Pentium IV 3.06 GHZ processor, the CPU time of Isomap with PROBE2 hyperspectral dataset having 100 100 pixels and 114 bands reaches up to 920 s (Bachmann et al., 2005). To address the computational complexity of Isomap, Silva proposed the landmark Isometric mapping (LIsomap) method (Silva and Tenenbaum, 2003). LIsomap randomly selects some pixels from the whole dataset as landmarks and builds the shortest path graph between each pair of pixels and landmarks rather than between all pairwise pixels. Compared with Isomap, LIsomap has a lower complexity both in the shortest path graph construction and the eigenvalue decomposition in multidimensional scaling (MDS). However, two main challenges still exist with the LIsomap method. (1) The shortcoming of random landmarks. The random scheme in landmark selection assumes pixels in the spectral space are located uniformly, and hence evenly selects landmarks from HSI data. That contradicts the reality of HSI data. Experiments show that pixels in the spectral space are scattered

as clusters with high variations in local density (Bachmann et al., 2006). In this situation, random landmarks are mainly located in high-density regions, and locally low-density clusters have few landmarks. Random landmarks then cannot represent the real geometric structures of HSI data. LIsomap embeddings miss important information from the initial data, and therefore the further classiﬁcation result suffers. (2) The slow computational speed of LIsomap. Even though LIsomap has a lower complexity cost than Isomap, the processing time for the HSI dataset with a large image size is still expensive. Moreover, the number of landmarks greatly affects the computation complexity of LIsomap. For example, Table 5 shows that on a Mac OS X computer with 2 2.26 GHZ Quad Core Processor and 16 GB RAM, for an AVIRIS dataset having 145 145 pixels and 172 bands, the CPU time of LIsomap using 10% of pixels as random landmarks reaches up to 359.37 s. In realistic applications, the HSI dataset consist of millions of pixels and a large number of landmarks, and therefore it is of practical importance to reduce the complexity of LIsomap. Some strategies have been presented to address the above two problems. Bachmann proposed the ENH-Isomap (enhanced Isometric mapping) method, a method based on LIsomap, to improve the manifold representations of large-scale HSI data (Bachmann et al., 2006). The innovations of the ENH-Isomap method include fast neighborhood calculations using vantage point forests, landmark selection via skeletonization, and improved backbone approach of scaling and merging manifold coordinates. ENH-Isomap reduces the complexity and memory requirements of LIsomap in large-scale HSI data analysis. However, the skeletonization scheme does not guarantee that the selected landmarks fully span the embedding space. Also, the scaling and merging schemes cannot accurately preserve the geometric structure between each pair of landmarks and pixels, which in a sense contradicts the idea of LIsomap. Moreover, the classiﬁcation accuracies of manifold coordinates that are achieved from improved backbone approach need to be veriﬁed by further experiments. Finally, the strategies used by ENH-Isomap are too complicated for the realistic applications. To address the shortcomings cited above, we propose a method named upgraded landmark-Isometric mapping (UL-Isomap). First, we use vector quantization (VQ) to improve the landmark selection (Gray, 1984). The VQ method considers the variations in local density of pixels in the spectral space and achieves time-invariant landmarks which fully span the embedded space of HSI data. Second, we accelerate the computational speed of LIsomap via three schemes: random projections (Baraniuk et al., 2008; Baraniuk and Wakin, 2009), fast approximate neighborhood construction using Recursive Lanczos Bisection (RLB) (Chen et al., 2009), and fast randomized low-rank singular value decomposition (SVD) (Rokhlin et al., 2010). Random projections map HSI data into a lower

Table 1 The contrast in computational complexity between UL-Isomap and LIsomap. Steps

Step 1: The landmark selection Step 2: Reducing bands with random projection Step 3: The KNNs searching Step 4: The geodesic distance graph construction Step 5: The eigenvalue decomposition of MDS Step 6: The computation of non-landmark coordinates Total complexity

Computational complexity UL-Isomap

LIsomap

O(3tnN) O(DPN) O(PNa) O(nNlogN + N2k) O(dn2) O(nd2 + n(N–n))

O(DN2) O(nN log N + N2k) O(n3) O(nd2 + n(N–n))

O((3t + 1 + log N)nN) + O(P(DN + Na)) + O(N2k + (d1)n2) + nd2) O(DN2) + O(nN log N + N2k) + O(n3) + O(nd2 + n(N–n))

27

W. Sun et al. / ISPRS Journal of Photogrammetry and Remote Sensing 89 (2014) 25–36 Table 2 The ground truth information in training and testing samples for each class in Indian and Urban datasets. Class Label

Sample Name

Class

Sample

Train

Test

Label

Name

Train

Test

(a) Indian dataset 1 Alfalfa 2 Corn-notill 3 Corn-min 4 Corn 5 Grass/pasture 6 Grass/trees 7 Grass/pasture-mowed 8 Hay-windowed Total 2052

9 286 166 47 97 146 7 96 8197

37 1142 664 190 386 584 21 382

9 10 11 12 13 14 15 16

Oats Soybeans-notill Soybeans-min Soybeans-clean Wheat Woods Bldg-grass-tree drives Stone-steel towers

4 194 491 119 41 253 77 19

16 778 1964 474 164 1012 309 74

(b) PaviaU dataset 1 Asphalt 2 Meadows 3 Gravel 4 Trees 5 Painted metal sheets

839 437 420 310 269

3356 1748 1679 1240 1076

6 7 8 9 Total

Bare soil Bitumen Self-blocking bricks Shadows 4202

1006 266 469 186 16,087

4023 1064 1878 743

Table 3 The parameter settings of all experiments on Indian and PaviaU datasets. Experiments

Experiment (1) Experiment (2)

Experiment (3) Experiment (4) Experiment (5)

Methods

VQ-LIsomap LIsomap LIsomap VQ-LIsomap UL-LIsomap Isomap UL-Isomap LIsomap Random Projections UL-Isomap

Dataset Indian dataset

PaviaU dataset

d = 2–80; k = 10%; k = 30; r = 500; r = 0.005 d = 2–80; k = 10%; k = 30 k = 10–50%; d = 50; k = 30 k = 10–50%; d = 50; k = 30; r = 500; r = 0.005 k = 10–50%; d = 50; k = 30; r = 500; r = 0.005; P = 80; a = 0.15; d = 50; k = 30 k = 10–50%; d = 50; k = 30; r = 500; r = 0.005; P = 100; a = 0.1 k = 10–50%; d = 50; k = 30 P = 1–160

d = 2–80; k = 10%; k = 20; r = 500; r = 0.007 d = 2–80; k = 10%; k = 20 k = 2.73–13.6%; d = 40; k = 25 k = 2.73–13.6%; d = 40; k = 25; r = 500; r = 0.01 k = 2.73–13.6%; d = 40; k = 25; r = 500; r = 0.01; P = 60; a = 0.1 d = 40; k = 25 k = 2.73–13.6%; d = 40; k = 20; r = 500; r = 0.007; P = 60; a = 0.2 k = 2.73–13.6%; d = 40; k = 20 P = 1–100

a = 0.05–0.4; d = 55; k = 30; r = 500; r = 0.005; P = 80; k = 20%

a = 0.05–0.4; d = 35; k = 25; r = 500; r = 0.01; P = 60; k = 5%

dimensional space and reduce the dimension of input data for the UL-Isomap method. The RLB method speeds up the k-nearest neighbors (KNNs) graph construction and lowers its complexity from O(DN2) to O(DNa), where a = 1/(1 log2(1 + a)) for a mutually selected overlapping parameter a. D and N are the number of bands and pixels in the image scene, respectively. The fast randomized low-rank approximate SVD decomposition decreases the complexity of MDS eigenvalue decomposition from O(n3) to O(dn2), where d is the dimension of embeddings and n is the number of landmarks. The rest of the paper is organized as follows. Section II brieﬂy reviews the LIsomap method in HSI data. Section III presents the strategies of the UL-Isomap method. Section IV analyzes ﬁve groups of experiments on two different HSI datasets that test our new method. Section V draws conclusions and discusses future work. 2. The LIsomap method in HSI data Isomap constructs the geodesic distance graph from the KNNs graph using the Dijkstra algorithm. It then uses eigenvalue decomposition of MDS on the geodesic distance matrix to achieve lowdimensional embeddings (Tenenbaum et al., 2000). Isomap tries to preserve the Euclidean distances between pairwise pixels in the embeddings. Speciﬁcally, it tries to preserve the same graph distances between homologous pixels in the spectral space.

However, two computational bottlenecks exist in Isomap: the shortest path graph construction and the MDS eigenvalue decomposition. LIsomap was proposed to improve the computational speed of Isomap (Silva and Tenenbaum, 2003). LIsomap tries to preserve geometric structures between landmarks and all the pixels. Let real vector sets X = [x1, , xN]T e RD and Y = [y1, , yN]T e Rd represent the HSI dataset and the low-dimensional embedding respectively, where N is the number of pixels, and D and d denote the number of bands and dimension of embedding respectively. Let n the landmark set be XL ¼ fxtl gt¼1 2 X with n 6 N; and let manifold coordinates of landmarks and non-landmarks be YL and YC = Y/YL, respectively. The LIsomap algorithm in HSI data includes four main steps. First, the KNNs graph is constructed using Euclidean distance. If xi and xj lie within a neighborhood deﬁned by a set of KNNs, the length of edge between them is the Euclidean distance; otherwise, the length is 0. Second, based on the selected random landmark set XL and the neighborhood graph, the geodesic distance matrix DnN between landmark set XL and the total pixels X are calculated. For each landmark xtl and each pixel xi within the same neighborhood, the geodesic distance is the Euclidean distance. Otherwise, the geodesic distance between them is replaced by the shortest path distance using the Dijkstra algorithm (Golden, 1976; Knuth, 1997). During the process, only matrix DnN of geodesic distance is computed, which reduces the computational

28

W. Sun et al. / ISPRS Journal of Photogrammetry and Remote Sensing 89 (2014) 25–36

Table 4 The comparison in OCAs (%) and standard deviations (Stds.) among several dimensionality reduction methods using different classiﬁers and different proportions of landmarks on Indian and PaviaU datasets (k means the proportions of landmarks in the total pixels). Data

Indian dataset

k (%)

Classiﬁers

10

20

30

Average PaviaU dataset

2.73

5.50

8.18

Average

OCAs (Stds.) LIsomap

VQ-LIsomap

UL-Isomap

Isomap

KNN NB SVM KNN NB SVM KNN NB SVM 74.66(±0.030)

70.73(±0.027) 73.12(±0.032) 78.72(±0.025) 71.02(±0.019) 73.73(±0.037) 79.24(±0.042) 71.46(±0.020) 74.08(±0.048) 79.86(±0.024) 80.51(±0.032)

75.56(±0.018) 79.97(±0.026) 85.12(±0.042) 75.91(±0.033) 80.24(±0.043) 85.33(±0.025) 76.24(±0.029) 80.48(±0.041) 85.75(±0.028) 79.56(±0.034)

74.44(±0.032) 79.04(±0.036) 83.86(±0.029) 74.87(±0.030) 79.36(±0.042) 84.62(±0.039) 75.30(±0.040) 79.63(±0.027) 84.91(±0.031) 81.03(±0.037)

76.82(±0.038) 80.57(±0.42) 85.72(±0.030) 76.82(±0.038) 80.57(±0.042) 85.72(±0.030) 76.82(±0.038) 80.57(±0.042) 85.72(±0.030)

KNN NB SVM KNN NB SVM KNN NB SVM 82.54(±0.033)

82.76(±0.019) 79.32(±0.023) 84.61(±0.028) 82.93(±0.032) 79.71(±0.026) 84.97(±0.037) 83.32(±0.039) 79.96(±0.045) 85.32(±0.041) 88.31(±0.040)

88.64(±0.047) 84.56(±0.036) 90.43(±0.035) 88.82(±0.041) 85.17(±0.033) 91.29(±0.032) 89.09(±0.050) 85.32(±0.047) 91.45(±0.035) 87.27(±0.036)

87.27(±0.034) 83.74(±0.032) 89.28(±0.041) 88.04(±0.038) 84.01(±0.036) 89.86(±0.031) 88.27(±0.040) 84.36(±0.042) 90.59(±0.029) 89.00(±0.033)

89.47(±0.032) 86.03(±0.037) 91.49(±0.030) 89.47(±0.032) 86.03(±0.037) 91.49(±0.030) 89.47(±0.032) 86.03(±0.037) 91.49(±0.030)

Table 5 The comparison in the computational speed between UL-Isomap and LIsomap (The Ratio means the quotient of the computational time in LIsomap divided by that of UL-Isomap, and k means the proportions of landmarks in the total pixels of HSI data). Data

k (%)

Computational time (Seconds)

Ratio (LIsomap/UL-Isomap)

UL-Isomap

LIsomap

Indian dataset

10 20 30 40 50

71.46 127.95 252.93 345.05 492.39

359.37 674.09 1.452e + 03 2.771e + 03 6.982e + 03

5.03 5.27 5.74 6.58 14.18

PaviaU dataset

2.73 5.50 8.18 10.9 13.6

1.528e + 03 4.663e + 03 9.062e + 03 1.572e + 04 2.290e + 04

8.657e + 03 3.460e + 04 8.521e + 04 1.651e + 05 2.720e + 05

5.67 7.42 9.40 10.05 11.80

complexity from O(N3) to O(nN log N). Third, the d-dimensional embedded coordinates of landmarks YL are found with MDS. The solutions of YL are the eigenvectors of the ﬁrst d-largest eigenvalues of the inner-product matrix Dnn = HDGH/2, where DG is the squared distance matrix between pairwise landmarks, and H is the mean-centering matrix. Finally, the coordinates of non-landmarks YC are obtained through the transformation via þ Y C ¼ 12 Y þ L ðD DnN Þ, where Y L is the pseudo inverse of landmark is the column mean of geodesic distance coordinates YL, and D matrix DnN. The number of landmarks n is larger than the dimension of embeddings d, in order to locate non-landmark points uniquely in the embedded space. Moreover, the selection of landmarks should guarantee that the counterparts in the embedding fully span the sparse d-dimensional space.

3. The UL-Isomap method in HSI data In this section, the four main parts of the UL-Isomap method are presented, which together address the above shortcomings of the LIsomap method. The proper landmarks from HSI dataset consider the high variations in local density and always fully span the geometric space of the dataset. First, UL-Isomap uses the VQ method

to select proper landmarks in the HSI data. Second, the method uses random projections to reduce the bands of HSI data. Third, it improves the speed of KNNs graph construction via the fast approximate method using RLB. Fourth, the method uses fast randomized low-rank approximate SVD decomposition algorithm to reduce the complexity of eigenvalue decomposition in MDS. The section closes by summarizing the UL-Isomap method. 3.1. Proper landmarks with the VQ method The random landmark selection scheme selects a certain number of landmarks at random. The selected landmarks conform to the probability distribution of points, and points in higher density regions are then more likely to be chosen as landmarks. Within a Gaussian random point set, points in any certain dimensional space have high variations in local density and are always clustered in the centering region. For the HSI dataset, experiments show that pixels in high dimensional space are frequently compounded by multiple clusters with high variations in local density. The random points then can be regarded as a good choice to explain detailed local distributions of pixels within high dimensional space. Considering better effect from intuitive expression, we therefore implement two-dimensional Gaussian random points to illustrate the random landmark selection method. The plots

W. Sun et al. / ISPRS Journal of Photogrammetry and Remote Sensing 89 (2014) 25–36

show variations in the local density of points. Fig. 1(a–c) shows the distributions when changing the proportions of random landmarks k in the total data points from 10% to 50%. In Fig. 1(a), the random landmarks are located mainly in the central regions which have the highest density of points and do not cover most of the outlying regions which have a lower point density. Fig. 1(a–c) shows that the increasing proportions k ameliorate the spread of the random landmarks, but the random landmarks still miss some important outlying points that are highlighted with red ellipses. That means the random scheme cannot guarantee that even a large number of landmarks will fully span the geometric space of the total data points. The distributions of pixels in two-dimensional space are more complicated than those of random points in above ﬁgures. Accordingly, when using the random landmark selection method, the landmarks will neglect some important pixels which are in lower local density regions and will degrade classiﬁcation results. Therefore, the random landmark scheme is not an appropriate landmark selection scheme for HSI data. In this paper, we introduce the VQ method to quantize the landmarks for UL-Isomap of HSI data. The VQ method is a well-known method for lossy data compression that considers the local density of points (Gray, 1984). The method has been widely used in applications such as signal compression (Gersho and Gray, 1992), pattern recognition, and texture classiﬁcation (Sertel et al., 2008). VQ aims to replace the original dataset with a smaller set of points called centroids (or code vectors) and a codebook of centroids. The probability density function of the centroids resembles that of the initial dataset (Lee and Verleysen, 2007). Fig. 1(d–f) illustrates the distributions of the VQ landmarks when changing the percentage of data points that are landmarks from 10% to 50%. The Linde– Buzo–Gray (LBG) algorithm explained below is utilized to minimize the quantization distortion in the VQ method. The data points in the ﬁgures are sampled from the same two-dimensional Gaussian distribution as the data points of Fig. 1(a–c). From Fig. 1(d–f), we observe the density of VQ landmarks changes with the variations in local density of the total points. The VQ landmarks represent well the distribution of the total points, despite a smaller number of landmarks. Let the HSI dataset be X = [x1, , xN]T e RD, and let the corren sponding codebook be C ¼ fcðjÞgj¼1 2 RD with n << N. The VQ method approaches the original density of HSI data by minimizing the quantization distortion. Currently, many algorithms are available for minimizing the quantization distortion, such as the LBG algorithm (also called Generalized Llyod Algorithm, GLA) (Linde et al., 1980), the pairwise nearest neighbor (PNN) algorithm (Equitz, 1989), the Mean-distance-ordered Partial Codebook Search (MPS) algorithm (Ra and Kim, 1993), and the Enhanced LBG algorithm (Patan and Russo, 2001). In this paper, we use the LBG algorithm to minimize the quantization distortion because it is both fast and effective. The complexity of LBG only scales to O(3tnN) or Oð3tkN 2 Þ; where t is the iteration times which depends on the threshold of total distortion r, and k is the proportions of landmarks with k ¼ n=N. In the LBG algorithm, the initial codebook of HSI data is obtained by randomly selecting n pixels from X. For each iterative step, the LBG algorithm encodes each pixel into the discrete Voronoi regions Vj of the nearest code vector c(j) using the Euclidean distance measurement and then updates c(j) by P 1 moving it into the new centroid of Vj through cðjÞ xi 2V j xi . jV j j The iterations stop when the quantization distortion converges to the desired threshold using Eq. (1):

EVQ ¼

n n X 1X 1X EjVQ ¼ kxi cðjÞk2 N j¼1 N j¼1 x 2V i

ð1Þ

j

Eq. (1) shows that the LBG algorithm is similar in spirit to K-means clustering, since both of them try to minimize the total distortion.

29

However, the LBG algorithm for vector quantization does differ from the K-means clustering. First, the methods differ in the way they use the spectral vector, and hence their resulting quantizers are different (Linde et al., 1980). Both methods try to partition spectral vectors of HSI data into several regions. At each iteration, the Kmeans clustering sequentially incorporates a new spectral vector into the partitions and the procedure stops when the last vector is incorporated. Therefore, a quantizer is not produced until the procedure is stopped. In contrast, the LBG algorithm uses all of spectral vectors at each iteration to update the Voronoi regions, and then produces a quantizer. In other words, the clustering result from Kmeans changes with time, whereas code vectors from the LBG algorithm are time-invariant (Gersho and Gray, 1992). Second, the theoretical assumptions of both methods are different (Linde et al., 1980). The theorem of K-means requires the pixels in HSI data be inter-independent in order to guarantee the total distortion converges into local optimum. However, the LBG algorithm has no restrictions on the inter-dependence of spectral vectors. The calculated centroids are not pixel points from the initial HSI data, and therefore the pixel nearest to the centroid of each Voronoi region is set as the VQ landmark. The spectrum angle is used as the similarity measure between pixels and centroids. 3.2. Reducing bands of HSI data using random projections Random projections are a fast and an accurate dimensionality reduction method for high dimensional data. It has been used in the classiﬁcation of text data (Bingham and Mannila, 2001), facial recognition (Bouzalmat et al., 2011), and anomaly detection and imagery reconstruction of HSI data (Du et al., 2011; Fowler and Du, 2012). We use random projections to reduce the bands of HSI data and thereby speed up the LIsomap method. The Johnson–Lindenstrauss Lemma provides the theoretical support for the use of random projections. The lemma states that if points from a high dimensional manifold-modeled dataset X = [x1, , xN]T are projected onto a randomly selected subspace with a moderate dimension P, the geodesic distances between pairwise points are approximately preserved with high probability (Baraniuk et al., 2008; Baraniuk and Wakin, 2009). Random projections map HSI data XT onto a P-dimensional subspace with P << D, using a random matrix U whose columns have unit lengths. The process of random projections is illustrated with Eq. (2)

X RP ¼ U X T PN

PD ND

ð2Þ

where U is the random matrix, and XRP is the projected HSI data in the P-dimensional space. Random projections have a low computational cost of O(DPN) and are linear and non-adaptive. The mapping result relies on the matrix U. The random matrix U usually consists of the Gaussian entries that are independently and identically distributed with zero mean and 1/P variance. The independent and identical distribution of random components in U guarantees its columns are almost independent and orthogonal. In UL-Isomap of HSI data, random projections reduce the dimension of HSI data from RD to RP, while approximately preserving its inner geodesic distances. 3.3. Fast approximate KNNs graph construction via RLB The computational complexity of regular KNNs graph construction in LIsomap scales as O(DN2) (Kleinberg, 1997). The RLB method is used instead to construct a fast approximate KNNs graph and provide a speed improvement. The method only slightly compromises the accuracy of KNNs graph (Chen et al., 2009). The RLB algorithm recursively divides the HSI dataset into two overlapping subsets, computes the KNNs graph for each subset,

30

W. Sun et al. / ISPRS Journal of Photogrammetry and Remote Sensing 89 (2014) 25–36

Fig. 1. The illustration of random landmarks and the VQ landmarks with different proportions in the total data points (a) 10% of random landmarks, (b) 30% of random landmarks, (c) 50% of random landmarks, (d) 10% of VQ landmarks, (e) 30% of VQ landmarks, and (f) 50% of VQ landmarks.

and then merges the results into a ﬁnal KNNs graph. Details of the algorithm follow. First, the HSI dataset is recursively divided into halves with overlaps. The size of overlap between two subsets is controlled with a parameter a. The spectral bisection algorithm splits the pixels into two subsets using the hyperplane based on the largest singular triplet of the centered data (Boley, 1998). Let XS = [x1, ..., xq] be the centered HSI dataset, and let (#, u, v) be the largest singular triplet of XS using the Lanczos algorithm with uTXS = #vT (Berry, 1992; Lanczos, 1950). For any hyperplane wTXS = 0, the squared sum kwT X S k22 6 kX S k22 ¼ #2 maximizes when the unit vector w = u. The optimal separation is then achieved. Second, if the size of subset is less than the predeﬁned threshold r (e.g. r = 500), the approximate KNNs graph is computed with the regular method. Finally, the approximate KNNs graph of each subset is merged to constitute the whole graph. If a pixel point belongs to more than one subset, then its KNNs are selected from its neighbors in each subset. Due to the divide-and-conquer nature of the RLB approach, the fast approximate KNNs method combined with random projections reduces the computational complexity of regular KNNs graph construction from O(DN2) to O(PNa), where a = 1/(1 log2(1 + a)) and 0 < a < 1, for a certain predeﬁned overlapping parameter a.

3.4. Fast randomized low-rank approximate SVD decomposition in MDS In MDS, the eigenvalue decomposition of inner-product matrix has a computational complexity of O(n3) when using the regular algorithm, where n is the number of landmarks. The fast randomized low-rank approximate algorithm is adopted to reduce the complexity to O(dn2), where d is the dimension of embeddings.

The algorithm also provides accurate approximate low-rank eigenvectors of the inner-product matrix (Rokhlin et al., 2010). Let the inner-product matrix in L-MDS be Dnn. The algorithm ﬁnds the best d-rank approximation to D with d < < n, which is shown in Eq. (3)

Dnn U nd Rdd V Tnd

ð3Þ

where R is the matrix with the d greatest eigenvalues appear in decreasing order on the diagonal and U is the corresponding eigenvectors of the d greatest eigenvalues. Let l be an integer with l = d + 2 and l < n d. First, the real random matrix Gln is constructed with entries that are zero mean and unit variance, and the product matrix Rln is computed with R = G(DDT)D. Second, the SVD decomposition of RT is used to form a real matrix Qnd, where Qnd is the leftmost n d block of eigenvectors of RT and the product matrix Tnd is computed with T = DQ. Third, the SVD decomposition of T is performed as T = URWT, where Und and Wdd are real matrixes with orthonormal columns and Rdd is a real nonnegative matrix with zero entries off the diagonal. Finally, the product matrix Vnd is obtained with V = QW. The eigenvectors Und is the desired manifold representations of landmarks YL in UL-Isomap. 3.5. Summary of the UL-Isomap method The UL-Isomap method improves LIsomap in two ways: landmark selection and computational speed. The VQ method, using LBG, selects proper landmarks for UL-Isomap and improves the classiﬁcation results. Three schemes are adopted to reduce the computational complexity: (1) random projections reduce the dimension of HSI data from RD to RP and speed up the KNNs graph

W. Sun et al. / ISPRS Journal of Photogrammetry and Remote Sensing 89 (2014) 25–36

construction; (2) fast approximate KNNs graph construction via RLB reduces the computational complexity from O(PN2) to O(PNa); (3) the fast randomized low-rank approximate SVD decomposition algorithm reduces the computational complexity in eigenvalue decomposition of the inner-product matrix from O(n3) to O(dn2). The proposed method of UL-Isomap is shown in Fig. 2. It consists of the following steps: (1) The VQ landmarks for HSI data are found using the LBG algorithm by minimizing (1). (2) The HSI data is preprocessed with random projections in (2) in order to reduce the number of bands. (3) The fast approximate KNNs graph is constructed with the RLB algorithm. (4) The shortest path distances between landmarks and the total pixels are computed via the Dijkstra algorithm, and the geodesic distance graph is produced. (5) In MDS, manifold coordinates of landmarks YL are obtained with the fast randomized low-rank approximate SVD decomposition of inner-product matrix D using (3). (6) The manifold coordinates of non-landmarks YC are computed through the coordinate transformation of landmarks, and the total manifold coordinates Y from UL-Isomap are produced. Table 1 compares the computational complexity of UL-Isomap and LIsomap, where a = 1/(1 log2(1 + a)) and 0 < a < 1; P is the projected dimension from random projections; N and n stand for the number of pixels and landmarks; t is the iteration times for the LBG algorithm; k denotes the neighborhood size; and d is the dimension of embeddings. We observe from the table that the VQ landmark selection only contributes a small amount to the total complexity of UL-Isomap. Moreover, the computational complexity of UL-Isomap is lower than the LIsomap method.

31

computational speed of the proposed UL-Isomap method. Section 4.1 describes the detailed information of both datasets. And Section 4.2 lists and analyzes the results from ﬁve groups of experiments in sequence. 4.1. Description of datasets The Indian dataset was downloaded from the website of Multispectral Image Data Analysis System at Purdue University (available at https://engineering.purdue.edu/~biehl/MultiSpec/ aviris_documen-tation.html). It was AVIRIS data collected by JPL and ﬂown by NASA. The dataset was acquired on June 12, 1992, with about 20 m spatial resolutions and 10 nm spectral resolutions covering the spectral range from 200 to 2400 nm. The image scene shown in Fig. 3 is a subset of a larger image that covers an area 6 miles west of West Lafayette, Indiana. The subset contains 145 145 pixels. Preprocessing of the dataset included radiometric correction and bad band removal. This left 172 bands with the calibrated DN value proportional to radiance and reﬂectance. The image scene has sixteen classes of ground objects, and the ground truth information for training and testing samples in each class is listed in Table 2(a). The PaviaU dataset was taken from the Computational Intelligence Group from the Basque University (UPV/EHU) (available at http://www.ehu.es/ccwintco/index.php/Hyperspectral_Remote_ Sensing_Scenes). The dataset covers the area of Pavia University, with 115 bands and 1.3 m spatial resolutions. The image shown in Fig. 4 is a small subset of the larger dataset, containing 350 340 pixel sizes. Processing work included the removal of 12 most noisy bands and radiometric correction, and the left 103 bands with the DN value proportional to radiance and reﬂectance are used for the experiments. Nine classes of ground objects exist in the image scene, including shadows, and the ground truth information for training and testing samples in each class is listed in Table 2(b).

4. Experimental results and analysis 4.2. Experimental results In this section, ﬁve groups of experiments using two different datasets are run to analyze the classiﬁcation accuracies and

Fig. 2. The method of UL-Isomap.

We conduct ﬁve groups of experiments with the above two datasets, in order to test our new method. First, we compare the classiﬁcation accuracies between VQ-LIsomap and LIsomap by changing the embedding dimension d. VQ-LIsomap is an advanced edition of LIsomap that are improved by replacing random landmarks with the VQ landmark selection scheme. The difference

Fig. 3. The image of Indian dataset.

32

W. Sun et al. / ISPRS Journal of Photogrammetry and Remote Sensing 89 (2014) 25–36

Fig. 4. The image of PaviaU dataset.

between VQ-LIsomap and UL-Isomap is that VQ-LIsomap only includes the VQ landmark selection scheme whereas the UL-LIsomap includes all the four improvement schemes in Section 3. The experiment aims to analyze the effect of replacing the random landmark selection with the VQ landmark selection in terms of classiﬁcation accuracy. Second, we make comparisons between classiﬁcation accuracies of LIsomap, VQ-LIsomap, UL-Isomap and Isomap. The experiment is to test the classiﬁcation performance of UL-Isomap when changing the proportions of landmarks. Third, to test the computational speed of UL-Isomap, we compare the computational times of UL-Isomap and LIsomap when changing the percentage of landmarks in the total pixels. Fourth, we investigate the relationships between projected dimension P in random projections and the classiﬁcation accuracies of HSI data. The experiment helps to select a proper P to guarantee the classiﬁcation performance of HSI data. Finally, the relationships between overlapping parameter a in fast approximate KNNs graph construction and classiﬁcation accuracies of UL-Isomap are analyzed. The ﬁnal experiment aims to select a proper a to guarantee the classiﬁcation performance of UL-Isomap for HSI data. We do not analyze the susceptibility of fast randomized low-rank approximate SVD decomposition on the performance of UL-Isomap because it has no parameters included. Three widely used classiﬁers are utilized in the experiments including KNN (Cover and Hart, 1967), Naïve Bayes (NB) (McCallum and Nigam, 1998), and Support Vector Machine (SVM) (Steinwart and Christmann, 2008) classiﬁers. The overall classiﬁcation accuracy (OCA) is used to measure the classiﬁcation

performances of all methods in the experiments. The measurement of Euclidean distance is used in the KNN classiﬁer. The RBF kernel function is used in the SVM classiﬁer, where the variance parameter and the penalization factor are obtained via cross-validation. For each dataset, we repeatedly sub-sample the training samples and testing samples ten times. The following results without notations are the average results of these ten different and independent experiments. (1) Effect of VQ landmarks on classiﬁcation accuracy across a range of d. The experiment explores the effects of using VQ landmarks on OCAs of LIsomap. We compare the overall classiﬁcation accuracies (OCAs) of LIsomap and VQ-LIsomap using the three classiﬁers mentioned above. For each dataset, the range of embedded dimension d is manually set between 2 and 80, the proportion of landmarks k is manually set as 10% of the total pixels, and the threshold r in the RLB algorithm is manually set as 500. Using cross-validation, the neighborhood size k of Indian dataset is set as 30, and the threshold of total distortion r in LBG algorithm is set as 0.005. For the PaviaU dataset, using cross-validation, the neighborhood size k is set as 20, and the threshold of total distortion r is set as 0.007. Table 3 shows detailed information about the parameters in both methods. The comparison in OCAs between VQ-LIsomap and LIsomap for both datasets is shown in Fig. 5. We observe from these ﬁgures that for each classiﬁer and each dataset, the VQ-LIsomap signiﬁcantly outperforms LIsomap in OCAs, despite the change in the dimension of embeddings. Furthermore, we compare the classiﬁcation accuracies in each class from both embeddings at a manually selected dimension. The results are shown in Fig. 6. The dimensions of embeddings for the Indian and PaviaU datasets are selected, using cross-validation, as 50 and 40 respectively. We observe from Fig. 6(a–c), for all three classiﬁers, the VQ-LIsomap classiﬁcation accuracies of most classes in the Indian dataset are higher than those of LIsomap, especially in class 9. This indicates the VQ landmarks improve the LIsomap embeddings, and ground objects in each class are better differentiated in the manifold space. The classiﬁcation results of PaviaU dataset in Fig. 6(d–f) conﬁrm this observation. We conclude that the VQ landmarks improve the embedding results of LIsomap and provide better classiﬁcation result than random landmarks do. (2) The comparison in classiﬁcation accuracy between Isomap versions at different percentage of landmarks. Compared with VQ-LIsomap, the classiﬁcation accuracy improvement of UL-Isomap has no clear correlation with the dimension of embeddings. We therefore focus on the UL-Isomap embedding at a selected dimension. In contrast, ignoring the computational cost, the percentage of landmarks correlates strongly with the embedding result. Therefore, we explore the classiﬁcation performance of UL-Isomap by changing the percentages of landmarks. We then compare the results with those of VQ-LIsomap and LIsomap. For all the methods, the

Fig. 5. The contrast in OCAs between the VQ-LIsomap and LIsomap with different classiﬁers for (a) Indian dataset and (b) PaviaU dataset.

W. Sun et al. / ISPRS Journal of Photogrammetry and Remote Sensing 89 (2014) 25–36

33

Fig. 6. The classiﬁcation accuracies in each class for VQ-LIsomap and LIsomap using different classiﬁers for Indian and Urban datasets.

dimensions of the Indian and PaviaU datasets are set, using crossvalidation, to be 50 and 40 respectively. For the Indian dataset, using cross-validation, the neighborhood size k for all the methods is set as 30; in UL-Isomap, the projected dimension P is set as 80, the overlapping parameter a and the threshold of total distortion r in LBG are set as 0.15 and 0.005 respectively. For the PaviaU dataset, using cross-validation, the k for all the methods is set as 25; the P in UL-Isomap is set as 60; the overlapping parameter a and the threshold of total distortion r are set as 0.1 and 0.01 respectively. Other parameter conﬁgurations unstated are the same as those mentioned in previous experiments. Table 3 shows detailed information about the parameters of the above methods on both datasets. Table 4 shows the OCAs of LIsomap, VQ-LIsomap and UL-Isomap for different percentages of landmarks in the total pixels. Small standard deviations (Stds.) in the table show the stability of the classiﬁcation results from all the dimensionality reduction methods. For each classiﬁer and each dataset, when increasing

the percentages of landmarks, the OCAs of all three methods grow monotonically, albeit slowly. This indicates the increase in the number of landmarks improves the spanning of the landmarks in the low-dimensional embedded space and improves the embedding results. For each percentage of landmarks and each classiﬁer, VQ-LIsomap outperforms LIsomap in OCAs, its OCAs 5.85% and 5.77% higher on average in the Indian dataset and PaviaU dataset respectively. This supports the conclusions in experiment (1). Also, for the same percentage of landmarks and the same classiﬁer, the OCAs of UL-Isomap are only slightly lower than those of VQ-LIsomap, the OCAs 0.95% and 1.04% lower on average in the Indian and PaviaU datasets respectively. This indicates that the three speed improvement schemes of UL-Isomap causes only about a 1% degradations in OCAs of the VQ-LIsomap embedding. Moreover, we implement the Mcnemar test (Foody, 2004) to evaluate the statistical signiﬁcance of the difference in classiﬁcation accuracies between VQ-LIsomap and UL-Isomap. The proportions of landmarks k in Indian and PaviaU datasets during the test are manually set as

34

W. Sun et al. / ISPRS Journal of Photogrammetry and Remote Sensing 89 (2014) 25–36

30% and 8.18% respectively. Using the proportions of correct allocations from three different classiﬁers above, the results show that the degradation in classiﬁcation accuracies of UL-Isomap when compared with VQ-LIsomap is statistically insigniﬁcant at the 5% level of signiﬁcance. Despite this degradation, the UL-Isomap still outperforms LIsomap in classiﬁcation. When comparing the OCAs between UL-Isomap and Isomap, the OCAs of UL-Isomap are only 1.47% and 1.73% lower on average than those of Isomap for the Indian and PaviaU datasets. In addition, the comparison between the OCAs of VQ-LIsomap and Isomap shows that a proper number of VQ-landmarks can provide an equal or slightly lower OCA than that of Isomap. (3) The Comparison in Computational speed between UL-Isomap and LIsomap. The experiment tests the computational speed performance of UL-Isomap against LIsomap. We study the overall computational performance of UL-Isomap and LIsomap rather than each separate step. Table 1 shows that percentage of landmarks has a much larger effect on computational complexity than does the dimension of the embeddings. Hence, we explore the computational time of both methods by changing the percentage of landmarks in the total pixels. For the Indian dataset, using cross-validation, the neighborhood size k and the dimension of embeddings d for both methods are set as 30 and 50 respectively. The projected dimension P and the overlapping parameter a in ULIsomap are set as 100 and 0.1 respectively, using cross-validation. For the PaviaU dataset, using cross-validation, the k and d are set as 20 and 40 respectively and P and a in UL-Isomap are set as 60 and 0.2 respectively. The ranges in the proportions of landmarks k for the Indian and PaviaU dataset are manually set as 10–50% and 2.73–13.6% respectively. Other parameters in UL-Isomap are the same as the counterparts in aforementioned experiments. Table 3 shows detailed information about the parameters of both methods on Indian and PaviaU datasets. We run the experiments on a Mac OS X computer with 2 2.26 GHZ Quad Core Processor and 16 GB of RAM. The codes of UL-Isomap and LIsomap are run in Matlab 2009a. Comparisons of the computational times of UL-Isomap and LIsomap for both datasets are shown in Fig. 7. The ﬁgure shows, for each percentage of landmarks, the computational time of UL-Isomap is shorter than that of LIsomap. Moreover, the curve of computational time in UL-Isomap grows more slowly than that of LIsomap, when increasing the percentage of landmarks. Additionally, for quantitative analysis purposes, Table 5 lists the ratios of the computational time of LIsomap to the computational time of UL-Isomap for different percentages of landmarks. For the Indian dataset, the ratio is 5.03 when using 10% of the total pixels as landmarks, and the ratio increases monotonically with increasing percentages of landmarks.

Similarly, the ratio in PaviaU dataset increases from 5.67 to 11.88 when increasing the percentage of landmarks from 2.73% to 13.6%. We conclude that UL-Isomap outperforms LIsomap in computational speed and improves the speed of LIsomap by at least 5 times. (4) Effect of projected dimension P on classiﬁcation accuracy. This experiment analyzes the effect of projected dimension P on the classiﬁcation performance of HSI data. We implement random projections for both datasets using the random matrix U/N(0, 1/P), and directly classify the results using the three stated classiﬁers. The ranges of projected dimension P for the Indian and PaviaU datasets are manually set between 1 and 160 and between 1 and 100, respectively. Table 3 shows the detailed information of the parameters of random projections on both datasets. For both datasets, the curves of OCAs versus projected dimension P are shown in Fig. 8. The ﬁgure shows that for all three classiﬁers, the OCAs are low at a small P. This indicates that too low of a projected dimension P causes too much loss in information of HSI data. In Fig. 8(a), when increasing P from 1 to 10, the curves rise sharply and reach high values. Then, the curves ﬂuctuate when P is between 10 and 20. This interval includes the dimension which equals the number of classes. The OCA curves ﬁnally become stable after the ‘‘turning point’’ P = 20, which is more than the number of classes. Fig. 8(b) shows that when increasing P, the OCAs of the three classiﬁers ﬁrst grows rapidly to large values and then the growth slows after about P = 10, which is more than the number of classes, resulting in a ‘‘turning point’’. We conclude that a moderate sized P, which is larger than the number of classes in HSI data, would preserve the most important information from HSI data when using random projections. Since OCAs tend to increase after the ‘‘turning point’’, we recommend to select a larger projected dimension P in real applications. (5) Effect of overlapping parameter a in fast approximate KNNs graph construction on classiﬁcation accuracy. The experiment explores the effect of the overlapping parameter a on the OCAs of UL-Isomap. The overlapping parameters a for both datasets are manually set from 0.05 to 0.4, with incremental steps of 0.05. The proportions of landmarks k in the total pixels for both datasets are manually set as 20% and 5% respectively. For the Indian dataset, using cross-validation, the neighborhood size k is set as 30, the projected dimension P is set as 80, and the dimension d of the UL-Isomap embedding is set as 55. For the PaviaU dataset, the k is set as 25, the P is set as 60 and the d is set as 35, after cross-validation. The other parameters in UL-Isomap for each dataset are the same as the counterparts in experiment (2). Table 3 shows detailed information about the parameters of UL-Isomap on both datasets.

Fig. 7. The contrast in the computational speed between UL-Isomap and LIsomap with different proportions of landmarks in the total pixels for (a) Indian dataset and (b) PaviaU dataset.

W. Sun et al. / ISPRS Journal of Photogrammetry and Remote Sensing 89 (2014) 25–36

35

Fig. 8. The effect of projected dimension P on the classiﬁcation results of (a) Indian dataset and (b) PaviaU dataset.

Fig. 9. The effect of overlapping parameter on the OCA of the UL-Isomap embedding with different classiﬁers for (a) Indian dataset and (b) PaviaU dataset.

Fig. 9 shows the curves of OCAs versus the overlapping parameter a for different classiﬁers. For each dataset and each classiﬁer, the OCAs increase slowly overall as a increases from 0.05 to 0.4. This indicates that in general a larger overlapping parameter a brings about a more accurate approximation of the actual KNNs graph and improves the classiﬁcation results. However, a larger a will also result in an increase in the computational complexity of fast approximate KNNs graph construction. Therefore, from an application point of view, the OCAs of UL-Isomap with a lower a is sufﬁcient. This a is a balanced tradeoff between higher OCAs and lower complexity of the KNNs graph construction. 5. Conclusions and future work The two main problems of LIsomap are the use of random landmarks and the high computational complexity of the method. These problems prevent the application of LIsomap in real applications of HSI data classiﬁcation. This paper proposes a method named UL-Isomap to address these problems. The method uses the VQ method to select proper landmarks for HSI data by minimizing the total quantization distortion. The VQ landmark scheme not only considers the variations in the local density of pixels in the spectral space but also ﬁnds the best representatives in the total pixels. The VQ landmarks from the LBG algorithm improve the OCAs of the LIsomap embedding. The UL-Isomap method also reduces the computational complexity of LIsomap by three methods. First, random projections reduce the bands of HSI data. Second, the RLB method decreases the computational complexity of regular KNNs graph construction. Third, the fast randomized low-rank approximate SVD decomposition decreases the computational time of eigenvalue decomposition in MDS. Five groups of experiments have been run on two different HSI datasets to test the

performance of the method. The results show that the three speed improvement methods allow UL-Isomap to perform over 5 times faster than LIsomap. In addition, UL-Isomap achieves signiﬁcantly better OCAs than does LIsomap and achieves slightly lower classiﬁcation accuracies than does VQ-LIsomap. The VQ-LIsomap can also obtain an equal or slightly lower OCA than does Isomap with a proper number of landmarks. Moreover, UL-Isomap has only slightly worse OCAs than does Isomap. The results show that a moderate projected number of classes and a small overlapping parameter provide satisfactory classiﬁcation results. In this paper, however, we did not carefully study the settings of the following parameters: the RLB threshold r, the proportion of landmarks k, and the total distortion threshold r. In future work, we will study the automatic or semi-automatic selection of the above parameters to improve the performance of UL-Isomap. Moreover, we will use the Mcnemar test to completely evaluate the classiﬁcation accuracies that are obtained from different dimensionality reduction methods (VQ-LIsomap, UL-Isomap, LIsomap and Isomap) and three different classiﬁers (KNN, NB and SVM). We will also apply the ULIsomap into bigger HSI datasets achieved from different sensors to further verify the new method and to solve realistic problems in our programs. Acknowledgements This work was Funded by ‘‘973’’ National Basic Research Program of China (2013CB733204) and by K.C. Wong Magna Fund in Ningbo University. The authors thank Yen-ming Mark Lai of the Norbert Wiener Center for Harmonic Analysis and Applications at the University of Maryland College Park for her helpful suggestions. The authors would also like to thank the editor and referees for their suggestions which improved the manuscript.

36

W. Sun et al. / ISPRS Journal of Photogrammetry and Remote Sensing 89 (2014) 25–36

References Adam, E., Mutanga, O., Rugege, D., Ismail, R., 2012. Discriminating the papyrus vegetation (Cyperus papyrus L.) and its co-existent species using random forest and hyperspectral data resampled to HYMAP. Int. J. Remote Sens. 33, 552–569. Bachmann, C.M., Ainsworth, T.L., Fusina, R.A., 2005. Exploiting manifold geometry in hyperspectral imagery. IEEE Trans. Geosci. Remote Sens. 43, 441–454. Bachmann, C.M., Ainsworth, T.L., Fusina, R.A., 2006. Improved manifold coordinate representations of large-scale hyperspectral scenes. IEEE Trans. Geosci. Remote Sens. 44, 2786–2803. Bachmann, C.M., Ainsworth, T.L., Fusina, R.A., Montes, M.J., Bowles, J.H., Korwan, D.R., Gillis, D.B., 2009. Bathymetric retrieval from hyperspectral imagery using manifold coordinate representations. IEEE Trans. Geosci. Remote Sens. 47, 884– 897. Baraniuk, R.G., Wakin, M.B., 2009. Random projections of smooth manifolds. Found. Comput. Math. 9, 51–77. Baraniuk, R., Dawenport, M., Devore, R., Wakin, M., 2008. A simple proof of the restricted isometry property for random matrices. Constructive Approx. 28, 253–263. Belkin, M., Niyogi, P., 2003. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15, 1373–1396. Belluco, E., Camuffo, M., Ferrari, S., Modenese, L., Silvestri, S., Marani, A., Marani, M., 2006. Mapping salt-marsh vegetation by multispectral and hyperspectral remote sensing. Remote Sens. Environ. 105, 54–67. Berry, M.W., 1992. Large-scale sparse singular value computations. Int. J. Supercomput. Appl. 6, 13–49. Bingham, E., Mannila, H., 2001. Random projection in dimensionality reduction: applications to image and text data. In: Proc. the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. San Francisco, California, USA, 26–29 August, pp. 245–250. Boley, D., 1998. Principal direction divisive partitioning. Data Min. Knowl. Disc. 2, 325–344. Bouzalmat, A., Belghini, N., Zarghili, A., Kharroubi, J., Majda, A., 2011. Face recognition using neural network based fourier gabor ﬁlters and random projection. Int. J. Comput. Sci. Security 5, 376–386. Brantley, S.T., Zinnert, J.C., Young, D.R., 2011. Application of hyperspectral vegetation indices to detect variations in high leaf area index temperate shrub thicket canopies. Remote Sens. Environ. 115, 514–523. Chabrillat, S., Goetz, A.F.H., Krosley, L., Olsen, H.W., 2002. Use of hyperspectral images in the identiﬁcation and mapping of expansive clay soils and the role of spatial resolution. Remote Sens. Environ. 82, 431–445. Chen, G., Qian, S.E., 2009. Denoising and dimensionality reduction of hyperspectral imagery using wavelet packets, neighbour shrinking and principal component analysis. Int. J. Remote Sens. 30, 4889–4895. Chen, Y., Crawford, M., Ghosh, J., 2005. Applying nonlinear manifold learning to hyperspectral data for land cover classiﬁcation. In: Proc. 2005 IEEE International conference on Geoscience and Remote Sensing symposium (IGARSS ‘05). Seoul, Korea, 25–29 July, pp. 4311–4314. Chen, Y., Crawford, M.M., Ghosh, J., 2006. Improved nonlinear manifold learning for land cover classiﬁcation via intelligent landmark selection. In: Proc. 2006 IEEE International Conference on Geoscience and Remote Sensing Symposium (IGARSS ‘06). Denver, Colorado, USA, 31 July – 4 August, pp. 545–548. Chen, X., Warner, T.A., Campagna, D.J., 2007. Integrating visible, near-infrared and short-wave infrared hyperspectral and multispectral thermal imagery for geological mapping at Cuprite, Nevada. Remote Sens. Environ. 110, 344–356. Chen, J., Fang, H., Saad, Y., 2009. Fast approximate kNN graph construction for high dimensional data via Recursive Lanczos Bisection. J. Machine Learning Res. 10, 1989–2012. Cheng, Y.B., Ustin, S.L., Ria, O.D., Vanderbilt, V.C., 2008. Water content estimation from hyperspectral images and MODIS indexes in Southeastern Arizona. Remote Sens. Environ. 112, 363–374. Cover, T., Hart, P., 1967. Nearest neighbor pattern classiﬁcation. IEEE Trans. Inf. Theory 13, 21–27. Crawford, M.M., Ma, L., Kim, W., 2011. Exploring nonlinear manifold learning for classiﬁcation of hyperspectral data. In: Prasad, S., Bruce, L.M., Chanussot, J. (Eds.), Optical Remote Sensing. Springer, Berlin Heidelberg, pp. 207–234. Datta, R., Joshi, D., Li, J., Wang, J.Z., 2008. Image retrieval: ideas, inﬂuences, and trends of the new age. ACM Comput. Surveys (CSUR) 40, 1–60. Du, Q., Fowler, J.E., Ma, B., 2011. Random-projection-based dimensionality reduction and decision fusion for hyperspectral target detection. In: Proc. 2011 IEEE Internatiaonal Conference on Geoscience and Remote Sensing Symposium (IGARSS ‘11). Vancouver, Canada, 24–29 July, pp. 1790–1793. Eillis, R.J., Scott, P.W., 2004. Evaluation of hyperspectral remote sensing as a means of environmental monitoring in the St. Austell China clay (kaolin) region, Cornwall, UK. Remote Sensing Environ. 93, 118–130. Equitz, W.H., 1989. A new vector quantization clustering algorithm. IEEE Trans. Acoust. Speech Signal Process. 37, 1568–1575.

Foody, G.M., 2004. Thematic map comparison: evaluating the statistical signiﬁcance of differences in classiﬁcation accuracy. Photogrammetric Eng. Remote Sensing 70, 627–634. Fowler, J.E., Du, Q., 2012. Anomaly detection and reconstruction from random projections. IEEE Trans. Image Process. 21, 184–195. Gersho, A., Gray, R.M., 1992. Vector quantization and signal compression. Springer, Netherlands. Golden, B., 1976. Shortest-path algorithms: a comparison. Oper. Res. 24, 1164– 1168. Gray, R., 1984. Vector quantization. IEEE ASSP Mag. 1, 4–29. Jia, X., Kuo, B.C., Crawford, M.M., 2013. Feature mining for hyperspectral image classiﬁcation. Proc. IEEE 101, 676–697. Jin, H., Li, P., Cheng, T., Song, B., 2012. Land cover classiﬁcation using CHRIS/PROBA images and multi-temporal texture. Int. J. Remote Sens. 33, 101–119. Kleinberg, J.M., 1997. Two algorithms for nearest-neighbor search in high dimensions. In: Proc. the twenty-ninth annual ACM symposium on Theory of computing. El Paso, TX, USA, 04–06 May, pp. 599–608. Knuth, D.E., 1997. A generalization of Dijkstra’s algorithm. Inf. Process. Lett. 6, 1–5. Lanczos, C., 1950. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. United States Governm. Pr. Ofﬁce. Lee, J.A., Verleysen, M., 2007. Nonlinear Dimensionality Reduction. Springer Verlag, New York. Linde, Y., Buzo, A., Gray, R., 1980. An algorithm for vector quantizer design. IEEE Trans. Commun. 28, 84–95. Mathieu, R., Wessels, K., Asner, G., Knapp, D., VanAaardt, J., Main, R., Cho, M., Erasmus, B., Smit, I., 2009. Tree cover, tree height and bare soil cover differences along a land use degradation gradient in semi-arid savannas, South Africa. In: Proc. 2009 IEEE International Conference on Geoscience and Remote Sensing Symposium(IGARSS ‘09). Cape Town, South, Africa, 12–17 July, pp. II-194–II197. Mccallum, A., Nigam, K., 1998. A comparison of event models for naive bayes text classiﬁcation. In: Proc. AAAI-98 Workshop on Learning for Text Categorization. Madison, Wisconsin, USA, 26–27 July, pp. 41–48. Patan, G., Russo, M., 2001. The enhanced LBG algorithm. Neural Networks 14, 1219– 1237. Plaza, A., Martnez, P., Plaza, J., Prez, R., 2005. Dimensionality reduction and classiﬁcation of hyperspectral image data using sequences of extended morphological transformations. IEEE Trans. Geosci. Remote Sens. 43, 466– 479. Ra, S.W., Kim, J.K., 1993. A fast Mean-distance-ordered Partial Codebook Search algorithm for image vector quantization. IEEE Trans. Circ. Syst. II: Analog Digital Signal Process. 40, 576–579. Rokhlin, V., Szlam, A., Tygert, M., 2010. A randomized algorithm for principal component analysis. SIAM J. Mat. Anal. Appl. 31, 1100–1124. Roweis, S.T., Saul, L.K., 2000. Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323–2326. Scott, D., 1992. The curse of dimensionality and dimension reduction. In: Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley & Sons, New Jersey, pp. 195–217. Sertel, O., Kong, J., Lozanski, G., Shanaah, A., Catalyurek, U., Saltz, J., Gurcan, M., 2008. Texture classiﬁcation using nonlinear color quantization: Application to histopathological image analysis. In: Proc. 2008 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’08). Las Vegas, Nevada, USA, 31 March – 4 April, pp. 597–600. Silva, V., Tenenbaum, J.B., 2003. Global versus local methods in nonlinear dimensionality reduction. Adv. Neural Inf. Process. Syst. 15, 705–712. Steinwart, I., Christmann, A., 2008. Support Vector Machines. Springer Verlag, New York. Tamayo, P., Slonim, D., Mesirov, J., Zhu, Q., Kitareewan, S., Dmitrovsky, E., Lander, E.S., Golub, T.R., 1999. Interpreting patterns of gene expression with selforganizing maps: methods and application to hematopoietic differentiation. Proc. Nat. Acad. Sci. 96, 2907–2912. Tenenbaum, J.B., Silva, V., Langford, J.C., 2000. A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319–2323. Thenkabail, P.S., Schull, M., Turral, H., 2005. Ganges and Indus river basin land use/ land cover (LULC) and irrigated area mapping using continuous streams of MODIS data. Remote Sens. Environ. 95, 317–341. Uto, K., Kosugi, Y., 2013. Estimation of Lambert parameter based on leaf-scale hyperspectral images using dichromatic model-based PCA. Int. J. Remote Sens. 34, 1386–1412. Yang, M.H., 2002. Extended isomap for pattern classiﬁcation. In: Proc. the Eighteenth national conference on Artiﬁcial intelligence. Edmonton, Alberta, Canada, 28 July–1 August, pp. 224–229. Zhang, Z., Zha, H., 2003. Nonlinear dimension reduction via local tangent space alignment. In: Intelligent Data Engineering and Automated Learning. Springer Verlag, New York, pp. 477–481.