Classical least squares for detection and classification


Chapter 2.9

Classical least squares for detection and classification Neal B. Gallagher Chemometrics, Eigenvector Research, Inc., Manson, WA, United States

1. Introduction

Classical least squares (CLS) is a useful modeling tool for detection and classification in hyperspectral images [1,2]. One reason CLS is amenable to modeling images is that target spectra might be known, but reference values for each pixel are rarely available. Another reason is that CLS can be used with weighting strategies to suppress clutter signal (interferences and noise) while enhancing minor target signal. Additionally, the CLS model reasonably matches how signal manifests in many types of images. This chapter starts with a motivation for this model form based on how the signal manifests in images and demonstrates the opportunity for synergy between tools to maximize information extraction to meet detection and classification objectives. Four forms of CLS are presented: classical least squares (CLS), weighted least squares (WLS), generalized least squares (GLS), and extended least squares (ELS) [2]. The focus will be on the latter two forms and three examples of images are shown: a Raman image of space shuttle tiles [3,4], a near-infrared (NIR) image of melamine in wheat gluten powder, and a Landsat 8 image of Lake Chelan. The first example shows how a single-target GLS model can be used to find minor signal in an image. The second example uses multiple targets in an ELS-GLS model followed by multivariate curve resolution (MCR) [5]. CLS is the model form used in MCR (a.k.a., end-member extraction or spectral unmixing), and MCR was used in an exploratory analysis of the NIR image to clearly elucidate how the image signal nearly follows closure (i.e., where the contributions sum to a constant). The last example uses target detection to classify pixels in an image followed by an ELS-GLS model to enhance discrimination of a tight cluster.

Hyperspectral Imaging. Copyright © 2020 Elsevier B.V. All rights reserved.


232 SECTION j II Algorithms and methods

CLS models are flexible and adaptable, provide an easy platform for numerical constraints, and result in the most interpretable models available because the model basis usually consists of pure component spectra instead of the abstract loadings found in other models. Much of this is not discussed here because the focus is on accounting for clutter (e.g., interference signal to suppress) and enhancing sensitivity (e.g., via using multiple targets). Characterizing clutter is vitally important [6] to the analysis because it dictates whether a target can be detected (e.g., a target that looks spectrally similar to an interference might not be detected). Therefore, it is of interest to understand how the target and the clutter manifest in the signal. Fortunately, hyperspectral images contain a large number of pixels that provide an opportunity to characterize these signals. It is also true that hyperspectral images can change image to image because of lighting changes, differences in detector configuration, or sampling artifacts. But often it is the sample being imaged that changes, and it is highly useful to characterize the clutter on an image-to-image basis. The GLS and ELS models are amenable to this type of analysis. Additionally, the target signal can change image to image, and this leads to targeted anomaly detection approaches based on CLS-like models.

2. Classical least squares models

2.1 Nomenclature and conventions

In the following work, scalars are represented by lower case italic, $x$; column vectors are lower case bold, $\mathbf{x}$; and matrices are upper case bold, $\mathbf{X}$. A matrix $\mathbf{X}_{M,N}$ is of size $M \times N$, where upper case italic indicates the size of a matrix dimension. The number of samples (pixels) is $M$ for $m = 1, \ldots, M$, and the number of variables is $N$ for $n = 1, \ldots, N$. The $m$th row of $\mathbf{X}$ is $\mathbf{x}_m^T$ (i.e., the extracted row is a column vector), and the $n$th column is $\mathbf{x}_n$, where row or column is implied by the subscript $m$ or $n$, respectively. Upper case $T$ indicates transpose, and a single element of $\mathbf{X}$ is $x_{m,n}$. An image is represented by an $M_x \times M_y \times N$ “cube,” $\mathbf{X}$, where $M = M_x M_y$ and the number of spectral channels is $N$. An image is “matricized” to a matrix such that $\mathbf{X}_{M_x,M_y,N} \rightarrow \mathbf{X}_{M,N}$ [2]. A column vector of ones with the number of elements appropriate for the problem of interest is given by $\mathbf{1}$. The matrix $\mathbf{I}$ is the identity matrix with ones on the diagonal and zeros everywhere else. The trace of a matrix, $\mathrm{tr}(\mathbf{X})$, is the sum of the diagonal elements. The two norm (Euclidean norm) is given by $\|\mathbf{x}\|_2 = \sqrt{\sum_{n=1}^{N} x_n^2}$ and the one norm (area norm) is given by $\|\mathbf{x}\|_1 = \sum_{n=1}^{N} |x_n|$. The covariance matrix is defined in statistics as $\mathrm{cov}(\mathbf{X}) = \frac{1}{M-1}(\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}^T)^T(\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}^T)$, where $\bar{\mathbf{x}}^T = \frac{1}{M}\mathbf{1}^T\mathbf{X}$ is the mean of $\mathbf{X}$. However, in this chapter, it will also correspond to “variance about the model origin” or “the sum of squares about the origin” given by $\frac{1}{M}\mathbf{X}^T\mathbf{X}$. The usage depends on the problem of interest and will be defined.
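The conventions above can be illustrated with a minimal NumPy sketch. The cube size, the random data, and all variable names are hypothetical and chosen only for illustration; the chapter's own analysis was performed in MATLAB.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical image cube: Mx x My pixels, N spectral channels.
Mx, My, N = 4, 5, 6
cube = rng.random((Mx, My, N))

# Matricize: each row of X is one pixel spectrum (M = Mx * My rows).
X = cube.reshape(Mx * My, N)
M = X.shape[0]

# Covariance about the mean: 1/(M-1) (X - 1 xbar^T)^T (X - 1 xbar^T).
xbar = X.mean(axis=0)
cov_mean = (X - xbar).T @ (X - xbar) / (M - 1)

# "Sum of squares about the origin": (1/M) X^T X.
cov_origin = X.T @ X / M

# One norm (area norm) and two norm (Euclidean norm) of the first pixel.
one_norm = np.abs(X[0]).sum()
two_norm = np.sqrt((X[0] ** 2).sum())

# A contributions matrix C (M x K) folds back into K "chemical" images.
C = rng.random((M, 2))
C_images = C.reshape(Mx, My, 2)
```

Row-major reshaping is an arbitrary choice here; any ordering works as long as the same ordering is used when folding contributions back into images.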



2.2 Development of the CLS model

CLS is based on a linear mixture model and is often called the “multicomponent Beer–Lambert law” when applied to spectroscopic applications. The CLS model for a single measurement, $\mathbf{x}_m$, is given by

$$\mathbf{x} = \mathbf{S}\mathbf{c} + \mathbf{e} \tag{1}$$


where the columns of $\mathbf{S}_{N \times K}$ form a basis of response vectors (often referred to as end-members), $\mathbf{c}_{K \times 1}$ is a vector of contributions, and $\mathbf{e}$ is the model error. The contributions, $\mathbf{c}$, can be seen as coefficients dictating “how much” each basis vector contributes to the measured signal, $\mathbf{x}$. In chemical sensing applications $\mathbf{S}$ can be a set of known pure component spectra, and the contributions, $\mathbf{c}$, are related to the concentration sensed in each pixel. However, the estimated contributions in hyperspectral imaging might not be directly proportional to concentration or not easily scalable to the signal being measured, e.g., in NIR imaging the sample volume can change based on particle packing density [6,7]. As a result, the general term “contribution” is used in place of concentration [6,16]. For a matricized image, $\mathbf{X}$, each row corresponds to a measured pixel modelable using Eq. (1). Therefore, the entire image can be modeled as

$$\mathbf{X} = \mathbf{C}\mathbf{S}^T + \mathbf{E} \tag{2}$$


where $\mathbf{C}$ corresponds to a matricized contributions image, i.e., rearranging $\mathbf{C}$ to $\mathbf{C}_{M_x \times M_y \times K}$ provides a set of $K$ “chemical” images. In comparison, a principal components analysis (PCA) model of the image is given by

$$\mathbf{X} = \mathbf{T}\mathbf{P}^T + \mathbf{E}_{PCA} \tag{3}$$


In PCA, the scores, $\mathbf{T}$, and loadings, $\mathbf{P}$, have columns that are orthogonal such that $\mathbf{T}^T\mathbf{T}$ is diagonal and $\mathbf{P}^T\mathbf{P} = \mathbf{I}$. In CLS, the columns of $\mathbf{C}$ and $\mathbf{S}$ are not generally orthogonal and are typically constrained to be nonnegative. In other words, $\mathbf{S}$ forms an oblique basis of nonnegative linearly independent spectra for describing the measured signal. Importantly, $\mathbf{S}$ tends to be more interpretable than the loadings, $\mathbf{P}$. The PCA model minimizes $\mathrm{tr}(\mathbf{E}_{PCA}^T\mathbf{E}_{PCA})$ and, for the same number of factors, the CLS residual $\mathrm{tr}(\mathbf{E}^T\mathbf{E})$ is generally larger because $\mathbf{C}$ and $\mathbf{S}$ are often constrained to be nonnegative (other constraints such as smoothness can also be employed, but these are not discussed here). For PCA and CLS, the sum of squared residuals, Q residual [8,9], for the $m$th row is $Q_m = \mathbf{e}_m^T\mathbf{e}_m$ (although Q typically differs for the two models). Four forms of the CLS model are discussed below and, to allow for ease of comparison, the general objective function to be minimized is written as

$$O(\mathbf{c}) = \mathbf{e}^T\mathbf{W}^{-1}\mathbf{e} = (\mathbf{x} - \mathbf{S}\mathbf{c})^T\mathbf{W}^{-1}(\mathbf{x} - \mathbf{S}\mathbf{c}) \tag{4}$$


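where $\mathbf{W}$ is a weighting matrix typically designed such that the weighted residuals are independent and identically distributed [$\mathbf{W}^{-1/2}\mathbf{e} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$]. (It is noted that models can be useful even though it is often observed that the residuals do not follow this distribution.) The four models under consideration differ only in the definition of $\mathbf{W}$, and the general CLS estimator is given as

$$\hat{\mathbf{c}} = (\mathbf{S}^T\mathbf{W}^{-1}\mathbf{S})^{-1}\mathbf{S}^T\mathbf{W}^{-1}\mathbf{x} = (\tilde{\mathbf{S}}^T\tilde{\mathbf{S}})^{-1}\tilde{\mathbf{S}}^T\tilde{\mathbf{x}} \tag{5}$$

where the hat indicates an estimated quantity, $\tilde{\mathbf{S}} = \mathbf{W}^{-1/2}\mathbf{S}$, and $\tilde{\mathbf{x}} = \mathbf{W}^{-1/2}\mathbf{x}$. Additional theory of CLS and how it relates to net-analyte-signal [9a] can be found in Ref. [2]. If the estimation error is dominated by measurements in $\mathbf{x}$, then the error covariance, $\mathbf{V}(\hat{\mathbf{c}})$, is approximated by

$$\mathbf{V}(\hat{\mathbf{c}}) = (\mathbf{S}^T\mathbf{W}^{-1}\mathbf{S})^{-1} = (\tilde{\mathbf{S}}^T\tilde{\mathbf{S}})^{-1} \tag{6}$$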


and this term can provide guidance when setting decision thresholds for detection. Another potentially useful statistic is the weighted Q residual given by $Q_{w,m} = \mathbf{e}_m^T\mathbf{W}^{-1}\mathbf{e}_m$.
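A minimal NumPy sketch of the general estimator, its approximate error covariance, and the weighted Q residual is given here. The function name `cls_estimate` and the synthetic spectra are hypothetical, and the sketch assumes a symmetric, invertible weighting matrix.

```python
import numpy as np

def cls_estimate(x, S, W=None):
    """Return (c_hat, V, Qw) for the model x = S c + e with weighting W.

    Assumes W is symmetric and invertible; W=None uses the identity.
    """
    N = S.shape[0]
    W = np.eye(N) if W is None else W
    Winv_S = np.linalg.solve(W, S)          # W^{-1} S
    V = np.linalg.inv(S.T @ Winv_S)         # (S^T W^{-1} S)^{-1}
    c_hat = V @ (Winv_S.T @ x)              # general estimator
    e = x - S @ c_hat                       # residual
    Qw = e @ np.linalg.solve(W, e)          # weighted Q residual
    return c_hat, V, Qw

rng = np.random.default_rng(0)
N, K = 8, 2
S = rng.random((N, K))                      # hypothetical pure spectra
c_true = np.array([0.3, 0.7])
x = S @ c_true                              # noise-free measurement

# Identity weighting (the plain CLS case).
c_cls, V_cls, Q_cls = cls_estimate(x, S)

# Diagonal weighting with hypothetical per-channel noise variances.
w = rng.uniform(0.5, 2.0, N)
c_wls, V_wls, Q_wls = cls_estimate(x, S, np.diag(w))
```

Because the example measurement is noise free, both weightings recover the same contributions; the weighting only matters once noise or clutter is present.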

2.2.1 Classical least squares

The first CLS model is generally known as the CLS model. For CLS, $\tilde{\mathbf{S}} = \mathbf{S}$ and $\mathbf{W} = \sigma^2\mathbf{I}$. The term $\sigma^2 > 0$ corresponds to noise variance on the spectral channels. Making these substitutions into Eq. (5) yields the familiar CLS estimator $\hat{\mathbf{c}} = (\mathbf{S}^T\mathbf{S})^{-1}\mathbf{S}^T\mathbf{x}$. (This can also be written $\hat{\mathbf{c}} = (\mathbf{S}^T\sigma^{-2}\mathbf{S})^{-1}\mathbf{S}^T\sigma^{-2}\mathbf{x}$, in which the scalar $\sigma^{-2}$ cancels.)

2.2.2 Weighted least squares

The second form of the CLS model is WLS. In WLS, $\mathbf{W} = \mathrm{diag}(\mathbf{w})$ where $\mathrm{diag}(\mathbf{w})$ is an operator that creates an $N \times N$ diagonal matrix from the $N \times 1$ vector $\mathbf{w}$. The nonzero entries in $\mathbf{w}$ typically correspond to the noise on each spectral channel. WLS allows for different noise levels on each of the $N$ spectral channels. For example, the WLS model allows noisy channels to be downweighted relative to less noisy channels by using different entries in $\mathbf{w}$. In contrast, CLS assumes that the noise is the same on each channel. (It is noted that the weighting in $\mathbf{w}$ can be designed according to other criteria while recognizing that this can change the statistical properties of the estimator.)

2.2.3 Generalized least squares

The third model form considered is GLS. In GLS, the weighting $\mathbf{W}$ is not restricted to be diagonal, which makes GLS a generalization of WLS. (Note that $\mathbf{W}$ must be invertible. If $\mathbf{W}$ is not invertible, regularization can be used [2,10,11], and regularization can be combined with a subspace approximation that has been shown to improve sensitivity [12].) GLS is also known as the Aitken estimator [13] and has been



popularized as the matched filter [14,15]. In GLS, $\mathbf{W}$ is used to weight different “directions” in $N$-dimensional space as opposed to individual variables, and $\mathbf{W}$ typically corresponds to the “clutter” covariance matrix [2,6,10]. Clutter is defined as measured signal not related to target spectra in $\mathbf{S}$; clutter corresponds to interference and noise signal. For example, for a matricized image, $\mathbf{X}$, a submatrix $\mathbf{X}_c$ of clutter measurements corresponding to $M_c$ nontarget pixels can be identified ($M_c < M$). The clutter covariance is given by

$$\mathbf{W}_c = \frac{1}{M_c - 1}(\mathbf{X}_c - \mathbf{1}\bar{\mathbf{x}}_c^T)^T(\mathbf{X}_c - \mathbf{1}\bar{\mathbf{x}}_c^T) \quad \text{or} \quad \mathbf{W}_c = \frac{1}{M_c}\mathbf{X}_c^T\mathbf{X}_c \tag{7}$$


where $\bar{\mathbf{x}}_c$ is the clutter mean. The concept of clutter and weighting is central to the discussion in this chapter and will be expanded on in examples given below where the definition of “clutter” and “signal” will depend on signal to suppress and signal to enhance, respectively. From a signal processing perspective, it may be desirable to use mean-centering, but there are times where mean-centering is not used (i.e., the covariance is defined relative to zero instead of the mean as shown in the right-hand expression of Eq. (7)). GLS also provides the motivation to preprocess or whiten a matrix of predictors ($\tilde{\mathbf{X}} = \mathbf{X}\mathbf{W}^{-1/2}$) prior to inverse least squares (ILS) modeling. The result is the GLS weighting preprocessing [11].
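The two clutter covariance definitions and the whitening step can be sketched in NumPy as follows. The function names, the synthetic clutter, and the simple ridge term are assumptions made for illustration; the ridge is only one of several possible regularizations and is not necessarily the one used in the cited work.

```python
import numpy as np

def clutter_covariance(Xc, center=True):
    """Clutter covariance about the clutter mean (center=True) or about zero."""
    Mc = Xc.shape[0]
    if center:
        Xd = Xc - Xc.mean(axis=0)
        return Xd.T @ Xd / (Mc - 1)
    return Xc.T @ Xc / Mc

def inv_sqrt(W, ridge=0.0):
    """Symmetric W^{-1/2} from an eigendecomposition.

    ridge > 0 adds ridge * I first, a simple (assumed) regularization
    for the case where W is not invertible.
    """
    vals, vecs = np.linalg.eigh(W + ridge * np.eye(W.shape[0]))
    return (vecs / np.sqrt(vals)) @ vecs.T

rng = np.random.default_rng(0)
A = rng.random((5, 5))                        # hypothetical channel mixing
Xc = rng.normal(size=(200, 5)) @ A + 1.0      # correlated synthetic clutter

Wc = clutter_covariance(Xc)                   # about the clutter mean
Wc0 = clutter_covariance(Xc, center=False)    # about zero

# Whitening: after multiplying by W^{-1/2} the clutter covariance is I.
X_white = (Xc - Xc.mean(axis=0)) @ inv_sqrt(Wc)
```

After whitening, directions that carried large clutter variance are shrunk, so an ordinary least squares fit on the whitened data corresponds to a GLS fit on the original data.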

2.2.4 Extended least squares

The final CLS model to be considered is ELS, which is based on the extended mixture model [1]. ELS gets its name because the target is “extended” such that $\mathbf{S} \rightarrow [\mathbf{S}\ \mathbf{P}]$, where in ELS $\mathbf{S}$ strictly contains target spectra and $\mathbf{P}$ corresponds to a basis that spans the interference signal. For example, $\mathbf{P}$ could be the loading vectors from a PCA model of the clutter, or pure component spectra, $\mathbf{S}_c$, from an MCR model of the clutter. Any basis set of interest that spans the interference signal could be used. In the “extended” form, the objective function to minimize is

$$O(\mathbf{c},\mathbf{t}) = \mathbf{e}^T\mathbf{e} = \left(\mathbf{x} - [\mathbf{S}\ \mathbf{P}]\begin{bmatrix}\mathbf{c}\\ \mathbf{t}\end{bmatrix}\right)^T\left(\mathbf{x} - [\mathbf{S}\ \mathbf{P}]\begin{bmatrix}\mathbf{c}\\ \mathbf{t}\end{bmatrix}\right) \tag{8}$$


where $\mathbf{t}$ corresponds to the clutter scores for the known basis $\mathbf{P}$. When the clutter basis is described by clutter spectra ($\mathbf{P} \rightarrow \mathbf{S}_c$), the scores are describable as clutter contributions ($\mathbf{t} \rightarrow \mathbf{c}_c$). An advantage of using spectra, $\mathbf{S}_c$, as the clutter basis is that nonnegativity can be easily employed, and the clutter spectra and contributions are easily interpretable. It is also noted that, with the aid of block inverse theorems, the ELS objective can be shown to reduce to Eq. (4) with the inverse weighting matrix written as $\mathbf{W}^{-1} = \mathbf{I} - \mathbf{P}(\mathbf{P}^T\mathbf{P})^{-1}\mathbf{P}^T$; note that this matrix is idempotent. Although this weighting may be less intuitive than Eq. (8), it does provide motivation to preprocess a matrix of predictors ($\tilde{\mathbf{X}} = \mathbf{X}\mathbf{W}^{-1}$) prior to ILS modeling, giving the external parameter orthogonalization preprocessing [17,18].
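The equivalence between the extended model and the projection form of the weighting can be checked numerically. The sketch below uses random, hypothetical spectra and compares the target contributions from regressing on $[\mathbf{S}\ \mathbf{P}]$ with those from the weighted estimator using $\mathbf{W}^{-1} = \mathbf{I} - \mathbf{P}(\mathbf{P}^T\mathbf{P})^{-1}\mathbf{P}^T$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, J = 12, 2, 3
S = rng.random((N, K))      # hypothetical target spectra
P = rng.random((N, J))      # hypothetical clutter basis
x = rng.random(N)           # hypothetical measurement

# Extended model: regress x on [S P] and keep the target part.
B = np.hstack([S, P])
coef, *_ = np.linalg.lstsq(B, x, rcond=None)
c_ext = coef[:K]

# Projection form: weight with W^{-1} = I - P (P^T P)^{-1} P^T.
Winv = np.eye(N) - P @ np.linalg.solve(P.T @ P, P.T)
c_proj = np.linalg.solve(S.T @ Winv @ S, S.T @ Winv @ x)

# Idempotency: applying the projector twice changes nothing.
is_idempotent = np.allclose(Winv @ Winv, Winv)
```

Both routes give the same target contributions; the extended form additionally returns the clutter scores, which is what makes nonnegativity on clutter contributions straightforward to impose.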


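3. Characterization of the measured signal

A popular model for detection of a single target in hyperspectral modeling is based on GLS and is given by

$$\begin{aligned} \mathbf{x} - \bar{\mathbf{x}}_c &= \mathbf{s}c + \mathbf{e} \\ O(c) &= \mathrm{tr}\left\{(\mathbf{x} - \bar{\mathbf{x}}_c - \mathbf{s}c)^T\mathbf{W}_c^{-1}(\mathbf{x} - \bar{\mathbf{x}}_c - \mathbf{s}c)\right\} \\ \hat{c} &= (\mathbf{s}^T\mathbf{W}_c^{-1}\mathbf{s})^{-1}\mathbf{s}^T\mathbf{W}_c^{-1}(\mathbf{x} - \bar{\mathbf{x}}_c) \end{aligned} \tag{9}$$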


where $\bar{\mathbf{x}}_c$ is the clutter mean and $\mathbf{W}_c$ is the clutter covariance defined in the left-hand expression of Eq. (7). Fig. 1A depicts this model for two variables $x_1$ and $x_2$ where the contributions $c_2 > c_1$ correspond to two measurements at different target levels. But what does this model mean? Eq. (9) implies that the mean clutter signal and clutter covariance (the fuzzy ellipse around the clutter mean) are stationary and that when target signal is present it adds to the mean clutter; the clutter mean is treated as the model origin. This model implies that the measured signal intensity increases as target contribution increases. Eq. (9) might also imply that target signal could decrease relative to the clutter mean because $\hat{c}$ is not typically constrained to be nonnegative. However, the model and unconstrained least squares do not agree with observation. First, it is well known that target signal must be nonnegative. Additionally, MCR results for measured signals tend to show that as target signal increases, clutter signal decreases [19,20]. This means that the measured signal can go from pure clutter to pure target with mixtures lying between the extremes. This model can be represented by

$$\begin{aligned} \mathbf{x} &= \mathbf{s}c + c_c\bar{\mathbf{x}}_c + \mathbf{e} = [\mathbf{s}\ \bar{\mathbf{x}}_c]\begin{bmatrix}c\\ c_c\end{bmatrix} + \mathbf{e} \\ O(c, c_c) &= \mathrm{tr}\left\{(\mathbf{x} - c_c\bar{\mathbf{x}}_c - \mathbf{s}c)^T\mathbf{W}_c^{-1}(\mathbf{x} - c_c\bar{\mathbf{x}}_c - \mathbf{s}c)\right\} \\ \begin{bmatrix}\hat{c}\\ \hat{c}_c\end{bmatrix} &= \left([\mathbf{s}\ \bar{\mathbf{x}}_c]^T\mathbf{W}_c^{-1}[\mathbf{s}\ \bar{\mathbf{x}}_c]\right)^{-1}[\mathbf{s}\ \bar{\mathbf{x}}_c]^T\mathbf{W}_c^{-1}\mathbf{x} \end{aligned} \tag{10}$$


FIGURE 1 (A) The model implied by Eq. (9) where the measured signal is $\mathbf{x} = \bar{\mathbf{x}}_c + c\mathbf{s} + \mathbf{e}$. (B) The model implied by Eq. (10) where the measured signal is $\mathbf{x} = c_c\bar{\mathbf{x}}_c + c\mathbf{s} + \mathbf{e}$. (C) The model implied by Eq. (10) with strict closure on the contributions.



where the model origin is zero as shown in Fig. 1B. The bottom expression in Eq. (10) is the unconstrained least squares solution but will be subject to nonnegativity in the examples given below. The model given in Eq. (10) is a linear mixture model where the signal is a sum of clutter and target, and observations suggest that the signal might approximately follow closure such that the sum of the contributions is a constant, e.g., Eq. (10) should be subject to $c + c_c = 1$. Fig. 1C shows that imposing closure means that the signal must lie strictly on the line between the clutter mean, $\bar{\mathbf{x}}_c$, and target, $\mathbf{s}$. However, MCR results suggest that a strict closure constraint is too strong [19–21], potentially due to lighting, shading, and other effects in the image (see the example in Section 4.2 for an NIR image). Although strict closure might be too strong, a softer approach mimics closure by normalizing the measurements and targets using a one norm (area norm) [21,21a]. This preprocessing was used in the three examples given below.

The models given in Eqs. (9) and (10) (Fig. 1) assume that the clutter covariance and clutter mean are stationary across an image. Given multiple sources of clutter, this assumption is questionable, and it is suspect even for a single clutter source if lighting changes across an image. Additionally, Eq. (6) implies that the clutter is normally distributed. However, a simple example shows that this assumption can be far from accurate. For example, Fig. 2 shows PCA results for an NIR image of wheat gluten (additional detail is provided in Section 4.2). It is clear from the scores image on principal component (PC) 1 and the plot of scores on PC 1 and 2 that the wheat gluten signal is not normally distributed and, as a result, the estimate for $\mathbf{V}(\hat{\mathbf{c}})$ should be considered a “guide” that could be significantly inaccurate.

To be consistent with observations of how the signal manifests, the examples below used zero as the model origin (Eq. 10), as opposed to using the clutter mean as the origin (Eq. 9), and nonnegativity on contributions [21b]. Additionally, “soft closure” was imposed by normalizing measurements and targets using a one norm. All analysis was performed using MATLAB [22], PLS_Toolbox, and MIA_Toolbox [23].
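A minimal NumPy sketch of the zero-origin, two-component model with nonnegativity and one-norm normalization follows. It is not the toolbox implementation used in the chapter: the function names and synthetic spectra are hypothetical, and nonnegativity is handled by enumerating the possible active sets, which is exact for two components but would not scale to many.

```python
import numpy as np

def area_normalize(A):
    """Scale each row to unit one norm ("soft closure" preprocessing)."""
    A = np.atleast_2d(A)
    return A / np.abs(A).sum(axis=1, keepdims=True)

def nn_two_component(x, s, xc_mean, Wc):
    """Nonnegative [c, c_c] for x = s c + xc_mean c_c + e, weighted by Wc^{-1}."""
    B = np.column_stack([s, xc_mean])
    # Wc = L L^T, so ||L^{-1}(x - B c)||^2 equals the weighted objective.
    L = np.linalg.cholesky(Wc)
    Bw, xw = np.linalg.solve(L, B), np.linalg.solve(L, x)
    best, best_q = np.zeros(2), xw @ xw          # start from c = c_c = 0
    for cols in ([0, 1], [0], [1]):              # candidate active sets
        coef, *_ = np.linalg.lstsq(Bw[:, cols], xw, rcond=None)
        r = xw - Bw[:, cols] @ coef
        if np.all(coef >= 0) and r @ r < best_q:
            best, best_q = np.zeros(2), r @ r
            best[cols] = coef
    return best

rng = np.random.default_rng(0)
N = 20
s = area_normalize(rng.random(N))[0]             # hypothetical target
xc_mean = area_normalize(rng.random(N))[0]       # hypothetical clutter mean
Xc = xc_mean + 0.01 * rng.normal(size=(200, N))  # synthetic clutter pixels
Wc = np.cov(Xc, rowvar=False)

# A 30/70 mixture of normalized spectra already has unit area.
x_mix = area_normalize(0.3 * s + 0.7 * xc_mean)[0]
c_mix = nn_two_component(x_mix, s, xc_mean, Wc)

# A pixel whose unconstrained target contribution would be negative.
x_neg = xc_mean - 0.1 * s
c_neg = nn_two_component(x_neg, s, xc_mean, Wc)
```

For the mixture pixel the two contributions sum to one, which is the closure behavior that normalization is meant to mimic; for the second pixel the target contribution is clipped to zero rather than going negative.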

FIGURE 2 (A) Scores image for principal component (PC) 1 for a near-infrared image of wheat gluten (no signal processing was used). (B) Scores on PC 2 versus PC 1 with approximate 95% and 99% confidence ellipses based on the assumption of normality. (C) Scores histograms for PC 1 (top) and PC 2 (bottom) compared to Gaussian distributions.


4. Example applications

4.1 Raman image of space shuttle tiles

A Raman image was acquired of a silicon carbide space shuttle tile with boron carbide additive and is discussed in detail in Ref. [3]. The image is 50 pixels by 50 pixels measured over the range 1472–480 cm⁻¹ in 1024 spectral channels. For visualization, Fig. 3 (left) shows an RGB image of the PCA scores on PCs 1, 2, and 3. This figure corresponds to the major signal in the image (approximately 92% of the total sum of squares). Fig. 3 (right) shows a plot of the Q residuals (approximately 1.26% of the sum of squares) for a 14 PC model; Pixel 560 is a clear outlier caused by small systematic signal (Fig. 4 (bottom right)). The spectrum for Pixel 560 shows a strong feature at 1324 cm⁻¹ “likely related to diamond (normally seen at 1332 cm⁻¹)” and minor features at 788.9 and 968.8 cm⁻¹ “likely related to SiC” (silicon carbide) [4]. Pixel 560 corresponds to very minor overall variance in the image, and the objective of the subsequent analysis was to attempt to find more pixels like it using a single-target GLS model (often used in matched filters) iteratively. “Single target” refers to not including $\bar{\mathbf{x}}_c$ in Eq. (10). In the first iteration of the analysis, Pixel 560 was used as the target and $\mathbf{W}_c$ corresponded to the covariance about zero of all pixels except Pixel 560. Therefore, nontarget variance about zero was downweighted. The result was the strong detection of six pixels that appeared to include target signal: Group A and B Pixels in Fig. 4 (left). Note that the target contributions ranged from zero to slightly higher than one and that the contribution image was contrasted from 0.2 to 0.5 to enhance visualization of weak detections. Group A Pixels are plotted in Fig. 4 (top right) and show a good match to the target spectrum. In the second iteration of the analysis, the six detected pixels in Group A and B were removed from the clutter and target contributions were estimated

FIGURE 3 (Left) RGB image of the PCA scores on PCs 1, 2, and 3. (Right) Image of PCA Q residuals showing Pixel 560 has high Q (bright yellow). Its measured spectrum is plotted in Fig. 4 (bottom right). PCA, Principal component analysis; PC, Principal component.



FIGURE 4 (Left) Contrasted image of target contributions. (Top right) Pixel Group A spectra. (Bottom right) Normalized spectrum for Pixel 560 compared to Pixel 383.

with the new clutter covariance. It is these contributions that are shown in Fig. 4 (left) and Pixel 383 in the bottom left corresponded to a subtle potential detection that was not evident in the first round of analysis. Pixel 383 is compared to the target in Fig. 4 (bottom right) and does show what appears to be a low signal that is similar, but not identical, to the target. The shuttle tile example shows how minor signal can be detected in an image, but clearly there is room for potential improvement. For example, if diamond is the desired target, then the small SiC peaks could be removed from the target spectrum and the result smoothed to more cleanly focus on diamond signal. Additionally, diamond peaks in the Group A spectra appear slightly shifted from the target peak, and the spectrum in Pixel 383 appears slightly broadened. It has been shown that this type of minor deviation in the signal can be handled using additional “shift” and “broadening” basis functions in an ELS model [24,24a].
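The iterative single-target procedure can be sketched on synthetic data as follows. Everything in this example is an assumption made for illustration: the function names, the simulated background and peak, the fixed threshold, and the small ridge added for numerical stability. The one-norm normalization used in the chapter is omitted for brevity.

```python
import numpy as np

def single_target_gls(X, s, clutter_mask, ridge=1e-8):
    """Target contribution for every row of X; clutter covariance about zero."""
    Xc = X[clutter_mask]
    Wc = Xc.T @ Xc / Xc.shape[0] + ridge * np.eye(X.shape[1])
    Winv_s = np.linalg.solve(Wc, s)
    return X @ Winv_s / (s @ Winv_s)

def iterative_detection(X, s, seed_idx, threshold, n_iter=3):
    """Re-estimate contributions after removing detections from the clutter."""
    clutter = np.ones(X.shape[0], dtype=bool)
    clutter[seed_idx] = False                 # the target pixel is not clutter
    for _ in range(n_iter):
        c_hat = single_target_gls(X, s, clutter)
        detected = c_hat > threshold
        detected[seed_idx] = True
        if np.array_equal(~detected, clutter):
            break                             # clutter set no longer changes
        clutter = ~detected
    return c_hat, detected

# Synthetic image: two background components plus noise, three target pixels.
rng = np.random.default_rng(0)
M, N = 400, 30
channels = np.arange(N)
background = rng.random((2, N))
X = rng.uniform(0.5, 1.0, (M, 2)) @ background + 0.01 * rng.normal(size=(M, N))
peak = np.exp(-0.5 * ((channels - 15) / 1.5) ** 2)
for idx, amount in zip([10, 50, 90], [1.0, 0.6, 0.3]):
    X[idx] += amount * peak

seed = 10                                     # plays the role of the outlier pixel
c_hat, detected = iterative_detection(X, X[seed], [seed], threshold=0.2)
```

Removing detected pixels from the clutter matters because any target signal left in the clutter covariance is downweighted along with the interference, which reduces sensitivity to weak detections.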

4.2 NIR image of melamine in wheat gluten

The wheat gluten image has been discussed previously using the log10 transform of the measured NIR reflectance signal [12]. In the present example, the log transform was not used. The image analyzed was 240 by 241 pixels with 318 spectral channels measured over the range 1008–1642 nm, and the image was masked to use only the sample of interest at the center of the image. The objective was to detect potential adulteration with melamine particles. Fig. 2A shows a PCA scores image on PC 1 (detected melamine target pixels were excluded from this image). Target detection used Eq. (10) iteratively with the target, $\mathbf{s}$, corresponding to the mean from a pure melamine sample. In the first iteration, $\bar{\mathbf{x}}_c$ was the mean of the image and the clutter was defined using the mean-centered pixels (i.e., Eq. 7). Detections were defined as pixels with contributions above twice


the theoretical detection threshold (Eq. 6). In subsequent iterations, detected pixels were removed from the calculation of $\bar{\mathbf{x}}_c$ and the clutter covariance. This “ELS-GLS” model effectively reduced target signal biasing $\bar{\mathbf{x}}_c$ and reduced the detection threshold by lowering signal in the inverse clutter covariance parallel to $\mathbf{s}$ (see Eq. 6). The approach is adaptable on an image-to-image basis: clutter is defined specific to each image. However, the detection strategy is likely optimal for images with only a few pixels that are dominated by target signal. As shown previously, the strategy appears to be reasonably effective [12], and once target was identified, additional signal exploration could be employed. In one example, all detected target pixels were used as “signal” pixels and all remaining pixels were used as “clutter” in a subsequent analysis using “whitened” or “weighted” PCA. This allowed the target signal to manifest itself as multiple, image-specific (not relying on outside information) principal components and resulted in additional potential detections. This approach was referred to as “targeted anomaly detection” [12]. For the present example, the extracted target, $\mathbf{s}$, and mean clutter, $\bar{\mathbf{x}}_c$, were used as spectral constraints in MCR, and a third factor associated with wheat gluten was also estimated in the MCR decomposition. Fig. 5 (left) shows an RGB image of the contributions (a.k.a., MCR scores) plot corresponding to Components 1, 2, and 3, respectively, and estimated normalized pure component spectra (top right graph). Blue (Component 1, wheat gluten) pixels are the most ubiquitous with red (Component 2, wheat gluten) pixels interspersed. Signal from Component 2 appears to be diminished near the outer boundary near the sample cup wall (the black border representing pixels not included in the analysis).
Green (Component 3, melamine target) pixels show up in three major spots [X,Y]: A large spot at the left [127,34], a medium sized spot at the bottom left [197, 58],

FIGURE 5 (Left) RGB image of the contributions (MCR scores), (top right) estimated normalized pure component spectra, (bottom right) scores profile for the white arrow in the image. In each image/graph: Blue is Component 1 = major wheat gluten signal, Red is Component 2 = minor wheat gluten signal (pixels interspersed), and Green is Component 3 = melamine target.



and a small spot near the upper right of the center [78,136]. The results show that melamine shows up as minor heterogeneous spots in a broader mixture of background wheat gluten. Scores profiles also provide insight into the signal (Fig. 5, bottom right graph). In the image (Fig. 5 (left)) a white arrow is drawn from the upper left (green dot indicating the start of the arrow) downward, terminating at the red dot. The line was drawn to cross both the large and medium green spots of melamine. The corresponding score profiles along the arrow are drawn left to right (Fig. 5 (bottom right)). The blue profile (Component 1, wheat gluten) is approximately flat with minor fluctuations over the entire profile except in the locations of the green spots, where the profile dips to zero. Conversely, the green profile (Component 3, melamine) is nearly zero with minor fluctuations over the entire profile except in the locations of the green spots, where the target signal increases to maxima; as target melamine signal increases, the background wheat gluten signal decreases. In fact, it appears that the sum of the contributions for all three components is nearly constant, i.e., the signal follows closure (as discussed in Section 3). The wheat gluten example provides evidence in support of the proposed models used in this chapter and corresponds to the ELS model given in Fig. 1B with the objective function given in Eq. (8).

4.3 Landsat 8 image of Lake Chelan

In this example, a Landsat 8 image of Lake Chelan, WA, was analyzed (U.S.G.S., FILE_DATE = 2017-06-30T11:25:51Z). Lake Chelan is 50.5 miles long and 1486 ft deep and is located in north central Washington state. The image selected for analysis corresponded to the southeast portion of the lake (Fig. 6 (left)). The town of Chelan is in the southeast of the image, and Lake Chelan is the dark swath of pixels on the south side of the image. Table 1 lists the eight wavelength bands used for the analysis. The objective of the analysis was to split signal attributable to lawn (e.g., associated with the municipal golf course) from cherry orchard (and other

FIGURE 6 (Left) Image of Lake Chelan and the surrounding area based on bands 3, 2, 1 (RGB) listed in Table 1. (Right) Binary image of pixels comprising signal primarily associated with water (yellow). Lake Chelan, the Chelan River, and four small regions (circled) were correctly classified as water.


TABLE 1 Landsat 8 imaging bands. Columns: band number, wavelength range (μm), resolution (m). Bands 8, 10, and 11 were not used.



agricultural land including vineyards and other orchard types). This goal was complicated by the fact that the image included a large fraction of pixels associated with water, forest, rough terrain, dry scrubland, homes, roads, and buildings. Therefore, the exercise was split into two tasks. Task 1 could be considered a global classification that identified broad classes within the image: Water, Green (orchards, vineyards, and lawn), Bare Earth, three types of forest, Road, Buildings, and Other (corresponding to no specific class). Task 2 was a local classification that split the Green class into Lawn and Cherries (cherry orchards). It is well known that local models can enhance signal associated with targets of interest that otherwise might be suppressed when using global models. In the context of the present discussion this should make sense because global models have to account for more clutter than local models. The first step in Task 1 identified the global Water class. To do this, PCA was performed on the image and a representative subset of pixels associated with water was selected in a PCA scores plot (not all water pixels were included in this initial subset). The mean of the selected subset of water pixels was used as the water target using a single-target GLS model similar to that discussed in Section 1. For this model the clutter was defined as the covariance of (1) the mean-centered water pixels plus (2) all remaining pixels (not centered). Target and image pixels were normalized (one norm), and pixels with contributions greater than 0.6 ($\hat{c} > 0.6$) were classed as Water. Therefore, target detection used the representative subset of water pixels to aid in the detection of additional water signal within the image. Fig. 6 (right) shows a “binary image” for the



identified Water class. Interestingly, the representative subset of water pixels was selected from Lake Chelan, yet other minor detections were also observed (circled in red). These minor detections and the Chelan River in the southeast corner of the image were confirmed with ground truth. After the detection of Water, the next step in Task 1 followed a similar procedure to identify Class Green. However, pixels classed as Water were excluded prior to detection of the Green class (i.e., water signal no longer contributed to the clutter covariance). Therefore, in a sense, with water ignored, the Green class was identified using a “local” clutter. After Green, Classes Bare Earth, Forest 1, Forest 2, Forest 3, Road, Buildings, and Other were identified one at a time. In each case, pixels associated with the previous class were removed from the analysis. The procedure applied for target detection in Task 2 was similar to that used in Task 1; however, this task was more complicated because the targets for the Green subclasses were more similar. As a result, a subtly different target detection model was used. Step 1 identified two areas known from ground truth to be associated with lawn and cherry orchard: Class Lawn and Cherries. PCA was performed on the image with only Class Green included, and representative pixels for Class Lawn and Cherries were selected. The mean of each subset was used as the corresponding target for each class, and the clutter was defined as the mean-centered Class Lawn and Class Cherries pixels plus all remaining Class Green pixels (noncentered). The result is a two-target model, similar to that used in Section 4.2, local to Class Green. Fig. 7 shows the detection results for Lawn (left) and Cherries (right) where yellow pixels indicate detections for each class. Representative pixels for Class Lawn were selected from the municipal golf course in the southeast quadrant of the image (Fig. 7 (left)). Additional detections were found and confirmed for Don Morse Park, the community field, high school fields, and softball and football fields. Additionally, a private golf course on the southwest side of the image was detected. Representative pixels for Class Cherries were selected from a large orchard in the north center of the image (square box in Fig. 7 (right)), while confirmed detections of cherry orchards are circled. Note that black pixels

FIGURE 7 (Left) Class Lawn detections (yellow) and other Class Green (dark blue). (Right) Class Cherries detections (yellow) and other Class Green (dark blue).


FIGURE 8 (Left) RGB image with an overlay of Class Lawn (green), Class Cherries (blue), Class Green (non-Lawn and non-Cherries) (yellow), and Class Bare Earth (red).

indicate pixels not included in Class Green while dark blue pixels correspond to Class Green not classed as Lawn (left) or Cherries (right). For context, Fig. 8 (left) shows an overlay of several classes on the RGB image: Class Lawn, Class Cherries, Class Green (non-Lawn and non-Cherries), and Class Bare Earth. Fig. 8 (top right) shows the mean signal used as targets for each global class in Task 1 and shows that Water not only had a significant number of pixels (Fig. 6 (right)) but also had the most unusual “spectrum.” The uniqueness of the Water signal made it the easiest to split out, and that is why it was selected as the first class to identify. Task 2 attempted to split the Green class into subclasses Lawn and Cherries, and Fig. 8 (bottom right) shows how similar their target spectra are. The similarity of the targets is why the two-target model was used: the model attempts to find signal unique to each target. The multitarget GLS model can produce more sensitive detection results than a single-target GLS model in cases where the targets are similar.
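The benefit of a two-target model for similar targets can be illustrated with a small synthetic sketch. The two Gaussian “spectra” below are hypothetical stand-ins for the Lawn and Cherries targets, the clutter covariance is an assumed flat-baseline-plus-noise model, and `gls_contributions` is an illustrative helper rather than a toolbox function.

```python
import numpy as np

def gls_contributions(X, S, Wc):
    """Rows are c_hat^T = x^T Wc^{-1} S (S^T Wc^{-1} S)^{-1} for each pixel in X."""
    Winv_S = np.linalg.solve(Wc, S)
    return X @ Winv_S @ np.linalg.inv(S.T @ Winv_S)

N = 40
channels = np.arange(N)
# Two hypothetical, nearly identical targets (peaks one channel apart).
lawn = np.exp(-0.5 * ((channels - 10) / 3.0) ** 2)
cherries = np.exp(-0.5 * ((channels - 11) / 3.0) ** 2)

# Assumed clutter covariance: white noise plus a flat baseline direction.
baseline = np.ones(N) / np.sqrt(N)
Wc = 0.01 * np.eye(N) + 0.5 * np.outer(baseline, baseline)

# One noise-free pixel containing only the "cherries" signal.
x = cherries[None, :]

# A single-target "lawn" model responds strongly to the similar signal.
c_single = gls_contributions(x, lawn[:, None], Wc)[0, 0]

# A two-target model assigns the signal to the correct target.
c_multi = gls_contributions(x, np.column_stack([lawn, cherries]), Wc)[0]
```

In this sketch the single-target contribution for the wrong class is close to one, while the two-target model returns approximately zero for Lawn and one for Cherries, because each contribution is estimated from the part of a target that is not shared with the other.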

5. Conclusions

The discussion split the CLS models into four model types: CLS, WLS, GLS, and ELS. The relationship between the model forms was reduced to showing differences in the weighting strategy in the objective functions. Understanding how the math is used to model the measured signal allows for the development of sensitive target detection strategies that can also be used to classify pixels in hyperspectral images. Sensitivity was improved using weighting to suppress clutter signal, and selectivity was enhanced using ELS when targets were similar. An advantage is that the strategies are highly adaptable and image specific. Additionally, the concepts can be extended to weighted principal components analysis to further enhance detections, resulting in targeted anomaly detection [12].



Acknowledgments

The author would like to thank Fran Adar for the image in Section 4.1, Donal O’Sullivan for his assistance, Greg Israelson for the image in Section 4.2, and Marty Cochran for his assistance with ground truth for the image in Section 4.3.

References

[1] H. Martens, T. Næs, Multivariate Calibration, second ed., John Wiley & Sons, Chichester, 1989.
[2] N.B. Gallagher, Detection, classification and quantification in hyperspectral images using classical least squares models, in: H.F. Grahn, P. Geladi (Eds.), Techniques and Applications of Hyperspectral Image Analysis, John Wiley & Sons, Sussex, 2007, pp. 181-201.
[3] F. Adar, Raman mapping of spectrally non-well-behaved species, Spectroscopy 31 (2) (2016) 16-26.
[4] F. Adar, Horiba Scientific, Personal Communication, Edison, New Jersey, 2017.
[5] A. de Juan, R. Tauler, Multivariate curve resolution (MCR) from 2000: progress in concepts and applications, Critical Reviews in Analytical Chemistry 36 (3-4) (2006) 163-176.
[6] T. Burr, B.R. Foy, H. Fry, B. McVey, Characterizing clutter in the context of detecting weak gaseous plumes in hyperspectral imagery, Sensors 6 (2006) 1587-1625.
[7] P. Geladi, D. MacDougall, H. Martens, Linearization and scatter-correction for near-infrared reflectance spectra of meat, Applied Spectroscopy 39 (3) (1985) 491-500.
[8] J.E. Jackson, A User’s Guide to Principal Components, John Wiley & Sons, New York, 1991.
[9] B.M. Wise, N.B. Gallagher, The process chemometrics approach to chemical process monitoring and fault detection, Journal of Process Control 6 (6) (1996) 329-348.
[9a] A. Lorber, K. Faber, B.R. Kowalski, Net analyte signal calculation in multivariate calibration, Analytical Chemistry 69 (1997) 1620-1626.
[10] T.A. Blake, J.F. Kelly, N.B. Gallagher, P.L. Gassman, T.J. Johnson, Passive detection of solid explosives in Mid-IR hyperspectral images, Analytical and Bioanalytical Chemistry 395 (2) (2009) 337-348.
[11] H. Martens, M. Hoy, B.M. Wise, R. Bro, P.B. Brockhoff, Pre-whitening of data by covariance-weighted pre-processing, Journal of Chemometrics 17 (3) (2003) 153-165.
[12] N.B. Gallagher, J.M. Shaver, R. Bishop, R.T. Roginski, B.M. Wise, Decompositions with maximum signal factors, Journal of Chemometrics 28 (8) (2014) 663-671.
[13] A. Aitken, On least squares and linear combinations of observations, Proceedings of the Royal Society of Edinburgh (1935) 42-48.
[14] G.L. Turin, An introduction to matched filters, IRE Transactions on Information Theory 6 (3) (1960) 311-329.
[15] T. Burr, N. Hengartner, Overview of physical models and statistical approaches for weak gaseous plume detection using passive infrared hyperspectral imagery, Sensors 6 (2006) 1721-1750.
[16] N.B. Gallagher, T.A. Blake, P.L. Gassman, Application of extended inverse scatter correction to mid-infrared reflectance spectra of soil, Journal of Chemometrics 19 (2005) 271-281.
[17] A. Hayden, E. Niple, B. Boyce, Determination of trace-gas amounts in plumes by the use of orthogonal digital filtering of thermal-emission spectra, Applied Optics 35 (16) (1996) 2802-2809.
[18] J.M. Roger, F. Chauchard, V. Bellon Maurel, EPO-PLS external parameter orthogonalisation of PLS application to temperature-independent measurement of sugar content of intact fruits, Chemometrics and Intelligent Laboratory Systems 66 (2) (2003) 191-204. https:// 7439(03)00051-0.
[19] N.B. Gallagher, J.F. Kelly, T.A. Blake, Passive Infrared Hyperspectral Imaging for Standoff Detection of Tetryl Explosive Residue on a Steel Surface, WHISPERS, Reykjavik (Iceland), 2010.
[20] N.B. Gallagher, J.M. Shaver, E.B. Martin, J. Morris, B.M. Wise, W. Windig, Curve resolution for images with applications to TOF-SIMS and Raman, Chemometrics and Intelligent Laboratory Systems 77 (1) (2005) 85-96.
[21] N.B. Gallagher, T.A. Blake, P.L. Gassman, J.M. Shaver, W. Windig, Multivariate curve resolution applied to infrared reflectance measurements of soil contaminated with organic analyte, Applied Spectroscopy 60 (7) (2006) 713-728.
[21a] S.L. Neal, Direct distance measures in factor analysis spectral resolution, Journal of Chemometrics 8 (1994) 245-261.
[21b] R. Bro, S. de Jong, A fast non-negativity-constrained least squares algorithm, Journal of Chemometrics 11 (5) (1997) 393-401.
[22] MATLAB, The MathWorks, Natick, MA, USA.
[23] PLS_Toolbox and MIA_Toolbox 8.5.2, Eigenvector Research, Inc., Manson, WA, USA.
[24] T.A. Blake, P.L. Gassman, N.B. Gallagher, Detection and classification of organic analytes in soil, International Journal of High Speed Electronics and Systems 18 (2) (2008) 319-336.
[24a] N.B. Gallagher, P.L. Gassman, T.A. Blake, Strategies for detecting organic liquids on soils using mid-infrared reflection-absorption spectroscopy, Environmental Science & Technology 42 (15) (2008) 5700-5705.