- Email: [email protected]

Contents lists available at ScienceDirect

ISPRS Journal of Photogrammetry and Remote Sensing journal homepage: www.elsevier.com/locate/isprsjprs

A GIHS-based spectral preservation fusion method for remote sensing images using edge restored spectral modulation Xiran Zhou b, Jun Liu a,⇑, Shuguang Liu a, Lei Cao a,c, Qiming Zhou d, Huawen Huang a a

Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China c Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China d Department of Geography, Hong Kong Baptist University, and Laboratory for High Performance Geo-Computation, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China b

a r t i c l e

i n f o

Article history: Received 4 September 2012 Received in revised form 19 November 2013 Accepted 22 November 2013 Available online 20 December 2013 Keywords: Image fusion Generalized intensity–hue–saturation Edge restored spectral modulation Spectral distortion Image quality evaluation Remote sensing

a b s t r a c t High spatial resolution and spectral ﬁdelity are basic standards for evaluating an image fusion algorithm. Numerous fusion methods for remote sensing images have been developed. Some of these methods are based on the intensity–hue–saturation (IHS) transform and the generalized IHS (GIHS), which may cause serious spectral distortion. Spectral distortion in the GIHS is proven to result from changes in saturation during fusion. Therefore, reducing such changes can achieve high spectral ﬁdelity. A GIHS-based spectral preservation fusion method that can theoretically reduce spectral distortion is proposed in this study. The proposed algorithm consists of two steps. The ﬁrst step is spectral modulation (SM), which uses the Gaussian function to extract spatial details and conduct SM of multispectral (MS) images. This method yields a desirable visual effect without requiring histogram matching between the panchromatic image and the intensity of the MS image. The second step uses the Gaussian convolution function to restore lost edge details during SM. The proposed method is proven effective and shown to provide better results compared with other GIHS-based methods. Crown Copyright Ó 2013 Published by Elsevier B.V. All rights reserved.

1. Introduction Multi-source remote sensing images with high spatial and spectral resolutions play an important role in image analysis, feature extraction, modeling, target recognition, and classiﬁcation, among others. Remote sensing images with various spatial and spectral resolutions can be obtained from high-resolution satellites, including SPOT, IKONOS, QuickBird, and WorldView-2. However, the panchromatic (PAN) image that contains detailed object features with high spatial resolution has an extremely low spectral resolution. In addition, multispectral (MS) images with high spectral resolution cannot achieve high spatial resolution. These disadvantages are attributed to signal-to-noise ratio limit and transmission bottlenecks. Image fusion is designed to solve these problems by using the spatial and spectral characteristics of the original images to obtain MS images with high spatial and spectral resolutions. Commonly used fusion methods and frameworks for remote sensing images (Thomas et al., 2008) include the intensity– hue–saturation (IHS) transform (Chu and Zhu, 2008), principal component analysis (PCA) (Shah et al., 2008), and Brovey transform (BT) (Chen et al., 2009). These techniques obtain almost to ⇑ Corresponding author. Tel.: +86 18290480019. E-mail address: [email protected] (J. Liu).

the same level of spatial details, as shown by the use of spatial quality indexes, like the correlation. However, these fusion procedures involve serious spectral distortion (Laporterie-Déjean et al., 2005). Multi-resolution analysis (MRA) (e.g., wavelet transform (WT)) (Park and Kang, 2007; Lanir et al., 2007; El-Khamy et al., 2005) and the ARSIS concept (Ranchin and Wald, 2000) introduce an approach to reduce spectral distortion given that MRA expresses and extracts spatial and spectral data at different scales. However, all image fusion methods cannot simultaneously achieve optimal results in spatial and spectral domains. For example, the IHS, PCA, and BT can nearly preserve the entire spatial resolution of the PAN image but can cause varying degrees of spectral distortion in spectral resolution. WT and MRA can preserve more spectral information of the original MS images but lose spatial details in the PAN images. Therefore, efﬁcient image fusion should ensure a balance between preserving spatial details and reducing spectral distortion. The IHS transform, which is the most widely used fusion method, provides superior results when the MS to be fused only consists of three bands. Tu et al. proposed the generalized IHS (GIHS) to extend the IHS transform to more bands (Tu and Su, 2001). This technique eliminates forward and inverse transforms of the IHS; thus, the method rapidly performs its function and can be used for large amounts of image data fusion. Tu et al. further proposed

0924-2716/$ - see front matter Crown Copyright Ó 2013 Published by Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.isprsjprs.2013.11.011

X. Zhou et al. / ISPRS Journal of Photogrammetry and Remote Sensing 88 (2014) 16–27

a fusion method with spectral adjustment that adds the near-infrared (NIR) band and recalculates intensity (Tu et al., 2004). This approach achieved superior results during IKONOS image fusion. Choi proposed an IHS-based fusion with a trade-off parameter t, which adjusts spatial and spectral information to achieve rapid fusion (Choi, 2006). Tu et al. also proposed an optimal trade-off fusion method that preserves spatial details and reduces spectral distortion by using the parameter l (Tu et al., 2007). The qualities of the fusion images processed by these two methods depend directly on the parameters. For different sensors, various parameters must be selected for superior fusion. A fusion technique based on a local variation and the GIHS has also been proposed (Chu and Zhu, 2008). This approach achieves the same spatial resolution as that of the GIHS transform and good spectral quality in the green vegetation region. Numerous methods have also combined the IHS transform with other MRAs, such as WT. These approaches use the spectral information preservation ability of MRA to improve the spectral quality of the fusion image (Chibani and Houacine, 2002; Zhang and Hong, 2005). Numerous efforts have been directed toward reducing spectral distortion in image fusion. Tu et al. determined that spectral distortion in the nonlinear model of the RGB–IHS conversion results from changes in saturation by using the GIHS method. In the present study, we propose a remote sensing image fusion technique based on edge-restored spectral modulation (ERSM). Experiments and comparison tests with traditional methods prove that the proposed technique exhibits obvious advantages in balancing spatial and spectral information. The remainder of this paper is organized as follows. Section 2 introduces the GIHS method and the cause of spectral distortion. The proposed GIHS-based spectral preservation fusion method using ERSM is presented in Section 3. The experimental results with quantitative and qualitative comparisons for different datasets are provided in Section 4. The conclusion is stated in Section 5. 2. GIHS and spectral distortion In reference Laporterie-Déjean et al. (2005), the GIHS equation was deduced through the linear transform of the RGB–IHS, as follows:

2

3

2

3

2

3

Rnew R0 þ Inew I0 R0 þ d 6 7 6 7 6 7 4 Gnew 5 ¼ 4 G0 þ Inew I0 5 ¼ 4 G0 þ d 5; B0 þ d Bnew B0 þ Inew I0

ð1Þ

where (Rnew, Gnew, Bnew) is the fusion image; (R0, G0, B0) represents the original MS image; and d = Inew I0, I0 and Inew denote the intensities of the original and the new MS images, respectively. In the IHS transform, Inew is typically acquired by specifying the histogram of the PAN image according to that of the intensity of the MS images. Other commonly used fusion techniques such as the PCA transform, BT, and WT are categorized as GIHS methods, and thus, the causes of spectral distortion are analyzed in these methods. Various transform methods indicate that the expression of Inew differs. Therefore, the expression of d also varies. This subject is discussed in detail in reference Laporterie-Déjean et al. (2005). In the nonlinear transform of the RGB–HIS:

8 I0 ¼ ðR0 þ G0 þ B0 Þ=3 > > > > < arccosðuÞ if G0 R0 H¼ ; 2 p arccosð u Þ if G0 < R0 > > > > 3 minðR0 ;G0 ;B0 Þ 3X 0 I0 X 0 : S ¼ 1 R0 þG0 þB0 ¼ 1 R0 þG0 þB0 ¼ I0

ð2Þ

ð2B0 G0 R0 Þ=2 where u ¼ pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ , and X0 = min (R0, G0, B0). According 2 ðB0 G0 Þ þðB0 R0 ÞðG0 R0 Þ

to Eq. (2), H and S can be calculated as Hnew = H0, and Snew – S0, where H0 and S0 represent the hue and the saturation of the original

17

image, respectively, and Hnew and Snew denote the hue and the saturation of the fused image, respectively. The difference in S is:

DS ¼

I0 X 0 I0 X 0 d ¼ ðI0 X 0 Þ : Inew I0 Inew I0

ð3Þ

Given that X0 is the minimum, I0 X0 P 0d is not necessarily equal to zero; thus, DS is also not necessarily equal to zero. These results lead to the variation in saturation before and after fusion; consequently, they produce spectral distortion. According to the mechanism of color vision, hue is related to the different feelings evoked by light with various wavelengths, and color saturation indicates purity. Therefore, the hue and the saturation of color describe spectral information when they are taken together. Hence, in the GIHS framework, if DS = 0 and H remains unchanged, then the spectral distortion caused by the change in saturation can be reduced. Consequently, spectral information can be preserved more effectively. The proposed ERSM method is based on this concept. 3. The proposed method 3.1. Spectral preservation by scaling and shifting In the RGB color domain, let x = (R, G, B) be the gray value vector before processing to a certain pixel and x0 = (R0 , G0 , B0 ) be the gray value vector after processing. According to Eq. (2), solve the following equation when x and xh:

x0 ¼ ax þ b ¼ ðaR þ b; aG þ b; aB þ bÞ;

ð4Þ

The new intensity can then be calculated as:

I0 ¼ ðaR þ aG þ aB þ 3bÞ=3 ¼ aI þ b:

ð5Þ

Then, the intensity changes linearly.The new hue is:

ð2aB aG aRÞ=2 u0 ¼ qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ ¼ u: ðaB aGÞ2 þ ðaB aRÞðaG aRÞ

ð6Þ

Therefore, H0 = H, i.e., the hue will not change. The new saturation is:

S0 ¼

I0 X 00 aðI X 0 Þ ¼ –S; aI þ b I0

ð7Þ

where X 00 ¼ minðR0 ; G0 ; B0 Þ and X0 = min (R, G, B). Accordingly, we determine that saturation changes and their amplitudes depend on b. If b 0, then S0 is approximately equal to S, i.e., the saturation can be regarded as unchanged. Therefore, if the gray value vectors of a pixel before and after processing satisfy x0 = ax + b and b 0, then the hue and the saturation are retained, and only the intensity changes. b is the shifting coefﬁcient that affects edge restoration for the image fusion technique proposed in this study. a is the scaling coefﬁcient of pixel (i, j) in the image. Through this coefﬁcient, the spectral value (gray value) of each pixel can be modulated (i.e., SM). Thus, this method is called ERSM. As analyzed previously, ERSM ensures that the hue and the saturation of the spectral value of each pixel remain the same, thus maintaining the same color information and satisfying spectral preservation. In this study, a and b are calculated by extracting the details of the PAN and MS intensity images. If b = 0, then the hue and the saturation will be the same. Thus, the spectral information of the fusion image will be preserved more effectively. However, the original PAN and MS images vary in edge details. When only SM is performed, the differences in the gray values between adjacent pixels are probably reduced despite of an increase in intensity. Consequently, the gray values of the adjacent pixels tend to be close to one another, thus creating edge fuzziness in the fusion image to a certain extent. Hence, edge recovery is necessary. Moreover, the spectral information of the

18

X. Zhou et al. / ISPRS Journal of Photogrammetry and Remote Sensing 88 (2014) 16–27

image should not be substantially changed. If b 0, then the aforementioned two requirements can both be satisﬁed. This conclusion can be easily expanded to multiple bands, i.e., if the gray values before and after fusion in each pixel in the corresponding bands satisfy the following relationship, then the spectral information can be effectively preserved, as follows:

M 0k ði; jÞ ¼ aM k ði; jÞ þ b;

ð8Þ

where k represents the band number; (i, j) denotes the pixel coordinates; Mk(i, j) and M 0k ði; jÞ are the gray values of the pixel before and after processing, respectively; and a and b are the scaling and shifting coefﬁcients of the pixel, respectively. 3.2. Spectral Modulation (SM) The extension of the GIHS method to more bands can be expressed as:

Fusk ði; jÞ ¼ Mulk ði; jÞ þ d;

ð9Þ

where Mulk and Fusk denote the kth bands of the original and the fused MS images, respectively; (i, j) represents the pixel coordinates; and d is the same as that in Eq. (1). In the IHS transform, d = Inew I0 is the difference between the intensity of the original and that of the new images, i.e., the difference in spatial details. When spatial details are extracted from the PAN image, the space details of the MS image, which include certain spatial details, should be removed. The scaling coefﬁcient is calculated by extracting the details of the PAN and the MS intensity images. In the present study, a Gaussian convolution is used to determine the scaling coefﬁcient. The Gaussian function has a low-pass nature that is used to extract speciﬁc spatial details from the original image. As an ideal mathematical model, this function can simulate the human visual mechanism. By setting different scale parameters r, the image viewed by the human eyes from different distances can be simulated by using the equation:

Gðx; y; rÞ ¼

1 2pr

2

2 x þ y2 ; exp 2 2r

ð10Þ

where r is the scale parameter of the Gaussian function, and (x, y) indicates the coordinates of a certain point in the convolution template. After convoluting the image with the Gaussian function, the smoothed image L becomes the low-frequency information of the original image, i.e.:

L ¼ Gðx; y; rÞ f ;

ð11Þ

where ⁄ is the convolution operator, and f is the original image. By subtracting low-frequency information from the original image, only high-frequency details are retained. The spatial details of the PAN images and the MS intensity component are expressed as Pan G(x, y, r)⁄Pan and I G(x, y, r)⁄I, respectively. Pan represents the PAN image, and I denotes the intensity component. The differences in spatial details described by Pan G(x, y, r)⁄Pan (I G(x, y, r)⁄I) are the special spatial details extracted from the PAN image. The previous process implies eliminating the lowfrequency details of the original image and preserving those with high frequency. In this study, SM can be expressed as follows:

SM k ¼ Mulk þ j Mulk ¼ ð1 þ jÞ Mulk ¼ a Mulk ;

ð12Þ

where Mulk is the kth band of the original MS image, and a is a set of SM coefﬁcients. j is expressed as:

j ¼ ½Pan Gðx; y; rÞ Pan ðI Gðx; y; rÞ IÞ=Max ¼ ½ðPan IÞ Gðx; y; rÞ ðPan IÞ=Max;

ð13Þ

where Max(i, j) = max{Mulk(i, j)} is the maximum value of each band P at each pixel coordinate (i, j) of the MS image, I ¼ N1 Nk¼1 Mulk is the intensity of the original MS image, and N is the number of bands. According to Eq. (13), the scaling coefﬁcient in each pixel can be calculated, with a = 1 + j as the scaling coefﬁcient. The histogram of the PAN image and the intensity of the MS image do not need to match because the spatial details added to the MS image are the speciﬁc information of the PAN images. Thus, the spectral information of the MS image is preserved more effectively. However, this histogram match has been conducted in numerous existing methods, particularly IHS- or PCA-based methods. The Gaussian window r and the scale factor r directly inﬂuence the preserved high-frequency details, thereby affecting the spatial characteristics of the image fusion results. The Gaussian convolution G(x, y; r) is naturally a low-pass ﬁlter, whereas 1 G(x, y; r) is a high-pass ﬁlter. When window r is increased, more high-frequency details are preserved and the fusion image achieves higher spatial resolution. However, computational requirements also increase as window r increases because the convolution operation is time consuming. Consequently, less spectral information remains, and vice versa. Therefore, a moderate window is necessary to obtain satisfactory fusion results. In addition, the random variable that follows the Gaussian function with l and r as parameters has values that are almost entirely distributed within the range of [l 3r, l + 3r]. This ‘‘3r rule’’ states that in the Gaussian ﬁlter, if a pixel is near the center of the ﬁlter, then its weight will be great. Pixels with distances greater than 3r can be considered as negligible. Experiments show that when r = 3r, a balance between spectral resolution and spatial resolution can be achieved. Hence, all scale factors in the present study satisfy the relationship r = 3r. 3.3. Edge restoration (ER) The detail of the PAN image is used to restore edges by edge detection and extraction. Edge detection operators include the Roberts, Prewitt, and Canny operators, among others. However, these operators are either too complex (i.e., time-consuming) or the results (i.e., the edges) are too wide, thus failing to satisfy b 0. In this study, we use the Gaussian convolution kernel to convolve the PAN image. The convolution result is the edge character of the image. For example, the Gaussian convolution with a kernel size of 5 5 is shown as:

2

0:0000

0:0000

0:0002

0:0000

0:0000

3

7 6 6 0:0000 0:0113 0:0837 0:0113 0:0000 7 7 6 7 GC ¼ 6 6 0:0002 0:0837 0:6187 0:0837 0:0002 7: 7 6 4 0:0000 0:0113 0:0837 0:0113 0:0000 5 0:0000 0:0000 0:0002 0:0000 0:0000

ð14Þ

This equation is only one example of Eq. (10). Other instantiated examples that fulﬁll the requirement b 0 can also be applied in practice. Let the edge detail be D. Given the low-pass character of the Gaussian convolution kernel, the convolution result is subtracted from the original image to detect edges, i.e., D = Pan GC⁄Pan, where ⁄ indicates the convolution process. D(i, j) can be used directly as the shifting coefﬁcient in pixel (i, j). In this pixel, b = D(i, j). To prove the requirement b 0 in terms of an image, we select a WorldView-2 PAN image of Hangzhou area in China. The experimental data (with a spatial resolution of 0.5 m) were acquired on March 5, 2010. Fig. 1 shows (a) the original PAN image and (b) the edge details of D, that is, the set of shifting coefﬁcients b. The kernel size of the Gaussian convolution is 5 5, as shown in Eq. (14). To illustrate, 128 was added to the value of the edge details. The distribution of D is shown as follows:

X. Zhou et al. / ISPRS Journal of Photogrammetry and Remote Sensing 88 (2014) 16–27

19

Fig. 1. An image and its edge details.

The horizontal axis represents the interval of the distribution of the values, whereas the vertical axis represents the amount within a speciﬁc interval. Fig. 2 shows that the values in most pixels are near zero, thus indicating that the edge details extracted by the Gaussian convolution can satisfy b 0.

the following adjustments to prevent overﬂows, thereby ensuring that the new ERSM output value ranges from 0 to 255.

If aði; jÞ Mulk ði; jÞ þ bði; jÞ > 255 : Fusk ði; jÞ ¼ Fusk ði; jÞ 255= maxfERSMk ði; jÞg;

3.4. ERSM

If aði; jÞ Mulk ði; jÞ þ bði; jÞ < 0 :

Combined with edge restoration and spectral modulation methods, the ERSM method is proposed in this study, and is expressed as follows:

Fusk ði; jÞ ¼ 0

Fusk ¼ a Mulk þ b ¼ f1 þ ½ðPan IÞ GðrÞ ðPan IÞ=Maxg Mulk þ Pan GC Pan;

where G(r) represents the Gaussian function with a scale factor r and window size r = 3r, and GC is the kernel of Gaussian convolution. When calculating the ERSM output of each pixel, the value a(i, j) Mulk(i, j) + b(i, j) is possibly greater than 255 or less than 0. Hence, the pixel spectral modulation coefﬁcient must perform

x 10

4

5

4

3

2

1

0

-100

-50

0

50

Fig. 2. The distribution of edge detail values.

100

ð17Þ

Otherwise,

Fusk ði; jÞ ¼ aði; jÞ Mulk ði; jÞ þ bði; jÞ; ð15Þ

ð16Þ

ð18Þ

where (i, j) is the coordinate of the current pixel in the image plane;

a(i, j) and b(i, j) represent the scaling coefﬁcients and the shifting coefﬁcients of the current pixel prior to adjustment, respectively; and ERSMk(i, j) is the gray value of each band of the current pixel, which has been calculated by the scaling coefﬁcient and the shifting coefﬁcient prior to adjustment. These adjustments can be regarded as the overﬂow values multiplied by a factor 255/max {ERSMk(i, j)}. According to the analysis in Section 3, the saturation and the hue remain the same, thereby preserving spectral information. In addition, the number of pixels with gray values less than 0 is too small to affect the spectral information and gray value distribution of the entire image. For example, this number is 0.012% for all pixels in the ﬁrst experiment and 0.018% in the second experiment. Therefore, the aforementioned adjustments can still satisfy the requirements in Section 3. Although most of values in Fig. 2 are close to zero, we found useful to increase the spatial information of the fused result by means of the ER procedure. The quality indexes, including spatial frequency (SF), universal image quality index (UIQI) (Wang and Bovik, 2002), correlation coefﬁcients (CC) and Laplacian correlation coefﬁcients (LCC), can be used to compare the spatial and spectral quality of fusion results with and without ER procedure. Results for the WorldView-2 scene, better described in Section 4, are shown in Table 1. Please note that the spatial quality is improved after ER procedure, whereas the spectral quality is slightly decreased, since the spatial details affect the spectral information in the fusion procedure. The increasing spatial quality leads to decreasing spectral quality, and vice versa. Therefore, the ER procedure improves the spatial quality while has little inﬂuence on spectral quality.

20

X. Zhou et al. / ISPRS Journal of Photogrammetry and Remote Sensing 88 (2014) 16–27

Table 1 Spatial and spectral quality comparison of results with and without ER procedure. Spatial quality

Result without ER Result with ER

Spectral quality

SF

LCC

UIQI

CC

13.9305 16.1714

0.8020 0.8221

0.9509 0.9434

0.9478 0.9396

Notably, the spectral modulation factor is practically calculated through ﬂoating point operations, and the gray value is an integer

between 0 and 255. In IHS–RGB transformation, converting ﬂoat value into integer value results in precision loss. Thus, the hue and the saturation of the fusion image cannot be identical to those of the original MS image. In practice, however, such loss slightly inﬂuences spectral information. For example, in the ﬁrst experiment, we found that the mean and the standard deviation are 8.6726e6 and 3.3883e3, respectively, after these values were calculated for the difference in saturation values at each pixel in ERSM9 before and after transformation. This ﬁnding indicates that saturation values are almost the same before and after processing. Hence, precision loss can be practically ignored. However, with

Fig. 3. Source QuickBird images and the fused results using different methods: (a) PAN image, (b) resampled MS image, (c) GIHS, (d) Choi’s technique; (e) Tu’s technique, (f) LV, (g) GS, (h) ERSM3, and (i) ERSM9.

X. Zhou et al. / ISPRS Journal of Photogrammetry and Remote Sensing 88 (2014) 16–27

regard to the RGB color space, the gray value vector (R0 , G0 , B0 ) of each pixel in the fusion image changes compared with the former gray value vector (R, G, B) of the original MS image because of spectral modulation and edge restoration. During the quality evaluation of image fusion, the evaluation indicators based on the gray value vector reﬂect spectral distortion if the values at each pixel of the original MS image and the fused image are different. However, compared with other fusion techniques, the spectral distortion of the proposed method is signiﬁcantly smaller, as proven in the subsequent experiment.

21

Analogous to the GIHS, the proposed ERSM can be extended to an arbitrary number of bands. That is, the intensity of the MS image can be a function of all bands (R, G, B, and NIR) for four-band MS images such as IKONOS and QuickBird images. For example, the simplest function is:

I ¼ ðR þ G þ B þ NIRÞ=4;

ð19Þ

where NIR denotes the near-infrared band of the MS image. For images with more than four bands, the intensity can be similarly calculated. In general, the simplest intensity can be expressed as:

Fig. 4. Sub-images of the fused QuickBird images using different methods. (a) PAN image, (b) resampled MS image, (c) GIHS, (d) Choi’s technique, (e) Tu’s technique, (f) LV, (g) GS, (h) ERSM3, and (i) ERSM9.

22

X. Zhou et al. / ISPRS Journal of Photogrammetry and Remote Sensing 88 (2014) 16–27

Table 2 Comparison of the proposed algorithm with existing methods on the Quickbird images shown in Fig. 3.

Dk Ds QNR ERGAS SDI SAM SF

UIQI

I¼

R G B Ave R G B Ave

GIHS

Choi

Tu

LV

GS

ERSM3

ERSM9

0.0018 0.1359 0.8625 8.7065 0.0162 2.7309 20.2066 20.1070 19.8552 20.0563 0.8516 0.8020 0.8008 0.8181

0.0023 0.1149 0.8830 6.9676 0.0129 2.1702 16.8822 16.7339 16.4390 16.6850 0.8809 0.8472 0.8419 0.8567

0.0013 0.1357 0.8632 8.7071 0.0170 2.9374 20.2066 20.1068 19.8582 20.0572 0.8516 0.8019 0.8008 0.8181

0.0022 0.0733 0.9246 6.7120 0.0142 2.3889 20.1329 19.9410 19.7356 19.9365 0.8788 0.8653 0.8444 0.8628

0.0064 0.1473 0.8473 8.5805 0.0286 5.2157 21.4320 19.3452 17.8825 19.5532 0.8174 0.7916 0.8038 0.8043

0.0030 0.0075 0.9895 2.6336 0.0027 0.4943 18.8978 19.0344 18.6988 18.8770 0.9759 0.9677 0.9643 0.9693

0.0090 0.0410 0.9504 5.2820 0.0035 0.6391 25.3338 26.0265 25.6564 25.6722 0.9218 0.8991 0.8914 0.9041

N 1X Mulk ; N k¼1

ð20Þ

where N denotes the number of bands of the MS image, and Mulk is the kth band of the MS image. Notably, Eqs. (19) and (20) are the simplest expressions, which can be modiﬁed by using more speciﬁc factors such as spectral response. 4. Experimental results and comparisons To evaluate the fusion performance of the proposed methods, QuickBird, WorldView-2 PAN, and MS images are used. Before the fusion, the PAN and the MS images are co-registered, and the MS image is resampled to make its pixel size equal to that of the PAN image. This process aims to obtain perfectly superposable images. The resampling method used is bicubic convolution. Meanwhile, the fused results are compared with the existing IHS-based fusion methods, including the GIHS, the methods used by Choi (2006) and Tu et al. (2007), and the IHS that uses local variation (LV). Gram–Schmidt spectral sharpening (GS) is also commonly used because this approach is known to yield the best fusion results (Aiazzi et al., 2007). The GS algorithm is implemented using the latest ENVI software (version 5.0) that creates the low-resolution PAN according to the sensor type. This result typically exhibits higher accuracy compared with other methods because the spectral response function of a given sensor estimates the appearance of the low-resolution PAN. In the method employed by Choi, the parameter t is set to 5, whereas the parameter l is equal to 125 in the method used by Tu et al. In the following experiments, the expression ‘‘ERSM3’’ refers to ERSM with the window size parameter r = 3, and the corresponding scale parameter is denoted by r = r/3. These parameters can be set according to practical uses. 4.1. Quality assessment During image fusion quality assessment, the fused image should be compared with the ideal MS reference image, which exhibits the same spatial resolution as that of the PAN image (González-Audícana et al., 2005). However, such a reference image cannot be obtained. According to Wald’s protocol (Wald et al., 1997), the spatial resolution of the original PAN and MS images are commonly degraded by downsampling. The degraded PAN and MS images are then fused. The original MS image is then designated as the reference image, and the fused image is compared with the reference image. Numerous indices based on statistics are used to evaluate fusion performance on degraded fused results. These indices include correlation coefﬁcients (CC), erreur relative

globale adimensionnelle de synthese (ERGAS) (Wald, 2002), spectral angle mapper (SAM), universal image quality index (UIQI) (Wang and Bovik, 2002), and Q4. However, Alparone et al. indicated that evaluating fusion quality by degrading the original PAN and MS images is not suitable for high-resolution image fusion (Alparone et al., 2008). Therefore, a quality evaluation method in which the reference image is unnecessary was proposed. This method is known as quality with no reference (QNR). The interband spectral quality of the fused data, interpreted as the similarity relationship between bands, is assumed to be unchanged after fusion. The QNR index is based on UIQI and can measure the local relationship, luminance, and contrast between two images. Assuming that x denotes the reference and y denotes the test image, UIQI is deﬁned as:

Qðx; yÞ ¼

4rxy x y ; ðr2x þ r2y Þðx2 þ y2 Þ

ð21Þ

where rxy denotes the covariance between x and y; rx and ry represent the variances of x and y, respectively; and x and y are the means of x and y, respectively. When the Q index is calculated, a sliding window that measures N N moves all rows and columns of the image to increase differentiation capability and to measure the local distortion of a fused image. The ﬁnal assessment is obtained by averaging all Q values in sliding windows. Based on UIQI, QNR can measure spectral distortion Dk and spatial distortion Ds. Spectral quality is determined by the spectral distortion Dk between the original MS image and the fused MS image with the expression:

Dk ¼

sﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ XN XN 1 jQ ðMl ; M r Þ Q ðF l ; F r Þj2 ; l¼1 r¼1;r–l NðN 1Þ

ð22Þ

where l and r represent the band number, and N is the total number of bands. Ml denotes the lth band of the original MS image, and Fr denotes the rth band of the fused MS image. Q is the UIQI value, and p is the non-negative integer used to emphasize spectral differences. The spectral distortion index measures the relative relationship between the interband Q indices of the fused and the original images. When spectral distortion is absent, Dk is equal to zero. Spatial distortion Ds is calculated as follows:

Ds ¼

rﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ 1 XN jQ ðMl ; P d Þ Q ðF r ; PÞj2 ; l¼1 N

ð23Þ

where P denotes the original PAN image, Pd represents the degraded low-spatial-resolution PAN image, and q is the parameter used to emphasize spatial differences. If all spatial details are injected into the original MS image, then Ds is equal to zero. For the ﬁnal

X. Zhou et al. / ISPRS Journal of Photogrammetry and Remote Sensing 88 (2014) 16–27

23

Fig. 5. Source WorldView-2 images and the fused results using different methods: (a) PAN image, (b) resampled MS image, (c) GIHS (d) Choi’s technique, (e) Tu’s technique, (f) LV, (g) GS, (h) ERSM3, and (i) ERSM9.

evaluation, the QNR index is calculated by the nonlinear operation of Dk and Ds, as follows:

QNR ¼ ð1 Dk Þ ð1 Ds Þ:

ð24Þ

By using the aforementioned QNR index, we can evaluate the quality of image fusion in the ﬁne scale. As stated in Section 2, the spectral distortion of the IHS-based methods occurs as a result of the change in saturation. Thus, we introduce saturation deviation to measure the amount of

saturation changes and evaluate the spectral quality of fused images. The saturation deviation index (SDI) is deﬁned as follows:

SDI ¼

M X N 1 X jSM ði; jÞ SF ði; jÞj ; M N i¼1 j¼1 SM ði; jÞ

ð25Þ

where M and N represent the row and the column numbers of the image, and SM(i, j) and SF(i, j) denote the saturation value of the pixel in the (i, j) coordinate. The SDI is only applicable to three-band images. Thus, in this study, the SDI is only used to evaluate the

24

X. Zhou et al. / ISPRS Journal of Photogrammetry and Remote Sensing 88 (2014) 16–27

Fig. 6. Sub-images of the fused WorldView-2 images using different methods: (a) PAN image, (b) resampled MS image, (c) GIHS, (d) Choi’s technique, (e) Tu’s technique, (f) LV, (g) GS, (h) ERSM3, and (i) ERSM9.

fusion experiment of three-band QuickBird images and to measure how saturation changes. We also introduce spatial frequency (SF), which is based on the deﬁnition of an image, to evaluate the spatial quality of the fused image. The SF index measures the entire activity percentile of pixels in an image. When SF is higher, image quality is better. The SF index is expressed as:

SF ¼

qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ RF 2 þ CF 2 ;

ð26Þ

where

vﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ u M X N1 X u 1 2 RF ¼ t ðIði; j þ 1Þ Iði; jÞÞ ; MðN 1Þ i¼1 j¼1 vﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ u N M1 X X u 1 2 CF ¼ t ðIði þ 1; jÞ Iði; jÞÞ : ðM 1ÞN j¼1 i¼1

ð27Þ

ð28Þ

X. Zhou et al. / ISPRS Journal of Photogrammetry and Remote Sensing 88 (2014) 16–27

25

Fig. 7. Objective assessment results of the fused WorldView-2 images using different methods: (a) Dk and Ds, (b) QNR, (c) ERGAS, (d) SAM, (e) SF, and (f) UIQI.

4.2. Fusion results of QuickBird data The test images used for the ﬁrst experiment are QuickBird PAN and MS images acquired on August 29, 2002. These images contain a PAN image with a resolution of 0.61 m and an MS image with a resolution of 2.44 m. The true-color image that consists of red (R), green (G), and blue (B) is regarded as the input MS image. Both test images are 512 pixels 512 pixels. The MS image is ﬁrst co-registered exactly with the PAN image and then resampled to the same pixel size as the PAN image. Subsequently, the PAN and MS images are fused by using the selected methods. The source QuickBird PAN and MS images as well as the fusion results are shown in Fig. 3, and Fig. 4 presents the sub-images. The objective assessment results are given in Table 1. From a visual perspective, all fusion methods can provide fused MS images with high spatial resolution but the methods perform differently in the aspect of spectral distortion. In the original MS image, the color of the road is gray-white, and that of grasses and trees is dark green, as shown in Fig. 3(b). In Fig. 3(c)–(g), the

road appears dark, which is consistent with that in the PAN image. The bare ground is obviously darker than that in the original MS image, and the grasses are far from dark green. These observations are more obvious in the sub-images in Fig. 4. These phenomena indicate that the fusion results distinctly vary from the original MS image in terms of spectral information. However, the results of the proposed method show that the colors of the road, grasses, trees, and bare ground are almost the same as those of the original MS image, with the ERSM3 results being slightly dim. The same conclusion can be drawn from the results of the objective assessment presented in Table 2, in which the values in boldface type denote the best values in the speciﬁc index. The ‘‘Ave’’ of SF and UIQI denotes the average value of R, G, and B. The Dk obtained using the method of Tu et al. and the GIHS are the lowest and the second lowest, respectively. Meanwhile, the Ds obtained using ERSM3 and ERSM9 are the lowest and the second lowest, respectively. When both aspects are considered, the QNRs of ERSM3 and ERSM9 are the highest and the second highest, and only vary slightly. The SDI and ERGAS of ERSM3 and ERSM9

26

X. Zhou et al. / ISPRS Journal of Photogrammetry and Remote Sensing 88 (2014) 16–27

Fig. 8. The inﬂuence of scale parameters on the QuickBird fusion results: (a) QNR, (b) ERGAS, (c) SDI, (d) SAM, (e) SF, and (f) UIQI.

are the lowest and the second lowest, whereas the values obtained using the other methods are signiﬁcantly bigger, thus indicating that the spectral distortion of these two methods are lower than those of the other approaches. The SF obtained by using ERSM9 is signiﬁcantly larger than the results of the other techniques. Therefore, the proposed method exhibits apparent advantages in fusion performance both visually and objectively. 4.3. Fusion results of WorldView-2 data The third experiment is performed on WorldView-2 images from Hangzhou, China. The dataset, which was acquired on March 5, 2010, contains a PAN image with a resolution of 0.5 m and an MS image with a resolution of 2.0 m. The original MS image has eight MS bands, i.e., coastal, blue, green, yellow, red, red edge, NIR1, and

NIR2. NIR2, blue, and coastal bands are selected to compose a three-band pseudo-color image, whereas all eight bands are employed in image fusion. All test PAN and MS images are 512 pixels 512 pixels. The source images and the pseudo-color fused images produced using all fusion techniques are presented in Fig. 5, and the sub-images of the fused images are shown in Fig. 6. The resampled MS image shown in Fig. 5(b) shows numerous green trees in the MS image presented in red because of the NIR2 band. The PAN image in Fig. 5(a) clearly reveals the texture of the trees. Hence, a good fused image should maintain the color as well as the texture of the trees. The colors of all fused images are relatively the same, with the result obtained from ERSM9 being the sharpest (Fig. 5(i)). The results obtained using the other methods appear dim to a certain extent. The white wall of the building is presented in gray in the GS result (Fig. 5(g)).

X. Zhou et al. / ISPRS Journal of Photogrammetry and Remote Sensing 88 (2014) 16–27

The results of the objective evaluation are presented in Fig. 7. Only the average values of SF and UIQI are shown because of the limited number of pages for this article. The Dk and Ds obtained by using our proposed method are smaller than the results obtained by using the other methods, thus leading to higher QNR values. The SAM values resulting from our proposed method slightly vary, with the GS method exhibiting the highest SAM. In terms of the UIQI index, our proposed method shows the best values, whereas those obtained using the method of Tu et al. and the GIHS have the smallest values. Therefore, our proposed method performs more effectively not only in visual assessment but also in objective indices. 4.4. Inﬂuence of scale parameter As mentioned in Section 3, the scale r is the only parameter in the proposed method, and the scale is related to window size r of the Gaussian convolution with the relationship r = r/3. In the experiment on the QuickBird dataset, we increase the window size from 3 to 21. Then, the fused images with different window sizes are evaluated using the QNR, ERGAS, SDI, SAM, SF, and UIQI indices. The evaluation results are shown in Fig. 8. The QNR and UIQI values obviously decrease with increasing window size, whereas the other indices exhibit an ascending trend. These trends are observed because as window size increases, more spatial information (i.e., high-frequency information such as edges) is injected into the MS image, thereby decreasing spectral qualities. These behaviors are demonstrated by the decrease in UIQI and QNR as well as the increase in SDI, ERGAS, and SAM. Meanwhile, spatial information is enhanced, as observed from the increase in SF. Hence, a balance in spatial and spectral information can be achieved by choosing a suitable scale parameter. However, this objective remains difﬁcult to realize and will be further examined in our future work. By testing our proposed method on various images, we determine that the scale parameter is associated with the spatial differences between PAN and MS images. In addition, setting the window size to 5, 7, or 9 empirically achieves a balance between spatial and spectral information in most cases. Meanwhile, the process in the aforementioned cases does not require large-scale calculations. 5. Conclusion A new GIHS-based spectral preservation fusion method for remote sensing images is proposed. This technique, which consists of spectral modulation and edge restoration procedures, can effectively reduce spectral distortion of the GIHS caused by changes in saturation. The proposed method is evaluated subjectively and objectively, and is proven effective for the fusion of QuickBird and WorldView-2 PAN images with MS images. We propose the SDI to measure the amount of saturation change while implementing the GIHS-based technique. The SDI is combined with the QNR index and SF to assess fusion quality, and to compare the results of the proposed technique with those of other applied GIHS-based approaches. The experimental results indicate that the proposed method provides a clearer and higher spectral preservation fused image. In addition, it performs more effectively in visual and objective indices, particularly in the aspects of color and texture. This ﬁnding reveals that the proposed method is more applicable in situations with higher requirements for spectral preservation, such as classiﬁcation. In addition, the histogram of the PAN image does not

27

need to be speciﬁed according to the intensity of the MS images, and thus, spectral information is preserved more effectively. Meanwhile, by choosing a proper scale parameter, balance in spatial and spectral quality is achieved by using the proposed method. As a result, ﬁnely fused MS images more suitable for human vision and practical applications are obtained. Acknowledgments This work is jointly supported by the International Science & Technology Cooperation Program of China (No. 2010DFA9272024); National Natural Science Foundation program (No. 41301403); Chongqing Basic and Advanced Research General Project (No. cstc2013jcyjA40010); Chongqing Major Programs for Science and Technology Development (No. CSTC2011GGC40008). References Aiazzi, B., Baronti, S., Selva, M., 2007. Improving component substitution pansharpening through multivariate regression of MS+Pan data. IEEE Trans. Geosci. Remote Sens. 45 (10), 3230–3239. Alparone, L., Aiazzi, B., Baronti, S., Garzelli, A., Nencini, F., Selva, M., 2008. Multispectral and panchromatic data fusion assessment without reference. Photogramm. Eng. Remote Sens. 74 (2), 193–200. Chen, S.H., Zhang, R.H., Su, H.B., 2009. Scaling-up transformation of multisensor images with multiple resolutions. Sensors 9, 1370–1381. Chibani, Y., Houacine, A., 2002. The joint use of IHS transform and redundant wavelet decomposition for fusing multispectral and panchromatic images. Int. J. Rem. Sens. 23 (18), 3821–3833. Choi, M., 2006. A new intensity–hue–saturation fusion approach to image fusion with a tradeoff parameter. IEEE Trans. Geosci. Rem. Sens. 44 (6), 1672–1682. Chu, H., Zhu, W.L., 2008. Fusion of IKONOS satellite imagery using IHS transform and local variation. IEEE Geosci. Rem. Sens. Lett. 5 (4), 653–657. El-Khamy, S.E., Hadhoud, M.M., Dessouky, M.I., Salam, B.M., Abd El-Samie, F.E., 2005. Regularized super-resolution reconstruction of images using wavelet fusion. Opt. Eng. 44 (9), 097001. González-Audícana, M., Otazu, X., Fors, O., Seco, A., 2005. Comparison between Mallat’s and the ‘a trous’ discrete wavelet transform based algorithms for the fusion of multispectral and panchromatic images. Int. J. Rem. Sens. 26 (3), 595– 614. Lanir, J., Maltz, M., Rotman, S.R., 2007. Comparing multispectral image fusion methods for a target detection task. Opt. Eng. 46 (6), 066402. Laporterie-Déjean, F., De Boissezon, H., Flouzat, G., Lefèvre Fonollosa, M., 2005. Thematic and statistical evaluations of ﬁve panchromatic/multispectral fusion methods on simulated PLEIADES-HR images. Inform. Fus. 6 (3), 193–212. Park, J.H., Kang, M.G., 2007. Multispectral iris authentication system against counterfeit attack using gradient-based image fusion. Opt. Eng. 46 (11), 117003. Ranchin, T., Wald, L., 2000. Fusion of high spatial and spectral resolution images: the ARSIS concept and its implementation. Photogramm. Eng. Rem. Sens. 66, 49–61. Shah, V.P., Younan, N.H., King, R.L., 2008. An efﬁcient Pan-sharpening method via a combined adaptive PCA approach and contourlets. IEEE Trans. Geosci Rem. Sens. 46 (5), 1323–1335. Thomas, C., Ranchin, T., Wald, L., Chanussot, J., 2008. Synthesis of multispectral images to the high spatial resolution: a critical review of fusion methods based on remote sensing physics. IEEE Trans. Geosci. Rem. Sens. 46 (5), 1301–1312. Tu, T.M., Su, S.C., 2001. A new look at IHS-like image fusion methods. Inform. Fus. 2, 177–186. Tu, T.M., Huang, P.S., Hung, C.L., Chang, C.P., 2004. A fast intensity–hue–saturation fusion technique with spectral adjustment for IKONOS imagery. IEEE Geosci. Rem. Sens. Lett. 1 (4), 309–312. Tu, T.M., Cheng, W.C., Chang, C.P., Huang, P.S., Chang, J.C., 2007. Best tradeoff for high-resolution image fusion to preserve spatial details and minimize color distortion. IEEE Geosci. Rem. Sens. Lett. 4 (2), 302–306. Wald, L., 2002. Data Fusion: Deﬁnitions and Architectures—Fusion of Images of Different Spatial Resolutions. ENSMP, Paris, France. Wald, L., Ranchin, T., Mangolini, M., 1997. Fusion of satellite images of different spatial resolutions: assessing the quality of resulting images. Photogramm. Eng. Rem. Sens. 63 (6), 691–699. Wang, Z., Bovik, A.C., 2002. A universal image quality index. IEEE Sign. Process. Lett. 9 (3), 81–84. Zhang, Y., Hong, G., 2005. An IHS and wavelet integrated approach to improve pansharpening visual quality of natural colour IKONOS and QuickBird images. Inform. Fus. 6, 225–234.