Feature-domain super-resolution for iris recognition


Computer Vision and Image Understanding 117 (2013) 1526–1535


Computer Vision and Image Understanding journal homepage: www.elsevier.com/locate/cviu

Kien Nguyen (corresponding author), Clinton Fookes, Sridha Sridharan, Simon Denman

Image and Video Research Lab, SAIVT, Queensland University of Technology, 2 George Street, Brisbane, QLD 4001, Australia

Article info

Article history: Received 9 July 2012; accepted 30 June 2013; available online 17 July 2013.

Keywords: Super-resolution; Feature-domain super-resolution; Iris recognition; Iris recognition at a distance

Abstract

Uncooperative iris identification systems at a distance suffer from poor resolution of the acquired iris images, which significantly degrades iris recognition performance. Super-resolution techniques have been employed to enhance the resolution of iris images and improve the recognition performance. However, most existing super-resolution approaches proposed for the iris biometric super-resolve pixel intensity values, rather than the actual features used for recognition. This paper thoroughly investigates transferring super-resolution of iris images from the intensity domain to the feature domain. By directly super-resolving only the features essential for recognition, and by incorporating domain specific information from iris models, improved recognition performance compared to pixel domain super-resolution can be achieved. A framework for applying super-resolution to nonlinear features in the feature-domain is proposed. Based on this framework, a novel feature-domain super-resolution approach for the iris biometric employing 2D Gabor phase-quadrant features is proposed. The approach is shown to outperform its pixel domain counterpart, as well as other feature domain super-resolution approaches and fusion techniques.

© 2013 Elsevier Inc. All rights reserved.

This paper has been recommended for acceptance by Rudolf M. Bolle.

Corresponding author: K. Nguyen. Fax: +61 731381516.

E-mail addresses: kien.nguyenth[email protected] (K. Nguyen), [email protected] qut.edu.au (C. Fookes), [email protected] (S. Sridharan), [email protected] (S. Denman).

1077-3142/$ - see front matter © 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.cviu.2013.06.010

1. Introduction

Biometric systems enable the automatic identification of individuals based on their physiological and behavioural characteristics such as face, fingerprint, palmprint, gait, iris, retina, and voice. Among the biometrics, the iris has been shown to be one of the most accurate traits for human identification due to its stability and the high degrees of freedom in its texture [1,2]. Most existing iris recognition systems require users to present their iris to a camera at close distance (less than 0.6 m), to ensure images of sufficient quality are captured. The research community is interested in enabling iris recognition to be conducted in less constrained environments, such as when the subject is moving and at a distance. The two most significant problems when performing iris recognition at a distance are pixel resolution (i.e. the number of pixels in the iris region) and quality variation of the captured images [3].

Super-resolution techniques have previously been employed to address the low resolution problems of imaging systems [4]. There are two classes of super-resolution approach: reconstruction-based and learning-based. Reconstruction-based approaches such as [5–12] fuse the sub-pixel shifts among multiple low resolution images to obtain an improved resolution image. Alternatively, learning-based approaches model high-resolution training images and learn prior knowledge to constrain the super-resolution process [4].

Recently, super-resolution techniques have been applied to biometric systems. A number of super-resolution techniques have been successfully developed for face [13–17] and iris [18–21]. Kwang et al. [19] propose a learning-based super-resolution method using multiple MLPs (multi-layer perceptrons). The middle and high frequency components of a low resolution iris image are restored from the trained neural network architecture. Huang et al. [20] propose another learning-based method utilising a CSF (Circular Symmetric Filter). Their algorithm predicts the prior relation between iris feature information of different bands and incorporates this into the process of iris image enhancement. From a reconstruction perspective, Fahmy [18] proposed a reconstruction-based super-resolution technique to restore multiple low-resolution iris frames captured at a distance of 3 feet. Nguyen Thanh et al. [21] proposed to incorporate quality metrics into the super-resolution process to improve performance by assigning better quality frames a higher weight when fusing multiple low resolution images. They employ an exponential fusion scheme to estimate the high-resolution image from the low-resolution image sequence.

One main concern raised by both Gunturk et al. [22] and Nguyen et al. [23] is how to apply super-resolution for a specific biometric modality effectively to improve recognition performance, rather than visual clarity. Two issues have been raised:


• The aim of applying super-resolution to biometrics is not visual enhancement, but improved recognition performance. Most existing super-resolution approaches are designed to produce visual enhancement. If recognition improvement is desired, why do we not focus on super-resolving only items essential for recognition?

• Each biometric modality has its own characteristics. Most existing super-resolution approaches for biometrics are general-scene super-resolution approaches. Can any specific information from biometric models be exploited to improve super-resolution performance?

Based on these concerns, feature-domain super-resolution techniques have been proposed for face [22,24] and iris [23] to improve recognition performance. These approaches no longer super-resolve images in the pixel-domain, but super-resolve the extracted features that are used for classification in the feature-domain, and the super-resolution output (a super-resolved feature vector) is directly employed for recognition. Different linear features including Principal Component Analysis (PCA) [22,23] and Tensor Face [24] have been investigated to improve the respective biometric modalities. These features are super-resolved to increase the resolution using a maximum a posteriori estimation approach. Specific knowledge of face and iris models is incorporated in the form of prior probabilities to constrain the super-resolution process, to make it robust to noise and segmentation errors. These approaches have been shown to outperform pixel-domain super-resolution approaches for face and iris recognition. However, for the specific case of the iris biometric, the 2D Gabor wavelet has been shown to be one of the most effective encoding techniques since it achieves the best trade-off in both spatial and spectral resolution [25,26].
The major challenge that prevents feature-domain super-resolution from being successfully applied to the 2D Gabor phase-quadrant encoding technique is the non-linear nature of the encoding technique. The existing feature domain super-resolution frameworks of [22–24] are unable to super-resolve non-linear features, and are thus not suitable for use with the 2D Gabor wavelet features. To further improve the performance of feature-based super-resolution for iris recognition, we seek to provide a new feature-domain super-resolution framework to overcome the non-linearity of 2D Gabor wavelet features. Using the proposed framework, a novel feature-domain super-resolution approach using 2D Gabor wavelets and the iris biometric is proposed. We show that the proposed approach outperforms the unenhanced features, the pixel domain super-resolution equivalent, as well as other existing feature domain super-resolution and fusion techniques. It should be noted that the proposed framework can also be applied to other non-linear features and other biometrics. The remainder of this paper is organised as follows: a framework for applying feature-domain super-resolution with nonlinear features is investigated in Section 2; Section 3 describes the proposed feature-domain super-resolution approach for iris images; Section 4 describes the approach to estimate initial parameters of the estimation; Section 5 presents the experimental results; and the paper is concluded in Section 6.

2. Non-linear features for feature-domain super-resolution

Linear encoding techniques such as PCA and LDA have been investigated for feature-domain super-resolution for iris [23] and face [22]. A PCA encoding technique seeks to project an image onto dimensions which best reconstruct the original data. Alternatively, an LDA encoding technique projects an image onto dimensions which maximise the separation of classes. Both methods capitalise on the global trend in the training data to represent each iris image as a linear superposition of fundamental functions (eigenface/eigeniris for PCA and fisherface/fisheriris for LDA). These techniques have been employed to conduct pioneering experiments on iris feature-domain super-resolution [23], and face feature-domain super-resolution [22]. The benefit of using PCA and LDA features is their linear superposition property, which simplifies the estimation of the high resolution features. However, these global features lose spatial information since they decompose the 2D image into a 1D vector [27].

For the iris biometric, the phase-quadrant encoding technique based on 2D Gabor wavelets has been shown to extract the most discriminant features of an iris [2,28]. The advantages of this encoding technique are rapid matching, a binomial impostor distribution and a predictable false acceptance rate [26]. Furthermore, 2D Gabor wavelets have also been shown to be highly effective for face recognition tasks [29]. Hence, employing these coarse phase features rather than PCA or LDA within feature-domain super-resolution has the potential to further improve recognition performance, advancing the state-of-the-art biometric systems.

Despite these advantages, the non-linear nature of the encoding technique has prevented it from being investigated for feature-domain super-resolution. The non-linearity (due to the use of phase-quadrant encoding) makes the estimation of a high resolution feature from multiple low resolution features very challenging [23], as it makes the relationship between the multiple low resolution features and the high resolution feature very difficult to establish. In the remainder of this section, we analyse this encoding technique to discuss how feature-domain super-resolution can be applied using the non-linear 2D Gabor features. The conventional iris encoding technique [2] is illustrated in Fig. 1.
Prior to feature extraction, an eye image is segmented to locate and extract the iris region. Two circles are employed to approximate the pupillary and limbus boundaries of the iris region. This region is normalised to a fixed size so that it can be used for comparison. The normalisation process uses a rubber-sheet model to transform the iris texture from Cartesian to polar coordinates. The remapping of the iris image $I(x, y)$ from raw Cartesian coordinates $(x, y)$ to the dimensionless polar coordinates $(r, \theta)$ can be represented as,

$$I(x(r,\theta),\, y(r,\theta)) \rightarrow I(r,\theta), \tag{1}$$

where $r$ is on the unit interval $[0, 1]$, $\theta$ is an angle in the range $[0, 2\pi]$, and $x(r,\theta)$ and $y(r,\theta)$ are defined as a linear combination of both the set of pupillary boundary points $(x_p(\theta), y_p(\theta))$ and the set of limbus boundary points $(x_s(\theta), y_s(\theta))$, such that,

$$x(r,\theta) = (1 - r)\,x_p(\theta) + r\,x_s(\theta),$$
$$y(r,\theta) = (1 - r)\,y_p(\theta) + r\,y_s(\theta). \tag{2}$$
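The remapping of Eqs. 1 and 2 can be illustrated with a minimal sketch. It assumes perfectly circular boundaries given as hypothetical `(cx, cy, radius)` tuples and nearest-neighbour sampling; the function name, grid sizes and sampling scheme are illustrative choices, not details taken from the paper.

```python
import numpy as np

def rubber_sheet(image, pupil, limbus, n_r=8, n_theta=16):
    """Unwrap the annulus between two circles into an (n_r, n_theta) rectangle.

    pupil and limbus are (cx, cy, radius) tuples. Circular boundaries and
    nearest-neighbour sampling are simplifying assumptions of this sketch.
    """
    r = np.linspace(0.0, 1.0, n_r)[:, None]            # radial coordinate in [0, 1]
    theta = np.linspace(0.0, 2 * np.pi, n_theta, endpoint=False)[None, :]
    # Boundary points (x_p, y_p) and (x_s, y_s) for every angle.
    xp = pupil[0] + pupil[2] * np.cos(theta)
    yp = pupil[1] + pupil[2] * np.sin(theta)
    xs = limbus[0] + limbus[2] * np.cos(theta)
    ys = limbus[1] + limbus[2] * np.sin(theta)
    # Eq. 2: linear interpolation between the two boundaries.
    x = (1 - r) * xp + r * xs
    y = (1 - r) * yp + r * ys
    # Nearest-neighbour lookup, clipped to the image bounds.
    cols = np.clip(np.rint(x).astype(int), 0, image.shape[1] - 1)
    rows = np.clip(np.rint(y).astype(int), 0, image.shape[0] - 1)
    return image[rows, cols]
```

Each row of the output corresponds to one value of r and each column to one angle, so the first row samples the pupillary boundary and the last row samples the limbus boundary.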

Each normalised iris is then demodulated to extract the phase information using quadrature 2D Gabor wavelets,

$$h_{\{Re,Im\}} = \operatorname{sign}_{\{Re,Im\}} \int_{\rho}\int_{\phi} I(\rho,\phi)\, e^{-i\omega(\theta_0-\phi)}\, e^{-\left((r_0-\rho)^2/\alpha^2 + (\theta_0-\phi)^2/\beta^2\right)}\, \rho\, d\rho\, d\phi, \tag{3}$$

where $h_{\{Re,Im\}}$ can be regarded as a complex-valued component whose real and imaginary parts are either 1 or 0 depending on the sign of the 2D integral; $I(\rho,\phi)$ is the normalised iris image; $\alpha$ and $\beta$ are the multi-scale 2D wavelet size parameters; and $(r_0, \theta_0)$ represents the two dimensions of the normalised iris image. Only phase information is used for recognition because amplitude information is not discriminant, and depends on extraneous factors such as imaging contrast, illumination and camera gain [2]. Altogether, 2048 phase bits establish the IrisCode.
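The quantisation step of Eq. 3 can be sketched as follows, assuming the complex-valued Gabor responses have already been computed. The function name is hypothetical, and treating a response of exactly zero as positive is an assumption of this sketch.

```python
import numpy as np

def phase_quadrant_encode(g):
    """Quantise each complex response into two bits: sign of Re, sign of Im."""
    g = np.asarray(g, dtype=complex)
    # A bit is 1 when the corresponding part is non-negative, 0 otherwise.
    bits = np.stack([g.real >= 0, g.imag >= 0], axis=-1)
    return bits.astype(np.uint8)
```

Because only the signs are kept, multiplying all responses by a positive constant leaves the bits unchanged, which reflects the insensitivity to amplitude described above.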


Fig. 1. A conventional iris encoding procedure [2]. The iris region in one iris image (a) is approximated by the region between two circles (b). The eyelid occlusion regions are excluded by a mask. The iris region is normalised to a fixed-size rectangle (c). This normalised rectangle is then encoded using the phase-quadrant 2D Gabor wavelet encoding technique to create an IrisCode (d). The IrisCode is the representation of an iris.

The similarity between two IrisCodes is measured by a bitwise Hamming distance given by,

$$HD = \frac{\sum_{i=1}^{2048} \left( (A_i \otimes B_i) \cap (A_i^M \cap B_i^M) \right)}{\sum_{i=1}^{2048} (A_i^M \cap B_i^M)}, \tag{4}$$

where $A^M$ and $B^M$ are respectively the masks of IrisCodes $A$ and $B$, and $\cap$ and $\otimes$ denote the bitwise AND and XOR operators respectively. The masks are employed to exclude the corrupted regions from eyelashes, reflections, eyelids and low signal-to-noise ratio [2].

To explore the non-linearity of the conventional phase-quadrant 2D Gabor wavelet, we present the encoding technique from a different point of view. Notice that the encoding process in Eq. 3 is equivalent to 2 sub-steps:

1. Calculate the complex-valued 2D Gabor features.
2. Encode the complex features using the quadrant encoding technique.

The non-linearity of the conventional phase-quadrant 2D Gabor wavelet encoding technique comes from the second sub-step. The complex-valued 2D Gabor features obtained from step 1 are not themselves suitable for recognition as they are affected considerably by changes in illumination [2], thus the quadrant encoding of step 2 is required to obtain features suitable for matching. The complex-valued 2D Gabor features are, however, linear. Instead of performing feature-domain SR on the final non-linear phase-quadrant features, we propose performing feature-domain SR on the intermediate complex-valued 2D Gabor features and then encoding the complex-valued super-resolved output with the non-linear phase-quadrant encoding technique. This framework can take advantage of both the linear property of complex-valued 2D Gabor features and the highly discriminant property of phase-quadrant 2D Gabor wavelet encoding. The approach is illustrated in Fig. 2.

Note that this framework can be easily extended to most non-linear encoding techniques for iris including the conventional approach proposed by Daugman [2,28], and other recent state-of-the-art approaches such as [26,30,31], since all of them calculate the linear complex-valued features first before encoding them with non-linear techniques. The proposed framework can also be extended to other biometric modalities such as face.
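As a concrete reading of the masked Hamming distance of Eq. 4, the following sketch counts disagreeing bits that both masks mark as valid and normalises by the number of jointly valid bits. The normalisation by jointly valid bits follows the usual definition of the masked Hamming distance and is stated here as an assumption; the function name and the convention that a mask bit of 1 means "valid" are also illustrative.

```python
import numpy as np

def masked_hamming_distance(code_a, code_b, mask_a, mask_b):
    """Fraction of jointly valid bits on which two binary codes disagree."""
    code_a, code_b = np.asarray(code_a, bool), np.asarray(code_b, bool)
    # A bit is compared only if both masks mark it as valid (assumed: 1 = valid).
    valid = np.asarray(mask_a, bool) & np.asarray(mask_b, bool)
    n_valid = int(valid.sum())
    if n_valid == 0:
        raise ValueError("no jointly valid bits to compare")
    disagreements = np.logical_xor(code_a, code_b) & valid
    return disagreements.sum() / n_valid
```

For example, codes `[1, 0, 1, 1]` and `[1, 1, 0, 1]` with the last bit masked out in one of them disagree on two of the three valid bits.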
The next section introduces a novel feature-domain super-resolution approach for the iris biometric based on the proposed framework.

3. Feature-domain super-resolution for iris recognition

Our goal is to estimate the 2D Gabor features of the HR images, which we call the high-resolution (HR) features, from the observed low-resolution (LR) features. To solve this estimation, the relationship between the HR features and the respective observed LR features of the same iris needs to be established. This relation can be estimated from the spatial domain relation between the HR images and the respective LR images in the conventional super-resolution problem. Since the problem can be regarded as estimating an unobserved quantity on the basis of empirical data, a Bayesian estimator called maximum a posteriori can be employed to find the solution [32]. When solving the estimation, specific information relating to iris models can be exploited to improve the accuracy of the estimation. The proposed five-stage algorithm is illustrated in Fig. 3. Details of each stage are covered in the remainder of this section.

Stage 1: Observation model in the spatial domain. Let $x$ be the original HR iris image, and $y^{(i)}$ be the $i$th observed LR iris image after being degraded by downsampling, $D^{(i)}$; blurring, $B^{(i)}$; and warping, $W^{(i)}$. The relation between $x$ and $y^{(i)}$ is described as follows [4],

$$y^{(i)} = H^{(i)} x + n^{(i)} = D^{(i)} B^{(i)} W^{(i)} x + n^{(i)}, \tag{5}$$

where $n^{(i)}$ is the observation noise. The blur matrix, $B^{(i)}$, models both the optical blur and motion blur, as outlined in [4].

Stage 2: Observation model in the feature domain. We seek to transform the observation model from the spatial domain to the feature domain. The phase-quadrant 2D Gabor features (IrisCodes) of HR irises, $h$, and LR irises, $h^{(i)}$, are represented as follows,

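$$h_{\{Re,Im\}} = \operatorname{sign}_{\{Re,Im\}}(g), \tag{6}$$

$$h^{(i)}_{\{Re,Im\}} = \operatorname{sign}_{\{Re,Im\}}(g^{(i)}), \tag{7}$$

where $g$ and $g^{(i)}$ are the complex-valued vectors representing 2D Gabor features of HR irises and LR irises respectively given by,

$$g = \int_{\rho}\int_{\phi} x\, e^{-\left((r_0-\rho)^2/\alpha^2 + (\theta_0-\phi)^2/\beta^2\right)}\, e^{-i\omega(\theta_0-\phi)}\, \rho\, d\rho\, d\phi, \tag{8}$$

$$g^{(i)} = \int_{\rho}\int_{\phi} y^{(i)}\, e^{-\left((r_0-\rho)^2/\alpha^2 + (\theta_0-\phi)^2/\beta^2\right)}\, e^{-i\omega(\theta_0-\phi)}\, \rho\, d\rho\, d\phi. \tag{9}$$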
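To make the spatial observation model of Eq. 5 concrete, the sketch below simulates one LR frame from an HR image. The specific operators are assumptions chosen for brevity (a global integer circular shift for the warp, a box filter for the blur, and decimation for the downsampling) and are not the operators estimated in the paper.

```python
import numpy as np

def observe(x, shift=(0, 0), blur_size=3, factor=2, noise_std=0.0, rng=None):
    """Simulate y = D B W x + n for one LR frame with simple stand-in operators."""
    rng = np.random.default_rng() if rng is None else rng
    # W: warp, here a global integer circular shift (assumption).
    warped = np.roll(x, shift, axis=(0, 1))
    # B: blur, here a box filter built from circular shifts (assumption).
    half = blur_size // 2
    blurred = np.zeros_like(warped, dtype=float)
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            blurred += np.roll(warped, (dy, dx), axis=(0, 1))
    blurred /= (2 * half + 1) ** 2
    # D: downsampling, here keeping every `factor`-th pixel (assumption).
    low = blurred[::factor, ::factor]
    # n: additive Gaussian observation noise.
    return low + rng.normal(0.0, noise_std, size=low.shape)
```

With `noise_std=0.0` the mapping from `x` to the output is linear, which is the property the feature-domain derivation that follows relies on.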

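Substituting the spatial observation model of Eq. 5 into the LR feature representation of Eq. 9, we have,

$$\begin{aligned}
g^{(i)} &= \int_{\rho}\int_{\phi} \left(D^{(i)} B^{(i)} W^{(i)} x + n^{(i)}\right) e^{-\left((r_0-\rho)^2/\alpha^2 + (\theta_0-\phi)^2/\beta^2\right)}\, e^{-i\omega(\theta_0-\phi)}\, \rho\, d\rho\, d\phi \\
&= \int_{\rho}\int_{\phi} D^{(i)} B^{(i)} W^{(i)} x\, e^{-\left((r_0-\rho)^2/\alpha^2 + (\theta_0-\phi)^2/\beta^2\right)}\, e^{-i\omega(\theta_0-\phi)}\, \rho\, d\rho\, d\phi \\
&\quad + \int_{\rho}\int_{\phi} n^{(i)}\, e^{-\left((r_0-\rho)^2/\alpha^2 + (\theta_0-\phi)^2/\beta^2\right)}\, e^{-i\omega(\theta_0-\phi)}\, \rho\, d\rho\, d\phi \\
&= G_1 + G_2.
\end{aligned} \tag{10}$$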

Fig. 2. Feature-domain super-resolution framework for non-linear iris features. By breaking the encoding process into two sub-steps, complex-valued 2D Gabor feature extraction and phase-quadrant encoding, we can overcome the non-linear nature of the phase-quadrant 2D Gabor encoding technique. We propose a framework that performs feature-domain SR on the complex-valued features first, then applies the non-linear phase-quadrant encoding sub-step to the output of the super-resolution process. The proposed framework takes advantage of both the linear property of complex-valued 2D Gabor features and the high discriminant property of 2D Gabor wavelet encoding.

Fig. 3. The proposed feature-domain super-resolution approach for iris recognition. The relationship between the original high resolution image and the observed low resolution images is modelled, then transferred to the feature domain. The high resolution features can be estimated from the observed low resolution features using Maximum A Posteriori (MAP) estimation. With the addition of iris specific model information, the estimation can be done using iterative conjugate gradients.

We make the following assumptions:

1. For each iris image, the blurring and warping factors, which degrade the quality of the iris image, change across the image. That is, the blurring and warping level varies according to the location of the pixel in the image. In this case, $B^{(i)}$ and $W^{(i)}$ are functions of $\rho$ and $\phi$. However, we can make an approximation and assume that $B^{(i)}$ and $W^{(i)}$ are uniform over the normalised iris image. With this assumption, the first component of Eq. 10 can be represented as,

$$\begin{aligned}
G_1 &= \int_{\rho}\int_{\phi} D^{(i)} B^{(i)} W^{(i)} x\, e^{-\left((r_0-\rho)^2/\alpha^2 + (\theta_0-\phi)^2/\beta^2\right)}\, e^{-i\omega(\theta_0-\phi)}\, \rho\, d\rho\, d\phi \\
&= D^{(i)} B^{(i)} W^{(i)} \int_{\rho}\int_{\phi} x\, e^{-\left((r_0-\rho)^2/\alpha^2 + (\theta_0-\phi)^2/\beta^2\right)}\, e^{-i\omega(\theta_0-\phi)}\, \rho\, d\rho\, d\phi \\
&= D^{(i)} B^{(i)} W^{(i)} g.
\end{aligned} \tag{11}$$

2. Noise, $n^{(i)}$, is assumed to be an independent and identically distributed (IID) Gaussian signal. The 2D Gabor wavelet transform can be considered as a local Fourier transform. Moreover, the 2D Fourier transform of a Gaussian signal has a Gaussian form. Hence, the 2D Gabor wavelet transform of the noise, which is the second component in Eq. 10, can be approximated as an IID Gaussian signal,

$$G_2 = \int_{\rho}\int_{\phi} n^{(i)}\, e^{-\left((r_0-\rho)^2/\alpha^2 + (\theta_0-\phi)^2/\beta^2\right)}\, e^{-i\omega(\theta_0-\phi)}\, \rho\, d\rho\, d\phi = v^{(i)}. \tag{12}$$

With these two assumptions, Eq. 10 can be re-written as,

$$g^{(i)} = D^{(i)} B^{(i)} W^{(i)} g + v^{(i)}. \tag{13}$$

Eq. 13 shows the relationship between the HR and observed LR iris features. In Eq. 13, the blurring matrix, $B^{(i)}$, can be estimated using the approach proposed in [21]. This approach works by measuring the energy of the high frequency components in the images using a spatial 8 × 8 high pass filter. Estimating the warping matrix, $W^{(i)}$, is achieved by finding a matrix that registers the frames with a selected reference frame (the reference frame is the frame with the highest quality score in the sequence). The accuracy of this step is critical for the super-resolution process, and the images must be registered at the sub-pixel level. There are different approaches for registering irises with sub-pixel accuracy such as [33,34]. As a trade-off between accuracy and simplicity, in this research we follow the patch-based registration approach outlined in [33]. A patch size of 7 × 7 and neighbourhood size of 10 × 10 are used. Phase correlation is employed to judge the similarity between patches. In our experiments, three peaks are employed in the phase correlation map for estimating the shift of the patch. Iris images usually suffer from image deformation problems. Dividing normalised iris frames into smaller patches and aligning each local patch with the template will compensate for the local deformation.
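The phase correlation step used for registration can be illustrated with a minimal sketch. It recovers only a global integer shift from the single strongest peak, whereas the text above describes a patch-based scheme that uses three peaks for sub-pixel accuracy; the function name is hypothetical, and circular shifts are assumed.

```python
import numpy as np

def phase_correlation_shift(reference, frame):
    """Return the integer (dy, dx) circular shift that maps `reference` onto `frame`."""
    f_ref = np.fft.fft2(reference)
    f_frm = np.fft.fft2(frame)
    cross = f_frm * np.conj(f_ref)
    # Discard magnitude and keep phase only; guard against division by zero.
    cross /= np.maximum(np.abs(cross), 1e-12)
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks in the upper half of each axis correspond to negative shifts.
    return tuple(int(p) if p <= s // 2 else int(p) - s
                 for p, s in zip(peak, corr.shape))
```

A sub-pixel estimate would additionally interpolate around the peak, for instance using the neighbouring correlation values, which this sketch does not attempt.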


The following stages describe how the HR iris features are estimated from this equation.

Stage 3: Estimating HR features. In Bayesian statistics, a maximum a posteriori (MAP) probability estimate can be used to estimate an unobserved quantity on the basis of empirical data. Using MAP estimation, an HR feature can be estimated as,

$$\tilde{g} = \arg\max_{g}\; p(g^{(1)}, \ldots, g^{(M)} \mid g)\, p(g). \tag{14}$$

The estimated HR feature, $\tilde{g}$, is the value that maximises the product of the conditional probability $p(g^{(1)}, \ldots, g^{(M)} \mid g)$ and the prior probability $p(g)$. $M$ is the number of observed LR images.

Stage 4: Incorporating iris model information. To solve the above estimation problem, specific information relating to iris models can be incorporated in the form of two assumptions:

• Prior probability is jointly Gaussian,

$$p(g) = \frac{1}{Z} \exp\left(-(g - \mu_g)^T \Lambda^{-1} (g - \mu_g)\right). \tag{15}$$

• Noise, $v^{(i)}$, is independent and identically distributed (IID) Gaussian with a diagonal covariance matrix,

$$p(v^{(i)}) = \frac{1}{Z} \exp\left(-(v^{(i)} - \mu_v^{(i)})^T \Lambda^{-1} (v^{(i)} - \mu_v^{(i)})\right). \tag{16}$$

The process of estimating the statistics of the prior probabilities of the features and the noise from the training set is explained in Section 4. From Eq. 13, the individual conditional probability can be estimated as,

$$p(g^{(i)} \mid g) = \frac{1}{Z} \exp\left(-(g^{(i)} - D^{(i)} B^{(i)} W^{(i)} g - \mu_v^{(i)})^T \Lambda^{-1} (g^{(i)} - D^{(i)} B^{(i)} W^{(i)} g - \mu_v^{(i)})\right). \tag{17}$$

From Eq. 13, $g^{(i)} - D^{(i)} B^{(i)} W^{(i)} g$ is IID as a consequence of the fact that $v^{(i)}$ is IID, thus,

$$p(g^{(1)}, \ldots, g^{(M)} \mid g) = \prod_{i} p(g^{(i)} \mid g) = \frac{1}{Z} \exp\left(-\sum_{i=1}^{M} (g^{(i)} - D^{(i)} B^{(i)} W^{(i)} g - \mu_v^{(i)})^T \Lambda^{-1} (g^{(i)} - D^{(i)} B^{(i)} W^{(i)} g - \mu_v^{(i)})\right). \tag{18}$$

The estimation problem can then be rewritten as,

$$\begin{aligned}
\tilde{g} &= \arg\max_{g} \left( p(g^{(1)}, \ldots, g^{(M)} \mid g)\, p(g) \right) \\
&= \arg\max_{g} \left( \frac{1}{Z} \exp\left(-\sum_{i=1}^{M} (g^{(i)} - D^{(i)} B^{(i)} W^{(i)} g - \mu_v^{(i)})^T \Lambda^{-1} (g^{(i)} - D^{(i)} B^{(i)} W^{(i)} g - \mu_v^{(i)})\right) \cdot \frac{1}{Z} \exp\left(-(g - \mu_g)^T \Lambda^{-1} (g - \mu_g)\right) \right) \\
&= \arg\min_{g} \left( \sum_{i=1}^{M} (g^{(i)} - D^{(i)} B^{(i)} W^{(i)} g - \mu_v^{(i)})^T \Lambda^{-1} (g^{(i)} - D^{(i)} B^{(i)} W^{(i)} g - \mu_v^{(i)}) + (g - \mu_g)^T \Lambda^{-1} (g - \mu_g) \right).
\end{aligned} \tag{19}$$

Stage 5: Estimating the solution. The estimation in Stage 4 is an unconstrained optimisation problem. This optimisation can be solved by either iterative steepest descent or iterative conjugate gradients [35]. With a proper choice of the step size and the maximum number of steps, the iterative steepest descent method is capable of converging quickly towards the local minimum. However, iterative steepest descent may never reach the true minimum [35]. Instead of employing steepest gradient directions for iterative updating, a conjugate gradients method utilises conjugate directions, which enables the method to converge more accurately in at most $n$ steps, where $n$ is the size of the matrix of the system [35]. Given this, we solve the optimisation problem in Stage 4 by iterative conjugate gradients. Let the cost function $E(g)$ be defined as,

$$E(g) = \sum_{i=1}^{M} (g^{(i)} - D^{(i)} B^{(i)} W^{(i)} g - \mu_v^{(i)})^T \Lambda^{-1} (g^{(i)} - D^{(i)} B^{(i)} W^{(i)} g - \mu_v^{(i)}) + (g - \mu_g)^T \Lambda^{-1} (g - \mu_g).$$

The solution for optimisation can be estimated iteratively as follows,

$$g_{n+1} = g_n + \alpha_n \Gamma g_n, \tag{20}$$

where $\Gamma g_n$ is defined as,

$$\Gamma g_n = \Delta g_n + \beta_n \Gamma g_{n-1}, \tag{21}$$

where $\Delta g_n = \nabla_g E(g_n)$ and $\beta_n = \max(0, \beta_n^{PR})$, and

$$\beta_n^{PR} = \frac{\Delta g_n^T (\Delta g_n - \Delta g_{n-1})}{\Delta g_{n-1}^T \Delta g_{n-1}}. \tag{22}$$

$\alpha_n$ is the parameter that minimises $E(g_n + \alpha_n \Gamma g_n)$ through a line search. Hence, with an initial estimate $g_0$, the iterative conjugate gradients estimation method will converge to the high-resolution $\tilde{g}$ which minimises the cost function $E(g)$.
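The MAP estimation of Stages 3 to 5 can be sketched for a small real-valued problem. This is an illustration under stated assumptions rather than the authors' implementation: the features are real vectors, each matrix in `Hs` stands in for the product D B W of one frame, the noise mean is shared across frames, both inverse covariance matrices are symmetric, and the search direction is taken along the negative gradient. Because the cost is quadratic, the line search has a closed form.

```python
import numpy as np

def map_estimate_cg(ys, Hs, mu_v, cov_v_inv, mu_g, cov_g_inv, n_iter=50, tol=1e-10):
    """Minimise the quadratic MAP cost with Polak-Ribiere conjugate gradients.

    ys: list of LR feature vectors; Hs: list of matrices standing in for D B W;
    mu_v, cov_v_inv: noise mean and inverse covariance;
    mu_g, cov_g_inv: prior mean and inverse covariance.
    """
    # Half of the (constant) Hessian of the quadratic cost.
    A = cov_g_inv + sum(H.T @ cov_v_inv @ H for H in Hs)

    def gradient(g):
        grad = 2 * cov_g_inv @ (g - mu_g)
        for y, H in zip(ys, Hs):
            grad -= 2 * H.T @ cov_v_inv @ (y - H @ g - mu_v)
        return grad

    g = mu_g.astype(float).copy()        # initial estimate g_0: the prior mean
    delta = -gradient(g)                 # negative gradient (descent direction)
    direction = delta.copy()
    for _ in range(n_iter):
        if delta @ delta < tol:
            break
        # Exact line search along `direction`, closed form for a quadratic cost.
        alpha = (delta @ direction) / (2 * direction @ A @ direction)
        g = g + alpha * direction
        new_delta = -gradient(g)
        # Polak-Ribiere coefficient, clipped at zero as in Eq. 22.
        beta = max(0.0, new_delta @ (new_delta - delta) / (delta @ delta))
        direction = new_delta + beta * direction
        delta = new_delta
    return g
```

For a quadratic cost the minimiser is also available by solving the linear system directly, which gives a convenient check on the iterative result.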

4. Estimating the statistics of prior probabilities of the features and noise

The estimation solution explained in Section 3 requires the statistics of the noise and the prior probability of HR features to be estimated beforehand. This section describes this prerequisite estimation, which is performed on a training set (details of the training set used in this work are presented in Section 5). For the high resolution images and video sequences, irises are detected, segmented, and transformed into the polar representation. The high resolution features are extracted from these polar representations using the Gabor-based phase-quadrant encoding technique. Within each video sequence, there are up to 20 frames with significant variation in quality. As Daugman has claimed in [2], the features extracted using the Gabor-based phase-quadrant encoding technique are robust to some ranges of quality variation. With a small amount of variation, the features are still able to capture the discriminant components from the images. However, if the quality varies significantly, this property of the Gabor-based phase-quadrant encoding technique no longer holds. Therefore, within a video sequence, the very low quality frames do not provide the required discriminant components. These very low quality frames are eliminated using a quality threshold. This threshold is selected through experimentation.


4.0.1. The statistics of prior probability of HR features

The prior probability of the HR features has been assumed to have a Gaussian form with mean vector $\mu_g$ and covariance matrix $\Lambda$ given by,

$$\mu_g = \frac{1}{M} \sum_{i=1}^{M} G^{(i)}, \tag{23}$$

$$\Lambda = \frac{1}{M} \sum_{i=1}^{M} (G^{(i)} - \mu_g)(G^{(i)} - \mu_g)^T, \tag{24}$$

where $G^{(i)}$ is the HR feature of the $i$th training image and $M$ is the total number of training images.

4.0.2. The statistics of noise

From Eq. 13, the 2D Gabor complex features of the noise in the observation equation can be estimated as,

$$v^{(i)} = g^{(i)} - D^{(i)} B^{(i)} W^{(i)} g. \tag{25}$$

The statistics of the noise, in the form of a mean vector $\mu_v$ and a covariance matrix $\Lambda$, can be estimated as,

$$\mu_v = \frac{1}{M} \sum_{i=1}^{M} (g^{(i)} - D^{(i)} B^{(i)} W^{(i)} g), \tag{26}$$

$$\Lambda = \frac{1}{M} \sum_{i=1}^{M} \left\{ (g^{(i)} - D^{(i)} B^{(i)} W^{(i)} g - \mu_v)(g^{(i)} - D^{(i)} B^{(i)} W^{(i)} g - \mu_v)^T \right\}. \tag{27}$$
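The sample statistics of Eqs. 23 to 27 share one form, which can be sketched as below. The function name is hypothetical and real-valued samples are assumed (complex-valued features would need a conjugate transpose). For the prior, the rows of `samples` would be the HR training features; for the noise, they would be the residuals of Eq. 25.

```python
import numpy as np

def gaussian_statistics(samples):
    """Mean vector and covariance matrix using the 1/M normalisation."""
    samples = np.asarray(samples, dtype=float)   # shape (M, d), one sample per row
    mean = samples.mean(axis=0)
    centred = samples - mean
    # Sum of outer products of the centred samples, divided by M.
    cov = centred.T @ centred / samples.shape[0]
    return mean, cov
```

The result matches NumPy's biased covariance estimate, `np.cov(samples, rowvar=False, bias=True)`.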

The statistics of noise and prior probability estimated here are used in the estimation process described in Section 3.

5. Experiments

All experiments are conducted on the Multiple Biometric Grand Challenge (MBGC) portal dataset [36]. The dataset consists of 628 near-infrared (NIR) face portal video sequences (approximately 5 videos per identity) recorded when participants walk through a portal (Iris On The Move [37]) located 3 m from a fixed-focal-length NIR camera (Pulnix TM-4000CL); and 8589 high-quality NIR iris still images of the 129 participants. Portal iris video sequences were recorded with fewer constraints on participants. There are approximately 20 frames in each video and the quality of frames varies significantly within the video, with degradation caused by effects such as poor image resolution, blurring, and illumination changes as shown in Fig. 4. The resolution of the still iris images is high, with approximately 220 pixels across the diameter of the iris boundary circle, while the resolution of the iris in the portal videos is significantly lower, with less than 90 pixels across the diameter of the iris.

The dataset is divided into two subsets for training and testing. For training, 5 still images and 1 video sequence per identity are used to estimate the statistics of noise and prior probability of HR features. Parameters of the statistics are estimated as described in Section 4. For testing, the 4 remaining video sequences for each identity are matched against the HR still images. For each video sequence, all frames are evaluated for quality. The quality metrics proposed in [21] are employed to evaluate the quality of each frame. Individual quality factors including focus, off-angle appearance, illumination variation, and motion blur are fused using Dempster–Shafer theory to produce an overall quality score for each frame. A threshold to remove poor quality frames is selected through experimentation. The remaining frames in one video sequence have features extracted using the 2D Gabor phase-quadrant encoding technique. These features are fused using the proposed feature-domain super-resolution approach to generate a high resolution iris feature. Finally, the super-resolved feature is compared with the high-resolution iris features. Detection Error Trade-Off (DET) plots are employed to show the performance of different approaches.

Experiments are conducted to determine the optimum number of frames for super-resolution (Section 5.1); to compare the system to other pixel-domain super-resolution techniques in terms of the distance to the true vectors (the original feature vector from the high resolution image) and the recognition performance (Section 5.2); to compare linear and nonlinear features for feature-domain super-resolution (Section 5.3); and to compare the proposed approach with other score-level fusion techniques (Section 5.4). Through these experiments, we show that the proposed algorithm outperforms other signal-level fusion, score fusion, and frame selection techniques for iris recognition.

5.1. Effect of the number of frames on the feature-domain super-resolution performance

The benefit of fusing more frames is the availability of extra information to improve the accuracy of the high resolution reconstruction, and thus the recognition performance. On the other hand, fusing more poor quality frames could be detrimental to the accuracy, as more noise is included. To investigate the trade-off, an experiment was conducted to determine the best number of frames for fusion. The very low quality frames are useless for improving the reconstruction. We employ the quality metrics proposed in [21]. One overall quality score for each frame is calculated from multiple quality measures including focus, viewing angle, illumination variation, and blur. Dempster–Shafer fusion theory has been used to fuse these 4 quality measures into one overall quality score.
In this research, frames with a quality score below an experimentally chosen threshold (0.23) are excluded from the dataset. Because of the inconsistent quality of the MBGC dataset, only 3–9 frames remain for each video sequence after the very low quality frames have been eliminated. Super-resolution fusion and recognition are performed with different numbers of frames: 1, 3, 5, 7, and all available good quality frames (up to 9). When a subject has too few "good" frames to meet the number required for super-resolution, all the available "good" frames of that subject are used. Conversely, when there are more "good" frames than the chosen number (e.g. the chosen number of frames for super-resolution is 5 but 8 good quality frames are available), the best 5 according to quality score are selected.

From Fig. 5, we can see that when the number of frames is increased from 1 to 5, the recognition performance increases with the introduction of more data. When the number of frames exceeds 5, the decrease in quality caused by illumination, occlusion, or poor focus within the low resolution set counteracts the addition of extra information, resulting in degraded performance. For all subsequent experiments, the best performing number of frames (5) is used unless stated otherwise. It is also worth noting that the optimal number of frames to fuse depends on the dataset. For the MBGC dataset, this is 5. With greater Depth of Field (DOF) and/or better illumination this number would increase, while poorer illumination and shallower DOF would lead to a smaller number.

5.2. Comparison to pixel-domain SR

The aim of the proposed feature-domain super-resolution approach is to outperform the conventional pixel-domain approaches

with the super-resolution processing performed directly in the feature domain. In this section, the proposed approach is compared with a conventional interpolation super-resolution approach [38] and a state-of-the-art pixel-domain super-resolution approach [21]. The purpose of a super-resolution approach is to reconstruct high-resolution information from low-resolution evidence. The quality of reconstruction is evaluated by measuring how close the reconstructed vectors are to the ground truth high-resolution vectors. Here we employ the average distance of the reconstructed features to the original high-resolution features as a measurement of the quality of the reconstruction. The distance between a reconstructed feature vector, $\tilde{g}$, and the true HR feature vector, $g$, is calculated for each subject as follows,

$$ D(g, \tilde{g}) = \frac{|g - \tilde{g}|}{\mathrm{Length}(g)} \times \frac{1}{|g|}, \tag{28} $$

where $|\cdot|$ denotes the Euclidean norm (so that $|g - \tilde{g}|$ is the Euclidean distance between the two vectors), and Length(g), the length of vector $g$, is used for normalisation. Fig. 6 illustrates that the reconstructed features obtained using the proposed approach are closer to the true HR features than those obtained through the pixel-domain super-resolution approaches. This conclusion is reinforced by Table 1. It can be seen from Fig. 7 that this improvement in reconstruction accuracy leads to an improvement in recognition performance.

Fig. 4. A sequence of frames in one video sequence. The illumination and focus level of the iris vary significantly. In some sequences there are also variable occlusions, but these are not present in this sequence.

Fig. 5. Effect of the number of frames used for super-resolution on the recognition performance. Using 5 frames yields the best recognition result.

5.3. Linear vs. nonlinear features

PCA and LDA feature-domain super-resolution approaches have been shown to outperform their pixel-domain counterparts in [23]. The 2D Gabor feature-domain super-resolution has also been demonstrated to have superior performance compared with its counterpart pixel-domain approach in Section 5.2. In this section, the advantage of employing the nonlinear 2D Gabor phase-quadrant features over the linear PCA and LDA features is discussed. Fig. 8 illustrates the superior performance of the feature-domain super-resolution approach using 2D Gabor features compared to LDA and PCA features (an Equal Error Rate (EER) of 0.5% for the proposed 2D Gabor approach in comparison with EERs of 2.1% and 4.8% for LDA and PCA respectively). The performance of low resolution features is also shown in Fig. 8 (EERs of 1.8%, 4.9%, and 10% for raw 2D Gabor, LDA, and PCA low resolution features). It can be seen that once again the 2D Gabor approach is superior to LDA and PCA, and that the raw 2D Gabor approach also outperforms the feature-domain SR approaches using LDA and PCA. It can also be seen that feature-domain super-resolution has boosted the recognition performance of all three approaches. Employing nonlinear 2D Gabor phase-quadrant features in the feature-domain SR approach as proposed in this paper capitalises on both the boost in recognition performance obtained through the feature-domain SR approach and the discriminant property of the 2D Gabor features.
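The EERs quoted in this section summarise each DET curve by the operating point where the false accept rate equals the false reject rate. A minimal sketch of how such a figure can be obtained from two sets of Hamming distance scores is given below. The function name, the threshold sweep over the observed scores, and the averaging of the two rates at the closest point are assumptions made for illustration, not the evaluation code used in the experiments.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Approximate the EER from distance scores (lower = better match).

    genuine: distances from same-identity comparisons (hypothetical input).
    impostor: distances from different-identity comparisons.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    best_gap, eer = np.inf, 1.0
    # Sweep every observed score as a candidate decision threshold.
    for t in np.unique(np.concatenate([genuine, impostor])):
        frr = np.mean(genuine > t)    # genuine comparisons rejected
        far = np.mean(impostor <= t)  # impostor comparisons accepted
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2.0
    return float(eer)
```

With finite score sets the two rates rarely coincide exactly, which is why the sketch reports their mean at the threshold where they are closest.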

5.4. Comparison to other fusion approaches

The proposed approach can be considered a fusion approach. To further examine its performance, in this section we compare it with other signal-level, feature-level and score-level fusion techniques.

First, a common concern when performing signal-level fusion is that fusing more poor quality frames might degrade the performance. Hence, the best quality frame selection approach [39,2] is implemented here for comparison with the proposed feature-domain super-resolution approach. This technique simply chooses the frame with the best quality score from a video sequence for comparison to the still iris images. Fig. 9 shows that the proposed approach, with an EER of 0.5%, achieves a substantial improvement in recognition performance compared with the best quality selection approach, which has an EER of 1.5%.

A recent signal-level fusion approach introduced in [40,41] is re-implemented for comparison. This signal-level fusion approach simply averages all normalised iris images after aligning them. This simple averaging does not work well with iris video sequences in which the quality varies significantly. As such, in Fig. 9, the recognition performance of the iris system does not improve

Fig. 6. Distance to true feature vectors. The high resolution features reconstructed by the proposed feature-domain super-resolution are closer to the true feature vectors than those reconstructed by the pixel-domain counterpart and the conventional bicubic interpolation super-resolution.

Table 1. Average distance to true feature vectors of all subjects.

Technique           Average distance to HR feature
Interpolation SR    1.192
Pixel-domain SR     1.147
Proposed SR         1.036
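The averages in Table 1 are built from the per-subject distance of Eq. (28): the Euclidean distance between the reconstructed and true HR feature vectors, normalised by the length of the true vector and by its Euclidean norm. A minimal NumPy sketch is given below. The function names are hypothetical, and two points are assumptions for illustration only: that Length(g) means the number of elements of g, and that features are supplied as real-valued arrays.

```python
import numpy as np

def feature_distance(g_true, g_rec):
    """Normalised distance between a true and a reconstructed feature vector."""
    g_true = np.asarray(g_true, dtype=float)
    g_rec = np.asarray(g_rec, dtype=float)
    # Euclidean distance between the two vectors.
    diff_norm = np.linalg.norm(g_true - g_rec)
    # Normalise by the number of elements and by the norm of the true vector.
    return float(diff_norm / (g_true.size * np.linalg.norm(g_true)))

def average_distance(pairs):
    """Mean distance over (true, reconstructed) pairs, one pair per subject."""
    return float(np.mean([feature_distance(g, r) for g, r in pairs]))
```

For instance, `feature_distance([3.0, 4.0], [0.0, 0.0])` evaluates to 5 / (2 × 5) = 0.5, and identical vectors give 0.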

Fig. 8. Linear vs. nonlinear features in the feature-domain SR approach. Two linear features (LDA, PCA as in [23]) have been employed to compare with the proposed approach using the nonlinear 2D Gabor phase-quadrant features. The nonlinear features (2D Gabor) super-resolution approach achieves an EER of 0.5% in comparison to EERs of 2.1% and 4.8% for the linear features LDA and PCA respectively.

Fig. 7. Recognition performance comparison of the proposed feature-domain SR and pixel-domain SR. The proposed feature-domain SR outperforms the pixel-domain SR approaches due to the direct super-resolving in the feature domain and the incorporation of specific information from iris models. Equal Error Rates (EERs) for the four approaches are 1.8% for No SR, 1.8% for bicubic SR, 0.9% for pixel-domain SR, and 0.5% for the proposed approach using 2D Gabor features.

significantly over the conventional best quality selection approach, with an EER of 1.4% in comparison to 1.5%.

Feature-level fusion is believed to be more relevant to less constrained iris recognition since there are significant changes in illumination which may destabilise the pixel fusion approach [42]. A feature-level fusion scheme [42] is compared with the proposed method. The proposed approach can be considered a feature-level fusion scheme since the fusion phase is performed in the feature domain. However, unlike the approach in [42], the features from multiple low resolution frames are not only fused together, but the resolution of the features is also enhanced to improve recognition performance. Fig. 9 illustrates the superior performance of the proposed approach (with an EER of 0.5%) against the simple feature-level fusion scheme (with an EER of 0.8%).

Score-level fusion [21] has also been applied for iris recognition. The quality metric is employed to weight the scores for fusion as follows,

$$ S = \frac{\sum_{i=1}^{N} S_i \times e^{Q_i}}{\sum_{i=1}^{N} S_i}, $$

where $S_i$ is the Hamming distance score when comparing frame $i$ with the template, $Q_i$ is the quality score of frame $i$, and $S$ is the combined score for all frames in the video sequence. This score-level fusion approach (with an EER of 0.9%) is also outperformed by the proposed approach.

In summary, the proposed feature-domain super-resolution approach outperforms the best quality frame on its own, as well as existing signal, feature and score fusion techniques.

Fig. 9. Comparison of the proposed approach to other fusion techniques. The proposed feature-domain super-resolution can be considered as a feature-level fusion scheme. The proposed approach capitalises on both the feature fusion using maximum a posteriori and the enhanced resolution of the features. This advantage boosts the recognition performance against other fusion approaches. An EER of 0.5% is achieved for the proposed approach in comparison to EERs of 0.8%, 0.9%, and 1.4% for the other feature-level, score-level, and signal-level fusion approaches respectively.

6. Discussion and conclusion

The use of feature-domain super-resolution to improve the recognition performance of biometric modalities has been raised in only a small number of papers in the literature. Instead of super-resolving in the pixel domain as in most super-resolution approaches, feature-domain super-resolution focuses on directly super-resolving what is needed for recognition performance, rather than aiming for visual enhancement. In addition, incorporating specific information from the target biometric modality to constrain the estimation process also bolsters the reconstruction.

In this paper we have investigated feature-domain super-resolution using different linear and nonlinear features. Nonlinear features have not previously been proposed for feature-domain super-resolution due to the difficulty involved in formulating the reconstruction for these features. However, a large number of the most discriminant features are nonlinear. Hence, to further improve feature-domain super-resolution, these nonlinear features need to be employed.

We have proposed a framework for applying feature-domain super-resolution to biometrics. This framework can be applied to different kinds of biometric modalities such as face and iris, and to different nonlinear features. A feature-domain super-resolution approach for the iris biometric using the nonlinear 2D Gabor phase-quadrant features within the new framework has been demonstrated and evaluated in this paper. There are 5 steps in estimating a high-resolution feature from a low-resolution video sequence of an iris:

• Modeling the degradation between the original high-resolution images and low-resolution images in the spatial domain with 3 factors: downsampling, blurring, and warping.
• Transforming the spatial relationship from the pixel to the feature domain.
• From the feature-domain relationship, the objective function to estimate the high-resolution feature is formulated as a maximum a posteriori problem.
• Specific information from an iris model is incorporated into the estimation problem to support the estimation. This information is in the form of the prior probability of features and noise.
• The high-resolution feature is estimated using the conjugate gradient approach.
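To make the estimation steps listed above more concrete (a maximum a posteriori objective, Gaussian statistics for the prior and the noise, and a conjugate gradient solver), the sketch below solves a deliberately simplified version of the problem. It assumes a linear observation model y_i = A_i x + n_i with independent Gaussian noise and a Gaussian prior with diagonal covariance, in which case the MAP estimate satisfies a linear system. This is only a linear-Gaussian illustration: the approach in this paper deals with nonlinear 2D Gabor phase-quadrant features, and all names below (`map_estimate`, `operators`, `noise_vars`, and so on) are hypothetical.

```python
import numpy as np

def conjugate_gradient(H, b, tol=1e-10, max_iter=1000):
    """Solve H x = b for a symmetric positive-definite matrix H."""
    x = np.zeros_like(b)
    r = b - H @ x          # initial residual
    p = r.copy()           # initial search direction
    rs_old = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        # Stop once the residual is small relative to the right-hand side.
        if np.sqrt(rs_old) <= tol * b_norm:
            break
        Hp = H @ p
        alpha = rs_old / (p @ Hp)
        x = x + alpha * p
        r = r - alpha * Hp
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

def map_estimate(observations, operators, noise_vars, prior_mean, prior_var):
    """MAP estimate of x under the assumed linear-Gaussian model.

    Minimises sum_i ||y_i - A_i x||^2 / var_i + sum_j (x_j - mu_j)^2 / s_j,
    whose minimiser solves the normal equations H x = b built below.
    """
    prior_mean = np.asarray(prior_mean, dtype=float)
    prior_var = np.asarray(prior_var, dtype=float)
    H = np.diag(1.0 / prior_var)   # contribution of the prior
    b = prior_mean / prior_var
    for y, A, var in zip(observations, operators, noise_vars):
        H = H + A.T @ A / var      # contribution of each observation
        b = b + A.T @ y / var
    return conjugate_gradient(H, b)
```

In this simplified setting each `A_i` stands in for the combined downsampling, blurring and warping operator of one low-resolution frame after it has been carried into the feature domain; deriving that operator for nonlinear features is the part the sketch does not attempt.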

The use of nonlinear 2D Gabor phase-quadrant features has been shown to improve the recognition performance in comparison with the linear features in feature-domain super-resolution, and has also been shown to outperform pixel-domain super-resolution approaches. The use of 2D Gabor phase-quadrant features (one of the most discriminant features for iris, widely used in commercial iris recognition systems) pushes super-resolution techniques further towards real world applications. In addition, the proposed feature-domain super-resolution approach has also been shown to be a robust fusion process compared to other signal-level, feature-level and score-level fusion approaches.

In future work, we will investigate the distribution of the high resolution features. In this paper, we have followed the work of Gunturk et al. [22] and assumed that the high resolution features form a Gaussian distribution. The actual distribution of these features has not been verified and will be explored to potentially further improve performance. The computational cost of the proposed technique, and other enhancement techniques for biometrics, will also be investigated.

References

[1] A. Jain, A. Ross, S. Prabhakar, An introduction to biometric recognition, IEEE Trans. Circ. Syst. Video Technol. 14 (1) (2004) 4–20.
[2] J. Daugman, How iris recognition works, IEEE Trans. Circ. Syst. Video Technol. 14 (2004) 21–30.
[3] J. Matey, L. Kennell, Iris Recognition Beyond One Meter, 2009, pp. 23–59.
[4] S. Park, M. Park, M. Kang, Super-resolution image reconstruction: a technical overview, IEEE Signal Proc. Mag. 20 (3) (2003) 21–36.
[5] R. Hardie, K. Barnard, E. Armstrong, Joint MAP registration and high-resolution image estimation using a sequence of undersampled images, IEEE Trans. Image Process. 6 (12) (1997) 1621–1633.
[6] A. Panagiotopoulou, V. Anastassopoulos, Regularized super-resolution image reconstruction employing robust error norms, Optical Eng. 48 (11) (2009) 117004 (14 pp.).
[7] A. Patti, Y. Altunbasak, Artifact reduction for set theoretic super resolution image reconstruction with edge adaptive constraints and higher-order interpolants, IEEE Trans. Image Process. 10 (1) (2001) 179–186.
[8] M. Elad, A. Feuer, Restoration of a single superresolution image from several blurred, noisy, and undersampled measured images, IEEE Trans. Image Process. 6 (12) (1997) 1646–1658.
[9] M. Irani, S. Peleg, Improving resolution by image registration, CVGIP: Graph. Model. Image Process. 53 (3) (1991) 231–239.
[10] G. Gilboa, N. Sochen, Y. Zeevi, Forward-and-backward diffusion processes for adaptive image enhancement and denoising, IEEE Trans. Image Process. 11 (7) (2002) 689–703.
[11] L.C. Pickup, D.P. Capel, S.J. Roberts, A. Zisserman, Overcoming registration uncertainty in image super-resolution: maximize or marginalize?, EURASIP J. Adv. Signal Process. 2007 (2) (2007) 1–14.
[12] L.C. Pickup, D.P. Capel, S.J. Roberts, A. Zisserman, Bayesian image superresolution, The Computer Journal (2006) 1089–1096.
[13] S. Baker, T. Kanade, Limits on super-resolution and how to break them, IEEE Trans. Pattern Anal. Mach. Intell. 24 (9) (2002) 1167–1183.
[14] K. Jia, S. Gong, Generalized face super-resolution, IEEE Trans. Image Process. 17 (6) (2008) 873–886.
[15] F. Lin, C. Fookes, V. Chandran, S. Sridharan, Super-resolved faces for improved face recognition from surveillance video, in: Lecture Notes in Computer Science, LNCS, vol. 4642, Seoul, Republic of Korea, 2007, pp. 1–10.
[16] Y. Hu, K.-M. Lam, G. Qiu, T. Shen, From local pixel structure to global image super-resolution: a new face hallucination framework, IEEE Trans. Image Process. 20 (2) (2011) 433–445.
[17] J. Yang, J. Wright, T.S. Huang, Y. Ma, Image super-resolution via sparse representation, IEEE Trans. Image Process. 19 (11) (2010) 2861–2873.
[18] G. Fahmy, Super-resolution construction of iris images from a visual low resolution face video, in: 9th International Symposium on Signal Processing and Its Applications, ISSPA 2007, 2007, pp. 1–4.
[19] Y. Kwang, R. Kang, J. Byung, J. Sung, Super-resolution method based on multiple multi-layer perceptrons for iris recognition, in: International Conference on Ubiquitous Information Technologies Applications, ICUT '09, 2009, pp. 1–5.
[20] J. Huang, L. Ma, T. Tan, Y. Wang, Learning based resolution enhancement of iris images, Brit. Mach. Vis. Conf. 1 (1) (2003) 1–10.
[21] K. Nguyen Thanh, C. Fookes, S. Sridharan, S. Denman, Quality-driven super-resolution for less constrained iris recognition at a distance and on the move, IEEE Trans. Inform. Forens. Secur. 6 (4) (2011) 1–11.
[22] B. Gunturk, A. Batur, Y. Altunbasak, M. Hayes, R. Mersereau, Eigenface-domain super-resolution for face recognition, IEEE Trans. Image Process. 12 (5) (2003) 597–606.
[23] K. Nguyen, C. Fookes, S. Sridharan, S. Denman, Feature-domain super-resolution for iris recognition, in: 18th IEEE International Conference on Image Processing, 2011, pp. 3258–3261.
[24] K. Jia, S. Gong, Multi-modal tensor face for simultaneous super-resolution and recognition, in: 10th IEEE International Conference on Computer Vision, vol. 2, 2005, pp. 1683–1690.
[25] J. Daugman, Gabor wavelets and statistical pattern recognition, in: The Handbook of Brain Theory and Neural Networks, The MIT Press, Cambridge, Massachusetts, London, England, 2002, pp. 457–461.
[26] A. Kong, D. Zhang, M. Kamel, An analysis of IrisCode, IEEE Trans. Image Process. 19 (2) (2010) 522–532.
[27] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, John Wiley and Sons Inc., 2001.
[28] J. Daugman, New methods in iris recognition, IEEE Trans. Syst. Man Cybernet. Part B: Cybernet. 37 (5) (2007) 1167–1175.
[29] A. Serrano, I. de Diego, C. Conde, E. Cabello, Recent advances in face biometrics with Gabor wavelets: a review, Pattern Recogn. Lett. 31 (5) (2010) 372–381.
[30] K. Miyazawa, K. Ito, T. Aoki, K. Kobayashi, H. Nakajima, An effective approach for iris recognition using phase-based image matching, IEEE Trans. Pattern Anal. Mach. Intell. 30 (10) (2008) 1741–1756.
[31] D. Monro, S. Rakshit, D. Zhang, DCT-based iris recognition, IEEE Trans. Pattern Anal. Mach. Intell. 29 (4) (2007) 586–595.
[32] R. Hardie, K. Barnard, E. Armstrong, Joint MAP registration and high-resolution image estimation using a sequence of undersampled images, IEEE Trans. Image Process. 6 (12) (1997) 1621–1633.
[33] J.-Z. Huang, T.-N. Tan, L. Ma, Y.-H. Wang, Phase correlation based iris image registration model, J. Comput. Sci. Technol. 20 (3) (2005) 419–425.
[34] J. Thornton, M. Savvides, V. Kumar, A Bayesian approach to deformed pattern matching of iris images, IEEE Trans. Pattern Anal. Mach. Intell. 29 (4) (2007) 596–606.
[35] W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery, Numerical Recipes: The Art of Scientific Computing, Cambridge University Press, 2007.
[36] P. Phillips, P. Flynn, J. Beveridge, W. Scruggs, A. O'Toole, D. Bolme, K. Bowyer, B. Draper, G. Givens, Y.M. Lui, H. Sahibzada, I. Scallan, J.A., S. Weimer, Overview of the multiple biometrics grand challenge, in: Advances in Biometrics, Proceedings Third International Conference, ICB 2009, Germany, 2009, pp. 705–714.
[37] J. Matey, O. Naroditsky, K. Hanna, R. Kolczynski, D. LoIacono, S. Mangru, M. Tinker, T. Zappia, W. Zhao, Iris on the move: acquisition of images for iris recognition in less constrained environments, Proc. IEEE 94 (11) (2006) 1936–1947.
[38] R. Keys, Cubic convolution interpolation for digital image processing, IEEE Trans. Acoust. Speech Signal Process. 29 (6) (1981) 1153–1160.
[39] Y. Lee, R. Micheals, P. Phillips, Improvements in video-based automated system for iris recognition (VASIR), in: Workshop on Motion and Video Computing, WMVC '09, 2009, pp. 1–8.
[40] K.P. Hollingsworth, K.W. Bowyer, P.J. Flynn, Image averaging for improved iris recognition, Lecture Notes in Computer Science, LNCS, vol. 5558, 2009, pp. 1112–1121.
[41] K. Hollingsworth, T. Peters, K. Bowyer, P. Flynn, Iris recognition using signal-level fusion of frames from video, IEEE Trans. Inform. Forens. Secur. 4 (4) (2009) 837–848.
[42] A. Desoky, H. Ali, N. Abdel-Hamid, Enhancing iris recognition system performance using templates fusion, in: Proceedings 2010 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT 2010), 2010, pp. 451–456.