Salient feature based graph matching for person re-identification


Sara Iodice, Alfredo Petrosino
Department of Science and Technology, University of Naples Parthenope, Centro Direzionale Is. C4, 80143 Naples, Italy

Article info

Article history: Received 2 January 2014; Received in revised form 12 September 2014; Accepted 15 September 2014

Keywords: Computational symmetry; Salient features; Graph representation and matching; People re-identification

Abstract

We propose a non-learning-based person re-identification approach that uses symmetry principles as well as structural relations among salient features. The idea comes from the consideration that local symmetries at different scales, also enforced by texture features, are potentially more invariant to large appearance changes than lower-level features such as SIFT or ASIFT. Finally, we formulate the re-identification problem as a graph matching problem, where each person is represented by a graph, aimed not only at rejecting erroneous matches but also at selecting additional useful ones. Experimental results on the public i-LIDS dataset show good performance compared to state-of-the-art results. © 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Symmetry detection is highly relevant in pattern recognition. Indeed, the description of a figure may differ when it is embedded in a context with horizontal or vertical symmetry [19]. Besides, in tasks requiring the completion of partially occluded visual stimuli, subjects tend to systematically produce symmetrical completions [14]. The concept of symmetry is not univocal: various kinds of image properties are defined as symmetry [30,28]. For instance, a figure has rotational symmetry when it can be rotated by less than 360° around its central point, or axis, and still match the original figure.

This cue is peculiar to person re-identification, where the problem consists in recognizing people in different poses from images coming from distinct cameras. This is an important task in video surveillance, where large and structured environments (such as airports, metro stations or shopping centres) must be supervised, and it becomes more critical as the cardinality of the gallery set increases. Symmetry is adopted in [6], a person re-identification method that weighs appearance information, extracted from different body parts, according to its distance from symmetry axes computed on the whole figure. Symmetry is therefore used as a global property in that previous work, not as a local one. Conversely, symmetry is adopted as both a global and a local property in our approach. In particular, the global symmetry axis is exploited to select salient locations representing each

Corresponding author. E-mail addresses: [email protected] (S. Iodice), [email protected] (A. Petrosino). Tel./fax: +39 0815476656.

pedestrian, while local symmetry is adopted as a local feature to describe each salient location.

Researchers in computer vision have made significant progress in representing and detecting symmetries in images and other types of data [15]. However, there has been relatively little work on using local symmetries as explicit features for matching tasks [2], and no work on matching the same pedestrian across different poses, as in re-identification. The main idea is to use a variety of symmetries, rather than repetitions, together with texture as cues, in order to define a complex feature detector. This is based on the consideration that, at different scales, symmetry is potentially more invariant to large appearance changes than lower-level features such as SIFT and, when combined with a texton-based feature, is highly discriminative [11].

In this context, the main contribution resides in the image features and in the feature organization adopted for matching. The discriminative power can be better appreciated if the features are organized in an appropriate manner. Most researchers focus their attention on features extracted in various ways from the scene. The Harris and SIFT [16] features are important to identify what is distinctive and discriminative for the purpose of a correct recognition of the scene. The bag-of-words algorithm has been applied to SIFT descriptors to identify discriminative combinations of descriptors [4]. In [18], applying clustering to the descriptors shows that descriptors in a large cluster are less distinctive than those in a small cluster: for example, in indoor navigation window corners are common, so they are not good features to uniquely identify scenes, while corners found on posters or signs are much better. In [7] an effective approach based on real-time loop detection has proved efficient with a hand-held camera, using SIFT features and intensity and hue

http://dx.doi.org/10.1016/j.patcog.2014.09.011
0031-3203/© 2014 Elsevier Ltd. All rights reserved.

Please cite this article as: S. Iodice, A. Petrosino, Salient feature based graph matching for person re-identification, Pattern Recognition (2014), http://dx.doi.org/10.1016/j.patcog.2014.09.011


histograms combined using a bag-of-words approach. Up to now, none of the existing approaches has tackled the relations among features in terms of similarity and spatial homogeneity. Our main contribution consists in introducing a graph-based representation, according to which regions, their corresponding feature vectors, and the geometric relationships between these regions are encoded in the form of a graph. Following the idea that an image can be described at a higher level in terms of a nested hierarchy of local symmetries, we present a novel graph matching approach that evolves an initial set of correspondences computed with the local features, as a kind of compromise between the constraints imposed by the local features and the structural relations. The vertices are the image keypoints detected by SIFT (specifically we adopt ASIFT [27]), enriched by features based on color, texture, and local symmetry. The cost of an edge joining two vertices represents a measure of their dissimilarity. The problem of person re-identification is therefore formulated as a graph matching problem.

Our approach is an appearance-based method which differs from the state of the art in that: (i) unlike [3,21], we give great weight to local features; (ii) we do not adopt spatio-temporal information as in [17,13]; (iii) our method is non-learning-based, unlike [32,1,22].

The paper is organized as follows. Section 2 deals with pedestrian representation. Section 3 provides details about how pedestrian representations are compared with each other in order to solve the re-identification task. Section 4 reports testing results on the public i-LIDS dataset, and finally some conclusions are drawn in Section 5.

2. Pedestrian representation

The main steps to represent a pedestrian for recognition purposes using visual features are the following:

1. Candidate keypoint extraction
2. Keypoint filtering by symmetry axis
3. Local feature extraction
4. Graph representation

First of all, candidate keypoints are extracted from each pedestrian image and filtered according to their distance from the symmetry axis; local features are then computed at each selected location. Finally, each pedestrian image is represented by a graph, in order to take structural relations into account.

2.1. Keypoint filtering by symmetry axis

ASIFT locations [27], fully invariant with respect to zoom, rotation, translation, and the angles defining the camera axis orientation, are extracted from each frame of the dataset as candidate keypoints. The invariance of ASIFT features to affine transformations is very attractive for person re-identification, where pedestrian figures, subject to viewpoint and pose variations, need to be represented for recognition purposes. On the other hand, the property of two pixels belonging to a certain region of being symmetric with respect to a particular axis remains unchanged when an affine transformation is applied to that region. Thus, reflective symmetry can be considered invariant to affine transformations. We adopt our multi-scale symmetry detection algorithm [10] to extract the symmetry axis corresponding to the region and to the direction of maximal correlation. Therefore, under the hypothesis that there are no large pose variations, the symmetry axis detected by the algorithm remains unchanged. For instance, as shown in Fig. 1, if two pixels p1 and p2 belonging to the pedestrian in frame A are symmetric with respect to the reflective symmetry axis RFSa and are located at a distance d from RFSa, they remain exactly symmetric in frame B, where the pedestrian changes pose, at a distance d′ ≈ d. For this reason, by keeping only the ASIFT keypoints located at a distance x ≤ d, the probability of retaining matched keypoints clearly increases, while the number of keypoints that are candidates for the graph representation decreases. Another important aspect is that candidate keypoints within distance d of the symmetry axis are roughly located within the foreground; consequently, background subtraction for figure/ground separation can be avoided.

2.2. Keypoint distribution around the symmetry axis

To extract information about the density distribution of candidate keypoints around the symmetry axis, the distances dj of a generic keypoint kj from the reflection symmetry axis RFSi of the region (head, trunk or legs) to which kj belongs are computed for each frame i = 1, …, n. For a generic distance d we estimate:

- the probability density p(d), describing the probability that an ASIFT keypoint is located exactly at a distance d from the reflection symmetry axis RFS;
- the cumulative distribution P(d), describing the probability that an ASIFT keypoint is located at a distance x ≤ d from RFS.

The plots shown in Fig. 2 are built considering all the frames of the i-LIDS MCTS public dataset. The cumulative distribution reaches 58.29% for d = 12, underlining that a wide set of ASIFT keypoints is located close to the reflection symmetry axis. On the other hand, from the plot in Fig. 2a it can be observed that the probability density decreases for values of d ≥ 3. Therefore, ASIFT keypoints located at a greater distance from the reflection symmetry axis are clearly less informative and should be discarded, thus reducing the number of keypoints required for the re-identification process.
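The filtering step of Section 2.1 can be sketched as follows. This is a minimal sketch under the assumption that the reflection symmetry axis is vertical and already known (in the paper it is produced by the multi-scale detector of [10]); `filter_keypoints_by_axis` is our illustrative name, not part of the original method.

```python
import numpy as np

def filter_keypoints_by_axis(keypoints, x_axis, d):
    """Keep only keypoints whose horizontal distance from the vertical
    reflection symmetry axis x = x_axis is at most d.

    keypoints: (N, 2) array of (x, y) locations.
    """
    keypoints = np.asarray(keypoints, dtype=float)
    dist = np.abs(keypoints[:, 0] - x_axis)  # distance from the axis
    return keypoints[dist <= d]

# Toy usage: axis at x = 10, threshold d = 3 keeps two of three points.
pts = np.array([[8.0, 5.0], [14.0, 2.0], [11.5, 7.0]])
kept = filter_keypoints_by_axis(pts, x_axis=10.0, d=3.0)
```

As a side effect of the same thresholding, the retained points also tend to fall on the foreground, which is why the paper can skip explicit background subtraction.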

Fig. 1. (a) Frame A, (b) Frame B.



Fig. 2. (a) Probability density function p(d); (b) cumulative distribution function P(d).

2.3. Local feature extraction

Each keypoint kj, selected with respect to the symmetry axis, is enriched by visual features (local symmetry, texture and color), resulting in a final feature vector f_j ∈ R^147. Each feature is normalized by a whitening process (zero mean and unit variance). The feature vector f_j is arranged as {ASIFT ∈ R^132, KIMA ∈ R^3, TIMA ∈ R^3, TEXTON ∈ R^6, RGB ∈ R^3}, where the descriptors are defined below.

2.3.1. ASIFT descriptor

The ASIFT descriptor is defined as P_i = (X_i, R_i, U_i), where X_i = (x_i, y_i) is its 2D location, R_i = (r_i, α_i) its scale and orientation respectively (in radians, from −π to π), and U_i = (U_{i,1}, …, U_{i,128}) a vector of length 128 encoding the local texture of the image, using affine invariant intrinsic neighborhoods and affine invariant intrinsic orientation matrices.

2.3.2. Local and multiscale symmetry

We specifically take advantage of a measure obtained by correlating the image with its flipped version around a particular axis. Indeed, in any direction, the optimal symmetry axis corresponds to the maximal correlation of a pattern with its symmetric version [5]. In particular, an enhanced version of the Kondra and Petrosino symmetry detection algorithm, presented at the "Symmetry Detection from Real World Images" competition at IEEE CVPR 2011 and reported in [10], is used to extract measures of symmetry. Our approach received the award as winner of the reflection symmetry category at the 2013 Symmetry Detection from Real World Images competition held at CVPR 2013.

Let S be the side of the patch (scale) extracted from the pedestrian image and centred at each salient location, and suppose that S assumes different values in the range [a, b]. At each scale S, plane rotations of the image are computed with directions in the set D = {d_i : i = 1, …, n_d}. Let p_i^j be the i-th patch extracted from the image after rotating the image plane in direction j. The correlation measure c_i^j between the patch p_i^j and its reflected version p_i^{0,j} is computed as follows:

c_i^j = (1/W) Σ_{x,y=1}^{S} (p_i^j ∘ p_i^{0,j})_{x,y}    (1)

where

W = Σ_{x,y=1}^{S} (p_i^j)^2_{x,y}

and (p_i^j ∘ p_i^{0,j}) is the matrix with elements (p_i^j ∘ p_i^{0,j})_{x,y} = (p_i^j)_{x,y} · (p_i^{0,j})_{x,y}. For color images, each of the three bands of the patch is reflected before computing the correlation.

Furthermore, KIMA ∈ R^3 collects measures of symmetry for three different values of the scale S, obtained by selecting the maximum correlation value over all directions in D. Given a value of S, the corresponding component of KIMA for p_i^j is computed as follows:

KIMA[S] = max_j { c_i^j }    (2)

Besides, TIMA ∈ R^3 collects the directions corresponding to the maximum correlation values for the same values of the scale. Given a value of S, the corresponding component of TIMA for the i-th patch is computed as follows:

TIMA[S] = argmax_j { c_i^j }    (3)

We selected S in the set {2, 4, 8} in our experiments.

2.3.3. Texture description

For each keypoint kj, textural filter responses are computed (TEXTON ∈ R^6 in the feature vector). Texture features are based on the technique proposed in [12]. Specifically, a two-dimensional patch is considered for each interest point. The method uses three spatial filters of size 13 × 13. The first is a Gaussian filter with σ = 0.5, the second is a LoG filter with σ = 0.7. Finally, the third filter is of the form

F(r, σ, τ) = cos(π τ t / σ) e^{−t²/(2σ²)}    (4)

where σ is the standard deviation of the Gaussian and τ is the frequency of the sine wave. As can be observed, the filter response grows



if τ > σ, while it decreases if τ < σ; the parameters have no influence when τ = σ. In our case, σ and τ are both equal to 2. Each two-dimensional patch is convolved with these filters and then thresholded so as to keep only the positive responses. The obtained binary image is called the "on" response. Thus, three binary images are obtained, one for each filter. The "on" responses are further convolved with average filters, of size 3 × 3 and 6 × 6 respectively, so as to obtain a multi-scale representation. At the next step, images are converted to grayscale and the intensity is normalized to zero mean and unit variance. All filters are normalized using an L1 normalization, so that the responses of each filter lie in the same range, i.e. each filter F_i is divided by ||F_i||.
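The texture filter bank just described can be sketched as below. This is an illustrative reading, not the reference implementation: the LoG is left unnormalized, the radial variable of Eq. (4) is our assumption, and the six TEXTON components are taken here as mean pooled responses, one per filter and averaging scale.

```python
import numpy as np
from scipy.signal import convolve2d

def filter_bank(size=13):
    """Three size x size filters: Gaussian (sigma=0.5), LoG (sigma=0.7)
    and the cosine-modulated Gaussian of Eq. (4) with sigma = tau = 2.
    All filters are L1-normalized so responses share the same range."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = x ** 2 + y ** 2

    g = np.exp(-r2 / (2 * 0.5 ** 2))                        # Gaussian
    log = (r2 - 2 * 0.7 ** 2) * np.exp(-r2 / (2 * 0.7 ** 2))  # LoG
    s, t = 2.0, 2.0
    f3 = np.cos(np.pi * t * np.sqrt(r2) / s) * np.exp(-r2 / (2 * s ** 2))

    return [f / np.abs(f).sum() for f in (g, log, f3)]

def texton_features(patch):
    """Six responses per patch: each filter's positive ('on') response,
    pooled with 3x3 and 6x6 average filters (two scales)."""
    feats = []
    for f in filter_bank():
        on = np.maximum(convolve2d(patch, f, mode='same'), 0.0)
        for k in (3, 6):
            pooled = convolve2d(on, np.ones((k, k)) / k ** 2, mode='same')
            feats.append(pooled.mean())
    return np.array(feats)

patch = np.random.default_rng(0).random((13, 13))
tex = texton_features(patch)  # 6-dimensional TEXTON vector
```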

2.3.4. Color description

The last three components of the feature vector f_j are the average values, over the three channels R, G, B, of a patch p_i of side K centred at k_j (K = 4 in our experiments). The R,G,B space is adopted to describe local chromatic content, since the Euclidean distance used to compare local features is not appropriate for the H,S,V space. However, we exploit the H,S,V color histogram to encode global color content, as it defines similarity between images better than R,G,B.
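Returning to the symmetry descriptor of Section 2.3.2, the correlation measure of Eq. (1) and the per-scale KIMA/TIMA selection of Eqs. (2) and (3) can be sketched as follows. This is a toy sketch: reflection is realized as a horizontal flip, and the direction set D is stood in for by a plain list of pre-rotated patches.

```python
import numpy as np

def reflection_correlation(patch):
    """Eq. (1): correlation between a patch and its horizontally
    flipped version, normalized by W = sum of squared patch values."""
    p = np.asarray(patch, dtype=float)
    flipped = p[:, ::-1]
    w = (p ** 2).sum()
    return (p * flipped).sum() / w

def kima_tima(patches_by_direction):
    """For one scale S: KIMA = max correlation over directions (Eq. (2)),
    TIMA = index of the best direction (Eq. (3))."""
    cs = [reflection_correlation(p) for p in patches_by_direction]
    return max(cs), int(np.argmax(cs))

# Toy usage: a mirror-symmetric patch yields correlation 1, so the
# symmetric direction wins over a non-symmetric one.
sym = np.array([[1.0, 2.0, 1.0], [3.0, 4.0, 3.0]])
kima, tima = kima_tima([sym, np.eye(3)])
```

Repeating this for S ∈ {2, 4, 8} yields the three KIMA and three TIMA components of the feature vector.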

Fig. 3. Examples of graph representation.

2.4. Salient feature graph

Each salient location detected by ASIFT [27] and filtered by the symmetry axis is enriched with the features described above. Each pedestrian image is then represented by a k-nearest neighbor graph, in order to take structural relations into account, so that the problem of person re-identification becomes a graph matching problem. In general, a graph G is a 3-tuple G = (V, E, Z), where V is a set of vertices, E ⊆ V × V is a set of edges, with e = (v_i, v_j) ∈ E the edge joining nodes v_i, v_j ∈ V, and Z is a set of vectors, where z_i ∈ Z is the vector of attributes associated with node v_i ∈ V. We build the graph for each pedestrian image as follows:

1. Vertices are the image keypoints detected by the ASIFT algorithm. We associate with each vertex a feature vector f_j as described in Section 2.3.
2. Two vertices v_i and v_j are connected by an edge if the Euclidean distance between v_i and v_j is among the k smallest distances from v_i to the other n vertices. The edge cost represents a measure of the distance between the vertices.

In our experiments we set k = 2√n, where n is the number of ASIFT locations selected in the current frame. Fig. 3 shows examples of k-nearest neighbor graphs representing two pedestrians; the vertices are drawn in red and the edges joining two vertices in blue.

2.5. Weighted color histogram

An HSV histogram is adopted as a global feature to encode the chromatic content of each part of the pedestrian. Moreover, the weighted histogram [6] is built taking into consideration the distance from the symmetry axis. Each pixel is weighted by a one-dimensional Gaussian kernel N(μ, σ), where μ is the y coordinate of the symmetry axis and σ is a standard deviation set to J/4. In this way, pixel values near the symmetry axis count more in the final histogram.

3. Pedestrian matching

This section deals with how to match pedestrian representations to solve the re-identification task. Two distance measures are adopted. The former is based on the number of keypoints positively matched according to the ratio test, while the latter also takes structural relations into account, as the result of a graph matching.

3.1. Ratio test

A feature vector f_i^A from a source image A is positively matched to a feature vector f_k^B from a destination image B if

dist(f_i^A, f_k^B) = min_j dist(f_i^A, f_j^B),   j = 1, …, m    (5)

and

dist(f_i^A, f_k^B) / dist(f_i^A, f_{k2}^B) < ρ,   0 < ρ ≤ 1    (6)

where dist() is the Euclidean distance, f_{k2}^B is the descriptor of the destination image with the second smallest distance from f_i^A, and ρ is a ratio value controlling the tolerance to false positives. Therefore, a distance measure d_RATIO for comparing pedestrians is computed, starting from the number of feature vectors positively matched between the source image A and the destination image B.

3.2. Graph matching

One main point is how to measure the contribution of matching one node to another with regard to the structural relations. We adopt a revised version of the algorithm reported in [24]. The distance between the vertices of two graphs is obtained through the combination of two measures: a measure of similarity and a measure of consistency. The former is the Euclidean distance between the feature vectors; the latter also takes the links between the vertices into account.
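The ratio test of Section 3.1 (Eqs. (5) and (6)) can be sketched as follows; `ratio_test_matches` is our illustrative name, and the descriptors are plain numpy rows standing in for the 147-dimensional feature vectors.

```python
import numpy as np

def ratio_test_matches(feats_a, feats_b, rho=0.8):
    """Match each descriptor of image A to its nearest descriptor in
    image B (Eq. (5)), keeping the match only when the smallest distance
    is below rho times the second smallest (Eq. (6)).
    Returns the pairs (i, k) of positively matched indices."""
    matches = []
    for i, fa in enumerate(feats_a):
        d = np.linalg.norm(feats_b - fa, axis=1)  # Euclidean distances
        k, k2 = np.argsort(d)[:2]                 # best and second best
        if d[k] < rho * d[k2]:
            matches.append((i, int(k)))
    return matches

# Toy usage: both descriptors of A have unambiguous matches in B.
A = np.array([[0.0, 0.0], [5.0, 5.0]])
B = np.array([[0.1, 0.0], [3.0, 4.0], [5.0, 5.1]])
m = ratio_test_matches(A, B, rho=0.8)
```

The number of surviving pairs is what the paper turns into the distance d_RATIO.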



The algorithm for matching two graphs G_D = (V_D, E_D, Z_D) and G_M = (V_M, E_M, Z_M), having m and n vertices respectively, is the following:

1. The matching matrix S^(1) is initialized at the first iteration.
2. The matrix Ω of combined coefficients is computed.
3. The matching matrix S is updated.

To initialize the matching matrix S^(1), the ratio test procedure is applied to the vertices of the two graphs. It requires about N × N × D operations, where D is the length of the feature vectors (in our case D = 147). Let Q_{i,j} be a measure of structural consistency between v_i and v_j, and R_{i,j} their measure of similarity. Then each element W_{i,j} of the matrix Ω is computed as

W_{i,j} = Q_{i,j} R_{i,j}    (7)

More formally, for the match v_i ∈ G_D → v_j ∈ G_M, Q_{i,j} is given by

Q_{i,j} = exp[ μ Σ_{k ∈ V_D} Σ_{l ∈ V_M} AD_{(i,k)} AM_{(j,l)} S_{k,l} ]    (8)

where AD and AM are the adjacency matrices of G_D and G_M, respectively (i.e., AD_{(i,k)} = 1 if there is an edge joining v_i and v_k; similarly, AM_{(j,l)} = 1 if there is an edge joining v_j and v_l), S_{k,l} is an element of the N × M current matching matrix S, and μ > 0 is a control parameter used to avoid local minima. The matching matrix S is a binary matrix defining an injective mapping between the two graphs G_D and G_M, such that an element S_{i,j} ∈ S is set to 1 if node v_i ∈ V_D is matched to node v_j ∈ V_M, and to 0 otherwise. R_{i,j}, in turn, is given by

R_{i,j} = 1 / dist(f_i^D, f_j^M)    (9)

where f_i^D and f_j^M are the feature vectors related to v_i ∈ G_D and v_j ∈ G_M, respectively. The computation of the matrix Ω requires about (MN + D)(MN) operations, where D is the size of the feature vector. In the last step, the maximum element of each row of Ω is found and the corresponding element of the matrix S is activated; this step requires about m(n − 1) operations. In brief, the measure of similarity between two vertices will be higher if they are not only close in feature space but also have similar vertices as neighbors. In our experiments, we set ρ = 0.8 (to control the acceptance rate) and the control parameter μ = 0.15. Finally, a distance measure d_GRAPH is obtained from the number of nodes positively matched between two pedestrians.

3.3. Final distance measures

The final distance measures used to compare pedestrians are

dist_1(A, B) = α · dist_WH(WH(A), WH(B)) + β · dist_GRAPH(GRAPH(A), GRAPH(B))    (10)

where WH() is the weighted color histogram computed taking into consideration the distance of each pixel from the symmetry axis, as described in Section 2.5, GRAPH() is the graph representation of the pedestrian image defined in Section 2.4, dist_WH is the Bhattacharyya distance used to compare histograms, and dist_GRAPH is the graph distance described in Section 3.2. Finally, α and β are weighting values adopted to give different emphasis to dist_WH and dist_GRAPH.

Conversely, Eq. (11) does not take structural relations into account and allows us to evaluate our feature vectors while saving computation time:

dist_2(A, B) = α · dist_WH(WH(A), WH(B)) + β · dist_RATIO(SET(A), SET(B))    (11)

It needs just MD operations, against the [(MN) + D](MN) operations roughly required by Eq. (10). Here SET() is the set of salient locations, while dist_RATIO is the distance measure computed from the number of keypoints positively matched according to the ratio test defined in Section 3.1.
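Our reading of the update loop of Section 3.2 can be sketched as below. It is a simplified sketch: the initialization uses nearest features as a stand-in for the ratio-test initialization, and ties as well as the injectivity constraint on S are ignored.

```python
import numpy as np

def graph_match(AD, AM, FD, FM, mu=0.15, iters=5, eps=1e-9):
    """Sketch of the graph matching iteration (Eqs. (7)-(9)):
    combine structural consistency Q with feature similarity R and
    greedily refresh the matching matrix S row by row.

    AD, AM : adjacency matrices of the two graphs
    FD, FM : node feature matrices (n x D and m x D)
    """
    n, m = len(FD), len(FM)
    # Pairwise Euclidean distances between node features.
    dist = np.linalg.norm(FD[:, None, :] - FM[None, :, :], axis=2)
    R = 1.0 / (dist + eps)                       # Eq. (9), similarity
    S = np.zeros((n, m))
    S[np.arange(n), np.argmin(dist, axis=1)] = 1.0  # simplified init
    for _ in range(iters):
        # (AD @ S @ AM.T)[i, j] = sum_k sum_l AD[i,k] AM[j,l] S[k,l]
        Q = np.exp(mu * AD @ S @ AM.T)           # Eq. (8), consistency
        W = Q * R                                # Eq. (7)
        S = np.zeros((n, m))
        S[np.arange(n), np.argmax(W, axis=1)] = 1.0  # row-wise update
    return S

# Toy usage: two identical 3-node path graphs with identical features
# should match node-for-node, i.e. S converges to the identity.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
F = np.array([[0.0], [1.0], [2.0]])
S = graph_match(A, A, F, F)
```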

4. Experiments and analysis

Our approach has been evaluated on the re-identification task on the i-LIDS MCTS dataset [26]. Re-identification performance is reported in terms of Cumulative Matching Characteristic (CMC) and Synthetic Disambiguation/Reacquisition Rate (SDR) curves. The CMC [8] represents the re-identification rate of the system as a function of the rank position; this measure is analogous to the ROC curve for detection problems [29]. The SRR or SDR, on the other hand, converts a performance metric of size N into one of size M, showing the probability that any of the M best matches is correct. We provide four test cases in order to compare local and global features and to evaluate the contribution of our descriptor based on symmetry and texture. Finally, our method is compared with the results achieved by state-of-the-art methods on the same i-LIDS MCTS dataset with the same performance measures.

4.1. Dataset

The performance of our method is evaluated on the i-LIDS MCTS dataset, corresponding to a real scenario observed by a multi-camera CCTV network monitoring the arrival hall of an airport. The i-LIDS MCTS dataset contains 479 images of variable size, for a total of 119 pedestrians. Each pedestrian is characterized by at least two images representing the same person wearing the same clothes. The scenario is affected by several problems, such as illumination changes, occlusions, shape deformation and image blurring, which make the re-identification process challenging.
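The CMC evaluation used throughout this section can be sketched as follows; this is a minimal sketch with our own variable names, assuming every probe identity appears in the gallery.

```python
import numpy as np

def cmc_curve(dist, probe_ids, gallery_ids):
    """Cumulative Matching Characteristic: entry k-1 is the fraction of
    probes whose correct gallery identity appears among the k
    best-ranked matches.

    dist: (P, G) matrix of probe-to-gallery distances.
    """
    order = np.argsort(dist, axis=1)                # best match first
    ranked_ids = np.asarray(gallery_ids)[order]
    hits = ranked_ids == np.asarray(probe_ids)[:, None]
    ranks = hits.argmax(axis=1)                     # 0-based rank of hit
    G = dist.shape[1]
    return np.array([(ranks < k).mean() for k in range(1, G + 1)])

# Toy usage: two probes against a gallery of three identities; the
# first probe matches at rank 1, the second at rank 2.
d = np.array([[0.2, 0.9, 0.5],
              [0.7, 0.6, 0.1]])
cmc = cmc_curve(d, probe_ids=[0, 1], gallery_ids=[0, 1, 2])
```

In the MvsS protocol below, the distance matrix is rebuilt for each of the 100 random gallery draws and the resulting curves are averaged.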

4.2. Experimental setup

To evaluate our approach, an MvsS experiment is performed: one image of each person is randomly chosen to set up the gallery set G, while the probe set P is composed of the remaining images. Each image p_i ∈ P is compared with each image g_j ∈ G in order to evaluate the matching (score) distance. We repeat the MvsS experiment described above 100 times in order to build the CMC curve. Four test cases are designed to quantify the contribution of our descriptor based on symmetry and texture and to compare local and global features. The cases differ from each other in the kind of descriptor used for pedestrian representation.

- Case A: the ASIFT descriptor is adopted as local feature.
- Case B: the ASIFT descriptor enriched by symmetry and texture is used as local feature.
- Case C: the global feature (Weighted Color Histogram) in combination with local features based only on ASIFT descriptors is adopted.



- Case D: the global feature (Weighted Color Histogram) in combination with local features based on the ASIFT descriptor enriched by symmetry and texture is adopted.

As it appears, just local features are used in cases A and B, while global features in combination with local features are adopted in cases C and D. Furthermore, our descriptor based on symmetry and texture is used in cases B and D. Experiments for each test case are performed while changing the value of the parameter d: according to our approach, only the keypoints kj located at a distance x ≤ d from the reflection symmetry axis RFS are adopted for pedestrian representation. Eq. (11) is adopted to evaluate the matching distance and establish the best feature combination, saving computation time.

Table 1. Case A.

d     Rank 1   Rank 2   Rank 3   Rank 4   Rank 5   Rank 6   Rank 12
8      8.94    12.96    16.12    18.81    21.08    23.23    31.85
12    12.44    17.58    20.99    23.85    25.89    27.58    36.19
16    13.69    18.67    22.54    26.05    29.02    31.54    42.45
20    15.72    22.09    25.42    29.39    32.82    35.79    46.17
22    16.41    22.71    27.01    29.97    32.45    34.24    42.53
24    17.42    23.39    26.61    29.21    31.66    33.84    42.89
26    17.48    24.59    28.26    30.89    33.34    36.24    44.37
28    18.56    24.34    27.78    31.19    33.73    36.25    46.09

Finally, Eq.

(10) is applied to take structural relations into account as well, and to reach a better re-identification rate.

4.3. Experimental results: cases A, B, C, D

Experimental results for the test cases described above are shown in this section. Table 1 contains the re-identification percentages at ranks {1, 2, 3, 4, 5, 6, 12} for different values of d in test case A. Images A and B depicting the same pedestrian produce a rank-k re-identification if B is correctly identified as A when B is at the k-th smallest distance from A. Fig. 4 shows the changes in recognition rate over different values of the parameter d in terms of CMC curves. The vertical axis of the graph provides the recognition percentages from distinct experiments performed using various sets of keypoints for pedestrian representation, obtained by changing the value of the parameter d; each color is related to a distinct test run with a specific value of d. Moreover, the set of keypoints becomes smaller when the value of d decreases, improving the computation time required by the matching. From the plot in Fig. 4 it can be observed that the trend of the CMC curve is fairly similar for values of d ∈ {20, 22, 28}, while the recognition percentage dramatically decreases for lower values of d. Table 2 shows the recognition percentages at ranks {1, 2, 3, 4, 5, 6, 12} for distinct values d ∈ {8, 12, 16, 20, 22, 24, 26, 28} in test case B, while the plot in Fig. 5 provides the same results in terms of CMC

Fig. 4. Case A: ASIFT descriptor is adopted as local feature. (For the interpretation of the reference to color in this figure caption, the reader is referred to the web version of this paper.)

Fig. 5. Case B: ASIFT descriptor enriched by symmetry and texture is used as local feature. (For the interpretation of the reference to color in this figure caption, the reader is referred to the web version of this paper.)

Table 2. Case B.

d     Rank 1   Rank 2   Rank 3   Rank 4   Rank 5   Rank 6   Rank 12
8     10.63    14.66    18.39    20.53    22.01    23.42    31.81
12    12.60    16.92    19.82    22.27    24.57    26.47    37.19
16    15.45    20.30    23.26    25.53    27.20    28.64    36.66
20    18.05    22.74    26.01    28.91    31.13    32.97    42.48
22    16.52    22.03    26.05    29.46    31.95    34.02    43.89
24    17.48    23.78    27.56    30.17    32.21    34.17    43.40
26    18.23    24.62    28.05    30.72    32.63    34.24    44.58
28    18.85    26.20    29.79    31.58    33.26    35.04    45.45

Table 3. Case C.

d     Rank 1   Rank 2   Rank 3   Rank 4   Rank 5   Rank 6   Rank 12
8     36.91    48.45    53.25    56.86    59.97    62.59    73.75
12    38.04    49.40    56.70    61.13    64.27    66.53    76.23
16    40.11    51.50    58.97    63.66    66.09    68.03    77.36
20    41.89    53.78    59.64    64.26    67.36    69.82    79.97
22    41.31    53.82    59.57    64.18    67.88    70.49    80.49
24    40.40    52.92    59.55    63.69    66.84    69.98    80.41
26    41.47    54.89    60.70    65.17    68.67    71.74    80.87
28    42.11    55.28    61.05    64.49    68.07    70.94    81.60



Fig. 6. Case C: global feature (Weighted Color Histogram) in combination with local features based just on ASIFT descriptors is adopted. (For the interpretation of the reference to color in this figure caption, the reader is referred to the web version of this paper.)


Fig. 8. Ratio test wrt graph test.

Table 4 Case D.

Fig. 7. Case D: global feature (Weighted Color Histogram) in combination with local features based on ASIFT descriptor enriched by symmetry and texture is adopted. (For the interpretation of the reference to color in this figure caption, the reader is referred to the web version of this paper.)

curves. Cases A and B are compared in order to observe the effect of our descriptor based on symmetry and texture. Furthermore, each cell of Table 1 is compared to the corresponding cell of Table 2 at the same position. The bold type highlights the highest re-identification rate for the same kind of recognition and value of the d parameter. It could be observed that case B slightly improves on rank 1 recognition percentage with all values d. This result supports our conjecture that our descriptor, based on symmetry and texture, is highly specific, useful in a real time context characterized by a high number of pedestrians. Indeed, provided similarity measures of a certain pedestrian against the others, rank 1 re-identification requires just the maximum similarity computation, so saving computational times. In case C (see Table 3 and Fig. 6), by the use of global feature, re-identification rate strongly improves for all d values. Conversely,

d     Rank 1   Rank 2   Rank 3   Rank 4   Rank 5   Rank 6   Rank 12
8     35.77    47.43    53.17    56.75    59.84    62.49    72.51
12    37.60    48.68    55.41    60.12    62.66    64.81    74.80
16    39.83    50.60    57.08    61.67    64.56    66.27    77.65
20    40.33    50.99    58.47    64.86    67.99    70.16    79.81
22    40.79    54.25    60.82    64.85    68.21    70.79    78.80
24    41.70    54.26    60.40    64.64    67.96    70.06    77.35
26    41.20    53.55    59.28    64.72    67.78    69.71    78.78
28    42.44    54.63    60.66    65.13    68.28    70.33    79.58

the re-identification rate slightly decreases in case D (see Table 4), where our symmetry-based descriptor is combined with the global feature. In practice, the global feature and our local descriptors are both able to recognize different pedestrian classes, but the effect of the global feature is stronger, making the re-identification percentages slightly lower. From the CMC curve in Fig. 7, it can be observed that the trend is quite similar for d ∈ {22, 26, 28}. Moreover, setting d = 22, only 88% of the ASIFT keypoints are used to represent a pedestrian, as shown in Fig. 2b; therefore, 12% of the ASIFT keypoints can be discarded for each pedestrian, reducing the number of keypoints required by the re-identification process.
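The keypoint pruning described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a vertical global symmetry axis and a simple horizontal pixel distance, and the function and variable names are hypothetical.

```python
def select_keypoints(keypoints, axis_x, d=22):
    """Keep only the ASIFT keypoints whose horizontal distance from the
    global (vertical) symmetry axis is at most d pixels; farther keypoints
    are treated as less informative and discarded."""
    return [(x, y) for (x, y) in keypoints if abs(x - axis_x) <= d]

# Toy example: four keypoints, symmetry axis at x = 32, threshold d = 22.
kps = [(10, 5), (30, 8), (55, 12), (40, 20)]
print(select_keypoints(kps, axis_x=32, d=22))  # [(10, 5), (30, 8), (40, 20)]
```

With a suitable d, the pruned set retains most keypoints (88% at d = 22 in the experiments) while shrinking the matching workload.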

4.4. Experimental results: ratio test wrt graph test

Taking structural relations into account through Eq. (10), a better re-identification rate is obtained, as shown in Fig. 8. The increase in re-identification rate due to the graph representation is observed up to rank 35, while for higher rank values both curves tend to be fairly similar. This experiment shows that the discriminative power of the salient features can be improved by considering structural relations (Table 4). Nevertheless, the computational cost clearly increases, since graph matching requires (MN) + D(MN) operations instead of the MD operations merely required by the ratio test, as discussed in Section 3.2.
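The cost trade-off above can be sketched numerically. Treating M and N as the keypoint counts of the probe and of a gallery image, and D as the gallery size, is an interpretation assumed here; the concrete figures below are illustrative only.

```python
def ratio_test_ops(M: int, D: int) -> int:
    # Ratio test: each of the M probe keypoints is compared once
    # against each of the D gallery pedestrians.
    return M * D

def graph_match_ops(M: int, N: int, D: int) -> int:
    # Graph matching: all M*N keypoint pairs are scored to build the
    # structural affinities, then re-evaluated against each of the
    # D gallery graphs: (MN) + D(MN) operations in total.
    return (M * N) + D * (M * N)

# Illustrative sizes (assumed, not from the paper): 100 probe keypoints,
# 100 gallery keypoints per pedestrian, 119 gallery pedestrians.
print(ratio_test_ops(100, 119))        # 11900
print(graph_match_ops(100, 100, 119))  # 1200000
```

Even for modest sizes, the graph test costs roughly N times more than the ratio test, which is the price paid for the rank improvement up to rank 35.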




4.5. Experimental results: local features wrt global features

Further experiments have been conducted to analyze the sensitivity of the method to the α and β parameters. As shown in Eq. (11), these parameters give different emphasis to global and local features. In particular, case D, using the graph pedestrian representation and setting d = 22, has been analyzed while varying the values of α and β. The plot in Fig. 9 shows CMC curves for distinct values of α and β; each curve corresponds to a test run with a particular parameter configuration. The best results are achieved by setting α = 0.6 and β = 0.4, giving roughly equal emphasis to global and local features. Nevertheless, recognition percentages decrease dramatically for lower values of α; for this reason the global features cannot be omitted. Finally, the bar chart (Fig. 10) provides greater detail about the recognition percentages of rank 1 and 12 over the distinct parameter configurations. As shown, the recognition percentages for rank 1 and rank 12 peak at α = 0.6 and β = 0.4, reaching 44.54% and 79.82%, respectively.
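The role of α and β can be illustrated with a short sketch. The exact form of Eq. (11) is not reproduced in this section, so the weighted sum below is an assumption, and all names (`combined_similarity`, `rank1`) and scores are hypothetical.

```python
def combined_similarity(sim_global: float, sim_local: float,
                        alpha: float = 0.6, beta: float = 0.4) -> float:
    # Weighted combination in the spirit of Eq. (11): alpha weighs the
    # global feature (Weighted Color Histogram), beta the local salient
    # features. alpha = 0.6, beta = 0.4 gave the best results in case D.
    return alpha * sim_global + beta * sim_local

def rank1(candidates):
    # Rank 1 re-identification: pick the gallery pedestrian whose
    # combined similarity to the probe is maximal.
    scores = [combined_similarity(g, l) for g, l in candidates]
    return max(range(len(scores)), key=scores.__getitem__)

# Three gallery candidates as (global, local) similarity pairs.
print(rank1([(0.2, 0.9), (0.8, 0.1), (0.5, 0.5)]))  # 1
```

Note how candidate 1 wins despite its weak local score: with α > β, the global similarity dominates, which is consistent with the drop in recognition observed for low values of α.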

4.6. Experimental results: our approach wrt state-of-the-art

Fig. 9. CMC curves on varying the parameters α and β. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

Fig. 10. Recognition percentages of Rank 1 and 12 on varying the parameters α and β. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

Fig. 11. Comparison with state-of-the-art results in terms of CMC curves. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

Finally, we compare our approach, setting d = 22, using our local descriptor based on symmetry and taking structural relations into account through the graph pedestrian representation, against the results achieved by state-of-the-art methods. Specifically, Farenzena et al. [6] propose SDALF, an appearance-based method that weighs appearance information extracted from different body parts according to their distance from the symmetry axis calculated over the entire pedestrian image. Zheng et al. [32] tackle the MvsS case




analyzing different types of visual features: Concatenated Histogram, Concatenated Histogram SIFT and CRRO-BRO. As shown by the black curve of Fig. 11, the re-identification rate reaches 44.5% at rank 1, producing results quite similar to our previous approach (in red, where d = ∞) and to SDALF (in green). Moreover, our method outperforms Concatenated Histogram SIFT (in blue), Concatenated Histogram (in cyan) and CRRO-BRO (in magenta). Furthermore, for rank values greater than 3, our method, despite a 12% reduction in the number of ASIFT keypoints adopted to represent each pedestrian, outperforms the results achieved when all ASIFT locations are adopted and, for certain rank values, also SDALF, reaching a 71.71% re-identification rate at rank 6 against the 70% reached by SDALF. Analogous results are shown in Fig. 12 in terms of SDR curves.

4.7. Examples of re-identification successes/failures

Fig. 12. Comparison with state-of-the-art results in terms of SDR curves. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

Several problems can affect the re-identification process, such as illumination change, pose variation, occlusion and scaling, making the re-identification problem very hard. The proposed approach still works under these conditions, achieving re-identification success in the cases shown below.

Fig. 13. Illumination change.

Fig. 14. Pose variation.




Fig. 15. Occlusion.

Fig. 16. Scaling.

Furthermore, our method succeeds in the sequence of images shown in Fig. 13, where there is a clear illumination change in the scene, and also in the sequences of Figs. 14-16, under conditions of pose variation, partial occlusion and scaling, respectively. However, the method frequently fails when multiple problems occur in the same scene. For instance, a re-identification failure occurs when there are both a strong illumination change and a pose variation, as can be observed in the sequence of Fig. 17. Finally, the approach also fails under full occlusion, as in the cases shown in Figs. 18 and 19. Further knowledge, such as temporal information, would be necessary to face this kind of problem, where the target is completely occluded.

5. Conclusions

Fig. 17. Pose and illumination changes.

We report a new feature detector and descriptor based on ASIFT, enriched by local symmetries densely detected across the image and scale space, and collected together in a graph representation. ASIFT locations are selected according to their distance from the symmetry axis, in order to discard less informative keypoints and to reduce the computation time required by the re-identification task. The basic features are designed for finding correspondences




Fig. 18. Pose variation and strong occlusion.

Fig. 19. Strong occlusion and illumination change.

between difficult image pairs rich in symmetries, while the graph representation is able to capture structural relations. A graph matching method, aimed at solving the point-set correspondence problem, takes relative structural and geometrical measurements into account. The results on the i-LIDS MCTS dataset demonstrate that ASIFT features enriched by local symmetries and the structure of an individual are very informative for re-identification. The idea appears completely new and promising, giving a new perspective on further improvements in terms of accuracy, mainly concerning the tuning of parameter values and a sensitivity analysis.

Conflict of interest statement

None declared.

References

[1] M. Bauml, K. Bernardin, M. Fischer, H.K. Ekenel, Multi-pose face recognition for person retrieval in camera networks, in: IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Boston, MA, USA, 2010.
[2] D. Cabrini Hauagge, N. Snavely, Image matching using local symmetry features, in: Proceedings of IEEE CVPR 2012, 2012.
[3] Y. Cai, M. Pietikäinen, Person re-identification based on global context, in: Proceedings of ACCV Workshops, 2010.
[4] G. Csurka, C.R. Dance, L. Fan, J. Willamowski, C. Bray, Visual categorization with bags of keypoints, in: ECCV Workshop on Statistical Learning in Computer Vision, 2004.
[5] V. Di Gesù, B. Zavidovique, The s-kernel: a measure of symmetry of objects, Pattern Recognit. 40 (3) (2007) 839-852.
[6] M. Farenzena, L. Bazzani, A. Perina, V. Murino, M. Cristani, Person re-identification by symmetry-driven accumulation of local features, in: Proceedings of IEEE CVPR 2010, 2010.
[7] D. Filliat, A visual bag of words method for interactive qualitative localization and mapping, in: IEEE International Conference on Robotics and Automation (ICRA), 2007, pp. 3921-3926.
[8] D. Gray, S. Brennan, H. Tao, Evaluating appearance models for recognition, reacquisition, and tracking, in: IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS), 2007.
[10] S. Kondra, A. Petrosino, S. Iodice, Multi-scale kernel operators for reflection and rotation symmetry: further achievements, in: Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '13), IEEE Computer Society, Washington, DC, USA, 2013, pp. 217-222.
[11] S. Kondra, A. Petrosino, Self-similarity and points of interest in textured images, in: Perception and Machine Intelligence, 2012, pp. 306-313.
[12] S. Kondra, V. Torre, Texture classification using three circular filters, in: 6th Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), Bhubaneshwar, India, 2008.
[13] C.-H. Kuo, C. Huang, Inter-camera association of multi-target tracks by on-line learned appearance affinity models, in: Proceedings of ECCV 2010, Hersonissos, Crete, Greece, 2010.
[14] R. van Lier, J.S. Wagemans, From images to objects: global and local completion of self-occluded parts, J. Exp. Psychol.: Hum. Percept. Perform. 25 (1999) 1721-1741.
[15] Y. Liu, H. Hel-Or, C. Kaplan, L. Van Gool, Computational symmetry in computer vision and computer graphics, Found. Trends Comput. Graph. Vis. 5 (1-2) (2010) 1-195.




[16] D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. 60 (2) (2004) 91-110.
[17] R. Mazzon, S. Fahad Tahir, A. Cavallaro, Person re-identification in crowd, Pattern Recognit. Lett. 33 (14) (2012) 1828-1837.
[18] J. Neira, J.D. Tardos, Data association in stochastic mapping using the joint compatibility test, IEEE Trans. Robot. Autom. 17 (6) (2001) 890-897.
[19] S.R. Palmer, The role of symmetry in shape perception, Acta Psychol. 59 (1985) 67-90.
[21] B. Prosser, S. Gong, T. Xiang, Multi-camera matching using bi-directional cumulative brightness transfer functions, in: Proceedings of the British Machine Vision Conference, Leeds, UK, 2008.
[22] B. Prosser, W.S. Zheng, S. Gong, T. Xiang, Person re-identification by support vector ranking, in: Proceedings of the British Machine Vision Conference, Aberystwyth, UK, 2010.
[24] G. Sanromà, R. Alquézar Mancho, F. Serratosa i Casanelles, A new graph matching method for point-set correspondence using the EM algorithm and softassign, Comput. Vis. Image Underst. 116 (2) (2012) 292-304.
[26] UK Home Office, i-LIDS Multiple Camera Tracking Scenario Definition, 2008.
[27] G. Yu, J.-M. Morel, ASIFT: an algorithm for fully affine invariant comparison, Image Processing On Line, 2011. http://dx.doi.org/10.5201/ipol.2011.my-asift.
[28] P. Wenderoth, The salience of vertical symmetry, Perception 23 (1994) 221-236.
[29] C. Wilson, A.R. Hicklin, M. Bone, H. Korves, P. Grother, B. Ulery, R. Micheals, M. Zoep, S. Otto, C. Watson, Fingerprint Vendor Technology Evaluation 2003: Summary of Results and Analysis Report, Technical Report NISTIR 7123, NIST, 2004.
[30] H. Zabrodsky, Symmetry—A Review, Technical Report 90-16, CS Dept., The Hebrew University of Jerusalem, 1990.
[32] W.S. Zheng, S. Gong, T. Xiang, Person re-identification by probabilistic relative distance comparison, in: Proceedings of IEEE CVPR, Colorado Springs, CO, USA, 2011.

Sara Iodice received the Master's Degree in Computer Science from the University of Naples Parthenope. Currently, she is a Fellow Researcher at the CVPRLab of the University of Naples Parthenope. Her research interests are data mining and knowledge discovery, pattern recognition, soft computing, and spatio-temporal data analysis. She is a student member of the IEEE Computer Society.

Alfredo Petrosino is a professor of Computer Science at the University of Naples Parthenope, where he heads the research laboratory CVPRLab (cvprlab.uniparthenope.it). He held positions at the University of Salerno, the International Institute of Advanced Scientific Studies (IIASS), the National Institute for the Physics of Matter (INFM), and lastly as a Researcher and Senior Researcher at the National Research Council (CNR). He taught at the Universities of Salerno, Siena, Naples Federico II and Naples Parthenope. He is a Senior Member of the IEEE and a member of the International Association for Pattern Recognition and the International Neural Network Society. He co-edited six books and authored more than one hundred research publications in the areas of Computer Vision, Image and Video Analysis, Pattern Recognition, Neural Networks, Fuzzy and Rough Sets, and Data Mining. He is/was an Associate Editor of the Pattern Recognition journal; a Member of the Editorial Board of Pattern Recognition Letters and the International Journal of Knowledge Engineering and Soft Data Paradigms; and an Editor of IEEE Trans. SMC-Part A, Fuzzy Sets and Systems, Image and Vision Computing, and Parallel Computing.
