

Pattern Recognition journal homepage: www.elsevier.com/locate/pr

Salient feature based graph matching for person re-identification

Sara Iodice¹, Alfredo Petrosino*,¹

Department of Science and Technology, University of Naples Parthenope, Centro Direzionale Is. C4, 80143 Naples, Italy

Article info

Abstract

Article history: Received 2 January 2014 Received in revised form 12 September 2014 Accepted 15 September 2014

We propose a non-learning-based person re-identification approach that uses symmetry principles as well as structural relations among salient features. The idea comes from the consideration that local symmetries at different scales, also reinforced by texture features, are potentially more invariant to large appearance changes than lower-level features such as SIFT and ASIFT. Finally, we formulate re-identification as a graph matching problem, where each person is represented by a graph aimed not only at rejecting erroneous matches but also at selecting additional useful ones. Experimental results on the public i-LIDS dataset show good performance compared with state-of-the-art results.

Keywords: Computational symmetry; Salient features; Graph representation and matching; People re-identification

1. Introduction

Symmetry detection is highly relevant in pattern recognition. Indeed, the description of a figure may differ when it is embedded in a context with horizontal or vertical symmetry [19]. Besides, in tasks requiring the completion of partially occluded visual stimuli, subjects tend to systematically produce symmetrical completions [14]. The concept of symmetry is not univocal: various kinds of image properties are defined as symmetry [30,28]. For instance, a figure has rotational symmetry when it can be rotated by less than 360° around its central point, or axis, and still match the original figure. This cue is peculiar to person re-identification, where the problem consists in recognizing people in different poses from images coming from distinct cameras. This is an important task in video surveillance, where large and structured environments must be supervised (such as airports, metro stations or shopping centres), and it becomes more critical as the cardinality of the gallery set increases. Symmetry is adopted in [6], a person re-identification method that weighs appearance information, extracted from different body parts, according to their distance from symmetry axes computed on the whole figure. Symmetry therefore appears to be used as a global property in that work, not as a local one. Conversely, symmetry is adopted as both a global and a local property in our approach. In particular, the global symmetry axis is exploited to select salient locations representing each

* Corresponding author. E-mail addresses: [email protected] (S. Iodice), [email protected] (A. Petrosino). ¹ Tel./fax: +39 0815476656.

pedestrian, while local symmetry is adopted as a local feature to describe each salient location. Researchers in computer vision have made significant progress in representing and detecting symmetries in images and other types of data [15]. However, there has been relatively little work on using local symmetries as explicit features for matching tasks [2], and no work on matching the same pedestrian across different poses, as in re-identification. The main idea is to use a variety of symmetries, rather than repetitions, together with texture as cues, in order to define a complex feature detector; this is based on the consideration that, at different scales, symmetry is potentially more invariant to large appearance changes than lower-level features such as SIFT and, when combined with a texton-based feature, is highly discriminative [11]. In this context, the main contribution resides in the image features and in the feature organization adopted for matching: the discriminative power of features can be better appreciated when they are organized in an appropriate manner. Most researchers focus their attention on features extracted in various ways from the scene. Harris and SIFT [16] features are important to identify what is distinctive and discriminative for a correct recognition of the scene. The bag-of-words algorithm has been applied to SIFT descriptors to identify discriminative combinations of descriptors [4]. In [18], clustering the descriptors shows that descriptors belonging to a large cluster are less distinctive than those in a small cluster: for example, in indoor navigation, window corners are common, so they are not good features to uniquely identify scenes, while corners found on posters or signs are much better. In [7], an approach based on real-time loop detection has proved to be efficient with a hand-held camera, using SIFT features and intensity and hue

http://dx.doi.org/10.1016/j.patcog.2014.09.011 0031-3203/& 2014 Elsevier Ltd. All rights reserved.

Please cite this article as: S. Iodice, A. Petrosino, Salient feature based graph matching for person re-identiﬁcation, Pattern Recognition (2014), http://dx.doi.org/10.1016/j.patcog.2014.09.011i

S. Iodice, A. Petrosino / Pattern Recognition ∎ (∎∎∎∎) ∎∎∎–∎∎∎


histograms combined using a bag-of-words approach. Up to now, none of the existing approaches has tackled the relations among features in terms of similarity and spatial homogeneity. Our main contribution consists in introducing a graph-based representation, in which regions, with their corresponding feature vectors, and the geometric relationships between these regions are encoded in the form of a graph. Following the idea that an image can be described at a higher level in terms of a nested hierarchy of local symmetries, we present a novel graph matching approach aimed at evolving an initial set of correspondences computed from the local features, as a kind of compromise between the constraints imposed by both the local features and the structural relations. The vertices are the image keypoints detected by SIFT (specifically, we adopt ASIFT [27]), enriched by features based on color, texture, and local symmetry; the cost of an edge joining two vertices represents a measure of their dissimilarity. The problem of person re-identification is therefore formulated as a graph matching problem. Our approach is an appearance-based method that differs from the state of the art in that: (i) unlike [3,21], we give great weight to local features; (ii) we do not adopt spatio-temporal information as in [17,13]; (iii) our method is a non-learning-based one, unlike [32,1,22]. The paper is organized as follows. Section 2 deals with the pedestrian representation. Section 3 provides details about how pedestrian representations are compared with each other in order to solve the re-identification task. Section 4 reports test results on the public i-LIDS dataset and, finally, in Section 5 some conclusions are drawn.

2. Pedestrian representation

The main steps to represent a pedestrian for recognition purposes using visual features are:

1. candidate keypoint extraction;
2. keypoint filtering by the symmetry axis;
3. local feature extraction;
4. graph representation.

First of all, candidate keypoints are extracted from each pedestrian image and filtered according to their distance from the symmetry axis; local features are then computed at each selected location. Finally, each pedestrian image is represented by a graph, in order to take structural relations into account.

2.1. Keypoints filtering by symmetry axis

ASIFT locations [27], fully invariant with respect to zoom, rotation, translation, and the angles defining the camera axis orientation, are extracted from each frame of the dataset as candidate keypoints. The invariance of ASIFT features to affine transformations is very attractive in the person re-identification problem, where pedestrian figures, subject to viewpoint and pose variations, need to be represented for recognition purposes. On the other hand, the property of two pixels belonging to a certain region of being symmetric with respect to a particular axis remains unchanged when an affine transformation is applied to that region; reflective symmetry can thus be considered invariant to affine transformations. We adopt our multi-scale symmetry detection algorithm [10] to extract the symmetry axis corresponding to the region and the direction of maximal correlation. Therefore, under the hypothesis that there are no large pose variations, the symmetry axis detected by the algorithm remains unchanged. For instance, as shown in Fig. 1, if two pixels p1 and p2 belonging to the pedestrian in frame A are symmetric with respect to the reflective symmetry axis RFSa and are located at a distance d from RFSa, they remain exactly symmetric in frame B, where the pedestrian changes pose, at the same distance d' = d from the axis. For this reason, by filtering the ASIFT keypoints located at a distance x ≤ d, the probability of keeping matched keypoints clearly increases, while the number of keypoints that are candidates for the graph representation decreases. Another important aspect is that candidate keypoints within distance d of the symmetry axis are roughly located on the foreground; consequently, background subtraction for figure/background separation can be avoided.

2.2. Keypoints distribution around the symmetry axis

To extract information about the density distribution of candidate keypoints around the symmetry axis, the distances d_j of each keypoint k_j from the reflection symmetry axis RFS_i of the region (head, trunk, or legs) to which k_j belongs are computed for each frame i = 1, ..., n. For the generic distance d we estimate:

- the probability density p(d), describing the probability that an ASIFT keypoint is located exactly at distance d from the reflection symmetry axis RFS;
- the cumulative distribution P(d), describing the probability that an ASIFT keypoint is located at a distance x ≤ d from RFS.

The plots shown in Fig. 2 are built considering all the frames of the i-LIDS MCTS public dataset. The cumulative distribution reaches 58.29% at d = 12, underlining that a wide set of ASIFT keypoints is located close to the reflection symmetry axis. On the other hand, from the plot in Fig. 2a it can be observed that the probability density decreases for values of d ≥ 3. Therefore, ASIFT keypoints located at a greater distance from the reflection symmetry axis are clearly less informative and should be discarded.

Fig. 1. (a) Frame A, (b) Frame B.
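The keypoint-filtering step of Section 2.1 can be sketched as follows; this is a minimal Python illustration with invented array names and a vertical symmetry axis, not the authors' implementation:

```python
import numpy as np

def filter_keypoints_by_axis(keypoints, axis_x, d):
    """Keep only keypoints whose horizontal distance from the vertical
    reflection symmetry axis (located at column axis_x) is at most d pixels.

    keypoints: (N, 2) array of (x, y) locations, e.g. ASIFT detections.
    """
    keypoints = np.asarray(keypoints, dtype=float)
    dist = np.abs(keypoints[:, 0] - axis_x)   # distance of each point from the axis
    return keypoints[dist <= d]

# Toy usage: axis at x = 50, keep points within d = 12 columns of it.
pts = np.array([[40, 10], [70, 20], [55, 30], [10, 5]])
kept = filter_keypoints_by_axis(pts, axis_x=50, d=12)
```

Filtering with a small d both shrinks the candidate set for the graph representation and, as noted above, acts as a rough figure/background separation.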




This reduces the number of keypoints required for the re-identification process.

2.3. Local feature extraction

Each keypoint k_j, selected with respect to the symmetry axis, is enriched with visual features (local symmetry, texture and color), resulting in a final feature vector f_j ∈ R^147. Each feature is normalized by a whitening process (zero mean and unit variance). The feature vector f_j is arranged as {ASIFT ∈ R^132, KIMA ∈ R^3, TIMA ∈ R^3, TEXTON ∈ R^6, RGB ∈ R^3}, where the descriptors are defined as follows.

2.3.1. ASIFT descriptor

The ASIFT descriptor is defined as P_i = (X_i, R_i, U_i), where X_i = (x_i, y_i) is its 2D location, R_i = (r_i, α_i) its scale and orientation respectively (in radians, from −π to π), and U_i = (U_{i,1}, ..., U_{i,128}) a vector of length 128 encoding the local texture of the image, using affine-invariant intrinsic neighborhoods and affine-invariant intrinsic orientation matrices.

2.3.2. Local and multiscale symmetry

We specifically take advantage of a measure obtained by correlating the image with its flipped version around a particular axis. Indeed, in any direction, the optimal symmetry axis corresponds to the maximal correlation of a pattern with its symmetric version [5]. In particular, an enhanced version of the Kondra and Petrosino symmetry detection algorithm, presented at the "Symmetry Detection from Real World Images" competition at IEEE CVPR 2011 and reported in [10], is used to extract measures of symmetry. Our approach won the Reflection Symmetry award at the 2013 edition of the same competition, held at CVPR 2013.

Let S be the side of the patch (scale) extracted from the pedestrian image and centred at each salient location, and suppose S takes values in the range [a, b]. At each scale S, plane rotations of the image are computed for directions in the set D = {d_i : i = 1, ..., n_d}. Let p_i^j be the i-th patch extracted from the image after rotating the image plane in direction j. The correlation measure c_i^j between the patch p_i^j and its reflected version p_i^{′j} is computed as

  c_i^j = (1/W) Σ_{x,y=1}^{S} (p_i^j ∘ p_i^{′j})_{x,y}    (1)

where

  W = Σ_{x,y=1}^{S} (p_i^j)²_{x,y}

and (p_i^j ∘ p_i^{′j}) is the matrix with elements (p_i^j ∘ p_i^{′j})_{x,y} = (p_i^j)_{x,y} (p_i^{′j})_{x,y}. For color images, the patch is reflected in each of the three bands before computing the correlation. Furthermore, KIMA ∈ R^3 collects the measures of symmetry for three different values of the scale S, obtained by selecting the maximum correlation value over the directions in D. Given a value of S, the corresponding component of KIMA is computed as

  KIMA[S] = max_j { c_i^j }    (2)

Besides, TIMA ∈ R^3 collects the directions corresponding to the maximum correlation for the same values of the scale. Given a value of S, the corresponding component of TIMA for the i-th patch is

  TIMA[S] = argmax_j { c_i^j }    (3)

In our experiments we selected S in the set {2, 4, 8}.

Fig. 2. (a) Probability density function p(d), (b) cumulative distribution function P(d).

2.3.3. Texture description

For each keypoint k_j, textural filter responses are computed (TEXTON ∈ R^6 in the feature vector). Texture features are based on the technique proposed in [12]. Specifically, a two-dimensional patch is considered around each interest point. The method uses three spatial filters of size 13 × 13. The first is a Gaussian filter with σ = 0.5, the second a LoG filter with σ = 0.7, and the third has the form

  F(r, σ, τ) = cos(πτr/σ) e^{−r²/(2σ²)}    (4)

where σ is the standard deviation of the Gaussian and τ is the frequency of the sine wave. As can be observed, the filter response grows when τ > σ and decreases when τ < σ.
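A minimal sketch of the symmetry correlation of Eqs. (1)-(3), assuming a single scale and vertical-axis reflection (the rotation set D and the multi-scale loop are omitted; all names are ours, not the paper's):

```python
import numpy as np

def symmetry_correlation(patch):
    """Eq. (1): normalized correlation between a patch and its reflection
    around the vertical axis; equals 1 for a perfectly symmetric patch."""
    p = np.asarray(patch, dtype=float)
    p_ref = p[:, ::-1]                      # reflected version of the patch
    W = np.sum(p * p)                       # normalization term W of Eq. (1)
    return np.sum(p * p_ref) / W

def kima_tima(patches_by_direction):
    """Eqs. (2)-(3) at a single scale: KIMA is the maximum correlation over
    the rotation directions, TIMA the index of the best direction.
    `patches_by_direction` is a list of patches, one per rotated image."""
    scores = [symmetry_correlation(p) for p in patches_by_direction]
    return max(scores), int(np.argmax(scores))

# A perfectly mirror-symmetric patch gives correlation 1.
sym = np.array([[1., 2., 1.], [3., 4., 3.]])
```

For color patches, the same correlation would be computed after reflecting each of the three bands, as the text above prescribes.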




The parameters have no influence when τ = σ; in our case, σ and τ are both set to 2. Each two-dimensional patch is convolved with these filters, and the responses are thresholded so as to keep only the positive values. The resulting binary image is called the "on" response; three binary images are thus obtained, one per filter. The "on" responses are further convolved with average filters of size 3 × 3 and 6 × 6, so as to obtain a multi-scale representation. At the next step, images are converted to grayscale and their intensity is normalized to zero mean and unit variance. All filters are L1-normalized, so that the responses of each filter lie in the same range, i.e. each filter F_i is divided by ‖F_i‖₁.
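A minimal construction of the third filter of Eq. (4) might look like the following; the function name and the grid sampling are our own assumptions, with the L1 normalization mentioned above applied at the end:

```python
import numpy as np

def f_filter(size=13, sigma=2.0, tau=2.0):
    """Eq. (4)-style isotropic filter: F(r) = cos(pi*tau*r/sigma) * exp(-r^2/(2*sigma^2)),
    sampled on a size x size grid and L1-normalized so that the responses
    of all filters share the same range."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r = np.sqrt(x**2 + y**2)                 # radial distance from the center
    F = np.cos(np.pi * tau * r / sigma) * np.exp(-r**2 / (2 * sigma**2))
    return F / np.abs(F).sum()               # L1 normalization: divide by ||F||_1
```

The kernel can then be convolved with each patch (e.g. via `scipy.signal.convolve2d`) and thresholded to obtain the "on" response described above.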

2.3.4. Color description

The last three components of the feature vector f_j are the average values, over the three channels R, G, B, of a patch p_i of side K centred at k_j (K = 4 in our experiments). The RGB space is adopted to describe the local chromatic content, since the Euclidean distance used to compare local features is not appropriate for the HSV space. However, we exploit the HSV color histogram to encode the global color content, as it defines similarity between images better than RGB.

Fig. 3. Examples of graph representation.

2.4. Salient feature graph

Each salient location detected by ASIFT [27] and filtered by means of the symmetry axis is enriched with the features described above. Each pedestrian image is then represented by a k-nearest neighbor graph, in order to take structural relations into account, so that the person re-identification problem becomes a graph matching problem. A graph G is a 3-tuple G = (V, E, Z), where V is a set of vertices, E ⊆ V × V is a set of edges, with e = (v_i, v_j) ∈ E the edge joining nodes v_i, v_j ∈ V, and Z is a set of vectors, where z_i ∈ Z is the vector of attributes associated with node v_i ∈ V. We build the graph for each pedestrian image as follows:

1. The vertices are the image keypoints detected by the ASIFT algorithm; with each vertex we associate a feature vector f_j as described in Section 2.3.
2. Two vertices v_i and v_j are connected by an edge if the Euclidean distance between v_i and v_j is among the k smallest distances from v_i to the other n vertices. The edge cost represents a measure of the distance between the vertices.

In our experiments we set k = 2√n, where n is the number of ASIFT locations selected in the current frame. Fig. 3 shows examples of k-nearest neighbor graphs representing two pedestrians; the vertices are drawn in red, while the edges joining two vertices are in blue.

2.5. Weighted color histogram

An HSV histogram is adopted as a global feature to encode the overall chromatic content of each part of the pedestrian. Moreover, the weighted histogram [6] is built taking into consideration the distance from the symmetry axis: each pixel is weighted by a one-dimensional Gaussian kernel N(μ, σ), where μ is the y coordinate of the symmetry axis and σ is a standard deviation set to J/4. In this way, pixel values near the symmetry axis count more in the final histogram.

3. Pedestrian matching

This section deals with how pedestrian representations are matched in order to solve the re-identification task. Two distance measures are adopted: the former is based on the number of keypoints positively matched according to the ratio test, while the latter also takes structural relations into account as the result of a graph matching.

3.1. Ratio test

A feature vector f_i^A from a source image A is positively matched to a feature vector f_k^B from a destination image B if

  dist(f_i^A, f_k^B) = min_j dist(f_i^A, f_j^B),  j = 1, …, m,    (5)

and

  dist(f_i^A, f_k^B) / dist(f_i^A, f_{k2}^B) < ρ,  0 < ρ ≤ 1,    (6)

where dist(·) is the Euclidean distance, f_{k2}^B is the descriptor of the destination image with the second smallest distance from f_i^A, and ρ is a ratio value controlling the tolerance to false positives. A distance measure d_RATIO for comparing pedestrians is then computed from the number of feature vectors positively matched between the source image A and the destination image B.

3.2. Graph matching

One main point is how to measure the contribution of matching one node to another with regard to the structural relations. We adopt a revised version of the algorithm reported in [24]. The distance between the vertices of two graphs is obtained by combining two measures: a measure of similarity and a measure of consistency. The former is the Euclidean distance between the feature vectors; the latter also takes into account the links between the vertices.
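As a concrete sketch of the ratio test of Section 3.1 (Eqs. (5)-(6)); a hypothetical minimal implementation with names of our own choosing, not the authors' code:

```python
import numpy as np

def ratio_test_matches(feats_a, feats_b, rho=0.8):
    """Eqs. (5)-(6): match each descriptor of image A to its nearest
    descriptor in image B, accepting the match only if the nearest
    distance is below rho times the second-nearest distance."""
    matches = []
    for i, fa in enumerate(feats_a):
        d = np.linalg.norm(feats_b - fa, axis=1)   # Euclidean distances to all of B
        order = np.argsort(d)
        k, k2 = order[0], order[1]                 # best and second-best candidates
        if d[k] < rho * d[k2]:                     # ratio test of Eq. (6)
            matches.append((i, int(k)))
    return matches
```

The distance d_RATIO between two pedestrians would then be derived from `len(matches)`, the number of positively matched keypoints.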



The algorithm for matching two graphs G_D = (V_D, E_D, Z_D) and G_M = (V_M, E_M, Z_M), having m and n vertices respectively, is the following:

1. The matching matrix S^(1) is initialized at the first iteration.
2. The matrix Ω of combined coefficients is computed.
3. The matching matrix S is updated.

To initialize the matching matrix S^(1), the ratio test procedure is applied to the vertices of the two graphs; it requires about N × N × D operations, where D is the length of the feature vectors (in our case D = 147). Let Q_{i,j} be a measure of the structural consistency between v_i and v_j, and R_{i,j} their measure of similarity. Each element W_{i,j} of the matrix Ω is computed as

  W_{i,j} = Q_{i,j} · R_{i,j}    (7)

More formally, for the match v_i ∈ G_D → v_j ∈ G_M, Q_{i,j} is given by

  Q_{i,j} = exp[ μ Σ_{k ∈ V_D} Σ_{l ∈ V_M} AD_{i,k} AM_{j,l} S_{k,l} ]    (8)

where AD and AM are the adjacency matrices of G_D and G_M, respectively (i.e., AD_{i,k} = 1 if there is an edge joining v_i and v_k; similarly, AM_{j,l} = 1 if there is an edge joining v_j and v_l), S_{k,l} is an element of the N × M current matching matrix S, and μ > 0 is a control parameter used to avoid local minima. The matching matrix S is a binary matrix defining an injective mapping between the two graphs G_D and G_M, such that S_{i,j} = 1 if node v_i ∈ V_D is matched to node v_j ∈ V_M, and 0 otherwise. The similarity R_{i,j} is instead given by

  R_{i,j} = 1 / dist(f_i^D, f_j^M)    (9)

where f_i^D and f_j^M are the feature vectors of v_i ∈ G_D and v_j ∈ G_M, respectively. The computation of the matrix Ω requires about (MN + D)(MN) operations. At the last step, the maximum element of each row of Ω is found and the corresponding element of S is activated; this step requires about m(n − 1) operations. In brief, the measure of similarity between two vertices is higher if they are not only close in feature space but also have similar vertices as neighbors. In our experiments we set ρ = 0.8 (controlling the acceptance rate) and the control parameter μ = 0.15. Finally, a distance measure d_GRAPH is obtained from the number of nodes positively matched between two pedestrians.

3.3. Final distance measures

The final distance measures used to compare pedestrians are

  dist_1(A, B) = α · dist_WH(WH(A), WH(B)) + β · dist_GRAPH(GRAPH(A), GRAPH(B))    (10)

where WH(·) is the Weighted Color Histogram computed taking into consideration the distance of each pixel from the symmetry axis, as described in Section 2.5, GRAPH(·) is the graph representation of the pedestrian image defined in Section 2.4, dist_WH is the Bhattacharyya distance used to compare histograms, and dist_GRAPH is the graph distance described in Section 3.2; α and β are weights giving different emphasis to dist_WH and dist_GRAPH. Conversely, Eq. (11) does not take structural relations into account and allows us to evaluate our feature vectors while saving computation time:

  dist_2(A, B) = α · dist_WH(WH(A), WH(B)) + β · dist_RATIO(SET(A), SET(B))    (11)

It needs just M × D operations against the [(MN) + D](MN) operations roughly required by Eq. (10). Here SET(·) is the set of salient locations, and dist_RATIO is the distance measure computed from the number of keypoints positively matched according to the ratio test defined in Section 3.1.
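One possible reading of the graph-matching loop above (Eqs. (7)-(9)), sketched in Python. The ratio-test initialization is replaced here by a simpler nearest-feature assignment, and all names are illustrative rather than the authors':

```python
import numpy as np

def graph_match(AD, AM, FD, FM, mu=0.15, iters=5):
    """Sketch of the Section 3.2 loop: S is the binary matching matrix,
    Q the structural consistency of Eq. (8), R the feature similarity of
    Eq. (9), and W = Q * R as in Eq. (7).

    AD, AM: adjacency matrices of the two graphs.
    FD, FM: per-node feature vectors (rows)."""
    n, m = len(FD), len(FM)
    # Pairwise Euclidean distances between node features.
    D = np.linalg.norm(FD[:, None, :] - FM[None, :, :], axis=2)
    # Step 1: initialize S by nearest feature (stand-in for the ratio test).
    S = np.zeros((n, m))
    S[np.arange(n), D.argmin(axis=1)] = 1.0
    R = 1.0 / (D + 1e-9)                    # Eq. (9), guarded against /0
    for _ in range(iters):
        Q = np.exp(mu * (AD @ S @ AM.T))    # Eq. (8): neighbor-match support
        W = Q * R                           # Eq. (7): combined coefficients
        # Step 3: activate the row-wise maximum of W.
        S_new = np.zeros_like(S)
        S_new[np.arange(n), W.argmax(axis=1)] = 1.0
        if np.array_equal(S_new, S):        # stop when the matching is stable
            break
        S = S_new
    return S
```

Note that `AD @ S @ AM.T` computes exactly the double sum Σ_k Σ_l AD_{i,k} AM_{j,l} S_{k,l} of Eq. (8) for all (i, j) pairs at once.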

4. Experiments and analysis

Our approach is evaluated on the re-identification task over the i-LIDS MCTS dataset [26]. Re-identification performance is reported in terms of Cumulative Matching Characteristic (CMC) and Synthetic Disambiguation or Reacquisition Rate (SDR) curves. The CMC [8] represents the re-identification rate of the system as a function of the rank position; this measure is analogous to the ROC curve for detection problems [29]. The SRR or SDR curve, instead, converts a performance metric of size N into one of size M, showing the probability that any of the M best matches is correct. We provide four test cases in order to compare local and global features and to evaluate the contribution of our descriptor based on symmetry and texture. Finally, our method is compared with the results achieved by state-of-the-art methods on the same i-LIDS MCTS dataset with the same performance measures.

4.1. Dataset

The performance of our method is evaluated on the i-LIDS MCTS dataset, corresponding to a real scenario observed by a multi-camera CCTV network monitoring the arrival hall of an airport. The dataset contains 479 images of variable size, for a total of 119 pedestrians; each pedestrian is represented by at least two images of the same person wearing the same clothes. The scenario is affected by several problems, such as illumination changes, occlusions, shape deformation and image blurring, that make the re-identification process challenging.

4.2. Experimental setup

To evaluate our approach, the MvsS experiment is performed: one image of each person is randomly chosen to set up the gallery set G, while the probe set P is composed of the remaining images. Each image p_i ∈ P is compared with each image g_i ∈ G in order to evaluate the matching distance. The MvsS experiment is repeated 100 times in order to build the CMC curve. Four test cases are designed to quantify the contribution of our descriptor based on symmetry and texture and to compare local and global features; the cases differ in the kind of descriptor used for pedestrian representation.

Case A: the ASIFT descriptor is adopted as local feature.

Case B: the ASIFT descriptor enriched by symmetry and texture is used as local feature.

Case C: a global feature (Weighted Color Histogram) in combination with local features based only on ASIFT descriptors is adopted.




Case D: a global feature (Weighted Color Histogram) in combination with local features based on the ASIFT descriptor enriched by symmetry and texture is adopted.

As it appears, only local features are used in cases A and B, while global features in combination with local features are adopted in cases C and D. Furthermore, our descriptor based on symmetry and texture is used in cases B and D. Experiments for each test case are performed by changing the value of the parameter d: according to our approach, only the keypoints k_j located at distance x ≤ d from the reflection symmetry axis RFS are used for pedestrian representation. Eq. (11) is adopted to evaluate the matching distance and to establish the best feature combination while saving computation time. Finally, Eq. (10) is applied to also take structural relations into account and reach the best re-identification rate.

Table 1 — Case A: re-identification rate (%) per rank for different values of d.

d     Rank 1   Rank 2   Rank 3   Rank 4   Rank 5   Rank 6   Rank 12
8      8.94    12.96    16.12    18.81    21.08    23.23    31.85
12    12.44    17.58    20.99    23.85    25.89    27.58    36.19
16    13.69    18.67    22.54    26.05    29.02    31.54    42.45
20    15.72    22.09    25.42    29.39    32.82    35.79    46.17
22    16.41    22.71    27.01    29.97    32.45    34.24    42.53
24    17.42    23.39    26.61    29.21    31.66    33.84    42.89
26    17.48    24.59    28.26    30.89    33.34    36.24    44.37
28    18.56    24.34    27.78    31.19    33.73    36.25    46.09
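The CMC evaluation of Section 4.2 can be sketched as follows; this is our own minimal implementation of the rank-k re-identification rate (the MvsS gallery/probe split and the 100 repetitions are left out):

```python
import numpy as np

def cmc_curve(dist_matrix, gallery_ids, probe_ids):
    """Cumulative Matching Characteristic: entry r is the fraction of
    probes whose correct gallery identity appears among the r+1 best
    (smallest-distance) matches.

    dist_matrix: (num_probes, num_gallery) matching distances."""
    gallery_ids = np.asarray(gallery_ids)
    num_probes, num_gallery = dist_matrix.shape
    cmc = np.zeros(num_gallery)
    for p in range(num_probes):
        ranking = np.argsort(dist_matrix[p])               # best match first
        hit = np.where(gallery_ids[ranking] == probe_ids[p])[0][0]
        cmc[hit:] += 1                                     # counted from rank hit+1 onward
    return cmc / num_probes
```

Averaging such curves over repeated random gallery choices would reproduce the plots of Figs. 4-7.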

4.3. Experimental results: A, B, C, D cases

Experimental results for the test cases described above are shown in this section. Table 1 contains the re-identification percentages at ranks {1, 2, 3, 4, 5, 6, 12} for different values of d in test case A. Two images A and B depicting the same pedestrian produce a rank-k re-identification if B is correctly identified as A when B is at the k-th smallest distance from A. Fig. 4 shows how the recognition rates change over different values of the parameter d in terms of CMC curves: the vertical axis reports the recognition percentages of the experiments performed with the various sets of keypoints obtained by changing d, and each color corresponds to a distinct run with a specific value of d. Moreover, the set of keypoints becomes smaller as d decreases, improving the computation time required by the matching. From the plot in Fig. 4, it can be observed that the trend of the CMC curve is fairly similar for values of d ∈ {20, 22, 28}, while the recognition percentage dramatically decreases for lower values of d. Table 2 shows the recognition percentages at ranks {1, 2, 3, 4, 5, 6, 12} for the values d ∈ {8, 12, 16, 20, 22, 24, 26, 28} in test case B, while the plot in Fig. 5 provides the same results in terms of CMC

Fig. 4. Case A: ASIFT descriptor is adopted as local feature. (For the interpretation of the reference to color in this ﬁgure caption, the reader is referred to the web version of this paper.)

Fig. 5. Case B: ASIFT descriptor enriched by symmetry and texture is used as local feature. (For the interpretation of the reference to color in this ﬁgure caption, the reader is referred to the web version of this paper.)

Table 2 — Case B: re-identification rate (%) per rank for different values of d.

d     Rank 1   Rank 2   Rank 3   Rank 4   Rank 5   Rank 6   Rank 12
8     10.63    14.66    18.39    20.53    22.01    23.42    31.81
12    12.60    16.92    19.82    22.27    24.57    26.47    37.19
16    15.45    20.30    23.26    25.53    27.20    28.64    36.66
20    18.05    22.74    26.01    28.91    31.13    32.97    42.48
22    16.52    22.03    26.05    29.46    31.95    34.02    43.89
24    17.48    23.78    27.56    30.17    32.21    34.17    43.40
26    18.23    24.62    28.05    30.72    32.63    34.24    44.58
28    18.85    26.20    29.79    31.58    33.26    35.04    45.45

Table 3 — Case C: re-identification rate (%) per rank for different values of d.

d     Rank 1   Rank 2   Rank 3   Rank 4   Rank 5   Rank 6   Rank 12
8     36.91    48.45    53.25    56.86    59.97    62.59    73.75
12    38.04    49.40    56.70    61.13    64.27    66.53    76.23
16    40.11    51.50    58.97    63.66    66.09    68.03    77.36
20    41.89    53.78    59.64    64.26    67.36    69.82    79.97
22    41.31    53.82    59.57    64.18    67.88    70.49    80.49
24    40.40    52.92    59.55    63.69    66.84    69.98    80.41
26    41.47    54.89    60.70    65.17    68.67    71.74    80.87
28    42.11    55.28    61.05    64.49    68.07    70.94    81.60



Fig. 6. Case C: global feature (Weighted Color Histogram) in combination with local features based just on ASIFT descriptors is adopted. (For the interpretation of the reference to color in this ﬁgure caption, the reader is referred to the web version of this paper.)


Fig. 8. Ratio test wrt graph test.

Table 4 Case D.

Fig. 7. Case D: global feature (Weighted Color Histogram) in combination with local features based on ASIFT descriptor enriched by symmetry and texture is adopted. (For the interpretation of the reference to color in this ﬁgure caption, the reader is referred to the web version of this paper.)

curves. Cases A and B are compared in order to observe the effect of our descriptor based on symmetry and texture. Furthermore, each cell of Table 1 is compared to the corresponding cell of Table 2 at the same position. The bold type highlights the highest re-identiﬁcation rate for the same kind of recognition and value of the d parameter. It could be observed that case B slightly improves on rank 1 recognition percentage with all values d. This result supports our conjecture that our descriptor, based on symmetry and texture, is highly speciﬁc, useful in a real time context characterized by a high number of pedestrians. Indeed, provided similarity measures of a certain pedestrian against the others, rank 1 re-identiﬁcation requires just the maximum similarity computation, so saving computational times. In case C (see Table 3 and Fig. 6), by the use of global feature, re-identiﬁcation rate strongly improves for all d values. Conversely,

d     Rank 1   Rank 2   Rank 3   Rank 4   Rank 5   Rank 6   Rank 12
8     35.77    47.43    53.17    56.75    59.84    62.49    72.51
12    37.60    48.68    55.41    60.12    62.66    64.81    74.80
16    39.83    50.60    57.08    61.67    64.56    66.27    77.65
20    40.33    50.99    58.47    64.86    67.99    70.16    79.81
22    40.79    54.25    60.82    64.85    68.21    70.79    78.80
24    41.7     54.26    60.40    64.64    67.96    70.06    77.35
26    41.20    53.55    59.28    64.72    67.78    69.71    78.78
28    42.44    54.63    60.66    65.13    68.28    70.33    79.58

the re-identification rate slightly decreases in case D (see Table 4), where our descriptor based on symmetry is combined with the global feature. In practice, the global feature and our local descriptors are both able to discriminate between pedestrian classes, but the effect of the global feature is stronger, making the re-identification percentages slightly lower. From the CMC curve in Fig. 7, it can be observed that the trend is quite similar for d ∈ {22, 26, 28}. Moreover, setting d = 22, just 88% of the ASIFT keypoints are adopted to represent a pedestrian, as shown in Fig. 2b. Therefore, 12% of the ASIFT keypoints can be discarded for each pedestrian, reducing the number of keypoints required for the re-identification process.
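The selection of ASIFT locations by their distance from the global symmetry axis can be sketched in a few lines; the threshold plays the role of the d parameter, and all names are illustrative rather than taken from the paper's code.

```python
def select_keypoints(keypoints, axis_x, d):
    """Keep only keypoints whose horizontal distance from the global
    symmetry axis (at x = axis_x) is at most d pixels; farther points
    are treated as less informative and discarded."""
    return [(x, y) for (x, y) in keypoints if abs(x - axis_x) <= d]

# Toy example: a vertical symmetry axis at x = 40 and threshold d = 22.
kps = [(10, 5), (30, 8), (52, 12), (70, 3)]
kept = select_keypoints(kps, axis_x=40, d=22)   # keeps (30, 8) and (52, 12)
```

Raising d keeps more keypoints at higher matching cost, which is the trade-off explored across the rows of the tables above.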

4.4. Experimental results: ratio test wrt graph test

Taking into account structural relations through Eq. (10), a better re-identification rate is obtained, as shown in Fig. 8. The increase in re-identification rate due to the graph representation is observed until rank 35, while for higher rank values both curves tend to be fairly similar. This experiment shows that the discriminative power of the salient features can be improved by considering structural relations (Table 4). Nevertheless, the computational cost clearly increases, since graph matching requires [(MN) + D(MN)] operations instead of the MD merely required by the ratio test, as discussed in Section 3.2.
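For concreteness, a minimal Lowe-style ratio test is sketched below (names are illustrative, and the paper's exact matching criterion may differ in detail): a candidate match between a probe descriptor and its nearest gallery descriptor is kept only when the nearest neighbor is clearly closer than the second nearest. The graph test then adds structural-consistency checks on top of such candidate matches, at the extra cost noted above.

```python
import numpy as np

def ratio_test_matches(probe_desc, gallery_desc, ratio=0.8):
    """Keep candidate match (i, j) only when probe descriptor i's nearest
    gallery descriptor j is clearly closer than the second nearest."""
    matches = []
    for i, d in enumerate(probe_desc):
        dists = np.linalg.norm(gallery_desc - d, axis=1)
        j, k = np.argsort(dists)[:2]                 # nearest, second nearest
        if dists[j] < ratio * dists[k]:
            matches.append((i, int(j)))
    return matches

# Toy 2-D descriptors: both probes find an unambiguous nearest neighbor.
probe = np.array([[0.0, 0.0], [5.0, 5.0]])
gallery = np.array([[0.0, 0.1], [3.0, 3.0], [5.0, 5.1]])
pairs = ratio_test_matches(probe, gallery)           # → [(0, 0), (1, 2)]
```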


4.5. Experimental results: local features wrt global features

Further experiments have also been conducted to analyze the sensitivity of the method to the α and β parameters. These parameters, as shown in Eq. (11), give different emphasis to global and local features. In particular, case D, using the graph pedestrian representation and setting d = 22, has been analyzed while varying the values of α and β. The plot in Fig. 9 shows CMC curves for distinct values of α and β; each curve corresponds to a distinct test run with a particular parameter configuration. The best results are achieved setting α = 0.6 and β = 0.4, i.e. giving quite similar emphasis to global and local features. Nevertheless, recognition percentages dramatically decrease for lower values of α; for this reason global features cannot be avoided. Finally, the bar chart (Fig. 10) provides greater detail about the recognition percentages of ranks 1 and 12 over distinct parameter configurations. As shown, the recognition percentages for ranks 1 and 12 peak at α = 0.6 and β = 0.4, accounting for 44.54% and 79.82%, respectively.
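The Eq.-(11)-style fusion can be sketched as a convex combination of the global and local similarity scores, with α and β weighting the two terms (function and variable names below are illustrative, not the paper's code).

```python
def combined_similarity(global_sim, local_sim, alpha=0.6, beta=0.4):
    """Convex combination of the global (weighted color histogram) and
    local (salient-feature) similarity scores; alpha = 0.6, beta = 0.4
    gave the best CMC results in the experiments reported here."""
    assert abs(alpha + beta - 1.0) < 1e-9, "weights must sum to 1"
    return alpha * global_sim + beta * local_sim

score = combined_similarity(0.5, 1.0)   # → 0.7 with the default weights
```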

Fig. 9. CMC curves for varying values of parameters α and β. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

Fig. 10. Recognition percentages of ranks 1 and 12 for varying values of parameters α and β. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

4.6. Experimental results: our approach wrt state-of-the-art

Fig. 11. Comparison with state-of-the-art results in terms of CMC curves. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

Finally, we compare the results of our approach, setting d = 22, using our local descriptor based on symmetry and taking into account structural relations through the graph pedestrian representation, against those achieved by state-of-the-art methods. Specifically, Farenzena et al. [6] propose SDALF, an appearance-based method that weighs appearance information extracted from different body parts in accordance with their distance from the symmetry axis calculated over the entire pedestrian image. Zheng et al. [32] tackle the MvsS case


analyzing different types of visual features: Concatenated Histogram, Concatenated Histogram SIFT and CRRO-BRO. As shown by the black curve of Fig. 11, the re-identification rate accounts for exactly 44.5% at rank 1, producing results quite similar to our previous approach (in red, where d = ∞) and to SDALF (in green). Moreover, our method outperforms Concatenated Histogram SIFT (in blue), Concatenated Histogram (in cyan) and CRRO-BRO (in magenta). Furthermore, for rank values greater than 3, our method with a 12% reduction in the number of ASIFT keypoints adopted to represent each pedestrian outperforms the results achieved when all ASIFT locations are adopted and, for certain rank values, also SDALF: it accounts for a 71.71% re-identification rate at rank 6, against the 70% reached by SDALF. Analogous results are shown in Fig. 12 in terms of SDR curves.

4.7. Examples of re-identification successes/failures

Fig. 12. Comparison with state-of-the-art results in terms of SDR curves. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

Several problems can affect the re-identification process, such as illumination change, pose variation, occlusion and scaling, all of which make re-identification very hard. The proposed approach also works under these conditions, achieving re-identification success in the cases shown below.

Fig. 13. Illumination change.

Fig. 14. Pose variation.


Fig. 15. Occlusion.

Fig. 16. Scaling.

Furthermore, our method succeeds in the sequence of images shown in Fig. 13, where there is a clear illumination change in the scene, and also in the sequences of Figs. 14-16, under conditions of pose variation, partial occlusion and scaling, respectively. However, the method frequently fails when several problems occur in the same scene. For instance, a re-identification failure appears when there are both a strong illumination change and a pose variation, as can be observed in the sequence of Fig. 17. Finally, the approach also fails under full occlusions, as in the cases shown in Figs. 18 and 19. Further knowledge, such as temporal information, would be necessary to face this kind of problem, where the target is completely occluded.

Fig. 17. Pose and illumination changes.

5. Conclusions

We report a new feature detector and descriptor based on ASIFT, enriched by local symmetries densely detected across image and scale space and collected together in a graph representation. ASIFT locations are selected according to their distance from the symmetry axis, in order to discard less informative keypoints and to reduce the computation time required for the re-identification task. The basic features are designed for finding correspondence


between difficult image pairs rich in symmetries, while the graph representation is able to capture structural relations. A graph matching method aimed at solving the point-set correspondence problem takes into account relative structural and geometrical measurements. The results on the i-LIDS MCTS dataset demonstrate that the presence of ASIFT features enriched by the local symmetries and structure of an individual is very informative for re-identification. The idea appears completely new and promising, opening a new perspective for further improvements in terms of accuracy, mainly concerning the tuning of parameter values and a sensitivity analysis.

Fig. 18. Pose variation and strong occlusion.

Fig. 19. Strong occlusion and illumination change.

Conflict of interest statement

None declared.

References

[1] M. Bauml, K. Bernardin, M. Fischer, H.K. Ekenel, Multi-pose face recognition for person retrieval in camera networks, in: IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Boston, MA, USA, 2010.
[2] D. Cabrini Hauagge, N. Snavely, Image matching using local symmetry features, in: Proceedings of IEEE CVPR 2012, 2012.
[3] Y. Cai, M. Pietikäinen, Person re-identification based on global context, in: Proceedings of ACCV Workshop, 2010.

[4] G. Csurka, C.R. Dance, L. Fan, J. Williamowski, C. Bray, Visual categorization with bags of keypoints, 2004.
[5] V. Di Gesú, B. Zavidovique, The s-kernel: a measure of symmetry of objects, Pattern Recognit. 40 (3) (2007) 839-852.
[6] M. Farenzena, L. Bazzani, A. Perina, V. Murino, M. Cristani, Person re-identification by symmetry-driven accumulation of local features, in: Proceedings of IEEE CVPR 2010, 2010.
[7] D. Filliat, A visual bag of words method for interactive qualitative localization and mapping, IEEE Trans. Robot. Autom. (2007) 3921-3926.
[8] D. Gray, S. Brennan, H. Tao, Evaluating appearance models for recognition, reacquisition, and tracking, in: IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS), 2007.
[10] S. Kondra, A. Petrosino, S. Iodice, Multi-scale kernel operators for reflection and rotation symmetry: further achievements, in: Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '13), IEEE Computer Society, Washington, DC, USA, 2013, pp. 217-222.
[11] S. Kondra, A. Petrosino, Self-similarity and points of interest in textured images, in: Perception and Machine Intelligence, 2012, pp. 306-313.
[12] S. Kondra, V. Torre, Texture classification using three circular filters, in: 6th Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), Bhubaneshwar, India, 2008.
[13] C.-H. Kuo, C. Huang, Inter-camera association of multi-target tracks by on-line learned appearance affinity models, in: Proceedings of ECCV 2010, Hersonissos, Crete, Greece, 2010.
[14] R. van Lier, J.S. Wagemans, From images to objects: global and local completion of self-occluded parts, J. Exp. Psychol.: Hum. Percept. Perform. 25 (1999) 1721-1741.
[15] Y. Liu, H. Hel-Or, C. Kaplan, L.V. Gool, Computational symmetry in computer vision and computer graphics, Found. Trends Comput. Graph. Vis. 5 (1-2) (2010) 1-195.


[16] D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. 60 (2) (2004) 91-110.
[17] R. Mazzon, S. Fahad Tahir, A. Cavallaro, Person re-identification in crowd, Pattern Recognit. Lett. 33 (14) (2012) 1828-1837.
[18] J. Neira, J.D. Tardos, Data association in stochastic mapping using the joint compatibility test, IEEE Trans. Robot. Autom. 17 (6) (2001) 890-897.
[19] S.R. Palmer, The role of symmetry in shape perception, Acta Psychol. 59 (1985) 67-90.
[21] B. Prosser, S. Gong, T. Xiang, Multi-camera matching using bi-directional cumulative brightness transfer functions, in: Proceedings of the British Machine Vision Conference, Leeds, UK, 2008.
[22] B. Prosser, W.S. Zheng, S. Gong, T. Xiang, Person re-identification by support vector ranking, in: Proceedings of the British Machine Vision Conference, Aberystwyth, UK, 2010.
[24] G. Sanroma, R. Alquézar Mancho, F. Serratosa i Casanelles, A new graph matching method for point-set correspondence using the EM algorithm and softassign, Comput. Vis. Image Underst. 116 (2) (2012) 292-304.

[26] UK Home Office, i-LIDS Multiple Camera Tracking Scenario Definition, 2008.
[27] G. Yu, J.-M. Morel, ASIFT: an algorithm for fully affine invariant comparison, Image Processing On Line, 2011. http://dx.doi.org/10.5201/ipol.2011.my-asift
[28] P. Wenderoth, The salience of vertical symmetry, Perception 23 (1994) 221-236.
[29] C. Wilson, A.R. Hicklin, M. Bone, H. Korves, P. Grother, B. Ulery, R. Micheals, M. Zoepfl, S. Otto, C. Watson, Fingerprint Vendor Technology Evaluation 2003: Summary of Results and Analysis Report, Technical Report NISTIR 7123, NIST, 2004.
[30] H. Zabrodsky, Symmetry—A Review, Technical Report 90-16, CS Dept., The Hebrew University of Jerusalem, 1990.
[32] W.S. Zheng, S. Gong, T. Xiang, Person re-identification by probabilistic relative distance comparison, in: Proceedings of IEEE CVPR, Colorado Springs, CO, USA, 2011.

Sara Iodice received the Master's Degree in Computer Science from the University of Naples Parthenope. Currently, she is a Fellow Researcher at the CVPRLab of the University of Naples Parthenope. Her research interests are data mining and knowledge discovery, pattern recognition, soft computing, and spatio-temporal data analysis. She is a student member of the IEEE Computer Society.

Alfredo Petrosino is a professor of Computer Science at the University of Naples Parthenope, where he heads the research laboratory CVPRLab (cvprlab.uniparthenope.it). He held positions at the University of Salerno, the International Institute of Advanced Scientific Studies (IIASS), the National Institute for the Physics of Matter (INFM), and lastly as a Researcher and a Senior Researcher at the National Research Council (CNR). He taught at the Universities of Salerno, Siena, Naples Federico II and Naples Parthenope. He is a Senior Member of the IEEE and a member of the International Association for Pattern Recognition and of the International Neural Networks Society. He has co-edited six books and authored more than one hundred research publications in the areas of Computer Vision, Image and Video Analysis, Pattern Recognition, Neural Networks, Fuzzy and Rough Sets, and Data Mining. He is/was an Associate Editor of the Pattern Recognition journal; a member of the Editorial Boards of Pattern Recognition Letters and the International Journal of Knowledge Engineering and Soft Data Paradigms; and an Editor of IEEE Trans. SMC-Part A, Fuzzy Sets and Systems, Image and Vision Computing, and Parallel Computing.
