Hardware implementation of image registration algorithms H S Ranganath
Practical applications of image registration demand throughputs in the order of several hundred of millions of operations per second. Such high throughputs can be achieved by implementing the desired image registration algorithm in hardware. This paper presents architectures for implementation of crosscorrelation, absolute difference and moments methods. In the proposed architecture an effort has been made to min~i~e the amount of hardware needed to perform the reg~.~trat~on function. Keywords: image registration, scene matching, moments method Digital image registration  the problem of locating a given image within a larger image  is fundamental to the disciplines of image processing and pattern recognition. The smaller image to be registered is denoted by W and is known as the window. The larger image whose field of view includes the field of view of W is known as the search area and is denoted by S. It is assumed that W and S are obtained from similar sensors and do not differ in pixel resolution or rotation. S is an M x N array of pixels, each of which may assume one of G possible levels on the grey scale. For i
(1)
W is a K x L array of pixels having the same grey scalerange.Forl
69
Each subimage of S can be uniquely identified by specifying the coordinates of its upper left corner. Let Sij denote the K x L subimage of S whose upper left corner is at the reference point (i, 1) as shown in Figure 1. Then Computer Science Department, University of Alabama in Hun~ville, Huntsville, AL 35899, USA
Reference pwnt
ii .I
(i, j 1
Scorch oreo S
Figure 1. Search area window W is the image area. The field of view of the search area S. the reference point (i, j)
iu
(left) and window (right). The to be registered within the search of W is included within the field The K x L subimage of S at is denoted by S,,
for 1 < m 6 K, 1 6 n d L, 1 d i < M  K + 1 and1 6jdNL+ 1
Sij(m, n) = S(i + m  1,j + n  1)
(3)
The problem of registering W within S is that of finding (i,n such that Sij best matches W. Digital image registration has numerous military, Many space, industrial and robotic applications. interesting problems such as cloud motion tracking, map matching, target tracking and object recognition have been solved using image registration. Registration algorithms are broadly classified as template matching methods and feature matching methods. The crosscorrelation and absolute difference methods are the most widely used template matching methods. In the correlation method, the normalized correlation between W and Sij is used as the measure of their similarity’. The absolute difference method uses the sum of absolute differences between corresponding pixels of W and Sij as a measure of their dissimilarity*. The moments
02628856/86/03 15 l08 $03.00 @ 1986 Butterworth & Co. (Publishers) Ltd ~014 no 3 august 1986
151
method, which has been successfully used for object recognition, is one of the most popular featu~matching registration methods’, The inherent problem associated with ail registration algorithms is the amount of computation involved. For template matching methods, computation is proportional to the product of the number of pixels in the window and the number of subimages in the search area. For feature matching methods computation depends on the type and number of features used. Several schemes, such as twostage template matching, coarsefine template matching and hierarchical search, have been proposed to speed up template matching methods. Twostage template matching a~omplishes registration in two stage$. In the first stage a subimage of W is matched with the corresponding subimage of each Sjj and a few reference points are setected as registration candidates. In the second stage the entire window is matched with the selected subimages of S to determine the registration point. The coarsefine template matching method is also a twostage method in which reducedresolution images are matched during the first stage to select candidates for the second stage, in which the registration point is determined by matching the original images5. The hierarchical search method is a generalized version of coarseline template matching in which matching is done at several resolution levels. At each level promising reference points are selected for further consideration in the next level, and the registration point is determined in the final leve16. In all methods, initial matching is not based on complete information and this may lead to the deletion of the registration point even before the final stage. In many intelligent vision systems (object recognitian systems, autonomous vehicles etc.) fast image registration has become a necessity. Near realtime registration is difficult to accomplish when constrained by the cost, volume of space and power available. As an example, consider. the task of ~giste~ng a 32 x 32 window within a 512 x 512 search area in less than a second. The crosscorrelation method requires approximately 474M additions, 238M multiplications and 23 1461 comparisons. Even when the algorithm is speeded up by a factor of 10 using one of the schemes described earlier, a fairly fast minicomputer like the Vax 780 cannot provide the throughput required. The need for fast and cost effective image registration systems has led to the development of binary image correlators. The binary image correlator quantizes W and S into binary images before matching them. The bilevel quantiza~on of 2%level images results in loss of info~ation, which reduces the reliability of the system. Considerable research has been done toward the development of more reliable binary quantization techniques, with mixed results79. With no advance knowledge of images that will be matched, meaningful quantization is difficult to achieve. In any case binary image correlation is an approximation to the standard correlation method and is expected to have a higher false registration rate than the proven crosscorrelation method. This paper presents hardware architectures for the implemen~tion of three widely used registration methods: the crosscorrelation method, the absolute difference method and the moments method. The next
152
section briefly describes the algorithms, The following section then implements them in hardware. In each architecture an effort has been made to minimize the amount of hardware required to perform the registration function.
IMAGE
REGISTRATION
Crosscorrelation
METHODS
method
When S and W do not differ in pixel resolution or rotation, a measure of simiiarity termed the normalized correlation between W and Sj is computed as f
i
W(m.i?) &(m,
n)
forl
1 (4)
The best match for W is obtained by searching the (M  K + 1) X (N  L + 1) correlation surface for (i*. j*) such that Rfi*, j*) > Rfi, j) for ail (i, ~3 * (I‘*?j*>.
Absolute
difference
method
The absolute difference method computes an error surface E of size (n/l  K + 1) x (N  L + 1) where E(i, 1) is a measure of dissimilarity between W and S, The normalized error between W(m, n) and its corresponding pixel in Sij is defined as &.j, 3% n) = US&?, n)  & where
and
The normalized error Efi,~‘)between W and S, is defined as
for 1 6 i < A4  K + 1 and1 _ Since the ~ompn~tion time for addition or subtraction is less than that for mul~pIi~tion~ the absolute difference method is faster than the crosscorrelation method.
image and vision computing
I sRK, 2
=K,L !
it++++.
. . . .
SRI,N
!
SRK.1 I
+d. HSRJL., SR!,,
SRI,Nl
t_..._l
I
I
1
i
SRI,,
SRI,LI
I
I
t_1SR!,, t_i
I
I
1
c
. . .
Figure 2. SR array. Pixels of S are gated into the SR array at the top left corner in the raster scan order. The gating clock GC and circulating clock CC which control the data movement through the SR array are not shown
Moments
all moments of order three or less are used for registration.
method
The moments method is based on a uniqueness theorem which states that iff(x, y) is piecewise continuous and has nonzero values only in a finite region of the (x, y) plane, then the moments of all orders exist and the moment sequence {m,,} is uniquely determined byflx, y) and conversely Imp,} uniquely determines fix, y). For a digital image these moments are given by mP4 = CC x!~?flx, y) T j
for p, q = 0, 1, 2, 3, . . .
(9)
Central moments which are invarient to translation, normalized central moments which are invariant to translation and size, and Hu’s invariant moments which are invariant to translation, rotation, reflection and size have been described3. Hu’s invariant moments computed from second and thirdorder moments have been successfully used for ship and aircraft recognition, character recognition and classification of an unknown pattern as one of several known patterns. When W and Sj do not differ in rotation or spatial resolution, there is no need to compute normalized or Hu’s moments. The moment sequence {m,,} can be used directly for registration as described below. Let {m,,(W)} and (m,,(i, j)} be the moment sequences of W and S,, respectively. Then for p, q = 0, 1, 2. . . m,,(W)
= i: i mPnYW(m, n) m=l?!=I
(10)
and
m,,(i,_il = f m=l”=I
IitmPnYS,,~m, n)
(11)
The registration problem is now that of finding the Si, whose moments best match the moments of W. Usually
~014 no 3 august 1986
REGISTRATION
ARCHITECTURES
An architecture for the implementation of each method described in the preceding section is developed around an array of registers, referred to as the SR array. The registers of the SR array, numbering KN, are organized in the form of a K x N matrix as shown in Figure 2. K is the number of rows in the window and N is the number of columns in the search area. The SR array is therefore capable at any given time of holding K rows of the search area S shown in Figure 1. The pixels of S are shifted into the SR array at the top left corner under the control of a clock. This clock is referred to as the gating clock and denoted by GC. By using a microcomputer for generating gating clock pulses, the data rate can easily be controlled. At the first gating clock pulse GC,, S(l, 1) is entered into SR,,,. During the second gating pulse GC,, S(l, 1) is shifted from SR,, into SR,,,_ , and the next pixel S( 1, 2) is gated into SR,,,. Thus at the end of KN gating cycles the SR array will be loaded with the first K rows of the search area as shown in Figure 3. At this point in time [SRJ
= S,,,(m, n)
for 1 6m d K and 1 dn Q L
In other words, the KL registers in columns 1 to L of the SR array contain S,,,. At the next gating pulse, S(l, 1) leaves the SR array and S(K + 1, 1) enters the SR array. Also
NU
= Sdm. 4
for1
i.e. gating clock GC, replaces S,,, by S,.,. Thus each gating clock replaces the subimage in the first L columns
153
N
Columl
i
Nl
2
I
Row
x
S(K,fl
S(K1.N)
SRK+V
2
S12,Nl
SIK&Nl)
StKI,N2)
SRKI,Nl
Sf2,NI
1
l
l
l
S(KI,L)
l
l
Sf2,N2)
SR2,tv2
Sf I‘Nf
S(I,NIf
S(l,N2)
%V
SRI,,I
SRI.&2
l
l
*
S(P,L)
l
.s(K,Z)
l
sRK,2
l
l
l
l
SRKI,,
sRKI ,N2
SR2,uI
SR23
.
=K,L
SRK,NI
Kl
l
l
l
l
l
.
.
SO,L)
S(KI.1
sFk=1,2
sRKI, I
S(2.21
l
SR2,2
l
l
l
SRK, I
SWI,P)
S%.L
l
SKI)
l
1
1
S(2,i)
SR2, I
S11‘21
S( i,ll
SRI,,
SRl,l
_~ SRI,,
Figure 3. Status of the SR array after KN gating cycles. Subimage S,,, is in columns 1 to L of the SR array with its successor along the current row. A second clock called the circulating clock is used to vertically rotate the subimage in the first L columns of the SR array (see Figure 2). This clock is denoted by CC. Each circulating clock pulse rotates the subimage in columns I to L of the SR array such that the pixels in register SR,, move into register SR,_r.,, for 2 < m < K and I
i
the crossin Figure SR array array. As
is computed. At the end of KN gating cycles, S,.,(l, n) W(l, n) and S:.,(l, n) will be the outputs of multipliers M, and N,, respectively, for 1 < n d L. The circulating clock is now activated. Each gating clock cycle consists of K circulating clock cycles (Figure 5). The functions performed by the first circulating clock are as follows:
(2) (3)
154
Output of M, = S,,,(2, n) W(2, n) Output of N, = $,(2,
n)
After the second circulating clock pulse CC2
W*(m,4
m=ln=l
(1)
Therefore, after the first circulating clock pulse CC,
method
The digital image correlator which implements correlation algorithm in nardware is shown 4. The first K rows of S are loaded into the and the window W is loaded into the WR the pixels of W are shifted into the WR array f
(4)
into SR,_,, for 2 Q m < K and 1 < n < L. This moves the pixels in SR,, into SR,, for l
The outputs of multipliers M, are added to the previous value in accumuIator A,. (Each GC initializes A, to zero.) The outputs of multipliers N, are added to the previous value in accumulator A,. (Each GC initializes A, to zero.) S,., is shifted such that the pixels in SR,,m move
(A,) = ;f f S,,,(m, n) W(m, n) m=ln=I
Output of M, = S,.,(3, n) W(3, n) Output of N, = S;.,(3, n) Similarly, during every circulating cycle, L product terms are added to A, and L square terms are added to Ar. Finally, at the end of CC,
image and vision computing
Iv
Pixels of
N
I
L
2
I
rEF_ D
R20,j)
Figure 4. Architecture for the crosscorrelation method. The WR array is a K x L array of registers, similar to the K x L SR array, used for holding and circulating W. The other components of the correlator are accumulators (A,, A2, A3), multipliers (15~.Mi, Ni) and a divider (D) GCi
GCi+ /
G N  L + 1. The correlator
output during the next
L gating cycles is not valid and is hence discarded. The Gatmg cycle CCK
Ccl
cc2
CCK
CC1
Figure 5. Gating and circulating clocks. Each gating cycle consists of K circulating cycles. The frequency of the gating clock determines the data rate through the SR array
and the output of the divider D is R’(1, 1). The final circulating clock brings back the register arrays to the states which existed just before the activation of CC,. The host computer reads R’(1, 1) and initiates the next gating cycle, during which
arrival of the (RN + N)th gating cycle begins the correlation of S,, with W. In general, subimage Sij is correlated with W during the [RN + (i  l)S + j  11th gating cycle. Throughout the correlation process the host computer remembers the best match of all subimages correlated. If this is not possible additional hardware is needed to determine (i*, j*). A magnitude comparator can be used to compare the correlation computed during each cycle with the remembered maximum. Two counters for keeping track of the current i and j values, as well as three registers for remembering i*, j* and R2(i*, J*), may be used. Assuming a circulating clock rate of 100 ns, the image correlator presented in this section registers a 32 x 32 window inside a 512 x 5 12 search area in approximately 0.838 s.
Absolute The rising edge of the gating clock clears accumulators A, and A,. S2 is shifted to the position which was occupied by S, , in the previous gating cycle. Ri(l, 2) is computed by the end of the current gating cycle. Thus, the correlator in Figure 4 correlates Sj with W during the (RN + j  1)th gating cycle, for 1 < j
~014 no 3 august I986
difference
method
The hardware implementation of the absolute difference method is shown in Figure 6. The accumulators A,, A, and A, are initialized to zero. During the pixelbypixel loading of W into the WR array, W is computed. After loading, register WR,,n contains W(m, n). Computation of the error surface is controlled by two clocks called the gating clock and circulating clock. Each gating cycle consists of k: circulating cycles, and during each gating
155
Pixel of S
SRK,N
.
.
. . . .
.
.
. . . .
. . . .
SR2,,v
.
.
.
.
SR~,L+I
f____SR2,~
SQ,
.
.
.
.
S?,L+I
__e_
. . . .
. . . .
. . . .
1 SRI,,
. . . .
.
.
.
=2,2
sR2,1

.
.
.
SRI.2
SRI,1
~
%;j L L
c 
9;,& D
4 02
Pixels of W
Figure 6. Architecture for the absolute difference method cycle a new pixel of S enters the SR array. Each rising edge of GC updates A, by adding the pixels entering register column L and subtracting the pixels leaving register column 1. As a result, at the end of the KNth gating cycle, S,~, will be in the lirst L columns of the SR array. Now, for 1 6 n < L
The host computer reads E( 1, 1) and initiates the next gating cycle during which the following tasks are completed: (1) (2) (3) (4)
Output of D, = e(l, 1, 1, n) = I&.,(1, n)  %,  W(1, n) + IQ The first circulating clock now arrives and adds the outputs of D,, Dz, . . . , D, to A, so that
m=ln=l
shifts W and S,,, such that their second rows appear as inputs to the absolute difference operators D,, Dzr. . . , D, giving Output of D, = e(1, 1, 2, n) Similarly, during each circulating cycle the outputs of L difference operators are added to A,. Finally, at the Kth circulating clock, E(1, 1) is present in A, and the register arrays have returned to the states which existed just prior to the activation of CC,. This completes the KNth gating cycle.
156
The rising edge of GC clears A,. S,.* is shifted to the position which was occupied by S,., during the previous gating cycle. A, is updated. A, then contains the sum of all pixels in S,,*. At the end of this gating cycle A, contains E( 1,2).
Similarly, E( I,]) is computed during the (KN + j  1)th gating cycle, for 1 < j < N  L + 1. For the next L gating cycles the output of A, is not valid and is therefore discarded. In general, E(i, 5) is computed during the [KN + (i  1)N + j  11th gating cycle, for l
m,,(i,Jl = f i mPnqSi&n, 4 “,=ln=l
For given values of m, n, p and q, mPnq is a constant
image and vision computing
Search area
Mult+pllers
Multmhers
2 mpq ii,
j) *
l
l
. . .
I
.
2
.
.
.
l
l
l
l
l
l
.
K
.
.
.
.
.
l
.
.
? l
.
.
.
l
l
.
l
l
.
l
l
l
.
K
.
m, (I,/.) l
l
l
.
1 ‘KxL
register
array
whxh
holds
C,, (m,fl )
‘Kxi
register
array
which holds
C,, (i,j)
Figure 7. Architecture for the moments method. Hardware for matching the moments t?f S,,] with the moments oj‘ W is not shown and is denoted by C,,,(m, n). Therefore, if constants C,,(m, n) are precomputed for all desired values of m and n, the calculation of mJi, j) requires only KL multiplications and KL additions for all values of p and 4. The hardware shown in Figure 7 computes the required moments of the subimages of S as they shift through the K x L register array, Since all moments are computed in parallel, each moment requires a K x L register array for holding its precomputed constants. The computation of moments is controlled by gating and circulating clocks as follows: (1) (2) (3) (4)
The first KN gating clocks load the first K rows of S into the SR array. The K x L register array used for the computation of m&i, 1’)is initialized with C,(m, n). Moments of S,,, are computed during the KNth gating cycle with the help of the circulating clock. The rising edge of the next gating clock clears the accumulators and moves SZ into the first L columns of the SR array.
In general, moments of Sij are computed during the [KN + (i  l)N + j  11th gating cycle, for 1 < i
~$4 no 3 august I986
CONCLUSION The hardware implementations of the crosscorrelation, absolute difference and moments methods are similar in structure. The connections between the various components may be serial or parallel. Each image registration unit presented in this paper can be designed using readily available digital hardware and can be attached to a host microcomputer. While a particular problem is addressed, the approach described is general and suitable for the hardware implementation of a wide range of image processing operations. It is possible to place several specialpurpose devices under the control of a host computer to form a ‘tailormade’ intelligent vision system for a given application, at reasonable cost.
REFERENCES Leese, J A, Novak, C S and Clark, B B ‘An automated technique for obtaining cloud motion from geosynchronous satellite data using cross correlation’ J. Appl. ~eteuro~. Vol 10 (February 1971) pp 11%132 Barnea, D I and Silverman, H F ‘A class of algorithms for fast digital image registration’ IEEE Trans. Comput. Vol C2 1 (February 1972) pp 179l 86 Hu, M K ‘Visual pattern recognition by moment invariants’ I’RE Trans. Vol ITS (February 1962) pp 179187 Vanderbrug, G J and Rosenfeld, A ‘Twostage template
157
matching’ IEEE Trans. Comput. Vol C26 (April 1977) pp 384393 5 Rosenfeld, A and Vanderbrug, G J ‘Coarsefine template matching’ IEEE Trans. Systems, Man, Cybernetics Vol SMC7 No 2 (February 1977) pp 104 107 6 Wong, R Y and Hall, E L ‘Sequential hierarchical scene matching’ IEEE Trans. Comput. Vol C27 (April 1978) pp 359366 7 Ranganatb, H S, Roland, J S and Malcolm, W W ‘Comparison of thresholding techniques for TVtoIR
158
image registration’ Proc. IEEE Region 3 Conf., Huntsville, AL, USA (April 1981) pp 474477 8 Roland, J S, Ranganath, H S and Malcolm, W W ‘Improved method for scene matching of dissimilar imagery’ IEEE Trans. Auto Control (June 1980) pp 568569 9 Roland, J S, Ranganatb, H S and Malcolm, W W ‘A pattern recognition technique for scene matching of dissimilar imagery’ 18th IEEE ConJ: Decision and Control, Fort Lauderdale, CA, USA (December 1979) pp 806811
image and vision computing