
Computers Chem. Vol. 18, No. 2, pp. 103-108, 1994. Copyright © 1994 Elsevier Science Ltd. Printed in Great Britain. All rights reserved. 0097-8485/94 $7.00 + 0.00

CLUSTER ANALYSIS BY SIMULATED ANNEALING

LI-XIAN SUN, YU-LONG XIE, XIN-HUA SONG, JI-HONG WANG and RU-QIN YU*
Department of Chemistry and Chemical Engineering, Hunan University, Changsha 410082, People's Republic of China

(Received 12 May 1993; in revised form 21 December 1993)

Abstract-The present paper applies a new clustering algorithm based on simulated annealing to chemometric research. A new stopping criterion and a new perturbation method, which are more feasible than those proposed in the literature, are presented. The algorithm is first tested on simulated data and then used for the classification of Chinese tea samples. The results show that the algorithm, which guarantees obtaining a global optimum, compares favourably with the traditional hierarchical technique and the K-means algorithm.

INTRODUCTION

Cluster analysis, as an unsupervised pattern recognition method, is an important tool in the exploratory analysis of chemical data. It has found wide application in many fields such as disease diagnosis, food analysis and the materials industry. Hierarchical and optimization-partitioning algorithms are the most widely used methods of cluster analysis (Bratchell, 1989). One of the major difficulties of conventional clustering algorithms is obtaining a globally optimal solution of the corresponding problem. Simulated annealing, a stochastic optimization algorithm (Collins et al., 1988), could provide a promising way to circumvent such difficulties. More recently, generalized simulated annealing has been introduced into chemometrics for wavelength selection (Kalivas et al., 1989) and calibration sample selection (Kalivas, 1991). The use of cluster analysis methods based on simulated annealing in chemometric research is of considerable interest. In this paper, this sort of algorithm is tested by using simulated data generated on a computer, and then it is applied to the classification of Chinese tea samples. The results compare favourably with those obtained by conventional hierarchical clustering and K-means algorithms (Li & Cai, 1986; Zhang et al., 1991).

*Author for correspondence.

THEORY

Cluster analysis algorithm based on simulated annealing

Simulated annealing (SA), which derives its name from the statistical mechanics simulation of atoms reaching equilibrium at a fixed temperature, belongs to the category of stochastic optimization algorithms.

According to statistical mechanics, at a given temperature T and under thermal equilibrium, the probability fi of a given configuration i obeys the Boltzmann distribution:

fi = k exp(-Ei/T)

where k is a constant and Ei is the energy of configuration i. Simulated annealing was proposed by Kirkpatrick et al. (1983) as a method for solving combinatorial optimization problems, that is, for minimizing or maximizing a function of many variables. The idea was derived from an algorithm proposed by Metropolis et al. (1953), who simulated the process of atoms reaching thermal equilibrium at a given temperature T. The current configuration of the atoms is perturbed randomly and a trial configuration is obtained according to the method of Metropolis et al. (1953). Let Ec and Et denote the energies of the current and trial configurations, respectively. If Et < Ec, which means that a lower energy has been reached, the trial configuration is accepted as the current configuration. If Et > Ec, the trial configuration is accepted with probability exp(-(Et - Ec)/T). The perturbation process is repeated until the atoms reach thermal equilibrium, i.e. the configuration distribution determined by the Boltzmann distribution at the given temperature. New lower-energy states of the atoms are obtained as T is decreased and the Metropolis simulation process is repeated. When T approaches zero, the lowest-energy state of the atoms, the ground state, is obtained.

Cluster analysis can be treated as a combinatorial optimization problem. Selim & Alsultan (1991) and Brown & Huntley (1992) described the analogy between simulated annealing and cluster analysis. The atomic configuration of simulated annealing corresponds


to the assignment of patterns or samples to clusters in cluster analysis. The energy E of the atomic configuration and the temperature T in the simulated annealing process correspond to the objective function J and the control parameter T in cluster analysis, respectively.

Suppose n samples or patterns in d-dimensional space are to be partitioned into k clusters or groups. Different clustering criteria could be adopted. In this paper, the sum of the squared Euclidean distances from all samples to their corresponding cluster centers is used as the criterion. Let

A = [aij] (i = 1, 2, ..., n; j = 1, 2, ..., d)

be an n x d sample data matrix, and let

W = [wig] (i = 1, 2, ..., n; g = 1, 2, ..., k)

be an n x k cluster membership matrix, with wig = 1 if sample i is assigned to cluster g and wig = 0 otherwise. Let

Z = [zgj] (g = 1, 2, ..., k; j = 1, 2, ..., d)

be a k x d matrix of cluster centers, where

zgj = Σ(i=1..n) wig aij / Σ(i=1..n) wig.

The sum of the squared Euclidean distances is used as the objective function to be minimized:

J(w11, ..., w1k; ...; wn1, ..., wnk) = Σ(i=1..n) Σ(g=1..k) wig Σ(j=1..d) (aij - zgj)^2.   (1)

The clustering is carried out in the following steps:

Step 1. Set the initial values of the parameters. Let T1 be the initial temperature, T2 the final temperature, η the temperature multiplier, N the desired number of Metropolis iterations, IGM the counting number of a Metropolis iteration and i the counting number of a sample in the sample set. Take T1 = 10, T2 = 10^-8, η = 0.7-0.9, N = 4n, IGM = 0 and i = 0.

Step 2. Randomly assign an initial class label among the k classes to each of the n samples and calculate the corresponding value of the objective function J. Let both the optimal objective function value Jb and the current objective function value Jc be J, and let the corresponding cluster membership matrix of all the samples be Wc. Tb is the temperature corresponding to the optimal objective function Jb; Tc and Wc are the temperature and cluster membership matrix corresponding to the current objective function Jc, respectively. Let Tc = T1, Tb = Tc and Wb = Wc.

Step 3. While the counting number of the Metropolis sampling step IGM < N, go to step 4; otherwise, go to step 7.

Step 4. Let flag = 1 and let p be the threshold probability: if IGM < N/2, p = 0.80; else p = 0.95. A trial assignment matrix Wt is obtained from the current assignment Wc by the following method. If i ≥ n, let i = i - n; else let i = i + 1. Take sample i from the sample set; its current class assignment wig is denoted f (f being one of the k classes), i.e. f = wig. Draw a random number u (u = rand, where rand is a random number uniformly distributed in the interval [0, 1]). If u > p, generate a random integer r in the range [1, k] with r ≠ f, move sample i from class f to class r, let wig = r and let flag = 2. Otherwise, take another sample and repeat the above process until flag = 2.

Step 5. Let the trial assignment obtained by the above perturbation be Wt and calculate its objective function value Jt. If Jt < Jc, let Wc = Wt and Jc = Jt; if, in addition, Jt < Jb, let Wb = Wt, Jb = Jt, Tb = Tc and IGM = 0.

Step 6. If Jt ≥ Jc, accept the trial assignment (Wc = Wt, Jc = Jt) with probability exp(-(Jt - Jc)/Tc). Let IGM = IGM + 1 and go to step 3.

Step 7. Lower the temperature, Tc = ηTc. If Tb/Tc ≥ 10^10, stop the computation and report Wb and Jb; otherwise, let IGM = 0 and go to step 3.
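The steps above can be sketched as a short program. The following Python sketch is an illustrative reconstruction, not the authors' original code: for brevity it perturbs a randomly chosen sample instead of using the cyclic scan of step 4, and it stops at a fixed final temperature rather than by the Tb/Tc ratio of step 7.

```python
import math
import random

def objective(data, labels, k):
    """Equation (1): sum of squared Euclidean distances of the
    samples to the centers of their assigned clusters."""
    d = len(data[0])
    total = 0.0
    for g in range(k):
        members = [x for x, lab in zip(data, labels) if lab == g]
        if not members:
            continue  # an empty cluster contributes nothing
        center = [sum(x[j] for x in members) / len(members) for j in range(d)]
        total += sum(sum((x[j] - center[j]) ** 2 for j in range(d)) for x in members)
    return total

def sa_cluster(data, k, t1=10.0, t2=1e-8, eta=0.9, seed=0):
    """Cluster analysis by simulated annealing (steps 1-7, simplified)."""
    rng = random.Random(seed)
    n = len(data)
    n_metropolis = 4 * n                            # step 1: N = 4n
    labels = [rng.randrange(k) for _ in range(n)]   # step 2: random initial assignment
    j_cur = j_best = objective(data, labels, k)
    best = labels[:]
    t = t1
    while t > t2:                                   # cooling loop
        for _ in range(n_metropolis):               # step 3: Metropolis sampling
            i = rng.randrange(n)                    # step 4 (simplified): pick one sample
            old = labels[i]
            labels[i] = rng.choice([g for g in range(k) if g != old])
            j_trial = objective(data, labels, k)
            # steps 5-6: accept an improvement outright, or a worse trial
            # with probability exp(-(Jt - Jc)/Tc)
            if j_trial < j_cur or rng.random() < math.exp(-(j_trial - j_cur) / t):
                j_cur = j_trial
                if j_cur < j_best:
                    j_best, best = j_cur, labels[:]
            else:
                labels[i] = old                     # reject: restore the assignment
        t *= eta                                    # step 7: lower the temperature
    return best, j_best
```

With two well-separated groups of points, the sketch recovers the grouping and returns the minimal value of equation (1); because the best assignment found so far is always retained, a late uphill move cannot destroy the reported optimum.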

Simulation on the computer

In order to verify the effectiveness of cluster analysis based on simulated annealing, the algorithm was tested by using simulated data generated on the computer. The simulated data were composed of 30 samples with two variables (x, y) each. These samples were supposed to be divided into three classes. The data were processed by using cluster analysis based on simulated annealing, hierarchical cluster analysis (Ward, 1963) and the K-means algorithm (Li & Cai, 1986; Zhang et al., 1991), respectively. As shown in Table 1, the optimal objective function (Jb) values obtained were 166.9288, 189.3531 and 168.7852 for clustering based on simulated annealing, hierarchical cluster analysis and the K-means algorithm, respectively. Comparing the results in column 4 with those in column 7 of Table 1, one notices that there is only one disagreement of class assignment, for sample No. 6 of the simulated data set. No. 6 was assigned to class 1 in the simulated data set and to class 2 by simulated annealing, meaning that No. 6 is close to No. 14 in class 2. This shows that the SA method is preferable to hierarchical cluster analysis and the K-means algorithm. The objective function values of the K-means algorithm over 50 iterative cycles are listed in Table 2; the lowest value is 168.7852. One notices that the behavior of the K-means algorithm is influenced by the choice of initial cluster centers, the order in which the samples are taken and, of course, the geometrical properties



of the data. The tendency to sink into local optima is obvious. Clustering by simulated annealing can provide more stable computational results.

Fig. 1. Flow chart of the cluster analysis based on simulated annealing.

Classification of tea samples

Liu et al. (1987) studied classification of Chinese tea by using hierarchical cluster analysis and principal component analysis. In their study, three categories of tea of Chinese origin were used: green, black and oolong. Each category contains two

varieties: Chunmee (C) and Hyson (H) for green tea; Keemun (K) and Feng Quing (F) for black tea; Tikuanyin (T) and Se Zhong (S) for oolong tea. Every sample in each group was assigned a number according to its quality, as given by tea experts on the basis of taste; the best quality was numbered 1. It must be stressed that the assignment of quality by the tea experts is valid only for samples belonging to the same category and variety. The names of the samples are composed of the first letter of the

variety name, followed by the number indicating the quality. The data for the various tea samples were processed by using cluster analysis based on simulated annealing and the K-means algorithm. The results obtained by the two methods were compared with those given by Liu et al. (1987) using hierarchical cluster analysis. The results are summarized in Table 3, where the number 1, 2 or 3 in columns 2, 3, 4 and 5 means that the tea samples are classified into three groups (tea samples denoted by the same value are classified into the same group). The objective function Jb for the hierarchical clustering obtained by Liu et al. (1987) was calculated according to equation (1). As shown in Table 3, different results can be obtained by using different methods with the same criterion. The objective function Jb obtained by using cluster analysis based on simulated annealing was the lowest among all the methods listed in Table 3. It seems that cluster analysis based on simulated annealing can really provide a globally optimal result. Column 4 of Table 3 shows the classification results of the K-means algorithm with the lowest Jb among 50 independent iteration cycles. Comparing the results in column 5 with those in column 2 of Table 3, one notices that there is only one disagreement of class assignment, for the tea sample K2. K2 was assigned to class 1 by hierarchical clustering and to class 2 by simulated annealing. As mentioned above, the assignment of quality by the tea experts is valid only for samples belonging to the same category and variety. According to hierarchical clustering, sample K2 is classified as 1, meaning that K2 is at least as good as C4, H3, etc. Simulated annealing gave a classification of class 2 for K2, qualifying it as of the same order as C5, H4, etc.

Table 1. Comparison of results obtained by using different methods in the classification of simulated data (the (x, y) values and actual class of each of the 30 samples, together with the classes assigned by simulated annealing, hierarchical clustering and K-means; the corresponding Jb values are 166.9288, 189.3531 and 168.7852, respectively). *The K-means result with the lowest value of Jb among 50 independent iteration cycles.

Table 2. Jb values obtained by using the K-means algorithm with 50 different random initial clusterings (lowest value 168.7852).

Table 3. Comparison of results obtained by using different methods in the classification of Chinese tea samples (classification of each tea sample into group 1, 2 or 3 by hierarchical clustering, K-means and simulated annealing; the objective function values (Jb) are 50.8600 for hierarchical clustering, 140.6786 and 119.2311 for the two K-means runs, and 50.6999 for simulated annealing).
*The result obtained in one iteration cycle with an arbitrary initial clustering.
†The result with the lowest value of Jb among 50 independent iteration cycles.

As the value of Jb for simulated annealing is slightly lower, the results of the proposed algorithm seem to be more appropriate in describing the real situation. The clustering results obtained by the K-means algorithm seem not sufficiently reliable.

Discussion on simulated annealing

Selim & Alsultan (1991) pointed out that no computationally practical stopping point is available for cluster analysis based on simulated annealing. The search for stopping criteria for simulated annealing cluster analysis deserves further investigation. It is rather time-consuming to continue the calculation until Tc < T2, as theoretically T2 itself should approach zero (in this paper, T2 = 10^-8 is taken). In general, the exact value of T2 is unknown in practical situations. The present authors propose a stopping criterion based on the ratio of the temperature Tb, which corresponds to the optimal objective function Jb, to the current temperature Tc: when Tb/Tc ≥ 10^10, one stops the computation (step 7, vide supra). For example, when Tb = 3.8349 x 10^-34 and Tc = 9.6598 x 10^-44, one stops the computation. This is a convenient criterion and saves computing time substantially compared to the traditional approach using an extremely small T2.

The methods used to carry out the perturbation of the trial states of the sample-class assignment in cluster analysis, as well as the introduction of the perturbation into the simulated annealing process, also deserve consideration. In the present paper, a perturbation method based on changing the class assignment of only one sample at a time is used (Fig. 1). Such a method seems to be more effective, as usually only the class assignment of one sample is wrong, and only the class assignment of this sample needs to be changed. Brown & Huntley (1992) took the sample to be perturbed on a random basis; the corresponding perturbation operation is described as follows: do perturbation operation to obtain Wt: flag = 1; if IGM < N/2, p = 0.8, else p = 0.95; end; while flag ≠ 2: i = rand(1, n), where i is a random integer in the interval [1, n]; f = wig; u = rand; if u > p, wig = r (r ≠ f), flag = 2; end; end. It seems to us that in such a method equal opportunity of perturbation for each sample might not really be guaranteed. Every sample has an equal opportunity of perturbation in the present method, so the procedure given in step 4 (vide supra) takes less computation time to obtain comparable results (Table 4). On the other hand, Selim & Alsultan (1991) proposed the perturbation method below, in which the class assignments of several samples may be changed simultaneously in each perturbation: do perturbation operation to obtain Wt: flag = 1; if IGM
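The difference between the random sample selection of Brown & Huntley (1992) and the cyclic scan of step 4 can be sketched as follows. This is an illustrative reconstruction, not code from either paper; the threshold probability p and the random class draw follow step 4, and the scan is written 0-based.

```python
import random

def perturb_random(labels, k, p, rng):
    """Brown-style perturbation: the sample to move is drawn at random,
    so in a finite run some samples may never be offered a move."""
    n = len(labels)
    while True:
        i = rng.randrange(n)                  # random sample index; repeats possible
        if rng.random() > p:                  # move only with probability 1 - p
            old = labels[i]
            labels[i] = rng.choice([g for g in range(k) if g != old])
            return i

def perturb_cyclic(labels, k, p, start, rng):
    """Present method: samples are scanned cyclically, so every sample
    receives an equal opportunity of perturbation."""
    n = len(labels)
    i = start
    while True:
        i = (i + 1) % n                       # step 4's cyclic scan (0-based here)
        if rng.random() > p:
            old = labels[i]
            labels[i] = rng.choice([g for g in range(k) if g != old])
            return i                          # pass back as `start` of the next call
```

Each call changes the class assignment of exactly one sample; over many calls the cyclic scan visits the samples in a fixed rotation, while the random draw may revisit some samples and skip others.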


Table 4. Comparison of results obtained by using different perturbation methods in the classification of simulated data (the actual class of each of the 30 simulated samples and the classes assigned using Selim's method, Brown's method and the present method; the objective function values (Jb) are 209.4124, 173.7650 and 166.9288, and the computation times 23.5069, 11.7061 and 1.1627 h, respectively).

A globally optimal solution may be obtained by using the algorithm. A feasible stopping criterion and perturbation method were proposed. The present paper uses minimization of the sum of the squared Euclidean distances as the clustering criterion. As is known, Euclidean distances are suitable only for spherically distributed data sets. The search for other clustering criteria suitable for different kinds of data sets for use in simulated annealing cluster analysis deserves further investigation.

Program availability-The program is available from the authors on request.

Acknowledgement-This work was supported by the National Natural Science Foundation of China.

REFERENCES

Bratchell N. (1989) Chemometr. Intell. Lab. Syst. 6, 105.
Brown D. E. & Huntley C. L. (1992) Patt. Recog. 25, 401.
Collins N. E., Eglese R. W. & Golden B. L. (1988) Am. J. Math. Man. Sci. 8, 209.
Kalivas J. H. (1991) J. Chemomet. 5, 37.
Kalivas J. H., Roberts N. & Sutter J. M. (1989) Anal. Chem. 61, 2024.
Kirkpatrick S., Gelatt C. D. & Vecchi M. P. (1983) Science 220, 671.
Li G. G. & Cai G. L. (1986) Computer Pattern Recognition Technique, Chap. 3. Shanghai Traffic University Press, Shanghai.
Liu X. D., Espen P. V., Adams F. & Yan S. H. (1987) Anal. Chim. Acta 200, 424.
Metropolis N., Rosenbluth A., Rosenbluth M., Teller A. & Teller E. (1953) J. Chem. Phys. 21, 1087.
Selim S. Z. & Alsultan K. (1991) Patt. Recog. 24, 1003.
Ward J. H. (1963) J. Am. Stat. Assoc. 58, 236.
Zhang Q., Wang Q. R. & Boyle R. (1991) Patt. Recog. 24, 331.