Cluster analysis by the K-means algorithm and simulated annealing

Chemometrics and Intelligent Laboratory Systems 25 (1994) 51-60
Li-Xian Sun, Fen Xu, Yi-Zeng Liang, Yu-Long Xie and Ru-Qin Yu *
Department of Chemistry and Chemical Engineering, Hunan University, Changsha, Hunan 410082, China

Received 1 November 1993; accepted 16 May 1994

Abstract

A modified clustering algorithm based on a combination of simulated annealing with the K-means algorithm (SAKMC) is put forward and applied to chemometrics research. An empirical stopping criterion and a perturbation method which are more feasible than those proposed in the literature are suggested. The algorithm is tested on simulated data and then used for the classification of samples of the Chinese medicine calculus bovis (bezoar) and samples of Chinese tea. The algorithm, which obtains a global optimum in a shorter time, compared favorably with traditional cluster analysis based on simulated annealing and with the K-means algorithm.

1. Introduction

Cluster analysis, as an unsupervised pattern recognition method, is an important tool in the exploratory analysis of chemical data. It has found wide application in many fields such as disease diagnosis, food analysis and the chemical industry. Hierarchical and optimization partition algorithms are the most widely used methods of cluster analysis [1]. One of the major difficulties of conventional clustering algorithms, however, is to obtain globally optimal solutions. Simulated annealing is a stochastic optimization algorithm [2] and is a promising way to circumvent these difficulties. More recently, generalized simulated annealing has been introduced in chemometrics for the selection of wavelengths [3] and of calibration samples [4]. The use of cluster analysis methods based on simulated annealing for chemometrics research is therefore of considerable interest.

In this laboratory a cluster analysis procedure based on simulated annealing (SAC) was put forward [5], in which a stopping criterion and a perturbation method more feasible than those proposed in the literature [6,7] were suggested. In the present paper, the SAC algorithm is modified by combining it with the K-means algorithm. The modified algorithm, SAKMC, is tested using simulated data generated on a computer and is then applied to the classification of samples of the Chinese medicine calculus bovis (bezoar) and of Chinese tea. The results compared favorably with those obtained by conventional simulated annealing and K-means algorithms [8,9].



2. Theory

2.1. K-means algorithm

Consider n samples a_1, a_2, ..., a_n in d dimensions, where the sample vector is a_i = [a_i1, a_i2, ..., a_id], i = 1, 2, ..., n, and assume that these samples are to be classified into k groups. The algorithm [8,9] is stated as follows:

Step 1. Select k arbitrary samples from all n samples as the k initial cluster centers z_1^(0), z_2^(0), ..., z_k^(0), where z_g^(0) = [z_g1^(0), z_g2^(0), ..., z_gd^(0)], g = 1, 2, ..., k; the superscript (0) refers to the initial center assignment.

Step 2. At the hth iterative step, distribute the n samples among the k clusters using the criterion

a_m ∈ s_g^(h)  if  ||a_m - z_g^(h)|| < ||a_m - z_i^(h)||  for all i = 1, 2, ..., k, i ≠ g  (m = 1, 2, ..., n)

where s_g^(h) denotes the set of samples whose cluster center is z_g^(h) and ||·|| represents the norm.

Step 3. According to the results of Step 2, the new cluster centers z_g^(h+1), g = 1, 2, ..., k, are calculated such that the sum of the squared Euclidean distances from all samples in s_g^(h) to the new cluster center, i.e., the objective function

J_g = Σ_{a_m ∈ s_g^(h)} ||a_m - z_g^(h+1)||²,  g = 1, 2, ..., k,

is minimized. z_g^(h+1) is therefore the sample mean of s_g^(h):

z_g^(h+1) = (1/n_g) Σ_{a_m ∈ s_g^(h)} a_m,  g = 1, 2, ..., k

where n_g is the number of samples in s_g^(h). The name 'K-means' is obviously derived from the manner in which the cluster centers are sequentially updated.

Step 4. If z_g^(h+1) = z_g^(h) for g = 1, 2, ..., k, the algorithm has converged and stops; otherwise go to Step 2.
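For readers who prefer code to prose, the following Python/NumPy sketch mirrors Steps 1-4 above. It is an illustration only, not the authors' program; the function name, the random initialization and the convergence test are choices made for this sketch.

    import numpy as np

    def k_means(A, k, max_iter=100, rng=None):
        """Steps 1-4 of the K-means algorithm; A is an (n, d) sample matrix."""
        rng = np.random.default_rng(rng)
        n, d = A.shape
        # Step 1: k arbitrary samples serve as the initial cluster centers z_g^(0).
        centers = A[rng.choice(n, size=k, replace=False)].copy()
        for _ in range(max_iter):
            # Step 2: assign every sample to the nearest center (Euclidean norm).
            dists = np.linalg.norm(A[:, None, :] - centers[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Step 3: each new center is the mean of the samples assigned to it.
            new_centers = np.array([A[labels == g].mean(axis=0)
                                    if np.any(labels == g) else centers[g]
                                    for g in range(k)])
            # Step 4: stop when the centers no longer change, otherwise iterate.
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        return labels, centers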

2.2. Cluster analysis algorithm based on the K-means algorithm and simulated annealing

Simulated annealing (SA) derives its name from the simulation of atomic equilibria at a fixed temperature and belongs to the category of stochastic optimization algorithms. According to statistical mechanics, at a given temperature T and under thermal equilibrium the probability P_i of a given configuration i obeys the Boltzmann-Gibbs distribution:

P_i = K exp(-E_i/T)

where K is a normalization constant and E_i is the energy of the configuration i [6,10]. Simulated annealing was proposed by Kirkpatrick et al. [11] as a method for solving combinatorial optimization problems in which a function of many variables is minimized or maximized. The idea was derived from the algorithm proposed by Metropolis et al. [12], which simulates the process of atoms reaching thermal equilibrium at a given temperature T. The current configuration of the atoms is perturbed randomly to give a trial configuration, which is evaluated in the following manner. Let E_c and E_t denote the energy of the current and the trial configuration, respectively. If E_t ≤ E_c, the trial configuration is accepted; if E_t > E_c, the trial configuration is accepted with a probability proportional to exp[-(E_t - E_c)/T]. The perturbation process is repeated until the atoms reach thermal equilibrium, i.e., the configuration determined by the Boltzmann distribution at the given temperature. New lower-energy states of the atoms are obtained as T is decreased and Metropolis' simulation process is repeated. When T approaches zero, the atomic lowest-energy state or ground state is obtained.

Cluster analysis can be treated as a combinatorial optimization problem. Selim and Alsultan [6] and Brown and Huntley [7] described the analogy between simulated annealing and cluster analysis. The atomic configuration in simulated annealing corresponds to the assignment of patterns or samples to clusters in cluster analysis. The energy E of the atomic configuration and the temperature T in the simulated annealing process correspond to the objective function J and the control parameter T in cluster analysis, respectively.

Suppose n samples or patterns in d-dimensional space are to be partitioned into k clusters or groups. Different clustering criteria could be adopted. In this paper, the sum of the squared Euclidean distances from all samples to their corresponding cluster centers is used as the criterion. Let A = [a_ij] (i = 1, 2, ..., n; j = 1, 2, ..., d) be an n × d sample data matrix, and W = [w_ig] (i = 1, 2, ..., n; g = 1, 2, ..., k) be an n × k cluster membership matrix, with w_ig = 1 when sample i is assigned to cluster g and w_ig = 0 otherwise. Let Z = [z_gj] (g = 1, 2, ..., k; j = 1, 2, ..., d) be a k × d matrix of cluster centers, where

z_gj = Σ_{i=1}^{n} w_ig a_ij / Σ_{i=1}^{n} w_ig

The sum of the squared Euclidean distances is used as the objective function to be minimized:

J(w_11, ..., w_1k; ...; w_n1, ..., w_nk) = Σ_{i=1}^{n} Σ_{g=1}^{k} w_ig Σ_{j=1}^{d} (a_ij - z_gj)²    (1)
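As an illustrative aid (not part of the original paper), Eq. (1) can be evaluated directly from the matrices A and W defined above. The sketch below, in Python/NumPy, assumes every cluster is non-empty; the function name is hypothetical.

    import numpy as np

    def objective_J(A, W):
        """Eq. (1): sum of squared Euclidean distances of all samples to the
        centers of the clusters they are assigned to.  A is the (n, d) data
        matrix, W the (n, k) 0/1 membership matrix; clusters must be non-empty."""
        n_g = W.sum(axis=0)                    # number of samples in each cluster
        Z = (W.T @ A) / n_g[:, None]           # z_gj = sum_i w_ig a_ij / sum_i w_ig
        sq = ((A[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)   # ||a_i - z_g||^2
        return float((W * sq).sum())           # only assigned (i, g) pairs count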

Selim and Alsultan [6] proposed the following perturbation method, in which the class assignments of several samples may be changed simultaneously in each perturbation:

do perturbation operation to obtain W_t:
  flag = 1;
  if IGM ≤ N/2; p = 0.8; else; p = 0.95; end
  i = 0;
  while flag ≠ 2 or i ≠ n
    if i > n; i = i - n; else; i = i + 1; end
    f = w_ig; u = rand;
    if u > p; w_ig = r; flag = 2; end
  end

On the other hand, Brown and Huntley [7] took the sample to be perturbed on a random basis, and the corresponding perturbation operation is described as follows:

do perturbation operation to obtain W_t:
  flag = 1;
  if IGM ≤ N/2; p = 0.8; else; p = 0.95; end
  while flag ≠ 2
    i = rand(1, n); where i is a random integer which lies in the interval [1, n]
    f = w_ig; u = rand;
    if u > p; w_ig = r; flag = 2; end
  end

The present authors [5] have proposed a cluster analysis procedure based on simulated annealing (SAC) employing the following perturbation method, which was found preferable to those suggested by Selim and Alsultan as well as by Brown and Huntley:

do perturbation operation to obtain W_t:
  flag = 1;
  if IGM ≤ N/2; p = 0.8; else; p = 0.95; end
  i = 0;
  while flag ≠ 2
    if i > n; i = i - n; else; i = i + 1; end
    f = w_ig; u = rand;
    if u > p; w_ig = r; flag = 2; end
  end
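A small Python sketch of this perturbation operation may make the cyclic visiting order explicit. It is a paraphrase of the pseudocode above, not the authors' code: indices are 0-based, the argument `start` carries the sample counter between calls, and the class labels are stored in a vector rather than in the matrix W.

    import numpy as np

    def perturb(labels, start, k, IGM, N, rng):
        """Visit the samples cyclically, starting after `start`, and move the
        first sample that passes the u > p test to a randomly chosen other
        class.  Returns the trial assignment and the updated sample counter."""
        n = len(labels)
        p = 0.8 if IGM <= N / 2 else 0.95      # threshold probability, as above
        trial = labels.copy()
        i = start
        while True:
            i = (i + 1) % n                    # next sample, wrapping around
            if rng.random() > p:               # perturb this sample
                f = trial[i]                   # current class f of sample i
                r = rng.choice([c for c in range(k) if c != f])  # random r != f
                trial[i] = r
                return trial, i

Because the counter is carried over between calls, every sample is offered the same chance of being perturbed, which is the point stressed later in Section 3.4.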

On the basis of the above work, a modified clustering algorithm based on a combination of simulated annealing with the K-means algorithm (SAKMC) is put forward in this paper. The calculation steps of SAKMC are the same as in SAC [5] except for the starting point: in SAKMC, an initial class label among the k classes is obtained for all n samples using the K-means algorithm instead of by the random assignment used in SAC. The clustering is carried out in the following steps:

Step 1. Set the initial values of the parameters. Let T_1 be the initial temperature, T_2 the final temperature, β the temperature multiplier, N the desired number of Metropolis iterations, IGM the counting number of Metropolis iterations and i the counting number of a sample in the sample set. Take T_1 = 10, IGM = 0 and i = 0.

Step 2. Obtain the initial class labels in k classes for all n samples by the K-means algorithm (Selim and Alsultan as well as Brown and Huntley [6,7] randomly assign these initial class labels in k classes to the n samples) and then calculate the corresponding value of the objective function J. Let both the optimal objective function value J_b and the current objective function value J_c be J, and let the corresponding cluster membership matrix of all samples be W_b. T_b is the temperature corresponding to the optimal objective function J_b; T_c and W_c are the temperature and the cluster membership matrix, respectively, corresponding to the current objective function J_c. Let T_c = T_1, T_b = T_1 and W_c = W_b.

Fig. 1. Flow chart of a cluster analysis by the K-means algorithm and simulated annealing.

Fig. 2. The plot of simulated data set I.

Step 3. When the counting number of Metropolis sampling steps IGM < N, go to Step 4; otherwise go to Step 7.

Fig. 3. The plot of simulated data set II.


Table 1
The comparison of results of different approaches

                                  Simulated data set I               Simulated data set II
                                  K-means    SAC        SAKMC        K-means     SAC         SAKMC
Objective function value, J_b     63.7707    60.3346    60.3346      114.6960    114.5048    114.5048
Computation time (min)            3          70         55           5           660         183

Step 4. Let flag = 1 and let p be the threshold probability: if IGM ≤ N/2, set p = 0.80, otherwise set p = 0.95. A trial assignment matrix W_t is obtained from the current assignment W_c in the following way. If i > n, let i = i - n; otherwise let i = i + 1. Take sample i from the sample set; the current class assignment (w_ig) of this sample is denoted f (where f is one of the k classes), i.e., f = w_ig. Draw a random number u (u = rand, where rand is a random number uniformly distributed in the interval [0,1]). If u > p, generate a random integer r which lies in the range [1,k] with r ≠ f, move sample i from class f to class r, let w_ig = r and let flag = 2; otherwise take another sample and repeat the above process until flag = 2.

Step 5. Let the trial assignment obtained after the above perturbation be W_t and calculate the corresponding objective function value J_t. If J_t < J_c, let W_c = W_t and J_c = J_t. If J_t < J_b, then let J_b = J_t, W_b = W_t and IGM = 0.

Step 6. Produce a random number y, i.e., y = rand. If y < exp[-(J_t - J_c)/T_c], then let W_c = W_t and J_c = J_t. Otherwise let IGM = IGM + 1 and go to Step 3.

Step 7. Let T_c = βT_c, IGM = 0, J_c = J_b and W_c = W_b. If T_c < T_2 or T_c/T_b < 10^-10, then stop; otherwise go back to Step 3.

A flow chart of the program is shown in Fig. 1.
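To tie Steps 1-7 together, the following Python sketch outlines the SAKMC loop using the k_means, objective_J and perturb functions sketched earlier. It is a schematic reading of the steps, not the authors' program: the values of T1, beta and N are placeholders (the printed values could not all be recovered), the T_c < T_2 test is omitted, and updating T_b when a new best solution is found is implied by the text rather than stated explicitly.

    import numpy as np

    def sakmc(A, k, T1=10.0, beta=0.9, N=100, ratio_tol=1e-10, rng=None):
        """Simulated annealing clustering started from a K-means assignment."""
        rng = np.random.default_rng(rng)
        labels, _ = k_means(A, k, rng=rng)               # Step 2: initial labels
        W = np.eye(k, dtype=int)[labels]
        Jc = Jb = objective_J(A, W)                      # current and best J
        labels_b = labels.copy()
        Tc = Tb = T1                                     # current / best temperature
        IGM = i = 0
        while True:
            while IGM < N:                               # Step 3: Metropolis loop
                trial, i = perturb(labels, i, k, IGM, N, rng)        # Step 4
                Jt = objective_J(A, np.eye(k, dtype=int)[trial])
                if Jt < Jc:                              # Step 5: downhill move
                    labels, Jc = trial, Jt
                    if Jt < Jb:
                        labels_b, Jb, Tb, IGM = trial.copy(), Jt, Tc, 0
                elif rng.random() < np.exp(-(Jt - Jc) / Tc):         # Step 6
                    labels, Jc = trial, Jt
                else:
                    IGM += 1
            Tc *= beta                                   # Step 7: cool down
            IGM, labels, Jc = 0, labels_b.copy(), Jb
            if Tc / Tb < ratio_tol:                      # stop when T_c/T_b < 1e-10
                return labels_b, Jb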

Table 2
The normalized data of microelement contents in natural and cultivated calculus bovis samples and comparison of different clustering analysis methods

Sample No. a   Cr        Cu        Mn        Ti        Zn        Pb        Mo        Ca        K         Na        P
1              0.2601    1.2663   -0.3583   -0.8806    2.1462    0.7075    0.3092    2.4408   -0.8782   -0.7953    2.7953
2             -0.5501   -0.4793    0.4264   -0.7349    1.6575    0.9897    0.3092    1.1206   -0.9742   -0.8503   -0.7975
3             -0.2094    1.4416    1.3222   -0.9609    1.3797    0.2796    1.8715    0.7906   -0.7823   -0.7128   -0.9002
4             -0.1412   -0.7887   -0.3329   -0.9448   -0.4879    2.3681   -0.1121    0.8089   -0.3985    0.4146    0.3316
5             -0.0352    0.3886    0.9366   -0.8966   -0.3549    0.5646   -0.6738   -0.4288    0.7528    1.2395   -0.0790
6              0.4039   -0.1633    0.3890   -0.2495   -0.4589   -0.6124   -0.9546   -0.0437    1.3284    0.6620   -0.3356
7             -0.8455    1.6040   -0.8126   -0.2655   -0.5768   -0.0425    1.1693   -0.5755   -0.2066    0.3871   -0.6435
8             -0.5539   -0.9086   -0.7482    0.2371   -0.4448   -0.0360   -0.5334   -0.8322    1.7122    1.9544   -0.4382
9             -0.5880   -0.6811   -0.4788    1.6784   -0.5340   -1.0765   -0.9546   -0.4471   -1.0406   -0.7953    0.7422
10            -1.5648   -0.7790   -1.0007   -0.4273   -0.5804   -0.3776   -0.1121   -0.8505    0.8487    0.6620   -0.2329
11             0.0178    0.9968   -0.9148    0.6422   -0.5779   -0.4834    1.5906   -0.4838   -0.3026    0.1671    0.7422
12             2.3159   -0.8352   -0.6767    1.9193   -0.5841   -1.1406   -0.9546   -0.6580   -1.1660   -1.4827   -0.7462
13             1.4905   -1.0622    2.2487    0.8831   -0.5835   -1.1406   -0.9546   -0.8413   -0.9742   -0.8503   -0.4382

                                  K-means    SAC        SAKMC
Objective function value, J_b     96.7396    94.3589    94.3589
Computation time (min)            3          21         12

a Nos. 1-3 are cultivated calculus bovis and Nos. 4-13 are natural calculus bovis samples.


3. Results and discussion

3.1. Simulation on the computer

In order to verify the effectiveness of the present method, the algorithm was tested using two simulated data sets generated on the computer. The simulated data sets were composed of 30 samples (data set I) and 60 samples (data set II), with two variables (x, y) for each sample (see Figs. 2 and 3, respectively). The samples were supposed to be divided into three classes. The data were processed using cluster analysis based on simulated annealing (SAC) and cluster analysis by the K-means algorithm and simulated annealing (SAKMC), respectively. As shown in Table 1, the computation times required for obtaining the corresponding optimal objective function values (J_b) were about 3 min, 70 min and 55 min for clustering based on the K-means algorithm, simulated annealing (SAC) and SAKMC for data set I, respectively, and 5 min, 660 min and 183 min for data set II, respectively. One notices that the larger the data set, the longer the computation time.

3.2. Classification of calculus bovis samples

Calculus bovis is a traditional Chinese medicine; it is widely used for the treatment of fever and sore throat. The microelement contents in natural calculus bovis and cultivated calculus bovis samples were determined using an inductively coupled plasma (ICP) procedure [13]. The normalized data of the microelement contents and the results obtained using different clustering methods are listed in Table 2. The K-means algorithm took the shortest time to obtain the final result (J_b = 96.7396), but according to Table 2 this is really a local optimum. Samples No. 4 and No. 7, belonging to cultivated calculus bovis, were misclassified into natural calculus bovis by the K-means algorithm. Table 2 shows that both SAC and SAKMC could provide a global optimum (J_b = 94.3589); only sample No. 7, belonging to cultivated calculus bovis, was misclassified into natural calculus bovis. In this case, if sample No. 4 was classified as cultivated calculus bovis, the corresponding objective function J_b would be 95.2626. This indicates that sample No. 4 was closer to natural calculus bovis. From the above results, one notices that the calculus bovis samples can be correctly classified into natural and cultivated samples on the basis of their microelement contents by means of SAC and SAKMC, except for sample No. 4.

3.3. Classification of tea samples

Liu et al. [14] classified samples of Chinese tea using hierarchical cluster analysis and principal component analysis. In their study, three categories of tea of Chinese origin were used: green, black and oolong. Each category contains two varieties: Chunmee (C) and Hyson (H) for green tea, Keemun (K) and Feng Qing (F) for black tea, Tikuanyin (T) and Se Zhong (S) for oolong tea. The samples in each group were assigned numbers according to their quality as determined by tea experts on the basis of tasting; the best quality was numbered 1. It should be stressed that this assignment of quality only ranks samples belonging to the same category and variety. The samples are named by the first letter of the variety name, followed by the number indicating the quality. The data, which comprise the concentrations (% w/w, dry weight) of cellulose, hemicellulose, lignin, polyphenols, caffeine and amino acids for the various tea samples, were processed using cluster analysis based on simulated annealing and the K-means algorithm in the present paper. The results obtained using the two methods were compared with those of Liu et al. [14] using hierarchical cluster analysis. The results obtained are summarized as follows:

Hierarchical clustering:
Class 1: C1-4, H1-3, K1-2, F1-4
Class 2: C5-7, H4-5, K3-4, F5-7
Class 3: T1-4, S1-4
Objective function value J_b: 50.8600 (calculated according to Eq. (1))
Computation time: 10 min

K-means:
Class 1: C1-7, H1-5, K1-4, F1-7
Class 2: T1-2, S1-2
Class 3: T3-4, S3-4
Objective function value J_b: 119.2311
Computation time: 6 min

SAC and SAKMC:
Class 1: C1-4, H1-3, K1, F1-4
Class 2: C5-7, H4-5, K2-4, F5-7
Class 3: T1-4, S1-4
Objective function value J_b: 50.6999
Computation time: 107 min (SAC); 68 min (SAKMC)

The three classes obtained by hierarchical clustering, SAC and SAKMC are: (1) high-quality green and black teas; (2) low-quality green and black teas; and (3) oolong teas. One notices that the above results disagree only in the class assignment of the tea sample K2, which was placed in class 1 or class 2 by the different clustering procedures. Both SAC and SAKMC gave the lowest objective function value. As mentioned above, the assignment of quality by the tea experts is valid only for samples belonging to the same category and variety. According to the hierarchical clustering, sample K2 is classified into class 1, meaning that K2 is at least as good as C4, H3, etc. Simulated annealing gave a classification of class 2 for K2, qualifying it as of the same order as C5, H4, etc. The K-means algorithm is inferior, as shown by the objective function: it puts all the green and black teas into the same group and separates the oolong teas into a high- and a low-quality group.

Comparing the results obtained for the simulated data, the calculus bovis samples and the tea samples, one notices that different methods give different results, with the K-means algorithm converging to local optimal solutions in the shortest time. The behavior of the K-means algorithm is influenced by the choice of the initial cluster centers, the order in which the samples are taken, and the geometrical properties of the data. Its tendency of sinking into local optima is obvious. Both cluster analysis based on simulated annealing (SAC) and SAKMC can approach globally optimal solutions, but SAKMC converges faster than SAC. The SAKMC method is thus preferred over simulated annealing (SAC) and the K-means algorithm for the classification of real samples.

3.4. Discussion of simulated annealing

Selim and Alsultan [6] mentioned that no stopping point was available for clustering computations based on simulated annealing. The search for stopping criteria for simulated annealing cluster analysis deserves further investigation. It is rather time consuming to continue the calculation until T_c < T_2, as theoretically T_2 itself should approach zero (in this paper, T_2 is taken as 10^-99). In practical situations, the appropriate value of T_2 is often unknown. The present authors propose a stopping criterion based on the ratio of the current temperature T_c to the temperature T_b which corresponds to the optimal objective function J_b: when T_c/T_b < 10^-10, one stops the computation (Step 7, vide supra). For example, in the treatment of data set I, when T_c = 4.0720 × 10^-27 and T_b = 4.7690 × 10^-17, the ratio is about 8.5 × 10^-11, which is below 10^-10, and one stops the computation. This is a convenient criterion and saves computing time substantially compared to the traditional approach using an extremely small T_2.

The methods for perturbing the trial states of the sample class assignments in cluster analysis, as well as the way in which the perturbation is introduced in the simulated annealing process, also deserve consideration. Brown and Huntley [7] put forward a perturbation method (BEM) in which the samples are perturbed on a random basis. It seems to us that in such a method an equal opportunity of perturbation for each sample might not really be guaranteed. On the other hand, Selim and Alsultan [6] proposed a perturbation method (SAM) in which

Table 3
Comparison of results using different perturbation methods for simulated data set I

                                  SAKMC     SAC       BEM       SAM
Objective function value, J_b     60.3346   60.3346   60.3346   63.4884
Computation time (min)            55        70        248       2787


the class assignments of several samples might change simultaneously in each perturbation. Such a method does not seem to be effective, since often only one sample is misclassified and only the class assignment of this sample needs to be changed. In the present paper, a perturbation method changing the class assignment of only one sample at a time was used (Fig. 1). Every sample has an equal opportunity to be perturbed in the present method, and the procedure given in Step 4 (vide supra) takes less computation time than BEM and SAM in obtaining comparable results (Table 3). A comparison of the different perturbation operations is given in Table 3. One notices that the method of Selim and Alsultan [6] is the slowest in reaching the global optimum, and that the present method converges quite quickly. Comparing the results obtained in this paper, one notices that the proposed algorithm, which combines the K-means algorithm and simulated annealing with the modified perturbation method, can obtain the optimal solution within quite a short time. The main reason why SAKMC converges faster than SAC is that SAKMC uses the result of the K-means algorithm as the initial guess, while both SAC and SAKMC adopt the same stopping criterion. As the K-means algorithm is very quick, one gets faster convergence with the SAKMC version.

4. Conclusions

Cluster analysis based on simulated annealing is a very useful clustering algorithm, although it is usually computationally slow. The SAKMC algorithm proposed in this paper is more effective than the K-means algorithm and than simulated annealing with conventional perturbation methods, and it is also preferable to hierarchical cluster analysis for the classification of Chinese tea samples. A globally optimal solution may be obtained by using the algorithm. A feasible stopping criterion and a perturbation method were proposed. The modified algorithm, in which the initial assignments of the samples are obtained by the K-means algorithm and the globally optimal solution is then searched for by simulated annealing with the modified perturbation method, can reach the optimal solution in a fairly short time. The present paper uses the minimization of the sum of the squared Euclidean distances as the clustering criterion. It is well known that Euclidean distances are suitable only for spherically distributed data sets. Other clustering criteria suitable for different kinds of data sets and for use in simulated annealing cluster analysis deserve further investigation.

Acknowledgement

This work was supported by the National Natural Science Foundation of China. The authors thank Professor R. Manne for his valuable remarks and encouragement.

References

[1] N. Bratchell, Cluster analysis, Chemometrics and Intelligent Laboratory Systems, 6 (1989) 105-125.
[2] N.E. Collins, R.W. Eglese and B.L. Golden, Simulated annealing - an annotated bibliography, American Journal of Mathematical and Management Sciences, 8 (1988) 209-307.
[3] J.H. Kalivas, S.N. Roberts and J.M. Sutter, Global optimization by simulated annealing with wavelength selection for ultraviolet-visible spectrophotometry, Analytical Chemistry, 61 (1989) 2024-2030.
[4] J.H. Kalivas, Generalized simulated annealing for calibration sample selection from an existing set and orthogonalization of undesigned experiments, Journal of Chemometrics, 5 (1991) 37-48.
[5] L.-X. Sun, Y.-L. Xie, X.-H. Song, J.-H. Wang and R.-Q. Yu, Cluster analysis by simulated annealing, Computers and Chemistry, 18 (1994) 103-108.
[6] S.Z. Selim and K. Alsultan, A simulated annealing algorithm for the clustering problem, Pattern Recognition, 24 (1991) 1003-1008.
[7] D.E. Brown and C.L. Huntley, A practical application of simulated annealing to clustering, Pattern Recognition, 25 (1992) 401-412.
[8] G.G. Li and G.L. Cai, Computer Pattern Recognition Technique, Shanghai Traffic University Press, Shanghai, 1986, Ch. 3 (in Chinese).
[9] J.T. Tou and R.C. Gonzalez, Pattern Recognition Principles, Addison-Wesley, Reading, MA, 1974.
[10] V. Cerny and S.E. Dreyfus, Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm, Journal of Optimization Theory and Applications, 45 (1985) 41-51.


[11] S. Kirkpatrick, C.D. Gelatt and M.P. Vecchi, Optimization by simulated annealing, Science, 220 (1983) 671-680.
[12] N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller and E. Teller, Equation of state calculations using fast computing machines, Journal of Chemical Physics, 21 (1953) 1087-1092.
[13] Q.M. Zhang, R.D. Yan, S.J. Tian and L.M. Li, The comparison of chemical compositions in natural calculus bovis and cultivated calculus bovis, Chinese Medical Herbs, 14 (1991) 15-17 (in Chinese).
[14] X.D. Liu, P. Van Espen, F. Adams, S.H. Yan and M. Vanbelle, Classification of Chinese tea samples according to origin and quality by principal component techniques, Analytica Chimica Acta, 200 (1987) 421-430.