- Email: [email protected]

Continuous Optimization

Speed-up simulated annealing by parallel coordinates Hong Ye, Zhiping Lin

q

*

School of Electrical and Electronic Engineering, Nanyang Technological University, Block S2, Nanyang Avenue, Singapore 639798, Singapore Received 8 July 2002; accepted 10 January 2005 Available online 31 March 2005

Abstract High computational cost is an obstacle to the applications of simulated annealing, especially for moderate and high dimensional problems. Inspired by multidimensional visualization techniques, the notion of parallel coordinates system, or parallel coordinates in short, is introduced into the optimization research area to speed up the convergence rate of simulated annealing. Numerical studies demonstrate that the proposed method can converge to global solutions with reduced computational cost in terms of both the number of function evaluations and CPU time. 2005 Elsevier B.V. All rights reserved. Keywords: Simulated annealing; Multidimensional visualization; Parallel coordinates; Global optimization

1. Introduction Simulated annealing is a well-known stochastic global optimization technique for obtaining high quality solutions to diﬃcult optimization problems [1–4]. Its main drawback is the considerable computational eﬀort required. Several variations of the classic simulated annealing have been proposed in the literature to improve the convergence q

This work is supported by the Academic Research Fund, Ministry of Education, Singapore. * Corresponding author. Tel.: +65 6790 6857; fax: +65 6792 0415. E-mail address: [email protected] (Z. Lin).

rate, see e.g. [5,6]. However, high computational cost is still an obstacle to applications of simulated annealing based algorithms, especially for moderate and high dimensional problems. This paper shows how to speed up the convergence rate of simulated annealing by introducing the notion of parallel coordinates, a coordinate system originating from multidimensional visualization [7,8]. The rest of the paper is organized as follows. Section 2 introduces and develops the concept of parallel coordinates. The proposed method of simulated annealing with parallel coordinates is presented in Section 3. Section 4 describes the pseudo-code for implementing the proposed method. The performance of simulated

0377-2217/$ - see front matter 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.ejor.2005.01.039

60

H. Ye, Z. Lin / European Journal of Operational Research 173 (2006) 59–71

annealing with parallel coordinates is demonstrated by numerical studies in Section 5. Section 6 concludes the paper.

2. Parallel coordinates Visualization is the process that transforms numerical data into visual representations, which are much easier to be understood. A general tool for visualizing multidimensional data is parallel coordinates [7,8]. In traditional Cartesian coordinates, all axes are mutually perpendicular. The classic method of parallel coordinates represents the d-dimensional data as values on d coordinates parallel to the yaxis equally spaced along the x-axis. A d-dimensional data element X(x1, . . ., xi, . . ., xd) is drawn by locating the xi along the ith coordinate with a marker *. A polygonal line then connects all the markers, as the solid line shown in Fig. 1. Each data element in the d-dimensional space corresponds to a polygonal line in parallel coordinates. The original purpose of using parallel coordinates is to visualize multidimensional data in two dimensions with low complexity to ﬁnd certain features among them. With slight modiﬁcation, this paper introduces parallel coordinates into the optimization research to speed up the convergence rate of simulated annealing by transforming the original search domain into another search domain.

Instead of being connected by a polygonal line, the corresponding values on d coordinates of each d-dimensional data element are linked by a smooth curve, as the dashed one shown in Fig. 1. Using Fourier series, a general function Y(g) is constructed to express the curve as: m X Y ðgÞ ¼ a0 þ ðaj sinðjgÞ þ bj cosðjgÞÞ; ð1Þ j¼1

where a0, aj and bj are variables describing diﬀerent curves; g 2 [0, d 1]; m is a predeﬁned integer. Thus, each smooth curve in parallel coordinates corresponds to a point in the d-dimensional space and can be represented by the function Y(g). We deﬁne xi ¼ Y ði 1Þ m X ¼ a0 þ faj sin½ði 1Þj þ bj cos½ði 1Þjg; j¼1

ð2Þ where i = 1, . . ., d. For an arbitrary given set of values for the variables x1, . . ., xi, . . ., xd, it is clear that the above system of linear equations (2) has at least one solution set a0, a1, b1, . . ., am, bm if (2m + 1) P d. When (2m + 1) < d, the system has no solution in general. However, we can always ﬁnd a solution set a0, a1, b1, . . ., am, bm so that Y(i 1) can approximate xi, for i = 1, . . ., d. In such a case, no curve can pass exactly through all the * markers in Fig. 1.

3. Simulated annealing with parallel coordinates Let f(X) = f(x1, . . ., xi, . . ., xd) be an objective function with d variables in the domain of interest D 2 Rd. The optimization problem is to ﬁnd f ðX Þ ¼ min f ðXÞ: X2D

ð3Þ

3.1. Simulated annealing with general parallel coordinates Fig. 1. Parallel coordinates display of a nine-dimensional data element by a polygonal line (solid) and a smooth curve (dashed).

As explained previously, the set of points {X} in the d-dimensional space becomes a group of smooth

H. Ye, Z. Lin / European Journal of Operational Research 173 (2006) 59–71

curves in parallel coordinates and can be described by a set of functions {Y(g)}. Since each function Y(g) corresponds to a point X in d-dimensional space with the value of f(X), {X} is a vector space of functions and f(X) is a functional depending on function Y(g), denoted by V[Y(g)]. Consequently, the optimization problem becomes functional optimization aimed at ﬁnding a function Y(g) with the minimal value of objective functional V[Y(g)]. There are many standard methods for the functional optimization problem. A common one is the Ritz method [9]. Implementing the Ritz method is made possible by using appropriate Ritz functions. In this paper, Eq. (1) is taken as the Ritz function with (2m + 1) < d. The mathematical formulation for the functional optimization problem is min V ½Y ðgÞ )

Y ðgÞ2fXg

min

Y ði1Þ2fxi g

f ðY ð0Þ; . . . ; Y ði 1Þ;

. . . ; Y ðd 1ÞÞ ¼ fY ðA Þ ¼ min fY ðAÞ: A2DA

ð4Þ

That is, the problem reduces to ﬁnding the best solution A* of the (2m + 1)-dimensional function fY(A), where fY(A) = fY(a0, a1, b1, . . ., aj, bj, . . ., am, bm), DA 2 R2m + 1. The components of A are Ritz coeﬃcients. Minimizing the objective function fY(A) can be done by any global optimization algorithm, such as simulated annealing. After the global solution A* in domain DA is obtained, function Y(g) at A* with minimal value of V[Y(g)] can be presented by Eq. (1), as the dashed curve shown in Fig. 2. As (2m + 1) < d, generally

Fig. 2. Parallel coordinates display of the global solution X* b (dashed). (solid) and the approximate solution X

61

the curve described by Y(g) at A* will not pass exactly through X* in parallel coordinates, but give b which can be calcuan approximate solution X, lated by substituting the value of A* into the system of linear Eq. (2). To improve the accuracy of the solution obtained by the above stage, the next stage of the simulated annealing with parallel coordinates is b as the starting point and further minimize to set X the original objective function f(X) by a local optib is a good approximation mization algorithm. If X to the global solution X*, for example, it is a nearglobal solution or suﬃciently close to X* in domain D, the local optimization algorithm will be able to converge at the global solution. In our proposed method, parallel coordinates is employed to transform the search domain D of original objective function f(X) into another search domain DA of objective function fY(A). Consequently, in stage one, the search by simulated annealing is made in domain DA rather than in domain D. Though search domain and objective function have been changed, the implementation of simulated annealing is the same. For clarity, it is described as follows. Let fY(A) be the function to minimize, A the vector in domain DA and (a0, a1, b1, . . ., aj, bj, . . ., am, bm) its (2m + 1) components. Beginning with a random starting point A0, a succession of points A0, A1, . . ., Ai, . . . is generated toward the global minimum of the cost function fY(A). New candidate points are generated around the current point Ai applying random moves along each coordinate direction in turn, independently from other directions. The step vector v records the maximum increments possible along each direction. A candidate point A 0 with decreased value of objective function is always accepted as Ai+1. Otherwise it is accepted with probability p(DfY) = exp(DfY/ T), where DfY = fY(A 0 ) fY(Ai) and T is called temperature. After Ns cycles of random moves along each coordinate direction, the step vector v is adjusted to maintain the average percentage of accepted moves at about one-half of the total number of moves. A high number of accepted moves with respect to rejected ones means a small diﬀerence in function values compared to the temperature, implying that the function is explored

62

H. Ye, Z. Lin / European Journal of Operational Research 173 (2006) 59–71

with too small steps, therefore the step length should be increased. On the contrary, the step length should be decreased. The simulated annealing begins with a starting temperature T0 high enough given by the user. After NT times of the step adjustments, the temperature T is reduced and the best point reached so far is recorded as A*. A new sequence of moves is made starting from A*, until NT times of the step adjustments are reached again, and so on. The process is stopped when no more useful improvement can be expected, according to a terminating criterion. More detailed information on the implementation of simulated annealing is referred to [1]. The successive points taken in domain DA may appear to be completely random after they are mapped to domain D. Therefore the landscapes of functions fY(A) and f(X) are generally diﬀerent. From previous descriptions, it is clear that when (2m + 1) P d, the curve described by Y(g) at A* can pass exactly through X* in parallel coordinates. Therefore we have fY(A*) = f(X*). In the proposed method, Eq. (1) is taken as the Ritz function with (2m + 1) < d. In this case, generally the b in domain D of the point corresponding point X A* in domain DA is only an approximation to b is a good approximation, we have X*. If X b f ðX Þ. The accuracy of the solufY ðA Þ ¼ f ð XÞ tion can be further improved in stage two of the proposed method. Compared with original objective function f(X), function fY(A) has fewer unknown variables and a smaller search region. As the computational cost by global optimization

algorithms may grow exponentially with the problem size, reduction on the dimension of the search domain by parallel coordinates provides a means to speed up the convergence rate of simulated annealing. Note that the integer m plays a key role in the trade-oﬀ between convergence rate and quality of solution. A smaller value of m leads to smaller search domain and more improvement on the convergence rate. However, if m is too small and the b may dimension of the given problem is high, X not be a good approximation to the global solution X*. Starting from it, the local optimization algorithm will fail to converge at X*. To maintain b while keeping m small, the method the quality of X of section parallel coordinates is proposed in the next section. 3.2. Simulated annealing with section parallel coordinates Consider an objective function with d variables, where d is not a very small number. Divide the d variables into k sections, each of which has u variables, and leave the remaining variables in the (k + 1)th section, as shown in Fig. 3. The number of remaining variables is rem(d/u), where rem(Æ) means taking the remainder after division. For variables in the sth section (s = 1, . . ., k), construct the Ritz function Ys(g) based on Eq. (1) and form the Ritz coeﬃcient set As(as,0, as,1, bs,1, . . ., as,v, bs,v), where 2v + 1 < u. The variables in the (k + 1)th section remain unchanged. That is, for

Fig. 3. Illustration of section parallel coordinates.

H. Ye, Z. Lin / European Journal of Operational Research 173 (2006) 59–71

the Ritz function Yk+1(g) with Ritz coeﬃcient set Ak+1(xku+1, . . ., xd), let Yk+1(i 1) = xku+i for i = 1, . . ., rem(d/u). Hence f(X) becomes a functional relying on a group of Ritz functions, noted as V[Y1(g), . . ., Ys(g), . . ., Yk(g), Yk+1(g)]. For each Ys(g), g 2 [(s 1)u, su 1]. In succession, minimize the objective functional with unknown Ritz coeﬃcient sets A(A1, . . ., As, . . ., Ak, Ak+1) by simulated annealing. The approximate solution b X b 1; . . . ; X b s; . . . ; X b k; X b kþ1 Þ of X* can be obtained Xð through substituting the value of As into corresponding Ritz function Ys(g). It then acts as the starting point for stage two of the proposed method. The ratio n of saving on dimensions by section parallel coordinates is: n¼

½u ð2v þ 1Þk 100%; d

63

Step 5. Start from the point A0, minimize the objective function fY(A) by simulated annealing until the terminating criterion is met. Step 6. Record the best solution A* found in domain DA. b Step 7. Calculate the corresponding solution X in domain D of A*. Stage two: local minimization on f(X) b be the starting point. Apply a Step 8. Let X local optimization method to minimize the objective function f(X). Step 9. Record the solution as the ﬁnal solution X* and its objective function value f(X*). Detailed pseudo-code for the classic simulated annealing phase is referred to [1].

ð5Þ

which is k times over the ratio of general parallel coordinates. Note that the general parallel coordinates is a special case of section parallel coordinates with u = d, v = m and k = 1. The saving on computational cost is proportional to the ratio n. For example, if d = 27, u = 9, v = 3 and k = 3, from Eq. (5), we have n = 22.22%.

4. The algorithm For clarity of exposition, the pseudo-code of simulated annealing with section parallel coordinates is given as follows:

5. Test functions and results 5.1. Test functions Quasi-Rosenbrock function. The Rosenbrock test function [1] with ﬁxed global solution at 1 on each dimension is notorious in optimization benchmarks, because a small change in variables can give unpredictably large changes in value of the objective function. To assess the performance of the proposed method, a quasi-Rosenbrock function with d dimensions is designed as follows: f ðXÞ ¼ f ðx1 ; . . . ; xi ; . . . ; xd Þ ¼

d1 X

ð100ððxkþ1 þ 1 pkþ1 Þ ðxk þ 1 pk Þ2 Þ2

k¼1

Initialization Step 1. Set the values of k, u and v. Step 2. Generate a random starting point A0 in domain DA. Step 3. Construct the objective function fY(A) by section parallel coordinates. The original objective function f(X) with d dimensions in domain D becomes the objective function fY(A) with [(2v + 1)k + rem(d/u)] dimensions in domain DA. Stage one: simulated annealing on fY(A) Step 4. Set control parameters and terminating criterion of simulated annealing.

þ ð1 ðxk þ 1 pk ÞÞ2 Þ;

ð6Þ

where P(p1, . . ., pi, . . ., pd) is the global solution which is known a priori and can be set arbitrarily. The unconstrained optimization problem is deﬁned by minimizing the quasi-Rosenbrock function. The optimal function value is zero. Quasi-COST function. The COST test function [10] with ﬁxed global solution at 2.9035 on each dimension has at least 2d local minima. To assess the performance of the proposed method, a quasi-COST function with d dimensions is deﬁned by:

64

H. Ye, Z. Lin / European Journal of Operational Research 173 (2006) 59–71

• terminating criterion . Simulated annealing is and terminated when jfn fntj 6 (fn fopt) 6 , where t = 1, . . ., N, fn is the function value at nth temperature reduction and fopt is the current smallest function value.

f ðXÞ ¼ f ðx1 ; . . . ; xi ; . . . ; xd Þ d X ¼ ððxk 2:9035 pk Þ4 k¼1

16ðxk 2:9035 pk Þ2 þ 5ðxk 2:9035 pk ÞÞ:

ð7Þ

The unconstrained optimization problem is to minimize the quasi-COST function with P(p1, . . ., pi, . . ., pd) as the global solution. 5.2. Test conditions The version of simulated annealing used in this study comes from [1]. As suggested by Cornan et al. [1], the values of some control parameters for simulated annealing are set as: • step variation Ns = 20, where Ns is the number of cycles of random moves along each coordinate direction, after which the step length is adjusted to maintain the average percentage of accepted moves at about one-half of the total number of moves; • temperature reduction NT = max(100, 5d), where NT is the number of the step adjustments after which the temperature is reduced and d is the number of variables in the objective function; • varying criterion ci = 2 (i = 1, . . ., d), where ci is the coeﬃcient used to adjust the step length along each direction; • number of successive temperature reductions to test for termination N = 4, where N is the number of times the tolerance for termination on function value at each temperature reduction must be achieved before end. The remaining control parameters to be adjusted and listed with test results for diﬀerent cases are: • starting temperature T0; • temperature reduction coeﬃcient tr, used to control the cooling rate of simulated annealing. The temperature reduction equation is Tn+1 = tr · Tn, where n is the index denoting successive temperature reductions;

The three control parameters are vital to the performance of simulated annealing. Among them, the starting temperature T0 is more critical than the other two because if it is not high enough the algorithm would be easily blocked in local minima, regardless of the settings for tr and . Experience has shown that the value for tr should be between 0.8 and 0.99 [11]. As observed in our empirical studies, when the starting temperature T0 is not high enough, the global minimum is hard to reach even when tr = 0.995. Note that much greater computational eﬀort is required for higher value of tr. Terminating criterion is the tolerance on function value for termination of the algorithm when it ceases to make progress. The tests for simulated annealing with parallel coordinates are made on the quasi-Rosenbrock functions with 9, 18, 27, 56 and 82 variables and the quasi-COST functions with 30 and 50 variables respectively. The results are compared with those of classic simulated annealing as well as the quasi-Newton method [12], a local optimization algorithm. All the test results are carried out under the MATLAB environment on IBM-PC/Pentium compatible machines with 2.4 GHz clock speed. For the comparison of the proposed method and classic simulated annealing, starting points are generated as follows: Firstly, generate a uniformly distributed random point A0 in domain DA. Secondly, map point A0 to point X0 in domain D based on Eq. (2). A0 and X0 are the starting points for the proposed method to search in domain DA and for the classic simulated annealing to search in domain D respectively. Because the landscapes and the number of variables of functions fY(A) and f(X) are diﬀerent, there is no absolute equivalence of the starting points in domains DA and D. However, by parallel coordinates, the starting point A0 in domain DA corresponds to a unique starting point X0 in dob denote main D and hence fY(A0) = f(X0). Let X

H. Ye, Z. Lin / European Journal of Operational Research 173 (2006) 59–71

65

Table 1 The global minima P and the starting points of the quasi-Rosenbrock functions No. of variables: 9

No. of variables: 18

9

18

P 2 [2, 2] A0 2 [0, 1000]7

P 2 [2, 2] A0 2 [0, 1000]14

No. of variables: 27 27

P 2 [2, 2] A0 2 [0, 1000]21

No. of variables: 56 56

P 2 [2, 2] A0 2 [0, 200]44

No. of variables: 82 P 2 [2, 2]82 A0 2 [0, 100]64

X0: generated from A0 Table 2 The global minima P, the optimal function values f* and the starting points of the quasi-COST functions No. of variables: 30

No. of variables: 50

P 2 [2, 2]30 A0 2 [0, 5]24 f* = 2.35E3

P 2 [2, 2]50 A0 2 [0, 5]40 f* = 3.9166E3

domly with uniform distribution on the intervals listed in Table 1. The global minima P of the quasi-COST functions, the optimal function values f* and the starting points are listed in Table 2 for the 30 and 50 dimensions respectively. 5.3. Test results

X0: generated from A0

the point in domain D that corresponds to the global solution A* obtained in domain DA. It follows b f ðX Þ where X* is the global that fY ðA Þ ¼ f ð XÞ solution in domain D. Hence, it makes sense to use starting points in both domains DA and D in the above manner. The global minima P and the starting points of the quasi-Rosenbrock functions with 9, 18, 27, 56 and 82 variables respectively are generated ran-

5.3.1. Test results on quasi-Rosenbrock function Quasi-Rosenbrock function with nine variables. Apply the method of general parallel coordinates. Set m = 3 therefore A is a seven-dimensional variable vector. Detailed test results of a single run are reported in Table 3. For both simulated annealing with and without parallel coordinates, set the starting temperature T0 = 1E6 and reduction coeﬃcient tr = 0.9. Due to the approximation of solution caused by the condition (2m + 1) < d to the Rize function Y(g),

Table 3 Tests on quasi-Rosenbrock function with nine variables Method: Simulated annealing (SA), no. of variables: 9

Distance to global solution

Final function value

No. of function evaluations

No. of iterations

8.7223 8.0999E8 8.0999E8

334,000 648 334,648

167 47 214

530000 908 530,908

265 69 334

560,000

280

5323

409

SA with parallel coordinates

Control parameters: T0 = 1E6, = Stage one: SA Stage two: Local optimization Final result CPU (in seconds)

Classic SA

Control parameters: T0 = 1E6, = 1E1, tr = 0.9 SA 4.3978E3 4.3217E3 Local optimization 2.0572E3 2.0249E3 Final result 2.0572E3 2.0249E3 CPU (in seconds) 306.3152 + 0.1812 = 306. 4964 Control parameters: T0 = 1E6, = 1E6, tr = 0.9 Final result 1.3012E4 3.9211E8 CPU (in seconds) 325.767

Ratio of saving on dimension: n = 22.22% Ratio of saving on no. of function evaluations: 40.24% Ratio of saving on CPU time: 39.2% Quasi-Newton method

1E1, tr = 0.9 2.8857 2.8782E4 2.8782E4 197.443 + 0.339 = 197.782

4.1545E4

6.1919E4

66

H. Ye, Z. Lin / European Journal of Operational Research 173 (2006) 59–71

Table 4 Tests on quasi-Rosenbrock function with 18 variables Method: Simulated annealing (SA), no. of variables: 18

Distance to global solution

SA with parallel coordinates

Control parameters: T0 = 1E6, = Stage one: SA Stage two: Local optimization Final result CPU (in seconds)

Classic SA

Control parameters: T0 = 1E6, = 1E6, tr = 0.9 Result 1.9933 CPU (in seconds) 668.064 Control parameters: T0 = 1E7, = 1E6, tr = 0.9 Final result 2.7058E4 CPU (in seconds) 788.853

Ratio of saving on dimension: n = 22.22% Ratio of saving on no. of function evaluations: 42.12% Ratio of saving on CPU time: 11.8% Quasi-Newton method

1E1, tr = 0.9 4.1178 4.9044E4 4.9044E4 694.98 + 0.187 = 695.167

4.8201

after stage one, the function value of 8.7223 is the best which the simulated annealing with parallel coordinates can achieve and is the global minimum in domain DA. Thus the terminating criterion = 1E1 is suﬃcient. Because the corresponding point in domain D of the solution obtained during stage one is a near-global solution, local optimization by the quasi-Newton method [12] in stage two can help to improve the result and ﬁnd the global minimum at 8.0999E8. For classic simulated annealing, set the same terminating criterion = 1E1 followed by the quasi-Newton method for local search, it fails to converge at the global minimum. This implies that the search is prematurely terminated and the solution at the terminating criterion is still too far from the global minimum. Therefore it is necessary to set = 1E6 to achieve a comparable accuracy. As shown in Table 3, the savings on the number of function evaluations and CPU time by the proposed method compared with the classic simulated annealing are 40.24% and 39.2% respectively. Note that the quasi-Newton method by itself fails to reach the global minimum beginning at the same starting point in domain D. Table 11 shows the test results averaged over 10 runs beginning with random starting points. Quasi-Rosenbrock function with 18 variables. Apply the method of section parallel coordinates.

Final function value

No. of function evaluations

No. of iterations

42.7385 7.5907E8 7.5907E8

358,000 1922 359,922

179 87 266

3.9866

580,000

290

1.8418E7

608,000

304

4697

247

85.5313

Set k = 2, u = 9, v = 3 and therefore A is a 14dimensional variable vector. The test results for one run are reported in Table 4. For both simulated annealing with and without parallel coordinates, ﬁrstly set starting temperature T0 = 1E6. The simulated annealing with parallel coordinates can converge to the optimal solution while the classic simulated annealing cannot converge, even with other settings for tr and . After improving the starting temperature to T0 = 1E7 for the latter, the classic simulated annealing converges to 1.8418E7. The savings on the number of function evaluations and CPU time are 42.12% and 11.8% respectively. Test results averaged over 10 runs beginning with random starting points are shown in Table 11. Quasi-Rosenbrock function with 27 variables. Apply the method of section parallel coordinates. Set k = 3, u = 9 and v = 3. Therefore A is a 21dimensional variable vector. Detailed test results of a single run are reported in Table 5, while Table 11 shows the results averaged over 10 runs beginning with random starting points. Quasi-Rosenbrock function with 56 variables. Apply the method of section parallel coordinates. Set k = 6, u = 9 and v = 3. Therefore A is a 44dimensional variable vector. Detailed test results for one run are reported in Table 6.

H. Ye, Z. Lin / European Journal of Operational Research 173 (2006) 59–71

67

Table 5 Tests on quasi-Rosenbrock function with 27 variables Method: Simulated annealing (SA), no. of variables: 27

Distance to global solution

Final function value

SA with parallel coordinates

Control parameters: T0 = 1E6, = 1E1, tr = 0.9 Stage one: SA 5.0491 54.5202 Stage two: Local optimization 2.1841E4 1.7295E7 Final result 2.1841E4 1.7295E7 CPU (in seconds) 1.1378E3 + 0.327 = 1.1382E3

Classic SA

Control parameters: T0 = 1E6, = 1E6, tr = 0.9 Result 3.8312E3 CPU (in seconds) 1.7268E3 Control parameters: T0 = 1 E9, = 1E6, tr = 0.9 Final Result 2.8318E4 CPU (in seconds) 1.5364E3

Ratio of saving on dimension: n = 22.22% Ratio of saving on no. of function evaluations: 60.27% Ratio of saving on CPU time: 25.9% Quasi-Newton method

3.7599E3

2.6429E7

No. of function evaluations

No. of iterations

371,700 3804 375,504

177 122 299

1,104,300

409

955,800

354

401

2.0924E4

2.1167E6

12,535

Distance to global solution

Final function value

No. of function evaluations

Table 6 Tests on quasi-Rosenbrock function with 56 variables Method: Simulated annealing (SA), no. of variables: 56

SA with parallel coordinates

Control parameters: T0 = 1E15, = 1E1, tr = 0.9 Stage one: SA 6.8433 1.5563E3 Stage two: Local optimization 2.0807E4 4.8523E8 Final result 2.0807E4 4.8523E8 CPU (in seconds) 1.6505E4 + 1.074 = 1.6507E4

Classic SA

Control parameters: T0 = 1E15, = 1E6, tr = 0.9 Result 5.136E8 Control parameters: T0 = 1E35, = 1E1, tr = 0.9 SA 1.0181E6 1.0283E8 Local optimization 1.9933 3.9866 Final result 1.9933 3.9866 CPU (in seconds) 2.2889E4 + 4.862 = 2.2894E4 Control parameters: T0 = 1E35, = 1E6, tr = 0.9 Final result 9.6226E5 3.1118E7 CPU (in seconds) 2.3655E4

Ratio of saving on dimension: n = 21.43% Ratio of saving on no. of function evaluations: 67.59% Ratio of saving on CPU time: 30.2% Quasi-Newton method

7.3419

Due to the approximation of solution caused by the condition (2v + 1) < u to the Ritz function, after stage one, the function value of 1.5563E3 is the best achieved by the proposed method. There

53.4892

No. of iterations

1,680,800 9077 1,689,877

382 151 533

5,084,800 54,071 5,138,871

908 898 1806

5,213,600

931

16,937

281

is no more improvement on the result even with higher starting temperature T0. Thus the terminating criterion = 1E1 is suﬃcient. Because the distance to global solution is only 6.8433, it is a

68

H. Ye, Z. Lin / European Journal of Operational Research 173 (2006) 59–71

near-global solution. Therefore after applying the quasi-Newton method in stage two, the ﬁnal result converges at 4.8523E8. For classic simulated annealing, it is necessary to set = 1E6 to achieve the global solution. As shown in the table, the classic simulated annealing fails to converge at T0 = 1E15. Until the starting temperature is increased to T0 = 1E35, it converges at the function value of 3.1118E7. Note that using the same terminating criterion = 1E1 as the proposed method followed by quasi-Newton method for local search, classic simulated annealing cannot converge to the global minimum. The savings on the number of function evaluations and CPU time are 67.59% and 30.2% respectively. Table 11 shows the test results averaged over 10 runs beginning with diﬀerent starting points. Quasi-Rosenbrock function with 82 variables. Apply the method of section parallel coordinates. Set k = 9, u = 9, v = 3 and therefore A is a 64dimensional variable vector. Detailed test results for a single run are reported in Table 7. Table 11 shows the test results averaged over 10 runs beginning with diﬀerent starting points. Numerical studies on quasi-Rosenbrock function with diﬀerent dimensions indicate that, with parallel coordinates, the requirement on the starting temperature of simulated annealing is lower

than that without parallel coordinates and its inﬂuence on solution accuracy is less. Together with the saving on the size of search domain, they make noteworthy contribution to the improvement on the convergence rate of simulated annealing. 5.3.2. Test results on quasi-COST function Quasi-COST function with 30 variables. Apply the method of section parallel coordinates. Set k = 3, u = 9, v = 3 and therefore A is a 24-dimensional variable vector. Detailed test results for one run are reported in Table 8, while Table 11 shows the test results averaged over 10 runs beginning with random starting points. Table 9 shows the impact of choosing diﬀerent sets of values for k, u and v. Among the three, u and v are more critical and k can be calculated by k ¼ IntðduÞ, where Int(x) means taking the integer part of x. In the ﬁrst case of k = 3, u = 9 and v = 1, the dimension of the optimization problem is only 12 and the computational costs on number of function evaluations and CPU time are low. However, the solution obtained during stage one is too far from the global solution as a starting point for a local optimization algorithm to globally converge in stage two. In the second case of k = 3, u = 9 and v = 2, the dimension of the optimization problem is 18. The solution ob-

Table 7 Tests on quasi-Rosenbrock function with 82 variables Method: Simulated annealing (SA), no. of variables: 82

Distance to global solution

Final function value

SA with parallel coordinates

Control parameters: T0 = 1E10, = 1E1, tr = 0.9 Stage one: SA 8.9122 4.3251E3 Stage two: Local optimization 3.0077E4 1.1847E7 Final Result 3.0077E4 1.1847E7 CPU (in seconds) 3.9792E4 + 2.3769 = 3.9794E4

Classic SA

Control parameters: T0 = 1E10, = 1E6, tr = 0.9 Result Control parameters: T0 = 1E80, = 1E6, tr = 0.9 Final result 1.5418E4 CPU (in seconds) 9.4134E4

Ratio of saving on dimension: n = 21.9% Ratio of saving on no. of function evaluations: 88.74% Ratio of saving on CPU time: 57.73% Quasi-Newton method

8.9823

No. of function evaluations

No. of iterations

1,747,200 22,744 1,769,944

273 264 537

4.0158E7

1.5719E7

1917

80.5882

30,455

4.1903E4

353

H. Ye, Z. Lin / European Journal of Operational Research 173 (2006) 59–71

69

Table 8 Tests on quasi-COST function with 30 variables Method: Simulated annealing (SA) no. of variables: 30

Distance to global solution

Final function value

SA with parallel coordinates

Control parameters: T0 = 1E3, = 1E1, tr = 0.95 Stage one: SA 2.1119 2.2325E3 Stage two: Local optimization 1.8174E4 2.35E3 Final result 1.8174E4 2.35E3 CPU (in seconds) 2.6771E3 + 0.031 = 2.6772E3

Classic SA

Control parameters: T0 = 1E3, = 1E6, tr = 0.95 Final result 2.0007E4 CPU (in seconds) 3.5588E3

Ratio of saving on dimension: n = 20% Ratio of saving on no. of function evaluations: 59.2% Ratio of saving on CPU time: 24.7% Quasi-Newton method

2.35E3

1.841E3

25.6284

No. of function evaluations

No. of iterations

561,600 271 561,871

234 8 242

1,380,000

460

2134

61

Table 9 Comparisons of diﬀerent sets of values for k, u and v on quasi-COST function with 30 variables Method: Simulated annealing (SA), no. of variables: 30

Distance to global solution

Final function value

No. of function evaluations

No. of iterations

Control parameters of SA: T0 = 1E3, = 1E1, tr = 0.95 SA with parallel coordinates

No. of variables: 30D ! 12D (k = 3, u = 9, v = 1) Stage one: SA 10.4595 1.7536E3 Stage two: Local optimization 9.7866 2.2651E3 Final result 9.7866 2.2651 E3 CPU (in seconds) 845.6971 + 0.1812 = 845.8783

446,000 1516 447,516

223 45 268

No. of variables: 30D ! 18D (k = 3, u = 9, v = 2) Stage one: SA 2.4484 2.1707E3 Stage two: Local optimization 1.8589E4 2.35E3 Final result 1.8589E4 2.35E3 CPU (in seconds) 1.3115E3 + 9.5453 = 1.3210E3

448,000 150,004 598,004

224 4837 5061

No. of variables: 30D ! 24D (k = 3, u = 9, v = 3) Stage one: SA 2.1119 2.2325E3 Stage two: Local optimization 1.8174E4 2.35E3 Final result 1.8174E4 2.35E3 CPU (in seconds) 2.6771E3 + 0.031 = 2.6772E3

561,600 271 561,871

234 8 242

tained during stage one is close enough to the global solution for a local optimization algorithm to globally converge. However the computational costs are increased. In the third case of k = 3, u = 9 and v = 3, the dimension of the optimization problem is 24 and consequently the computational

costs are the highest among the three cases under comparisons. Note that the quality of the solution obtained during stage one is high therefore the computational costs in stage two are very low compared with the ﬁrst two cases. These results demonstrate the trade-oﬀ between the convergence

70

H. Ye, Z. Lin / European Journal of Operational Research 173 (2006) 59–71

Table 10 Tests on quasi-COST function with 50 variables Method: Simulated annealing (SA), no. of variables: 50

Distance to global solution

Final function value

No. of function evaluations

No. of iterations

SA with parallel coordinates

Control parameters: T0 = 1E5, = 1E1, tr = 0.95 Stage one: SA 2.1147 3.7913E3 Stage two: Local optimization 2.3524E4 3.9166E3 Final result 2.3524E4 3.9166E3 CPU (in seconds) 9.5220E3 + 0.4641 = 9.5225E3

1,312,000 1176 1,313,176

328 22 350

Classic SA

Control parameters: T0 = 1E5, = 1E6, tr = 0.95 Final Result 2.3256E4 CPU (in seconds) 1.0893E4

3.9166E3

2,775,000

555

3.8476E3

3314

48

Ratio of saving on dimension: n = 20% Ratio of saving on no. of function evaluations: 52.68% Ratio of saving on CPU time: 12.58% Quasi-Newton method

37.4798

Table 11 Test results averaged over 10 runs for each problem Methods

Quasi-Rosenbrock function SA with parallel coordinates

Classic SA

Quasi-COST function SA with parallel coordinates Classic SA

No. of variables

Mean of distance to global solution

Mean of ﬁnal function value

Mean of no. of function evaluations

9 18 27 56 82

7.1952E5 3.2034E4 2.3940E4 3.4296E4 6.4279E4

1.5413E8 1.2127E7 6.8241E8 3.0015E7 4.3364E7

337,795 352,110 377,860 1.8568E6 1.7724E6

9 18 27 56 82

1.0159E4 1.4846E4 1.1736E4 5.2331E5 4.3014E4

3.7331E8 1.5139E7 8.8214E8 4.9234E7 1.7446E7

563,600 616,500 1.2177E6 5.1912E6 1.5801E7

30 50

2.3160E4 2.8212E4

2.35E3 3.9166E3

549,719 1,304,000

255.6667 346.21

2.3540E3 9.3852E3

30 50

2.1484E4 2.7293E4

2.35E3 3.9166E3

1,379,625 2,773,125

459.875 554.625

3.3921E3 1.1298E4

rate and the quality of solution. Reasonable values for u and v, as observed from our numerous empirical studies, are u = 9 and v = 3. Quasi-COST function with 50 variables. Apply the method of section parallel coordinates. Set k = 5, u = 9, v = 3 and therefore A is a 40-dimensional variable vector. Detailed test results for a

Mean of no. of iterations 213.8 255.3333 275.25 638 455.25 281.8 308.25 451.26 927.13 1926.15

Mean of CPU (in seconds) 285.9208 618.2301 1.1051E3 1.8217E4 3.7282E4 501.7438 992.2029 2.2197E3 2.1718E4 9.5653E4

single run are reported in Table 10. Table 11 shows the test results averaged over 10 runs beginning with diﬀerent starting points. The diﬀerence of the requirement on the starting temperature for the proposed method and classic simulated annealing is not observed in the numerical studies on quasi-COST function. A pos-

H. Ye, Z. Lin / European Journal of Operational Research 173 (2006) 59–71

sible reason is that the starting points are generated from much smaller intervals compared with those for quasi-Rosenbrock function. 6. Conclusions In this paper, the notion of parallel coordinates is introduced into the optimization research to transform the whole search domain into a smaller one with lower dimensions. It provides a means to speed up the convergence rate of simulated annealing for moderate and high dimensional optimization problems. Numerical studies have demonstrated the good performance of the proposed method. Our work presented here is a successful attempt to speed up simulated annealing by parallel coordinates. It can be applied to other global optimization algorithms as well.

[2]

[3]

[4]

[5] [6]

[7]

[8]

Acknowledgment [9]

The authors would like to thank the anonymous reviewers for many helpful comments which have improved the presentation of the paper considerably.

[10]

[11]

References [12] [1] A. Corana, M. Marchesi, C. Martini, S. Ridella, Minimizing multimodal functions of continuous variables with the

71

simulated annealing algorithm, ACM Transactions on Mathematical Software 13 (3) (1987) 262–280. T.M. Alkhamis, M.A. Ahmed, V.K. Tuan, Simulated annealing for discrete optimization with estimation, European Journal of Operational Research 116 (3) (1999) 530– 544. K. Steinho¨fel, A. Albrecht, C.K. Wong, Two simulated annealing-based heuristics for the job shop scheduling problem, European Journal of Operational Research 118 (30) (1999) 524–548. A. Antunes, D. Peeters, On solving complex multi-period location models using simulated annealing, European Journal of Operational Research 130 (1) (2001) 190– 201. H. Szu, R. Hartley, Fast simulated annealing, Physical Letters A 122 (3) (1987) 157–162. L. Ingber, Adaptive simulated annealing (ASA): Lessons learned, Journal of Control and Cybernetics 25 (1) (1996) 33–54. W. Schroeder, K. Martin, B. Lorensen, The Visualization Toolkit, Prentice Hall PTR, Upper Saddle River, NJ, 1998, pp. 400–401. A. Inselberg, B. Dimsdale, Parallel coordinates: A tool for visualizing multi-dimensional geometry, Proceedings of the First IEEE Conference on Visualization (1990) 361– 378. K.D. Hjelmstad, The Ritz method of approximation, in: Fundamentals of Structural Mechanics, Prentice-Hall, Upper Saddle River, NJ, 1997, pp. 157–186. Y. Yao, Dynamic tunneling algorithm for global optimization, IEEE Transactions on System, Man, and Cybernetics 19 (5) (1989) 1222–1230. E.H.L. Aarts, J. Korst, Simulated Annealing and Boltzmann Machines, John Wiley & Sons, Inc., New York, 1989. T.F. Coleman, M.A. Branch, A. Grace, Optimization Toolbox for Use With MATLAB. Users Guide, Version 2, The Math Works Inc., Natick, MA, 1999.