
Volume 15, Number 5    INFORMATION PROCESSING LETTERS    10 December 1982

A NOTE ON UPPER BOUNDS FOR THE SELECTION PROBLEM

Tatsuya MOTOKI

Department of Information Science, Faculty of Engineering, Ibaraki University, 4-12-1 Nakanarusawa, Hitachi 316, Japan

Received 1 March 1982; revised version received 20 July 1982

Keywords: Minimum comparison selection, worst-case analysis, optimum sorting, computational complexity

According to Knuth [6], the history of the selection problem goes back to Dodgson [2], who pointed out in 1883 that the second-best player often loses the second prize in lawn-tennis tournaments; about 1930 Hugo Steinhaus posed the problem of finding the minimum number of tennis matches required to select the first- and second-best players from n contestants, assuming a transitive ranking; and now the Steinhaus problem is generalized to our selection problem of finding the worst-case minimum number of comparisons V_i(n) required to select the i-th largest of n distinct numbers. By symmetry, we have

    V_i(n) = V_{n−i+1}(n).

Thus we may assume that 1 ≤ i ≤ ⌈n/2⌉. Throughout this paper log_2 is denoted by lg. Many people have made efforts to give good upper/lower bounds for the problem. As for lower bounds,

Kirkpatrick [5] unified the theory; he showed that

    V_i(n) ≥  n + i − 3 + Σ_{0≤j≤i−2} ⌈lg((n − i + 2)/(i + j))⌉   if i ≤ (1/3)n,
              2n − i − 2                                          if i ≥ (1/3)n,     (1)

and that this result surpasses the other results (see [5] for a detailed account of lower bound theory). As for upper bounds, in their classical paper Hadian and Sobel [3] showed that

    V_i(n) ≤ n − i + (i − 1)⌈lg(n − i + 2)⌉.     (2)

Several refinements (e.g., [4,8]) were made by constructing variants of the Hadian-Sobel algorithm, but these are all essentially small improvements on (2); the Hadian-Sobel algorithm and its variants need O(n lg n) comparisons when i ≈ (1/2)n. Until 1972 it was not known whether the selection problem inherently needs O(n lg n) comparisons; finally, Blum et al. [1] obtained the O(n) upper bound; further study, due to Schönhage et al. [7], led to a much sharper upper bound for i = ⌈n/2⌉:

    V_{⌈n/2⌉}(n) ≤ 3n + o(n).

Since Schönhage et al.'s scheme can be easily generalized for general values of i, we obtain

    V_i(n) ≤ 3n + o(n)   for every i.     (3)

0020-0190/82/0000-0000/$02.75 © 1982 North-Holland
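As a quick numerical illustration (mine, not part of the original note), one can tabulate the Hadian-Sobel bound (2) against the uniform 3n bound of (3) and watch the crossover appear near i ≈ 2n/lg n:

```python
from math import ceil, log2

def hadian_sobel_bound(i, n):
    # Bound (2): V_i(n) <= n - i + (i - 1) * ceil(lg(n - i + 2))
    return n - i + (i - 1) * ceil(log2(n - i + 2))

n = 1 << 20                     # n = 1,048,576
for i in (2, 1000, int(2 * n / log2(n)), n // 2):
    hs = hadian_sobel_bound(i, n)
    print(f"i={i:>7}  bound (2) = {hs:>10}  3n = {3 * n}  (2) better: {hs < 3 * n}")
```

For this n the Hadian-Sobel bound stays below 3n only while i is of order 2n/lg n, matching the comparison made later in the text.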


Let us now consider a question: for what values of i can (3) be asymptotically surpassed? Blum et al. [1] considered the similar question for their (5.43n + o(n))-algorithm and obtained the result

    lim sup_{n→∞} V_{⌊q(n−1)⌋+1}(n)/n ≤ 1 + 4.43/2^⌈lg(p/q)⌉ + 10.86 q ⌈lg(p/q)⌉   for 0 < q ≤ p,     (4)

where p = 0.203688···. Thus their (5.43n + o(n))-algorithm can be surpassed for i < pn. How about the case of Schönhage et al.? After a rough comparison of (2) and (3), the Hadian-Sobel algorithm is seen to give a better upper bound than that of Schönhage et al. [7] only for very small values of i, i.e., for i < 2n/lg n. But considering Blum et al.'s [1] result, this ought to be considerably improved. In this note we generalize (4) by applying Blum et al.'s scheme [1] to a general (cn + o(n))-algorithm, and show that this generalized result can be surpassed when we apply the scheme to the generalized Schönhage et al. (3n + o(n))-algorithm [7]. In addition we show that there exists an asymptotically optimal selection algorithm provided that i = o(n). Explicitly, our results are the following:

(i) If there exists a (cn + o(n))-algorithm for selection,

    lim sup_{n→∞} V_{⌊q(n−1)⌋+1}(n)/n
        ≤ 1                                                           if q = 0,
        ≤ 1 + (c − 1)/2^⌈lg((c−1)/(4cq))⌉ + 2cq⌈lg((c−1)/(4cq))⌉      if 0 < q < (c − 1)/(4c),
        ≤ c                                                           if (c − 1)/(4c) ≤ q ≤ 1/2.

(ii) For the Schönhage selection algorithm, c = 3, and (i) may be sharpened to

    lim sup_{n→∞} V_{⌊q(n−1)⌋+1}(n)/n
        ≤ 1                                               if q = 0,
        ≤ 1 + 2^(1−⌈lg(1/(5q))⌉) + 5q⌈lg(1/(5q))⌉         if 0 < q < 1/5,
        ≤ 3                                               if 1/5 ≤ q ≤ 1/2.

(iii) If i = o(n), then V_i(n) = n + o(n).

In this section we introduce an algorithm, due to Blum et al. [1], which asymptotically surpasses Schönhage et al.'s [7] for i < (1/5)n. The contestants constitute a totally ordered set, but the order is initially not known; at any stage the algorithm's knowledge of inequality relations between contestants is given by a partial order or, equivalently, a Hasse diagram; for example,

    a   c
    | \ |
    b   d

indicates that b < a, d < a and d < c. The following is an algorithm obtained from Blum et al. [1] by interchanging Step 1 and Step 2.

Algorithm SELECT[A, p]. (This algorithm selects the i-th largest from n distinct elements. It contains two parameters p and A which will be chosen later; p is any positive constant and A is any algorithm for solving the selection problem.)


Step 1. If i ≥ pn, then select the i-th largest by using algorithm A.
Step 2. Partition the n elements into ⌊n/2⌋ pairs and possibly one leftover, and compare each pair.
Step 3. Select the i-th largest m from the ⌊n/2⌋ larger elements of Step 2 by using (recursively) SELECT[A, p].
Step 4. Discard all elements known to be smaller than m, since these cannot be the i-th largest.
Step 5. Select the i-th largest from the remaining elements by using algorithm A.
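The five steps above can be sketched in executable form. The following Python rendering is my own illustration (the paper gives no code); a sort-based selector stands in for algorithm A, and p should be at most 1/2 so that the recursion in Step 3 is well-defined:

```python
def baseline_select(items, i):
    """Stand-in for algorithm A: the i-th largest (1-indexed) via sorting."""
    return sorted(items, reverse=True)[i - 1]

def select(items, i, p=0.25, A=baseline_select):
    """Illustrative sketch of SELECT[A, p] (Steps 1-5); assumes distinct items."""
    n = len(items)
    # Step 1: if i >= p*n, select the i-th largest directly with algorithm A.
    if i >= p * n or n < 2:
        return A(items, i)
    # Step 2: partition into floor(n/2) pairs (one possible leftover) and compare.
    pairs = [(items[2 * k], items[2 * k + 1]) for k in range(n // 2)]
    leftover = [items[-1]] if n % 2 else []
    larger = [max(a, b) for a, b in pairs]
    # Step 3: recursively select the i-th largest m among the larger elements.
    m = select(larger, i, p, A)
    # Step 4: discard elements known to be smaller than m: the smaller member of
    # any pair whose larger member does not exceed m has rank worse than i.
    remaining = list(leftover)
    for a, b in pairs:
        hi, lo = (a, b) if a > b else (b, a)
        remaining.append(hi)
        if hi > m:
            remaining.append(lo)
    # Step 5: select the i-th largest of the remaining elements with algorithm A.
    return A(remaining, i)
```

Here baseline_select is only a placeholder; the paper's interest is in the comparison count when A is a linear-comparison selection algorithm.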

3. Analysis

In this section we investigate how SELECT[A, p] can surpass a general (cn + o(n))-algorithm. As a matter of convenience, we introduce a notation

    V(q) = lim sup_{n→∞} V_{⌊q(n−1)⌋+1}(n)/n,   0 ≤ q ≤ 1/2.     (5)

Theorem 1. If there exists a (cn + o(n))-algorithm for selection, called PICK(c), then SELECT[PICK(c), (c − 1)/(4c)] brings about an upper bound

    V(q) ≤  1                                                           if q = 0,
            1 + (c − 1)/2^⌈lg((c−1)/(4cq))⌉ + 2cq⌈lg((c−1)/(4cq))⌉      if 0 < q < (c − 1)/(4c),
            c                                                           if (c − 1)/(4c) ≤ q ≤ 1/2.

Proof. Let p be any positive number, and let Q(i, n, p) be the worst-case number of comparisons required in SELECT[PICK(c), p]. Obviously

    Q(i, n, p) = cn + o(n)   for i ≥ pn.     (6)

For i < pn, since SELECT[PICK(c), p] is recursively called t = ⌈lg(pn/i)⌉ times, and for the j-th call, 1 ≤ j ≤ t, the number of contestants is ⌊n/2^j⌋,

    Q(i, n, p) ≤ Σ_{j=1}^{t} ⌊n/2^j⌋  +  c⌊n/2^t⌋ + o(n/2^t)  +  t(c(2i) + o(i))
                 [Step 2]                [Step 1]                 [Step 5]
               = n − n/2^t + cn/2^t + 2cti + o(n)
               = n + (c − 1)n/2^t + 2cti + o(n).     (7)


By combining (6) and (7) we obtain

    Q(i, n, p) ≤  cn + o(n)                                               if i ≥ pn,
                  n + (c − 1)n/2^⌈lg(pn/i)⌉ + 2c⌈lg(pn/i)⌉ i + o(n)       if i < pn.     (8)

Thus from (5), (8) and the fact that V_i(n) ≤ Q(i, n, p), it follows that

    V(q) ≤  1                                               if q = 0,
            1 + (c − 1)/2^⌈lg(p/q)⌉ + 2cq⌈lg(p/q)⌉          if 0 < q < p,
            c                                               if p ≤ q ≤ 1/2.     (9)

Setting p = (c − 1)/(4c) gives the theorem.

□

It should be noted that setting p = (c − 1)/(4c) minimizes the right-hand side of (9); thus, we cannot obtain a better upper bound from (9). To see this, let F(p, q) be the right-hand side of (9), and remember that p is an arbitrary positive number in (9). For q = 0 the proposition is obviously true. For q > 0, since ⌈lg(p/q)⌉ varies over {1, 2, 3, ...} as p varies over {p | q < p}, and since q ≥ p implies that F(p, q) = c = 1 + (c − 1)/2^0 + 2cq·0, we obtain

    inf_{p>0} F(p, q) = min{ 1 + (c − 1)/2^k + 2ckq | k = 0, 1, 2, ... }.

Now, by an elementary analysis, min{ 1 + (c − 1)/2^k + 2ckq | k = 0, 1, 2, ... } = 1 + (c − 1)/2^k + 2ckq if and only if

    (i) k = 0 and (c − 1)/(4c) ≤ q, or
    (ii) k ≥ 1 and (c − 1)/(c·2^{k+2}) ≤ q ≤ (c − 1)/(c·2^{k+1}).

Hence

    inf_{p>0} F(p, q) =  1 + (c − 1)/2^k + 2ckq    if k ≥ 1 and (c − 1)/(c·2^{k+2}) ≤ q ≤ (c − 1)/(c·2^{k+1}),
                         c                          if (c − 1)/(4c) ≤ q,
                      = 1 + (c − 1)/2^⌈lg((c−1)/(4cq))⌉ + 2cq⌈lg((c−1)/(4cq))⌉
                      = F((c − 1)/(4c), q).

We have the following corollary to the proof of Theorem 1.
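The optimality of p = (c − 1)/(4c) can also be checked numerically. The sketch below (mine, not the paper's) compares F((c − 1)/(4c), q) with a grid search over p for c = 3:

```python
from math import ceil, log2

def F(p, q, c):
    """Right-hand side of (9)."""
    if q == 0:
        return 1.0
    if q >= p:
        return c
    k = ceil(log2(p / q))
    return 1 + (c - 1) / 2**k + 2 * c * q * k

c = 3.0
p_star = (c - 1) / (4 * c)          # = 1/6 for c = 3
for q in (0.001, 0.01, 0.05, 0.1, 0.15, 0.3):
    grid_best = min(F(j / 1000, q, c) for j in range(1, 501))
    assert F(p_star, q, c) <= grid_best + 1e-9
print("p = (c-1)/(4c) matches the grid optimum")
```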

Corollary 2. If i = o(n), then V_i(n) = n + o(n).

Proof. Let p be any positive constant. Then for sufficiently large values of n, we obtain from (1), (9) and the fact that V_i(n) ≤ Q(i, n, p),

    n + i − 3 ≤ V_i(n) ≤ (1 + (c − 1)/2^⌈lg(pn/i)⌉ + 2c⌈lg(pn/i)⌉·i/n)·n + o(n),

and since i = o(n), the terms (c − 1)/2^⌈lg(pn/i)⌉ ≤ (c − 1)i/(pn) and 2c⌈lg(pn/i)⌉·i/n both tend to 0.


Thus lim_{n→∞} V_i(n)/n = 1. □
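The convergence in Corollary 2 is visible numerically. The following illustration is mine (the paper contains no such computation); it evaluates the upper-bound expression from the proof with c = 3, p = 1/6 and i = √n:

```python
from math import ceil, log2, isqrt

def upper_ratio(n, i, c=3.0, p=1/6):
    """1 + (c-1)/2^ceil(lg(pn/i)) + 2c*ceil(lg(pn/i))*i/n, as in the proof."""
    t = ceil(log2(p * n / i))
    return 1 + (c - 1) / 2**t + 2 * c * t * i / n

for e in range(8, 25, 4):
    n = 1 << e
    i = isqrt(n)                    # i = sqrt(n), so i = o(n)
    print(f"n = 2^{e:<2}  bound on V_i(n)/n: {upper_ratio(n, i):.4f}")
```

The printed ratios decrease toward 1 as n grows, as the corollary asserts.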

From (3) and Theorem 1 we obtain

    V(q) ≤  1                                               if q = 0,
            1 + 2^(1−⌈lg(1/(6q))⌉) + 6q⌈lg(1/(6q))⌉         if 0 < q < 1/6,
            3                                               if 1/6 ≤ q ≤ 1/2.

Can this be surpassed? Let us now reconsider the proof of Theorem 1. Suppose that p is any positive number, and that GS denotes the generalized Schönhage et al. algorithm. GS is roughly given in Fig. 1. The initial step of GS is a pairing step; in particular, for the GS invoked at Step 5 of SELECT[GS, p], the initial pairing step is to form a Hasse diagram,

    [Hasse diagram: the i (or i − 1) pairs formed by this pairing step, each larger element drawn above its partner.]

But we can save i − 1 comparisons out of these i (or i − 1) comparisons, since after Step 4 of SELECT[GS, p] the remaining 2i (or 2i − 1) elements already constitute a Hasse diagram,

    [Hasse diagram: m together with i − 1 pairs whose larger elements are known to exceed m.]

Thus (7) can be improved: in SELECT[GS, p],

    Q(i, n, p) ≤ Σ_{j=1}^{t} ⌊n/2^j⌋  +  3⌊n/2^t⌋ + o(n/2^t)  +  t(3(2i) − (i − 1) + o(i))
                 [Step 2]                [Step 1]                 [Step 5]
               = n + 2n/2^t + 5ti + o(n).

This leads to the following theorem in the same manner as (7) leads to Theorem 1.

Theorem 3.

    V(q) ≤  1                                               if q = 0,
            1 + 2^(1−⌈lg(1/(5q))⌉) + 5q⌈lg(1/(5q))⌉         if 0 < q < 1/5,
            3                                               if 1/5 ≤ q ≤ 1/2.

In conclusion, the bounds for V(q), due to Theorem 3 and (1), are illustrated in Fig. 2. Note that for (2^k + 1)i − (2^k + 1) < n ≤ (2^{k+1} + 1)i − 1, 0 ≤ k,

    ⌈lg((n − i + 2)/(i + j))⌉ =  k + 1    if 0 ≤ j < (n − (2^k + 1)i + 1)/2^k,
                                 k        if (n − (2^k + 1)i + 1)/2^k ≤ j ≤ i − 2.


[Fig. 2. Bounds for V(q) plotted against q ∈ [0, 1/2], with ticks at q = 1/5 and q = 1/2; the upper bound reaches the point (1/2, 3). Figure not reproduced.]

[Fig. 1. Rough outline of the generalized Schönhage et al. algorithm GS. Figure not reproduced.]

Thus

    Σ_{0≤j≤i−2} ⌈lg((n − i + 2)/(i + j))⌉
        =  k(i − 1) + ⌈(n − (2^k + 1)i)/2^k⌉ + 1    if (2^k + 1)i − 1 ≤ n ≤ (2^{k+1} + 1)i − (2^{k+1} + 1), k ≥ 0,
           k(i − 1)                                  if (2^k + 1)i − (2^k + 1) ≤ n ≤ (2^k + 1)i − 1, k ≥ 0.

This leads to a corollary of (1),

    V(q) ≥  2 − q                                   if 1/3 ≤ q ≤ 1/2,
            1 + 2^{−k} + (k − 2^{−k})q              if 1/(2^{k+1} + 1) ≤ q < 1/(2^k + 1), k ≥ 1.

Acknowledgment

I wish to thank the referee for his/her kind and detailed comments, and Prof. Z. Takeda for his careful reading.

References

[1] M. Blum, R.W. Floyd, V. Pratt, R.L. Rivest and R.E. Tarjan, Time bounds for selection, J. Comput. System Sci. 7 (1973) 448-461.
[2] C.L. Dodgson, St. James's Gazette, August 1 (1883) 5-6.
[3] A. Hadian and M. Sobel, Selecting the t-th largest using binary errorless comparisons, Tech. Rept. 121, Dept. of Statistics, Univ. of Minnesota, 1969.
[4] L. Hyafil, Bounds for selection, SIAM J. Comput. 5 (1976) 109-114.
[5] D.G. Kirkpatrick, A unified lower bound for selection and set partitioning problems, J. ACM 28 (1981) 150-165.
[6] D.E. Knuth, The Art of Computer Programming, Vol. 3: Sorting and Searching (Addison-Wesley, Reading, MA, 1973).
[7] A. Schönhage, M. Paterson and N. Pippenger, Finding the median, J. Comput. System Sci. 13 (1976) 184-199.
[8] C.K. Yap, New upper bounds for selection, Comm. ACM 19 (1976) 501-508.
