- Email: [email protected]

16 April

1968)

THIS paper considers multistage games with two participants. When investigating these processes it is usually assumed that the participants pursue opposing interests in the sense of a chosen functional and that the optimal strategies of the sides (pure or mixed) are determined by the saddle point of this functional [ll. In this paper opposition of the interests of the sides is not postulated, but the outcome of the process is estimated from the point of view of one of the participants. The tendency to find the best guaranteed result (the guaranteed estimate [21) for a chosen side leads to the formulation of different minimax problems. The formulations of these problems and the correspondence between them are presented for discrete processes of general form. An algorithm for the numerical calculation of multistage games, using the idea of reselection in the state space [31, is written down for a specific system of Lanchester type equations. 1. Formulation

of the problem

We consider a controlled dynamic process described by the following system of difference equations: z(k+

1) = /(z(k), u(k), P(k)),

k=O,l,...,

N-l.

(1.1)

The vector z = Iz,, . . . , znI belongs to the Euclidean space En and at each instant describes the phase state of the process in which the two sides X and Y participate. At each instant k the control u = lu’, . . . , zfl is chosen by the side X from the compact set U: the control u = lu’, . . . , us1 is chosen by the side Y at each instant k from the compact set V. We assume that the quality of the process is estimated by the side X using the criterion *Zh. uychisl.

Mat. mat. Fiz.,10. 4, 934-941, 1970.

162

163

F. I. Ereshko

N-l

(1.2) f, 0, f, to be continuous.

We suppose the functions initial state of the process

z(O) is fixed,

From the point of view of the discovery result

of (1.2) for this process,

values

of these

We consider

by the side X of the best guaranteed

it is possible

to formulate

games will play the part of guaranteed

of the side X in the process Let the strategy

that the

z(N) is free.

various

estimates

games.

The

of the share

(1.1). (1.2) [al.

of the side X be the choice

of the vector

u = {u(O), u(l),

. . . . u(N - 1) 1, and the strategy of the side Y the choice of the vector v = of the vectors u and v completely MO), . . . . u(N - l)l. The assignment determines

the development

the greatest

of the process,

consequently,

gain of the side X, that is, the greatest

under the worst conditions

for it is defined 01=

Let the strategies

I = I(u, v).

value

Then

of the criterion

(1.2)

as

max min I. , U

(1.3)

u*, v* be such that maxminZ=minI(u”,v)=Z(u’,v”). U V \

The strategy

u* is the best guaranteeing

if the side X chooses increase

u* every deviation

strategy

for the side X, that is,

of the side Y from the strategy

the gain of the side X. We call u* the optimal

programmed

v* only

control

of

the side X. We consider function

u (k, d,

Repeating

the case

the reasoning

of the share

when the strategy

and the strategy given

of the side X is the choice

of Y is the choice

above,

of’ the function

we arrive at a second

guaranteed

of the side X in the process 02 =

and of the best guaranteeing

strategy

max min 1 U(k,:) v(k, I) u*(k, z) (optimal

max minl=~~nI(zz*(k,;),~(~,r))=Z(~*(~,~).UL(kr;)). U(k,2) Vlk,:, 12)

synthesis)

of the

u (k, 2). estimate

164

Multistage

games for Lanchester’s

equations

Finally, a game with the hypothesis of sequential information can be f~ulated as follows: at each instant k let both sides be i~~rned about the current state of the process z(k): the side X adopts the decision d(k), and this decision becomes known to the side Y. Then the greatest guaranteed result which the side X can obtain is given by o3 = max min max min.. . max min I. a(N---I) c(N-1) UP) ufOt U(I) m(i) It is obvious that all the guaranteed estimates initial state of the process z(O). It can be shown

ol, 02, w3 are functions of the

[41that

Moreover, we notice that max min I = max min 1. utk, 4 y ufk, 4 tik, 2) This implies that the use by the side X of a synthesis can lead to an increase of the guaranteed result in comparison with the case of the programmed control. In what follows we will interest ourselves in the problems of finding the guaranteed estimate a3. For the situation z(k) we now define an (N- k)-stage game with the hypothesis of sequential information and the quality criterion N-I

Ik=

r,

fi(Z(i), u,u)+ff)(z(N)).

i-=k

We introduce a similar definition for the point z(k + 1) with the quality criterion N-1 I”“+’

= ZJ i=k+l

Then the guaranteed estimates connected by the relation

cl&Yk (z(k)

)=

fi(Z(i),

c,$-’

:Fy;)n[fk

%

VI-t

(z(k)),

@b(W)*

~3~-‘-i(z(k

+ oN-k-l(Z(k+

1))l.

-j- 1))

are

(1.5)

F. 1. Ereshko

2.

for computing

Algorithm

We consider

a specific

process

165

o3 for a Lanchester described

by a system

type system of Lanchester

[ll:

t,ype equations

yj(k-(

i)=max{O.yiO_-~zi(k)nji(k)uji(k)}, i=l

1, . . , , n;

i =

The vectors the sides

x(k)

=

j = 1,. . . , m;

{xi(k)},

yk =

X and Y, respectively,

a,*(k) > 0

{y,(k)}

at the instant

will be considered

k=O,l,...,

N--l.

will be called

the resources

k. The coefficients

to be known functions

of the discrete

of

flij(k) 3 0, argument

h. The functions uj;(k), ujj(k) will be called the distributions of the resources of the sides X and Y respectively. We agree to consider the quantities x(O), y(O) as prescribed,

and the quantities

x(N), y(N) as free.

As the pay-off to the side X after the N-stage

process

(2.1) we consider

N-l I=r(

~ll(~(k),y(k),r(k+l),y(k+l),u(k),u(k))+~(z(N),y(N)), k=O

(2.2) where Jk = fk(s(k),

We will consider assume

y(k),

z(k

+ i), y(k + 1)) +$&(k))

the distribution

that they are subject

of resources

to the following

as controlling normalization

+

q4(W)).

(2.3)

functions

and

conditions

for

every k: m

c

Uji(k) 4 1,

uji(k) 3 0;

j=l

(2.4)

166

Multistage

games for Lunchester’s

equations

We consider the problem of finding the guaranteed estimate w) = wl. For this purpose we use the recurrence relation (1.5):

~N-k(~(k>,Y(w)=

max min [JR + UN+--l (z(k+ u(k) n(k)

I),Y(~+

We note that with the acceptance of the informational hypothesis of equations (2.1) can be rewritten in the form Xi(k + l)=si(k)-

i))?.

(2.5)

the system

E74j(k)Vij(k)Pijlk)j=l

n

l)=Yj(k)-

Yj(k+

lEri(k)uji(k)uji(k). i=l

Xi

(k + 1) 2 01

Yj(k+ I) 20.

For each pair x(lz), y(k) we define the following sets of attainability: pkvl is the set ofvalues 'i(k + 1) 2 0 obtained from (2.6) =b++ 1)) x(Qu(h') subject to the constraints

(2.4):

&&&) = {y (k + 1))

is the set of values

Yj(K + 1) %O obtained from (2.6) subject ‘to the constraints (2.4). We now notice that after fixing some values z (k) , y (k) , x (k + II>E P$~~), y (k + 1) E

d&/(h)r

we thereby impose on u;j (k), uji Ud constraints

of the following type:

m

Az#c+l)[email protected]+4)--zi(k)=-

c

.Z/j(k)vU(k)BU(k)f

j=l

(2.7)

converse problem, is, the of the parameters ZL~~(K), satisfying (2.7), as a leads to non-unique solution. denote independent remaining at disposal of by u*(k), remaining at disposal of by v*(k). for fixed of x(h), the choice u(k) is to the of y + 1) Rk+’ and some of u*(k); correspondingly, choice of is equivalent the choice z(k+ 1) and u*(k). will denote fact by: = {!/(k relation (2.5)

i), u”(QL (2.3) into

L- {x(k can be

1), v*(k)}.

as follows:

167

F. 1. Ereshko

~N-k(m,Y(4)=

(2.8)

maxmin(l~+~N~k~1(2(k+1),Y(~+~)))= u(k) U(h)

max

max

min

y(kf1)

u*

s(h+l)

min[fk(5(~),y(k),J(lc+

l)*Y(k+l))+

v*

+(Fk(Y(k+l),u’)+~k(5(k11),vg)+

ON-k-l(~:(k+l),y(k+1))]=max

min max min [...I= y(k+l, .Y(k+l) u t o*

maxmin[f~+max~~(y(k+~1),~*)+~in~lk(~(~+~),~*)+~N~-k~‘l~ u* u* y&+1) r(k+lJ We use relation

(2.8) to construct

mate wj of the initial state

N-stage

a scheme

problem,

for computing

based

the guaranteed

on the ideas

of reselection

[31. We first note that xi(h) , yi( k) are non-increasing

space

the discrete

in the

functions

of

k, that is,

argument 0 <

xi(k)

0 < Yj(k) G YAO),

0 < Xi(k + 1) <

0 < Yj(k + 1) < Yj(k), We then introduce

esti-

the stages

their aid at each stage

of the variables

k we execute

6j = 6 (yj)

4i = 4 (Zi),

the subdivision

Xi(k).

of the segments

and with [0, xi(O)],

In general all the Ai, aj are [0, yj(O)], i = 1,. . . , n; j = 1,. . . , nt. different. We call the set of nodes of the nets the spaces of the states X and Y. The possible of the net.

values

We now consider First by means

stage.

of x(k), y(k) (k = 1, . . . , NJ will be arranged

a direct

Each pair of points

of a finite

reselection

o’(z(N--l),y(N---l))=

In the solution controls

spaces,

transforming

x(N - l), y

associated

of this problem the phase

of a positive

programming

u*

We will call the combination

(2.9)

x(N), y(N) belonging

to verify the existence

point x (N - l), ,y(N in the case

to solve the two non-linear

B).

is

with the quantity

for each pair of values

it is first necessary

maxcp(y(N),U*), (problem

UV- 1) of the state spaces

min [~N--I(~(N--~)?Y(N-I),x(N),Y(N))+ I/(N) x(Pj) min~N_l(Z(N),u*)+Q)(1C(N),Y(N))]. c*

y(N) (of problem A), and then, question,

of the algorithm.

max

yxw--I(Y(N),u*)+

to the state

description

at the nodes

of permissible

1) into the point x(N), answer

to the first

problems

ming(s(N),u’) z)* of these

subproblems

an elementary

game.

168

Multistage

games

for Lanchester’s

equation

We will discuss in greater detail the content of an elementary game. Problem A has the following formulation: find at least one solution u UV- 1) u UV- 11 of the system of equations

A&.(N)+2 $/j(N--

I)v&V--

1)l3ij(N--

l)=O.

j-1

(2.10)

L\yj(N)+i]l5~(N_I)~ji(N_I)aj~(N_I)=O i=l

subject to the constraints (2.41. We reduce this problem to a linear programming problem by a well-known formal method. We denote by ci, dj the discrepancies in (2.101, that is,

i=l

and formulate the problem of the discovery of controls yielding a minimum of the linear functional n m ECi

+ Cdj j=l

i=l

subject to the constraints

i=l,

(2.4), (2.11) and ci 2 0, dj &O.

If a solution of this problem is realized for ci = 0, dj = 0; i = 1, . , . , n; . ..) m, a of the problem A Problem B has the

formulation: find the

min*iv-I

(0)

21

subject to the constraints Uij>O, i=l

and

Asi(N)+~yj(N-l)uijBij(N--l)=O, j=l

u and

realizing

169

F. 1. Ereshko

maX

(u)

TN-1

u

subject to the constraints AYj(N)+

~~~(~-~)~jinji(~-l)=O,

~Uii&l,

i=l

j-l

ujf

2 0.

Second stage. By means of a finite reselection, with each pair of points xUV - 2), y(N - 2) of the state spaces there is associabd the quantity

~(~-~1),~(~-~))+max~~-2(~(~-1),~*)+ u*

f>,v*)+wi(5(N--l),y(N-I))l.

min *+_2(1~(N -

V’

This does not change the content of the elementary game. The successive execution of such constructions in accordance with (2.8) leads to a solution of the original problem of the discovery of a guaranteed estimate. We notice that in the process of discovering a guaranteed estimate we have also determined the best guaranteeing strategy for the side X. Indeed, sinee we have assumed that at each instant k both sides have available i~~mation about x(k) and y(k), the best guaranteeing choice of the side X must be made by the rule 12.5). It is clear from this that a guaranteeing strategy will be completely described if we find ~#-~~(~(k), y(k)) for all possible pairs x(k), y(h), where 0 < k < N-

1,

0 < z(k)

< z(O),

0

< y(O),

This operation is contained

in the algorithm explained above. We will discuss

some particular cases.

Let the pay-off have the form

N-l I=

z k=O

fk(5(k),y(k)*z(k+I),y(k+l))+a,(5(N),yIN)).

In this case the recurrence relation has the form

Ct.)N-k(z(k),y(k))=max

min (fk$.WN-k-l(r(k+l),~(k+l))),

[email protected]+l) ZfW-1)

and the elementary game reducesto problem A. As already mentioned the solution of the converse problem (2.7) is as a rule not unique. This implies that the same pay-off for fixed x(k), y(k), x(k + l), y (k + 1) may be obtained with a different distribution of resources. This feature of the pay-offs considered leaves some

170

Multistage

games

for Lanchester’s

freedom in the choice of the guaranteeing the guaranteed estimate 03.

equation

strategy, ensuring the same value of

If the pay-off is terminal

the recurrence relation takes the simpler form coN-*(2 (k), y(k))

= maxminoN-“-l(z(k+l), u(k+1)x(k+1)

y(k+

1)).

The elementary game is reduced to problem A. It follows from the description of the algorithm that the duration N of the process affects only the duration of the calculations. The principal computational difficulties arising in the realization of the scheme of calculation, are connected with the solution of the one-step games, the greatest requirements being imposed on the size of the computer memory. We add that this feature of the algorithm is characteristic for all schemes using the ideas of dynamic programming. In this connection various approximate approaches to the determination of the guaranteeing estimate w3 are of great importance. As such approaches, which in this case are of a heuristic nature, we can indicate the method of subdivision of the net and the method of moving tubes, the extension of which to multistage games presents no difficulty. In conclusion the author sincerely thanks N. N. Moiseev and Yu. B. Germeier for formulating the problem and for valuable advice. Translated

by J. Berry

REFERENCES 1.

Application of the Theory of Games in Mi1itar.y blatters voennom dele) “Sov. radio”. Moscow, 1961.

2.

GERMEIER, YU. 9. Methodological and hlathematical Fundamentals of Operational Research and the Theor?/ of Games (text of lectures) (Metodologicheskie i matematicheskie osnovy issledovaniya operatsii i teorii igr (tekst lektsii)). Nos. I-V, VTs MGU, Moscow, 1967.

3.

MOISEEV, N. N. The methods of dynamic programming in the t.heory of optimal controls. Zh. uychisl. Mat. mat. Fiz. 4, 3, 485-494, 1964; 5, 1, 44-56, 1965.

4.

ERESHKO, F. I. and PROPOI, 1970. 2, 42-48,

A. I.

Theory

(Primenemie

teorii

igr v

of dynamic games. Ttlkhn. kibernetika.