Repositioning control of a two-link flexible arm by learning

Repositioning control of a two-link flexible arm by learning

PII: sooo5-1098(%)00199-9 Pergamon Repositioning 0 A,rromarrco. Vol. 33. No. 4. pp. 579-590. 1997 1997 Elsewer Sconce Ltd. All rights reserved Prm...

1MB Sizes 3 Downloads 27 Views

PII: sooo5-1098(%)00199-9




A,rromarrco. Vol. 33. No. 4. pp. 579-590. 1997 1997 Elsewer Sconce Ltd. All rights reserved Prmted in Great Britain om-1098/97 $17.00 + 0.00

Control of a Two-link Flexible Arm by Learning* P. LUCIBELLO,?


and G. ULIVIf

A linear state-steering learning algorithm is proven to reject nonlinear disturbances. Some experimental results showing the repositioning control of a two-link flexible arm are reported. Key Words-Nonlinear robustness.


flexible arms; learning


finite dimensional algorithm is presented which by trial searches for a control law is able to move a two-link flexible arm between two given equilibrium configurations in a finite time interval. Using a singular perturbation analysis, the possibility of considering the system as a linear perturbed one is shown, allowing the use of a linear learning algorithm which is able to reject all the nonlinear disturbances. The effectiveness of this algorithm is proven by theoretical arguments and experimental results. Robustness with respect to unmodeled high frequency dynamics is also addressed. 0 1997 Elsevier Science Ltd.


singular perturbation


algorithm is set up to track a trajectory connecting the given equilibrium points. It may happen that during a trial the robot reaches the desired equilibrium point even if the selected trajectory is not tracked exactly. This would cause the update of the control for the next trial and the robot could no longer reach the desired equilibrium point at the end of the new trial. Moreover, the problem of trajectory tracking for a flexible arm between two equilibrium points is made more difficult by the non-minimum-phase behavior of this system. The problem described can be classified as a state steering one, for which ad hoc learning algorithms, which are different from output tracking ones, are needed. On the basis of these motivations, the problem of steering the state of a control system by learning has been investigated theoretically (Lucibello, 1992b, c, d). These algorithms are based upon the seiection of a finite-dimensional subspace of the linear space of all control functions defined over a finite time interval. As the search for the steering control is restricted to this subspace, the resulting algorithm is finite dimensional. In this paper we consider the repositioning control of a two-link arm, the second one of which flexible, using a learning strategy. We show that by using state feedback the nonlinear system can be transformed into a perturbed linear one, for which a general theory has been presented previously (Lucibello, 1994). We also prove that the proposed learning algorithm is robust, not only with respect to sufficiently small nonlinear plant perturbations, but also with respect to unmodeled high-frequency dynamics. This problem, known as spillover, has been pointed out previously by Balas (1978) for the case of output regulation of a linear flexible structure. It has been shown (De Luca et al.,


Output tracking by learning has received and is receiving a great deal of attention in the literature. Interesting results have been achieved both for rigid manipulators (Casalino and Bartolini, 1984; Arimoto, 1990; De Luca et al., 1992; Arimoto et al., 1994) and for robots having elasticity in their joints (Miyazaki ef al., 1986; Lucibello, 1993) or links (Poloni and Ulivi, 1991; Lucibello, 1992a). In many cases theoretical studies have been validated by experimental results. Other robotized tasks exist, where output tracking is not required. Typically, an elementary robot operation consists in moving the robot between two given equilibrium states defined by assigned positions of the end effector. This task can be implemented by using trajectory-tracking algorithms, but this adds unnecessary constraints to the motion. Indeed, suppose that a learning * Received 8 August 1994; received in final form 8 October 1996. This paper was not presented at any IFAC meeting. This paper was recommended for publication in revised form by Associate Editor Kenko Uchida under the direction of Editor Tamer Bapr. Corresponding author Dr Stefano Panzieri. Tel. +39 6 44585366, Fax +39 6 44585367; E-mail [email protected] t Dipartimento di Informatica e Sistemistica, Universita di Roma ‘La Sapienza’, Via Eudossiana, 18,00184, Roma, Italy. $ Dipartimknto di Informatica e Automazione, Universita degh Studi ‘Roma Tre’, Via della Vasca Navale 84, 00146 Roma, Italy. 579

P. Lucibello et al.


1992; Lucibello, 1992d) that a similar problem occurs in the case of learning control, and that it can be overcome using suitable filters designed in a linear setting. The implementation of the learning algorithm is described for an experimental flexible arm built in the authors’ laboratory. The experimental results, presented at the end of the paper, confirm the validity of the approach. 2. STATE STEERING BY LEARNING

2.1. Linear systems Consider a linear system


i(t) = Ax(t) + h(t),

x(0) =x0,


Condition (6a) implies that the system at time tf is at rest, and condition (6b) implies that a given linear function of the state and the input takes on a prescribed value at time tf. Note that in some cases we could not be able to specify in advance the value of u(tt). This can happen when some unknown, even if repetitive, disturbances act on our system in the final position, but we want the system to be at rest for t = tf. In this case the mapping a,, will be zero, leaving to a, the task of specifying a particular output of the system. Finally, observe that, in practice, system positioning +Qmust be a vector of measurable quantities that can be compared with the desired one +d defined in Problem 2.1.


where x(t) E R” is the state, and the control function u(t) belongs to the linear space U of piece-wise continuous functions which map the time interval [0, tr] onto IR”. Together with (l), consider a linear invertible map

2.1. Given the control system (l), an initial state x0, and the map C, find a piece-wise continuous control u(t), t E [0, td, such that

Proposition 2.1. Suppose that the pair (A, B) is


then Problem 2.1 is solvable.


Include u(t) among state variables, dynamically extending (1) with m integrators:










with (cldassigned. Definition 2.1. We refer to

x(O) = x0,


[ Uf


Finally, the integration




of equation

ti(t) = u(t) With the above assumptions we consider linear of the state together with the combinations control input. In this way we also control the final value of the state derivative, implying the possibility of specifying equilibrium states. In fact, we could choose C and $I~ as


where u(t) is a new control, and u. is arbitrary. Since extension of the integrators does not destroy controllability, the extended system is controllable. Hence, there exists a control u(t) which steers the state of the extended system from the initial condition to the point

(4) as the system positioning at time tf.

u(O) = uo,

will provide 2.1.

a continuous


(9) to Problem n

Some remarks are now in order:

6) If the two states (initial and final) are

with w = (a,, We,): R”““+ R”’ being a linear map that guarantees the invertibility of C, to define an equilibrium position &, E I?“. Note that, in this case, the system positioning $I is composed of state derivatives and a linear combination of the state and the input giving:

equilibrium points, it is not possible arbitrarily to assign the values of u. and ur: in this case we have to satisfy condition (6a). (ii)

Proposition 2.1 does not imply that the system must be extended with m integrators. It can be done, in this context, only if there is a need to smooth the control input u(t).


Even if the solution to Problem 2.1 has

i(tf) = [email protected],) + Bu(tf) = 0 W.&f)






Repositioning been given as a continuous function, in the following we extend our analysis to the class of piece-wise continuous functions, giving for this larger class conditions that must hold for the solvability of the problem. If the plant is not exactly known, i.e. if the pair (A, B) as a point of R”““’ X 88”““’ belongs to the neighborhood of the nominal point (A, B), the direct computation of a steering control is not possible, even if such steering control exists. The existence of this control is guaranteed by the fact that, if the pair (A, B) is controllable, there is a neighborhood of this point such that each pair belonging to it is controllable. In this case a learning strategy can be applied in order to solve the problem iteratively. To this end, restrict the class of control laws to the following;

u(t) =


control by learning

The reader may refer to Wonham (1979) for examples of linear functions & such that W is of plant invertible under the hypothesis controllability. Equation (12) is the starting point to develop a learning algorithm. Suppose that at each trial the system is initialized always at the same point (x0, uO), and that the control function is given by n+m u”‘(t) = 2 (Y!‘&(f) + u,,,



where the superscript [j] identifies the function or the parameter used in the jth trial. At the end of each iteration, the sysfem positioning error is expressed by e[jl= *d - 4”’ = I(Id- CL - Cwx[jl,


Let the control law (14) be updated by adjusting the vector x using the iteration

c (Yk4k(f)+ uo,

xbl + G ,lil,

xti+ll =




k = 1,. . . , n + m,

= 0,


where (YeE iw and $k : [0, tf] + R”’ are piece-wise continuous functions. At time t‘, one has fI+f?* II eA(“-‘)&&(t) dt x(t,) = eA”xO+ c ffk I0 k=l

where G is a linear map. Premultiplying CW, one obtains CWX [/+I1= cwx[jl


(16) by



which together with (15) gives e[j+ll

= +d _ CL

_ cwxli+ll

= Gd -


fl +


eA(‘l-‘)Buo I



= &I

c k=l

[email protected])




These equations are rewritten in matrix form as: 4ff) [ u(tr)








elj+ 11= (1 _ CwG)


E=I-CWG eA(‘N-r)B+l(t) dt


I eA(tr-f)Wn+m(k) dt







4, +m(tf>


has all the eigenvalues in the unit open disk of the complex plane, then etil+O uniformly as j+m. The existence of a map G such that the eigenvalues of E are in the unit open disk is proven by the fact that CW is invertible. In fact, by setting

eA(‘f-‘)Buo dt

G = (CW)-‘, [


where I is the identity map on IV+“. If the map


. . .


The linear and finite dimensional dynamics of the error, defined over the countable set of the trials, is therefore given by






Clearly, if W is invertible a steering control of the type (II), which solves Problem 2.1, exists.


E turns out to be equal to the null map, which provides zero error in one step. Obviously, this requires that the plant is known exactly. Moreover, each G satisfying G=h(CW)-‘, provides

error convergence.


In any case, since


P. Lucibello et al.

the eigenvalues of E continuously depend on (A, B) as a point of lR~“x”[email protected]‘““~,if these eigenvalues are inside the open unit disk at the point (A, B) E R(nxn)+(nxm), then there exists a neighborhood of this point such that this property is preserved for all pairs belonging to it. Note that proper choice of A can help to ensure faster convergence when the system is not the nominal one. 2.2. Nonlinearly perturbed systems Owing to the uniform convergence, all perturbations which are not persistent and sufficiently small in norm are rejected, including nonlinear perturbations, whereas small persistent errors will correspond to sufficiently small persistent errors (total stability theorem) (Hahn, 1967). In particular, we consider plant perturbations of the type i(t)

= Ax(t) + h(t)

+ q[x(t),

In this section we show how the flexible-arm model can be put in the linear perturbed form by using state feedback. We suppose that the arm undergoes motion which is the sum of a rigid motion plus a small elastic deviation. We further suppose, supported by the experience, that the elastic displacement field can be described, with respect to a floating frame of reference, by the linear combination of a finite number of low-frequency modes of deformation. Moreover, under the hypothesis of small elastic displacements, the dependence of the velocities of the material points on the elastic displacements is neglected. As shown in Fig. 1, let O,(t) and e,(t) be the first and second joint rotations, respectively, and t(t) E Iw” the vector of the amplitudes of the modes of deformation. Set

u(t), t], (23)

where E is a small parameter. The linear part of this system (obtained by setting E = 0) is referred to as the unperturbed linear system; a learning algorithm was designed on the basis of this linear system as the unperturbed learning algorithm. The dependence of the learning algorithm on nonlinear plant perturbations has been investigated previously (Lucibello, 1994). Here the most relevant conclusions are given. Suppose that G(t) E R”, t E [0, tf], is a control function which steers to the state of the unperturbed linear system to xf at time tf. Consider the set of all the inputs near a(t), and the set “21,of all the state evolutions ((t; u) induced by the inputs belonging to QX. Define the set %,, = {w E R” : 11 w - c?(t) II< vl, t E [0, tf]} and let ((t; u) be the state evolution of the unperturbed linear system corresponding to a given control u(t) E q,, t E [0, tf]. Define the bounded set ‘I& = {z E R” 1z = ((t; u), u(t) E %,,, t E [O, trl>.


If the set n(e) is continuous over % X %, X [0, tf], then it has been proven (Lucibello, 1994) that for sufficiently small E the nonlinear perturbation &q(a) is nonpersistent for the unperturbed learning algorithm and, therefore, it is rejected. In particular, robustness with respect to plant-parameters uncertainty, when the plant dynamics continuously depends on them, is ensured. It is emphasized that all the perturbations considered must be sufficiently small in order to preserve algorithm convergence. HOWever, the experimental results presented later show that, for the laboratory arm used, ‘significant’ perturbations are rejected by the implemented algorithm.

Let the virtual work of the torque applied at the second rotating joint be given by r2(t)yT6[. Then, the Lagrange equations of motion are of the type:

(25,26) where M, F and K are the mass, damping, and stiffness matrices of the second flexible link, the terms bij(&) complete the inertia matrix of the whole robot, and n(a) is the vector of the Coriolis and centripetal terms. The inputs to the system are z,(t) and q(t), the torques applied to the two joints of the robot arm. These equations are detailed by Lucibello and Ulivi (1993) and

Fig. 1. Flexible link variables.



control by learning

where K,,, is a positive gain and u, is a new control input. Substituting this into (32~) gives

De Luca et al. (MO), who tested model-based algorithms. By using a learning approach to the control, less information is required. In order to transform the robotic system to a linear perturbed one, we use high-gain feedback. For this purpose, we utilize the Hamilton equations of motion:

li, =

-if41+Kp,(ql -v)l

= - i

k&l(~2h + d12(e2)P2 + Kplh - U,)].




ri= -$+ p$)] - [F(o],

With this position, (32a)-(32d) can be written in the singularly perturbed canonical form (Kokotovic er al., 1986):

where H = T + U is the sum of the kinetic energy and potential energies, p is the vector of the momenta, and q is the vector of the generalized coordinates. In the present case the Hamiltonian takes the form

4, = 4 de2h + d12(e2)P2,


q2= d2,(e2kl + d,,(e,)p,,


a = -h(e2h

+ K,,fq~ - udlt P2 = -Kg2

where B(m) denotes the whole inertia matrix in (26). The momenta are defined as p,

=-$=b,,e, +b,*i 1


+ d22(e2)p2).

$ +






+ ‘Yr2 - F.i = - K[ + yz2 - F[_

ti2 = d2,


(0,)~ l + dZ2wP2,


0, = Tl, 02 = -&2

(32~) + 7~2 - W2,(021p,

+ d22(e2)p2).


Consider a control law

21 = - i

[41+ K,,(q, - u,)]

- u,),





+ YG - [email protected](f)21p,

+ d22(e2)p2). (37c)



state evolution


Denoting by dij(02) the elements of the inverse of the inertia matrix partitioned as in (26), the Hamiltonian equations can be rewritten more explicitly as 41 = 41((32lPl


d22(e2)d,‘(e2)[-K,l(41 - ~5)- d12(e2b2i

P2 = -Q2 p2 = - z

is obtained

- ~1) - 42(~21~211

and the slow system, governing on this manifold, is given by

42 = p, = -


The equilibrium manifold equation from (35~) by letting E = 0:

4, = -Kp,h

and their derivatives can be computed using


+ 7~2 - W,,(e,lp,

Pl.eCl= Gi’(W-Kpdq, (29)

+ d12wp2


Let z be the deviation of the variable pl from the equilibrium manifold, then z




From (35~) one derives the fast transient of z as WV) -


= -4



where B = t/s is the fast time-scale. Due to the fact that d1,(6J2) is always positive, the fast system is exponentially stable. Since the fast system is exponentially stable, for sufficiently large t’, the solution of the slow system (37) is an s-approximation of the unperturbed system (35) for all t E [t’, tr]. Hence, for sufficiently small E, we can deal with (37),

P. Lucibello et al.


which is a perturbations. differentiable, (38) by using has

linear system with nonlinear To see this, suppose that q(t) is so that e,(t) exists, and rewrite Lagrange coordinates. One then

@j] =







K[ = Yr2(t)

+ n,[%(t),

4(t)] - b%(t).


If we assume that b,(t), e,(t), t E [0, tf] are sufficiently small, since Q(V) vanishes with b,(t), this system becomes a linear perturbed one and the learning algorithm can be designed on the basis of its linear part b,(t) = -K,,,&(f) Ml(t)

as x, = suppose

x!y(t,) [ $/l(f‘) 1’ c:R2r+’ -+ R2’+’



e,(t) = -&1&(t) Ml(t)

reduced state of the flexible arm (vl, . . . , v,, 4,) . . , O,), we initially that

+ F[(t)

+ J+(f),


+ Kc(t) = YrT(t).



A further question that needs to be addresses is algorithm robustness with respect to unmodeled high-frequency dynamics (spill-over effect). In an open loop, the number of vibration modes used in flexible-arm modeling influences the accuracy of arm-motion prediction, whereas in a closed loop the neglected high frequency modes of vibration may be deterimental to the stability of learning algorithms. In order to clarify this point, suppose we apply the learning algorithm only to the steering of the flexible link. For this purpose, consider (41b) expressed in modal coordinates, i.e. the coordinates relative to the flexible link, Vi(t) + 2w;G;V;(t) + Wfkj(t) = PjTT(t),

v;(t) E R Q;(O)= $,

Vi(O) =


i= 1,2,. . . ) s

but, when implementing the learning algorithm, the contributions of the neglected highfrequency modes are fed back through +“I that is built up, at the end of each trial, with measures made directly on the system. This mechanism may destabilize the learning algorithm, which has been designed on the basis of the reduced model. We now investigate the stability of the learning algorithm under the hypothesis that the neglected modes of vibration are in the high-frequency range. Let us redefine tih as w,,/&, h = r + 1,. . . , s, and, in order to preserve the modal damping level, & as (3,/G. Equations (42) can then be rewritten as: Vi(t) + 2wiGi+;(t) + W’Vj(t) = fiir2(t) &iih(t) + [email protected],&+h(t)

where wi is the ith natural circular frequency, Wi is the ith damping factor, and pi is a constant. Note that we assume here that the modes of vibration are uncoupled. To investigate the spill-over, suppose that an arm model with the lowest r
i = 1, 2, . . . , r;

h = r + 1, . . . , s.

manifold is obtained The equilibrium singularly perturbing the above system, letting E -+ 0 in (44b): +h.eq





+ &vh(t)

= &,r2(t)




(44b) by and


Defining the deviation of Ch (h = r + 1, . . . , s) from the equilibrium manifold as




?$, = ir, - +h


(46) the time-

h = r + 1,. . . , s,


where u is the fast time. This transient is globally exponentially stable and the Tikhonov theorem applies (Kokotovic et al., 1986). According to this theorem, if E # 0, for sufficiently large t the actual solution is uniformly approximated by the solution of the slow system, i.e. the one given by (44a) computed along the slow manifold, and (44b), with E -+ 0: vi(t) + 2U;&,+,(f) +




i =

1, . . . , r (484




Now, observe that the slow solution (48b) relative to the neglected modes is not affected by

Repositioning This

the control input. contribution of



%I(&), V&f)




to the measured I/&‘)is constant, not depending on the actual value of r,(t). As a consequence, if we want to analyze the behaviour of the learning algorithm, we have to write #jl as *I11 = c [ with



X”‘(ff) r2





(50) and

the vector xS= (v,+,, . . . , v,, Qr+,, . . . , Q,) is the state relative to the neglected modes, and the iteration-independent term C’ = x,(t,) is a constant. With this position, computing L and W in (13) on the basis of the reduced system, we can rewrite (15) as eljl = +d - @I = lCld- CL - CwXljJ - Clx,(r,). (51) Using the update law (16), it follows that the error dynamics is again given by (19) and, with a proper choice of G, we have geometric stability for the slow system, implying that e[jl+O as j-m, At this point, provided that tf is sufficiently between the learning large, the difference dynamics corresponding to the slow system and the learning dynamics of the full one is a continuous function of E, and vanishes for E = 0. Hence, geometric stability is preserved for sufficiently small E. Finally, the disturbance due to the neglected dynamics is not persistent. In fact, only if at the end of a trial #jl = tCldis the control no longer updated, and at the next trial the same point is reached. As a final remark, we want to stress the fact that it is not possible to select the exact final value of the full state of the link. In fact, even if system positioning @jl+ edId, from (50) we know that infinite solutions are allowed for xr(tf), x,(tr) and r2(ff). To appreciate this point fully, consider the case of moving the end point of the arm between two given mechanical equilibrium points. To impose the equilibrium condition in the final one we will set the state derivative to be zero at time tf. Choosing C and l(ld as in (5), with w,, = 0 and a, such that $&,is the position of the end point, +d will contain velocities, accelerations, and the tip position. Following from the previous considerations, even if the end point reaches the desired position, the arm may not be in equilibrium at time tf, since there is no guarantee that at time tf


control by learning

the derivative of the full state of the arm, is zero. This means that for t > tf the arm could start to move again. However, this disturbance may be negligible if the robotic system is stabilized exponentially. 5. THE



The arm used in the experiments is a direct-drive planar chain with two revolving joints and two links, the second of which (the forearm) is very flexible (De Luca et af., 1990, Lucibello and Ulivi, 1993). A sketch of the arm is shown in Fig. 2. The first link (the rigid one) is 300 mm long and was obtained from an aluminium pipe 1OOmm diameter and 5 mm thickness. The second link has been designed to be very flexible in a plane orthogonal to the axes of the two motors (the horizontal plane) and stiff in the other directions and torsionally. It is composed, as depicted in Fig. 3, of two parallel sheets of harmonic steel coupled by nine equispaced aluminum frames, which prevent torsional deformations. Each sheet is 700 X 70 X 1 mm. The frames are 70 X 70 mm square, with a circular hole drilled at the center to reduce their weight. They have a sandwich structure two external layers of 2 mm thick aluminum enclose an internal layer of 1 mm neoprene. The continuity of the external layers is interrupted on the opposite sides by four parallel 1 mm long cuts, which preserve the neoprene continuity. As the neoprene is very flexible, two parallel hinges are used so that the steel sheets can bend only in the horizontal plane. The total mass of the flexible link is about 2 kg. The inertias of the first and second links are J, = 0.447 and J2 = 0.303 kg m2, respectively. The motors driving the two links can provide maximum torques of 3.5, and 1.8Nm and are supplied by two 20-kHz pulse-width modulated (PWM) amplifiers. Each motor is equipped with a 5000 pulse/turn encoder and a DC tachometer. The second-link deformations are measured by an optical sensor with a precision of about 0.1” (De Luca et al., 1990). The light from three lamps positioned along the link is reflected by three rotating mirrors on three detectors. When a detector is illuminated, the position of the corresponding mirror is measured by an encoder. In this way it is possible to determine the displacement of the lamps during the experiments and to reconstruct the link shape to a good approximation. The displacement-sensor logic contains a speed regulator for the motor driving the mirrors and a phase-locked loop which increases the number of the pulses


P. Lucibello ef al.

1 - First motor 2 - Joint 3 - Shaft 4 - Ball bearings 5 - First link (rigid)

6 - Secondmotor 7 - Secondlink hub 8 - Second link (flexi

I-700-4 Fig. 2. Sketch of the arm.

provided by the encoder (a 500 pulse/turn unit) by a factor of 16. The deflections measured by this sensor are rather noisy and require some processing to improve their quality. 6. IMPLEMENTATION


For the implementation of the learning controller, only the first mode of deformation was retained and equations (41) were used. The full model parameter values have been reported by De Luca et al. (1990). A proportional and a derivative loop are closed at each joint so that the two control torques are given by 71 = -i 72 =



[& + k,,(& k,,2 62 +

[email protected]





The joint-clamped



[ 2 ’


kd2 = 0.8.

of the first

Choice of system positioning 4. The robot is required to move between two equilibrium configurations defined as having zero velocities and accelerations and a zero deflection of the flexible link. Since the plane of motion is horizontal and the final position is an equilibrium configuration, the control torques at time t = ff must also be zero. This constraint, taking into account the proportional and derivative loops of (52), can be enforced by requiring that Ui(tr) = ei,, and & = 0 (i = 1, 2). The other constraints are that, at time tf, the positions of the joints and the end effector are the same, and equal to the desired values 0i,, (i = 1, 2). Hence the positioning error is defined as e[il = *d _ @I,


with E =0.5,

mode is fi = 1.25 Hz; the frequency neglected mode is f2 = 7.25 Hz.


with a choice of the linear map C in (4) such that the system positioning is given by (53)

of the deflection

* = (e,, 81, VI, e2,



8tipv u2),


where 8tip is the elastic displacement of the end effector with respect to the axis of the flexible link, and the desired position is given by (cld= (old, 0, old, e2d, O,O, 0, e2d).


Choice of rhe functions 4(t). Two distinct sets of functions have been selected: simple time polynomials and harmonic functions. In both cases, taking into account that the number of state variables of the robotic system plus the two control inputs is 8, we have to choose 8 base functions. As polynomials we used:

[b,(t) -


. . .

hingeFig. 3. Flexible link structure.















t5 ’




control by learning

while the harmonic functions used were [41(t)

. . . 48Wl






I-cos$ \

\ \






Fig. 4. Initial and final arm configurations.



I-cos3f mf 1





(58) mf



with t E [0, tf]. Finally, to smooth the control input, we added two integrators. Therefore the inputs to the system (i.e. to the integrators) are computed as (59) Computation of the learning matrix G. The main problem consists in computing matrix W in (13a). Each column of W has been computed applying a single 4; at the input of the system model and simulating it in the interval [0, tf]. The matrix G is given by (22) and the value of parameter A in (22) has been set equal to 0.7, a good compromise between robustness and rate of convergence. Implementation of the learning iteration (16). The implementation of this step is quite straightforward by means of a digital computer. It must be carried out off-line at the end of each trial after the evaluation of the positioning error, and does not need particular computational efficiency. 7. EXPERIMENTAL

P, (& = 0 rad, & = --k/2 rad) to the equilibrium point P2 (0, = k/2 rad, e2 = a/2 rad), as shown in Fig. 4, in 2.5 seconds. At the points PI and Pz the tip deflection is null. Each trial was started from P, and when the motion had ended the arm is repositioned at the same point. The same task was carried out first by using the polynomials, and second by using the harmonic base functions. For each experiment two sets of graphs are presented. One set shows the state variables versus time over the interval [0,3] seconds, for different trials. These graphs provide a general perspective of the system’s behavior. In order to assess the convergence of the learning algorithm, a graph of the euclidean norm of the error (54) plotted versus the trial number is given. This graph is the most appropriate one from which to judge the convergence of the algorithm. The results of the experiment with the polynomials are reported in Figs 5-10. Figures 5 and 6 show the trajectories of the first joint angle, the second joint angle, and the tip deflection during the eighth trial over the time interval [0,3]. Similarly, Figs 7 and 8 show the trajectories of the first joint angular velocity, the second joint angular velocity, and the tip elastic velocity over the same time interval. To complete the error vector, in Fig. 9 the values of the control inputs ul and uz are also reported. Finally in Fig. 10, the norm of the error (54) is plotted versus the trial numbers. Note that the


Several experiments have been done using the described set-up and have confirmed the validity of the proposed approach. Tests have revealed other nonlinearities besides those explicitly addressed by the model; in particular, dry friction and saturation of the motor drives. In controller worked any case, the learning properly and demonstrated its characteristic of robustness. Here, we describe two typical experiments, during which the links moved at the maximum speed allowed without motor drive saturation. The arm has to move from the equilibrium point





1.5 see

I 2


Fig. 5. 8, and O2 trajectories at the eighth trial.



P. Lucibello et al. 3.51





I’ . . . . . . . . .._.. it. J. _____...______________.._____ __-- _-a---------.




t_ 0.5


0 --.lO


/ 0.5

/ 1

/ I .5








I/I’ ,)’ ,’ ,/&_‘-


/*.* I’








Fig. 6. Deflection of the second link at the eighth trial.


. . . . . . . . . . . . . . . . . . . . . . . . __; . . . . . . . . . . . . . . . .._ y______ ,* .. I’ ‘\ ,#‘2 *L__/’




Fig. 9. Inputs IJ, and u2 at the eighth trial.

3 2.5 __.........;

. . . . . . . . . . y...:,;,.

_......__ j ._........_; .\

... ..


3 _A; . . . . . . . . . . ..i. . . . . . .. . . . . . . ..j . . .








I 1.5 set





Fig. 7. b, and 6, trajectories at the eighth trial.

. ... . .. . i ... . .. . .. .





iterations Fig. 10. Euclidean norm of the error in the trials.


full system positioning error is reported, in relation to the zero iteration. The same graphs for the harmonic base functions are given in Figs 11-16. Note how the state trajectories change by changing the functions which define the control applied. No significant differences were observed with regard to the convergence to zero of the norm of the error (54).

We have presented a learning strategy for the repositioning control of a two-link flexible arm. By restricting the search for the steering control to some finite-dimensional subspace of the input space, finite-dimensional learning algorithms have been developed which need a finite memory for their implementation. Robustness of the model with respect to a very large class of perturbations, including unmodeled highfrequency dynamics, has been proven by means



Fig. 8. Velocity of deflection of the second link at the eighth trial.












Fig. 11. 8, and & trajectories at the sixth trial.

J 3








control by learning







Fig. 12. Deflection of the second link at the sixth trial.

-11 0




1.5 set










Fig. 16. Euclidean norm of the error in the trials.

of theoretical arguments and has been verified experimentally using a laboratory arm. Ongoing research concerns both the possibility of cyclically steering the state of the system along a finite sequence of equilibrium points, thus avoiding the problem of exact reinitialization of the system at each iteration, and optimizing a given performance index during the repositioning.



Acknowledgements-This funds from Murst 40%.

Fig. 13. 6, and I$ trajectories at the sixth trial.


was partly supported



Arimoto, S. (1990). Learning control theory for robotic motion. Int. J. Adaptive Control Sig. Process., 4,543-564. Arimoto, S., S. Kawamura and F. Miyazaki (1984). Bettering operation of robots by learning. J. Roboric Syst., 1, 123-140. Balas, M. J. (1978). Active control of flexible systems. J.

0.3 0.2 0.1

Radl o set

Optim. Theory Applies, 25415436.

Casalino, G. and G. Bartolini (1984). A learning procedure for the control of movements of robotic manipulators. In

-0.1 .0.2


-0.3 -0.4;









Fig. 14. Velocity of deflection of the second link at the sixth trial.






1.5 set



Fig. 15. Inputs v, and v2 at the sixth trial.







Amsterdam, The Netherlands, pp. 108-111. De Luca, A. and G. Ulivi (1992). Iterative learning control of robots with elastic joints. In Proc. 1992 IEEE International Conf on Robotics and Automation, Nice, France, pp. 1920-1926. De Luca, A., L. Lanari, P. Lucibello, S. Panzieri and G. Ulivi (1990). Control experiments on a two-link robot with a flexible forearm. In Proc. 29th IEEE Conf. on Decision and Conrrol, Honolulu, HI, pp. 520-527. De Luca, A., G. Paesano and G. Ulivi (1992). A frequency-domain approach to learning control: implementation for a robot manipulator. IEEE Trans. Indust. Electron., IE-39,1-10. Hahn, W. (1967). Stability of Motion. Springer-Verlag, Berlin. Kokotovic, P., H. K. Khalil and J. O’Reilly (1986). Singular Methods

in Control:

Analysis and Design.

Academic Press, London. Lucibello, P. (1992a). Output tracking of flexible mechanism by learning. In Proc. IEEE International Symp. on Intelligent Conrrol, Glasgow, UK, pp. 29-33. Lucibello, P. (1992b). Learning control of linear systems. In Proc. American Control Conf., Chicago, IL, pp. 18881892. Lucibello, P. (1992~). Point to point polynomial control of linear systems by learning. In Proc. 31st IEEE Conf on Decision and Control, Tucson, AZ, pp. 2531-2532.


P. Lucibello et al.

Lucibello, P. (1992d). Output regulation of flexible mechanisms by learning. In Proc. 31sr IEEE Conf: on Decision and Control, Tucson, AZ, pp. 2533-2538. Lucibello, P. (1993). On learning control for a class of flexible joint arms. In Proc. International Symp. on Intelligent Robotics, ISIR, Bangalore, pp. 627-636. Lucibello, P. (1994). State steering by learning for a class of nonlinear control systems. Automatica, X),1463-1468. Lucibello, P. and G. Ulivi (1993). Design and realization of a two-link direct drive robot with a very flexible forearm.

Int. J. Robotics Automation, 8, 113-128.

Miyazaki, F., S. Kawamura, M. Matsumori and S. Arimoto (1986). Learning control scheme for a class of robots with elasticity. In Proc. 25th IEEE Conf on Decision and Control, Athens, Greece, pp. 74-79. Poloni, M. and Cl. Ulivi (1991). Iterative learning control of a one-link flexible manipulator. In Proc. 3rd IFAC Symp. on Robot Control, Vienna, Austria, pp. 393-398. Wonham, W. M. (1979). Linear Multivariable Control: A Geometric Approach. Springer-Verlag, Berlin.