- Email: [email protected]

Contents lists available at ScienceDirect

Applied Mathematics and Computation journal homepage: www.elsevier.com/locate/amc

Approximation by neural networks with scattered data q Shaobo Lin a, Xiaofei Guo b, Feilong Cao c, Zongben Xu a,⇑ a b c

Institute for Information and System Sciences, School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an 710049, Shanxi Province, PR China Department of Applied Mathematics, Northwestern Polytechnical University, Xian 710129, Shanxi Province, PR China Institute of Metrology and Computational Science, China Jiliang University, Hangzhou 310018, Zhejiang Province, PR China

a r t i c l e

i n f o

Keywords: Neural networks Scattered data Reproducing kernel Trigonometric polynomials

a b s t r a c t In this paper, the approximation capability of feed-forward neural networks (FNNs) formed P as nk¼1 ck /ðx xk Þ is considered, where / is an even periodic continuous function and xk ’s are scattered data. The best approximation error of trigonometric polynomials and the continuous modulus of the activation function / are introduced to describe the approximation capability of FNNs. Ó 2013 Elsevier Inc. All rights reserved.

1. Introduction In the past two decades, we have witnessed the enormous emergence concerning the approximation capabilities of feedforward neural networks (FNNs). FNNs can be formally described as devices producing input–output functions depending on ﬂexible parameters. Generally speaking, the input–output functions possess the form of linear combination of functions computable by units speciﬁc for the given type of networks. Both coefﬁcients of the linear combinations and parameters of the computational units are adjustable in the process of learning. Theoretically, any continuous functions can be approximated by FNNs to arbitrary desired degree provided the number of hidden neurons is large enough. This result is usually called the density problem of FNN approximation. For example, by using standard functional analysis approach, Cybenko [9] established the density theorem of FNNs with the well known sigmoidal activation function. Similar results can also be found in Funahashi [12], Hornik et al. [13], Li [15], Leshno et al. [14], Chen and Chen [6], Chui and Li [7]. and references therein. Compared with the density problem, a related and more important problem is that of complexity: to determine how many neurons is necessary to yield a prescribed degree of FNN approximation, i.e., one tries to describe the relationship between the rate of approximation and the number of neurons in the hidden layer. Up till now, many authors have published results on the complexity issue of FNNs with various activation functions. For example, by taking the bounded sigmoidal function as the activation function, Chen [5] established a Jackson-type error estimate for FNN approximation on the interval ½0; 1. Recently, Cao and Zhang [4] extended it to Lp space, i.e., they give an Lp error estimate for such an approximation. By taking monotone odd function as the activation function, Cao et al. [3] constructed an FNN to approximate the continuous function and established a Jackson-type inequality on the unit interval. For more details about the complexity issue, we refer

q The research was supported by the National 973 Programming (2013CB329404), the Key Program of National Natural Science Foundation of China (Grant No. 11131006), and the National Natural Science Foundations of China (Grants No. 61075054). ⇑ Corresponding author. E-mail address: [email protected] (Z. Xu).

0096-3003/$ - see front matter Ó 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.amc.2013.08.014

30

S. Lin et al. / Applied Mathematics and Computation 224 (2013) 29–35

the readers to Barron [2], Mhaskar and Micchelli [19], Maiorov and Meir [17], Makovoz [18], Ferrari and Stengel [11], Xu and Cao [23], etc. On the other hand, the problem of effectively representing an underlying function based on its values sampled at ﬁnitely many distinct scattered sites X ¼ fxi gni¼1 is important and arises in many applications such as machine learning [8], scattered data ﬁtting [22] and differential or integral equations [21]. For example, due to the so-called representation theorem of learning theory [20], the solutions to both the regularized least square algorithm and support vector machine algorithm take the form as n X /ðx xk Þ;

ð1Þ

k¼1

where xk ’s are the sites at which the data are collected. Therefore, it seems natural to consider networks that evaluate a function of the form as (1). In the present paper, by taking the even and periodic function as the activation function, we can construct an FNN formed as (1) to approximate any continuous functions. To this end, we ﬁrst give an integral representation for any trigonometric polynomials. Then, we construct an FNN to approximate the integral representation. At last, the approximation bound for trigonometric polynomials implies a Jackson-type error estimate for the constructed neural network. The modulus of smoothness of the activation function and the best trigonometric polynomial approximation error are introduced to describe the approximation capability of FNNs. 2. Approximation trigonometric polynomials by FNNs In this section, we construct an FNN formed as (1) to approximate trigonometric polynomials. To this end, some deﬁnitions concerning reproducing kernel Hilbert space (RKHS) are required. Let /ðx; yÞ : ½p; p ½p; p ! R be continuous, symmetric and positive deﬁnite, i.e., for any ﬁnite set of distinct points X ¼ fxi gli¼1 ½p; p, the matrix f/ðxi ; yj Þg16i;j6l is positive deﬁnite. Then the RKHS, H/ , associated with the kernel / is deﬁned (see [1]) to be the closure of the linear span of the set of functions f/x :¼ /ðx; Þ : x 2 Xg with the inner product h; i/ and norm k k/ satisfying h/x ; /y i/ ¼ /ðx; yÞ,

h/x ; gi/ ¼ gðxÞ;

8x 2 X; g 2 H/ ;

and kgk/ :¼ hg; gi/ , respectively. According to the deﬁnition of kgk/ , we can obtain

kgk 6 jkgk/ ;

8g 2 H/ ;

ð2Þ

pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ where k k denotes the uniform norm and j ¼ supx2½p;p /ðx; xÞ. The next thing we do is to give an integral representation for arbitrary trigonometric polynomials. Let T n be the collection of trigonometric polynomial deﬁned on ½p; p with degree not larger than n. The kernel

K n ðx; yÞ :¼

n 1 1X þ cos kðx yÞ; 2p p k¼1

x; y 2 ½p; p;

will play an important role in our proof. The following Lemma 1 which can be found in [10] shows the reproducing property of K n ð; Þ. Lemma 1. The kernel K n ð; Þ is continuous, symmetric and positive deﬁnite. Furthermore, it is the reproducing kernel of T n , i.e., (i) K n ðx; Þ 2 T n with ﬁxed x. Rp (ii) p T n ðyÞK n ðx yÞdy ¼ T n ðxÞ. For further use, we also introduce the following Lemma 2. Lemma 2. If / is an even and periodic function with 2p being its period, then for arbitrary a; b 2 R, there holds

Z p

^ k ða cos kx þ b sin kxÞ; /ðx yÞða cos ky þ b sin kyÞdy ¼ /

k ¼ 0; 1; . . . ;

p

where

^ k :¼ /

Z p p

/ðtÞ cos ktdt:

ð3Þ

S. Lin et al. / Applied Mathematics and Computation 224 (2013) 29–35

Proof. Since / is even and 2p-periodical function, we have

Z p

/ðx yÞða cos ky þ b sin kyÞdy Z p Z p ¼a /ðy xÞ cos kydy þ b /ðy xÞ sin kydy p p Z p Z p ¼a /ðyÞ cos kðy þ xÞdy þ b /ðyÞ sin kðy þ xÞdy Zpp Zpp ¼a /ðyÞ cos kydy cos kx a /ðyÞ sin kydy sin kx p p Z p Z p þb /ðyÞ cos kydy sin kx þ b /ðyÞ sin kydy cos kx

p

p

p

^ k ða cos kx þ b sin kxÞ: ¼/ This completes the proof of Lemma 2. h Based on the above two lemmas, we give the integral representation for any T n 2 T n . ^ k – 0; k ¼ 0; 1; . . ., then for arbitrary T n 2 T n , we have Lemma 3. If / is continuous, even, 2p-periodic, and satisﬁes /

T n ðxÞ ¼

n 1X 1 p /^ k

Z p Z p p

k¼1

T n ðyÞ cos kðy zÞ/ðx zÞdydz þ

p

1 ^0 2p /

Z p Z p p

T n ðyÞ/ðx zÞdydz:

p

Proof. From Lemma 1, we know that for arbitrary T n 2 T n , there holds that

T n ðxÞ ¼

Z p

T n ðyÞK n ðx yÞdy ¼

p

n 1X

p k¼1

Z p

T n ðyÞ cos kðx yÞdy þ

p

1 2p

Z p

T n ðyÞdy:

p

It follows from Lemma 2 that for arbitrary even function / 2 Cð½p; pÞ, there holds

Z p

Z p Z p /ðx zÞ cos kðy zÞdz ¼ cos ky /ðx zÞ cos kzdz þ sin ky /ðx zÞ sin kzdz

p

p

p

^ k ðcos ky cos kx þ sin ky sin kxÞ ¼ / ^ k cos kðx yÞ: ¼/ ^ k – 0, then for any k ¼ 0; 1; . . ., there holds Since / satisfying /

cos kðx yÞ ¼

1 ^k /

Z p

/ðx zÞ cos kðy zÞdz:

p

Thus, we have

T n ðxÞ ¼

n 1X 1 p /^ k k¼1

Z p Z p p

T n ðyÞ cos kðy zÞ/ðx zÞdydz þ

p

This ﬁnishes the proof of Lemma 3. Let

1 ^0 2p /

Z p Z p p

T n ðyÞ/ðx zÞdydz:

p

h

rðÞ be the Heaviside function, i.e.,

rðtÞ :¼

1; t P 0; 0; t < 0:

Denote

c1 ðxÞ :¼ 1 rðx x1 Þ; ci ðxÞ :¼ rðx xi1 Þ rðx xi Þ;

2 6 i 6 n 1;

cn ðxÞ :¼ rðx xn1 Þ: Now we deﬁne the neural network as

Nn/ ðxÞ

! Z p Z p Z p Z p n n X 1X 1 1 :¼ c ðzÞ T n ðyÞ cos kðy zÞdydz þ c ðzÞ T n ðyÞdydz /ðx xi Þ: ^ 0 p i p k¼1 /^ k p i 2p/ p p i¼1

31

32

S. Lin et al. / Applied Mathematics and Computation 224 (2013) 29–35

To characterize the approximation degree of the constructed FNN, we need introduce the modulus of smoothness of a continuous function f as

xðf ; tÞ ¼ sup

max

06h6t x;xþh2½p;p

jf ðx þ hÞ f ðxÞj:

The smoothness modulus is often used as a tool for measuring approximation error. It is also used to measure the smoothness of a function and its accuracy between approximation theory and Fourier analysis (see [16]). The function f is called ðM; aÞ-Lipschitz continuous ð0 < a 1Þ, which can be written as f 2 LipðM; aÞ, if and only if there exists a constant M > 0 such that

xðf ; dÞ 6 Mda : Now we are in a position to give the main result of this section. ^ k > 0; k ¼ 0; 1; . . ., then for arbitrary T n 2 T n there holds Theorem 1. Let / be continuous, even, 2p-periodic and satisfy /

kT n Nn/ k2/ 6 2

kT n k21 xð/; DÞ; ^2 / 0

^ 0 :¼ where kgk1 is the usual L1 norm of the integrable function g; / separation distance of X.

Rp

p

/ðxÞdx and D :¼ max16i6n jxiþ1 xi j denotes the maximum

^ k > 0 for all k P 0, it follows from [8, Proposition 2.12] that the kernel deﬁned by /ð; Þ :¼ /ð Þ is positive Proof. Since / deﬁnite. Denote

hðzÞ :¼

n 1X 1 p /^ k

Z p

T n ðyÞ cos kðy zÞdy þ

p

k¼1

1 ^0 2p/

Z p

T n ðyÞdy:

p

Since

hT n ; T n i/ ¼

Z p Z p p

hðzÞ/ðz yÞhðyÞdzdy;

p

and

hT n ; /ð xi Þi/ ¼ T n ðxi Þ; it follows that

* kT n

Nn/ k2/

¼ hT n

Nn/ ; T n

N n/ i/

¼

+ Z p Z p n n X X Tn /ð xi Þ ci ðzÞhðzÞdz; T n /ð xj Þ cj ðyÞhðyÞdy p

i¼1

¼

Z p Z p p

¼

hðzÞhðyÞ /ð zÞ

p

Z p Z p p

*

p

n X

i¼1

j¼1

h/ðz Þ; /ð yÞi/

cj ðyÞ/ð xj Þ

gðzÞ :¼ /ðy zÞ

!! hðzÞhðyÞdzdy

j¼1

Z p Z p

Set

dzdy /

j¼1

n X ci ðzÞ h/ð xi Þ; /ð yÞi/ cj ðyÞh/ð xi Þ; /ð xj Þi/

p

/

+

n X cj ðyÞh/ð zÞ; /ðxj Þi/

n X i¼1

¼

n X

ci ðzÞ/ð xi Þ; /ð yÞ

p

j¼1

p

/ðy zÞ

n X j¼1

cj ðyÞ/ðz xj Þ

n n X X ci ðzÞ /ðy xi Þ cj ðyÞ/ðxi xj Þ i¼1

!! hðzÞhðyÞdzdy:

j¼1

n X cj ðyÞ/ðz xj Þ: j¼1

Since for arbitrary x 2 ½p; p, there exists a j0 2 N such that x 2 ½xj0 ; xj0 þ1 (without loss of generality, we assume 2 6 j0 6 n 1), it follows from the deﬁnition of r that

S. Lin et al. / Applied Mathematics and Computation 224 (2013) 29–35

gðxÞ

33

n X gðxi Þci ðxÞ i¼1

¼ gðxÞ gðx1 Þ þ

n1 X ðgðxiþ1 Þ gðxi ÞÞrðx xi Þ

!

i¼1 n1 X ðgðxiþ1 Þ gðxi ÞÞrðx xi Þ ¼ gðxÞ gðx1 Þ i¼1 j0 1 X ðgðxiþ1 Þ gðxi ÞÞðrðx xi Þ 1Þ ¼ gðxÞ g ð x1 Þ i¼1

þ gðx1 Þ gðxj0 Þ ðgðxj0 þ1 Þ gðxj0 ÞÞrðx xj0 Þ

n1 X

ðgðxiþ1 Þ gðxi ÞÞrðx xi ÞÞ

i¼j0 þ1

¼ gðxÞ gðxj0 Þ ðgðxj0 þ1 Þ gðxj0 ÞÞrðx xj0 Þ Therefore

n X gðxÞ gðx Þc ðxÞ 6 xðg; DÞ: i i i¼1 From the deﬁnition of xðg; tÞ, we obtain

n n X X ci ðyÞ/ðz þ D xi Þ /ðz yÞ þ ci ðyÞ/ðz xi Þ /ðz þ D yÞ p6z;zþD6p p6t;tþD6p i¼1 i¼1 X n 6 sup j/ðz yÞ /ðz þ D yÞj þ sup ci ðyÞð/ðz þ D xi Þ /ðz xi Þ p6z;zþD6p p6z;zþD6p i¼1

xðg; DÞ 6

sup

jgðz þ DÞ gðzÞj ¼

6 xð/; DÞ þ xð/; DÞ

sup

n X ci ðyÞ; i¼1

where in the last inequality we use the fact that ci ðyÞ P 0 for arbitrary i ¼ 1; . . . ; n. Furthermore, from the deﬁnition of ci ðÞ, there holds that n X ci ðyÞ ¼ 1: i¼1

Thus we obtain

n X gðxi Þci ðxÞ 6 2xð/; DÞ: gðxÞ i¼1 Therefore, we have

kT n Nn/ k2/ 2

Z p

2 hðzÞdz xð/; DÞ:

p

R p Finally we only need to give an upper bound estimate for p hðzÞdz. From the deﬁnition of hðzÞ, we deduce

! Z p Z p Z p Z p n 1X 1 1 hðzÞdz ¼ T n ðyÞ cos kðy zÞdy þ T n ðyÞdy dz ^ 0 p ^ k p p p k¼1 / 2p / p Z Z Z Z 1 1 X n p p p p 1 1 ¼ cos kðz yÞdzT n ðyÞdy þ T n ðyÞdy ¼ T n ðyÞdy ^ ^ ^ p /k p p /0 p /0 p k¼1

Thus,

2 kT n k2 n T n N/ 2 2 1 xð/; DÞ: ^ / / 0 This completes the proof of Theorem 1. h

34

S. Lin et al. / Applied Mathematics and Computation 224 (2013) 29–35

3. Approximation and interpolation by FNNs In this section, we deduce an upper bound of approximation by FNNs formed as (1) with the help of Theorem 1. Theorem 2. Under the assumptions of Theorem 1, for f 2 H/ , there exists a constant C independent of n such that

kf Nn/ k/ 6 En ðf Þ/ þ C ðxð/; DÞÞ1=2 ; where En ðf Þ/ :¼ inf T n 2T n kf T n k/ is the best approximation error of the trigonometric polynomials under the RKHS norm. Proof. For arbitrary

e > 0, it is obvious that there exists a T n 2 T n such that

kf T n k/ < En ðf Þ/ þ e: Therefore, it follows from Theorem 1 that

kf Nn/ k/ 6 kf T n k/ þ kT n Nn/ k 6 En ðf Þ/ þ e þ C ðxð/; DÞÞ1=2 : Thus the arbitrariness of e ﬁnishes the proof of Theorem 2. h Remark 1. Since /ð; Þ is the reproducing kernel of the RKHS, H/ , the best approximation of the FNN formed as (1) in the RKHS norm is also the exact interpolation FNN. Thus Theorem 2 also gives an RKHS norm error estimate of the FNN interpolant. It is obvious that the smoother / is, the smaller H/ is. It is natural to arise the question: can we get a similar error estimate for f 2 Cð½p; pÞ? The following Theorem 3 focuses on giving an afﬁrmative answer to this question.

Theorem 3. If the assumptions of Theorem 1 holds, then for arbitrary f 2 Cð½p; pÞ, there exists a constant C independent of n such that

! kf k1 1=2 n : f N 6 C E ðf Þ þ ð x ð/; D ÞÞ n=2 / ^0 /

ð4Þ

Proof. From (2) and Theorem 1 it follows that

kT n k1 n n ðxð/; DÞÞ1=2 ; T n N/ 6 /ð0ÞT n N / 6 C ^0 / / where En ðf Þ :¼ inf T n 2T n kf T n k is the best approximation error of trigonometric polynomials under the uniform norm. Let V ½n=2 f be the well known de la Vallée Poussin operator, i.e.

V ½n=2 ðf ÞðxÞ :¼

2½n=21 X 1 Sk ðf ÞðxÞ; ½n=2 k¼½n=2

where

Sk ðf ÞðxÞ :¼

1

Z p

p

p

f ðx þ tÞ

! k 1 X þ cos jt dt: 2 j¼1

Then we have

kf V ½n=2 f k 6 CE½n=2 ðf Þ;

ð5Þ

kV ½n=2 f k 6 Ckf k:

ð6Þ

and

Therefore, by setting T n :¼ V ½n=2 f in Theorem 1, we get

CkV ½n=2 f k1 n n ðxð/; DÞÞ1=2 : f N/ 6 kf V ½n=2 f k þ kV ½n=2 f N/ k 6 CE½n=2 ðf Þ þ ^0 / The above inequality together with (6) yields (4) immediately. h

S. Lin et al. / Applied Mathematics and Computation 224 (2013) 29–35

35

Acknowledgements Two anonymous referees have carefully read the paper and have provided to us numerous constructive suggestions. As a result, the overall quality of the paper has been noticeably enhanced, to which we feel much indebted and are grateful. References [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23]

N. Arouszajiu, Theory of reproducing kernels, Trans. Math. Soc. 68 (1950) 337–404. A.R. Barron, Universal approximation bounds for superpositions of a sigmoidal function, IEEE Trans. Inf. Theor. 39 (1993) 930–945. F.L. Cao, T.F. Xie, Z.B. Xu, The estimate for approximation error of neural networks: a constructive approach, Neurocomputing 71 (2008) 626–630. F.L. Cao, R. Zhang, The errors of approximation for feedforward neural networks in Lp metric, Math. Comput. Modell. 49 (2009) 1563–1572. D.B. Chen, Degree of approximation by superpositions of a sigmoidal function, Approx. Theor. Appl. 9 (1993) 17–28. T.P. Chen, H. Chen, Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical system, IEEE Trans. Neural Networks 6 (1995) 911–917. C.K. Chui, X. Li, Approximation by ridge functions and neural networks with one hidden layer, J. Approx. Theor. 70 (1992) 131–141. F. Cucker, D.X. Zhou, Learning Theory: an Approximation Theory Viewpoint, Cambridge University Press, Cambridge, 2007. G. Cybenko, Approximation by superpositions of sigmoidal function, Math. Control Signals Syst. 2 (1989) 303–314. R. DeVore, G. Lorentz, Constructive Approximation, Springer-Verlag, Berlin, 1993. S. Ferrari, R.F. Stengel, Smooth function approximation using neural networks, IEEE Trans. Neural Networks 16 (2005) 24–38. K.I. Funahashi, On the approximate realization of continuous mapping by neural networks, Neural Networks 2 (1989) 183–192. K. Hornik, M. Stinchcombe, H. White, Multilayer feedforward networks are universal approximation, Neural Networks 2 (1989) 359–366. M. Leshno, V.Y. Lin, A. Pinks, S. Schocken, Multilayer feedforward networks with a nonpolynomial activation function can approximate any function, Neural Networks 6 (1993) 861–867. X. Li, On simultaneous approximation of by radial basis function neural networks, Appl. Math. Comput. 95 (1998) 75–89. G.G. Lorentz, Approximation of Functions, Rinehart and Winston, New York, 1966. V. Maiorov, R.S. Meir, Approximation bounds for smooth function in CðRd Þ by neural and mixture networks, IEEE Trans. Neural Networks 9 (1998) 969– 978. Y. Makovoz, Uniform approximation by nerual networks, J. Approximate Theor. 95 (1998) 215–228. H.N. Mhaskar, C.A. Micchelli, Degree of approximation by neural networks with a single hidden layer, Adv. Appl. Math. 16 (1995) 151–183. I. Steinwart, A. Christmann, Support Vector Machines, Springer, New York, 2008. A.N. Tikhonov, V.Y. Arsenin, Solutions of III-Posed Problems, Wiley, New york, 1977 (translated form the Russian). H. Wendland, Scattered Data Approximation, Cambridge University Press, Cambridge, 2005. Z.B. Xu, F.L. Cao, The essential order of approximation for neural networks, Sci. China Ser. F 47 (2004) 97–112.