A robust audio watermarking scheme based on lifting wavelet transform and singular value decomposition

Signal Processing 92 (2012) 1985–2001

Baiying Lei (a), Ing Yann Soon (a), Feng Zhou (b), Zhen Li (a), Haijun Lei (c)

(a) School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore
(b) The George W. Woodruff School of Mechanical Engineering, The Georgia Institute of Technology, Atlanta, GA, USA
(c) College of Computer Science and Technology, Shenzhen University, Shenzhen, China

Article history: Received 18 June 2011; Received in revised form 18 October 2011; Accepted 21 December 2011; Available online 8 January 2012.

Abstract

In this paper, a new and robust audio watermarking scheme based on the lifting wavelet transform (LWT) and singular value decomposition (SVD) is proposed. Specifically, the watermark data are efficiently inserted into the coefficients of the LWT low-frequency subband by taking advantage of both SVD and quantization index modulation (QIM). The use of QIM renders our scheme blind in nature. Furthermore, a synchronization code technique is integrated into the hybrid LWT–SVD audio watermarking method. Experimental and analysis results demonstrate that the proposed LWT–SVD method is not only robust against both general signal processing attacks and desynchronization attacks but also achieves a very good tradeoff between robustness, imperceptibility and payload. Comparisons with typical and related audio watermarking algorithms also show that our proposed method outperforms most of the selected algorithms.

Keywords: Audio watermarking; Lifting wavelet transform; Robust watermarking; Singular value decomposition; Quantization index modulation

1. Introduction

Audio watermarking has recently become a very active research topic and has attracted much interest as one of the most popular approaches to copyright protection; as a result, there is a large body of state-of-the-art publications on the subject. As stated by the International Federation of the Phonographic Industry (IFPI) [1], an effective audio watermarking scheme needs to have the following properties. (1) Imperceptibility: the quality of the audio signal should not degrade after the watermark is added. Imperceptibility can be evaluated using both objective and subjective measures; in addition, the signal-to-noise ratio (SNR) should be more than 20 dB. (2) Robustness: the ability to extract a watermark from a watermarked audio signal after various signal processing attacks.


Watermarking should also prevent unauthorized detection and removal unless the quality of the audio becomes very poor. (3) Payload: the amount of data that can be embedded into the host audio signal without losing imperceptibility, which should usually be more than 20 bits per second (bps). (4) Security: the watermarked signal should not reveal any clues about the watermark it carries; security should depend on secret keys rather than on the secrecy of the watermarking algorithm. Robustness, imperceptibility and payload are three main requirements that are mutually conflicting, so tradeoffs are needed when designing a new and successful watermarking scheme. In general, robust audio watermarking schemes can be broadly classified into time-domain and transform-domain approaches. Time-domain methods [2] are very efficient and easy to implement, while transform-domain methods have the advantage of high robustness. The widely used transform domains for audio watermarking are the discrete cosine transform (DCT) [3,4], the discrete wavelet transform (DWT) [5,6] and the fast Fourier transform (FFT) [7–9]. Moreover,


some other transforms such as the discrete fractional sine transform (DFRST) [10], the LWT [11,12] and the SVD [13–19] are becoming more and more popular in the audio watermarking field. The conventional wavelet transform performs very well because of its multiresolution property and perfect reconstruction [5,16]. However, the classic wavelet transform is mainly computed by convolution, which results in a high computational load, and the floating-point coefficients it generates increase the storage requirements. A new wavelet was therefore designed to increase efficiency: first proposed by Sweldens [20], the LWT is a second-generation wavelet built on the traditional wavelet. The lifting scheme has several unique properties in comparison with the traditional wavelet: (1) LWT allows an in-place implementation of the fast wavelet transform and the construction of wavelets without using the Fourier transform, so it can be calculated more efficiently and needs less memory; (2) it is particularly easy to build non-linear wavelet transforms, and LWT has good time–frequency localization; (3) LWT coefficients are integers and, unlike the traditional wavelet transform, do not suffer from quantization errors. Consequently, the lifting-based wavelet transform is widely used in the audio watermarking field [11,12,21]. From a linear-algebra perspective, SVD is also extensively applied in robust watermarking to withstand attacks, owing to its unique characteristics. As a factorization of a real matrix and a desirable transform, SVD was first applied widely in image watermarking [22] for ownership protection and was quickly extended to audio watermarking [13–19]. Moreover, QIM [23] is a very popular method for watermark embedding and data hiding. Combining LWT with QIM and SVD can reduce the operation time and achieve very robust results.

In this paper, an efficient and robust watermarking algorithm for copyright protection based on LWT, SVD and QIM, with a synchronization code technique to withstand desynchronization attacks, is proposed. The reason to combine LWT, SVD and QIM is that superior performance can be achieved. The host audio signal is decomposed by LWT, and a meaningful binary image is used as the watermark data and is scrambled by a chaotic signal. The scrambled watermark is embedded in the low-frequency subband of the original host audio through modification of the singular values (SVs).

The organization of this paper is as follows. Related work is reviewed in Section 2. Section 3 presents the principle of the LWT, and Section 4 introduces the SVD. Section 5 discusses the embedding method and Section 6 the watermark extraction. Section 7 presents the performance analysis. The experimental results and algorithm comparisons are discussed in Section 8. Finally, Section 9 concludes the paper.

2. Related work

In recent years, many audio watermarking techniques have been proposed in the literature.

For instance, Bassia and Pitas [2] propose an audio watermarking algorithm using the spread spectrum method in the time domain. Typical audio watermarking schemes can be broadly divided into systems with and without synchronization. Synchronization and self-synchronization techniques provide the ability to resist cropping, shifting, time-scale modification (TSM), pitch-scale modification (PSM) and jittering attacks. For example, Lie and Chang [24] propose a time-domain audio watermarking scheme based on the human auditory system that exploits the relation between averages of absolute amplitude differences; however, this group amplitude quantization method obtains a relatively low detection ratio under different attacks. Histogram-based audio watermarking algorithms against TSM and cropping attacks are introduced in [25,26]. In [25], Xiang and Huang present a histogram-based audio watermarking scheme in the time domain to withstand desynchronization attacks, since the histogram and mean are invariant to TSM, PSM and jittering; the multi-bit watermark is inserted by controlling the histogram intensity. However, the data payload is very low, and no security measure is introduced because the watermark positions are not shuffled. In [26], Xiang et al. extend the histogram method to the DWT domain and scramble the watermark with a pseudo-noise (PN) sequence to improve security, but the capacity of this approach is still only 2 bps. Since transform-domain algorithms can improve robustness, most self-synchronization-based audio watermarking methods concentrate on the transform domain. In [5], Wu et al. adopt a QIM method to embed the watermark and achieve self-synchronization using the localization property of the DWT; although the DWT reduces the computation time significantly, the technique suffers from TSM and amplitude scaling attacks due to single-coefficient quantization. In [9], Megias et al. propose a recent self-synchronized audio watermarking algorithm that modifies FFT amplitudes; time-domain insertion of the synchronization code is combined with informative watermark embedding in the FFT domain, and the scheme is fast enough to be applied in real time. Apart from the traditional transform techniques, LWT and lifting-based methods have also proven effective for audio watermarking in recent years, as lifting further reduces the computation time of the wavelet transform. For example, in [21], an LWT-based audio watermarking scheme is suggested for fast implementation owing to the time-saving property of the lifting scheme; a binary watermark scrambled by a PN sequence is inserted into appropriate LWT coefficients in the low-frequency subband. This algorithm achieves relatively good imperceptibility and robustness to filtering; however, the robustness after resampling, requantization and MP3 compression is not very good, and the result after MP3 compression is hardly acceptable. In [11], Tao et al. propose a robust audio watermarking scheme in the LWT domain based on the statistical characteristics of subband coefficients; the watermarking technique is invariant and the implementation efficiency is improved by the adoption of LWT.


In [10], another new transform, the DFRST, is integrated with a chaos technique in an audio watermarking algorithm; security is enhanced using the chaotic sequence, and the DFRST properties are explored for audio watermarking and security. However, no synchronization code technique is employed, and the imperceptibility of this scheme is not very good and needs further improvement. In recent state-of-the-art publications on audio watermarking, SVD-related audio watermarking is a very active topic and has been widely developed and studied owing to the advantages of SVD over other transforms. For instance, Abd El-Samie [14] and Al-Nuaimy et al. [15] suggest an efficient SVD-based audio watermarking scheme in the transform domain and use a chaotic sequence to shuffle the binary watermark to increase confidentiality. Al-Nuaimy et al. [15] further apply the proposed SVD audio watermarking in Bluetooth-based systems and automatic speaker identification systems. However, from the reported results the robustness needs further improvement, and the scheme is not robust to TSM and amplitude scaling attacks because it uses no synchronization technique. In [19], Lei et al. propose a very robust SVD–DCT audio watermarking method, which the authors report to be better than the selected SVD-based methods in terms of robustness and imperceptibility. Wang et al. [18] propose a reduced SVD (RSVD) and distortion-removal audio watermarking scheme; rather than modifying the SVs as in the popular SVD watermarking methods, this method exploits the SV distortion of the RSVD and modifies the U matrix to embed the watermark bits, while audio fidelity is preserved by threshold-based distortion control. However, in this method synchronization is not provided and no security measure is incorporated either.

3. Wavelet lifting

The lifting scheme was proposed to reduce computation time and memory requirements, as it adopts an in-place implementation of the wavelet transform. The lifting wavelet also simplifies the computation by working directly in the integer domain. In addition, the lifting wavelet saves time and has the frequency-localization feature, which overcomes the weaknesses of the traditional wavelet. The main principle of the lifting wavelet is to construct a new wavelet with better characteristics from a simple wavelet, which is the basic idea of lifting. As the basis of the integer wavelet transform, the lifting wavelet algorithm generally comprises three steps: split/merge, prediction and update. The detailed reasoning and proof of the lifting scheme are given in Ref. [20]:

(1) Split step: the split step is also called the lazy wavelet transform. This operation simply splits the input signal x(n) into even and odd samples, X_e(n) and X_o(n):

\[ X_e(n) = x(2n), \qquad X_o(n) = x(2n+1) \tag{1} \]

(2) Prediction step: keep the even samples unchanged and use X_e(n) to predict X_o(n); the two subsets produced by the split are closely correlated. The difference between the prediction P[X_e(n)] and the actual value X_o(n) is defined as the detail signal d(n):

\[ d(n) = X_o(n) - P[X_e(n)] \tag{2} \]

where P[·] is the prediction operator. The detail signal d(n) represents the high-frequency component of the original signal x(n), so the prediction step can be viewed as a high-pass filter.

(3) Update step: introduce the update operator U[·] and use the detail signal d(n) to update the even samples X_e(n). The resulting approximation signal c(n) represents the low-frequency component of the original signal, so this operation can be viewed as a low-pass filter:

\[ c(n) = X_e(n) + U[d(n)] \tag{3} \]
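For illustration, the following minimal Python sketch implements one forward and one inverse lifting step in the spirit of Eqs. (1)–(3). The simple Haar-style predictor and updater are an assumption made for the example (the paper does not fix a particular lifting filter at this point), and a one-dimensional NumPy signal of even length is assumed.

```python
import numpy as np

def lwt_step(x):
    """One Haar-style lifting step: split into even/odd, predict, update (cf. Eqs. (1)-(3))."""
    xe, xo = x[0::2].astype(float), x[1::2].astype(float)  # split (lazy wavelet)
    d = xo - xe                 # predict: detail (high-frequency) signal, Eq. (2)
    c = xe + 0.5 * d            # update: approximation (low-frequency) signal, Eq. (3)
    return c, d

def ilwt_step(c, d):
    """Inverse lifting step: undo the update, undo the prediction, then merge."""
    xe = c - 0.5 * d
    xo = d + xe
    x = np.empty(xe.size + xo.size)
    x[0::2], x[1::2] = xe, xo
    return x

# Round-trip check on a short even-length signal.
sig = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])
assert np.allclose(ilwt_step(*lwt_step(sig)), sig)
```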

The reconstruction of the lifting wavelet transform is the inverse of the decomposition. The lifting scheme for decomposition and reconstruction is illustrated in Fig. 1.

Fig. 1. Decomposition and reconstruction of lifting wavelet.

4. SVD principles and properties

The traditional transform techniques such as FFT, DCT and DWT just decompose a signal in terms of a standard basis set, which is not an optimal representation in some sense.

Owing to its unique features and attractive properties, such as stability under small disturbances, SVD [27] has been used in many signal processing applications. As an orthogonal transform and a numerical technique for diagonalizing matrices, SVD provides a representation in the transformed domain whose basis states are optimal in some sense. SVD is now widely used in watermarking as a well-known numerical analysis tool, in the sense that a slight modification of the large SVs does not affect the transparency of the cover object. An audio signal can also be arranged as a matrix, so it can take advantage of the SVD properties to trade off robustness and transparency. The SVD of a matrix A of size m × n is usually defined as

\[
A = U S V^{T} =
\begin{bmatrix} u_{1,1} & \cdots & u_{1,r} \\ \vdots & \ddots & \vdots \\ u_{m,1} & \cdots & u_{m,r} \end{bmatrix}
\begin{bmatrix} s_{1,1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & s_{r,r} \end{bmatrix}
\begin{bmatrix} v_{1,1} & \cdots & v_{1,r} \\ \vdots & \ddots & \vdots \\ v_{n,1} & \cdots & v_{n,r} \end{bmatrix}^{T}
= \sum_{i=1}^{m} \sum_{j=1}^{n} \sum_{k=1}^{r} u_{i,k}\, s_{k,k}\, v_{k,j}
\tag{4}
\]

where U is an m × r matrix, V is an n × r matrix and S is an r × r diagonal matrix with positive elements; the superscript T denotes matrix transposition, and r is the rank of the matrix A. In SVD-based watermarking, a frame is treated as a matrix and decomposed into three matrices by the SVD transformation. The diagonal elements of S are called the SVs of A; they are nonnegative and assumed to be arranged in decreasing order, that is, s_{1,1} ≥ s_{2,2} ≥ ... ≥ s_{r,r}. The attractive properties of the SVs [27,28] are as follows.

Stability: let A, B ∈ R^{m×n} with corresponding SVs s_1, s_2, ..., s_n and r_1, r_2, ..., r_n, respectively; then |s_i − r_i| ≤ ||A − B||_2 for i = 1, 2, ..., n. This indicates that the SVs are very stable: when a matrix is slightly disturbed, the variation of its SVs is no greater than the 2-norm of the disturbance matrix.

Proportionality: the SVs of kA are |k| times the SVs of A.

Transpose: A and its transpose A^T have the same non-zero SVs.

Flipping: A and its flipped versions about the vertical or horizontal axis have the same non-zero SVs.

Rotation: A and its rotated versions, obtained by rotating A by an arbitrary angle, have the same non-zero SVs.

Scaling: if A ∈ R^{m×n}, then its scaled version A_s has SVs equal to sqrt(L_r L_c) times the SVs of A, where L_r and L_c are the scaling factors of the rows and columns, respectively.

The abovementioned SVD properties are very suitable for developing robust watermarking approaches, in that the watermarked signal is not easily corrupted by attacks such as rotation, noise addition and scaling, and the embedded watermark can be extracted effectively owing to these unique properties.
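As a small illustration of the stability and proportionality properties listed above, the following NumPy check (with an arbitrary random matrix and perturbation chosen only for the example) verifies that the SV variation is bounded by the 2-norm of the disturbance and that scaling the matrix scales its SVs.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
B = A + 1e-3 * rng.standard_normal((8, 8))      # slightly disturbed matrix

sv_a = np.linalg.svd(A, compute_uv=False)
sv_b = np.linalg.svd(B, compute_uv=False)

# Stability: the SV variation is bounded by the 2-norm of the disturbance.
assert np.max(np.abs(sv_a - sv_b)) <= np.linalg.norm(A - B, 2)

# Proportionality: the SVs of k*A are |k| times the SVs of A.
assert np.allclose(np.linalg.svd(-3.0 * A, compute_uv=False), 3.0 * sv_a)
```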

5. Embedding method

5.1. Watermark preprocessing

The watermark is first preprocessed to improve robustness and enhance confidentiality: the binary image is chaotically scrambled before embedding to increase the safety of the watermarking technique. The binary watermark image is scrambled by a chaotic map, which is reproduced as a permuted matrix. This paper uses a skew tent map to enhance the confidentiality of the watermarking method. The skew tent map is defined as

\[
x(n+1) = \begin{cases} \dfrac{1}{a}\, x(n), & 0 \le x(n) < a \\[6pt] \dfrac{1}{a-1}\, x(n) + \dfrac{1}{1-a}, & a \le x(n) \le 1 \end{cases}
\tag{5}
\]

where a ∈ (0,1) is the system parameter and the initial value is adopted as key K1. The binary image logo or signature b(n) is then scrambled by x(n) with the following rule:

\[ w(n) = b(n) \oplus x(n), \quad 1 \le n \le N_w \tag{6} \]

where N_w is the length of the watermark and ⊕ is the exclusive-or (XOR) operator. After this random chaotic encryption, the watermark is permuted and cannot be guessed by random search.
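The sketch below illustrates the watermark preprocessing of Eqs. (5)–(6) in Python. The key x0 and parameter a are placeholder values, and the chaotic sequence is binarized by thresholding at 0.5 before the XOR, which is one possible reading of Eq. (6); the paper itself only specifies the XOR with the chaotic sequence.

```python
import numpy as np

def skew_tent_sequence(x0, a, n):
    """n iterates of the skew tent map of Eq. (5); x0 plays the role of key K1."""
    out = np.empty(n)
    for i in range(n):
        x0 = x0 / a if x0 < a else (x0 - 1.0) / (a - 1.0)   # equivalent form of Eq. (5)
        out[i] = x0
    return out

def scramble(bits, x0=0.3571, a=0.65):
    """XOR the binary watermark with a binarized chaotic sequence (placeholder key values).
    Applying the same function again with the same key descrambles the bits."""
    chaos = (skew_tent_sequence(x0, a, bits.size) > 0.5).astype(np.uint8)
    return bits ^ chaos
```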

5.2. Synchronization code

Desynchronization attacks dislocate the watermark regions, and a synchronization code is an effective way to locate the position of the hidden informative bits after such attacks. In addition, localized synchronization codes reduce false alarms caused by data modification during watermark embedding. In the proposed method, we use a pseudo-random sequence generated by a chaotic signal as the synchronization code in order to increase its security: chaotic systems exhibit deterministic behavior that is highly sensitive to the initial conditions, and the generated signals are uncorrelated and appear random, which makes the construction cryptographically strong. The synchronization code is generated by thresholding the Bernoulli shift map, one of the simplest deterministic chaotic maps:

\[
x(k+1) = \begin{cases} 2x(k), & 0 \le x(k) < \tfrac{1}{2} \\ 2x(k) - 1, & \tfrac{1}{2} \le x(k) \le 1 \end{cases}
\tag{7}
\]

where the initial condition x(0) ∈ (0,1) is adopted as secret key K2 and must be specified. The iterates x(k) are mapped into the synchronization sequence C = {c(k), 1 ≤ k ≤ L_syn} with the rule

\[ c(k) = \begin{cases} 1, & \text{if } x(k) > t \\ 0, & \text{otherwise} \end{cases} \tag{8} \]

Here t is a predefined threshold for the synchronization code. Time-domain embedding has the advantage that it is less computationally intensive and incurs a low cost when searching for the synchronization code, so the synchronization code is hidden in the time domain to reduce the computation. Before embedding, the synchronization code is arranged into a binary data sequence. The part of the signal used for synchronization-code insertion is cut into L_syn audio segments, each containing P samples:

\[ SA(k) = A(kP + u), \quad 1 \le k \le L_{syn},\; 1 \le u \le P \tag{9} \]

Then each bit of the synchronization code is embedded into each SA(k) as follows:

\[ SA'(k) = \begin{cases} \operatorname{round}\!\big(SA(k)/D\big)\cdot D, & \text{if } Syn(k)=0 \\[4pt] \operatorname{floor}\!\big(SA(k)/D\big)\cdot D + D/2, & \text{if } Syn(k)=1 \end{cases} \tag{10} \]

where D denotes the embedding strength, round(·) rounds to the nearest integer and floor(·) rounds towards minus infinity. After embedding, the (possibly attacked) signal SA''(k) is again split into L_syn segments, and the synchronization code is extracted by the following rule:

\[ Syn'(k) = \begin{cases} 1, & \text{if } D/4 \le \operatorname{mod}\big(SA''(k), D\big) < 3D/4 \\[4pt] 0, & \text{otherwise} \end{cases} \tag{11} \]

where mod(·) denotes the modulus after division.
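A minimal sketch of the synchronization-code generation (Eqs. (7)–(8)) and of the time-domain QIM embedding and detection (Eqs. (10)–(11)) might look as follows; the embedding strength D and threshold t are illustrative values, not those used in the paper.

```python
import numpy as np

def bernoulli_sync_code(x0, length, t=0.5):
    """Synchronization bits from the Bernoulli shift map, Eqs. (7)-(8); x0 is key K2."""
    code = np.empty(length, dtype=np.uint8)
    for k in range(length):
        x0 = 2.0 * x0 if x0 < 0.5 else 2.0 * x0 - 1.0
        code[k] = 1 if x0 > t else 0
    return code

def embed_sync(sa, sync, D=0.1):
    """QIM embedding of one synchronization bit per value SA(k), following Eq. (10)."""
    sa = np.asarray(sa, dtype=float)
    bit0 = np.round(sa / D) * D              # Syn(k) = 0: quantize onto the lattice
    bit1 = np.floor(sa / D) * D + D / 2.0    # Syn(k) = 1: offset by half a step
    return np.where(sync == 0, bit0, bit1)

def extract_sync(sa_attacked, D=0.1):
    """Detection rule of Eq. (11): residues near D/2 decode as 1, residues near 0 as 0."""
    r = np.mod(np.asarray(sa_attacked, dtype=float), D)
    return ((r >= D / 4.0) & (r < 3.0 * D / 4.0)).astype(np.uint8)
```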

5.3. Watermark embedding

As LWT has good space-localization characteristics and more than 90% of the signal energy is concentrated in the low-frequency components, using the LWT in an audio watermarking scheme results in very good anti-interference and anti-compression abilities. Therefore, we directly embed the watermark into the low-frequency components of the LWT. Fig. 2 presents the diagram of our watermark embedding algorithm.

Fig. 2. Overview of watermark embedding process.

In our watermarking technique, we choose the popular QIM method in the embedding process because of its good robustness and blind nature [23]. As a result, our method is blind and does not need the original audio for data extraction. The second part of the host audio, SB, is used to embed the watermark. Specifically, the embedding process is described by the following steps.

Step 1: perform LWT on the audio segment SB of the host audio signal:

\[ I = \operatorname{LWT}(SB) \tag{12} \]

Step 2: divide the approximation coefficients obtained after LWT decomposition into non-overlapping blocks. The length of the audio blocks depends on the amount of data to be embedded and on the number of LWT decomposition levels. The watermark sequence is embedded successively into the blocks of the low-frequency subband.

Step 3: scramble the watermark image with the method described in Section 5.1.

Step 4: for each block, perform the SVD transform to obtain the SVs and the first SV, S(1,1):

\[ I = U S V^{T} \tag{13} \]

Step 5: embed the watermark into the SVs with the QIM method. The encrypted watermark bit w(i) is embedded into the first SV, S(1,1), of each block using the popular odd/even parity rule. Let Q = round(S(1,1)/b) and D = mod(Q, 2), where b is the quantization step. A small value of b gives good imperceptibility but low robustness to attacks, so b is chosen as a tradeoff between inaudibility and robustness of the watermark.


The embedding rule is: if D is 0 and w(i) is 1, then Q = Q + 1; if D is 1 and w(i) is 0, then Q = Q + 1.

Step 6: the first SV is then modified with the updated Q:

\[ S_w(1,1) = b \cdot \operatorname{round}(Q) \tag{14} \]

Step 7: S_w(1,1) is used to build the watermarked block I_w by applying the inverse SVD:

\[ I_w = U S_w V^{T} \tag{15} \]

Step 8: the inverse LWT is performed to reconstruct the watermarked signal:

\[ SB_w = \operatorname{LWT}^{-1}(I_w) \tag{16} \]
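As an illustration of Steps 4–7, the following sketch embeds one bit into the largest singular value of a coefficient block. It assumes the block of LWT approximation coefficients has already been reshaped into a small matrix, and b = 0.5 is a placeholder quantization step.

```python
import numpy as np

def embed_bit(block, bit, b=0.5):
    """Embed one watermark bit into the largest SV of a coefficient block (Steps 4-7)."""
    U, S, Vt = np.linalg.svd(block, full_matrices=False)
    Q = int(round(S[0] / b))
    D = Q % 2
    if (D == 0 and bit == 1) or (D == 1 and bit == 0):
        Q += 1                                  # odd/even parity rule of Step 5
    S_w = S.copy()
    S_w[0] = b * Q                              # Step 6: quantized first singular value
    return U @ np.diag(S_w) @ Vt                # Step 7: inverse SVD
```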

6. Watermark extraction

The main steps of watermark extraction are as follows.

Step 1: perform LWT on the watermarked signal:

\[ I_e = \operatorname{LWT}(SB_w) \tag{17} \]

Step 2: divide the obtained LWT approximation coefficients into blocks, as in the embedding procedure.

Step 3: perform SVD on each block:

\[ I_e = U_e S_e V_e^{T} \tag{18} \]

Step 4: let Q_e = round(S_e(1,1)/b) and D_e = mod(Q_e, 2); the extraction rule is

\[ w'(n) = \begin{cases} 1, & D_e = 1 \\ 0, & D_e = 0 \end{cases} \tag{19} \]

Step 5: perform decryption with the same chaotic sequence to obtain the hidden binary image or signature:

\[ b'(n) = w'(n) \oplus x(n) \tag{20} \]
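A matching extraction sketch, again assuming the same block shaping and quantization step as in the embedding example, recovers the bit from the parity of the first SV and also computes the bit error rate used in the performance analysis below.

```python
import numpy as np

def extract_bit(block, b=0.5):
    """Recover one bit from the parity of the first SV of a (possibly attacked) block."""
    S = np.linalg.svd(block, compute_uv=False)
    return int(round(S[0] / b)) % 2             # Eq. (19): odd -> 1, even -> 0

def bit_error_rate(original_bits, extracted_bits):
    """Fraction of mismatching bits between original and extracted watermarks."""
    o = np.asarray(original_bits, dtype=np.uint8)
    e = np.asarray(extracted_bits, dtype=np.uint8)
    return float(np.mean(o ^ e))
```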

7. Performance analysis

7.1. Error analysis

There are two types of errors in searching for synchronization codes: false positive errors and false negative errors. A false positive error occurs when the watermark decoder falsely classifies an unwatermarked signal as watermarked. The probability of a false positive error is usually denoted as

\[ P_{fp} = P\big(P(w, w') \ge th_1 \mid \text{no watermark}\big) \tag{21} \]

where th_1 is an application-dependent threshold and the probability is measured under the assumption that no watermark is embedded. As the extracted bits are random and independent values, P_{fp} can be written as

\[ P_{fp} = \sum_{k=th_1}^{N_w} \binom{N_w}{k} (P_e)^{k} (1-P_e)^{N_w-k} \tag{22} \]

where P_e = P(w = w' | no watermark). In our scheme the watermarked and unwatermarked bits are either 0 or 1, so P_e = 0.5, that is,

\[ P_{fp} = \frac{1}{2^{N_w}} \sum_{k=th_1}^{N_w} \binom{N_w}{k} \tag{23} \]

In our scheme N_w = 1024; if th_1 = 0.75 N_w, then P_{fp} = 2.883 × 10^{-60}, which means that a false positive error is practically negligible. On the other hand, a false negative error occurs when an existing watermark is not detected. The probability of a false negative error can be calculated as

\[ P_{fn} = \sum_{k=th_1}^{N_w} \binom{N_w}{k} (\mathrm{BER})^{k} (1-\mathrm{BER})^{N_w-k} \tag{24} \]

where the bit error rate is defined as

\[ \mathrm{BER} = \frac{1}{N_w} \sum_{n=1}^{N_w} w(n) \oplus w'(n) \tag{25} \]

with w(n) and w'(n) the original and extracted watermarks. In our scheme, with th_1 = 0.75 N_w, P_{fn} is almost equal to 0.
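The false-positive probability of Eq. (23) can be evaluated exactly with integer arithmetic, which avoids the underflow of naive floating-point products; the short check below reproduces the order of magnitude quoted above.

```python
from math import comb

# Exact evaluation of Eq. (23): P_fp = 2^{-Nw} * sum_{k=th1}^{Nw} C(Nw, k).
Nw = 1024
th1 = int(0.75 * Nw)
p_fp = sum(comb(Nw, k) for k in range(th1, Nw + 1)) / 2 ** Nw
print(p_fp)    # on the order of 1e-60, consistent with the value quoted in the text
```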

7.2. Security analysis

For a secure audio watermarking scheme, robustness against attacks is an important issue, and the key space should be large enough to make a brute-force attack infeasible; secret keys are therefore adopted for security purposes. A digital computer commonly stores a floating-point number using 32 bits, consisting of an 8-bit exponent and a 24-bit significand. In our scheme the keys (initial conditions) are floating-point numbers; with the exponent fixed and the significand varied, the total number of possible initial conditions is 2^24, which is more than 16 million, and this number can be greatly increased by using double-precision floating-point numbers. The security of an information system should depend on its keys rather than on the secrecy of the scheme. In our proposed audio watermarking scheme, the keys K1 and K2 are used to generate the chaotic sequences, so the size of the key space influences the security of the scheme. Since both K1 and K2 are used, we take K1 as an example and compute its key space as follows. Suppose K1 = {0 < K1(i) < 1 | i = 1, 2, ..., N1}, where N1 is an integer large enough to produce the chaotic sequences Y = {y(i, j) | i = 1, 2, ..., N1, j = 1, 2, ..., N2}; here N1 denotes the number of chaotic sequences generated by the skew tent map and N2 the length of each sequence. For the perturbed keys K1' = {0 < K1(i) + d < 1 | i = 1, 2, ..., N1}, we generate another group of chaotic sequences Y' = {y'(i, j) | i = 1, 2, ..., N1, j = 1, 2, ..., N2}.

The function f = S(d) is used to test the key space of K1:

\[ f = S(d) = \frac{\sum_{i=1}^{N_1} \sum_{j=1}^{N_2} \big| y(i,j) - y'(i,j) \big|}{N_1 \times N_2} \tag{26} \]

Fig. 3. Key space under varying differences in initial values.

Fig. 3 plots the function f = S(d). It can be seen that f is equal to 0 when d_0 = 10^{-17} in our method, so the key space of K1 is 1/d_0 = 10^{17}. The key space of K2 can be computed in the same way, as can also be seen from Fig. 3. Therefore, the total key space of our watermarking scheme is 10^{34}, which is large enough to guarantee high confidentiality of the proposed watermarking system. Furthermore, the employed synchronization codes help prevent the watermark from being detected by intelligent attackers, and the proposed audio watermarking uses different combinations of the secret keys. Based on this security analysis, it can be concluded that the embedded watermarks are secure against attackers who try to detect and read them exhaustively or statistically. All in all, our proposed scheme with such a long key is adequate for reliable and practical use.
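A sketch of the key-space test of Eq. (26), reusing the skew_tent_sequence helper from the Section 5.1 example; the number of sequences, their length and the map parameter are illustrative choices, not the values used in the paper.

```python
import numpy as np

def key_space_test(d, n_seq=100, seq_len=1000, a=0.65):
    """Mean absolute difference f = S(d) of Eq. (26) between sequences generated from
    keys K1(i) and K1(i) + d (illustrative parameters; reuses skew_tent_sequence)."""
    keys = np.random.default_rng(1).uniform(0.05, 0.95, n_seq)
    total = 0.0
    for k in keys:
        y = skew_tent_sequence(k, a, seq_len)
        y_shifted = skew_tent_sequence(k + d, a, seq_len)
        total += np.abs(y - y_shifted).mean()
    return total / n_seq
```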

7.3. Data payload

The data embedding payload (also known as the capacity) of a watermarking scheme is defined as the number of bits that can be embedded and recovered from the audio stream, and is often measured in bps. Several payload measures have been suggested; here, the retrieval payload relative to the duration of the marked signal is used, denoted as

\[ \mathrm{Payload} = \frac{N_w}{\mathrm{Time}} \tag{27} \]

where Time is the duration of the host audio. For our scheme, N_w = 1024 bits are embedded in a 6-s host audio, so the payload of our method is 170.67 bps. This is a relatively high payload, as typical payloads are 20–50 bps.

8. Experimental results

In this section, several experiments are conducted to demonstrate the performance of the proposed LWT–SVD based audio watermarking approach. The performance of our scheme is assessed in terms of robustness and imperceptibility. The proposed scheme has been tested on a wide range of material, such as the sound quality assessment material (SQAM) [29] clips, full songs (classical, pop and rock music) and human voice signals. The test audio signals are sampled at 44.1 kHz with 16 bits/sample. A 32 × 32 binary image logo is used for the performance evaluation, and the LWT decomposition level is set to 3. In our experiments, SNR and segmental SNR (SegSNR) are used to evaluate the quality of the watermarked audio signals, and BER is used to evaluate the reliability of the extracted watermarks. SNR and SegSNR are defined as

\[ \mathrm{SNR} = 10 \log_{10}\!\left( \frac{\sum_{i=1}^{L} S^{2}(i)}{\sum_{i=1}^{L} \big(S'(i) - S(i)\big)^{2}} \right) \tag{28} \]

\[ \mathrm{SegSNR} = \frac{10}{K} \sum_{m=0}^{K-1} \log_{10} \frac{\sum_{i=1}^{P_r} S^{2}(i)}{\sum_{i=1}^{P_r} \big(S'(i) - S(i)\big)^{2}} \tag{29} \]

where S(i) and S'(i) correspond to the original and the watermarked signals, respectively, K is the number of segments and P_r is the segment length.
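The two quality measures of Eqs. (28)–(29) can be computed as follows; the segment length used for SegSNR is an assumption, as the paper does not state the value it uses.

```python
import numpy as np

def snr_db(original, watermarked):
    """Global SNR of Eq. (28), in dB."""
    s = np.asarray(original, dtype=float)
    noise = np.asarray(watermarked, dtype=float) - s
    return 10.0 * np.log10(np.sum(s ** 2) / np.sum(noise ** 2))

def seg_snr_db(original, watermarked, seg_len=1024):
    """Segmental SNR of Eq. (29): the log power ratio averaged over fixed-length segments."""
    s = np.asarray(original, dtype=float)
    e = np.asarray(watermarked, dtype=float) - s
    n_seg = len(s) // seg_len
    ratios = [np.sum(s[m * seg_len:(m + 1) * seg_len] ** 2) /
              np.sum(e[m * seg_len:(m + 1) * seg_len] ** 2) for m in range(n_seg)]
    return 10.0 * np.mean(np.log10(ratios))
```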

8.1. Imperceptibility test

For the imperceptibility test, the time-domain waveforms and the frequency-domain spectra are presented for performance evaluation. The time-domain waveforms show the difference between the original and watermarked waveforms, while the spectra illustrate the differences in the frequency domain. Fig. 4 presents the waveforms of the original, watermarked and residual signals, and Fig. 5 shows the spectra of the original and watermarked signals. The average SNR and SegSNR results of all the test signals versus different quantization steps b are shown in Fig. 6.

Fig. 4. Waveform of the original, watermarked and residual signals.

Fig. 5. Spectra of the original and watermarked signals.

From the waveforms and spectra, it can be observed that there is little perceptible difference between the original and the watermarked audio, which is also confirmed by the SNR and SegSNR results in Fig. 6. The SNR and SegSNR results are all above 20 dB even when the quantization step is 1, so these results more than satisfy the IFPI requirement [1].

8.2. Subjective listening test

SNR is a simple way to quantify imperceptibility by measuring the signal distortion caused by watermarking. However, human perception may not correlate well with the SNR measure. Consequently, a subjective quality evaluation of the watermarking method must be conducted to provide a better test of inaudibility based on human perception.

Fig. 6. SNR and SegSNR results versus quantization steps.

Fig. 7. MOS scores of the 15 test sequences.

In our experiment, we perform an informal subjective listening test to evaluate the perceptual quality of the watermarked audio. Ten listeners are asked to rate the difference between the original and the watermarked audio on a 5-point Mean Opinion Score (MOS) impairment scale: 5, imperceptible; 4, perceptible but not annoying; 3, slightly annoying; 2, annoying; 1, very annoying. Fig. 7 plots the MOS scores of the 15 test sequences. The average MOS for the tested audio excerpts is 4.82 for our algorithm, which means that the watermarked audio and the original audio are perceptually indistinguishable.

8.3. Robustness to common signal processing attacks

The evaluation of a watermarking scheme should comprise both blind attacks (such as MP3 compression and noise addition) and intentional attacks. Non-intentional (blind) attackers do not know whether or where a watermark exists, while intentional attackers try to determine where and what has been embedded and attempt to remove it. Blind attacks (common signal processing) and desynchronization attacks are used to estimate the robustness of our scheme.

Table 1. Comparison of algorithms based on robustness to common signal processing attacks: BER of the watermarks extracted by Huang et al. [4], Wu et al. [5], Wang and Zhao [6] and the proposed scheme after no attack, requantization, resampling (22.05, 11.025 and 8 kHz), additive noise, echo addition, equalization, pitch shifting, low-pass filtering (4 kHz), MP3 compression (256, 128, 96 and 64 kbps), cropping (10%, front and middle), amplitude scaling (110% and 90%), adding (10%, front), TSM (+1%, −1%, −2%) and jittering (1/100,000 and 1/50,000).

In our experiment, the parameters of these common signal processing manipulations are given as follows (a minimal code sketch of a few of these manipulations is shown after the list):


Re-quantization: the 16-bit watermarked audio signal is re-quantized to 8 bits and back to 16 bits.
Resampling: watermarked audio signals with an original sampling rate of 44.1 kHz are down-sampled to 22.05 kHz, 11.025 kHz and 8 kHz, and up-sampled back to 44.1 kHz.
Additive noise: white Gaussian noise with 1% of the power of the audio signal is added.
Low-pass filtering: a second-order Butterworth filter with a cut-off frequency of 4 kHz is applied to the watermarked audio signals.
Echo addition: an echo signal with a delay of 10 ms and a decay of 10% is added.
Equalization: the "Hum Removal" preset of the audio editing tool (CoolEdit Pro 2.1) is used, a 6-band graphic equalizer; the 50, 100, 150, 200, 250 and 300 Hz frequency bands are boosted by 18 dB.
MP3 compression: robustness against a low-rate codec is tested using MPEG-1 Layer III compression (MP3) at 64, 96, 128 and 256 kbps.
Cropping: 10% of the samples of each test signal are cropped at the front and middle positions.
Adding: samples amounting to 10% of each test signal are added to the front of the host signal.
Amplitude variation: the watermarked signal is scaled up to 110% and down to 90%.
Pitch shifting: tempo-preserved pitch shifting is a difficult attack for audio watermarking algorithms as it causes frequency fluctuation; in our experiment, the pitch is shifted 11 higher and 11 lower.
TSM: time-scale modification is applied to the watermarked audio signal to change the time scale by ±1% and −2% while preserving the pitch.
Jittering: jittering is a small rapid variation; one sample out of every 100,000 and out of every 50,000 samples is removed in our jittering experiment.
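A hedged sketch of three of the listed manipulations (requantization, resampling and additive noise) is given below. The sampling rates and noise power follow the description above, while the 8-bit rounding detail and the use of scipy.signal.resample are assumptions made for the example.

```python
import numpy as np
from scipy.signal import resample

def requantize_8bit(x):
    """Requantization: 16-bit -> 8-bit -> 16-bit (signal assumed normalized to [-1, 1])."""
    return np.round(x * 127.0) / 127.0

def resample_attack(x, orig_sr=44100, target_sr=22050):
    """Down-sample to target_sr and up-sample back to the original rate."""
    down = resample(x, int(len(x) * target_sr / orig_sr))
    return resample(down, len(x))

def add_noise(x, power_ratio=0.01):
    """Additive white Gaussian noise with 1% of the signal power."""
    noise_power = power_ratio * np.mean(x ** 2)
    return x + np.sqrt(noise_power) * np.random.default_rng(0).standard_normal(len(x))
```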

The above-mentioned attacks are used to evaluate the robustness of watermarking algorithms, and the BER of the watermarks extracted after the attacks is the most direct indication of robustness. Table 1 summarizes the robustness results of our proposed scheme; the results of the DCT-domain method in [4], the DWT-domain method in [5] and the DWT–DCT-domain method in [6] are also given in Table 1 for comparison. From Table 1, it is obvious that the robustness of our method is much better than that of the algorithms in [4,5], and slightly better than that of the algorithm in [6]. The main reason why our method outperforms the selected methods [4–6] is the advantageous properties of LWT–SVD over a single DWT or DCT transform and over the hybrid DWT–DCT transform: both LWT and SVD have very attractive features for robust watermarking, and their combination further enhances robustness.

Table 2
Comparison of algorithms based on robustness to Stirmark attacks.

Attack                   Cox et al. [3]   Özer et al. [13]   Ours
Addbrumm                 1.25             0                  0
AddDynNoise              1.56             0                  0
AddFFTNoise              51.25            0                  0
Addnoise                 0.78             0                  0
Addsinus                 0.77             0                  0
Amplify                  52.32            0.75               0
Bassboost                0                0                  0
Compressor               0                0                  0
Copysample               100              0.5                0.2
Cutsamples               100              0                  0
Echo                     23.43            0                  0
Exchange                 0                0                  0
Extrastereo              0                0                  0
Fft_hlpass               0.31             0                  0
Fft_invert               52.6             0                  0
Fft_real_reverse         0.78             0                  0
Fft_stat1                19.84            0.5                0.13
Fft_test                 19.80            0.4                0.026
Flipsample               21.66            0.75               0.029
Invert                   52.42            0                  0
Lsbzero                  0                0                  0
Normalize                0                0                  0
Nothing                  0                0                  0
Rc_highpass              2.03             0                  0
Rc_lowpass               0                0                  0
Smooth                   0                0                  0
Stat1                    0                0                  0
Stat2                    0                0                  0
Voiceremove              52.1             0                  0
Zerocross                0                0                  0
Zerolength               60.5             0                  0
Zeroremove               100              0                  0
Average of all attacks   22.2937          0.0906             0.012

Table 3. Summary of algorithm comparison: for each of the selected schemes ([2,4–10,13–19,21,24–26]) and the proposed LWT–SVD scheme, the table lists the embedding method and domain, whether synchronization is used, SNR (dB), payload (bps), whether a subjective test is reported, blindness, the audio content tested, the secret keys used, and whether Stirmark attack tests, error analysis and security analysis are provided.



8.4. Robustness to Stirmark attacks

The robustness of our scheme is also benchmarked against the Stirmark attacks; Stirmark is a very popular benchmark tool for audio watermarking. The parameters of these standardized attacks are the default values set in the software configuration. The detailed benchmark results are summarized in Table 2, where we also compare our method with the related watermarking methods in [3,13]. From the comparison results, our method is slightly better than the watermarking scheme in [13] but much better than the watermarking scheme in [3] under the Stirmark attacks.

8.5. Algorithm comparisons and discussions

In this section, the proposed scheme is compared with other state-of-the-art and related audio watermarking schemes in the literature. The selected schemes are typical audio watermarking schemes with self-synchronization or without synchronization, with and without the SVD transform, and other conventional audio watermarking schemes. The detailed comparisons in terms of robustness, imperceptibility, payload and other features are listed in Table 3. Note that if detailed subjective results (MOS scores) are not provided in the references, we simply report "yes" in Table 3 for the subjective test category. If an algorithm involves convolution, DCT or Fourier transforms, it is computationally intensive and its execution time is high. Some insights gleaned from Table 3 are summarized as follows: (1) most of the recent publications focus on only one or two features, such as synchronization, efficiency, robustness or imperceptibility; (2) very few of them provide a security analysis; (3) a higher embedding rate often leads to unsatisfactory robustness for the same amount of attacking disturbance, and a suitable embedding rate lies between 20 bps and 50 bps. From the comparison results in Table 3, we can see that our proposed LWT–SVD algorithm obtains a relatively high payload and good transparency. It achieves moderately high SNR results: the SNR of our scheme is not as high as in [5,6,25,26], but is higher than in most of the selected schemes. The payload of our method is 170.67 bps, which is lower than that in [5,6] but relatively high compared with the rest of the selected schemes, and it is well above the 20 bps required by the IFPI. Apart from security, the running speed of the algorithm is also an important aspect of a good watermarking scheme. Error and security analyses are provided, as well as robustness results against the detailed Stirmark attacks, and the proposed scheme is blind in nature. All in all, the proposed algorithm not only achieves a satisfactory compromise between robustness, payload, imperceptibility and time complexity, but also more than meets the IFPI requirements.

9. Conclusions

In this paper, a robust and blind audio watermarking scheme is proposed that makes good use of the features of SVD, LWT, a synchronization code technique and QIM. The proposed method is essentially an SV-modification algorithm operating in the low-frequency subband. The robustness of our scheme is validated against common signal processing and Stirmark attacks. The performance analyses and comparison results indicate that the proposed watermarking scheme maintains good audio quality and high robustness against various attacks, including MP3 compression, low-pass filtering, amplitude scaling, time scaling, cropping, jittering, sampling rate change, bit resolution transformation, additive noise, echo addition and equalization.

References

[1] S. Katzenbeisser, F.A.P. Petitcolas, Information Hiding Techniques for Steganography and Digital Watermarking, Artech House, Norwood, MA, USA, 2000.
[2] P. Bassia, I. Pitas, N. Nikolaidis, Robust audio watermarking in the time domain, IEEE Transactions on Multimedia 3 (2) (2001) 232–241.
[3] I.J. Cox, J. Kilian, F.T. Leighton, T. Shamoon, Secure spread spectrum watermarking for multimedia, IEEE Transactions on Image Processing 6 (12) (1997) 1673–1687.
[4] J.W. Huang, Y. Wang, Y.Q. Shi, A blind audio watermarking algorithm with self-synchronization, in: Proceedings of the IEEE International Symposium on Circuits and Systems, 2002, pp. 627–630.
[5] S. Wu, J. Huang, D. Huang, Y.Q. Shi, Efficiently self-synchronized audio watermarking for assured audio data transmission, IEEE Transactions on Broadcasting 51 (1) (2005) 69–76.
[6] X.-Y. Wang, H. Zhao, A novel synchronization invariant audio watermarking scheme based on DWT and DCT, IEEE Transactions on Signal Processing 54 (12) (2006) 4835–4840.
[7] W. Li, X. Xue, P. Lu, Localized audio watermarking technique robust against time-scale modification, IEEE Transactions on Multimedia 8 (1) (2006) 60–69.
[8] N. Cvejic, T. Seppanen, Spread spectrum audio watermarking using frequency hopping and attack characterization, Signal Processing 84 (1) (2004) 207–213.
[9] D. Megias, J. Serra-Ruiz, M. Fallahpour, Efficient self-synchronised blind audio watermarking system based on time domain and FFT amplitude modification, Signal Processing 90 (12) (2010) 3078–3092.
[10] M. Fan, H. Wang, Chaos-based discrete fractional sine transform domain audio watermarking scheme, Computers and Electrical Engineering 35 (3) (2009) 506–516.
[11] Z. Tao, H.-M. Zhao, J. Wu, J.-H. Gu, Y.-S. Xu, D. Wu, A lifting wavelet domain audio watermarking algorithm based on the statistical characteristics of sub-band coefficients, Archives of Acoustics 35 (4) (2010) 481–491.
[12] D. Kundur, D. Hatzinakos, Digital watermarking using multiresolution wavelet decomposition, in: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 1998, pp. 2969–2972.
[13] H. Özer, B. Sankur, N. Memon, An SVD-based audio watermarking technique, in: Proceedings of the Seventh Workshop on Multimedia and Security, ACM, New York, NY, USA, 2005, pp. 51–56.
[14] F.E. Abd El-Samie, An efficient singular value decomposition algorithm for digital audio watermarking, International Journal of Speech Technology 12 (1) (2009) 27–45.
[15] W. Al-Nuaimy, M.A.M. El-Bendary, A. Shafik, F. Shawki, A.E. Abou El-azm, N.A. El-Fishawy, S.M. Elhalafawy, S.M. Diab, B.M. Sallam, F.E. Abd El-Samie, H.B. Kazemian, An SVD audio watermarking approach using chaotic encrypted images, Digital Signal Processing, doi:10.1016/j.dsp.2011.01.013.


[16] V. Bhat K., I. Sengupta, A. Das, An adaptive audio watermarking based on the singular value decomposition in the wavelet domain, Digital Signal Processing 20 (6) (2010) 1547–1558.
[17] A. Al-Haj, A. Mohammad, Digital audio watermarking based on the discrete wavelets transform and singular value decomposition, European Journal of Scientific Research 39 (1) (2010) 6–21.
[18] J. Wang, R. Healy, J. Timoney, A robust audio watermarking scheme based on reduced singular value decomposition and distortion removal, Signal Processing 91 (8) (2011) 1693–1708.
[19] B.Y. Lei, I.Y. Soon, Z. Li, Blind and robust audio watermarking scheme based on SVD–DCT, Signal Processing 91 (8) (2011) 1973–1984.
[20] W. Sweldens, The lifting scheme: a custom-design construction of biorthogonal wavelets, Applied and Computational Harmonic Analysis 3 (2) (1996) 186–200.
[21] E. Ercelebi, L. Batakci, Audio watermarking scheme based on embedding strategy in low frequency components with a binary image, Digital Signal Processing 19 (2) (2009) 265–277.
[22] R. Liu, T. Tan, An SVD-based watermarking scheme for protecting rightful ownership, IEEE Transactions on Multimedia 4 (1) (2002) 121–128.


[23] B. Chen, G.W. Wornell, Quantization index modulation: a class of provably good methods for digital watermarking and information embedding, IEEE Transactions on Information Theory 47 (2001) 1423–1443.
[24] W.N. Lie, L.C. Chang, Robust and high-quality time-domain audio watermarking based on low-frequency amplitude modification, IEEE Transactions on Multimedia 8 (1) (2006) 46–59.
[25] S. Xiang, J. Huang, Histogram-based audio watermarking against time-scale modification and cropping attacks, IEEE Transactions on Multimedia 9 (7) (2007) 1357–1372.
[26] S. Xiang, H.J. Kim, J. Huang, Audio watermarking robust against time-scale modification and MP3 compression, Signal Processing 88 (10) (2008) 2372–2387.
[27] H.C. Andrews, C.L. Patterson, Singular value decomposition and digital image processing, IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-24 (1) (1976) 26–53.
[28] E. Biglieri, K. Yao, Some properties of singular value decomposition and their applications to digital signal processing, Signal Processing 18 (3) (1989) 277–289.
[29] EBU, SQAM—Sound Quality Assessment Material, http://sound.media.mit.edu/mpeg4/audio/sqam/ (last checked 30.05.11).