Signal Processing 92 (2012) 1985–2001
Signal Processing journal homepage: www.elsevier.com/locate/sigpro
A robust audio watermarking scheme based on lifting wavelet transform and singular value decomposition
Baiying Lei a,*, Ing Yann Soon a, Feng Zhou b, Zhen Li a, Haijun Lei c
a School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore
b The George W. Woodruff School of Mechanical Engineering, The Georgia Institute of Technology, Atlanta, GA, USA
c College of Computer Science and Technology, Shenzhen University, Shenzhen, China
Article info
Article history: Received 18 June 2011; received in revised form 18 October 2011; accepted 21 December 2011; available online 8 January 2012
Keywords: Audio watermarking; Lifting wavelet transform; Robust watermarking; Singular value decomposition; Quantization index modulation
Abstract
In this paper, a new and robust audio watermarking scheme based on lifting wavelet transform (LWT) and singular value decomposition (SVD) is proposed. Specifically, the watermark data is inserted in the coefficients of the LWT low frequency subband, taking advantage of both SVD and quantization index modulation (QIM). The use of QIM renders the scheme blind. Furthermore, a synchronization code technique is integrated into the hybrid LWT–SVD audio watermarking method. Experimental and analysis results demonstrate that the proposed LWT–SVD method is not only robust against both general signal processing attacks and desynchronization attacks but also achieves a very good tradeoff between robustness, imperceptibility and payload. Comparisons with typical and related audio watermarking algorithms also show that the proposed method outperforms most of the selected algorithms. © 2012 Published by Elsevier B.V.
* Corresponding author. Tel.: +65 6790 6548; fax: +65 6792 0415.
0165-1684/$ - see front matter © 2012 Published by Elsevier B.V. doi:10.1016/j.sigpro.2011.12.021
1. Introduction
Audio watermarking is an active research topic and one of the most popular approaches to copyright protection, and a large number of state-of-the-art publications address it. According to the International Federation of the Phonographic Industry (IFPI) [1], an effective audio watermarking scheme should have the following properties: (1) Imperceptibility: the quality of the audio signal should not degrade after the watermark is added. Imperceptibility can be evaluated with both objective and subjective measures, and the signal-to-noise ratio (SNR) should be more than 20 dB. (2) Robustness: the watermark should remain extractable from the watermarked audio signal after various signal processing attacks. The scheme should prevent unauthorized detection and removal unless the audio quality becomes very poor. (3) Payload: the amount of data that can be embedded into the host audio signal without losing imperceptibility should usually be more than 20 bits per second (bps). (4) Security: the watermarked signal should reveal no clues about the watermark it carries, and security should depend on secret keys rather than on the secrecy of the watermarking algorithm. Robustness, imperceptibility and payload are the three main requirements, and they conflict with one another, so tradeoffs are needed when designing a new watermarking scheme. Robust audio watermarking methods can be broadly classified into time domain and transform domain methods. Time domain methods [2] are efficient and easy to implement, while transform domain methods have the advantage of high robustness. The widely used transform domains for audio watermarking are discrete cosine transform (DCT) [3,4], discrete wavelet transform (DWT) [5,6] and fast Fourier transform (FFT) [7–9]. Moreover,
B. Lei et al. / Signal Processing 92 (2012) 1985–2001
some other transforms such as the discrete fractional sine transform (DFRST) [10], LWT [11,12] and SVD [13–19] are becoming increasingly popular in the audio watermarking field. The conventional wavelet transform performs very well because of its multiresolution property and perfect reconstruction [5,16]. However, the classic wavelet transform is mainly computed by convolution, which results in a high computational load, and the floating-point coefficients it generates increase storage requirements. A new wavelet construction was therefore developed to increase efficiency. First proposed by Sweldens [20], LWT is the second generation wavelet, built on the traditional wavelet. The lifting scheme has several properties that distinguish it from the traditional wavelet: (1) LWT allows an in-place implementation of the fast wavelet transform, a feature similar to the FFT, and the construction of wavelets without using the Fourier transform. Hence LWT can be calculated more efficiently and needs less memory. (2) It is particularly easy to build nonlinear wavelet transforms, and LWT has time–frequency localization capability. (3) LWT coefficients are integers and, unlike those of the traditional wavelet transform, do not have quantization errors. Consequently, LWT is widely used in the audio watermarking field [11,12,21]. From the perspective of linear algebra, SVD is extensively applied in robust watermarking to withstand attacks. As a factorization of a real matrix, SVD was first applied widely in image watermarking [22] for ownership protection and was quickly extended to audio watermarking [13–19]. Moreover, QIM [23] is a very popular method for watermark embedding and data hiding. Combining LWT with QIM and SVD can reduce the operation time and achieve very robust results.
In this paper, an efficient and robust watermarking algorithm for copyright protection is proposed. It is based on LWT, SVD and QIM, with a synchronization code technique to withstand desynchronization attacks. LWT, SVD and QIM are combined because together they achieve superior performance. The host audio signal is decomposed by LWT, and a meaningful binary image is used as the watermark data and scrambled by a chaotic signal. The scrambled watermark is embedded in the low frequency subband of the host audio by modifying the singular values (SVs). The paper is organized as follows. Related work is reviewed in Section 2. Section 3 presents the principle of LWT, and Section 4 introduces SVD. Section 5 discusses the embedding method, and watermark extraction is described in Section 6. Section 7 presents the performance analysis. The experimental results and algorithm comparison are discussed in Section 8. Finally, Section 9 concludes the paper.
2. Related work
Many audio watermarking techniques have been proposed in the literature in recent years. For instance, Bassia
and Pitas propose an audio watermarking algorithm that uses the spread spectrum method in the time domain [2]. In a wide sense, audio watermarking schemes can be divided into systems with and without synchronization. Synchronization and self-synchronization techniques can resist cropping, shifting, time scale modification (TSM), pitch scale modification (PSM) and jittering attacks. For example, Lie and Chang [24] propose a time domain audio watermarking scheme based on the human auditory system that exploits the relation of the average of absolute amplitude differences. However, this group amplitude quantization method obtains a relatively low detection ratio under different attacks. Histogram based audio watermarking algorithms against TSM and cropping attacks are introduced in [25,26]. In [25], Xiang and Huang propose a histogram based scheme in the time domain that withstands desynchronization attacks because the histogram and the mean are invariant to TSM, PSM and jittering attacks. The multibit watermark is inserted by controlling the histogram intensity. However, the data payload is very low, and the method has no security measure because the watermark position is not shuffled. In [26], Xiang et al. extend the histogram method to the DWT domain and scramble the watermark with a pseudo noise (PN) sequence to improve security. However, the capacity of this approach is still very low, at only 2 bps. Because transform domain algorithms can improve robustness, most self-synchronization based audio watermarking methods concentrate on the transform domain. In [5], Wu et al. adopt a QIM method to embed the watermark and achieve self-synchronization using the DWT localization property. Although the DWT technique reduces the computation time significantly, it is vulnerable to TSM and amplitude scaling attacks because it quantizes a single coefficient.
Furthermore, in [9], Megias et al. propose a self-synchronization audio watermarking algorithm that modifies FFT amplitudes. Time domain insertion of the synchronization code is combined with FFT domain embedding of the informative watermark. This scheme is very fast and can be applied in real time. Apart from the traditional transform techniques, LWT and lifting based methods have also shown good performance in audio watermarking in recent years, as lifting can further reduce the computation time of the wavelet transform. For example, in [21], an LWT based audio watermarking method is suggested for fast implementation owing to the time saving property of the lifting scheme. The binary watermark, scrambled by a PN sequence, is inserted into selected low frequency LWT coefficients. This algorithm achieves relatively good imperceptibility and robustness to filtering. However, its robustness to resampling, requantization and MP3 compression is not very good, and the overall result after MP3 compression is hardly acceptable. In [11], Tao et al. propose a robust audio watermarking scheme in the LWT domain based on the statistical characteristics of subband coefficients. In this scheme, the watermarking technique is invariant and the implementation efficiency is improved by the adoption of LWT.
In [10], another transform, DFRST, is integrated with a chaos technique in an audio watermarking algorithm. Security is enhanced using the chaotic sequence, and DFRST properties are explored for audio watermarking and security. However, no synchronization code technique is employed, and the imperceptibility of this scheme needs further improvement. In recent state-of-the-art publications, SVD related audio watermarking has been widely developed and extensively studied because of the advantages of SVD over other transforms. For instance, Abd El-Samie [14] and Al-Nuaimy et al. [15] suggest an efficient SVD-based audio watermarking method in the transform domain and use a chaotic sequence to shuffle the binary watermark to increase confidentiality. Furthermore, Al-Nuaimy et al. [15] extend the SVD audio watermarking and apply it in Bluetooth based systems and automatic speaker identification systems. However, according to the reported results, the robustness needs further improvement, and the method is not robust to TSM and amplitude scaling attacks because there is no synchronization technique. In [19], Lei et al. propose a very robust SVD–DCT audio watermarking method, which the authors report to be better than the selected SVD based methods in terms of robustness and imperceptibility. Wang et al. [18] propose a reduced SVD (RSVD) and distortion removal audio watermarking scheme. Rather than modifying the SVs, as popular SVD watermarking methods do, this method takes advantage of the SV distortion through RSVD: the U matrix is modified to embed the watermark bits, and audio fidelity is preserved by a threshold-based distortion control. However, synchronization is not provided and no security measure is incorporated.

3. Wavelet lifting
The lifting scheme is proposed to reduce computation time and memory requirements, as it adopts an in-place implementation of the wavelet transform. Lifting wavelet simplifies the problem by analyzing it directly in the integer domain. In addition, the lifting wavelet saves time and has the frequency localization feature, which overcomes the weakness of the traditional wavelet. The main principle of the lifting wavelet, and the basic idea of lifting, is to construct a new wavelet with better characteristics from a simple wavelet. As the basis of the integer wavelet transform, the lifting wavelet algorithm generally comprises three steps: split/merge, prediction and update. The detailed reasoning and proof of the lifting scheme are given in Ref. [20]:
(1) Split step: the split step is also called the lazy wavelet transform. The operation simply splits the input signal $x(n)$ into even and odd samples, $X_e(n)$ and $X_o(n)$:
$$X_e(n) = x(2n), \qquad X_o(n) = x(2n+1) \tag{1}$$
(2) Prediction step: keep the even samples unchanged and use $X_e(n)$ to predict $X_o(n)$. The two signal subsets from the split step should be closely correlated. The difference between the real value $X_o(n)$ and the predicted value $P[X_e(n)]$ is defined as the detail signal $d(n)$:
$$d(n) = X_o(n) - P[X_e(n)] \tag{2}$$
where $P[\cdot]$ is the predict operator. The detail signal $d(n)$ denotes the high-frequency component of the original signal $x(n)$, so the prediction step can be viewed as a highpass filter.
(3) Update step: introduce the update operator $U[\cdot]$ and use the detail signal $d(n)$ to update the even samples $X_e(n)$. The approximate signal $c(n)$ then denotes the low-frequency component of the original signal, so this operation can be viewed as a lowpass filter:
$$c(n) = X_e(n) + U[d(n)] \tag{3}$$
The reconstruction of the lifting wavelet transform is the inverse process of the decomposition. The lifting scheme of decomposition and reconstruction is illustrated in Fig. 1.
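The split, prediction and update steps can be sketched in a few lines of Python. This is a minimal illustration, assuming the simplest integer (Haar-like) choice of operators, $P[X_e](n) = X_e(n)$ and $U[d](n) = \lfloor d(n)/2 \rfloor$. The text does not state which predict and update operators are used, and the function names are hypothetical.

```python
def lwt_forward(x):
    """One lifting level on a list of integers: returns (c, d)."""
    if len(x) % 2 != 0:
        raise ValueError("input length must be even")
    xe, xo = x[0::2], x[1::2]                    # split step, Eq. (1)
    # Prediction step, Eq. (2); assumed predictor P[Xe](n) = Xe(n).
    d = [o - e for e, o in zip(xe, xo)]
    # Update step, Eq. (3); assumed updater U[d](n) = floor(d(n) / 2).
    c = [e + dn // 2 for e, dn in zip(xe, d)]
    return c, d


def lwt_inverse(c, d):
    """Undo lwt_forward by running the steps backwards with flipped signs."""
    xe = [cn - dn // 2 for cn, dn in zip(c, d)]  # undo update
    xo = [dn + e for e, dn in zip(xe, d)]        # undo prediction
    x = [0] * (2 * len(c))
    x[0::2], x[1::2] = xe, xo                    # merge
    return x


def lwt_multilevel(x, levels):
    """Re-apply lwt_forward to the approximation; returns (c, [d1, d2, ...])."""
    c, details = list(x), []
    for _ in range(levels):
        c, d = lwt_forward(c)
        details.append(d)
    return c, details
```

For the samples `[5, 7, 2, -4, 10, 10, 0, 3]`, `lwt_forward` returns `c = [6, -1, 10, 1]` and `d = [2, -6, 0, 3]`, and `lwt_inverse(c, d)` restores the input exactly. All intermediate values stay integers, which is the property the lifting scheme is valued for here.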
Fig. 1. Decomposition and reconstruction of lifting wavelet. (Block diagram: decomposition splits X(n) into Xe(n) and Xo(n) and applies the operators P and U to obtain d(n) and c(n); reconstruction applies U and P in reverse and merges the result into X(n).)
4. SVD principles and properties
The traditional transform techniques such as FFT, DCT and DWT just decompose a signal in terms of a standard
basis set, which is not necessarily an optimal representation. Owing to attractive properties such as stability under small disturbances, SVD [27] has been used in many signal processing applications. SVD is an orthogonal transform and a numerical technique for diagonalizing a matrix; the transformed domain comprises basis states that are optimal in some sense. SVD is widely used in the watermarking field as a well-known numerical analysis tool because a slight modification of the large SVs does not affect the transparency of the cover object. An audio signal can also be arranged as a matrix, so it can take advantage of the SVD properties for the tradeoff between robustness and transparency. The SVD of a matrix $A$ of size $m \times n$ is usually defined as
$$A = U S V^{T} = \begin{bmatrix} u_{1,1} & \cdots & u_{1,r} \\ \vdots & \ddots & \vdots \\ u_{m,1} & \cdots & u_{m,r} \end{bmatrix} \begin{bmatrix} \sigma_{1,1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_{r,r} \end{bmatrix} \begin{bmatrix} v_{1,1} & \cdots & v_{1,r} \\ \vdots & \ddots & \vdots \\ v_{n,1} & \cdots & v_{n,r} \end{bmatrix}^{T} = \sum_{i=1}^{m} \sum_{j=1}^{n} \sum_{k=1}^{r} u_{i,k}\, \sigma_{k,k}\, v_{k,j} \tag{4}$$
where $U$ is an $m \times r$ matrix, $V^{T}$ is an $r \times n$ matrix and $S$ is an $r \times r$ diagonal matrix with positive elements, the superscript $T$ denotes matrix transposition, and $r$ is the rank of $A$. In SVD-based watermarking, a frame is treated as a matrix and decomposed into three matrices by the SVD. The diagonal elements of $S$ are called the SVs of $A$; they are nonnegative and assumed to be arranged in decreasing order, that is, $\sigma_{1,1} \ge \sigma_{2,2} \ge \cdots \ge \sigma_{r,r}$. The attractive properties of the SVs [27,28] are as follows:
Stability: let $A, B \in R^{m \times n}$ have SVs $\sigma_1, \sigma_2, \ldots, \sigma_n$ and $\rho_1, \rho_2, \ldots, \rho_n$, respectively. Then $|\sigma_i - \rho_i| \le \|A - B\|_2$, $i = 1, 2, \ldots, n$. This indicates that the SVs are very stable: when a matrix is slightly disturbed, the variation of each SV is not greater than the 2-norm of the disturbance matrix.
Proportionality: the SVs of $kA$ are $|k|$ times the SVs of $A$.
Transpose: $A$ and its transpose $A^{T}$ have the same nonzero SVs.
Flipping: $A$ and its flipped versions about the vertical or horizontal axis have the same nonzero SVs.
Rotation: $A$ and its rotated versions, obtained by rotating $A$ by an arbitrary angle, have the same nonzero SVs.
Scaling: if $A \in R^{m \times n}$, then its scaled version $A_s$ has SVs equal to $\sqrt{L_r L_c}$ times the SVs of $A$, where $L_r$ and $L_c$ are the scaling factors of the rows and columns, respectively.
These SVD properties are well suited to developing robust watermarking approaches, in that the watermarked signal is not corrupted by attacks such as rotation, noise addition and scaling. Furthermore, the embedded watermark is expected to be extracted effectively owing to these properties.
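As a quick sanity check of the stability and proportionality properties, consider a small worked example. The matrices below are chosen here purely for illustration and do not come from the paper; diagonal matrices are used so that the SVs can be read off directly.

```latex
% Illustrative matrices (assumed, not from the paper).
A = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}, \qquad
B = \begin{bmatrix} 3.1 & 0 \\ 0 & 0.8 \end{bmatrix}, \qquad
A - B = \begin{bmatrix} -0.1 & 0 \\ 0 & 0.2 \end{bmatrix}

% SVs of A: \sigma_1 = 3, \sigma_2 = 1.  SVs of B: \rho_1 = 3.1, \rho_2 = 0.8.
% The 2-norm of A - B is its largest SV: \|A - B\|_2 = 0.2.

% Stability:
|\sigma_1 - \rho_1| = 0.1 \le 0.2, \qquad |\sigma_2 - \rho_2| = 0.2 \le 0.2

% Proportionality with k = -2: the SVs of -2A are 6 and 2, i.e. |k| times 3 and 1.
```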
5. Embedding method
5.1. Watermark preprocessing
The watermark should first be preprocessed to improve robustness and enhance confidentiality. The binary image used as watermark data is chaotically scrambled before embedding, which yields a permuted matrix and increases the security of the watermarking technique. This paper uses a skew tent map, defined as follows:
$$x(n+1) = \begin{cases} \dfrac{1}{a}\, x(n), & 0 \le x(n) < a \\[2mm] \dfrac{1}{a-1}\, x(n) + \dfrac{1}{1-a}, & a \le x(n) \le 1 \end{cases} \tag{5}$$
where $a \in (0,1)$ is the system parameter. The initial value is adopted as key K1. Then the binary image logo or signature $b(n)$ is scrambled by $x(n)$ with the following rule:
$$w(n) = b(n) \oplus x(n), \qquad 1 \le n \le N_w \tag{6}$$
where $N_w$ is the length of the watermark and $\oplus$ is the exclusive or (XOR) operator. After this chaotic sequence encryption, the watermark is permuted and cannot be guessed by random search.
5.2. Synchronization code
Desynchronization attacks dislocate the watermark regions. A synchronization code is an effective way to locate the position of the hidden informative bits after such attacks. Besides, such localized synchronization codes eliminate false alarm errors due to data modification on watermark embedding. In the proposed method, a pseudo random sequence generated by a chaotic signal is used as the synchronization code to increase its security. Chaotic systems have deterministic behavior that is very sensitive to initial conditions, and chaotic signals are uncorrelated and appear random. This strongly chaotic behavior is intended to make the system cryptographically secure. The synchronization code is generated by thresholding the Bernoulli shift map, one of the simplest deterministic chaotic maps, which nevertheless exhibits many chaotic characteristics. The binary Bernoulli shift map is defined as
$$x(k+1) = \begin{cases} 2x(k), & 0 \le x(k) < \tfrac{1}{2} \\ 2x(k) - 1, & \tfrac{1}{2} \le x(k) \le 1 \end{cases} \tag{7}$$
where $x(0) \in (0,1)$, the initial condition of the map, is adopted as secret key K2 and must be specified. $x(k)$ is mapped into the synchronization sequence $C = \{c(k), 1 \le k \le L_{syn}\}$ with the following rule:
$$c(k) = \begin{cases} 1, & x(k) > t \\ 0, & \text{otherwise} \end{cases} \tag{8}$$
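The two chaotic generators of Eqs. (5)–(8) can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not say how the real-valued $x(n)$ is turned into a bit before the XOR in Eq. (6), so the sketch thresholds it at 0.5; the parameter values and function names are hypothetical.

```python
def skew_tent_sequence(x0, a, n):
    """Return n iterates of the skew tent map, Eq. (5), starting from key x0."""
    if not (0.0 < a < 1.0 and 0.0 < x0 < 1.0):
        raise ValueError("a and x0 must lie in (0, 1)")
    seq, x = [], x0
    for _ in range(n):
        # (1 - x) / (1 - a) is the second branch of Eq. (5) written compactly.
        x = x / a if x < a else (1.0 - x) / (1.0 - a)
        seq.append(x)
    return seq


def scramble(bits, key1, a=0.6, level=0.5):
    """XOR the watermark bits with a binarized chaotic sequence, Eq. (6).

    Assumption: x(n) is binarized by comparing it with `level`.
    Calling scramble twice with the same key restores the input.
    """
    chaos = skew_tent_sequence(key1, a, len(bits))
    mask = [1 if x > level else 0 for x in chaos]
    return [b ^ m for b, m in zip(bits, mask)]


def bernoulli_sync_code(key2, length, t=0.5):
    """Threshold the Bernoulli shift map, Eqs. (7) and (8), into a binary code.

    Note: in binary floating point each doubling discards one bit of x, so the
    plain map collapses to 0 after roughly 50 iterations; this sketch is only
    meant for short codes.
    """
    code, x = [], key2
    for _ in range(length):
        x = 2.0 * x if x < 0.5 else 2.0 * x - 1.0
        code.append(1 if x > t else 0)
    return code
```

Because XOR is its own inverse, `scramble(scramble(bits, k), k)` returns `bits`, which is the same operation used later to decrypt the extracted watermark.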
Here $t$ is a predefined threshold for the synchronization code. Time domain embedding has the advantage that it is less computation intensive and incurs a low cost in finding the synchronization code. Thus the synchronization code is hidden in the time domain to reduce the computation time. Before embedding, the synchronization code is arranged into a binary data sequence. The synchronization code insertion part is cut into $L_{syn}$ audio segments, each with $P$ samples, denoted as
$$SA(k) = A(k \cdot P + u), \qquad 1 \le k \le L_{syn}, \quad 1 \le u \le P \tag{9}$$
Then each bit of the synchronization code is embedded into each $SA(k)$ as follows:
$$SA'(k) = \begin{cases} \operatorname{round}\!\left(\dfrac{SA(k)}{\Delta}\right) \cdot \Delta, & \text{if } Syn(k) = 1 \\[2mm] \operatorname{floor}\!\left(\dfrac{SA(k)}{\Delta}\right) \cdot \Delta + \dfrac{\Delta}{2}, & \text{if } Syn(k) = 0 \end{cases} \tag{10}$$
where $\Delta$ denotes the embedding strength, round(·) means rounding to the nearest integer and floor(·) means rounding toward minus infinity. After embedding, the embedded and attacked signal $SA''(k)$ is also split into $L_{syn}$ segments, and the synchronization code is extracted by the following rule:
$$Syn'(k) = \begin{cases} 0, & \text{if } \dfrac{\Delta}{4} \le \operatorname{mod}(SA''(k), \Delta) < \dfrac{3\Delta}{4} \\[2mm] 1, & \text{otherwise} \end{cases} \tag{11}$$
where mod(·) denotes the modulus after division.

Fig. 2. Overview of watermark embedding process. (Block diagram with the blocks: Original Audio, Partition into Part 1 and Part 2, LWT, Segmentation, SVD, Watermark Embedding, Inverse SVD, Segment Reconstruction, Inverse LWT, Chaotic Encrypted Image, Synchronization Code Generation and Insertion, Secret Keys K1 and K2, Watermarked Audio.)

5.3. Watermark embedding
LWT has good space localization characteristics, and more than 90% of the signal energy is concentrated in the low frequency components, so using LWT in an audio watermarking scheme gives very good anti-interference and anti-compression abilities. Therefore, we embed the watermark directly into the low frequency components of the LWT. Fig. 2 presents the diagram of our watermark embedding algorithm. We choose the popular QIM method for the embedding process because of its good robustness and blind nature [23]. As a result, our method is blind and does not need the original audio for data extraction. The second part of the host audio, $SB$, is used to embed the watermark. Specifically, the embedding process consists of the following steps:
Step 1: perform LWT on the audio segment $SB$ of the host audio signal:
$$I = \mathrm{LWT}(SB) \tag{12}$$
Step 2: the approximate coefficients after the LWT decomposition are divided into nonoverlapping blocks. The block length depends on the amount of data to be embedded and the number of LWT decomposition levels. The watermark sequence is embedded successively into the blocks in the low-frequency subband.
Step 3: scramble the watermark image with the method described in Section 5.1.
Step 4: for each block, perform the SVD to obtain the SVs, in particular the first SV, $S(1,1)$:
$$I = U S V^{T} \tag{13}$$
Step 5: embed the watermark into the SVs with the QIM method. The encrypted watermark bit $w(i)$ is embedded in the first SV, $S(1,1)$, of each block. Our embedding method is based on the popular odd and even parity rule: let $Q = \operatorname{round}(S(1,1)/b)$ and $D = \operatorname{mod}(Q, 2)$, where $b$ is the quantization step. A small value of $b$ leads to good imperceptibility but low robustness to attacks. Thus we choose an optimal $b$ for tradeoff between inaudibility and robustness of the
watermark. The embedding rule is as follows: if $D$ is 0 and $w(i)$ is 1, then $Q = Q + 1$; if $D$ is 1 and $w(i)$ is 0, then $Q = Q + 1$.
Step 6: the first SVs are further modified by the updated $Q$ as follows:
$$S_w(1,1) = b \cdot \operatorname{round}(Q) \tag{14}$$
Step 7: $S_w(1,1)$ is used to build the watermarked block $I_w$ by applying the inverse SVD:
$$I_w = U S_w V^{T} \tag{15}$$
Step 8: the inverse LWT is applied to reconstruct the watermarked signal:
$$SB_w = \mathrm{LWT}^{-1}(I_w) \tag{16}$$

6. Watermark extraction
The main steps of watermark extraction are as follows:
Step 1: perform LWT on the watermarked signal:
$$I_e = \mathrm{LWT}(SB_w) \tag{17}$$
Step 2: as in embedding, divide the LWT approximate coefficients into blocks.
Step 3: perform the SVD on each block:
$$I_e = U_e S_e V_e^{T} \tag{18}$$
Step 4: let $Q_e = \operatorname{round}(S_e(1,1)/b)$ and $D_e = \operatorname{mod}(Q_e, 2)$; then the extraction rule is
$$w'(n) = \begin{cases} 1, & D_e = 1 \\ 0, & D_e = 0 \end{cases} \tag{19}$$
Step 5: decrypt with the same chaotic sequence to obtain the hidden binary image or signature:
$$b'(n) = w'(n) \oplus x(n) \tag{20}$$

7. Performance analysis
7.1. Error analysis
There are two types of errors in searching for synchronization codes: false positive errors and false negative errors. A false positive error occurs when the watermark decoder falsely classifies another signal as a watermark when there is no watermark. The probability of a false positive error is usually denoted as
$$P_{fp} = P\big(P(w, w') \ge th1 \mid \text{no watermark}\big) \tag{21}$$
where $th1$ is an application-dependent threshold and the probability is measured under the assumption that no watermark is embedded. As the extracted bits are random and independent values, $P_{fp}$ can be written as
$$P_{fp} = \sum_{k=th1}^{N_w} \binom{N_w}{k} (P_e)^{k} (1 - P_e)^{N_w - k} \tag{22}$$
where $P_e = P(w = w' \mid \text{no watermark})$. In our scheme, the watermarked and unwatermarked bits are either 0 or 1, so $P_e$ is 0.5, that is,
$$P_{fp} = \frac{1}{2^{N_w}} \sum_{k=th1}^{N_w} \binom{N_w}{k} \tag{23}$$
In our scheme, $N_w = 1024$; if $th1 = 0.75 N_w$, then $P_{fp} = 2.883 \times 10^{-60}$, which means the false positive error is almost zero and can rarely be noticed. On the other hand, a false negative error occurs when an existing watermark is not detected. The probability of a false negative error can be calculated as follows:
$$P_{fn} = \sum_{k=th1}^{N_w} \binom{N_w}{k} (BER)^{k} (1 - BER)^{N_w - k} \tag{24}$$
$$BER = \frac{1}{N_w} \sum_{n=1}^{N_w} w(n) \oplus w'(n) \tag{25}$$
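The odd/even parity rule of Sections 5.3 and 6 and the error measures of this section can be tried out on plain numbers. The sketch below is a minimal illustration, not the authors' implementation: it works on a single value `s` standing in for $S(1,1)$ of one block (the LWT and SVD stages are left out), it assumes round(·) rounds halves upward, and all function names are hypothetical.

```python
import math


def qim_embed(s, b, bit):
    """Embedding Steps 5-6: make the parity of Q = round(s / b) equal to the bit."""
    q = math.floor(s / b + 0.5)   # assumed rounding mode; s is a nonnegative SV
    if q % 2 != bit:              # D = 0 with w = 1, or D = 1 with w = 0
        q += 1
    return b * q                  # Eq. (14)


def qim_extract(s, b):
    """Extraction Step 4, Eq. (19): the bit is the parity of round(s / b)."""
    return math.floor(s / b + 0.5) % 2


def ber(w, w_extracted):
    """Bit error rate of Eq. (25)."""
    return sum(x ^ y for x, y in zip(w, w_extracted)) / len(w)


def false_positive(n_w, th1):
    """Eq. (23): binomial tail for P_e = 0.5, using exact integer sums."""
    tail = sum(math.comb(n_w, k) for k in range(th1, n_w + 1))
    return tail / 2 ** n_w
```

A short usage example: with `b = 0.5`, embedding the bits `[1, 0, 1, 1, 0]` into the values `[12.3, 7.9, 0.4, 5.0, 3.3]` gives `[12.5, 8.0, 0.5, 5.5, 4.0]`, and the bits are still recovered after each value is shifted by 0.2, because the shift stays below `b / 2`. Evaluating `false_positive(1024, 768)` gives roughly 2.88e-60, in line with the value quoted in the text.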
Here $w(n)$ and $w'(n)$ are the original and extracted watermarks. In our scheme, if $th1 = 0.75 N_w$, $P_{fn}$ is almost equal to 0.
7.2. Security analysis
For a secure audio watermarking scheme, robustness against attacks is an important issue. To enhance security, the key space should be large enough to make brute force attacks infeasible, and secret keys are adopted for this purpose. A digital computer often stores a floating-point number using 32 bits, consisting of an exponent represented with eight bits and a significand represented with 24 bits. In our scheme, the keys (initial conditions) are floating-point numbers. The exponent is fixed while the significand may vary, so the total number of possible initial conditions is $2^{24}$, which is more than 16 million. This number can be greatly increased, if desired, by using double precision floating-point numbers. The security of an information system depends on its keys rather than on the privacy of the scheme. In the proposed scheme, keys K1 and K2 are used to generate the chaotic sequences, so the size of the key space influences the security of the scheme. As K1 and K2 are both used in our scheme, we take K1 as an example and compute its key space as follows. Suppose $K_1 = \{0 < K_1(i) < 1 \mid i = 1, 2, \ldots, N_1\}$ is used to generate the skew tent map sequences, where $N_1$ is an integer that should be large enough to produce the chaotic sequences $Y = \{y(i,j) \mid i = 1, 2, \ldots, N_1,\ j = 1, 2, \ldots, N_2\}$; $N_1$ denotes the number of chaotic sequences and $N_2$ the length of each chaotic sequence. With $K_1' = \{0 < K_1(i) + d < 1 \mid i = 1, 2, \ldots, N_1\}$, generate another group of chaotic sequences $Y' = \{y'(i,j) \mid i = 1, 2, \ldots, N_1,\ j = 1, 2, \ldots, N_2\}$. The function $f = S(d)$ is used to test the key space of K1:
$$f = S(d) = \frac{\sum_{i=1}^{N_1} \sum_{j=1}^{N_2} |y(i,j) - y'(i,j)|}{N_1 N_2} \tag{26}$$
Fig. 3 plots the function $f = S(d)$. It can be seen that $f$ is equal to 0 when $d_0 = 10^{-17}$ in our method. Thus the key space of K1 is $1/d_0 = 10^{17}$. The key space of K2 can be computed in the same way, as can also be seen from Fig. 3. Therefore, the total key space of our watermarking scheme is $10^{34}$, which is large enough to guarantee high confidentiality of the proposed watermarking system. Furthermore, the synchronization codes help prevent the signal from being detected by intelligent attackers, and the proposed scheme has different combinations of the secret key. Based on this security analysis, it can be concluded that the embedded watermarks are secure against attackers who try to detect and read them exhaustively or statistically. All in all, the proposed scheme with such a long key is adequate for reliable and practical use.
Fig. 3. Key space under varying differences in initial values. (Plot of f(d), ranging from 0 to 0.035, for K1 and K2 against the initial value difference.)
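The key sensitivity measure $f = S(d)$ of Eq. (26) is straightforward to reproduce numerically. The following is a minimal sketch for the skew tent map only; the system parameter `a = 0.6`, the sequence length and the example keys are assumptions chosen for illustration, and the function names are hypothetical.

```python
def skew_tent_sequence(x0, a, n):
    """Return n iterates of the skew tent map, Eq. (5), starting from key x0."""
    seq, x = [], x0
    for _ in range(n):
        x = x / a if x < a else (1.0 - x) / (1.0 - a)
        seq.append(x)
    return seq


def key_sensitivity(keys, d, a=0.6, length=200):
    """Eq. (26): mean absolute difference between the sequences generated from
    each key K1(i) and from the shifted key K1(i) + d."""
    total = 0.0
    for k in keys:
        y = skew_tent_sequence(k, a, length)
        y_shifted = skew_tent_sequence(k + d, a, length)
        total += sum(abs(u - v) for u, v in zip(y, y_shifted))
    return total / (len(keys) * length)
```

With double precision keys such as `[0.23, 0.37, 0.61, 0.82]`, `key_sensitivity(keys, 1e-10)` is positive, while `key_sensitivity(keys, 1e-17)` is exactly 0: a shift of 1e-17 is smaller than the spacing between representable double precision numbers near these keys, so the shifted key equals the original one. This is at least consistent with $f$ reaching 0 at $d_0 = 10^{-17}$ as stated in the text.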
7.3. Data payload
The data embedding payload (also known as the capacity) of a watermarking scheme is defined as the number of bits that can be embedded and recovered in the audio stream, and is often measured in bps. Several measures have been suggested for payload. Here, the retrieval payload relative to the size of the marked signal is used, denoted as
$$Payload = \frac{N_w}{Time} \tag{27}$$
where $Time$ is the duration of the host audio. In our scheme, $N_w = 1024$ bits are embedded in a 6 s host audio, so the payload of our method is 170.67 bps. This is a relatively high payload, as a typical payload is 20–50 bps.

8. Experimental results
In this section, several experiments are conducted to demonstrate the performance of the proposed LWT–SVD based audio watermarking approach. The performance of our scheme is assessed in terms of robustness and imperceptibility. The proposed scheme has been tested on a large number of scenarios, such as the sound quality assessment material (SQAM) [29] clips, full songs (classical, pop and rock music) and human voice signals. The test audio signals are sampled at 44.1 kHz with 16 bits/sample. A 32 × 32 binary image logo is used as the watermark for performance evaluation. The LWT decomposition level is set at 3. SNR and segmental SNR (SegSNR) are used to evaluate the quality of the watermarked audio signals, and BER is used to evaluate the reliability of the extracted watermarks. SNR and SegSNR are defined as follows:
$$SNR = 10 \log_{10} \left( \frac{\sum_{i=1}^{L} S^{2}(i)}{\sum_{i=1}^{L} (S'(i) - S(i))^{2}} \right) \tag{28}$$
$$SegSNR = \frac{10}{K} \sum_{m=0}^{K-1} \log_{10} \frac{\sum_{i=1}^{r} S^{2}(i)}{\sum_{i=1}^{r} (S'(i) - S(i))^{2}} \tag{29}$$
where $S(i)$ and $S'(i)$ correspond to the original and the watermarked signals, respectively.
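The payload of Eq. (27) and the quality measures of Eqs. (28) and (29) can be computed as sketched below. This is a minimal illustration with hypothetical function names. It assumes that the inner sums of Eq. (29) run over the samples of frame $m$, which the printed formula does not show explicitly, and that frames with zero signal or zero noise energy are skipped.

```python
import math


def snr_db(original, marked):
    """Eq. (28): ratio of signal energy to embedding noise energy, in dB.
    Raises ZeroDivisionError if the two signals are identical."""
    signal = sum(s * s for s in original)
    noise = sum((m - s) ** 2 for s, m in zip(original, marked))
    return 10.0 * math.log10(signal / noise)


def segsnr_db(original, marked, frame_len):
    """Eq. (29): average of the per-frame SNR values over nonoverlapping frames."""
    values = []
    for start in range(0, len(original) - frame_len + 1, frame_len):
        o = original[start:start + frame_len]
        m = marked[start:start + frame_len]
        signal = sum(s * s for s in o)
        noise = sum((y - s) ** 2 for s, y in zip(o, m))
        if signal > 0 and noise > 0:   # assumption: skip frames where the log is undefined
            values.append(10.0 * math.log10(signal / noise))
    return sum(values) / len(values)


def payload_bps(n_bits, seconds):
    """Eq. (27): embedded bits per second of host audio."""
    return n_bits / seconds
```

For example, `payload_bps(1024, 6)` evaluates to about 170.67, matching the payload stated above, and a unit-amplitude signal with a constant error of 0.1 per sample has an SNR of 20 dB.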
8.1. Imperceptibility test
For the imperceptibility test, the time domain waveforms and the frequency domain spectra are presented. The waveforms show the difference between the original and watermarked signals in the time domain, while the spectra illustrate the differences in the frequency domain. Fig. 4 presents the waveforms of the original, watermarked and residual signals. Fig. 5 shows the spectra of the original and watermarked signals. The average SNR and SegSNR results of all the test signals versus the quantization step, b, are shown in Fig. 6. From the waveforms and spectra, it can be observed that there is little distinguishable difference between the original and the watermarked audio, which is also verified by the SNR and SegSNR results in Fig. 6. The SNR and SegSNR results are all above 20 dB even when the quantization step is 1, which more than satisfies the IFPI requirements [1].
Fig. 4. Waveform of the original, watermarked and residual signals. (Three panels: original audio signal, watermarked audio signal, and waveform of the residual audio signal.)
Fig. 5. Spectra of the original and watermarked signals. (Two panels: the spectrogram of the host audio signal and the spectrogram of the watermarked audio signal; axes: Time and Frequency.)
8.2. Subjective listening test
As we know, SNR is a simple way to quantify imperceptibility by measuring the signal distortion caused by watermarking. However, human perception may not correlate well with the SNR measure. Consequently, subjective quality evaluation of the watermarking methods must be conducted to provide a better test of inaudibility based on human perception. In our experiment, we perform an informal subjective listening test to evaluate the perceptual quality of the watermarked audio. Ten listeners are asked to grade the difference between the original and the watermarked audio on a 5-point Mean Opinion Score (MOS) impairment scale defined as 5: imperceptible, 4: perceptible but not annoying, 3: slightly annoying, 2: annoying, 1: very annoying. In this way, the inaudibility of our watermarking scheme has been verified through listening tests. Fig. 7 plots the MOS scores of the 15 test sequences. The average MOS for the tested audio excerpts is 4.82 for our algorithm, which means that the watermarked audio and the original audio are perceptually indistinguishable.
Fig. 6. SNR, SegSNR results versus quantization steps. (Axes: quantization step, dB.)
Fig. 7. MOS scores of the 15 test sequences. (Axes: test sequence, MOS score.)
8.3. Robustness to common signal processing attacks
The evaluation of a watermarking scheme should consist of blind attacks (such as MP3 compression, noise addition, etc.) and intentional attacks. The blind (non-intentional) attacks do not know whether the watermark exists or where it is, while the intentional attacks seek to learn where and what has been embedded and try to remove it. The blind attacks (common signal processing) and desynchronization attacks are used to estimate the
robustness of our scheme. In our experiment, the parameters of these common signal processing manipulations are given as follows:
Requantization: the 16-bit watermarked audio signal is requantized to 8 bits and back to 16 bits.
Resampling: watermarked audio signals with an original sampling rate of 44.1 kHz are subsampled down to 22.05 kHz, 11.025 kHz and 8 kHz, and upsampled back to 44.1 kHz.
Additive noise: white Gaussian noise with 1% of the power of the audio signal is added.
Lowpass filtering: lowpass filtering using a second-order Butterworth filter with a cutoff frequency of 4 kHz is performed on the watermarked audio signals.
Echo addition: an echo signal with a delay of 10 ms and a decay of 10% is added to the original audio signal.
Equalization: the "Hum Removal" preset of the audio editing tool (CoolEdit Pro 2.1) is used, which is a 6-band graphic equalizer. The 50, 100, 150, 200, 250 and 300 Hz frequency bands are boosted by 18 dB.
MP3 compression: the robustness against the low-rate codec is tested using MPEG 1 Layer III compression (MP3) with compression rates of 64, 96, 128 and 256 kbps.
Cropping: 10% of the samples of each testing signal are cropped at the front and middle positions.
Adding: 10% of the samples of each testing signal are added to the front of the host signal.
Amplitude variation: the watermarked signal is scaled up to 110% and down to 90% of its amplitude.
Pitch shifting: tempo-preserved pitch shifting is a difficult attack for audio watermarking algorithms as it causes frequency fluctuation. In our experiment, the pitch is shifted 1° higher and 1° lower.
TSM: time-scale modification (TSM) is applied to the watermarked audio signal to change the time scale by ±1% and ±2% while preserving the pitch.
Jittering: jittering is a small rapid variation. One sample out of every 100,000 and 50,000 samples is removed in our jittering experiment.
The above-mentioned attacks are used to evaluate the robustness of watermarking algorithms. The BER of the watermark extracted after an attack is the best indication of the robustness of the watermarking algorithm. Table 1 gives the robustness results of our proposed scheme. The results of the DCT domain method in [4], the DWT domain method in [5] and the DWT–DCT domain method in [6] are also given in Table 1 for comparison. From Table 1, the robustness of our method is much better than that of the algorithms in [4,5], and slightly better than that of the algorithm in [6]. Our method outperforms the selected methods [4–6] in robustness mainly because of the advantageous properties of LWT–SVD over a single DWT or DCT transform and over the hybrid DWT–DCT transform. Both LWT and SVD have very attractive features for robust watermarking, and their combination further enhances the robustness.
Table 1 Comparison of algorithms based on robustness to common signal processing attacks (BER).
Attack | Huang et al. [4] | Wu et al. [5] | Wang and Zhao [6] | Ours
No attack | 0 | 0 | 0 | 0
Requantization | 0 | 0 | 0 | 0
Resampling 22.05 kHz | 0.501 | 0.5029 | 0 | 0
Resampling 11.025 kHz | 0.489 | 0.4902 | 0 | 0
Resampling 8 kHz | 0.482 | 0.5020 | 0.0586 | 0
Additive noise | 0.026 | 0.0195 | 0 | 0
Echo addition | 0.190 | 0.1914 | 0.0078 | 0.002
Equalization | 0.159 | 0.1504 | 0.0039 | 0.0027
Pitch shift 1° higher | 0.501 | 0.5156 | 0.5166 | 0.064
Pitch shift 1° lower | 0.504 | 0.4580 | 0.501 | 0.059
Lowpass filtering 4 kHz | 0.491 | 0.5019 | 0 | 0
MP3 256 kbps | 0 | 0 | 0 | 0
MP3 128 kbps | 0 | 0 | 0 | 0
MP3 96 kbps | 0.502 | 0 | 0.0107 | 0
MP3 64 kbps | 0.498 | 0.5020 | 0.0225 | 0
Cropping (10%) (front) | 0 | 0 | 0 | 0
Cropping (10%) (middle) | 0 | 0 | 0 | 0
Amplitude scaling up to 110% | 0.498 | 0.5049 | 0.2363 | 0.158
Amplitude scaling down to 90% | 0.508 | 0.4815 | 0.2549 | 0.162
Adding (10%) (front) | 0.500 | 0.5078 | 0 | 0
Table 1 (continued). Attacks: TSM (+1%), TSM (−1%), TSM (−2%), Jittering (1/100,000), Jittering (1/50,000). The BER values for these attacks are listed per algorithm in extraction order; the column order within each row is not recoverable from the source.
Huang et al. [4]: 0.484 0.482 0.503 0.045 0.140
Wu et al. [5]: 0.1406 0.0488 0.4854 0.5195 0.4785
Wang and Zhao [6]: 0.0215 0.1543 0.001 0 0
Ours: 0.0186 0.1455 0 0 0
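Several of the attacks listed in this section, and the BER measure used to score them, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' test bench: samples are assumed to be floats in [-1, 1], all function names are hypothetical, and attacks that need a codec or a filter design (MP3, resampling, Butterworth filtering, pitch shifting, TSM) are left out.

```python
import random


def ber(embedded, extracted):
    """Bit error rate: fraction of watermark bits that differ after extraction."""
    if not embedded or len(embedded) != len(extracted):
        raise ValueError("bit sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(embedded, extracted)) / len(embedded)


def requantize(samples, bits=8):
    """Round samples in [-1, 1] to a coarser `bits`-bit grid and back."""
    levels = 2 ** (bits - 1)
    return [max(-1.0, min(1.0, round(x * levels) / levels)) for x in samples]


def add_noise(samples, power_ratio=0.01, seed=0):
    """Add white Gaussian noise whose power is `power_ratio` of the signal power."""
    rng = random.Random(seed)
    power = sum(x * x for x in samples) / len(samples)
    sigma = (power_ratio * power) ** 0.5
    return [x + rng.gauss(0.0, sigma) for x in samples]


def add_echo(samples, delay, decay=0.1):
    """Add a single echo delayed by `delay` samples (441 for 10 ms at 44.1 kHz)."""
    return [x + (decay * samples[n - delay] if n >= delay else 0.0)
            for n, x in enumerate(samples)]


def scale_amplitude(samples, factor):
    """Scale the amplitude, e.g. factor 1.1 (110%) or 0.9 (90%)."""
    return [factor * x for x in samples]


def crop_front(samples, percent=10):
    """Drop the first `percent` percent of the samples."""
    return samples[len(samples) * percent // 100:]


def jitter(samples, period):
    """Remove one sample out of every `period` samples."""
    return [x for n, x in enumerate(samples) if (n + 1) % period != 0]


# Arithmetic illustration only: 60 wrong bits out of 1024.
bits = [0] * 1024
damaged = [1] * 60 + [0] * 964
print(round(ber(bits, damaged), 4))  # 0.0586
```

In a full experiment, each attacked signal would be passed to the watermark extractor and the extracted bits compared with the embedded ones using `ber`.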
Table 2 Comparison of algorithms based on robustness to Stirmark attacks. (The column of extracted watermark images of our method is not reproduced here.)
Attack | Cox et al. [3] | Özer et al. [13] | Ours
Addbrumm | 1.25 | 0 | 0
AddDynNoise | 1.56 | 0 | 0
AddFFTNoise | 51.25 | 0 | 0
Addnoise | 0.78 | 0 | 0
Addsinus | 0.77 | 0 | 0
Amplify | 52.32 | 0.75 | 0
Bassboost | 0 | 0 | 0
Compressor | 0 | 0 | 0
Copysample | 100 | 0.5 | 0.2
Cutsamples | 100 | 0 | 0
Echo | 23.43 | 0 | 0
Exchange | 0 | 0 | 0
Extrastereo | 0 | 0 | 0
Fft_hlpass | 0.31 | 0 | 0
Fft_invert | 52.6 | 0 | 0
Fft_real_reverse | 0.78 | 0 | 0
Fft_stat1 | 19.84 | 0.5 | 0.13
Fft_test | 19.80 | 0.4 | 0.026
Flipsample | 21.66 | 0.75 | 0.029
Invert | 52.42 | 0 | 0
Lsbzero | 0 | 0 | 0
Normalize | 0 | 0 | 0
Nothing | 0 | 0 | 0
Rc_highpass | 2.03 | 0 | 0
Rc_lowpass | 0 | 0 | 0
Smooth | 0 | 0 | 0
Stat1 | 0 | 0 | 0
Stat2 | 0 | 0 | 0
Voiceremove | 52.1 | 0 | 0
Zerocross | 0 | 0 | 0
Zerolength | 60.5 | 0 | 0
Zeroremove | 100 | 0 | 0
Average of all attacks | 22.2937 | 0.0906 | 0.012
Table 3 Summary of algorithm comparison. Method
Bassia et al. [2] Huang et al. [4] Cvejic and Seppanen [8] Wu et al. [5] Lie and Chang [24]
Spread spectrum No DCT quantization Yes Spread spectrum Yes
DWT QIM Amplitude modification Wang and Zhao [6] DWT–DCT quantization Li et al. [7] Spread spectrum Xiang and Huang Histogram [25] Xiang et al. [26] DWT-based Histogram Fan and Wang [10] Chaos-based DFRST Ercelebi and LWT Batakci [21] Megias et al. [9] Amplitude modification Özer et al. [13] STFT–SVD Al-Haj and DWT–SVD Mohammad [17] Abd El-Samie [14] SVD Al-Nuaimy et al. SVD [15] Bhat K. et al. [16] DWT–SVD Wang et al. [18] FFT–RSVD Lei et al. [19] SVD–DCT Ours LWT–SVD
Synchronization SNR (dB)
Payload (bps)
Subjective test reported
Blind Audio content
22 43.0 N/A
44.1 36 27.1
Yes No 4.71
Yes Yes No
Yes Yes
30.64 24.5
172.41 43.1
No Yes
Yes
43.1
N/A
Yes Yes
29.5 440
Yes
Secret keys used
Embedding domain
Stirmark attack test
Error analysis reported
Security analysis provided
4 Audio excerpts Yes 2 Audio signals BCH coding 21 Audio pieces PN sequence
Time DCT FFT
No No No
No No No
No No No
Yes Yes
2 Audio signals 3 Music signals
No section length
DWT Time
No No
Yes Yes
No Yes
No
Yes
Yes
DWT–DCT
Yes
No
No
4.27 3
4.9 4.07
Yes Yes
Music and speech 5 Music signals 6 Music signals
Frame length No
FFT subband Time
Yes Yes
Yes No
No No
42.8
2
3.34
Yes
6 SQAM clips
PN sequence
DWT
Yes
No
No
No
30
86
No
Yes
6 Music signals
Chaotic sequence
DFRST
No
Yes
Yes
No
25.93 N/A
Yes
No
9 Audio signals
LWT
No
No
No
Yes
25.7
4.69
No No
28.36 32 28.55 N/A
No No Yes No Yes Yes
Time
No
No
No
4.7 4.33
pseudo random sequence Yes 6 SQAM clips and Yes songs Semi 3 Audio clips No No Music and speech No
STFT–SVD DWT
Yes Yes
No No
No No
27.13 N/A 27.13 N/A
Yes No
No No
1 Music clip 1 Music clip
Chaotic sequence Chaotic sequence
SVD SVD
No No
No No
No No
24.37 27.23 32.53 40
Yes 4.36 Yes 4.82
Yes Yes Yes Yes
5 Music files 25 Music files 15 SQAM clips 15 SQAM clips and songs
Yes No Chaotic sequence Chaotic sequence
DWT FFT–SVD SVD–DCT LWT–SVD
No No Yes Yes
Yes No Yes Yes
No No No Yes
33.09
45.9 187 43 170.67
Reference
8.4. Robustness to Stirmark attacks
The robustness of our scheme is also benchmarked against the Stirmark attacks, a very popular benchmark tool for audio watermarking. The parameters of these standardized attacks are the default values set in the software configuration. The detailed benchmark attack results are summarized in Table 2. We also compare our method with the related watermarking methods in [3,13]. The comparison shows that, under the Stirmark attacks, our method is slightly better than the watermarking scheme in [13] and much better than the watermarking scheme in [3].
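The "Average of all attacks" row of Table 2 can be re-derived from the individual entries. The sketch below simply transcribes the 32 attack rows of Table 2 and averages each column; the names `TABLE2` and `column_means` are hypothetical and introduced only for this check.

```python
# (attack, Cox et al. [3], Ozer et al. [13], ours), transcribed from Table 2.
TABLE2 = [
    ("Addbrumm", 1.25, 0, 0), ("AddDynNoise", 1.56, 0, 0),
    ("AddFFTNoise", 51.25, 0, 0), ("Addnoise", 0.78, 0, 0),
    ("Addsinus", 0.77, 0, 0), ("Amplify", 52.32, 0.75, 0),
    ("Bassboost", 0, 0, 0), ("Compressor", 0, 0, 0),
    ("Copysample", 100, 0.5, 0.2), ("Cutsamples", 100, 0, 0),
    ("Echo", 23.43, 0, 0), ("Exchange", 0, 0, 0),
    ("Extrastereo", 0, 0, 0), ("Fft_hlpass", 0.31, 0, 0),
    ("Fft_invert", 52.6, 0, 0), ("Fft_real_reverse", 0.78, 0, 0),
    ("Fft_stat1", 19.84, 0.5, 0.13), ("Fft_test", 19.80, 0.4, 0.026),
    ("Flipsample", 21.66, 0.75, 0.029), ("Invert", 52.42, 0, 0),
    ("Lsbzero", 0, 0, 0), ("Normalize", 0, 0, 0),
    ("Nothing", 0, 0, 0), ("Rc_highpass", 2.03, 0, 0),
    ("Rc_lowpass", 0, 0, 0), ("Smooth", 0, 0, 0),
    ("Stat1", 0, 0, 0), ("Stat2", 0, 0, 0),
    ("Voiceremove", 52.1, 0, 0), ("Zerocross", 0, 0, 0),
    ("Zerolength", 60.5, 0, 0), ("Zeroremove", 100, 0, 0),
]


def column_means(rows):
    """Mean of each numeric column (Cox, Ozer, ours) over all attacks."""
    n = len(rows)
    return tuple(sum(row[i] for row in rows) / n for i in (1, 2, 3))


cox, ozer, ours = column_means(TABLE2)
print(f"{cox:.3f} {ozer:.4f} {ours:.4f}")
```

The three means agree with the reported averages (22.2937, 0.0906 and 0.012) to within rounding.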
8.5. Algorithm comparisons and discussions
In this section, the proposed scheme is compared with other state-of-the-art and related audio watermarking schemes in the literature. The selected schemes are typical audio watermarking schemes with or without self-synchronization, with and without the SVD transform, and other conventional audio watermarking schemes. The detailed comparisons among these schemes in terms of robustness, imperceptibility, payload and other features are listed in Table 3. Note that if the detailed subjective results (MOS score) are not provided in a reference, we just report "yes" in Table 3 for the subjective test category. If an algorithm involves convolution, DCT or Fourier transforms, it is computationally intensive and its execution time is high. Some insights gleaned from Table 3 are summarized as follows:
(1) Most of the recent publications focus on one or two features, such as synchronization, efficiency, robustness or imperceptibility.
(2) Very few of them provide security analysis.
(3) A higher embedding rate often leads to unsatisfactory robustness for the same amount of attacking disturbance. A suitable embedding rate should be between 20 bps and 50 bps.
From the comparison results in Table 3, we can see that our proposed LWT–SVD algorithm obtains a relatively high payload and good transparency. It achieves moderately high SNR results: they are not as high as those in [5,6,25,26], but are higher than those of most of the selected schemes. Besides, the payload of our method is 170.67 bps, which is lower than that in [5,6] but relatively high compared to the rest of the selected schemes. It is also above 20 bps, which more than satisfies the IFPI requirements. Apart from security, the running speed of the algorithm is also an important aspect of a good watermarking scheme.
Error and security analyses are also provided, as well as detailed robustness results against Stirmark attacks. In addition, our proposed scheme is blind in nature. All in all, it can be concluded that the proposed algorithm not only achieves a satisfactory compromise between robustness, payload, imperceptibility and time complexity but also more than meets all the IFPI requirements.
9. Conclusions
In this paper, a very robust and blind audio watermarking scheme is proposed which makes good use of the features of SVD, LWT, the synchronization code technique and QIM. The proposed method is akin to a singular value (SV) modification algorithm applied in the low-frequency subband. The robustness of our scheme is validated by common signal processing and Stirmark attacks. The performance analyses and comparison results indicate that the proposed watermarking scheme maintains good audio quality and high robustness against various attacks, including MP3 compression, lowpass filtering, amplitude scaling, time scaling, cropping, jittering, sampling rate change, bit resolution transformation, additive noise, echo addition and equalization.
References
[1] S. Katzenbeisser, F.A.P. Petitcolas, Information Hiding Techniques for Steganography and Digital Watermarking, Artech House, Norwood, MA, USA, 2000.
[2] P. Bassia, I. Pitas, N. Nikolaidis, Robust audio watermarking in the time domain, IEEE Transactions on Multimedia 3 (2) (2001) 232–241.
[3] I.J. Cox, J. Kilian, F.T. Leighton, T. Shamoon, Secure spread spectrum watermarking for multimedia, IEEE Transactions on Image Processing 6 (12) (1997) 1673–1687.
[4] J.W. Huang, Y. Wang, Y.Q. Shi, A blind audio watermarking algorithm with self-synchronization, in: Proceedings of IEEE International Symposium on Circuits and Systems, 2002, pp. 627–630.
[5] S. Wu, J. Huang, D. Huang, Y.Q. Shi, Efficiently self-synchronized audio watermarking for assured audio data transmission, IEEE Transactions on Broadcasting 51 (1) (2005) 69–76.
[6] X.Y. Wang, H. Zhao, A novel synchronization invariant audio watermarking scheme based on DWT and DCT, IEEE Transactions on Signal Processing 54 (12) (2006) 4835–4840.
[7] W. Li, X. Xue, P. Lu, Localized audio watermarking technique robust against time-scale modification, IEEE Transactions on Multimedia 8 (1) (2006) 60–69.
[8] N. Cvejic, T. Seppanen, Spread spectrum audio watermarking using frequency hopping and attack characterization, Signal Processing 84 (1) (2004) 207–213.
[9] D. Megias, J. Serra-Ruiz, M. Fallahpour, Efficient self-synchronised blind audio watermarking system based on time domain and FFT amplitude modification, Signal Processing 90 (12) (2010) 3078–3092.
[10] M. Fan, H. Wang, Chaos-based discrete fractional sine transform domain audio watermarking scheme, Computers and Electrical Engineering 35 (3) (2009) 506–516.
[11] Z. Tao, H.M. Zhao, J. Wu, J.H. Gu, Y.S. Xu, D. Wu, A lifting wavelet domain audio watermarking algorithm based on the statistical characteristics of subband coefficients, Archives of Acoustics 35 (4) (2010) 481–491.
[12] D. Kundur, D. Hatzinakos, Digital watermarking using multiresolution wavelet decomposition, in: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 1998, pp. 2969–2972.
[13] H. Özer, B. Sankur, N. Memon, An SVD-based audio watermarking technique, in: Proceedings of the Seventh Workshop on Multimedia and Security, ACM, New York, NY, USA, 2005, pp. 51–56.
[14] F.E. Abd El-Samie, An efficient singular value decomposition algorithm for digital audio watermarking, International Journal of Speech Technology 12 (1) (2009) 27–45.
[15] W. Al-Nuaimy, M.A.M. El-Bendary, A. Shafik, F. Shawki, A.E. Abou-El-azm, N.A. El-Fishawy, S.M. Elhalafawy, S.M. Diab, B.M. Sallam, F.E. Abd El-Samie, H.B. Kazemian, An SVD audio watermarking approach using chaotic encrypted images, Digital Signal Processing, doi:10.1016/j.dsp.2011.01.013.
[16] V. Bhat K., I. Sengupta, A. Das, An adaptive audio watermarking based on the singular value decomposition in the wavelet domain, Digital Signal Processing 20 (6) (2010) 1547–1558.
[17] A. Al-Haj, A. Mohammad, Digital audio watermarking based on the discrete wavelets transform and singular value decomposition, European Journal of Scientific Research 39 (1) (2010) 6–21.
[18] J. Wang, R. Healy, J. Timoney, A robust audio watermarking scheme based on reduced singular value decomposition and distortion removal, Signal Processing 91 (8) (2011) 1693–1708.
[19] B.Y. Lei, I.Y. Soon, Z. Li, Blind and robust audio watermarking scheme based on SVD–DCT, Signal Processing 91 (8) (2011) 1973–1984.
[20] W. Sweldens, The lifting scheme: a custom-design construction of biorthogonal wavelets, Applied and Computational Harmonic Analysis 3 (2) (1996) 186–200.
[21] E. Ercelebi, L. Batakci, Audio watermarking scheme based on embedding strategy in low frequency components with a binary image, Digital Signal Processing 19 (2) (2009) 265–277.
[22] R. Liu, T. Tan, An SVD-based watermarking scheme for protecting rightful ownership, IEEE Transactions on Multimedia 4 (1) (2002) 121–128.
[23] B. Chen, G.W. Wornell, Quantization index modulation: a class of provably good methods for digital watermarking and information embedding, IEEE Transactions on Information Theory 47 (2001) 1423–1443.
[24] W.N. Lie, L.C. Chang, Robust and high-quality time-domain audio watermarking based on low-frequency amplitude modification, IEEE Transactions on Multimedia 8 (1) (2006) 46–59.
[25] S. Xiang, J. Huang, Histogram-based audio watermarking against time-scale modification and cropping attacks, IEEE Transactions on Multimedia 9 (7) (2007) 1357–1372.
[26] S. Xiang, H.J. Kim, J. Huang, Audio watermarking robust against time-scale modification and MP3 compression, Signal Processing 88 (10) (2008) 2372–2387.
[27] H.C. Andrews, C.L. Patterson, Singular value decomposition and digital image processing, IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-24 (1) (1976) 26–53.
[28] E. Biglieri, K. Yao, Some properties of singular value decomposition and their applications to digital signal processing, Signal Processing 18 (3) (1989) 277–289.
[29] EBU, SQAM: sound quality assessment material. http://sound.media.mit.edu/mpeg4/audio/sqam/ (last checked 30.05.11).