Informatica 32 (2008) 299-306 299 Content-Based Watermarking for Image Authentication Using Independent Component Analysis Dr. Latha Parameswaran Professor, Department of Computer Science & Engineering, AMRITA University, Coimbatore - 641 105, India E-mail: lathapcp@yahoo.co.in Dr. K. Anbumani Former Director, Karunya School of Computer Science and Technology, Karunya University, Coimbatore -641114, India E-mail: anbumani_k@yahoo.co.uk Keywords: digital watermarking, independent component analysis, discrete cosine transform Received: September 19, 2007 This paper proposes a novel approach to content-based watermarking for image authentication that is based on Independent Component Analysis (ICA). In the scheme proposed here, ICA is applied to blocks of the host image and the resulting mixing matrix represents the features of the image blocks. Frobenius norm of the mixing matrix is adopted as the content-based feature. This is embedded as the watermark in a mid-frequency DCT coefficient of the block. This authentication technique is robust against incidental image processing operations, but detects malicious tampering and correctly locates the tampered regions. Povzetek: Predlagana je nova metoda avtentikacije slik. 1 Introduction A digital watermark is a piece of information that is hidden in a multimedia content, in such a way that it is imperceptible to a human observer, but easily detected by a computer. The principal advantage is that the watermark is inseparable from the content [1]. Digital watermarking is the process of hiding the watermark imperceptibly in the content. This technique was initially used in paper and currency as a measure of authenticity. The primary tool available for data protection is encryption. Encryption protects content during the transmission of the data from the sender to receiver. However, after receipt and subsequent decryption, the data is no longer protected. Watermarking complements encryption [1]. Digital Watermarking involves two major phases: (i) Watermark embedding, and (ii) Watermark extraction. Digital watermarks can be a pseudo random sequence or a logo of a company or an image. Watermark embedding is done in the watermark carriers such as Discrete Cosine Transform (DCT) or Discrete Wavelet Transform (DWT), etc of the original data resulting in watermarked data. The watermarked data may be compressed to reduce its size, corrupted by noise during its transmission through a noisy channel. It may be subjected to other normal image processing operations such as filtering, histogram modification etc. Also malicious intruders may tamper the data. DCT is a widely used technique for watermarking [1]. Recently ICA is being adopted for watermarking [2] - [8]. In [2] ICA is applied to the blocks of the host image and the watermark image. The least-energy independent components of the host are replaced by the high-energy independent components of the watermark image. For watermark extraction the demixing matrices of both the watermark and the host images are required. Dan Yu et al. [3] treat the host image, the key image, and the watermark image as the independent sources. Embedding is done by weighted addition of the key and the watermark to the host. For watermark extraction, two more mixtures are obtained by adding the key and the watermark using different weights. ICA is then applied to these mixtures to separate the host, the key, and the watermark. The host and the key are required for watermark extraction. In [4] the same procedure as in [3] has been used. The only difference is in the algorithm used for ICA. Ju Liu et al. [5] use ICA for detection of the watermark which is a random sequence embedded in low-frequency DCT coefficients. Original DCT coefficients are required for watermark detection and for creating a second mixture needed for ICA. Bounkong et al. [6] apply ICA to each block of the host image and obtain its independent components. The watermark is embedded in selected components using quantization and a modified image block is obtained 300 Informatica 32 (2008) 299-306 L. Parameswaran et al. from these modified independent components. This is added to the host image block, obtaining the watermarked image. In the extraction phase, ICA is applied to each block obtaining the independent components. The watermark is then extracted from these through dequantization. The technique of creating three mixtures is also employed in [7] and [8]. While [7] uses upsizing and downsizing, [8] uses the so-called redundant DWT (RDWT). Many authors have worked on content-based watermarking for image authentication. Li et al. [14] discuss a content-based watermarking scheme that uses local features of the image such as edges and zero-crossings. Their scheme uses a look-up table to embed the watermark and the same table is required at the receiver end to extract the watermark. Content-based watermark is generated based on salient features of the image either in spatial domain like edges, texture, fractal dimensions [15] etc. or in a transform domain such as singular values [16], eigenvalues [17], etc. Choices of image features vary with techniques and directly influence the robustness of the scheme. Some techniques generate a random binary sequence to embed the watermark based on the features of the images [18] and [19]. A content-based digital signature scheme for image authentication has been presented in [20]. In [21] - [24] a localization based method has been presented to verify the integrity of the received image. In these techniques the host image is divided into a number of disjoint blocks and watermark is embedded in each of these blocks. To verify the authenticity of the received image, blockwise authentication has been done. In [18], [25], [26] image authentication has been done using content-based watermarks. But these schemes do not embed the watermark in the image content; instead embed them in the image header. These techniques distort the host image prior to watermark embedding. In [27] a watermarking technique based on the quadtree is proposed. This scheme embeds a Gaussian sequence watermark into low-frequency band of the wavelet transform. In their technique, watermark is embedded into visually insensitive pixels in quadtrees. Most of the above authors embed a specified watermark. Most have copyright protection as their goal and require a lot of information about the host image for watermark extraction. Additional image mixtures are artificially created and then ICA is used as a blind source separation technique to separate the host image, the watermark, and the key. In the scheme proposed here, a different approach to the use of ICA is adopted. ICA is used to determine the mixing matrix which - specifically its Frobenius norm -represents the content of the host image. No information about the host is required for watermark extraction. Thus the proposed scheme is a novel, blind, content-based watermarking for content authentication that uses ICA and DCT. 2 Proposed content-based watermarking using ICA 2.1 Principles of ICA Principles of ICA are discussed in [9]-[13]. Let x be a random vector of observations. ICA models the observed data as: x = As, where the vector s represents the independent sources that generate the observed data x. The matrix A is the mixing matrix. Each xj is a linear combination of the independent source signals. Hence xi is called the mixed signal. ICA estimates A and s from given x. In the context of images, each row is considered as one observation x. Thus the entire image is denoted as X, representing all the rows of the image. ICA models an image X as: X = AS. In the general case, X is m x n and S is r x n, where m > r. In other words, the number of observed mixed signals must be greater than or equal to the number of independent components. Some significant points of ICA are: • Each component sj is independent of every other component. • All independent components are non-Gaussian (with possible exception of one). • The observation x is assumed centered. • There is no order specified among the independent components. • The independent components can be obtained from the observations as, s=Wx, where W is called the demixing matrix, estimated in ICA. • Each column of the mixing matrix A represents a feature of the data x. The last mentioned property is made use of in this proposed technique using ICA. Each column of the mixing matrix A represents a feature of the image. So, all the columns together represent all the features of the image. In order to get one single quantity to represent the image, the Frobenius norm of A has been chosen as the content-based feature that represents the image. The Frobenius norm of a matrix is the square root of the sum of the norms of all the columns of A. The proposed content-based watermarking for blind authentication uses a hybrid of ICA and DCT. The host image is divided into small blocks. ICA is applied to each block and the mixing matrix of the block is determined. Frobenius norm of the mixing matrix is computed. This is considered as the content-based feature of the block. Such features are obtained for all the blocks of the host image. These constitute the content-based watermark used for authentication of the image. In this technique, the watermark is embedded by replacing the chosen mid-frequency coefficient DCT(p, q) with a scaled value of the watermark but retaining the sign of the DCT coefficient, i.e., CONTENT-BASED WATERMARKING FOR IMAGE... Informatica 32 (2008) 299-306 301 where w is the content-based watermark. Here the watermark is the Frobenius norm of a block, which is always non-negative. The value for a is chosen based on the statistical details of the DCT coefficients and the watermark. Watermark extraction is the reverse of the embedding process. The received watermarked image is divided into blocks and ICA is applied to each of them. The Frobenius norm of the mixing matrix is computed. DCT of each block is also performed. The watermark that was embedded is extracted from the chosen mid-frequency coefficient: 3. Extract the embedded watermark from the chosen DCT coefficient: 4. This set forms the extracted watermark, 5. Calculate the blockwise percentage difference (A) between the watermark values w* and w': The percentage difference (A) between the extracted and the embedded watermarks is computed. If it is high, it indicates that the image has been tampered. Details of steps for the three phases of watermarking are given below i. Watermark generation, ii. Watermark embedding, and iii. Watermark extraction and authentication. Watermark generation 1. Segment the host image I of size n x n into blocks of size m x m resulting in K blocks. 2. Perform ICA of each block treating each row of the block as a vector. 3. Extract the mixing matrix A. 4. Compute the Frobenius norm of the mixing matrix; this is the content-based watermark w of the block. 5. Repeat steps 2 - 4 for computing the watermark for all the blocks. This set forms the watermark, W = ^1' w2,■■■, wk } Watermark embedding 1. Perform DCT of each block. 2. Select the mid-frequency coefficient at the chosen location (p, q) in each block. 3. Replace the chosen coefficient with the watermark: 4. Perform inverse DCT. 5. Repeat steps 1- 4 for all the blocks. The resultant is the watermarked image I*. Watermark extraction and authentication 1. Perform steps 1-5 of the watermark generation procedure on the received image I' and obtain the computed watermark, . 2. Perform DCT of each block. In this scheme the percentage difference of the values corresponding to each block is used to detect any change in the block and thereby the authenticity of the image. If the difference is small - smaller than an experimentally chosen threshold value - the block and therefore the entire image is deemed authentic. If the difference of any block is greater than the threshold, that block is identified as the tampered block and hence the image is unauthentic. . 3 Experimental results The proposed blind content-based watermarking scheme for image authentication has been tested using Matlab and Adobe Photoshop. The scheme has been evaluated on a set of three different categories of 512 x 512 gray scale images: (i) standard images, (ii) natural images, and (iii) images created using imaging tools. Choice of parameters In order to determine the block size for image segmentation various block sizes were tried. In [6] Bounkong has mentioned that choosing a block size is based on the processing time and relevant features. Blocks of small size leads to poor performance in watermarking process and larger blocks demand high computational time. Hence a trade off between these two is required to choose the block size. After experimentation, a block size of 16 x 16 was chosen as it resulted in better PSNR value, computational time and better feature representation. In order to embed the watermark in a suitable location the proposed technique uses one of the mid-frequency coefficients. Embedding the watermark in low-frequency components, results in visual degradation of the host image. Similarly embedding the watermark data in high-frequency components is not advisable as they may be lost during compression. Hence embedding the watermark in mid-frequency components ensures robustness. The mid-frequency coefficient (p, q) in which to embed the watermark is chosen as the mid-diagonal coefficient i.e. the location 302 Informatica 32 (2008) 299-306 L. Parameswaran et al. For choosing a suitable value for the embedding strength a, statistics of the DCT coefficient values at that mid-diagonal location of all the blocks are obtained, specifically the standard deviation ax. Similarly the standard deviation aw is obtained for the watermark. The value of embedding factor a is determined such that the watermark values are suitably scaled to have the same range of variation as that of the DCT coefficients: a = -soc "H . In this experimentation after computation the value is a = 0.14. Threshold for the percentage difference A between the watermarks has been experimentally determined as 15%. Lower thresholds resulted in false negatives; while higher thresholds made the technique to be fragile. The ICA algorithm adopted in this proposed technique is the fastICA algorithm. This algorithm has been discussed in [9] and [13]. Quality of the watermarked image The proposed content-based watermarking scheme has been implemented on a set of images of three categories. The metrics PSNR, Pearson Correlation Coefficient (PCC), Normalized Cross Correlation (NCC), and Image Fidelity (IF) are calculated between the host image and the watermarked image. The test images after watermarking is shown in Fig. 1. It can be observed that there is no perceptually noticeable difference in the images due to watermarking. Numerical values of the performance metrics for the test images given in Table 1 also corroborate this. PSNR values range from 96.09 to 102.45, with an average of 97.87 for all the test images, which is quite high. The other metrics, Pearson Correlation Coefficient (PCC) and Image Fidelity (IF) are also quite high. This shows that watermark embedding does not degrade the visual quality of the image. CONTENT-BASED WATERMARKING FOR IMAGE... Informatica 32 (2008) 299-306 303 % Table 1 : Quality metrics after watermarking using ICA Figure 1: technique. Images after watermarking using the ICA Extraction efficacy The efficiency of the scheme in correctly extracting the watermark is given by the percentage difference between the computed and extracted Frobenius norm of the mixing matrix of the received image blocks. Table 2 gives the highest percentage difference A for some of the test images. The values are small, ranging from 1.87 to 9.60%, over all the test images. This indicates that the scheme extracts the embedded watermark accurately. PSNR IF NCC PCC Bridge 98.8892 1.0000 1.0000 0.9999 Boy 98.6417 0.9999 1.0000 0.9999 Building 96.7310 0.9999 1.0000 0.9999 Cameraman 97.4249 1.0000 1.0000 0.9999 Clown 98.2414 0.9999 1.0000 0.9999 Couple 96.7900 0.9999 1.0000 0.9999 Jet Plane 98.6468 1.0000 1.0000 0.9999 Lena 98.0096 0.9999 1.0000 0.9999 Living Room 98.7276 1.0000 1.0000 0.9999 Mandrill 98.4030 1.0000 1.0000 0.9998 Peppers 96.0968 0.9999 1.0000 0.9998 Sail Boat 96.4620 1.0000 1.0000 0.9999 Bulb 102.4517 1.0000 1.0000 1.0000 Snow Tree 97.8179 1.0000 1.0000 0.9999 Specs 97.8651 1.0000 1.0000 0.9998 Trees 97.3633 0.9999 0.9999 0.9997 KeyClock 96.2998 1.0000 1.0000 0.9998 SunBark 96.8414 1.0000 1.0000 0.9999 Decor 97.2312 0.9999 0.9999 0.9989 Lamp 96.1245 0.9989 0.9999 0.9978 Average 97.7529 0.9999 1.0000 0.9997 Minimum 96.0968 0.9989 0.9999 0.9978 Maximum 102.4517 1.0000 1.0000 1.0000 Robustness against incidental image processing Robustness of the proposed scheme against normal signal processing operations such as compression, noise and filtering has been experimentally evaluated on all the test images. In this proposed watermarking technique the watermarked image is subjected to three types of distortions: compression, noise, and filter. Watermarked image has been compressed using JPEG compression with different quality factors. Additive white Gaussian noise (AWGN) and uniform noise has been added to the watermarked image. Also filtering such as low pass, sharpening, histogram equalization, and contrast stretching has been applied on the watermarked image. Results of the test image Lena is shown in Table 3. For all these attacks, the values of highest percentage difference A, ranges from 1.25 to 6.45 with an average of 3.23 as given in Table 3. All the percentage differences all less than the determined threshold 15%, indicating that there is no tampering. 304 Informatica 32 (2008) 299-306 Table 2: Results after watermark extraction without attacks using ICA_ Image Highest percentage difference Bridge 9.1100 Boy 8.3440 Building 8.5009 Cameraman 4.5248 Clown 8.6273 Couple 8.2430 Jet Plane 7.5966 Lena 8.9354 Living Room 9.2842 Mandrill 6.6092 Peppers 7.9915 Sail Boat 8.7662 Bulb 1.9682 Snow Tree 8.7526 Specs 8.2906 Trees 1.8791 KeyClock 9.6035 SunBark 8.2667 Decor 7.2345 Lamp 7.1124 Average 7.4 820 Minimum 1.8 791 Maximum 9.6 035 L. Parameswaran et al. Table 3: Results after incidental distortions on Lena using ICA_ Attacks Parameters Highest percentage difference JPEG Compression Maximum Quality Factor =10 3.5728 High Quality Factor =8 2.1245 Medium Quality Factor =5 2.0000 Low Quality Factor =3 1.9925 Noise AWGN Percent = 5 6.4512 Uniform Percent = 5 3.1928 Filter Low pass Standard Deviation = 10 4.2686 Sharpening - 3.2578 Histogram Equalization - 4.2564 Gamma Correction Gamma value = 3 3.2578 Contrast Stretching Brightness = 15 Contrast = 15 1.2456 Average 3.238 2 Maximum 6.451 2 Minimum 1.245 6 Similar good performance of robustness of the proposed scheme has been obtained for other test images also. For example, Fig. 2 shows the robustness of the various test images against: • JPEG compression medium quality (Quality factor = 5) • AWGN with noise 5% • Low pass filter with standard deviation 10. 6.00 5.00 s S 0) s 0