SINGLE CORE HARDWARE MODULE TO IMPLEMENT ENCRYPTION IN TECB MODE

M. B. I. Reaz, M. I. Ibrahimy, F. Mohd-Yasin, C. S. Wei, M. Kamada

Department of Electrical and Computer Engineering, International Islamic University Malaysia, Kuala Lumpur, Malaysia
Faculty of Engineering, Multimedia University, Selangor, Malaysia
Department of Computer and Information Sciences, Ibaraki University, Hitachi, Japan

Keywords: Encryption, DES, 3DES, FPGA, Synthesis, Hardware

Abstract: The growth of the Internet as a vehicle for secure communication has left the Data Encryption Standard (DES) no longer capable of providing high-level security for data protection. The Triple Data Encryption Standard (3DES) is a symmetric block cipher with a 192-bit key, proposed to further strengthen DES. Many applications demand the speed of a hardware encryption implementation while trying to preserve the flexibility and low cost of a software implementation. This project used a single core module to implement encryption in Triple DES Electronic Code Book (TECB) mode, modelled in the hardware description language VHDL. The architecture was mapped onto the Altera EPF10K100EFC484-1 and EP20K200EFC672-1X devices for performance investigation. Implementation on the larger FPGA device, EP20K200EFC672-1X, achieved an encryption rate of 102.56 Mbps, an area utilization of 2111 logic cells (25%) and a higher maximum operating frequency of 78.59 MHz. The results also suggest that the 3DES hardware was 2.4 times faster than its software counterpart.

Electronic Module for Implementing Encryption in TECB Mode (Slovenian title: Elektronski modul za izvedbo šifriranja v TECB načinu)

Keywords: encryption, DES, 3DES, FPGA, synthesis, hardware

Abstract: The growing demand for secure Internet services has led to the realization that the DES standard (Data Encryption Standard) no longer provides a very high level of data protection. The proposed triple DES (3DES), a symmetric cipher with a 192-bit key, is intended to further strengthen DES. Implementing the 3DES standard in hardware enables high encryption speeds while trying to retain the flexibility and low cost of software solutions. In this work we describe the use of an electronic module implementing 3DES TECB (3DES Electronic Code Book), which we modelled in VHDL. We mapped the architecture onto two Altera FPGA devices and achieved an encryption rate of 102.56 Mbps, an area utilization of 2111 logic cells (25%), and a higher operating frequency of 78.59 MHz when using the larger device, EP20K200EFC672-1X. We estimated that the hardware implementation of 3DES is up to 2.4 times faster than the software solution.

1. Introduction

In the wake of advancement in computer technology and increasingly volatile information flow, we are faced with challenges of safeguarding information that is not meant for public [...]

Fig. 2: Functional simulation of encryption operation (waveform showing the signals RST, GO, CLK, READY, EXT_IN, KEY_A, KEY_B, KEY_C and RESULT)

Fig. 3: Functional simulation of decryption operation (waveform showing the same signals)

From Figure 2, the keys used in the encryption of the data message 0123456789ABCDEF were 133457799BBCDFF1, BC12345678912345 and AC83413249B3EF38. The resulting cipher text was 904B451FE30662FB. During the decryption operation, the cipher text was decrypted to obtain the original data message of 0123456789ABCDEF, as shown in Figure 3.

4.2 Synthesis and Optimization

With the functional simulation showing the correct behavioural result, the core design was synthesized for FPGA implementation using Altera Quartus II 4.0 software. A device family that could fit the design was chosen, and the timing requirements were set.
Different FPGAs can yield different maximum frequencies. A larger FPGA device family such as APEX20KE gave a higher maximum clock frequency than FLEX10KE. This was an important criterion when deciding on the FPGA to use, even though the design could be fitted into both. The smaller device family had a higher resource utilization percentage. By choosing the larger device family, speed optimization was given priority, in view of the surplus of resources in the selected FPGA. There was a trade-off between area and speed: using a higher number of logic elements resulted in a higher maximum operating frequency.

Initial synthesis of the design on the APEX20KE family gave a maximum frequency of 72.77 MHz. However, after switching off the 'Remove Duplicate Registers' and 'Remove Duplicate Logic' settings, the maximum operating frequency reached approximately 77 MHz. The logic cells used then amounted to 25% of the device, 2 percentage points more than with the previous setting. By setting the maximum frequency requirement to 80 MHz, a higher value of 78.59 MHz was achieved. This was the highest maximum clock frequency that could be obtained with the APEX20KE family from the optimization process.

4.3 Timing Simulation

Timing simulation was performed to verify that the module functioned correctly and that there were no timing violations in the implemented design. The functional simulation was done using MAX+PLUS II software, but the timing simulation was done using Quartus II 4.0 software. During timing simulation, the total delay of the wires and combinational logic was taken into account. Initial testing with a clock frequency higher than the maximum operating frequency resulted in erroneous output. The result obtained was not the correct encrypted data message, as shown by the fact that it could not be decrypted to recover the initial data message. The cause was that the total delay had exceeded one clock cycle period.
The clock signal period was then set to 13 ns. This clock period was larger than the total wire and combinational logic delay. Different sets of keys and input data blocks were used during the simulation. It was found that the encrypted data could be decrypted to recover the original data. In addition, the reset pin was tested: the reset signal was set to 'high' to reset the design.

4.4 Synthesis Results

Table 1: Synthesis results

Family                          APEX20KE
Device                          EP20K200EFC672-1X
Name                            Core
Total logic elements            2111 / 8320 (25%)
Total I/O pins                  325 / 376 (86%)
Total memory bits               2048 / 106496 (1%)
Total PLLs                      0 / 2 (0%)
Total combinational functions   2110
Total registers                 408
Performance, f_max              78.59 MHz
Clock period                    12.724 ns

Table 1 shows the synthesis results of the 3DES encryption engine. The FPGA family selected for the realization of the 3DES encryption engine was APEX20KE (more precisely, the EP20K200EFC672-1X device). Out of the 8320 logic elements contained in the device, a total of 2111 logic cells were used. A total of 325 I/O pins were utilized, which is equivalent to 86 percent of the total pins in the device. Of these 325 pins, 65 were output pins and the remaining 260 were input pins. Out of a total of 106496 memory bits in the device, 2048 were utilized. This is equivalent to 1 percent of the total memory bit resource. The total number of registers used in the EP20K200EFC672-1X device was 408. A maximum clock frequency of 78.59 MHz was obtained. The clock signal used in the device must therefore have a period of at least 12.724 ns. Any period below this value gave a faulty result.

4.5 Timing and Area Analysis

The results of the timing and area analysis of the main modules are presented in terms of maximum operating frequency and logic cells (LC). The analysis was done using Quartus II software.
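The clock period and the maximum operating frequency quoted in these results are reciprocals of one another, so the figures can be cross-checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the original design flow; the function names `period_ns` and `meets_timing` are illustrative assumptions.

```python
def period_ns(f_max_mhz: float) -> float:
    """Minimum clock period in nanoseconds for a maximum frequency in MHz."""
    return 1000.0 / f_max_mhz


def meets_timing(clock_period_ns: float, f_max_mhz: float) -> bool:
    """True when the applied clock is no faster than the reported f_max."""
    return clock_period_ns >= period_ns(f_max_mhz)


if __name__ == "__main__":
    # 78.59 MHz is the f_max reported for EP20K200EFC672-1X.
    print(f"minimum period: {period_ns(78.59):.3f} ns")
    # 13 ns is the clock period used in the timing simulation.
    print("13 ns clock meets timing:", meets_timing(13.0, 78.59))
    # A hypothetical 12 ns clock, shorter than the minimum period.
    print("12 ns clock meets timing:", meets_timing(12.0, 78.59))
```

The 13 ns clock used in the timing simulation leaves a small margin over the 12.724 ns minimum period, which is consistent with the correct results reported for that setting.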
The devices chosen for the implementation were EP20K200EFC672-1X of the APEX20KE family and EPF10K100EFC484-1 of the FLEX10KE family, and the two devices were compared. Tables 2 and 3 show the effect of register and logic cell duplication in EP20K200EFC672-1X and EPF10K100EFC484-1, respectively, when the full 3DES architecture was mapped into them. To allow or suppress this duplication, the 'Remove Duplicate Registers' and 'Remove Duplicate Logic' settings were deselected or selected.

When the 'Remove Duplicate Registers' and 'Remove Duplicate Logic' settings were selected during the hardware implementation of the encryption module in EP20K200EFC672-1X, the result was a lower area utilization of 1984 logic cells and a lower maximum operating frequency of 72.77 MHz. When these settings were deselected, a higher area utilization of 2111 logic cells and a higher maximum operating frequency of 78.59 MHz were obtained. However, this was not the case when EPF10K100EFC484-1 was used: selecting the 'Remove Duplicate Registers' and 'Remove Duplicate Logic' settings resulted in lower area utilization but a higher maximum operating frequency.

Table 4 shows the synthesis results for the final design of the project. Two devices were used, namely EP20K200EFC672-1X and EPF10K100EFC484-1. EP20K200EFC672-1X is the larger device of the two.
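The effect of the two settings can be expressed as relative changes in area and maximum frequency. The following is a minimal Python sketch using the figures reported in Tables 2 and 3; the helper name `relative_change` and the dictionary layout are illustrative assumptions, not part of the original work.

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change going from `before` to `after`."""
    return (after - before) / before * 100.0


# (logic cells, f_max in MHz) with duplicate removal on and off,
# as reported in Tables 2 and 3.
results = {
    "EP20K200EFC672-1X": {"on": (1984, 72.77), "off": (2111, 78.59)},
    "EPF10K100EFC484-1": {"on": (1924, 57.14), "off": (2080, 56.82)},
}

if __name__ == "__main__":
    for device, r in results.items():
        (lc_on, f_on), (lc_off, f_off) = r["on"], r["off"]
        print(f"{device}: area {relative_change(lc_on, lc_off):+.1f}%, "
              f"f_max {relative_change(f_on, f_off):+.1f}%")
```

On these figures, deselecting the settings costs roughly 6% more logic cells for roughly 8% more speed on EP20K200EFC672-1X, while on EPF10K100EFC484-1 it costs roughly 8% more logic cells and the maximum frequency drops slightly.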
Table 2: Effect of register and logic cell duplication in EP20K200EFC672-1X

Settings*   Area (LC)            Clock period (ns)   Maximum operating frequency (MHz)
On          1984 / 8320 (23%)    13.742              72.77
Off         2111 / 8320 (25%)    12.724              78.59
* 'Remove Duplicate Registers' and 'Remove Duplicate Logic'

Table 3: Effect of register and logic cell duplication in EPF10K100EFC484-1

Settings*   Area (LC)            Clock period (ns)   Maximum operating frequency (MHz)
On          1924 / 4992 (38%)    17.5                57.14
Off         2080 / 4992 (42%)    17.6                56.82
* 'Remove Duplicate Registers' and 'Remove Duplicate Logic'

Table 4: Synthesis results

Device               Area (LC)            Clock period (ns)   Maximum operating frequency (MHz)
EP20K200EFC672-1X    2111 / 8320 (25%)    12.724              78.59
EPF10K100EFC484-1    1924 / 4992 (38%)    17.5                57.14

When the larger device was used, the final design had a higher maximum operating frequency of 78.59 MHz and utilized more logic cells. When the smaller device from the FLEX10KE family was used, the maximum operating frequency was only 57.14 MHz, and the design used only 1924 logic cells, fewer than the 2111 logic cells used in EP20K200EFC672-1X.

It can therefore be concluded that mapping the design architecture onto different devices can result in different maximum operating frequencies and area utilization. In this comparison, the larger device gave a higher maximum operating frequency and a larger area utilization. As such, a careful decision must be made on whether faster operation or a smaller device is required.

Figure 4 shows the RTL view of the core entity. The core was formed by three smaller entities, namely s_mac, key_block and inp_block, each with its own function. s_mac controlled and synchronized the operations of the other two entities, while key_block processed the three keys, producing the sub-keys needed and sending them to inp_block. inp_block was the entity where the actual encryption and decryption of the plaintext occurred.
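The data rate reported for the core (102.56 Mbps) can be related to the clock period with a simple block-throughput formula. The following is a minimal Python sketch under stated assumptions: the 64-bit block size is a property of DES, but the figure of 48 clock cycles per block (three DES passes of 16 rounds, one round per cycle) is an assumption that is not stated in this section, and `throughput_mbps` is a hypothetical helper name.

```python
BLOCK_BITS = 64  # DES and 3DES operate on 64-bit blocks


def throughput_mbps(clock_period_ns: float, cycles_per_block: int) -> float:
    """Data rate in Mbps for an iterative core handling one block at a time."""
    bits_per_ns = BLOCK_BITS / (cycles_per_block * clock_period_ns)
    return bits_per_ns * 1000.0  # 1 bit/ns equals 1000 Mbit/s


if __name__ == "__main__":
    # Assumption: 3 DES passes x 16 rounds, one round per clock cycle.
    cycles = 3 * 16
    # 13 ns is the clock period used in the timing simulation.
    rate = throughput_mbps(13.0, cycles)
    print(f"estimated rate: {rate:.2f} Mbps")
    # Ratio of the reported hardware and software data rates.
    print(f"speed-up over software: {102.56 / 42.9:.1f}x")
```

With a 13 ns clock and 48 cycles per block, the estimate comes to about 102.56 Mbps, which agrees with the reported figure; this agreement is suggestive only, since the cycle count is an assumption. The ratio of the reported hardware and software rates is about 2.4.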
Table 5 shows the comparison of the performance of the hardware and software implementations of 3DES.

Table 5: Comparisons between hardware and software implementation

                   3DES (FPGA)   3DES (software)
Key size (bits)    192           192
Data rate (Mbps)   102.56        42.9

Triple DES was implemented on the FPGA as well as in MATLAB on an Intel Pentium III 866 MHz machine. The comparison shows that the 3DES hardware was significantly (2.4 times) faster than its software counterpart. The 3DES software could only manage a data rate of 42.9 Mbps, compared to 102.56 Mbps for the 3DES hardware.

Fig. 4: RTL view of core entity

5. Conclusions

The hardware implementation of a 3DES encryption engine on an FPGA chip was realized. The chip selected was EP20K200EFC672-1X of the APEX20KE family. It could encrypt data at a rate of 102.56 Mbps, with a maximum operating frequency of 78.59 MHz and an area utilization of 2111 logic cells.

The throughput of 102.56 Mbps in the current full implementation of the 3DES core can be considered low by industry standards. To improve the throughput of the design, pipelining of the iteration process can be implemented. Registers can be added to store data during the pipelining process. This will invariably reduce the maximum clock frequency; however, the number of clock cycles used for one complete 3DES operation can be greatly reduced, thus reducing the latency.

To allow a more secure encryption process, additional 3DES operation modes can be added to the core module. Currently, the encryption hardware only operates in TECB mode. By including more modes of operation, users can choose the mode that suits their needs.