Informática 38 (2014) 115-124 115 A Fast Chaos-Based Pseudo-Random Bit Generator Using Binary64 Floating-Point Arithmetic Michael François INSA Centre Val de Loire, Univ. Orléans, LIFO EA 4022, Bourges, France E-mail: michael.francois@insa-cvl.fr David Defour and Christophe Negre Univ. Perpignan Via Domitia, DALI F-66860, LIRMM UMR 5506 F-34095, Perpignan, France E-mail: {david.defour,christophe.negre}@univ-perp.fr Keywords: PRBG, pseudo-random, logistic map, IEEE-754, cryptography Received: June 27, 2013 Chaos-based cryptography is widely investigated in recent years, especially in the field of random number generators. The paper describes a novel pseudo-random bit generator (PRBG) based on chaotic logistic maps. Three logistic maps are combined in the algorithmic process, and a block of 32 random bits is produced at each iteration. The binary64 double precision format is used according to the IEEE 754-2008 standard for floating-point arithmetic. This generator provides a considerable improvement of an existing generator in the literature. Rigorous statistical analyses are carefully conducted to evaluate the quality and the robustness of the PRBG. The obtained results showed the relevance of the proposed generator, which is suitable even for real-time applications. Povzetek: V clankuje opisan hitri psevdo-nakljucni generator za kriptiranje. 1 Introduction The generation of pseudo-random bits (or numbers) plays a critical role in various applications such as: statistical mechanics, numerical simulations, gaming industry, communication systems, cryptographic protocols and many others [1]. In practice, the generation of such numbers with randomness properties is an open problem and continues to be investigated. There are two main classes of generators: software and physical generators. For the software generators, the term "pseudo-random" is applied to indicate that, the generator is defined as an algorithm allowing to produce sequences of bits with randomness properties. From a single initial seed, these generators will always produce the same outputs. The assets of such generators are: a fast execution time, repeatability and reproducibility of the pseudo-random sequences. The second class of generators exploits physical random phenomena for the generation, but is not discussed here. Some basic techniques are often used for generating pseudo-random numbers, such as: linear recurrence [2], non-linear congruence [3], linear feedback shift register (LFSR) [4], cellular automata [5], discrete logarithm problem [6], quadratic residuosity problem [7], etc. Generally, the security of a cryptographic generator is based on the difficulty to solve the related mathematical problem. Beyond the security, such kind of generator is sometimes too slow due to heavy computational instructions. For example, the Blum Blum Shub algorithm [7] has a security proof, assuming the computational difficulty of the quadratic residuosity problem. The algorithm is also proven to be secure, relatively to the difficulty of integer factorization problem. However, the generator is impractical unless extreme security is needed. The Blum-Micali algorithm [6] presents also an unconditional security proof based on the difficulty of the discrete logarithm problem, but is also ineffective. One interesting way to design pseudo-random generators can be found in chaos theory [8,9,10]. Indeed, chaotic systems are characterized by their high sensitivity to initial parameters and some properties like ergodicity, mixing property and high complexity [8,11]. A secret parameter should be sensitive enough to ensure the so-called avalanche property. A small deviation in the initial conditions should cause a large modification in the output, that makes chaotic systems very attractive for pseudo-random number generation. These generators commonly use chaotic logistic maps and produce large pseudo-random sequences. For a high security level, it is necessary to combine several logistics maps, in order to increase the complexity of the cryptosystem. But, this is not always sufficient, because a rigorous analysis is more appropriate to evaluate the randomness level and the global security of the generator. In this paper, a new PRBG combining three chaotic logistic maps is presented. It provides a significant improvement on security and performance, of the generator proposed by Patidar et al. [12]. The proposed algorithm uses the binary64 floating-point arithmetic and produces at each 116 Informatica 38 (2014) 115-124 M. François et al. iteration a block of 32 random bits. The pseudo-random sequences passed successfully the various statistical tests related to the randomness and correlation. The assets of the PRBG are: high sensitivity to initial seeds, high level of randomness and fast execution time. The paper is structured as follows, in Sec. 2 the used chaotic logistic map and the description of the Patidar's algorithm are given. Section 3, presents a detailed description of our algorithm and a brief discussion about the floating-point representation. The statistical analysis is given in Sec. 4. The security aspect of the PRBG is discussed in Sec. 5, before concluding. 2 Background 2.1 The chaotic logistic map Frequently used in chaos theory as well as in chaos-based cryptosystems, the form of the logistic map is given by: F(X) = ftX(1 - X), (1) with ft between 3.57 and 4.0 [13]. Its chaotic behavior has been widely studied and several generators have already used such logistic map for generating pseudo-random numbers [14, 15, 16, 17]. To avoid non-chaotic behaviour (island of stability, oscillations, ...), the value of ft should be near 4.0, which corresponds to a highly chaotic behaviour. The logistic map is used under the iterative form: X„+1 = ftX„(1 - Xn), Vn > 0, (2) where the starting seed X0 is a real number belonging to the interval ]0,1[. All the computed elements Xn are also real numbers in ]0,1[. 2.2 About the algorithm of Patidar Patidar et al. [12] have proposed a PRBG based on two logistic maps. The algorithm starts from random independent initial seeds X0, Y0, belonging to ]0,1[. The chosen value of ft is 4 and the two logistic maps are given by: Xn+1 = 4Xn(1 - Xn), Vn > 0, (3) Yn+1 = 4Yn(1 - Yn), Vn > 0. (4) The main idea of the algorithm is very simple and consists to compare the outputs of both the logistic maps in the following way: g(Xn+i,Yn+i) = 1 if X„+1 > Y„+1 0 i/Xn+i < Y„+1 Even if the idea is interesting, the algorithm presents several weaknesses: 1. Only one bit is generated after each iteration, that corresponds to a very low throughput according to the relevance of the logistic maps. 2. The sequences produced with nearby seed values are extremely correlated. 3. The seed space has a much lower entropy than 128, due to the existing correlation between the pseudorandom sequences. Therefore, the generator presents weak or degenerate seeds. 4. At a given iteration n, in the case of eventual collision between Xn+1 and Yn+1 (which is possible), the output bit will always be 0 until the end of the output sequence. The algorithm proposed in this paper also combines several chaotic logistic maps, but is designed to avoid all those weaknesses and then ensure a better security. 3 The proposed PRBG 3.1 Floating-point representation As we know, digital computers use binary digits to represent numbers. In the case of real numbers, there are two representation formats: fixed-point and floating-point formats. To represent integers or real numbers with a fixed precision, it is more suitable to adopt the first format. The second format can support a much wider range of values. Nowadays, the floating point arithmetic is standardized by IEEE/ANSI [18]. Two different floating-point formats are defined: single precision (binary32) and double precision (binary64). In this paper, we only focus on binary64 floating-point format, which is generally used to achieve a higher simulation precision for the study of chaotic systems. Binary64 has two infinities, two kinds of NaN (i.e. Not a Number) and the set of finite numbers. Each finite number is described by three fields: s a sign represented on one bit (1 indicating negative), e a biased exponent represented on 11 bits and m a mantissa represented on 52 bits (see Figure 1). The bits of the mantissa can be divided into two blocks of 20 bits and 32 bits and the treatment is applied on the block of 32 bits (mantissal). 3.2 Description of the algorithm As in the paper of Patidar et al., our algorithm uses the same type of chaotic logistic map given by Eq. 1. In our case, the value of ft is fixed to 3.9999 that corresponds to a highly chaotic case [19, 20]. Indeed, the Lyapunov exponent [21, 22] measures the chaotic behavior of a function and the corresponding Lyapunov exponent of the logistic map for ft = 3.9999 is 0.69 very close to its maximum which is 0.59. The chaotic logistic map is used under the iterative form: Xn+1 = 3.9999Xn(1 - Xn), Vn > 0, (5) where the starting seed X0 is a real number that belongs to ]0,1[. All the computed elements Xn are also real numbers A Fast Chaos-Based Pseudo-Random Bit. Informatica 38 (2014) 115-124 117 63 62 -------- 52 51 □ exponent 32 31 0 mantissau mantissal slgn mantissa Figure 1: Floating-point representation in binary64 format. in ]0,1[. Our algorithm takes into account the various weaknesses of the algorithm proposed by Patidar et al. Thus, to have a large space of output sequences, three logistic maps are used during the generation process. The same value of fi is used for each one and the corresponding equations are: Xn+1 = 3.9999Xn(1 - Xn), Vn > 0, (6) Y„+1 = 3.9999Y„(1 - Y„), Vn > 0, (7) Zn+1 = 3.9999Zn(1 - Zn), Vn > 0. (8) For each computed value Xn, Yn and Zn, a binary64 floating-point representation is used as shown in Figure 1. The algorithmic principle is simple and consists at each iteration, to apply a xor operation on the 32 bits of mantissal of the three output elements Xn, Yn and Zn. Thus, the algorithm allows to produce 32 random bits per iteration and therefore increase the throughput of the generator. The operating principle of the algorithm is shown in Figure 2. As one can see, the seeds from which the generation process starts are Xk, Yk and Zk. Indeed, for nearby seed values, the elements Xn, Yn and Zn are almost identical in the first rounds. Thus, to completely decorrelate the beginning of the pseudo-random sequences, it is necessary to start the generation only at the kth iteration. The number of preliminary rounds k and the way to choose the initial seeds are presented in Sec. 3.3. The implementation of the algorithm in C language is simple: just include the file ieee754.h and use the defined functions for extracting the bits of mantissal for each computed element Xn, Yn and Zn. 3.3 The choice of initial parameters 3.3.1 Initial seed selection To improve the randomness quality of the generated sequences, the choice of the initial seed values should not be neglected. The coefficient values of the elements Xn, Yn and Zn, belong to ]0,1[. Due to symmetric structure of the logistic map, it is necessary to choose the starting seeds in one of the two half-intervals (here ]0, 2-1 [) to avoid similar trajectories. In binary64 floating-point format, the computed term (1 - X) is equal to 1.0 for any X in ]0,2-53[, then for a seed value in ]0,2-53[, the computed value of Eq. 2 is equivalent to fiXn. To avoid such problem, initial seed values must be chosen in ]2-53, 2-1 [. The three initial seeds must be different, then the difference J[2] between the values should be representable in bi- nary64. The value of ¿[2i is in the worst case 2-53, which corresponds to log10(253) (« 15.955) decimal digits. To have a significant difference we choose ¿[10] = 10-15, which corresponds to ¿[2] = 2-49 8289. Thus, to avoid identical trajectories, the difference between each initial seed should be at least ¿[2] = 2-49 8289. 3.3.2 Number of preliminary rounds In the case where the values of initial seeds (X0, Y0 and Z0) are very close, the beginnings of chaotic trajectories are almost similar. To avoid such problem, it is necessary to apply some preliminary rounds before starting to produce the random bits. Thus, it is necessary to see at which number of iterations, the difference J[2] begins to be propagated. We consider that the initial seed is X0 = J[2] = 2 49 8289, and the obtained trajectory with the Eq. 5 is shown in Figure 3. 1 0.8 e 0.6 X 0.4 0.2 0 20 40 60 80 Iteration n 100 120 140 Figure 3: Trajectory of the chaotic logistic map given in Eq. 5, for n = 135 and X0 = 2 -49.8289 One can see that, the trajectory starts to oscillate almost from the 30th iteration. Thus, the generation of random bits will begin from the iteration 30. That allows to decorrelate the outputs of the PRBG, and then increase the sensitivity related to the initial seeds. 4 Statistical analysis The quality of the output sequences produced by any PRBG is the crucial element. Indeed, the sequences should present individually a high level of randomness and be decorrelated with each other, whatever the initial seed values. Therefore, a statistical analysis should be carefully conducted to prove the quality of the pseudo-random sequences. 0 118 Informatica 38 (2014) 115-124 M. François et al. unused elements Xk- Yk- .10110 ... 01101 . 32-bit block I I Zn+1 Z z . ... z k Figure 2: The operating principle of the proposed PRBG. 4.1 Randomness evaluation The analysis consists in evaluating the randomness level of the sequences generated by the PRBG. In the literature, various statistical tests exist for analysing the randomness level of sequences. The NIST (National Institute of Standards and Technology of the U.S. Government) proposes a battery of tests that can be applied on the binary sequences [23]. One can also find other known libraries such as TestU01 [24] or the DieHARD suites [25]. Here, the sequences are evaluated through statistical tests suite NIST. Such suite consists in a statistical package of fifteen tests developed to quantify and to assess the randomness of binary sequences, produced by pseudo-random number generators. Here, we define three approaches for testing the randomness level of sequences. Let N be the total number of generated sequences and the binary size of each sequence is M = 32 x B, with B the number of 32-bit blocs. The three approaches are: 1. APP-1 (individual sequences): the produced sequences are individually tested and the results are given as ratio of success relatively to a threshold determined from the total number (N) of tested sequences. Such approach indicates the global randomness level of the tested sequences. 2. APP-2 (concatenated sequence): all the individual sequences are concatenated to form a new single sequence. The randomness level of the constructed sequence is analysed through the NIST tests. The constructed sequence should pass the tests whether the original sequences are truly decorrelated and random. 3. APP-3 (resulting sequences): all the sequences are superimposed on each other (forming a matrix), and new sequences are constructed from columns. Thus, B resulting sequences of binary size 32 x N are constructed, by collecting for each position 1 < j < B, the 32-bit bloc of each sequence. If the original sequences are really random, the resulting sequences should also be random (with B as large as N) and then pass the NIST tests. Such approach is very interesting, in the case of generating sequences by nearby seed values, and allows to detect some hidden linear structures between the original sequences. These approaches are used to analyse a subset of generated sequences. In the case of very distant initial seed values, the corresponding chaotic trajectories are different, and allow to produce good pseudo-random sequences. The worst case occurs when closed seed values are used, because that can lead to highly correlated output sequences. That is why, the analysis is achieved on sequences generated from nearby initial seed values. Here, a subset of N = 16000 pseudo-random sequences is produced, where the binary size of each sequence is 32000 (i.e. B = 1000). We choose arbitrarily, one starting seed value X0 = 0.24834264038461704925, and then Y0 = X0 + 6[2] and Zo = Yo + 6[2], with 6[2] = 2-49'8289. These three seeds allow to generate one pseudo-random sequence. The other sequences, are generated from X0, Y0 and by incrementing of 6[2] the last seed value Z0 in a simple loop. In the case of Patidar's algorithm, only two initial seeds are needed to produce a pseudo-random sequence. Here, the first two seeds values are given by: X0 = Y0 and Y0' = Z0. For generating the other sequences, the same strategy is applied and consists to make a loop by incrementing of 6[2] the seed Y0'. It should be noted that, for a better comparison, the same coefficient fi (i.e. 3.9999) is used for the logistic map in the Patidar's algorithm. The results of NIST tests obtained for the two algorithms are presented in Table 4.1 and Table 4.1. For approach APP-1 (resp. APP-3), the acceptable proportion should lie above 98.76% (resp. 98.00% ) and does not concern the tests Random Excursions-(Variant). For APP-2, a sequence passes a statistical test for pvaiue > 0.01 and fails otherwise. For the tests Non-Overlapping and Random Excursions-(Variant), only the smallest percentage of all sub-tests is given. For individual sequences, the Universal A Fast Chaos-Based Pseudo-Random Bit. Informatica 38 (2014) 115-124 119 test is not applicable due to the size of initial sequences. One can remark that, for the proposed PRBG, all the tested sequences pass successfully the NIST tests. These results show clearly the quality of the tested sequences. For Pati-dar's algorithm, individually, the sequences are not enough random, because there are many tests that are not successful, for example: Runs, Overlapping or even Serial tests. The results of approaches APP-2 and APP-3 show that, the tested sequences are extremely correlated. One should know that, in the article of Patidar et al., each sequence is produced from a randomly chosen initial seed belonging to ]0,1[, then the seeds were too different from each other. That is why this problem has not been detected. 4.2 Correlation evaluation 30 25 20 o 15 10 %2T Prop-PRBG i Pat-PRBG ■■ 0.2 0.4 0.6 Correlation coefficient value 0.8 Figure 4: Histogram of Pearson's correlation coefficient values on interval [-0.1,0.1] (resp. [-0.5,0.5]), for the proposed (resp. Patidar's) PRBG. A part of the correlation evaluation has already been done by applying the NIST tests (APP-2 and APP-3). Here, two additional methods are used to analyse the correlation between the pseudo-random sequences. Firstly, the correlation between sequences is evaluated globally by computing the Pearson's correlation coefficient [26] and secondly, by using the Hamming distance. 4.2.1 Pearson's correlation coefficient The analyse consists to compute the Pearson's correlation coefficient between each pair of sequences, and to present the distribution of the values through a histogram. Consider a pair of sequences such as: Si = [x0,..., xB_i] and S2 = [y0,... ,yB-1]. Therefore, the corresponding correlation coefficient is: 4.2.2 Hamming distance Another type of correlation based on the bits of produced pseudo-random sequences is analysed. Given two binary sequences S = [s0,..., sM_1] and S' = [s0,..., s'M_1] of same length (M), the Hamming distance is the number of positions where they differ. The distance is given as: M _1 E< j=0 d(s,s')=!>; e sj). (10) For truly random binary sequences, the value of d(S, S') should be around M/2, that corresponds to the proportion 0.50. This distance is computed between each pair of generated sequences (N = 16000), and all values are represented through a histogram. For the two algorithms, the histograms are shown in Figure 5. One can see that for our CSl,S2 B_1 J2 (xi - x) • (vi - y) i=0 Y, (xi - x)2 i=0 1/2 B_1 E (vi - v)2 i=0 1/2 (9) where xi and y are 32-bit integers, x J2 xi/B and i=0 80 70 60 in 50 >> 40 c n e S- 30