Original scientific paper

Informacije MIDEM
Journal of Microelectronics, Electronic Components and Materials
Vol. 45, No. 1 (2015), 39 - 46

A 5-Gbps CMOS Burst-Mode CDR Circuit With an Analog Phase Interpolator for PONs

Hadi Hayati, Mehdi Ehsanian

Faculty of Electrical Engineering, K. N. Toosi University of Technology, Tehran, Iran

Abstract: This paper presents a 5-Gb/s low-power burst-mode clock and data recovery (BM-CDR) circuit based on an analog phase interpolator (PI) for passive optical network (PON) applications. The proposed clock recovery unit consists of two double-edge triggered sample-and-holds (DT-SHs) and a phase interpolator. The PI instantaneously locks the recovered clock to the incoming burst-mode data using coefficients generated at the DT-SH outputs. To reduce the power dissipation of the clock recovery unit, each DT-SH uses only one buffer instead of two. The proposed PI-based BM-CDR has been designed and simulated in a standard 0.18-µm CMOS technology. The results show a 40% reduction in the power dissipation of the clock recovery unit. The proposed BM-CDR circuit retimes data at 5 Gb/s for a 2^10-1 pseudo-random binary sequence within the first UI. The recovered data shows a peak-to-peak jitter of 14 ps. The circuit, including the 1:2 data demux, draws 29 mW from a 1.8-V supply.

Keywords: Burst mode communications; Passive optical networks; Clock and data recovery; Phase interpolator; Sample and hold

Title (translated from Slovenian): 5-Gbps CMOS burst-mode circuit with an analog phase interpolator for PONs

Abstract (translated from Slovenian): The paper presents a 5-Gb/s burst-mode clock and data recovery circuit based on an analog phase interpolator for passive optical networks. The proposed clock recovery unit contains double-edge triggered sample-and-hold circuits (DT-SHs) and a phase interpolator. The PI instantaneously locks the recovered clock to the incoming data using coefficients generated at the DT-SH outputs. To lower the power consumption, only one buffer is used instead of two. The results show a 40% reduction in power consumption. The proposed circuit has been designed and simulated in 0.18-µm CMOS technology. The proposed BM-CDR circuit recovers data at 5 Gb/s for a 2^10-1 pseudo-random sequence within the first UI. The recovered data shows a jitter of 14 ps (pp). The power consumption at a supply voltage of 1.8 V is 29 mW.

Keywords (translated from Slovenian): burst-mode communications; passive optical networks; clock and data recovery; phase interpolator

* Corresponding Author's e-mail: hhayati@ee.kntu.ac.ir

1 Introduction

Optical access networks have been widely used for the development of broadband services. Fiber-to-the-home (FTTH), one of the typical services of optical access networks, has grown rapidly [1]. The passive optical network (PON) architecture is considered an effective solution for such networks [2]. A PON is based on a point-to-multipoint configuration in which the receiver in the optical line terminal (OLT) must deal with packets of different amplitudes and phases transmitted from the optical network units (ONUs). An optical splitter, which needs no electrical power, splits the bit streams to the ONUs and multiplexes the traffic flows from the ONUs [1]. Downstream traffic from the OLT is transmitted to all ONUs in continuous mode (CM), and each ONU selects the traffic addressed to itself. In the upstream path, the ONUs transmit data to the OLT using time-division multiple access (TDMA), which assigns a time slot to each ONU [3]. The time between packets is therefore short, and the OLT in the central office requires a clock and data recovery (CDR) circuit capable of fast data regeneration. Fig. 1 illustrates a PON system. This type of communication, in which data is transmitted to the receiver in burst packets, is called burst-mode (BM).
In synchronous optical networks (SONET), the jitter transfer function is an important parameter, and SONET has stringent jitter specifications. In a PON, however, where the transfer mode is asynchronous [2], the loop bandwidth can be traded off against jitter to obtain fast locking. In burst-mode communications, the burst-mode receiver (BM-Rx) needs a BM-CDR with an instantaneous locking method in order to deal with burst data. Accordingly, several approaches have been proposed to build BM-CDRs with a short phase acquisition time. For instance, a BM-CDR based on injection locking is proposed in [4], but its potential to lose lock under process, voltage, and temperature variations is unacceptable in high-speed circuits. A BM-CDR based on gated voltage-controlled oscillators (GVCOs) is presented in [5], but it suffers from mismatch between the VCO frequencies and the incoming data rate, resulting in a low tolerance to consecutive identical digits (CIDs). A single-GVCO approach is presented in [6]. The oversampling technique reported in [7,8] is capable of fast phase locking; however, its power dissipation is remarkably high. Another approach employs broad-band phase-locked loops (PLLs), which require a high bandwidth for instantaneous locking, so jitter is not well suppressed; the work in [9] addresses a PLL-based BM-CDR. An approach based on a phase interpolator has recently been proposed [10]; it offers several advantages over the previous methods, such as fast acquisition time and low power dissipation.

Figure 1: Application of burst-mode CDR in a passive optical network

In this paper we propose a novel phase interpolator (PI)-based BM-CDR circuit with a new structure for the double-edge triggered sample-and-hold (DT-SH). The DT-SH uses one buffer shared between two single-edge triggered S/Hs (ST-SHs) to increase the speed and reduce the power.
This paper is organized as follows: Section 2 describes the architecture of the BM-CDR and the design and analysis of each building block. Section 3 presents simulation results. Discussion and conclusions are given in Section 4.

2 BM-CDR Architecture

Fig. 2 shows the block diagram of the proposed PI-based burst-mode CDR circuit. The input data is applied to the DT-SHs, which sample and hold the quadrature clocks at both the rising and falling edges of the data. Each DT-SH consists of two ST-SHs followed by a buffer that is shared between them. The DT-SHs provide the sampled values S1 and S2 to the PI, which recovers the clock at the output of the clock recovery unit. In this way, the rising edge of the recovered clock is placed at the midpoint of the incoming burst data, so that optimum sampling is achieved in the flip-flop. Next, the frequency divider derives a 2.5-GHz clock from the recovered clock in order to generate de-multiplexed data at a rate of 2.5 Gb/s. In the following, we first analyze the principle of the analog phase-interpolation technique and then present the proposed structure of the DT-SH.

Figure 2: Block diagram of the proposed BM-CDR

2.1 Phase Interpolator

An important consideration for phase interpolators is their control, which can be either digital or analog. As explained in [11], a digitally controlled PI degrades the jitter performance of the CDR because of its low phase resolution and speed limitation. Hence an analog PI is used in this work. A digitally controlled PI in a continuous-mode CDR is reported in [12].

Assume that two quadrature clock signals, CK_I = sin(2πft) and CK_Q = -cos(2πft), are applied to the phase interpolator circuit of Fig. 3. The PI circuit weights these two signals and sums them, producing a clock at the output with the same frequency as the quadrature clocks.
In fact, the summation is performed in the PI core: the output currents of the differential stages are summed on the load resistors. To obtain a phase range from 0° to 360°, four differential stages, instead of two, are used in the schematic of the PI. Alternatively, both positive and negative weighting factors could be used to provide a 360° phase range, but this method requires switching of the input clocks and therefore produces additional jitter. The phase and amplitude of the interpolated signal are controlled by the current sources of the PI, which realize the weighting factors.

Figure 3: PI schematic

It is desired to have a recovered clock whose falling edge is aligned with the data transition, so that the decision circuit samples the input data at its midpoint on the rising edge of the recovered clock. Every data transition samples and holds CK_I and CK_Q and thereby provides the voltage levels for the tail currents of the phase interpolator circuit. A data transition at t = t_0 results in the PI coefficients S1 and S2. The recovered clock at the output of the PI is given in Eq. (1), which describes a clock whose falling edge coincides with the data transition [10]. With CK_Q defined as -cos(2πft), the quadrature term enters with a negative sign, which in a differential PI corresponds to using the complementary phase of CK_Q.

$$\text{Recovered CK} = S_1 \cdot CK_I - S_2 \cdot CK_Q \tag{1}$$
$$CK_I = \sin(2\pi f t) \tag{1.a}$$
$$CK_Q = -\cos(2\pi f t) \tag{1.b}$$

In the presence of a data transition, the sampled values of the input quadrature clocks at t_0 yield the coefficients below:

$$CK_I(t = t_0) = \sin(2\pi f t_0) = S_2 \tag{2.a}$$
$$CK_Q(t = t_0) = -\cos(2\pi f t_0) = S_1 \tag{2.b}$$

Substituting the above coefficients in Eq. (1) yields the recovered clock below:

$$\text{Recovered CK} = -\cos(2\pi f t_0)\,\sin(2\pi f t) + \sin(2\pi f t_0)\,\cos(2\pi f t) = -\sin\{2\pi f (t - t_0)\} \tag{3}$$

This shows that the falling edge of the clock is aligned with every data transition, while its rising edge samples the jittered data at its midpoint in the decision circuit. Ideally, any change in the data transition time causes a proportional change in the DT-SH outputs and, subsequently, in the PI output phase.
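As a numerical sanity check of Eq. (3), the following minimal Python sketch samples the quadrature clocks at a data transition t0 and rebuilds the clock from the two held coefficients. It assumes ideal sinusoidal clocks with CK_I = sin(2πft) and CK_Q = -cos(2πft), S1 taken from CK_Q and S2 from CK_I, and the sign convention S1·CK_I - S2·CK_Q, which is the combination that reproduces -sin{2πf(t - t0)}. The function names and the 37-ps transition time are illustrative only and are not taken from the paper.

```python
import math

F_CLK = 5e9  # clock frequency in Hz, equal to the 5-Gb/s bit rate


def ck_i(t):
    """In-phase clock, CK_I = sin(2*pi*f*t)."""
    return math.sin(2 * math.pi * F_CLK * t)


def ck_q(t):
    """Quadrature clock, CK_Q = -cos(2*pi*f*t)."""
    return -math.cos(2 * math.pi * F_CLK * t)


def sample_coefficients(t0):
    """Hold both clocks at the data transition t0: S1 from CK_Q, S2 from CK_I."""
    return ck_q(t0), ck_i(t0)


def recovered_clock(t, s1, s2):
    """Weighted sum formed by the PI (assumed sign convention, see lead-in)."""
    return s1 * ck_i(t) - s2 * ck_q(t)


def max_identity_error(t0, points=1000):
    """Largest gap between the PI output and -sin(2*pi*f*(t - t0)) over two periods."""
    s1, s2 = sample_coefficients(t0)
    period = 1.0 / F_CLK
    worst = 0.0
    for k in range(points):
        t = 2.0 * period * k / points
        ideal = -math.sin(2 * math.pi * F_CLK * (t - t0))
        worst = max(worst, abs(recovered_clock(t, s1, s2) - ideal))
    return worst


if __name__ == "__main__":
    t0 = 37e-12  # hypothetical data-transition time in seconds
    s1, s2 = sample_coefficients(t0)
    print("max error vs. Eq. (3):", max_identity_error(t0))
    print("clock value at the transition:", recovered_clock(t0, s1, s2))
```

Under these assumptions the output crosses zero going negative at t0, so the rising edge falls half a period later (100 ps at 5 GHz), which is the middle of the bit.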
In order to have a linear phase characteristic at the PI output, the control signals S1 and S2 must be sinusoidal, and Eq. (4) must be satisfied for any values of the coefficients [11].

$$S_1^2 + S_2^2 = \text{const.} \tag{4}$$

Equation (4) corresponds to the circle in Fig. 4(b). In practice, however, applying sinusoidal control signals to the PI tails is not easy; hence, as seen in Fig. 4(b), a triangular approximation can be used. This approximation mainly affects the amplitude of the recovered clock, while its phase stays approximately the same as the ideal one.

Figure 4a: The use of quadrature clocks for sampling
Figure 4b: Weighting factors S1 and S2 for an ideal PI (dashed line) and the practical one (solid line)

2.2 Double-edge Triggered Sample and Hold

In the PI-based BM-CDR, the sample-and-hold plays a critical role in the performance of the overall architecture. In the block diagram of Fig. 2, two double-edge triggered S/Hs (DT-SHs) are used, in which the input data samples and holds the quadrature clocks, producing voltage levels that drive the current sources of the PI. In this work we propose a new technique in which the two ST-SHs of each DT-SH share a single buffer, so that the values sampled at the rising and falling edges of the data are brought onto one common path and the power otherwise spent in each buffer's idle mode is saved. In this way the power dissipation of the DT-SHs is half that of the conventional design in [10], in which each DT-SH employs two buffers. Since the power consumption of the DT-SHs is relatively high, this considerably reduces the overall power dissipated in the clock recovery unit. Fig. 5(a) shows the proposed DT-SH with the shared buffer highlighted. The schematic of the sampling switch is depicted in Fig. 5(b). The sampled values are given below; each has a differential value, as illustrated in the model of Fig. 4(a).
$S_1^{UP}$: CK_Q sampled at the rising edge of data at t_0
$S_1^{DN}$: CK_Q sampled at the falling edge of data at t_1
$S_2^{UP}$: CK_I sampled at the rising edge of data at t_0
$S_2^{DN}$: CK_I sampled at the falling edge of data at t_1

To reduce charge injection, a differential structure is used for the ST-SHs. In addition, a dummy transistor is used to alleviate charge injection. Since the clock frequency is ideally equal to the data bit rate, the samples taken at the rising and falling edges of the data are the same. Thus, as illustrated in Fig. 4(a), we have:

$$S_1^{UP} = S_1^{DN} = S_1, \qquad S_2^{UP} = S_2^{DN} = S_2$$

To achieve a high sampling rate and a short acquisition time, an open-loop sample-and-hold configuration is of interest here. At high sampling rates, a larger switch and a smaller hold capacitance are needed, which makes charge injection worse and is likely to result in a large error voltage. Both the device size and the hold capacitance directly affect the error voltage, creating variations in the sampled values at the outputs of the sampling switches. Consequently, the tail currents in the PI vary, resulting in poor jitter performance of the recovered clock. By sizing the dummy switch at half the size of the main switch, the error voltage caused by charge injection can be effectively removed [13]. Moreover, the fully differential design makes the circuit robust against noise and other non-idealities, so this shortcoming can be eliminated as well.

Figure 5a: Double-edge triggered S/H (DT-SH): Sampling of CK_I at both rising edge and falling edge of data
Figure 5b: Circuit implementation of the sampling switch
Figure 6a: CML latch
Figure 6b: Frequency divider: divide-by-two

The lock time is the time from the arrival of the first bit of a new data packet until the recovered clock locks to that packet. In our work, the lock time is significantly affected by the buffer's load resistor and the parasitic capacitances of the devices.
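The dependence of lock time on the load resistor and the parasitic capacitance can be illustrated with a first-order RC model. The sketch below is only a rough illustration: the single-pole assumption, the 5% settling tolerance, and the component values (400 Ω, 60 fF, 30 fF) are hypothetical and are not taken from the paper.

```python
import math


def settling_time(r_load_ohm, c_par_farad, tolerance=0.05):
    """Time for a single-pole RC node to settle to within `tolerance` of its final value."""
    tau = r_load_ohm * c_par_farad  # time constant in seconds
    return tau * math.log(1.0 / tolerance)


if __name__ == "__main__":
    ui = 1.0 / 5e9  # one unit interval at 5 Gb/s, in seconds
    # Hypothetical values: the same load resistor with more or less parasitic capacitance.
    for label, c_par in (("larger parasitic load", 60e-15), ("smaller parasitic load", 30e-15)):
        t_settle = settling_time(400.0, c_par)
        print(f"{label}: {t_settle * 1e12:.1f} ps ({t_settle / ui:.2f} UI)")
```

In this simplified model the settling time scales linearly with the capacitance at the node, so any reduction in parasitic loading shortens the settling time in the same proportion.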
In the previous work, each DT-SH connected to the PI contains two buffers, which may increase the parasitic effects and hence the time constant. With the shared-buffer technique, the time constant of the circuit can be reduced, making the lock time shorter.

The DT-SH of Fig. 5(a) is purposely configured to work with current-mode logic (CML) data, so a CML-to-CMOS data converter is not required at the input stage of the system. Transmission gates (TGs) are used to reject the track-mode signals and pass the hold-mode signals coming from the sampling switches. As shown in Fig. 5(a), when the main transistor in the sampling switch block is in the sampling mode, the TG controlled by the input data is off. In each pair of TGs, on the left and on the right of the DT-SH, the two TGs are never turned on at the same time, so a short circuit at the TG outputs cannot occur. The parasitic capacitances of the transistors in the sampling switches form the hold capacitance C_H, so no additional capacitor is needed.

2.3 Flip-flop, Frequency Divider and De-multiplexer

The received burst data is sampled using a CML D flip-flop consisting of master and slave latches. Fig. 6(a) shows the schematic of the CML latch used in the proposed BM-CDR. The flip-flop is clocked at 5 GHz by the recovered clock produced by the PI. CML flip-flops are known to consume more power than their CMOS counterparts; on the other hand, they are faster and perform better in high-speed circuits. Therefore a CML flip-flop is used in this design. The CML latch shown in Fig. 6(a) comprises two stages. The first stage, composed of a differential pair, forms a sampler, and the second, composed of a cross-coupled pair, forms a hold stage. These two stages have different time constants.
Both time constants must be carefully chosen to satisfy the setup-time and hold-time requirements of the D flip-flop [14]. The architecture of Fig. 2 includes a 1-to-2 data de-mux, which de-serializes the retimed serial data into parallel data. The 5-Gb/s data is de-multiplexed by means of the latches used in the flip-flop, driven by a half-rate clock [15]. The divider circuit provides a 2.5-GHz clock with the same phase as the recovered clock. Fig. 6(b) illustrates the topology of the CML divide-by-two circuit to which the recovered clock is applied.

3 Simulation Results

The CDR/deserializer has been simulated in 0.18-µm CMOS technology with a 1.8-V supply. The functionality of the system is demonstrated by applying 5-Gb/s CML pseudo-random binary sequence (PRBS) data of length 2^10-1 bits to the proposed PI-based BM-CDR. Instantaneous phase locking is achieved, as the recovered clock locks to the input data within the first unit interval (UI). The output of the clock recovery unit is buffered so that it can drive both the decision circuit and the frequency divider of the next stage. Because of the delay in the DT-SHs and PI circuitry, and the additional delay of the buffers that follow the clock recovery unit, a couple of buffer stages have been added in the decision circuit path to compensate for the delay in the recovered clock path. Optimum sampling is therefore achieved from the first bit of the incoming data.

Figure 7: Power dissipation in clock recovery unit (dark area denotes saved power in this work in comparison with the work in [10])
Figure 8a: Instantaneous phase locking in response to burst-mode data
Figure 8b: Recovered clock in response to 2^10-1 PRBS continuous-mode data (horizontal axis: Time (ns))

Since the power consumption of an ST-SH is essentially that of its buffer, sharing the buffer halves the power dissipation of the proposed DT-SH.
According to the simulation results, the clock recovery unit consumes 4.26 mW in the non-shared buffer scenario, whereas in our work it consumes only 2.54 mW. For clarity, the power consumption of each building block in the clock recovery unit is depicted in Fig. 7. The PI accounts for only a small part of the power of the clock recovery unit; most of the power is dissipated in the DT-SHs. The shared-buffer technique saves more than 40% of the power of the clock recovery unit compared with using two buffers for each DT-SH (the non-shared buffer design used in [10]).

Figure 8c: Eye diagram of retimed data at 5 Gb/s in response to 2^10-1 PRBS continuous-mode data (horizontal axis: Time (ns))
Figure 9: Simulated characteristics of the PI (horizontal axis: Input phase (deg.))

Fig. 8(a) illustrates the instantaneous phase locking of the recovered clock to the incoming burst-mode data. Note the phase locking in the presence of a new data packet with a different phase, which is highlighted in the diagram. Fig. 8(b) shows the recovered clock in response to continuous-mode PRBS data of length 2^10-1 bits. As shown in Fig. 8(c), recovered data with an eye opening of 1.2 V is achieved. The simulated peak-to-peak jitter of the recovered clock and data is 9 ps and 14 ps, respectively.

Table 1: BM-CDR performance summary and comparison with other works

Parameters           [4]*                [6]*          [7]*                [10]*                      This work**
Supply voltage       1.5 V               3.3 V/1.8 V   N/A                 1.2 V                      1.8 V
Process technology   90 nm               250 nm        130 nm              65 nm                      180 nm
Data rate (Gb/s)     20                  10.3125       10.3125             1-6                        5
Method               Injection locking   GVCO          Oversampling        PI                         PI
Phase locking time   1 UI                1 UI          N/A                 1 UI                       1 UI
Power consumption    102 mW (CDR core)   856 mW        5.8 W (entire Rx)   22 mW (CDR core + demux)   29 mW (CDR core + demux)

* Fabricated  ** Simulated

Figure 10: Peak-to-peak jitter of the recovered clock for process and temperature variations (horizontal axis: Temperature (°C))
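The headline numbers above can be cross-checked with simple arithmetic. The short Python sketch below recomputes the power saving from the two reported clock-recovery-unit figures and expresses the reported jitter as a fraction of the 200-ps unit interval at 5 Gb/s. The helper names are illustrative; only the input values come from the text.

```python
NON_SHARED_MW = 4.26   # clock recovery unit power with two buffers per DT-SH
SHARED_MW = 2.54       # clock recovery unit power with the shared buffer
BIT_RATE_HZ = 5e9      # data rate
CLOCK_JITTER_PS = 9.0  # simulated peak-to-peak jitter of the recovered clock
DATA_JITTER_PS = 14.0  # simulated peak-to-peak jitter of the recovered data


def power_saving_percent(before_mw, after_mw):
    """Relative power reduction in percent."""
    return 100.0 * (before_mw - after_mw) / before_mw


def jitter_in_ui(jitter_ps, bit_rate_hz):
    """Express a jitter value in picoseconds as a fraction of one unit interval."""
    ui_ps = 1e12 / bit_rate_hz
    return jitter_ps / ui_ps


if __name__ == "__main__":
    print(f"power saving: {power_saving_percent(NON_SHARED_MW, SHARED_MW):.1f} %")
    print(f"clock jitter: {jitter_in_ui(CLOCK_JITTER_PS, BIT_RATE_HZ):.3f} UI")
    print(f"data jitter:  {jitter_in_ui(DATA_JITTER_PS, BIT_RATE_HZ):.3f} UI")
```

The saving works out to about 40.4%, consistent with the "more than 40%" stated in the text, and the 14-ps data jitter corresponds to 0.07 UI.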
The PI characteristic has been simulated and is plotted in Fig. 9. The phase of the input quadrature clocks has been shifted from 0° to 180°, and the recovered clock shows a maximum deviation of 11.7° from the ideal interpolation at 5 GHz. Corner-case simulations have also been performed; the recovered clock shows a peak-to-peak jitter of 22.5 ps in the slow-slow (SS) corner and 27.5 ps in the fast-fast (FF) corner. Fig. 10 shows the simulated peak-to-peak jitter of the recovered clock versus process and temperature variations. According to the simulation results, the proposed BM-CDR circuit consumes 29 mW from a 1.8-V supply. The performance of the system and a comparison with other works are summarized in Table 1. Although simulation results are not directly comparable with experimental results, the concept of the proposed power-reduction technique is confirmed.

4 Discussion and Conclusions

Meeting the requirement for instantaneous-locking burst-mode CDR circuits at high data rates in passive optical networks is challenging. This work presents a PI-based burst-mode clock and data recovery circuit aimed at reducing the power consumption of the clock recovery unit in high-speed multi-access networks. The architecture benefits from the fact that the system is able to work with CML data coming directly from the post amplifier (LA) of the previous stage; therefore a CML-to-CMOS data converter is not required. Moreover, since designing high-speed buffers is challenging and such buffers are power hungry, halving the number of buffers used in the DT-SH considerably alleviates the power consumption and speed limitations. By sharing a buffer between the two ST-SHs of the DT-SH block, we showed that a reduction of approximately 40% in the power consumption of the clock recovery unit is achieved. The results verify that the recovered clock locks to the incoming burst data within the first UI.
With the shared buffer, linear values are observed at the DT-SH output, and the simulated PI characteristic deviates by at most 11.7° from the ideal interpolation. The functionality of the system is verified in the presence of process and temperature variations. The proposed BM-CDR consumes 29 mW from a 1.8-V supply. The circuit retimes jittered input data and achieves a peak-to-peak jitter of 14 ps at 5 Gb/s.

Acknowledgement

This work was supported by K. N. Toosi University of Technology.

5 References

1. Nakamura M, Ueda H, Makino S, Yokotani T, Oshima K. Proposal of networking by PON technologies for full and Ethernet services in FTTx. J Lightwave Technol 2004;22:2631-40.
2. Cooper IR, Bramhall MA. ATM passive optical networks and integrated VDSL. IEEE Commun Mag 2000;38:174-9.
3. Kim HG, Lee HJ. A new burst-mode clock recovery technique for optical passive networks. Int J Electron Commun 2010;64:339-43.
4. Lee J, Liu M. A 20Gb/s burst-mode CDR circuit using injection-locking technique. IEEE J Solid-State Circuits 2008;43:619-30.
5. Han PS, Choi WY. 1.25/2.5-Gb/s dual bit-rate burst-mode clock recovery circuits in 0.18-µm CMOS technology. IEEE Trans Circuits Syst II, Exp Briefs 2007;54:38-42.
6. Terada J, Nishimura K, Kimura S, Katsurai H, Yoshimoto N, Ohtomo Y. A 10.3125Gb/s burst-mode CDR circuit using a ΔΣ DAC. In: IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers; 2008 Feb 3-7; San Francisco, CA. New York: IEEE; c2008. p. 226-27.
7. Suzuki N, Nakura K, Kozaki S, Tagami H, Nogami M, Nakagawa J. 82.5 Gsample/s (10.3125 GHz × 8 phase clocks) burst-mode CDR for 10G-EPON systems. Electron Lett 2009;45:1261-3.
8. Ossieur P, Bauwelinck J, Yin X, Melange C, Baekelandt B, Ridder TD, et al. A dual-rate burst-mode bit synchronization and data recovery circuit with fast optimum decision phase calculation. Int J Electron Commun (AEU) 2009;63:931-8.
9. Li A, Faucher J, Plant DV. Burst-mode clock and data recovery in optical multi-access networks using broad-band PLLs. IEEE Photonics Technol Lett 2006;18:73-5.
10. Abiri B, Shivanaraine R, Sheikholeslami A, Tamura H, Kibune M. A 1-to-6Gb/s phase-interpolator-based burst-mode CDR in 65nm CMOS. In: IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers; 2011 Feb 20-24; San Francisco, CA. New York: IEEE; c2011. p. 154-6.
11. Kreienkamp R, Langmann U, Zimmermann C, Aoyama T, Siedhoff H. A 10-Gb/s CMOS clock and data recovery circuit with an analog phase interpolator. IEEE J Solid-State Circuits 2005;40:736-43.
12. Hu S, Jia C, Huang K, Zhang C, Zheng X, Wang Z. A 10Gbps CDR based on phase interpolator for source synchronous receiver in 65nm CMOS. In: IEEE International Symposium Circuits and Systems (ISCAS); 2012 May 20-23; Seoul, South Korea. New York: IEEE; c2012. p. 309-12.
13. Yen RC, Gray PR. A MOS switched-capacitor instrumentation amplifier. IEEE J Solid-State Circuits 1982;17:1008-13.
14. Ahmadi MR, Amirkhany A, Harjani R. A 5 Gbps 0.13 µm CMOS pilot-based clock and data recovery scheme for high-speed links. IEEE J Solid-State Circuits 2010;45:1533-41.
15. Jung JW, Razavi B. A 25-Gb/s 5-mW CMOS CDR/Deserializer. IEEE J Solid-State Circuits 2013;48:684-97.

Arrived: 19. 11. 2014
Accepted: 04. 01. 2015