Informatica 32 (2008) 183-188

A Simple Algorithm for the Restoration of Clipped Speech Signal

Abdelhakim Dahimene, Mohamed Noureddine and Aarab Azrar
Electrical and Electronic Engineering department, Boumerdes University
Boumerdes, Algeria, 35000
E-mail: dahimenehakim@yahoo.fr

Keywords: speech signal, clipped speech, restoration, interpolation, linear prediction, least square method, Kalman filter

Received: August 6, 2007

This paper deals with the problem of peak clipped speech. Our basic assumption is that the clipped speech is voiced and can be linearly predicted with high accuracy. The coefficients of linear prediction are computed using two different algorithms: a least square direct method and a recursive Kalman filter. The speech reconstruction is accomplished using backward prediction.

Summary: An algorithm for restoring an audio signal is presented.

1 Introduction

Speech acquired by personal computer sound cards often suffers from two main problems: DC level wandering and peak clipping. While building a database for our speech recognition project, we were confronted with both problems. The first one is easily eliminated by simple linear processing, but the second one requires more complex algorithms.

Peak clipping is fundamentally a nonlinear distortion. It is characterized by the fact that several successive values of the signal disappear and are replaced by a constant. However, speech signals happen to be highly predictable. So, in essence, peak clipped speech restoration is a problem of interpolation, since we are trying to find missing values by using the properties of the signal itself. There exist several methods of interpolation: polynomial (Lagrange, Newton), spline, etc. In the case of peak clipped speech, an appropriate method is statistical interpolation [1].
Figure 1: Mean magnitude and ZCR scatter plot [3]

2 Justification of the method

When there is no a priori information on the signal, the classical numerical interpolation methods (polynomial and spline) should be used. Band limited interpolation [2] uses only the fact that the signal is band limited. Statistical interpolation based on linear prediction [2, 4] uses the fact that the speech signal is highly predictable.

A speech segment is composed of a sequence of voiced, unvoiced and silence (noise) segments [2]. The type of speech signal most likely to be peak clipped is voiced speech [2, 3]. Figure 1 shows a scatter plot of the mean magnitude and zero crossing rate of voiced, unvoiced and silent segments of speech. Voiced segments are indicated by the letter "V", unvoiced segments by the letter "U" and silent segments by the letter "S". It shows clearly that the voiced segments cluster at high mean magnitude values. The mean magnitude is defined as:

$$M_n = \sum_{m=-\infty}^{\infty} |x(m)|\, w(n-m) \tag{1}$$

where $w(n)$ is a rectangular window of length 256 samples, and the zero crossing rate (ZCR) is:

$$\mathrm{ZCR}(n) = \sum_{m=-\infty}^{+\infty} \left| \operatorname{sgn}[x(m)] - \operatorname{sgn}[x(m-1)] \right| w(n-m) \tag{2}$$

Fortunately, voiced speech happens to be quite predictable: it follows the linear prediction equations quite closely [4, 5]. Commercial software such as DC-6, from Diamond Cut products, uses low order linear prediction for clipped audio signal restoration. The problem of audio signal interpolation has also been addressed by Vaseghi [1], who uses linear prediction from adjacent samples and from samples one period away (audio signals are assumed to be periodic).

Voiced speech can be considered as a quasi periodic signal.
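As a minimal sketch of the short-time measures in equations (1) and (2), the following Python functions evaluate both sums for a window ending at sample n. Several choices here are assumptions that the text does not state: the rectangular window is taken to be causal with unit height (covering the 256 samples up to and including n), sgn(0) is taken as +1, samples before the start of the recording are ignored, and the function names are purely illustrative.

```python
def sgn(value):
    """Sign function. Assumed convention: +1 for value >= 0, -1 otherwise."""
    return 1 if value >= 0 else -1


def mean_magnitude(x, n, window_len=256):
    """Equation (1): sum of |x(m)| over the rectangular window ending at n.

    Assumption: w(n - m) = 1 for 0 <= n - m < window_len and 0 elsewhere.
    """
    start = max(0, n - window_len + 1)
    return sum(abs(x[m]) for m in range(start, n + 1))


def zero_crossing_rate(x, n, window_len=256):
    """Equation (2): sum of |sgn[x(m)] - sgn[x(m-1)]| over the same window.

    As written in equation (2), each sign change contributes 2 to the sum.
    """
    start = max(1, n - window_len + 1)  # x(m - 1) must exist
    return sum(abs(sgn(x[m]) - sgn(x[m - 1])) for m in range(start, n + 1))
```

For a segment stored in a list `x`, calling `mean_magnitude(x, n)` and `zero_crossing_rate(x, n)` at the last sample of each 256-sample frame gives one (mean magnitude, ZCR) pair per frame, which is the kind of point plotted in Figure 1. A high-amplitude, slowly varying frame should yield a large mean magnitude and a small ZCR, which is the behaviour the text attributes to voiced speech.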
A voiced segment can be modelled as the output of a linear time invariant system (over a few milliseconds, the system can safely be assumed to be time invariant) driven by a periodic train of impulses. In this case, a quite general formulation of the signal is:

$$x_n = \sum_{k=1}^{p} a_k x_{n-k} + \sum_{k=0}^{p} b_k u_{n-k} \tag{3}$$

where the signal $u_k$ is equal to 1 every $T$ seconds and zero otherwise, $T$ being the pitch period. $a_k$ and $b_k$ are respectively the recursive and the non recursive parameters of the above production filter of order $p$. So, within a pitch period ($N_T$ samples) and after $p$ samples, the excitation terms $b_k u_{n-k}$ vanish and we can write:

$$x_n = \sum_{k=1}^{p} a_k x_{n-k} \tag{4}$$

The above equation breaks down in the part of the speech signal that is clipped. So, if we start the time axis at the beginning of a pitch period and if we call $N_T$ the number of samples within the pitch period, we can write:

$$x_n = \sum_{k=1}^{p} a_k x_{n-k}, \qquad p < n < N_T \tag{5}$$

for kl