Acquisition of Hindi Contrasts by English Speakers: An Optimality Theoretic Account

Ashima AGGARWAL
University of Florida
aaggarwal@ufl.edu

Abstract

This paper provides an optimality theoretic account of the perception of Hindi voicing and aspiration contrasts by English monolinguals. The participants were presented with minimal pairs of stop consonants at three places of articulation, namely bilabial, alveolar and velar. The minimal pairs varied in (a) voice; (b) aspiration; (c) voice and aspiration. The participants took a discrimination test in which they reported whether the minimal pairs they heard were the same or different. The findings were then subjected to quantitative analysis. The results show that the aspiration distinction is clearly perceived by English monolinguals but the voicing contrast is neutralized in the same position. The study adds to our knowledge of existing phonological theories such as Best's Perceptual Assimilation Model (2001) and p-maps (Steriade, 2001). Based on the phonetic results, an optimality theoretic framework is applied to describe the results. The framework involves ranking faithfulness and markedness constraints and presenting an initial stage grammar for the L2 English learner of Hindi. Finally, some predictions are made about the further acquisition of these non-native contrasts by L2 English learners. The study has useful implications for adult second language learners.

Keywords
voicing, aspiration, acquisition, optimality theory, voice onset time

Izvleček (Slovene abstract)

In this study the author examined the ability of English-speaking monolinguals to correctly perceive voicing and aspiration in Hindi. She presented the participants with minimal pairs of three types of stops: bilabial, alveolar and velar. The words in each pair differed in voicing (a), in aspiration (b), or in both voicing and aspiration (c). The methodology involved a discrimination test in which the participants judged whether the words of a minimal pair were the same or different. The results of the quantitative analysis showed that English-speaking monolinguals perceive the difference in aspiration well, whereas pairs with a voiced versus a voiceless consonant in the same word position pose a problem for them. The study contributes to the understanding of existing phonological theories such as the Perceptual Assimilation Model (Best, 2001) and p-maps (Steriade, 2001). The phonetic results are also interpreted within the framework of optimality theory, ranked by faithfulness and markedness constraints, and represent the initial stage of the grammar of Hindi as a foreign language for English speakers. Finally, the author presents her predictions about the subsequent developmental stages of Hindi as a foreign language. The study is also a contribution to knowledge about adult foreign language learning.

Ključne besede (keywords)
voicing, aspiration, foreign language acquisition, optimality theory, VOT

Acta Linguistica Asiatica, Vol. 1, No. 3, 2011. ISSN: 2232-3317
http://revije.ff.uni-lj.si/ala/

1. Introduction

Voice onset time (henceforth VOT) is a feature of the production of stop consonants. It is defined as the length of time that passes between the release of a stop consonant and the onset of voicing, that is, the vibration of the vocal folds. The voicing contrast in stops has been discussed in phonetics and phonology for the past few decades. Since Lisker and Abramson's (1964) well-known cross-language study, VOT has been widely used to differentiate stop categories across languages.
VOT has come to be regarded as one of the best acoustic cues for discriminating three general stop categories, especially in word-initial position. On the basis of VOT, different languages, including Hindi and English, use different categories to identify stops at each place of articulation (bilabial, alveolar or velar). By analyzing VOTs in stop consonants, linguists have concluded that for most languages VOT values get longer as the place of articulation moves further back (Lisker & Abramson, 1964). In this paper VOT serves as the cue for measuring the voicing of the Hindi stimuli, whereas the results of the perception experiment are analysed within the framework of optimality theory.

Optimality theory (OT) has emerged as a very useful tool within the past few decades and has useful implications for language acquisition. OT proposes that the observed forms of language arise from the interaction between conflicting constraints. It assumes that differences in grammars reflect different rankings of the universal constraint set. Language acquisition can be described as the process of adjusting the ranking of these constraints (Tesar & Smolensky, 1998).

This study is intended as a contribution to the understanding of several well-known problems relating to the learning of phonetic contrasts in second language (L2) pronunciation. In particular, this paper focuses on some of the effects that similarities and differences between native and target language sound systems might have on the learning of L2 phonology. It also aims at filling a gap in the understanding of p-maps (Steriade, 2001) and at establishing a hierarchy of perceptual difficulty with regard to voicing and aspiration in word-initial position.

2. Theoretical background

The phenomenon of voicing and aspiration in Hindi has held the attention of many phoneticians and phonologists for some time.
There have been many studies of voicing and aspiration in Hindi, especially of VOT as an important cue to the place of articulation of initial stops (Lisker & Abramson, 1964). Acoustically, the two kinds of stops, voiced and voiceless, are in most cases easily distinguished by reference to their spectrographic patterns: for voiced stops the formantless segment corresponding to the closure interval is traversed by a small number of low-frequency harmonic components, while in the case of voiceless stops the closure interval is essentially blank. The following are the VOT values of Hindi from Lisker and Abramson (1964). For the purpose of this paper, only the VOT values for bilabial, alveolar and velar stops are quoted.

Table 1: Hindi VOT values in ms (Lisker and Abramson 1964). Av. = average, R. = range, N. = number of tokens.

| | /b/ | /bh/ | /p/ | /ph/ | /d/ | /dh/ |
|---|---|---|---|---|---|---|
| Av. | -85 | -61 | 13 | 70 | -87 | -87 |
| R. | -120:-40 | -105:0 | 0:25 | 60:80 | -140:-60 | -150:-60 |
| N. | 16 | 15 | 18 | 18 | 18 | 18 |

| | /t/ | /th/ | /g/ | /gh/ | /k/ | /kh/ |
|---|---|---|---|---|---|---|
| Av. | 15 | 67 | -63 | -75 | 18 | 92 |
| R. | 5:25 | 35:100 | -95:-30 | -160:-40 | 10:35 | 75:100 |
| N. | 16 | 16 | 17 | 16 | 16 | 18 |

There has been extensive and valuable research on the acquisition of the sounds of a second language, some of which is summarized below. Flege (1992a,b) hypothesized that the likelihood of phonetic category formation for L2 phonetic segments is influenced importantly by the age at which L2 learning commences. More specifically, he hypothesized that the range of L2 segments for which additional phonetic categories are established decreases through childhood, but that even adult learners of an L2 may establish phonetic categories for L2 segments that differ substantially from the nearest L1 segment. The present study tries to extend these findings to Hindi. For L2 sounds that have a phonetically similar corresponding sound in the L1 yet differ acoustically from that L1 counterpart ("similar" L2 sounds), phonetic category formation may be blocked by the perceptual mechanism of equivalence classification.
The hypothesized difference in how new and similar sounds are treated perceptually leads to the prediction that new but not similar sounds in an L2 may eventually be mastered by adult L2 learners. The prediction concerning similar L2 consonants has been confirmed in a number of previous studies (e.g., Flege, 1991).

Brown (1998) claims that if a learner's L1 grammar lacks the phonological feature that differentiates a particular non-native contrast, he or she will be unable to perceive the contrast and therefore unable to acquire the novel segmental representations. Following Brown, the present study offers an account of the acquisition of Hindi voicing and aspiration by English speakers and asks whether this claim holds for Hindi.

Another important study in the field of non-native perception is Best (2001). In her Perceptual Assimilation Model (PAM) she proposed that a given non-native phone may be perceptually assimilated to the native system of phonemes in one of the following ways:

(1) Two-category assimilation (TC) - when two non-native phones are categorized as two different native phonemes.
(2) Single-category assimilation (SC) - when two non-native phones are categorized equally well as one native phoneme.
(3) Category goodness (CG) - when two non-native phones are categorized as one native phoneme but one fits better than the other.
(4) Uncategorized-categorized pair (UC) - when one non-native phone is categorized and the other remains uncategorized.
(5) Uncategorized-uncategorized pair (UU) - when both non-native phones are uncategorized.
(6) Non-assimilable (NA) - when non-native phones are perceived as non-speech sounds, different from any native phonemes.

One goal of this study is to see whether and where the various non-native phones fit into the English speaker's categories.

From a phonological perspective, analyzing language acquisition can give us useful insights into the learning process of the L2 learner.
Hancin-Bhatt (2000) presented an optimality theoretic account of syllable codas in Thai ESL. Thai has a more restrictive set of constraints on what can occur syllable-finally than English does. Thai ESL learners thus need to resolve the conflict between what they know (their first language or L1) and what they are learning (their second language or L2 grammar). Optimality theory provides the mechanisms to understand how this phonological conflict is resolved, and in what ways. The main finding of that study is that the L1 constraint rankings interact with the L2 constraint rankings: the L1 constraints are initially ranked higher and are eventually demoted below the L2 constraints. The study argues that constraint rerankings occur in an ordered fashion. Following from this study, I examine the ranking of constraints by L1 speakers of English. Hancin-Bhatt and Bhatt (1997) also relate certain key issues in optimality theory to Major's ontogeny model (1987): the high level of transfer at the beginning of the learning process may be related to the use of the constraint ranking of the learner's mother tongue in the new L2 situation, and the eventual decrease of transfer may be seen as the result of reranking. The current study is thus intended as one of many steps towards an optimality theoretic account of language acquisition.

3. The present study

The present study of Hindi consonants is a preliminary study of the perception of word-initial stop consonants by 10 monolingual English speakers. These English speakers had no prior exposure to Hindi. To my knowledge there has been no study that looks at the acquisition of L2 voicing and aspiration from an optimality theoretic perspective. Earlier studies have concentrated on the measurement of VOT values of contrasting segments and on what these indicate about the differences and similarities between L1 and L2 phonetic and/or phonological categories.
Little to no attention has been given to these contrasts from the perspective of recent phonological theories. In my opinion, analyzing the learners' data with respect to OT will give us useful insights into the learning process of L2 learners. It should give a clearer picture of what constrains or allows a learner to acquire the contrasts of a new language system. Given this aim, the present study tries to establish a baseline of sound perception by native English speakers. The focus of this paper is then to analyze how Hindi voicing and aspiration contrasts are perceived by the English group.

4. Methodology

4.1 Subjects

All 10 subjects were living in Gainesville, Florida at the time of testing and were affiliated with the University of Florida. Subjects in the native English group spoke only American English. The age range of the participants was 18-24. None of them had any reported hearing deficit. All the subjects were compensated with course points for participating in the study.

4.2 Measurement

VOT values are assigned as follows. The voice onset time of a plosive is defined as the duration between the release of the plosive and the beginning of vocal cord vibration. Standardly, VOT can be positive, negative, or 0.

1. If the onset of voicing follows the release, measure the interval from the release of the plosive to the onset of voicing. This is positive VOT.
2. If the onset of voicing coincides (approximately) with the release, this is 0 VOT. There is nothing to measure.
3. If the onset of vocal cord vibration precedes the plosive release, measure the voicing duration from the onset of voicing (or from the onset of closure if there is voicing throughout). This is negative VOT.

Note: on a spectrogram, in the case of lag voicing, the release burst is indicated by a dark striation, with voicing following later.
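The three measurement cases listed above can be expressed as a small calculation. The following is a minimal sketch in Python, assuming that the release and voicing-onset times have already been read off the waveform or spectrogram in seconds. The function names, the example time points and the 1 ms tolerance used to treat near-simultaneous events as 0 VOT are illustrative assumptions, not part of the measurement procedure described in this paper.

```python
def vot_ms(release_s: float, voicing_onset_s: float, zero_tol_ms: float = 1.0) -> float:
    """Return VOT in milliseconds: voicing onset minus stop release."""
    vot = (voicing_onset_s - release_s) * 1000.0
    # Case 2: voicing (approximately) coincides with the release.
    # The tolerance is an assumed value, not one given in the paper.
    if abs(vot) <= zero_tol_ms:
        return 0.0
    # Case 1 yields a positive value (lag), case 3 a negative one (prevoicing).
    return vot


def vot_category(vot: float) -> str:
    """Name the case that a VOT value (in ms) falls under."""
    if vot > 0:
        return "positive VOT (voicing lag)"
    if vot < 0:
        return "negative VOT (prevoicing)"
    return "zero VOT"


# Hypothetical time points in seconds, for illustration only.
examples = {
    "lag": vot_ms(release_s=0.250, voicing_onset_s=0.335),
    "prevoiced": vot_ms(release_s=0.250, voicing_onset_s=0.140),
    "coincident": vot_ms(release_s=0.250, voicing_onset_s=0.2504),
}

for label, value in examples.items():
    print(label, round(value, 1), vot_category(value))
```

Keeping the sign convention in one function means that prevoiced stops come out negative automatically, as in case 3.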
For prevoiced sounds the voicing bar is visible before the release burst; for a short or zero lag the two are very close together (with release followed by voicing) or overlapping (at the same time). The onset of the consonant was taken to be the first high-amplitude peak in the spectrogram.

4.3 Stimuli

Tables 2, 3 and 4 below present the stimuli that were played to the native English group. For clarity, they are presented in three separate tables, one each for voicing, aspiration, and voicing and aspiration. The VOT values of the initial consonants as produced by the native Hindi speaker have also been measured. The stimuli were recorded by a native speaker of Hindi who was 25 years old at the time of recording. The recording was made on a recorder in a noise-free room.

4.4 Procedure

For the perception experiment the speech samples were recorded by the investigator, a native speaker of Hindi, in a quiet room using a recorder. The stimuli contained 38 Hindi minimal pairs (a total of 76 words, spoken in pairs) which varied in (1) voicing and (2) aspiration. All the minimal pairs contained stops in initial position. Four minimal pairs were recorded for each place of articulation: bilabial, alveolar and velar. To study the voicing contrast, two pairs were unaspirated (e.g. p-b) and two pairs were aspirated (e.g. ph-bh). To study the aspiration contrast, two pairs were voiceless (e.g. p-ph) and two were voiced (e.g. b-bh). The tokens were interspersed with distractors to avoid any possible cueing of the listener. However, the distractors were intentionally not made completely different from the tokens, so that they would not stand out. They were still minimal pairs but contrasted in some feature other than voicing or aspiration, e.g. [man] and [nan], [dal] and [bal]. The resulting contrasts were pairs of:

1.a. voiceless aspirated (VlA) - voiced aspirated (VA)
1.b. voiced unaspirated (VU) - voiceless unaspirated (VlU)
2.a. voiced aspirated (VA) - voiced unaspirated (VU)
2.b. voiceless aspirated (VlA) - voiceless unaspirated (VlU)

Finally, a set of minimal pairs which varied in both voicing and aspiration was also tested for perceptibility:

3.a. voiceless - voiced aspirated
3.b. voiced - voiceless aspirated

The participants took an AX test in which they heard each minimal pair and had to determine whether the two words were the same or different. They were given a sheet of paper with two columns numbered (1) to (38). One column said "same" and the other "different". The participants were asked to tick one of the two choices depending upon what they heard.

Figure 1: Mean VOT values of Hindi stop consonants

Figure 1 shows the mean VOT values of Hindi stop consonants at three places of articulation: bilabial, alveolar and velar. It shows both aspirated and unaspirated stops. Positive VOT indicates voicing lag whereas negative VOT indicates prevoicing. It is evident from the figure that Hindi voiceless unaspirated stops have a much shorter lag than voiceless aspirated stops. In the case of voiced stops, however, the data in the figure indicate that unaspirated stops have slightly longer prevoicing than aspirated ones. Whether or not the differences between voiced and voiceless and between aspirated and unaspirated stops are significant is tested below.
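The paper does not name the significance test applied to the VOT measurements. As a minimal sketch, assuming a two-tailed paired t-test over the measured pairs, the comparison can be run with the Python standard library alone. The data are the aspiration-contrast values of Table 3 below; the function names and the use of Simpson's rule to integrate the t density are illustrative choices, not the author's stated method.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t statistic, degrees of freedom) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 - 2 * (area under the density from 0 to |t|)."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)


# VOT values as listed in Table 3 (aspiration contrast).
voiceless_unaspirated = [0.036, 0.027, 0.013, 0.029, 0.026]
voiceless_aspirated = [0.051, 0.086, 0.069, 0.1, 0.112]
voiced_unaspirated = [-0.107, -0.114, -0.13, -0.16, -0.096, -0.124]
voiced_aspirated = [-0.076, -0.081, -0.123, -0.125, -0.084, -0.15]

p_voiceless = two_tailed_p(*paired_t(voiceless_unaspirated, voiceless_aspirated))
p_voiced = two_tailed_p(*paired_t(voiced_unaspirated, voiced_aspirated))

print(f"voiceless unaspirated vs aspirated: p = {p_voiceless:.5f}")
print(f"voiced unaspirated vs aspirated:    p = {p_voiced:.5f}")
```

Under these assumptions the voiceless comparison falls below .05 and the voiced comparison does not, which agrees with the pattern reported for Table 3.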
Table 2: VOT values of Hindi voicing contrasts used (in s)

| Unaspirated voiceless | Unaspirated voiced | Aspirated voiceless | Aspirated voiced |
|---|---|---|---|
| 0.016 | -0.067 | 0.101 | -0.094 |
| 0.002 | -0.081 | 0.077 | -0.081 |
| 0.044 | -0.084 | 0.069 | -0.077 |
| 0.043 | -0.135 | 0.106 | -0.109 |
| 0.031 | -0.137 | 0.097 | -0.092 |
| 0.014 | -0.098 | 0.071 | -0.096 |
| | | 0.079 | -0.092 |
| p-value=0.00066 | | p-value=0.00001 | |

Thus, my analysis of the data in Table 2 shows that there is a significant difference between the VOT values of VU (voiced unaspirated) and VlU (voiceless unaspirated) stops (p<.05), and there is also a significant difference, greater than would be expected by chance, between the VOT values of VlA (voiceless aspirated) and VA (voiced aspirated) stops in Hindi.

Table 3: VOT values of Hindi aspiration contrasts used (in s)

| Voiceless unaspirated | Voiceless aspirated | Voiced unaspirated | Voiced aspirated |
|---|---|---|---|
| 0.036 | 0.051 | -0.107 | -0.076 |
| 0.027 | 0.086 | -0.114 | -0.081 |
| 0.013 | 0.069 | -0.13 | -0.123 |
| 0.029 | 0.1 | -0.16 | -0.125 |
| 0.026 | 0.112 | -0.096 | -0.084 |
| | | -0.124 | -0.15 |
| p-value=0.00835 | | p-value=0.169097493 | |

Table 3 shows that there is a significant difference between the VOT values of VlU and VlA stops (p<.05). However, the VOT values of VU and VA are not significantly different in Hindi.

Table 4 lists the minimal pairs that contrast in both voicing and aspiration, together with their corresponding VOT values.

Table 4: VOT values of Hindi voicing and aspiration contrasts used (in s)

| Minimal pair | VOT of first word | VOT of second word |
|---|---|---|
| dal - thal | -0.126 | 0.085 |
| kal - ghal | 0.024 | -0.125 |
| tal - dhal | 0.025 | -0.11 |
| pai - bhai | 0.009 | -0.122 |
| kat - ghat | 0.021 | -0.132 |
| pher - ber | 0.085 | -0.129 |

5. The results

The data from the perception study are presented below in Tables 5, 6 and 7.
Table 5: Perception of voicing contrast

| Voicing contrast | Number of times perceived same (total=20) | Number of times perceived different (total=20) |
|---|---|---|
| p-b | 15 | 5 |
| t-d | 15 | 5 |
| k-g | 19 | 1 |
| ph-bh | 8 | 12 |
| th-dh | 8 | 12 |
| kh-gh | 8 | 12 |

For the voicing contrast above, the number of times the two unaspirated stops in a minimal pair are heard as the same is significant (p=.001); the result for aspirated stops, however, is inconclusive, and more data are needed. This indicates that the voicing contrast is not perceived by non-native speakers, at least in unaspirated initial stops.

Table 6: Perception of aspiration contrast

| Aspiration contrast | Number of times perceived same (total=20) | Number of times perceived different (total=20) |
|---|---|---|
| d-dh | 3 | 17 |
| b-bh | 11 | 9 |
| g-gh | 4 | 16 |
| p-ph | 2 | 18 |
| t-th | 1 | 19 |
| k-kh | 1 | 19 |

For the aspiration contrast above, the number of times the two voiceless stops (unaspirated and aspirated) in a minimal pair are heard as different is significant (p=.00001), and the fact that voiced aspirated and voiced unaspirated stops are heard as different is also significant. This indicates that aspiration can be perceived by non-native speakers irrespective of voicing.

Table 7: Perception of voicing and aspiration contrast

| Voicing and aspiration contrast | Number of times perceived same (total=10) | Number of times perceived different (total=10) |
|---|---|---|
| p-bh | 0 | 10 |
| ph-b | 0 | 10 |
| t-dh | 2 | 8 |
| th-d | 0 | 10 |
| k-gh | 5 | 5 |

The results for the combined voicing and aspiration contrast are highly significant (p=.0004), which indicates that non-native speakers have no problem hearing the two contrasts when they are presented together.

6. Analysis

Within OT every stage of acquisition has a grammar, which can be explained by means of constraints and their ranking. The aim of this study is therefore to find the constraints that the native English speakers have and how they are ranked at their current stage of acquisition.
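Before the constraints are proposed, the counts in Tables 5 to 7 can be summarised as the share of "different" responses per contrast type. The sketch below is a simple tally in Python; pooling the three places of articulation within each contrast type is an assumption made here for illustration, and the dictionary layout and function name are hypothetical.

```python
# (same, different) response counts as listed in Tables 5, 6 and 7.
RESPONSES = {
    "voicing, unaspirated": {"p-b": (15, 5), "t-d": (15, 5), "k-g": (19, 1)},
    "voicing, aspirated": {"ph-bh": (8, 12), "th-dh": (8, 12), "kh-gh": (8, 12)},
    "aspiration, voiced": {"d-dh": (3, 17), "b-bh": (11, 9), "g-gh": (4, 16)},
    "aspiration, voiceless": {"p-ph": (2, 18), "t-th": (1, 19), "k-kh": (1, 19)},
    "voicing and aspiration": {
        "p-bh": (0, 10), "ph-b": (0, 10), "t-dh": (2, 8), "th-d": (0, 10), "k-gh": (5, 5),
    },
}


def proportion_different(pairs):
    """Share of 'different' responses, pooled over all pairs of one contrast type."""
    same = sum(s for s, _ in pairs.values())
    different = sum(d for _, d in pairs.values())
    return different / (same + different)


summary = {contrast: proportion_different(pairs) for contrast, pairs in RESPONSES.items()}

for contrast, share in summary.items():
    print(f"{contrast}: {share:.0%} heard as different")
```

Read this way, the unaspirated voicing pairs are the ones least often heard as different, which is the pattern that the constraint ranking below is meant to capture.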
I propose the following set of constraints to explain the initial stage of learning by monolingual English speakers:

IDENT-IO (aspiration)/#_ - the specification for the feature [aspirated] of an input segment must be preserved in its output correspondent word initially.
IDENT-IO (voice)/#_ - the specification for the feature [voice] of an input segment must be preserved in its output correspondent word initially.
IDENT-IO (Asp) - the specification for the feature [aspirated] of an input segment must be preserved in its output correspondent.
*[VOICE]/#_ - no voiced consonants word initially.
*VOICED OBS - obstruents should not be voiced (context-free markedness constraint).
*ASPIRATED OBS - obstruents should not be aspirated (context-free markedness constraint).

Based on the results, what we see is that the voiced-voiceless distinction is neutralized word initially except when the initial stop is aspirated. We therefore need a constraint hierarchy that neutralizes the voicing distinction word initially but preserves the aspiration distinction in the same context. The following tableaux show the ranking of the faithfulness and markedness constraints that produces the initial-stage grammar of the English monolingual speakers. In each tableau the constraints are listed in ranked order and the violation marks of each candidate are given from left to right.

Tableaux 1.a: Voiceless stop stays voiceless word initially
Input: /pal/
Constraints: IDENT-IO (asp)/#_ | IDENT-IO (asp) | *ASP OBS | *VOICED OBS | *[voice] | IDENT-IO (voice)/#_
pal: (no marks)
bal: *! * *!
phal: * * *!
bhal: * * *! * * *

Tableaux 1.b: Voiced stop neutralizes to voiceless stop word initially
Input: /bal/
Constraints: IDENT-IO (asp)/#_ | IDENT-IO (asp) | *ASP OBS | *VOICED OBS | *[voice] | IDENT-IO (voice)/#_
☞ pal: * *!
bal: *! *
phal: * * *!
bhal: * * *! * * *

If the English speakers perceive both /p/ and /b/ as [p], then in their ranking it is essential to have *VOICED OBS above IDENT-IO (voice)/#_. This means that context-free markedness is ranked above faithfulness in order to neutralize the voicing contrast in word-initial context.
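The evaluation carried out in Tableaux 1.a and 1.b can be sketched as a small program. The following Python sketch assumes that each candidate is reduced to the voicing and aspiration of its word-initial stop, that every constraint assigns at most one violation, and that the six constraints are strictly ranked in the left-to-right order of the tableaux. The function and variable names are illustrative. Comparing violation profiles lexicographically is one common way to model strict domination; the second ranking anticipates the demotion proposed in Section 8.

```python
from typing import Callable, Dict, List, Tuple

Stop = Tuple[bool, bool]                  # (voiced, aspirated) of the word-initial stop
Constraint = Callable[[Stop, Stop], int]  # (input, output) -> number of violations

CANDIDATES: Dict[str, Stop] = {
    "pal": (False, False),
    "bal": (True, False),
    "phal": (False, True),
    "bhal": (True, True),
}

# Faithfulness constraints compare input and output. Every stop here is
# word-initial, so the positional and the general IDENT-IO (asp) assign the same marks.
def ident_asp_initial(inp: Stop, out: Stop) -> int:
    return int(inp[1] != out[1])

def ident_asp(inp: Stop, out: Stop) -> int:
    return int(inp[1] != out[1])

def ident_voice_initial(inp: Stop, out: Stop) -> int:
    return int(inp[0] != out[0])

# Markedness constraints inspect the output only.
def no_asp_obs(inp: Stop, out: Stop) -> int:
    return int(out[1])

def no_voiced_obs(inp: Stop, out: Stop) -> int:
    return int(out[0])

def no_voice_initial(inp: Stop, out: Stop) -> int:
    return int(out[0])

# Ranking of Tableaux 1.a and 1.b, highest-ranked constraint first.
INITIAL_STAGE: List[Constraint] = [
    ident_asp_initial, ident_asp, no_asp_obs,
    no_voiced_obs, no_voice_initial, ident_voice_initial,
]

# Reranking with IDENT-IO (voice)/#_ above the voicing markedness constraints,
# as in the target grammar of Tableaux 3 (Section 8).
TARGET: List[Constraint] = [
    ident_asp_initial, ident_asp, no_asp_obs,
    ident_voice_initial, no_voiced_obs, no_voice_initial,
]

def optimal(input_form: str, ranking: List[Constraint]) -> str:
    """Return the candidate whose violation profile is lexicographically smallest."""
    inp = CANDIDATES[input_form]
    profiles = {
        name: tuple(constraint(inp, out) for constraint in ranking)
        for name, out in CANDIDATES.items()
    }
    return min(profiles, key=profiles.get)

for word in ("pal", "bal", "phal"):
    print(f"/{word}/ -> [{optimal(word, INITIAL_STAGE)}]")
print(f"/bal/ under the reranked grammar -> [{optimal('bal', TARGET)}]")
```

With the initial-stage ranking, /pal/ and /bal/ both surface as [pal] while /phal/ keeps its aspiration. Once faithfulness to voicing outranks the two voicing markedness constraints, /bal/ surfaces faithfully.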
Aspirated stops in word-initial position, on the other hand, are always perceived as different from unaspirated stops. That is, the English speakers do not have any difficulty in hearing the aspiration contrast word initially.

Tableaux 2: Ranking based on tableaux 1.a and 1.b
Input: /Chal/
Constraints: IDENT-IO (asp)/#_ | IDENT-IO (asp) | *ASP OBS | *VOICED OBS | *[voice] | IDENT-IO (voice)/#_
pal: *! * *(if /bhal/)
bal: *! * *! * *(if /phal/)
Chal: *

This indicates that the aspiration contrast is better perceived than the voicing contrast in word-initial position.

7. Discussion

Following Best's model (and knowing that English /b/ is voiceless), we can conclude that the Hindi VlU and VU stops (e.g. /p/-/b/) are placed in the same category by English speakers. This would be a case of single-category assimilation. Since the VlA and VlU stops are heard as different significantly often, this is a case of two-category assimilation. Next, the fact that /ph/-/bh/ are perceived as the same or different an almost equal number of times indicates that this is a matter of category goodness: /ph/ might be a "good" exemplar of the category and /bh/ a "not so good" one. Last, the distinction between /b/-/bh/ can also be characterized as category goodness, since the difference between the two is not perceived very well by the English speakers, although a firm generalization would require more data for /b/-/bh/. It should also be noted that the difference in VOT between the stimuli /b/-/bh/ was much smaller than that between /d/-/dh/ or /g/-/gh/. Considering that English /b/ is actually voiceless, or in other words is [p], we can also say that there is a CG relationship between [p] and /bh/. A diagram best captures this relationship between the different categories.

Figure 2: Analysis using PAM model (diagram labels: CG, SC)

Another contribution of this study is to add to the study of p-maps, a recent addition to correspondence theory. "P-map is a mental representation of the degree of distinctiveness of different contrasts in various positions. It is a set of statements with different degrees of generality about absolute confusability from which relational statements can be deduced." (Steriade, 2001). The P-map's broadest claim is that the range of systematic, cross-linguistically invariant differences goes beyond the expressive capabilities of current theories of correspondence. In addition, we need to show that perceived degree-of-similarity differences correlate with the choices made in phonological systems between alternative ways of modifying an input. In the present study, for instance, we see that [p] and [b] are judged as more similar than [p] and [ph]. This indicates a significant preference for [b] over [ph], since substituting [b] for [p] is a less significant departure from the input than substituting [ph]. The finding is well supported by the results of the present study, in which, for English speakers, the voicing contrast is significantly more confusable than the aspiration contrast. The idea that some features contribute more to dissimilarity than others has been investigated by phoneticians and psycholinguists for some time. I hope this study helps to fill the gap in our understanding by showing that the feature [+aspiration] plays a major role in generating dissimilarity judgments, in contrast to voicing. It enables us to make statements about relative confusability such as: the contrast t/d word initially gives rise to more instances of misidentification than the contrast t/th in the same context.

8. Predictions

Unlike Brown (1998), who suggested that a learner is unable to acquire a non-native phonological feature, I believe that the learner will be able to learn the L2 contrast. This is based on the fact that although p-ph, t-th, k-kh etc.
are not phonemically present in the phonological system of the participants of this study, they were still able to perceive them as distinct sounds. That is, since aspiration, although phonemic in Hindi and not in English, can still be perceived by English speakers, it is possible that with enough training the voicing contrast can be heard too. However, we need to keep in mind p-maps and their implications for learning: more confusable features might be harder to learn than less confusable ones. Target-like perception (and production) will then be achieved by demoting context-free markedness (*VOICED OBS) and contextual markedness (*[voice]/#_) below faithfulness (IDENT-IO (voice)/#_) to get rid of word-initial voicing neutralization:

Tableaux 3: Target Hindi grammar
Input: /bal/
Constraints: IDENT-IO (asp)/#_ | IDENT-IO (asp) | *ASP OBS | IDENT-IO (voice)/#_ | *VOICED OBS | *[voice]/#_
pal: *!
bal: * *
phal: * * *!
bhal: * * *! * * *

References

Best, C., McRoberts, G., & Goodell, E. (2001). Discrimination of non-native consonant contrasts varying in perceptual assimilation to the listener's native phonological system. Journal of the Acoustical Society of America, 109(2), 775-794.
Brown, C. (1998). The role of the L1 grammar in the L2 acquisition of segmental structure. Second Language Research, 14, 136-193.
Flege, J. (1991). Age of learning affects the authenticity of voice onset time (VOT) in stop consonants produced in a second language. Journal of the Acoustical Society of America, 89, 395-411.
Flege, J. (1992a). Speech learning in a second language. In C. Ferguson, L. Menn, & C. Stoel-Gammon (Eds.), Phonological Development: Models, Research, and Application (pp. 565-604). Timonium, MD: York Press.
Flege, J. (1992b). The intelligibility of English vowels spoken by British and Dutch talkers. In R. Kent (Ed.), Intelligibility in Speech Disorders: Theory, Measurement and Management (pp. 157-232). Amsterdam: Benjamins.
Hancin-Bhatt, B., & Bhatt, R. (1997). Optimal L2 syllables. Studies in Second Language Acquisition, 19, 331-378.
Hancin-Bhatt, B. (2000). Optimality in second language phonology: Codas in Thai ESL. Second Language Research, pp. 201-232.
Lisker, L., & Abramson, A. S. (1964). A cross-language study of voicing in initial stops: Acoustical measurements. Word, 20, 384-422.
Steriade, D. (2001). The phonology of perceptibility effect: The p-map and its consequences for constraint organization. In K. Hanson & S. Inkelas (Eds.), The nature of the word (pp. 151-179). Cambridge: MIT Press.
Tesar, B., & Smolensky, P. (1998). Learnability in Optimality Theory. Linguistic Inquiry, 29, 229-268.