Acta Linguistica Asiatica 
Volume 11, Issue 1, 2021 
 
 
 
 
 
 
 
 
 
Editors: Andrej Bekeš, Nina Golob, Mateja Petrovčič  
Editorial Board: Alexander Alexiev (Bulgaria), Bi Yanli (China), Cao Hongquan (China), Luka Culiberg (Slovenia), Tamara Ditrich (Slovenia), Kristina Hmeljak Sangawa (Slovenia), Hsin Shih-chang (Taiwan), Ichimiya Yufuko (Japan), Terry Andrew Joyce (Japan), Jens Karlsson (Sweden), Lee Yong (Korea), Lin Ming-chang (Taiwan), Arun Prakash Mishra (Slovenia), Nagisa Moritoki Škof (Slovenia), Nishina Kikuko (Japan), Sawada Hiroko (Japan), Chikako Shigemori Bučar (Slovenia), Irena Srdanović (Croatia). 
© University of Ljubljana, Faculty of Arts, 2021 All rights reserved. 
Published by: Znanstvena založba Filozofske fakultete Univerze v Ljubljani  (Ljubljana University Press, Faculty of Arts) 
Issued by: Department of Asian Studies 
For the publisher: Dr. Roman Kuhar, Dean of the Faculty of Arts 
The journal is licensed under a  Creative Commons Attribution-ShareAlike 4.0 International License. 
Journal's web page: http://revije.ff.uni-lj.si/ala/ The journal is published in the scope of Open Journal Systems 
ISSN: 2232-3317 
Abstracting and Indexing Services: Scopus, COBISS, dLib, Directory of Open Access Journals, MLA International Bibliography,  Open J-Gate, Google Scholar and ERIH PLUS. 
Publication is free of charge.  
Address: University of Ljubljana, Faculty of Arts Department of Asian Studies Aškerčeva 2, SI-1000 Ljubljana, Slovenia 
E-mail: nina.golob@ff.uni-lj.si 
 
 
TABLE OF CONTENTS 
 
 
Foreword

RESEARCH ARTICLES
The Polysemy of ‘Futsugo (Common Language)’ and the Modern Japanese Nation: the universalization of a ‘standard language’ to correct dialects?
Saki AMANO
From native-speaker likeness to self-representation in language: Views from the acquisition of Japanese transitive and intransitive verbs
ITO Hideaki
Corpus Analysis of the Collocations of the Transitive Verbs owaru and oeru
Nastja PAHOR
Contact-induced Variation in Tetun Dili Phonology
Andrei A. AVRAM
Marked Geminates as Evidence of Sonorants in Sylheti Bangla: An Optimality Account
Arpita GOSWAMI
Stop Voicing and F0 Perturbation in Pahari
Nazia RASHID, Abdul Qadir KHAN, Ayesha SOHAIL, Bilal Ahmed ABBASI
Word Stress system of the Saraiki language
Firdos ATTA
 

 
 
 
FOREWORD 
 
 
The winter issue of Volume 11 presents a selection of seven research articles on Japanese, Tetun Dili, Sylheti Bangla, Pahari, and Saraiki. The rise of the Covid-19 pandemic, whose continuation unfortunately still prevents many from collecting data for research, has prompted us to publish several other interesting studies. This compilation brings the following topics to readers.
This issue opens with Saki AMANO’s paper “Polysemy of ‘Common Language’ and the Modern Japanese Nation: The Universalization of a ‘Standard Language’ to correct ‘Dialects’?”. The author examines the term futsugo (common language) over two periods and explains the shift from the populace’s everyday commonplace language to a unified national language. 
In the next paper “From Native-speaker Likeness to Self-representation in Language: Views from the Acquisition of Japanese Transitive and Intransitive Verbs”, ITO Hideaki considers the degree to which a language user’s own will is recognized in language education. The author demonstrates that the usage-centric acquisition process can create opportunities for language users to make expressive choices focused on what they wish to say. 
The third article is Nastja PAHOR’s paper “Corpus analysis of the collocations of the transitive verbs owaru and oeru”, in which the author approaches the transitivity of Japanese verbs from a corpus perspective. A semantic analysis of collocations, combined with a morphological analysis of co-occurring verbs, reveals some interesting findings.
After the first three papers that focus on Japanese, the fourth one brings some new insights into Tetun Dili. Andrei A. AVRAM in his paper “Contact-induced variation in Tetun Dili phonology” analyzes Portuguese influence on Tetun Dili phonology, and demonstrates that the intricacies of inter-speaker variation cannot be merely reduced to variation between more Portuguese-like phonology and a more Tetun-Dili-like one.   
Arpita GOSWAMI’s paper “Marked Geminates as Evidence of Sonorants in Sylheti Bangla: An Optimality Account” analyzes the universal concept that sonorants are marked geminates in the gemination process of Sylheti Bangla (SHB), and proposes a hierarchy of constraints for analyzing the gemination processes in SHB. In addition, the author illustrates some further constraints found to be necessary.
The following article “Stop Voicing and F0 Perturbation in Pahari” presents the findings of Nazia RASHID, Abdul Qadir KHAN, Ayesha SOHAIL, and Bilal Ahmed ABBASI. The authors investigate the perturbation effect of the voicing of initial stops on the fundamental frequency of the following vowels in Pahari.
Last but not least, “Word Stress system of the Saraiki language” is an article by Firdos ATTA, who presents an Optimality-Theoretic analysis of Saraiki word stress. The author concludes that Saraiki has a trochaic stress system and falls in the category of quantity-sensitive languages. This paper also indicates further research work on word stress at the sentence level. 
 
The Editors and Editorial Board wish the regular and new readers of the ALA journal a pleasant read, full of inspiration and of new research ideas sparked by these papers.
 
 
 Editors 
 
 
RESEARCH ARTICLES 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
THE POLYSEMY OF ‘FUTSUGO (COMMON LANGUAGE)’ AND THE MODERN JAPANESE NATION: THE UNIVERSALIZATION OF A ‘STANDARD LANGUAGE’ TO CORRECT DIALECTS?
Saki AMANO  
Aoyama Gakuin University, Japan 
saki10093141@gmail.com 
Abstract 
In this paper, the term futsugo (common language) was viewed over two periods. The first period (1880s-1894) was concerned with education but aimed to establish everyday, commonplace language and script that was familiar to the populace. However, by the 1890s, the policy of Europeanization was being reconsidered, and national consciousness was on the rise. The second period (1894-early 1900s), with the start of the Sino-Japanese War, saw an increase in the national consciousness in strengthening both literary and military arts, with a desire for the establishment of an artificially unified language with artificial rules that would unify the populace and the nation. The natural shift from the populace’s everyday commonplace language to a unified national language became possible through the linguistic logic, or mediation of terminology, seen in the single (but ambiguous) word futsugo.  
Keywords: common words; common language; standard language; national language; spoken and written language 
Povzetek (Summary)
In this paper, the term futsugo (common language) is examined over two periods. The first period (approx. 1880-1894) was concerned with education but was aimed at establishing an everyday, ordinary language and script that were familiar to the people. In the 1890s, however, the policy of Europeanization was reconsidered and national consciousness was on the rise. In the second period (1894 to the early 20th century), with the outbreak of the Sino-Japanese War, national consciousness grew in the strengthening of both the literary and the military arts, and with it the desire to establish an artificially unified language with rules that would unify the population and the nation. The natural transition from the everyday, ordinary language of the population to a unified national language became possible through linguistic logic, or the mediation of terminology, visible in the single (but ambiguous) word futsugo.
Ključne besede (Keywords): common words; common language; standard language; national language; spoken and written language
1 Introduction 
During the Meiji era, especially after the consecutive enactment of the Elementary School Order Revision, the establishment of the ‘National Language Course’ (kokugo-ka), and the establishment of the Ministry of Education National Language Research Committee in 1900, there were calls for the establishment of a national language for national unification, and it was decided that dialects should be corrected to a ‘standard language’ (hyojungo) based on the dialect of Tōkyō as the capital city. National language education adopted the same policy.1
1 This series of movements was an attempt to clarify the unity of the nation by artificially polishing the ‘Japanese language’, which included several registers (described below), as a form of state-led nationalism. The use of the word ‘nation/al’ as used in this paper allows for the shifting significance of the concept of ‘nation’ towards a more state-led nationalism. 
2 One of the few considerations of the relationship among various examples is seen in Sato (1991). 
Previous studies (Ogasawara & Funaki, 2001; Murakami, 2005) about this series of events have discussed the issue of ‘general education’ through an ideal, unified futsugo (usually referring to ‘common language’) related to national language establishment and education; indeed, futsugo was characterized by being a ‘created norm’ of the modern state. However, as is often the case with Meiji era terminology, a single term can be polysemous, used in multiple fields and across several periods, which causes conceptual confusion in later research (Amano, 2019). This is also the case for the term futsugo, which is the subject of this study. In actual research, each field independently uses the term as if it were self-evident, with little discussion of how the term is used in other fields, much less of the mutual relationship among those uses.2
Just because the single term futsugo is used, it does not follow that conceptual discussion must be limited to a single meaning. Rather, it is because the term appears to be the same at first glance that its conceptual conflicts in multiple contexts need to be examined carefully, making it possible to clarify the linguistic logic of the ‘created norms’ that makes them appear identical. I believe that the process by which an everyday, commonplace ‘common language’ for widespread education shifted toward a simplified and unified national education system was a strategy – in other words, a strategy to unify a ‘national language’ (kokugo) through the mediation of the term ‘common language’ (futsugo).
In this paper, I will examine the term futsugo from when it is thought to have first appeared in the 1880s, to the use of futsugo to correct ‘dialects’, and finally to prominent calls for the establishment of a national language at the turn of the century. The Sino-Japanese war, which broke out in 1894, was accompanied by a pronounced rise in the national consciousness. This period marked a major turning point in the use of the term futsugo to express the desire for the establishment of an artificially unified language with artificial rules that would unify the populace and the nation.
2 An overview of national language reforms in the Meiji era: The issue of futsu 
2.1 The beginning of the national language reforms 
First, the Japanese language during the Meiji era, whether the spoken or written language, had various registers (sometimes called ‘phases’, iso) depending on the environment of their use. 
For example, the spoken language differed by region, such as the Tōkyō dialect and the Okinawa dialect. Depending on social class, people used the formal, refined language or the rough language of the downtown craftspeople. In a literary context, a frog was a kawazu, and in everyday speech it was kaeru. For the written language, there were many styles, such as a style like translated literary Chinese and an epistolary style. As for scripts, there were multiple methods of writing, such as Chinese logographs (kanji), Japanese phonographs (kana), and mixtures of the two.
Within these various registers, the questions of which linguistic system, which script system, and which writing style to use in which situation are broadly referred to as the ‘national language reforms’ (in Japanese, kokugo kokuji mondai, literally ‘national language and script problem’). The aspects of this issue can be summarized as follows.
1. The aspect of speech (e.g., region, class)
2. The aspect of writing (e.g., literary Chinese translation style, epistolary style)
3. The aspect of script (e.g., kanji, kana, Latin alphabet)

Next, we consider the perspective of Masao Hirai’s research on the national language reforms (Hirai, 1949), which represents the major trend. In the 1870s, in the early Meiji era, Japan came into contact with western civilization, nominally dismantled its feudal class system, and began to pursue a democratic way of life. Considered in this light, the goal of the national language reforms was the democratization of education, or the standardization of an educational curriculum that used everyday, commonplace language and was aimed at the general populace. Before the Meiji Restoration (1868), education was administered by the domains (han), and educational opportunity was mostly limited to Chinese learning (kangaku), the prerogative of the ruling warrior class.
Hisoka Maejima, a Meiji government official who is also known for founding the postal system, made a petition (Maejima, 1899) to the shogun Yoshinobu Tokugawa as early as December 1866 to abandon Chinese characters, and also set out his vision for the future of education. His petition (1899) called for ‘general education’ (futsu kyoiku), explaining that:
The national essence is the education of the people. That education should not depend on class but should be spread throughout the people. In order to spread education, it is necessary to use a script and writing style that are as simple as possible (Maejima, 1899, p. 6).3 
3 This and subsequent sources were translated by the author and Loren Waller (Yale University). 
In other words, regardless of social class, education should be aimed at the general populace, and script and writing should be simplified as much as possible. Here, the word futsu may mean ‘widespread’ or ‘universal’.
 
2.2 The development of the national language reforms 
What exactly is everyday, commonplace language? Though we might say ‘everyday, commonplace’, there are many levels of language with different standpoints and trends.  
The dawn of the national language reforms was the 1870s, when a debate over the script (third aspect) was being held. Maejima drafted his petition to abandon Chinese characters, and instead adopt the simpler phonographic kana script.  
The second aspect, the actual practice of writing, was also the subject of discussion. Before the Meiji Restoration, the mainstream of education was Chinese-derived words written in the kanji logographic script from China, and when words originating from Japanese were used, they tended to be archaic words from the Heian period (794-1185). In other words, it was not just a question of whether the script should be modernized, but whether the written language should be reformed to correspond to the modern spoken language. 
In this context, Maejima (1899) wrote:  
When establishing national writing and literature, one should not reconstruct the classical language with words such as haberu (archaic polite ‘to be’) and kerukana (archaic exclamation), but should use today’s common language, such as tsukamatsuru (humble ‘to do’) and gozaru (polite ‘to be’), and these should be fixed as the standard (Maejima, 1899, p. 15).  
The key contrast in this quotation is between the ‘classical language’ (kobun) and ‘today’s common language’ (konnichi futsu no gengo). ‘Classical language’ refers to the Heian period language that was extant in written form, whereas ‘today’s common language’ existed in verbal form, as the spoken language. In other words, Maejima aimed to create a written style that recorded words as they were used in everyday speech rather than the traditional, elegant style that had become fixed in written documents.4 This writing style that corresponded to the spoken language was known as the genbun itchi (unified speech and writing) style.
4 However, classical language is not limited to elegant words, and modern spoken language is not limited to colloquial words. For example, classical literature includes both waka and haikai. The former was elegant court poetry and the latter more lighthearted, colloquial, popular poetry. The medium of the source and the determination of elegant or colloquial is not fixed. 
In the wake of these debates over written style, the first aspect mentioned above came to be debated concurrently by the 1880s, the period of development examined in this paper. Since the spoken language is organic in nature, it produces a variety of different linguistic usages and registers. However, for the spoken language to serve as the standard for written style, certain standards for the spoken language itself first had to be established.
In particular, the aim was to unify the language at the local and class level. This will be discussed in more detail in the following sections since it relates to the discussion of ‘common language’ as a term. Here, I will simply state the overall trend that the language of the middle class of Tōkyō, as the capital, would become the standard.
The concrete ways in which the everyday, commonplace language was used to democratize education, becoming the mainstream course of the national language reforms, can be summarized as follows. 
1. Speech (Tōkyō, middle class)
2. Writing (genbun itchi, unified speech and writing)
3. Script (mixed kanji and kana)

3 The concept of futsugo (1880s-1894): everyday, commonplace language 
As shown in the previous section, the national language reforms and the word futsu as used to democratize education evoked discussions at different levels, resulting in many standpoints and trends. The term futsugo which is the focus of this paper underwent a similar phenomenon in that it too became polysemous, being used in several different fields and over several different periods. 
However, since this is used as an academic term, it should be defined as a clear concept, so this phenomenon of polysemy requires careful attention. In this section, I will first look at the main examples of references to futsugo up through the first period from the 1880s to 1894. I will argue that there was no normative consciousness of national unification when the term futsugo was used in this first period. 
 
3.1 Futsugo in dictionary compilation projects 
3.1.1 Futsugo as the Japanese language as a whole 
In Japan, just as in the West, dictionary compilation projects were a way of honoring the nation’s culture. Japanese dictionaries5 are books that collect words from the Japanese language alone; in other words, they are a fixed cultural heritage that puts Japan’s linguistic culture into a visible form.
5 Before 1900, Japanese dictionaries were called ‘Japanese’ dictionaries; kokugo (‘national language’) was not used for dictionaries as it is today. For example, in 1915, Dainihon kokugo jiten (Ueda & Matsui, 1915) appeared. This shift to using kokugo for dictionaries was probably due to the national establishment of Kokugo as an academic subject. 
In the previous studies listed at the beginning of this paper, futsugo is seen as a translation of ‘common language’. However, the earliest instance of the word futsugo is thought to be in a lecture by the Japanese linguist Kazutoshi Ueda entitled ‘On Compiling the Great Japanese Dictionary’ (Nihon daijisho hensan ni tsukite, Ueda 1889), and an examination of the original materials for the lecture reveals that the original phrase was ‘common words’. Still, Ueda’s use of futsugo greatly transcended the scope expressed by the original, owing to the differences in the relationship between the spoken and written language in Japanese and English. Indeed, even the essential word futsu took on a new meaning.
This specific example deserves special attention in that it is the first known instance. To explain the varieties of go (words), Ueda (1895b) gives the following explanation (Figure 1): “the varieties of words as defined by Murray, the president of London Philological Society” (Ueda, 1895b, p. 306). Ueda does not provide details of his source but based on other English quotations used and on the dates of publication, it is clear that this is from the first edition of the Oxford English Dictionary (hereafter OED), published with James A. H. Murray (1884) as Editor in Chief. The following Figure 2 is adapted from the first fascicle’s ‘General Explanation’, in the first section ‘The Vocabulary’ (Murray, 1884, p. vii).
 
 

Figure 1: Varieties of words (Ueda, 1889) 
 
 

Figure 2: Varieties of words, OED, 1st ed. (1884) 
 
Each figure shows the language in question in the center with its derived registers branching outward. The following two quotations are from the text of the OED describing the relationship between these types of vocabulary. I have underlined the key terms.
So the English vocabulary contains a nucleus or central mass of many thousand words whose ‘Anglicity’ is unquestioned; some of them only literary, some of them only colloquial, the great majority at once literary and colloquial – they are the Common Words of the language.
The center is occupied by the ‘common’ words, in which literary and colloquial usage meet. ‘Scientific’ and ‘foreign’ words enter the common language mainly through literature; ‘slang’ words ascend through colloquial use; the ‘technical’ terms of crafts and processes, and the ‘dialect’ words, blend with the common language both in speech and literature. (Murray, 1884, p. vii)
The first quotation defines the term ‘common words’ and the second is a detailed description of the relationship between ‘common words’ and each register. According to this, the English lexical system has a central part where written and spoken usage coincides, represented on the diagram as a circle with common in the middle surrounded by the words literary and colloquial. Vocabulary in actual use stems from one of the qualities and usages of these two central terms and develops into registers. The arrows stemming from the center represent this relationship. However, ‘common words’ particularly refers to the central part that is both written and oral, that is the very core words of the English language. 
Here, it is important to recognize that literary and colloquial usage in the contemporary Japanese language that Ueda described did not correspond. As mentioned above, written Japanese used Chinese-derived words or Japanese words from the Heian period, making the systems of written and spoken Japanese completely different. 
We next consider how Ueda’s diagram corresponds to the OED diagram. Ueda’s diagram (Figure 1) has futsugo (common) in the center, with six surrounding branches. Reading from the left, the top has kagakutekigo (scientific words), bunshotekigo (literary), gairaitekigo (foreign words), and the bottom has gijutsutekigo (technical words), tsuzokutekigo (colloquial) with higetekigo (slang words) below it, and hogentekigo (dialectal words). Ueda’s diagram differs from the OED diagram (Figure 2) of five branches in that he made ‘literary words’ a new branch and combined ‘colloquial words’ with the ‘slang words’ branch.
This difference is a clear indication of the lack of unity between written and spoken words at that time, which was one of the issues in the national language reforms. As was discussed above, the gap between written (literary) language and spoken (colloquial) language was great during the early Meiji era. Therefore, it was necessary to separate these two as independent registers; indeed, we can understand that there were few words in the central area where the two areas overlapped. 
What, then, is the meaning of Ueda’s translation futsugo? I think of it not as a central area (a core group of words), but as a starting point on the diagram. In other words, it is not the overlapping area of several registers within a language, but that language itself. That is, Ueda understood ‘common words’ to be the whole of the Japanese language itself, with each branch representing words in its various registers. This can be surmised from the fact that the central elements are not described or shown in particular, and that the branches are not shown as arrows. Furthermore, considering that the purpose of Ueda’s dictionary compilation project was to honor his nation’s culture, as with the OED in the West, it is not hard to imagine that there was a sense of Japanese as a national language as opposed to other national languages. 
Therefore, we can conclude that the futsu in Ueda’s futsugo did not mean ‘everyday, commonplace words’, but the ‘general language’ comprising its various registers. This is consistent with Ueda’s background: he studied linguistics under B. H. Chamberlain and had experience studying abroad, so he was well versed in western linguistics and was a forerunner in organizing the theoretical foundations of modern Japanese linguistics. He, therefore, looked broadly at Japanese from a national perspective. Ueda’s broad vision of a dictionary, as seen in his lecture notes, spanned periods and vocabulary and aimed at establishing a linguistic-cultural heritage for all of Japan.
 
3.1.2 Futsugo as ‘everyday words’ 
Next, we will look at the Japanese dictionary Genkai (‘Sea of Words’) by the Japanese linguist Fumihiko Otsuki (1889). Like Ueda, he was aiming at a large dictionary that would eventually contain words from many periods as a part of a dictionary compilation project. However, ‘The purpose of this compilation’ in the preface to Genkai describes the contemporary position of dictionary compilation as treating a particular group of words in a language as its ‘common words’, and aims to create a ‘common dictionary’ primarily comprising those words. Although Otsuki was using the same term futsugo in the same context of dictionary compilation as Ueda, he was using it to refer to a more limited group of words.
Otsuki divides a language into its ‘common words’ (futsugo) and its ‘proper nouns and specialized words’ (koyu meisho mata wa senmongo), and as is apparent from this contrasting conceptual categorization, the term futsugo here refers in particular to general nouns and everyday words within the Japanese lexicon. For example, Otsuki includes unfamiliar western loanwords in the dictionary as ‘everyday words’ (nichijogo), which is synonymous with futsugo. A specific example is the word mishin, a Japanese word deriving from the English sewing machine, which is entered in the dictionary with kanji logographs that literally mean ‘sewing machine’. Indeed, this is an appliance necessary to everyday life, and such words can also be called everyday-use words.
Reconsidering Otsuki’s relationship to Ueda, Ueda referred to one language being made up of futsugo, but Otsuki divided one language into its everyday lexicon and its specialized lexicon, calling the former general-use words futsugo. In other words, futsu referred to everyday as opposed to a specialized register, and go referred to words. Even within the same context of dictionary compilation, the policy of lexical classification and the level of ‘common’ differed. We can say that Otsuki’s stance had become closer to the ‘everyday’ that was desired during the national language reforms. 
Otsuki had still made no clear distinction between spoken and written language, but Bimyō Yamada (1892), in Nihon Daijiten (‘Great Japanese Dictionary’) clarified this point, distinguishing ‘everyday-use words, and both spoken and written words’.
Yamada was also a novelist and practiced the genbun itchi style of writing using everyday speech in his works. In this way, we can say that he was more sensitive to the relationship between the spoken and written language. 
 
3.2 Futsugo according to a Nativist Studies Scholar: Futsugo as ‘modern language’ 
Next, we shall consider the use of futsugo by the Meiji Period Nativist studies (Kokugaku) scholar Naozumi Ochiai (1889). Ochiai’s use of the term compares to Otsuki’s, but he emphasized the historical period in which a word was used. Based on the understanding that familiar modern words (e.g., ossharu) derive from phonetic changes in classical words (e.g., ohoseraru), he deepened the understanding of words by observing the process of how they traced back to classical words that were no longer familiar. He referred to modern words as futsugo in this context. 
Regarding this goal, Ochiai (1889) argued that “If one writes using futsugo, then it will be understood even to the most unlearned. Likewise, even the most unlearned will be able to write freely.” (Ochiai, 1889, p. 26). This is a dual structure of ‘today’s common language’ and ‘classical language’ that was the key to the establishment of the genbun itchi style for the democratization of education, as seen in Section 2.2 above.  
While the ‘classical language’ was the language of the Heian period extant in surviving written texts, ‘today’s common language’ was available only from modern oral sources, or in other words, everyday spoken language. Therefore, according to Ochiai, futsugo referred to the modern colloquial language as opposed to the classical language. 
Furthermore, if we reconsider the relationship between Ochiai and Otsuki, both are similar in that they were interested in everyday words. On the other hand, they focused on different aspects: Otsuki pursued the everydayness of futsugo, while Ochiai focused on not only the everydayness of futsugo, but also how it related to the issue of writing, as well as how everyday words were related to classical words. 
I believe the fact that Ochiai himself was a Nativist studies scholar is one of the reasons for this minor difference. Nativist studies, to put it simply, was an academic discipline that attempted to rigorously examine and understand ancient literary texts in order to understand the ancient language and its manifest spirituality. In other words, it was an academic discipline that always looked to the past. Ochiai seems to have taken advantage of the contemporary focus on futsugo to enhance the academic significance of his field of study. 
 
3.3 Trends in Japanese linguistics around 1890 
As mentioned above, the national language reform debate in the 1880s focused on establishing a unity of spoken and written language – genbun itchi – for the democratization of education, and there were calls for the unification of the spoken language to serve as the standard for written style. Around 1890, the consciousness for unification at the regional and class level was systematized through Japanese linguistics. This was conceptualized through ‘dialects’, linguistic systems that differed from region to region, and ‘standard language’ (hyojungo), based on the Tōkyō dialect. This was just at the time that the policy of Europeanization was being reconsidered, and national consciousness was on the rise.
The first use of the term Standard Japanese, and its first contrast with dialects, was by the linguist YoshisaburoŻ Okakura (1890), in Nihongogaku ippan (‘A Study of Japanese Linguistics’). Okakura argued that “…  the separation of a language into parts is a great barrier to the spread of education, [and therefore] we must have dialects surrender to standard speech without delay.” (Okakura, 1890, p. 164). In other words, a clear standard – the ‘standard language’ – and its accompanying the genbun itchi style, were essential for contemporary society, particularly in the field of education. If teachers taught in different dialects depending on the region, it would be impossible to produce a stable educational outcome, especially for students. 
Thus, the national language reforms were tied to the reconsideration of the policy of Europeanization, as well as to Japanese linguistics and the issue of a grammar education. By this point, the issue of national language was no longer confined to the level of debate over everyday, commonplace language for the democratization of education. The issue had become the artificial strategy of national unification. 
4 The concept of futsugo (1894-early 1900s): The national language reforms and national consciousness 
4.1 The national language reforms and the Sino-Japanese War 
In the mid-1890s, there was a major turning point, the Sino-Japanese War, which began with the Donghak Peasant War in 1894 and developed into a war between Japan and the Qing dynasty over the right to rule Korea. In 1895, Japan secured the Treaty of Shimonoseki, in which Qing accepted Korea’s independence and ceded the Liaodong Peninsula. However, just after the treaty was signed, Japan was required to return the peninsula through the Tripartite Intervention. Japan thereafter adopted the slogan gashin shotan 臥薪嘗胆, literally ‘Sleeping on sticks and tasting gall’, to call for perseverance in developing Japan and improving its status in the world. As this was the age of imperialism, Japan was striving to catch up to western nations that were seeking to acquire colonies as export destinations, and so was also beginning to deepen its sense of confrontation with those world powers.  
Such a nationalistic ideology ushered in a new phase in the issue of national language reform. As is symbolized by the statement “History is not without examples of nations that have been defeated in literature, though they were victorious in war.” (Ueda, 1895a, p. 37), national language and writing received increasing attention, backed by an awareness of the importance of both the literary and the military arts. 
 
4.2 Futsugo according to pioneers of Japanese linguistics 
In the previous period, Japanese linguists who looked at the issue of the national language reforms had developed certain linguistic rules needed for education. In this period, they developed those further into an artificial language with artificial rules for the unification of the populace and the nation. One of the most representative Japanese linguists who advocated this need was the aforementioned Kazutoshi Ueda, who had promoted dictionary compilation projects and coined the word futsugo from Murray’s concept of common words but used it to refer to the shared Japanese language as a whole. 
 
4.2.1 Futsugo as ‘universal normative linguistic system’ 
In a November 1894 lecture entitled “On the Study of the National Language” (Kokugo kenkyu ni tsukite), Ueda (1895a) first advocated for the establishment of a ‘standard language’ (hyojungo) based on the dialect of Tōkyō, the capital city of Japan, in line with the mainstream argument of Japanese linguistics in the previous period. Ueda goes on to say that he made a “great resolution to create what should be called a ‘common language’ (futsugo) for the entire East, which everyone involved in the arts, politics, or industry of the East should know, from Koreans to Chinese, to Europeans, to Americans” (pp. 29-30). This parallelism between ‘standard language’ and ‘common language’ (futsugo) was not seen in discussions of Japanese linguistics in the previous period and is particularly noteworthy in this second period. 
As was shown in Section 3.1.1, Ueda in 1889 aimed to compile a large dictionary containing a large number of words from many different periods, using the word futsugo to refer to the national language of Japan as a whole, as opposed to other national languages. Granted, he was stressing the dictionary as a way to assert Japan’s civilization to increase its authority internationally, but in actuality, he was just collecting a diversity of words within the Japanese language itself. 
However, Ueda’s usage of the same term futsugo had by 1894 been colored by the times, and its meaning as a term had changed drastically. His argument was more applied and not of the nature of a detailed definition of words. Simply stated, Ueda had come to use the word futsugo to describe an established Japanese ‘standard language’ that could then become a ‘common language’ that all people related to East Asia should know regardless of their citizenship. By now, futsugo had come to refer to an artificial, unified language – a national language in Japan and the world. That it was unified also meant that it had a limited whole. Whether consciously or unconsciously, the meaning of the wholeness of the aspect of futsugo had completely changed for Ueda within five years. 
The quotation from Section 4.1 is in fact from this lecture. Based on the date and his expressions, it is clear that Ueda was alluding to the Sino-Japanese War. Attention had been turned to the unity of Japan as a nation both domestically and internationally by the nationalism in the wake of the Sino-Japanese War, and the term futsugo played (or was made to play) a role in that context. 
 
4.2.2 Futsugo as a ‘normative linguistic system’ 
In Ueda’s example, the relationship between a ‘standard language’ and a ‘common language’ was not fully discussed, so a clear definition was not formulated, though they both certainly referred to the Japanese national language. In other words, we can surmise that since domestically Japanese was used as a common language, it had thereby increased its value as a unified language, and as a result, it could become the Japanese national language, useful domestically and internationally at a high level. 
It was Ueda’s student, Kōichi Hoshina, who went on to clarify these two phrases as terms. Firstly, Hoshina (1901) presented the following definition: “futsugo... (Common language = Gemeinsprache)” (Hoshina, 1901, p. 48). Most likely, he was directly adopting Ueda’s 1889 use of the term and its expansion of the debate from the lexical level to that of the language as a whole. 
Secondly, he also defined ‘standard language’ (hyojungo) as an artificially polished ToŻkyoŻ dialect, but this was not so different from Okakura’s and Ueda’s examples. However, Hoshina went on to envision a further step. After the establishment of the ‘standard language’, when it eventually unified the national language and came to be used as the common, unified language throughout the country, then it would finally be called the futsugo. The relationship between ‘standard language’ and ‘common language’ should be clear: After a ‘standard language’ is established and becomes unified throughout the country, it can finally be called a ‘common language’. Both ‘standard language’ and ‘common language’ undergo a process of artificial unification that eventually leads to universalization. At that point, a Japanese national language is finally envisioned. 
Furthermore, given that they use the same translated terms, that they structure their arguments around the same parallelism between ‘standard language’ and ‘common language’ (futsugo), and that they were teacher and student, I believe that Ueda’s and Hoshina’s arguments are in agreement. The creation of a ‘standard language’ that was necessary for the establishment of the genbun itchi style of writing coincided with the heightened national consciousness, raising the Japanese national language – the futsugo, or common language – to a level where it could be used both domestically and abroad. It was not only an important presence in all of Japan, but also a prerequisite for competing with the rest of the world, and a norm that established its position among world languages. The theories of these teacher-student pioneers of national linguistics succeeded as terminology and as concepts due to their combined efforts. The term futsugo had traveled far from being an everyday, commonplace language for the democratization of education in the first period. 
5 Conclusion 
The term futsugo referred to an ideal, unified language for national language establishment and education, and was characterized by the ‘created norms’ of the modern state. However, in the process, it was used in multiple fields over multiple periods, resulting in multiple conceptual meanings. 
In this paper, the term was viewed over two periods. The first period (1880s-1894) was concerned with education but aimed to establish everyday, commonplace language and script that was familiar to the populace. Specifically, lexicographers selected everyday-use words, and Nativist studies scholars selected modern colloquial language; indeed, futsugo corresponded with ‘common’ language. However, by the 1890s, the policy of Europeanization was being reconsidered, and national consciousness was on the rise. 
The second period (1894-early 1900s), with the start of the Sino-Japanese War, saw an increase in the national consciousness in strengthening both literary and military arts, with a desire for the establishment of an artificially unified language with artificial rules that would unify the populace and the nation. Examples of futsugo in this new context reemerged with Kazutoshi Ueda, who established the theoretical foundations of modern Japanese linguistics, and who had first used the term futsugo during the first period. The natural shift from the populace’s everyday commonplace language to a unified national language became possible through the linguistic logic, or mediation of terminology, seen in the single (but ambiguous) word futsugo.  
This study has examined examples of the word futsugo in the centralized nation. There is still room to research new perspectives from the side of the register of ‘dialects’ that were thought to have needed correction. This is because the discussion of ‘common language’ and ‘standard language’ has not sufficiently examined the perspective of the descriptions (self-identification) of those who were using the regional dialects. 
For example, it is typically thought that the contrasting consciousness between dialects and common/standard language began in regions where the dialect was furthest from Tōkyō, such as in Okinawa. On the contrary, when seen from the perspective of how those using ‘dialects’ perceived their language, it is now known that the regions of Okinawa and Kyushu were slower in considering their language ‘dialects’ than the regions in the mainland. In the future, I would like to expand the scope of my inquiry to include new terms and regions and elucidate a more bi-directional view of language between the center and periphery. 
Acknowledgments 
This paper is based on a manuscript that was to be presented at the International joint symposium “Embodiment in the Age of Imperialism” at the University of Ljubljana, Faculty of Arts, Department of Asian Studies, on May 14-16, 2020, which was cancelled due to the Covid-19 pandemic. In preparing this paper, I have received valuable advice from Nagisa Moritoki Škof (University of Ljubljana), Yasuhiko Komatsu (Aoyama Gakuin University), and Loren Waller (Yale University). 
References 
Amano, S. (2019). Gengo geijutsu ni okeru onsei to moji: Jutsugo o chushin toshite (Voice and script in the linguistic arts: Focusing on technical terms) [Unpublished master’s thesis]. Aoyama Gakuin University (housed in the University Library in 2020). 
Fukui, K. (1907). Nihon bunposhi: Kyoiku narabini gakujutsujo yori mitaru. Tōkyō: Dai Nihon Tosho. 
Hirai, M. (1949). Kokugo kokuji mondai no rekishi. Tōkyō: Shōshinsha. 
Hirai, M., & Yasuda, T. (1998). Kokugo kokuji mondai no rekishi. Tōkyō: Sangensha. 
Hokama, S., Ono, S., & Marutani, Z. (1981). Okinawa no kotoba. (Nihongo no sekai) Tōkyō: Chuōkōron-sha. 
Hoshina, K. (1901). Futsu no gengo. In K. Hoshina. Kokugo kyojuho shishin. Tōkyō: Hōeikan Shoten. 
Itani, Y. (2006). Okinawa no hogen fuda: Samayoeru Okinawa no kotoba o meguru ronko. Naha-shi: Bōdāinku. 
Kondō, K. (2008). Hogenfuda: Kotoba to shintai. Tōkyō: Shakai Hyōronsha. 
Konno, S. (2014). Jisho kara mita Nihongo no rekishi. Tōkyō: Chikuma Shobō. 
Maejima, H. (1899). Kanji gohaishi no gi. In H. Maejima. Kokuji kokubun kairyo kengisho. Tōkyō: Shueisha. 
Murakami, R. (2005). Shōgakkō kokugoka seiritsu to Okinawa chiiki futsugo gainen ni chumoku shite. Ryukyu Daigaku Kyoikugakubu kiyo, 67, 15-34. 
Murray, A. H. (Ed.). (1884). General explanations, The vocabulary. In The Oxford English dictionary (1st ed., pp. vii-viii). Retrieved from https://public.oed.com/history/oed-editions/#first-edition 
Ochiai, N. [Naozumi]. (1889). Futsugo ni tsukite. Koten kokyujo koen, 1(4), 24-40. 
Ochiai, N. [Naobumi]. (1890-1891a). Nihon no bunshō (enzetsu hikki). In Kokugo Denshujo (Ed.), Kokugo kogiroku. Tōkyō: Kokugo Denshujo. 
Ochiai, N. [Naozumi]. (1890-1891b). Futsugo no kisoku. In Kokugo Denshujo (Ed.), Kokugo kogiroku. Tōkyō: Kokugo Denshujo.  
Ogasawara, T., & Funaki, T. (2001). Kindai kokugoka seiritsuki ni okeru futsu gainen no kentō. Kobe Daigaku Hattatsu Kagakubu Kenkyu Kiyo, 9(1), 65-76.  
Okakura, Y. (1890). Nihongogaku ippan, Vol. 1. Tōkyō: Meiji Gikai. 
Otsuki, F. (1889). Honsho hensan no taii. In Otsuki, F., Genkai. Otsuki Fumihiko (self-published). 
Sato, H. (Ca. 1889-1891). Kokugo o kaigai e yushutsu subeki ron. Koten Kokyujo Koen, 6(58), 50-59. 
Sato, S. (1991). Futsu to futsugo to: konogoro omoishi koto jakkan. Kokugo Kokubungaku, (30), 1-17. 
Shibata, T., & Ono, S. (Eds.). (1977). Nihongo 11: Hogen. Tōkyō: Iwanami Shoten. 
Ueda, K. (1895a). Kokugo kenkyu ni tsukite. In K. Ueda. Kokugo no tame, Vol. 1. Tōkyō: Fuzanbō. (2nd ed., 1903). 
Ueda, K. (1895b). Nihon daijisho hensan ni tsukite. In K. Ueda. Kokugo no tame, Vol. 1. Tōkyō: Fuzanbō. (2nd ed., 1903). 
Ueda, K., & Matsui, K. (1915). Dainihon kokugo jiten. Tōkyō: Fuzanbō: Kinkōdō Shoseki. 
Yamada, B. (1892). Shogen. In B. Yamada. Nihon daijisho. Tōkyō: Nihon Daijisho Hakkōjo. 
 
 
FROM NATIVE-SPEAKER LIKENESS TO SELF-REPRESENTATION IN LANGUAGE: 
VIEWS FROM THE ACQUISITION OF JAPANESE TRANSITIVE AND INTRANSITIVE VERBS 
ITO Hideaki  
University of Tsukuba, Japan 
ito.hideaki.gb@u.tsukuba.ac.jp 
Abstract 
This study considers the degree to which a language user’s own will is recognized in language education. It also looks at the use of Japanese transitive and intransitive verbs to reexamine the differentiation between language use that is native-like, and language use that is representative of the learner’s self. The reexamination indicates that shifting the previous approach to a more usage-centric acquisition process can create opportunities for language users to make expressive choices focused on what they wish to say. This shift may be accomplished by introducing backward design and critical pragmatics into teaching practices, thereby enabling the pursuit of self-representing language use, and prompting individuality in each learner without binding the learner solely to linguistic rules. 
Keywords: transitive and intransitive verbs; discriminative knowledge; pragmatic choice; cultural literacy; diversity 
Povzetek 
Raziskava obravnava stopnjo prepoznavanja lastne volje uporabnika jezika pri jezikovnem izobraževanju. Obenem proučuje uporabo japonskih prehodnih in neprehodnih glagolov in ugotavlja njune razlike v jezikovni rabi, predvsem razlike, ki nastanejo med ciljnim jezikom in jezikom učečega. Ugotovitve nakazujejo, da lahko sprememba k pristopu, ki je bolj osredotočen na pragmatično-usmerjeni učni proces, pripomore k izboljšani izraznosti lastne volje učečega. To spremembo je moč doseči z uvedbo t.i. retroaktivne metode (angl. backward design) in kritične pragmatike v učni proces, s čimer učečemu omogočimo samoevalvacijo uporabe jezika in spodbujamo njegovo individualnost, ne da bi ga vezali zgolj na jezikovna pravila. 
Ključne besede: prehodni in neprehodni glagoli, diskriminativno znanje; pragmatična izbira; kulturna opismenjenost; raznolikost  
1 Introduction 
The CEFR (Common European Framework of Reference for Languages: Learning, teaching, assessment), published by the Council of Europe in 2001, understands language users to be social agents. This idea recognizes each person as a member of society regardless of his or her varying levels of linguistic proficiency. This may seem obvious, but let us consider the degree to which language education recognizes a language user’s intention when learning a language. 
For instance, one grammatical feature highlighted as difficult for learners of Japanese to acquire is verbal transitivity¹ (also termed ‘transitive-intransitive verbs’ below) (Kobayashi, 1996). It is common when teaching the Japanese language to instruct learners to express their actions in Japanese as naturally occurring phenomena in order to avoid actively emphasizing those actions as their own doing. To give a specific example with (1) and (2) below, the sentence in (1) would often be considered ‘correct’ in the sense that it resembles what a native speaker would use. 
¹ Nakaishi (2005a) includes the verb combination hai-ru/ire-ru [enter/include] as comprising a ‘paired intransitive verb’ and a ‘paired transitive verb.’ The verbs in such pairs do not share a common root but are taught as transitive–intransitive verb pairs in elementary Japanese language textbooks. This paper addresses transitive–intransitive verb pairs in Japanese language education and therefore follows Nakaishi’s (2005a) definition and terminology. 
 
(1) fukyō de taihen datta kedo, yatto shigoto ga mitsukatta yo 
    ‘It was tough with the recession, but work was found in the end.’ 

(2) fukyō de taihen datta kedo, yatto shigoto wo mitsuketa yo 
    ‘It was tough with the recession, but I found work in the end.’ 

In fact, both of these sentences are grammatically correct; it cannot be said that one is more correct than the other, or that only one of the expressions should be learned. There may be trends in the differentiated use of these expressions coming from the Japanese culture or the Japanese language itself, but as mentioned at the beginning of this paper, it is the language user who should decide how he or she wishes to speak. Meanwhile, an utterance will be based on the language user’s own will only when that language user has judged what it is that he or she wishes to say. Societal diversity is ever more important at present, and in such times, it would seem problematic that, in Japanese language education, learners are being taught to use only particular turns of phrase based on existing usage trends. Learners of Japanese should therefore have the knowledge to be able to differentiate between and make proper use of ‘native-speaker likeness’ and ‘self-representation’ in their target language use. This paper reexamines what that knowledge comprises by looking at the example of transitive and intransitive verbs, usage of which can change according to differences in cognitive understanding. We will also consider why self-representation is vital in future education. 
2 Three Perspectives 
In examining what constitutes knowledge of the properly differentiated use of native-speaker likeness and self-representation in this paper, we will refer to three perspectives: flexible language use, native speaker diversity, and pragmatic choice. 
 
2.1 Flexible language use 
One common topic in language education in Japan is ‘native speaker worship’ or ‘native-speakerism’, i.e., where native-level proficiency is considered the ultimate goal, be it in teaching Japanese, English, or any other language. Yoshitomi (2014) considers English language education in Japan in terms of the sociolinguistic categories of an ‘inner circle’, ‘outer circle’, and ‘expanding circle’. The inner-circle includes nations and regions where English is societally standard as the main first language (L1), and the outer circle includes places such as India and Singapore where English has become distinct and societally standard as an official or semi-official language. The expanding circle features nations and regions such as Japan, China, Russia, and Spain where English is considered a foreign language. Yoshitomi asserts that language users within this category should not defer to norms defined by native speakers; linguistically varied English should instead be understood as equally valuable, and this unique variation should be used with confidence (Yoshitomi, 2014, p. 147). Yoshitomi thus both advocates for a departure from ‘native-speakerism’ and also asserts that, although native speakers are models for some aspects of language learning, it is unrealistic to treat such a model as a target in language education. Yoshitomi stresses the need to train users to convey what they wish to say even if they cannot think of the vocabulary or grammar that could most precisely express their intended meaning, be it by using the basic linguistic knowledge they do have or indeed through non-verbal means (Yoshitomi, 2014, pp. 148–149). In sum, regardless of deviation from native-speaker norms, there is still a need to foster flexible language use that incorporates verbal and non-verbal means available to the learner. 
 
2.2 Native speaker diversity 
We have addressed the need to nurture flexible language use in language learners; let us now consider the case of Japanese L1 speakers. Makihara (2013) investigated selection tendencies in the use of transitive-intransitive verbs by undertaking a questionnaire survey of eight Japanese L1 university students. The results revealed that the choice between a transitive verb or an intransitive verb differed depending on the speaker’s social relationship with his or her listener(s). Existing research has thus revealed diversity even in aspects of Japanese that had been taught on the premise of native-speaker tendencies, for instance, the claim that a native speaker of Japanese would prefer a certain expression. Makihara (2016) also investigated selection tendencies in the use of transitive-intransitive verbs by conducting a questionnaire survey of 35 Japanese L1 university students who evaluated example utterances of transitive and intransitive verbs in simple and complex sentences. The results showed considerable variation in transitive-intransitive choices even amongst native speakers of Japanese. Nonetheless, Makihara (2016) notes that the existence of some variation does not necessarily mean that there is no basis for the choices native speakers make between transitive and intransitive verbal forms; the following trends can still be observed: 
1. If the speaker is more clearly responsible, a transitive verb is more likely to be used. 
2. In complex sentences, the speaker is more likely to use a transitive verb sentence for an action that directly affects a target object, and for any resulting state. 

This research shows that, in many cases, trends ultimately form the basis of what is considered ‘correct’ in teaching the Japanese language. 
 
2.3 Pragmatic choice 
Pragmatic choice concerns cases where learners are aware of the pragmatic norms and linguistically capable of producing native-like forms but make deliberate choices not to use them on particular occasions (Ishihara & Cohen, 2014, p. 77). This means that, while a learner understands the pragmatic norms in the linguistic behavior of native speakers for particular expressions, he or she is not free from preconceptions about the world; rather, the learner is a social being with his or her cultural values, beliefs, and worldview, and accordingly, the question of how a person expresses him- or herself linguistically should be left to that person (Ishihara & Cohen, 2014). This certainly aligns with the CEFR interpretation of language users as social agents. Indisputably, there is an intention behind each linguistic expression and a reason for its existence, which must be known in the context of additional language instruction. However, the idea of pragmatic choice, namely that the choice of linguistic expression should be left to the learner, has not yet garnered much discussion in teaching the Japanese language. Rather, this sort of thinking has been considered quite idealistic. However, our current era calls for diversity, including in language, and if we want language education to place importance on diversity, we should aim to teach in ways that actively allow for pragmatic choice. 
3 Changing ideas about acquiring discriminative knowledge 
In the field of Japanese language education, transitive and intransitive verb acquisition has been considered difficult because differences in language use can emerge from differences in cognitive understanding. This section will address how to change ideas about transitive and intransitive verb acquisition after outlining previous findings from transitive-intransitive verb acquisition research, centering on flexible language use, native speaker diversity, and pragmatic choice, as described in Section 2. 
 
3.1 Research in Japanese language education on the acquisition of transitive and intransitive verbs 
Kobayashi & Naoi (1996), Morita (2004), Nakaishi (2005a, b), and Itō (2012) represent the core research on transitive and intransitive verb acquisition. 
Kobayashi and Naoi (1996) showed, through several tasks including morphological judgments of lexical forms and discourse completion, that learners at a Mexican university found it difficult to acquire intransitive verbs, particularly those that express resultative states such as kie-te-iru (‘has disappeared’). The study indicates stages of acquiring such verbs, from learning their morphology to using them pragmatically. Next, in Morita (2004), Australian learners were asked to interpret the morphological forms of transitive and intransitive verbs. No significant difference was found when their conversational ability and rate of correct responses to this task were compared. Interviews also revealed that learners encountered transitive verbs more frequently than intransitive verbs in explanations from their textbooks and instructors, and thus better understood transitive verbs. This suggests the possibility that the acquisition of transitive and intransitive verbs is affected by the frequency of contact.  
Nakaishi (2005a) investigated trends in learners’ transitive and intransitive verb usage through video image tasks. The study found that intransitive verbs were more difficult to acquire than transitive verbs, and that individuals showed fixed acquisition patterns such as ‘using a transitive verb for a particular conjugation (always use transitive verbs when there is a te-form such as kime-te kudasai).’  
Nakaishi (2005b) also conducted a study that made use of storytelling, noting a lack of research demonstrating that learners make suitably differentiated use of transitive-intransitive verb pairs in working contexts. The results showed that learners do not necessarily differentiate between transitive and intransitive verbs consciously, and use only one of the forms (the one that they are accustomed to) in a variety of settings.  
Itō (2012) studied the use of transitive and intransitive verbs in data from Chinese learners in the KY Corpus, a linguistic resource that includes transcriptions of oral proficiency interview tapes. The study asserts that it is necessary to implement a shift from morphological accuracy to pragmatic appropriateness in the focus of instruction, based on the learner’s level of study. 
In summary, the following key points are stated in the existing transitive-intransitive verb acquisition research detailed above: 
1. Intransitive verb acquisition is difficult. 
2. Contact frequency is relevant. 
3. Lexical issues are relevant. 
4. Usage and pragmatic acquisition are linked with morphological acquisition. 

Section 3.2 discusses transitive and intransitive verb acquisition in light of the findings from the above studies. 
 
3.2 Changing ideas about acquisition 
Stanovich, writing on vocabulary acquisition, states:  
There is considerable agreement that much – probably most – vocabulary growth takes place through the inductive learning of the meanings of unknown words encountered in oral and written language. It appears that the bulk of vocabulary growth does not occur via direct instruction. (Stanovich, 1986, p. 379).  
Additionally, Taylor states: “To know how to use a word, the speaker of a language would need to know specific facts about that word, facts which could only be acquired through exposure to how the word is used” (Taylor, 2012, p. 45). Nakaishi (2005b) also suggests that transitive and intransitive verb acquisition is possibly a vocabulary issue. Accordingly, we may suppose that, in transitive and intransitive verb acquisition, it is more efficient to have learners acquire morphological forms as part of pragmatic instruction rather than focus primarily on the morphological differences between individual words. This also ties into the usage-based categorization of transitive-intransitive verbs proposed by Itō (2017): ‘transitive and intransitive verbs indicating situation construal’ and ‘transitive and intransitive verbs indicating situation report.’ This perspective is also visible in the connection with contact frequency described by Morita (2004) and suggests the possibility that learners may in practice already be learning from pragmatic use.  
The process of acquisition has previously been conceptualized in stages: a learner first acquires a form and then in the next stage acquires its usage. As in the morphocentric-type schema in Figure 1, the process starts from the morphological and syntactic opposition that is characteristic of Japanese transitive and intransitive verbs, e.g., aku/akeru (‘open’ (intr.)/‘open’ (tr.)). If we turn the morphocentric-type schema to the right, however, usage becomes the central focus around which forms are then involved. This enables us to change to the usage-centric type schema in Figure 1 without dismantling the process established in previous research. This creates an opportunity for language users to make expressive choices based on what they wish to say. 
 
 
 
 
Figure 1: Changing the acquisition process of transitive and intransitive verbs 
4 Differentiating between self-representation and native-speaker likeness 
4.1 Backward design in pursuit of self-representation 
Shirakawa (2005) provides the following example of improperly differentiated transitive-intransitive verb use. 
 
(3) [kitsu kute nakanaka akerare nai bin no futa wo akete] *ake-ta! 
    ‘The cap was stiff, so the bottle was hard to open [tr.], but I tried, and I did open [tr.] it.’ (Shirakawa, 2005, p. 51) 

The learner perhaps thinks, “It was me who opened the bottle; the cap didn’t come off by itself.” When expressed with a transitive verb in Japanese like this, however, the utterance carries a boastful implication not intended by the speaker. According to Shirakawa, the learner is not aware of this unintended nuance, therefore rendering (3) erroneous (Shirakawa, 2005, p. 52). The study also posits the need for a more involved explanation when addressing improper use or non-use of expressions; as well as instruction in proper usage, learners also require an explanation of why other expressions are unsuitable (Shirakawa, 2005, p. 60). As discussed in the previous section, the common approach to transitive-intransitive verb acquisition has until now been gaining an understanding of the verbs’ contrasting morphology, linking then to a pragmatic, usage-based understanding that includes tense and aspect. On the other hand, we can create the opportunity to confront ideas of ‘native-speakerism’ by factoring the following into our stages of instruction: first, further motivating the learner to speak by clarifying his or her actions and intentions/wishes, and then considering which expression is needed, and whether or not other expressions could be suitable, to accomplish linguistic behavior appropriate to the learner’s intentions. By constructing classes in this manner, we can equip learners with the knowledge to properly differentiate between language use that resembles that of a native speaker and language use that is more representative of the learner’s self, thus challenging ‘native-speakers’ approaches that ask learners to ‘talk like a native speaker.’ 
Let us now discuss how we might construct classes in this manner. Wiggins and McTighe (2005) propose backward design as a method for planning learning that pursues specific understanding. Backward design follows the class-planning stages below (Figure 2), whereas the actual classes are conducted in the opposite order, from the third stage through to the first.
 
 
 
Figure 2: Stages of backward design (Wiggins & McTighe, 2005, p. 18) 
 
First stage: Identify desired results. 
First, identify what the student needs to learn, understand, and/or be capable of. The discussion above indicates that the biggest objective in learning Japanese transitive and intransitive verbs is for the learner to be able to convey what he or she wishes to say in light of pragmatic and cultural knowledge. 
 
Second stage: Determine acceptable evidence. 
Investigate how to determine whether the desired results established in the first stage have been achieved. If our desired result is for the learner to be able to convey what he or she wishes to say in light of pragmatic and cultural knowledge about Japanese language use, then it will be important to verify whether the learner has an understanding of pragmatic and cultural knowledge in the use of Japanese transitive and intransitive verbs and whether the learner is conveying what he or she wishes to say. 
 
Third stage: Plan learning experiences and instruction. 
Consider what manner of instruction would be most suitable for learning the material that needs to be verified in the second stage and plan individual classes. Here, we construct specific classes from what needs to be learned to gain an understanding of pragmatic and cultural knowledge in the use of Japanese transitive and intransitive verbs and from what needs to be learned for the learner to convey what he or she wishes to say. 
 
One potential risk when putting a shift in thinking like this into practice is that merely emphasizing pragmatic instruction may lead learners of Japanese to treat Japanese L1 speakers as an ideal end-goal model, only further solidifying native-speakerist ideas. To avoid native-speakerism here, there is a need for cultural literacy – that is, deciphering why particular linguistic behavior carries the meaning it does within the relevant culture. In teaching this cultural literacy, it is also necessary to include a critical pragmatics standpoint to help learners more readily judge, based on cultural information, whether or not to participate in those customs. This standpoint may be fostered by having learners study the pragmatic norms of the target language, discuss cultural perspectives with other learners and sometimes also native speakers, and thereby come to know why such norms exist in the target language (Ishihara, 2014). This may also lead to the recognition and correction of the unequal vertical hierarchies that lead to native-speakerism in established stratified arrangements, such as the binary of ‘native’ versus ‘non-native’ speakers, and consequently to pragmatic choice as described in Section 2, which factors in properly differentiated use of both native-speaker-like and self-representing language.
 
4.2 What lies beyond the choice of self-likeness 
It is well known that the spread of COVID-19 has occasioned a sudden leap in the proliferation of online education. Perhaps this will lead to more active adoption of online classes in future education even after the pandemic has ended. With this increased adoption of online approaches, global co-learning courses are predicted to increase, even in language education. In this case, note that the original aim of co-learning (that is, coming into contact with diverse attitudes and values) will likely not be achieved merely by having learners study in the same setting.
Fadel et al. (2015, p. 67) collected curricula from 35 nations, regions, and organizations across the globe and sorted their educational goals into the following four dimensions: 
1. Knowledge: What we know and understand
2. Skills: How we use what we know
3. Character: How we behave and engage in the world
4. Meta-learning: How we reflect and adapt
In the interests of space, the authors’ original work should be referred to for elaboration on each of these dimensions; the important point here is that future education includes character, a dimension separate from knowledge and skills. For Fadel et al. (2015), character includes mindfulness, curiosity, courage, resilience, ethics, and leadership. These elements relate to the individual identities of learners and, thus, to the idea of self-representation discussed in this paper. 
As global co-learning courses proliferate in the future, let us cultivate character in each learner, not by binding learners solely to linguistic rules but by adopting the aforementioned critical pragmatics standpoint in our practice and aiming for self-representative language use.
5 Summary and future topics 
This paper began with an inquiry into the degree to which a language user’s own will is recognized in language education. Learners of the Japanese language need the knowledge to differentiate between, and make proper use of, native-speaker likeness and self-representation in the language; this paper reexamined what that knowledge is by looking at the example of transitive and intransitive verbs, the usage of which can differ depending on differences in cognitive understanding. The process of learner acquisition of Japanese transitive and intransitive verbs has conventionally started with learning forms and then moved to usage, drawing on the morphosyntactic opposition characteristic of such verbs. This paper has noted, however, that by shifting our approach to a more usage-centric acquisition process, we can create the opportunity for language users to make expressive choices based on what they wish to say. We next looked to designing classes that foster the knowledge needed for proper differentiation between native-speaker likeness and self-representation with transitive and intransitive verbs by identifying, through backward design, the foundations of what learners need to know and learn; we further noted the importance of including critical pragmatics in our practice rather than merely emphasizing pragmatic instruction. Finally, the paper asserted that adopting this critical pragmatics standpoint in teaching practices and aiming for self-representing language use will help cultivate character in each learner without binding the learner solely to linguistic rules. Discussions about studying and using the Japanese language in a way that respects the views and ideas of learners have previously been regarded as idealistic. However, the proliferation of online education, which is only being accelerated by COVID-19, indicates that pro-diversity education must not be brushed aside as idealistic, and indeed we should aim to make it a reality. Accordingly, it is likely to become increasingly important to break free from adhering to native speaker norms; to side with learners, teaching them how to properly differentiate language use and enabling them to express themselves in self-representing ways; and to cultivate the character of each learner.
Acknowledgments 
This work was supported by JSPS KAKENHI Grants JP18K12419 and JP18H00680.
References 
Ishihara, N. (2014). Chapter 6: Theories of language acquisition and the teaching of pragmatics. In N. Ishihara & A. D. Cohen (Eds.), Teaching and learning pragmatics: where language and culture meet (pp. 99–122). London: Routledge. 
Ishihara, N., & Cohen, A. D. (2014). Chapter 5: Learners’ pragmatics: potential causes of divergence. In N. Ishihara & A. D. Cohen (Eds.), Teaching and learning pragmatics: where language and culture meet (pp. 75–96). London: Routledge. 
Itō, H. (2012). How do learners use transitive-intransitive verb pairs? A study of intermediate to higher-advanced Chinese learners of the Japanese language. Journal of International and Advanced Japanese Studies, 4, 43–52. https://doi.org/10.15068/00129435
Itō, H. (2017). Usage-based perspectives on transitive-intransitive verb pairs: using the balanced corpus of contemporary written Japanese. Researching Japanese and the Teaching of Japanese, 9, 103–118.
Kobayashi, N. (1996). Expression of results and states with relative intransitive verbs: acquisition by learners of the Japanese language. Literature and Linguistics, 41–56.  
Kobayashi, N., & Naoi, E. (1996). Are [Japanese] relative transitive-intransitive verbs learnable? A study of Spanish speakers. Japanese Language Teaching Journal, 11, 83–98.  
Shirakawa, H. (2005). Teaching Japanese through a grammar independent from Japanese language research. In H. Noda (Ed.), A grammar for teaching communicative Japanese (pp. 43–62). Tokyo: Kurosio Publishers. 
Nakaishi, Y. (2005a). A second language study on transitive-intransitive verb pairs: an analysis of the uses of ‘tsuku/tsukeru,’ ‘kimaru/kimeru’ and ‘kawaru/kaeru’. Japanese Language Education, 124, 23–32.  
Nakaishi, Y. (2005b). Do learners make properly-differentiated use of transitive and intransitive verbs? A transitive-intransitive verb pair acquisition study using utterance surveys. In M. Minami (Ed.), New directions in applied linguistics of Japanese (pp. 151–161). Tokyo: Kurosio Publishers.
Nakaishi, Y. (2017). Routes to acquiring Japanese transitive-intransitive verbs and their appropriate teaching methods [distributed material]. A Public Symposium on Japanese Transitive-Intransitive Verbs. 
Makihara, T. (2013). Controlling gestures and expressions of consideration. Japanese Communication Theory, 3, 63–72.  
Makihara, T. (2016). State construal and expression: transitive and intransitive verb selection. In M. Ono & Q. Li (Eds.), Subjectivity in language: the interface between cognition and politeness (pp. 151–171). Tokyo: Kurosio Publishers. 
Fadel, C., Bialik, M., & Trilling, B. (2015). Four-dimensional education: the competencies learners need to succeed. La Vergne, TN: Lightning Source, Inc. 
Morita, M. (2004). The acquisition of Japanese intransitive and transitive paired verb by English speaking learners: case study at the Australian National University. Japanese-Language Education Around the Globe, 14, 167–192.  
Stanovich, K. E. (1986). Matthew effects in reading: some consequences of individual differences in the acquisition of literacy. Reading Research Quarterly, 21(4), 360–407. 
Taylor, J. R. (2012). The mental corpus: how language is represented in the mind. Oxford: Oxford University Press. 
Wiggins, G., & McTighe, J. (2005). Understanding by design (expanded 2nd ed.). Alexandria, VA: Association for Supervision and Curriculum Development.
Yoshitomi, A. (2014). English as a lingua franca. In CLT Project, Sophia University (Ed.), Proficiency of English as an international language, considering communicative English teaching: theories and practices for language education in Japan (pp. 146–149). Tokyo: ALC Press Inc.
 
CORPUS ANALYSIS OF THE COLLOCATIONS OF THE TRANSITIVE VERBS OWARU AND OERU 
Nastja PAHOR 
University of Ljubljana, Slovenia 
nastja.pahor@gmail.com 
Abstract 
The transitivity of Japanese verbs is a topic that has been widely discussed in Japan since before the beginning of the Meiji period and remains one of the major obstacles for learners of Japanese. This paper focuses on the transitivity of the verbs owaru 終わる (to end [tr./intr.]) and oeru 終える (to end [tr.]). It analyzes the collocations of the two verbs and examines their objects in the patterns ‘N wo owaru’ and ‘N wo oeru’. The aim of this research is to give a new perspective on the usage of the two verbs. The analysis of both collocations and co-occurring verbal forms shows that the collocations group into individual semantic categories. Furthermore, the verbs exhibit specific morphological characteristics in different semantic fields of the collocations.
Keywords: verb transitivity; owaru; oeru; collocations; corpus 
Povzetek 
Prehodnost japonskih glagolov je tema, o kateri se je na Japonskem široko razpravljalo že pred začetkom obdobja Meiji in še dandanes predstavlja eno večjih ovir pri učenju japonskega jezika. Raziskava se osredotoča na prehodno rabo glagolov owaru ... (končati, končati se) in oeru ... (končati) in podaja analizo njunih kolokacij oziroma predmetov v stavčnem vzorcu ‘N wo owaru’ in ‘N wo oeru’. Cilj študije je razjasnitev rabe obeh glagolov. Rezultati analize kolokacij in sopojavljajočih glagolskih oblik kažejo na porazdelitev kolokacij v specifična semantična polja ter na obstoj določenih morfoloških lastnostih glagolov, ki se pojavljajo v različnih semantičnih skupinah kolokacij. 
Ključne besede: glagolska prehodnost; owaru; oeru; kolokacije; korpus 
1 Introduction 
Discussion of verb transitivity in Japan can be traced to before the beginning of the Meiji period (Okutsu, 1967, p. 46). To this day, differentiating between transitive and intransitive verbs remains one of the major obstacles for learners of Japanese. Transitive verbs are those that take an object, and intransitive verbs are those that cannot.
This paper discusses the transitivity of the verbs owaru 終わる (to end, finish [tr./intr.]) and oeru 終える (to end, finish [tr.]). Semantically, the verb owaru has both a transitive and an intransitive meaning, whereas oeru is used only as a transitive verb. Therefore, although both verbs mean ‘end’ or ‘finish’, their usage in terms of transitivity differs. The focal point of this paper is the semantic examination and categorization of their respective collocations in the patterns ‘N wo owaru’ and ‘N wo oeru’. ‘N’ represents the object of the verb, followed by the accusative case particle wo を. Additionally, verbal forms co-occurring with each collocation are examined. This part of the research aims to determine whether any specific structural patterns or forms are present in each semantic group.
 
1.1 Research motivation and purpose 
The general aim of this research is to give a new perspective on the usage of the verbs owaru and oeru, by focusing on the issue of transitivity and categorization of semantic fields of collocations belonging to each verb. 
In Section 2, previous research on the topic of transitivity is examined. Section 2.1 establishes the meanings of ‘transitive’ and ‘intransitive verbs’, while Section 2.2 focuses specifically on the verbs owaru and oeru as a pair. 
However, despite extensive discussions on the topic, discrepancies can still be observed in dictionary definitions of the verbs (Section 3.1). Similarly, an unbalanced representation of the two verbs in instructional materials (Section 3.3), as well as a consequential non-uniform perception of their usage (Section 3.4), can be seen. An example of diachronic change in verb use is also provided (Section 3.2). 
To clarify the abovementioned inconsistencies regarding verb transitivity, this paper examines collocations co-occurring with each verb in order to portray a picture of their semantic distribution and compare the two verbs (Section 5). Furthermore, verb forms of owaru and oeru belonging in each semantic group are also analyzed (Sections 6 and 7). 
 
1.2 Methodology 
After an analysis of prior research in Section 2, Section 3 is divided into four segments. The first segment (Section 3.1) is dedicated to the analysis of verb definitions retrieved from eight dictionaries published over several decades. Secondly, Section 3.2 compares dictionary definitions with sample sentences retrieved from the Corpus of Historical Japanese (Nihongo rekishi kopasu 日本語歴史コーパス, henceforth CHJ). The corpus comprises over 16 million words, with source texts dating to the Nara, Heian, Kamakura, Muromachi, Edo, and Meiji eras. This analysis is followed by an examination of instructional materials and the way each of them presents the verbs to its audience (Section 3.3). The last part (Section 3.4) deals with the general public’s perception of both verbs and their transitivity. Examples that indicate a mixed understanding of the verbs, especially with regard to owaru, have been retrieved from the sites Yahoo! Chiebukuro 知恵袋 and HiNative.
The empirical part of this research follows. First, sample sentences were retrieved from the following three corpora created by NINJAL (National Institute for Japanese Language and Linguistics, Kokuritsu kokugo kenkyujo 国立国語研究所):
• Balanced Corpus of Contemporary Written Japanese (Gendai nihongo kakikotoba kinko kopasu 現代日本語書き言葉均衡コーパス, henceforth BCCWJ). The data comprise 104.3 million words from various texts published between 1976 and 2005.
• Corpus of Spontaneous Japanese (Nihongo hanashikotoba kopasu 日本語話し言葉コーパス, henceforth CSJ). It comprises over 650 hours of recordings, transcribed into approximately 7 million words. The data were recorded between 1999 and 2001.
• Nagoya University Conversation Corpus (Meidai kaiwa kopasu 名大会話コーパス, henceforth NUCC). It comprises 129 conversation recordings, created between 2001 and 2003, which span roughly 100 hours.


The concordancer used to filter sentences is Chunagon. Due to the high number of result sentences in BCCWJ, the concordancer NINJAL-LWP, which allows sorting according to the frequency of appearance of each word, was used as well. Subsequently, the results were downloaded as an .xlsx file and all sentences were manually analyzed. Details concerning this part of the research are given in Section 4.
The semantic analysis of collocations is based on Bunrui Goihyo: zoho kaiteiban (分類語彙表:増補改訂版, Word List by Semantic Principles, Revised and Enlarged Edition), published by NINJAL in 2003. The list is a collection of words classified and arranged by their meanings. Details regarding the process of classification within this paper are explained in Section 5.
The categorization of verbal forms is provided in detail in Sections 6 and 7. 
2 Verb transitivity 
Verb transitivity has been the subject of ongoing debate, dating back to before the beginning of the Meiji era. Okutsu (1967, p. 46) cites several linguists who have researched this topic. These include Motoori Haruniwa, Gonda Naosuke, Kurokawa Harumura, Otsuki Fumihiko, Yamada Yoshio, Mochizuki Seikyō, Nishio Teraya, Sakuma Kanae, and Bernard Bloch. Okutsu himself shares some of the points of view that Haruniwa, Sakuma, and Bloch proposed. However, he points out that the results obtained up to that point were not sufficient to give clear answers regarding the issue of transitivity and consequently proposes his own categorization.
 
2.1 Defining intransitive and transitive verbs 
According to Okutsu’s criteria, transitive verbs are those that have an object in the form of a noun followed by the case particle wo. All other verbs are intransitive (Numata, 1989, p. 196). It is important to differentiate between the particle wo marking an object and the particle wo that is followed by verbs of motion. 
Additionally, two verbs have to meet conditions on three separate levels to be recognized as a pair. 
Firstly, on a morphological level, the two verbs must share the same root. Amano et al. (2013, p. 70) give the verbs aku 開く, ‘to open’ [intr.], and akeru 開ける, ‘to open’ [tr.], as an example of verbs that share the root /ak/.
Secondly, from a syntactic point of view, the sentence with a transitive verb gains a subject A followed by the case particle ga が, while B, the subject of the sentence with the intransitive verb, becomes the object of the transitive verb and is followed by the case particle wo, as seen in Figure 1 below (Numata, 1989, p. 197; Amano et al., 2013, p. 70).
 
 

Figure 1: Intransitive verb in relation to its transitive pair 
 
Lastly, a semantic structure must be observed. The subject A, which appears in the sentence with a transitive verb, must influence the occurrence or event B. Event B takes on the role of an object of the same verb and is simultaneously depicted in the sentence with an intransitive verb as its subject. In short, the sentence with a transitive verb must also cover the meaning of the corresponding intransitive verb sentence. 
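The morphological and syntactic conditions above can be illustrated with a minimal Python sketch. It is an illustration only and not part of the cited authors' formalism: the class VerbPair, the helper functions, and the example nouns Taro and doa (‘door’) are assumptions introduced here, and romanized forms stand in for Japanese script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerbPair:
    """A hypothetical record for an intransitive-transitive verb pair."""
    intransitive: str
    transitive: str
    root: str  # shared root, e.g. "ak" for aku / akeru


def shares_root(pair: VerbPair) -> bool:
    # Morphological condition: both verbs are built on the same root.
    return (pair.intransitive.startswith(pair.root)
            and pair.transitive.startswith(pair.root))


def intransitive_frame(pair: VerbPair, b: str) -> str:
    # B is the subject of the intransitive sentence, marked by ga.
    return f"{b} ga {pair.intransitive}"


def transitive_frame(pair: VerbPair, a: str, b: str) -> str:
    # A is added as the subject (ga); B becomes the object (wo).
    return f"{a} ga {b} wo {pair.transitive}"


pair = VerbPair(intransitive="aku", transitive="akeru", root="ak")
print(shares_root(pair))                      # True
print(intransitive_frame(pair, "doa"))        # doa ga aku
print(transitive_frame(pair, "Taro", "doa"))  # Taro ga doa wo akeru
```

The same B (here doa) appears as the ga-marked subject of the intransitive sentence and as the wo-marked object of the transitive one, which is the correspondence the syntactic condition requires. The semantic condition is not modeled in this sketch.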
2.2 The verbs owaru and oeru as a pair 
Okutsu (1967, p. 63) categorizes oeru as a verb whose pair, owaru, is formed through the process of intransitivization. The shared root is ‘owe-’ (.).
In the section regarding ergative verbs, Morita (1994, p. 240) states that owaru and oeru already exist as a pair. He gives the following sentences as examples: 
 
a) Watashi wa hanashi wo oeru.
   I [top] story [acc] to finish [tr.act.pres]
   I finish the story.

b) Hanashi ga owaru.
   Story [nom] to finish [intr.act.pres]
   The story finishes.
 


 
Furthermore, Izuhara (2010) responds to a question regarding the nature of owaru in the sentence ‘with this I finish the lesson’ (kore de jugyo wo owarimasu これで授業を終わります). Citing a dictionary definition according to which owaru is a ‘jitadoshi 自他動詞’, he explains that it is an ergative verb that allows both transitive and intransitive use.
The most accurate description of owaru and oeru would be that owaru is a verb that forms two pairs in terms of transitivity. The first pairs owaru [intr.] with owaru [tr.], whereas the second pairs owaru [intr.] with oeru [tr.].
 
2.3 Use of causative 
When an intransitive verb lacks a transitive pair, that role can be performed by the causative form of the intransitive verb. Intransitive verbs that lack a transitive pair are defined as zettai jidoshi 絶対自動詞 (Amano et al., 2013, p. 70). Due to inconsistencies regarding the usage of the verb owaru and the general perception of it being purely intransitive, which are discussed in detail in Section 3, I hypothesize that verbs co-occurring with collocations will be found in sentence patterns or phrases that include causative forms of the verb owaru.
 
3 Meaning and use of the verbs owaru and oeru  
This section gives an overview of dictionary definitions of both verbs, as well as their representation in instructional materials, and the public’s general perception. 
 
3.1 Use of owaru and oeru according to dictionary definitions 
The verb owaru is categorized as an ergative verb in every dictionary examined (Table 1). Definitions of oeru, on the other hand, point to minor inconsistencies in its use. No deviations are found with regard to the transitive use, which is present in all seven dictionaries. However, Kokugo jiten and Daijirin list the intransitive use as well:
 
Kokugo jiten:
   ‘Kaiki ga -- eta’ no yo ni, jidoteki ni tsukau koto mo aru.
   ‘The session --’ Sometimes used intransitively as shown.

Daijirin:
   (Jidoshi) Owaru. Hateru.
   (Intransitive verb) To end. To finish.
 


 
Shin meikai kokugo jiten lists the intransitive use with an annotation that it is based on incorrect use (moto goyo ni motozuku もと誤用に基づく).
 
Table 1: Dictionary definitions of owaru and oeru 
| Dictionary | owaru: intr. use | owaru: tr. use | oeru: intr. use | oeru: tr. use |
|---|---|---|---|---|
| Kojien 広辞苑, 1955 | YES | YES | NO | YES |
| Kokugo jiten 国語辞典, 1979 | YES | YES | YES | YES |
| Progressive Japanese-English Dictionary, 1993 | YES | YES | NO | YES |
| New Japanese-English Dictionary, 1998 | YES | YES | NO | YES |
| Daijirin 大辞林, 2006 | YES | YES | YES | YES |
| jaSlo, 2006 | YES | YES | NO | YES |
| Shin meikai kokugo jiten 新明解国語辞典, 2012 | YES | YES | (YES) | YES |
 


 
3.2 Comparison of the use of owaru and oeru in corpora CHJ and BCCWJ 
A short comparison of the use of the verbs owaru and oeru in the Corpus of Historical Japanese (CHJ) and the Balanced Corpus of Contemporary Written Japanese (BCCWJ) yields some interesting results (see Tables 2 and 3):
Table 2: Use of owaru and oeru in CHJ 
 
| | owaru | oeru |
|---|---|---|
| Total cases | 2,178 | 380 |
| Cases of intransitive use (N + ga) | 64 | 1 |
| Cases of transitive use (N + wo) | 366 | 227 |
 


 
Table 3: Use of owaru and oeru in BCCWJ 
 
| | owaru | oeru |
|---|---|---|
| Total cases | 19,244 | 4,624 |
| Cases of intransitive use (N + ga) | 5,861 | 5 (12) |
| Cases of transitive use (N + wo) | 1,046 | 2,921 |
 


 
 
It is clear that the verb owaru is used more extensively than oeru. Furthermore, despite some dictionary definitions allowing it, the intransitive use of oeru is negligible, as the examples listed below show. Of the 12 sentences, only five are actual cases of intransitive use (1-5); the others can be explained by the structure indicating volition N1 ga oeyo to suru N2 (6), the potential form oerareru, which requires the particle ga が (7), or an incorrect morphological analysis (8-9).
 
1) Tsumari, juninenkan no shugyo ga oete busshikai wo ukeru no de wa nakute, […]
   training [nom] to end [intr.act.ger]
   In brief, as the 12-year-long training finishes, you don’t accept the Buddhist principles, […]

2) Kono koto wa, gakushukai ga oete kara isso shomei saremashita.
   study group [nom] to finish [intr.act.ger]
   This became even clearer once the study session finished.

3) Yatto sankagetsu ga oeyo to shiteimasu ga, shojiki hotto shiteimasu.
   three months [nom] to end [intr.act.vol]
   Three months have finally come to pass, and I honestly feel relieved.

4) Takuji ga oeta no wa juuniji goro, […]
   daycare [nom] to end [intr.act.pst.adn]
   Daycare ended at around 12 o’clock, […]

5) ‘Yorimasa’ ga oe, ‘Jizomai’ ga enjirareru koro kara, […]
   Yorimasa [nom] to finish [intr.act.inf]
   Ever since Yorimasa ended and Jizomai has been performed, […]

6) Omoeba, watashitachi ga oeyo to shiteiru konseiki mo, nanto, senso no arashi ga fukiareta jidai de atta koto ka.
   we [nom] to end [tr.act.vol]
   Come to think of it, has this century, that we are about to end, also been a, what, an era, during which the storm of war blew violently?

7) Izure ni seyo, buji torihiki ga oerareru yo, onaji shuppinsha no tachiba kara oinori itashite orimasu.
   deal [nom] to end [poten.nonpst.adn]
   Either way, as a fellow exhibitor myself, I pray that your deal can be completed without problems.
 
8) 
 ...........
 

 
 Incorrect analysis: ...·... . .. (....) 
 


 
9) 
 ............
 

 
 Incorrect analysis: .... . .... 
 


 
Additionally, the diachronic analysis points to a remarkable shift in the usage of owaru. According to the results observed in CHJ, the number of cases in which owaru is used transitively is approximately six times higher than the number of sentences with intransitive use. BCCWJ, on the other hand, shows a completely reversed picture, as the number of examples of intransitive use is approximately six times higher. This outcome further reinforces the view that the verb owaru in modern Japanese is leaning heavily towards an exclusively intransitive use.
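The ‘approximately six times’ figures can be checked directly against the counts for owaru in Tables 2 and 3. The short Python sketch below is only an arithmetic illustration; the dictionary layout and the function name dominance_ratio are assumptions introduced here.

```python
# Counts for owaru taken from Table 2 (CHJ) and Table 3 (BCCWJ).
counts = {
    "CHJ": {"intransitive": 64, "transitive": 366},
    "BCCWJ": {"intransitive": 5861, "transitive": 1046},
}


def dominance_ratio(corpus_counts: dict) -> float:
    """Ratio of the more frequent use to the less frequent use."""
    return max(corpus_counts.values()) / min(corpus_counts.values())


for corpus, corpus_counts in counts.items():
    dominant = max(corpus_counts, key=corpus_counts.get)
    ratio = dominance_ratio(corpus_counts)
    print(f"{corpus}: {dominant} use is {ratio:.1f} times more frequent")

# CHJ: transitive use is 5.7 times more frequent
# BCCWJ: intransitive use is 5.6 times more frequent
```

Both ratios round to roughly six, with the dominant use reversed between the historical and the contemporary corpus.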
 
3.3 Representation of owaru and oeru in instructional materials 
Owaru is generally represented in its intransitive use when appearing within exercises or texts for reading comprehension. Examples of transitive use are only present in textbook sections dedicated to detailed explanations or glossaries. Similarly, oeru is only mentioned in such sections and does not appear in practical example sentences or other exercises (Table 4). 
Table 4: Representation of owaru and oeru in instructional materials 
| Textbook | owaru: intr. use | owaru: tr. use | oeru: intr. use | oeru: tr. use |
|---|---|---|---|---|
| Uvod v japonsko pisavo, 2007 | YES | NO | NO | YES |
| Japonščina za začetnike 1 in 2, 2012/2016* | YES | YES | / | / |
| Minna no nihongo (shokyu 2), 1998/2013* | YES | YES | / | / |
| Kanji Goi ga yowai anata e, 2013* | YES | NO | / | / |
| Pregled slovnice japonskega jezika, 2005 | YES | YES | NO | YES |
| Tobira, 2009* | YES | NO | / | / |
| Essential Japanese Grammar, 2012 | YES | YES | NO | YES |


* The asterisk indicates that the verbs appear exclusively as part of exercises or reading comprehension texts and not in sections dedicated to detailed explanations. 
 
3.4 General perception of the use of owaru and oeru 
The representation of both verbs in instructional materials mirrors the general perception of the usage of owaru and oeru. Various examples taken from the websites Yahoo! Chiebukuro 知恵袋¹ and HiNative² are consistent with the findings above. Links to the full examples are provided in the footnotes.
¹ Retrieved from https://detail.chiebukuro.yahoo.co.jp/qa/question_detail/q1011595455
https://detail.chiebukuro.yahoo.co.jp/qa/question_detail/q12108164892
https://detail.chiebukuro.yahoo.co.jp/qa/question_detail/q11114803361
https://detail.chiebukuro.yahoo.co.jp/qa/question_detail/q12194686977
² Retrieved from https://hinative.com/ja/questions/5638244
Users express their doubts about the transitive use of owaru and place either the two verbs or the particles wo and ga in juxtaposition, questioning which is correct: ‘end a lesson’ (jugyo wo owaru 授業を終わる and jugyo wo oeru 授業を終える), or ‘end a lesson’ (jugyo wo owaru 授業を終わる) and ‘the lesson ends’ (jugyo ga owaru 授業が終わる). It appears that most users lean towards the perception of owaru as solely intransitive.
In response to the above, some answers correctly state that owaru is an ergative verb and point out both the transitive and intransitive use, while others label the transitive use as an exception to the rule, or explain the presence of the particle wo with the causative owaraseru 終わらせる.
Discrepancies in dictionary definitions, a lack of representation in instructional materials, and the public’s non-uniform general perception of both verbs are all points of concern, as well as the main reasons for conducting this research. Section 4 elaborates on the searching criteria used for sentence sampling and is followed by Sections 5-8, which are dedicated to the examination of collocations, verb forms and their relation, and the clarification of the transitivity of owaru and oeru.
4 Searching criteria 
In order to extract collocations of owaru and oeru in their transitive use from three corpora, the following criteria were set (Figure 2). 
 
 

Figure 2: Searching criteria in Chunagon 
 
 
Within this research, the ‘short unit method’ (tan-tan’i kensaku 短単位検索) was used, as the keywords are limited to owaru ‘終わる’ and oeru ‘終える’.
The purple box indicates the keyword (ki キー). An additional searching condition is searching by ‘lexeme’ (goiso 語彙素), which includes all of the verbs’ tokens, such as conjugated forms and various kanji characters appearing in the corpora, as long as they are all classified under the same lemma (Srdanović, 2016, p. 28).
The green box above shows the ‘front collocation 1’ (zenpo kyoki 1 前方共起1). It is important to note that this option refers to collocation exclusively in the context of corpus searching and not to the collocation (verb object) that is discussed in the rest of this paper. Within this analysis, the front collocation option serves as a tool to limit the search to transitive verb use. For this purpose, the option is set to the accusative case particle wo ‘を’, which marks the object of a verb.
The concordancer Chunagon also allows searching for distant collocations. Such collocations are found at a distance of one or more interposed words (Srdanović, 2016, p. 21). However, due to an already high number of results, the option was set to a fixed distance of one word; that is, the front collocation is immediately followed by the keyword. The option shojikei shutsugenkei 書字形出現形 limits the search to the form of wo as written in the box (を).
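The logic of this search condition (a keyword matched by lexeme and immediately preceded by the particle wo) can be sketched in Python. This is not Chunagon's interface or query syntax: the Token type, the function find_objects, and the romanized, pre-tokenized input are all assumptions made for illustration, and the sketch simplifies by treating the single token before wo as the object.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    surface: str  # form as it appears in the sentence
    lemma: str    # dictionary form shared by all conjugated forms


TARGET_LEMMAS = {"owaru", "oeru"}  # keywords, matched by lemma
OBJECT_PARTICLE = "wo"             # front collocation at distance 1


def find_objects(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Return (object, verb lemma) pairs where the keyword directly follows wo."""
    hits = []
    for i in range(2, len(tokens)):
        is_keyword = tokens[i].lemma in TARGET_LEMMAS
        follows_wo = tokens[i - 1].surface == OBJECT_PARTICLE
        if is_keyword and follows_wo:
            hits.append((tokens[i - 2].surface, tokens[i].lemma))
    return hits


# kore de jugyo wo owarimasu: 'with this I finish the lesson'
transitive = [Token("kore", "kore"), Token("de", "de"),
              Token("jugyo", "jugyo"), Token("wo", "wo"),
              Token("owarimasu", "owaru")]
# jugyo ga owaru: 'the lesson ends' (no wo, so no hit)
intransitive = [Token("jugyo", "jugyo"), Token("ga", "ga"),
                Token("owaru", "owaru")]

print(find_objects(transitive))    # [('jugyo', 'owaru')]
print(find_objects(intransitive))  # []
```

Matching on the lemma rather than the surface form is what lets the conjugated form owarimasu count as an instance of owaru, while the fixed distance of one token excludes sentences in which other words intervene between wo and the verb.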
On a semantical level, all collocations within examples of corpora CSJ and NUCC were analyzed. A minimal frequency of appearance of 5 for owaru and 15 for oeru was set for collocations found in BCCWJ due to a high number of results. Some additional examples with lower frequency were analyzed for comparison as they appear in both corpora (see Section 5). 
On the same basis, verb forms co-occurring with a collocation that represents at least 1% of all gathered sentences for each verb were analyzed within BCCWJ. For owaru (1,046 sentences) this means the analysis of verbs co-occurring with collocations of a frequency of 10 or above; for oeru (2,921 sentences) of 29 or above (see Table 5 below). 
 
Table 5: Number of results in corpora 

Category                           Verb   BCCWJ   CSJ    NUCC  Total
Appearance frequency (general)     owaru  19,247  2,539  416   22,202
                                   oeru   4,624   119    3     4,746
Appearance frequency (transitive)  owaru  1,046   303    8     1,357
                                   oeru   2,921   63     3     2,987
Analyzed verb forms                owaru  581     303    8     892
                                   oeru   623     63     3     689
Analyzed collocations (semantic)   owaru  640     303    8     951
                                   oeru   1,046   63     3     1,112
 


 
 
Due to a low number of sentences in NUCC, the results are grouped with those from CSJ. Both are corpora of spoken Japanese. 
Detailed information regarding the semantical and morphological analyses is explained in the upcoming Sections 5, 6, and 7. 
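The frequency cut-offs for the BCCWJ verb-form analysis follow directly from the sentence counts in Table 5. The short sketch below reproduces them under the assumption that 1% of the sentence total is rounded down to a whole number; the function name is illustrative only.

```python
def min_collocation_frequency(n_sentences: int, share_percent: int = 1) -> int:
    """Smallest collocation frequency analyzed, taken as share_percent of all
    sentences and rounded down (an assumption that matches the reported cut-offs)."""
    return n_sentences * share_percent // 100


# Transitive sentences in BCCWJ (Table 5).
transitive_bccwj = {"owaru": 1046, "oeru": 2921}

cutoffs = {verb: min_collocation_frequency(n) for verb, n in transitive_bccwj.items()}
print(cutoffs)  # {'owaru': 10, 'oeru': 29}
```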
5 Semantical analysis of collocations 
The semantical categorization of collocations (objects of the transitive verbs) is based on Bunrui goihyo: zoho kaiteiban (Word list by semantic principles, revised and enlarged edition, henceforth Bunrui goihyo). The word list comprises 79,027 lemmas (94,985 words in total). Words in Bunrui goihyo are classified as follows. 
Each word is first categorized by ‘class’, rui .. This category consists of four groups: 1) nouns - tai no rui ..., 2) verbs - yo no rui ..., 3) -i and -na adjectives, adverbs and adnominal adjectives - so no rui ..., and 4) other - part of adverbs, conjunctions, and interjections - sono ta no rui ...... 
The categories are then further grouped into ‘divisions’ (bumon ..), followed by ‘sections’ (chukomoku ...), and finally ‘articles’ (bunrui komoku ....). Each ‘article’ is then divided into several numbered paragraphs. 
For example, the word ‘question’ (shitsumon ..) can be found next to the ID number 1.3132, indicating: 
• number 1 (1.3132) - class (1. noun, tai no rui ...) 

• number 3 (1.3132) - division (1.3 human activity - psyche and actions, ningen katsudo – seishin oyobi koi ....—.......) 

• number 1 (1.3132) - section (1.31 language/speech, gengo ..) 

• number 32 (1.3132) - article (1.3132 dialogue, mondo ..) 


All collocations in this paper are classified into semantic groups based on ‘sections’ (in the example above, ‘language/speech’). This prevents the categorization from becoming too fragmentary while still drawing a clear distinction between the semantic fields. 
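The structure of the ID number can be sketched in a few lines of code. The example below assumes that every ID has the shape shown above (one digit for the class, a dot, and four further digits); the names `BunruiId` and `parse_bunrui_id` are hypothetical and not part of any published tool.

```python
from typing import NamedTuple


class BunruiId(NamedTuple):
    cls: str       # class, e.g. "1" (nouns)
    division: str  # e.g. "1.3"
    section: str   # e.g. "1.31", the level used for the semantic fields here
    article: str   # e.g. "1.3132"


def parse_bunrui_id(item_id: str) -> BunruiId:
    """Split an ID such as '1.3132' into its four hierarchical levels."""
    head, digits = item_id.split(".")
    if len(head) != 1 or len(digits) != 4 or not (head + digits).isdigit():
        raise ValueError(f"unexpected ID format: {item_id!r}")
    return BunruiId(
        cls=head,
        division=f"{head}.{digits[0]}",
        section=f"{head}.{digits[:2]}",
        article=item_id,
    )


parsed = parse_bunrui_id("1.3132")
print(parsed.section)  # 1.31
```

Grouping collocations by the `section` field corresponds to the level of granularity chosen for the semantic fields in this paper.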
Tables 6, 7, 9, and 10 provide a list of all collocations with corresponding transcriptions and translations, as well as their frequency of appearance and the semantic field they were sorted into as per the Bunrui goihyo classification. 
As mentioned at the end of Section 4, due to a high number of results, restrictions have been applied to collocations and verb forms from BCCWJ. The collocations co-occurring with analyzed verb forms are marked in bold. 
 
5.1 Owaru 
Collocations of the verb owaru can be classified into nine semantic fields or groups. These are ‘time’, ‘person’, ‘work’, ‘speech’, ‘quantity’, ‘mental process’, ‘relationship’, ‘organization’ and ‘life’ (see Tables 6 and 7). 
 
 
 
 
 
 
Table 6: List of collocations (owaru - BCCWJ) 

Verb object (transcription)  Translation               Frequency  Semantic field
shitsumon                    question                  339        speech
shikko                       execution                 78         work
toron                        discussion                37         speech
shigoto                      work                      28         life
issho                        whole life                17         time
hokoku                       report                    16         speech
hanashi                      story, talk               14         speech
setsumei                     explanation               14         speech
senso                        war                       12         relationship
shogai                       life                      11         time
kyokasho                     textbook                  8          speech
shokuji                      meal                      8          life
kai                          -times                    7          quantity
jinsei                       life                      7          life
jidai                        period, era               6          time
subete                       everything, all           6          quantity
hatsugen                     statement                 6          speech
temae                        tea-ceremony procedure    6          life
yume                         dream                     5          mental process
chinjutsu                    declaration               5          speech
shitsugi                     question, interpellation  5          speech
shori                        processing, treatment     5          work
 


 
Table 7: List of collocations (owaru - CSJ/NUCC) 

Verb object (transcription)  Translation                    Frequency  Semantic field
happyo                       speech                         171        speech
hanashi                      story, talk                    72         speech
hokoku                       report                         18         speech
setsumei                     explanation                    4          speech
shigoto                      work                           4          life
kanbu                        executive                      3          work
koen                         lecture                        3          speech
supichi                      speech                         3          speech
ni tsuite (hanashi)          about (talk)                   2          speech
jugyo                        lesson                         2          work
kokosei iko                  during/after high school       1          time
yakyubu                      baseball club                  1          organization
hatsuwa                      utterance, speech              1          speech
kensaku                      searching                      1          mental process
zen’yasai                    night before a festival        1          life
seikatsu                     life                           1          life
bunseki                      analysis                       1          mental process
bun                          sentence                       1          speech
senshu                       player                         1          work
kanso                        impression                     1          mental process
machi urawa (hanashi)        Urawa city (story)             1          speech
nyuryoku                     input                          1          work
shogakko jidai               time period of primary school  1          time
senso                        war                            1          relationship
konpanion                    companion                      1          work
kore (funso)                 this (dispute)                 1          relationship
gemu                         game                           1          life
shirabe                      investigation                  1          mental process
yamato                       Yamato                         1          person
yatsu                        he                             1          person
toku                         talk                           1          speech
dokuta                       doctor (PhD)                   1          work
tagetto (mokuhyo)            target (objective)             1          mental process
shokuji                      meal                           1          life
no (repoto)                  nominalization (report)        1          speech
kosodate                     parenting                      1          work
sore (taiken)                that (experience)              1          mental process
sore (jisshu)                that (practice)                1          mental process
sore (setsumei)              that (explanation)             1          speech
 


 
Once the results are summed up, it becomes apparent that over 75% of analyzed sentences include collocations classified into the semantic field of ‘speech’, such as ‘question’ (shitsumon ..) or ‘story’ (hanashi .). This most prominent group is followed by the semantic field of ‘work’ (9.7%), which includes collocations such as ‘execution’ (shikko ..). A further 6% of all collocations are sorted into the semantic field of ‘life’ (‘life/lifetime’ shogai .., ‘life/living’ seikatsu ..). The remaining collocations, just under 9% in total, are spread across the smaller groups (Table 8 and Figure 3). 
 
Table 8: Semantic fields of collocations (owaru) 

          Time  Person  Work   Speech  Quantity  Mental process  Relationship  Organization  Life
BCCWJ     34    0       83     444     13        5               12            0             49
          5.3%  0.0%    13.0%  69.4%   2.0%      0.8%            1.9%          0.0%          7.7%
CSJ/NUCC  8     2       10     279     0         7               2             1             8
          2.5%  0.6%    3.2%   88.0%   0.0%      2.2%            0.6%          0.3%          2.5%
Total     42    2       93     723     13        12              14            1             57
          4.4%  0.2%    9.7%   75.5%   1.4%      1.3%            1.5%          0.1%          6.0%
 


 
 
 
[Chart: frequency (axis 0 to 800) per semantic field (time, person, work, speech, quantity, mental process, relationship, organization, life) for BCCWJ, CSJ/NUCC, and Total]


Figure 3: Semantic fields of collocations (owaru) 
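The percentages in the ‘Total’ row of Table 8 can be recomputed from the raw counts. The following sketch is only an illustration of that arithmetic; the variable names are not taken from the original analysis.

```python
# Total counts per semantic field for owaru (Table 8, row 'Total').
counts = {
    "time": 42, "person": 2, "work": 93, "speech": 723, "quantity": 13,
    "mental process": 12, "relationship": 14, "organization": 1, "life": 57,
}

total = sum(counts.values())
# Share of each field in percent, rounded to one decimal place as in the table.
shares = {field: round(100 * n / total, 1) for field, n in counts.items()}

for field, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{field:<15}{counts[field]:>5}{share:>7.1f}%")
```

Sorting by share makes the dominance of ‘speech’ over ‘work’ and ‘life’ immediately visible.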
 
5.2 Oeru 
Collocations of the verb oeru are divided into seven semantic categories: ‘time’, ‘work’, ‘speech’, ‘mental process’, ‘relationship’, ‘organization’ and ‘life’. All of these categories are also observed for owaru (see Tables 9 and 10). 
Table 9: List of collocations (oeru - BCCWJ) 

Verb object (transcription)  Translation      Frequency  Semantic field
shigoto                      work             189        life
shogai                       life             63         time
shokuji                      meal             62         life
issho                        whole life       57         time
sagyo                        work, duty       51         life
shiki                        ceremony         42         life
yakuwari                     part, role       40         work
torihiki                     trade, business  34         work
tabi                         travel           30         life
yakume                       duty, role       29         work
choshoku                     breakfast        27         life
satsuei                      photography      27         work
hanashi                      story, talk      26         speech
junbi                        preparation      24         mental process
seikatsu                     life             23         life
jugyo                        lesson           23         work
shuzai                       collecting data  22         mental process
kyoiku                       education        21         work
katsudo                      activity         20         work
hi                           day              20         time
denwa                        phone call       20         speech
sei                          life             20         life
yushoku                      dinner           19         life
kunren                       training         19         work
shussan                      birth, delivery  19         life
chosa                        survey           18         mental process
kata                         kata (sports)    18         life
shujutsu                     surgery          18         work
renshu                       exercise         17         mental process
ikusa                        battle           17         relationship
kai                          meeting          16         organization
jinsei                       life             15         life
 


 
 
Table 10: List of collocations (oeru - CSJ/NUCC) 

Verb object (transcription)    Translation                    Frequency  Semantic field
shigoto                        work                           7          life
puroguramu                     program                        4          speech
torihiki                       trade, business                4          work
hokoku                         report                         3          speech
gakushu                        learning                       2          mental process
shussan                        birth, delivery                2          life
issho                          whole life                     3          time
gakko (kyoiku)                 school (education)             2          work
tabi                           travel                         2          life
kekkonshiki                    wedding                        2          life
nenkan                         in a year (period)             1          time
suiron                         deduction                      1          mental process
hanashi                        story, talk                    1          speech
renshu                         exercise                       1          mental process
jisshu                         practice                       1          mental process
honban                         performance                    1          work
yakyu                          baseball                       1          life
shogai                         life                           1          time
seikatsu                       life                           1          life
jugyo                          lesson                         1          work
sagyo                          work, duty                     1          life
chosa                          survey                         1          mental process
ronbun                         thesis, article                1          speech
bun                            sentence                       1          speech
aisatsu                        greeting, address              1          speech
taiyaku                        important role                 1          work
gasshuku                       training camp                  1          life
shiai                          match, game                    1          life
sokai                          general meeting                1          relationship
nin                            duty                           1          work
intan                          intern                         1          work
yakiire                        quenching                      1          work
shikomi                        preparation                    1          work
konpa                          party, event                   1          relationship
tte iu no (jugyo)              nominalization (lesson)        1          work
rejidento                      medical resident               1          work
baito                          part-time job                  1          life
subete (soretsu)               everything, all (funeral)      1          life
fezu                           phase                          1          time
suparingu                      sparring                       1          life
furumeku                       make-up                        1          life
sore (shigoto)                 that (work)                    1          life
sore (kyoiku)                  that (education)               1          work
Minna no nihongo 1 (kyokasho)  Minna no nihongo 1 (textbook)  1          speech
dokuta                         doctor (PhD)                   1          work
 


 
 
When compared to owaru, a noticeable difference in semantic distribution can be observed. Almost half (48.4%) of all collocations are classified into the semantic field of ‘life’. This category is followed by the semantic group of ‘work’, which amounts to a little less than a quarter of all collocations (22.3%). The only other notable category is collocations related to ‘time’ with 13.1%. The remaining groups are relatively small. It is worth pointing out that the largest semantic category of owaru, ‘speech’, only amounts to 5.2% within oeru (Table 11 and Figure 4). 
 
Table 11: Semantic fields of collocations (oeru) 

          Time   Work   Speech  Mental process  Relationship  Organization  Life
BCCWJ     140    231    46      81              17            16            515
          13.4%  22.1%  4.4%    7.7%            1.6%          1.5%          49.2%
CSJ/NUCC  6      17     12      6               2             0             23
          9.1%   25.8%  18.2%   9.1%            3.0%          0.0%          34.8%
Total     146    248    58      87              19            16            538
          13.1%  22.3%  5.2%    7.8%            1.7%          1.4%          48.4%
 


 
 
 
[Chart: frequency (axis 0 to 600) per semantic field (time, work, speech, mental process, relationship, organization, life) for BCCWJ, CSJ/NUCC, and Total]


Figure 4: Semantic fields of collocations (oeru) 
6 Categorization of verb forms and collocations 
This section discusses the categorization of verb forms. 
Analyzed verbs were first grouped into six categories according to the voice (voisu, tai ...., .) or modality (modariti, ho ....., .) they display. Due to an already high number of results, only verb forms exhibiting modality through inflection were counted. 
Furthermore, almost all verbs in modality categories are found in the active voice. There are six exceptions where a combination of two morphemes can be observed (causative + volition, causative + desire). Such exceptions are only found in examples of the verb owaru and were assigned to both categories (see 6.1). Once categorized, all verb forms were further classified into eight subcategories (see 6.2). 
All samples were exported to an .xlsx file and analyzed manually. Both categories and subcategories are based on verb forms appearing throughout sample sentences. Verb forms not observed within examples of either owaru or oeru are therefore not included. 
When a verb form corresponded to one of the categories or subcategories, a point was assigned in the designated table. Two examples of point-counting are given at the end of this section (refer to Tables 12 and 13 below). This was done in order to determine the distribution of verb forms within example sentences and, most importantly, within semantic groups of collocations (Sections 7 and 8). 
Both categories and subcategories as well as example sentences representative of each are provided on the following pages (Sections 6.1 and 6.2). 
 
6.1 Categories of verb forms 
10. Active form 


Active voice of the verb, including the final form shushikei ... and attributive form rentaikei .... 
1) 
 .................................
 

 
 Kare no iu ni wa, kami, sunawachi, wareware to wareware no mawari no ningen no sozosha wa, sono shigoto wo oeru mae ni shinda to iu no desu. 
 
 
 work [acc] to finish [tr.act.nonpst.adn] 
 
 
 According to his words, God, the creator of us and others, died before he finished his work. 
 


 
 
11. Causative form 


Causative voice (-(s)ase-) of the verb. 
2) 
 ....
 

 
 Jikannai ni shigoto wo owaraseru koto mo taisetsu na sekinin desu. 
 
 
 work [acc] to finish [tr.caus.nonpst.adn] 
 
 
 Finishing work within the time frame is just as big of a responsibility. 
 


 
 
12. Passive/potential/honorific form 


Indicates the passive form, potential form, or honorific form of the verb (-rare-).  
Out of seven cases, five verbs (including Example 3 below), are used as honorific speech. Four co-occur with nouns meaning ‘life’, such as ‘shogai..’ or ‘issho ..’, while one is used in dialogue as a direct question ‘have you completed your PhD?’ (dokuta wo owarareta no desu ka ...—..........). 
One example is ambiguous and can be interpreted as either the potential or as the honorific form - ‘you, who were able to safely complete your duty’ or ‘you, who safely completed your duty’ (buji oyakume wo oerareta anata .............). 
The last example is a direct question referring to a third party. Oeru is used in its potential form - ‘do you think the villagers will be able to finish the long journey?’ (murabitotachi wa […] nagai tabi wo oerareru to omoimasuka? ..... […] ................). 
There are no examples of either verb in its passive form. 
3) 
 ................................
 

 
 Shikashi, fuko na koto ni ninen tarazu de daichogan ni kakari, gojugosai no mijikai shogai wo oeraremashita. 
 
 
 life [acc] to end [tr.pol.pst] 
 
 
 However, unfortunately, in less than two years, he fell ill with colorectal cancer and ended his short life at the age of 55. 
 


 
 
13. Volition 


Verb form expressing volition (-(y)o). 
4) 
 ......
 

 
 Chodo sono shigoto wo oeyo to shita toki, erebeta no ugoku oto ga shita. 
 
 
 work [acc] to finish [tr.act.vol]  
 
 
 Just as I was about to finish work, I heard the sound of the elevator moving. 
 


 
 
14. Desire 


Verb form expressing desire, wish (-tai). 
5) 
 .............
 

 
 Kono niten gotoben itadaite, shitsumon wo owaritai to omoimasu. 
 
 
 question [acc] to end [tr.act.des.nonpst] 
 
 
 Once you provide an explanation of these two points, I would like to end my question. 
 


  
 
15. Gerundive/-te form 


The gerundive or -te form of the active voice. It is generally a subcategory (see 6.2), but separated in the case of active voice due to a very high frequency of appearance. 
6) 
 ......
 

 
 Shigoto wo oete ajiwau sake wa honto ni bimi. 
 
 
 work [acc] to finish [tr.act.ger] 
 
 
 Alcohol savored after work is truly delicious. 
 


  
 
6.2 Subcategories of verb forms 
Subcategories highlight additional characteristics of the verbs. The eight groups are classified as follows. 
16. Positive form 


Positive form of the verb. 
7) 
 ........................
 

 
 Ano ko wa oya no namae mo kao mo shiranai wazuka juyonen no mijikai shogai wo oeta. 
 
 
 life [acc] to end [tr.act.pst.pos] 
 
 
 That child ended his short life of ten years or so without knowing the names and faces of his parents.  
 


 
 
17. Negative form 


Negative form of the verb. 
8) 
 .........
 

 
 Gyosha san ga tonikaku shigoto wo owarasenai to kaerenai no desu. 
 
 
 work [acc] to finish [tr.caus.nonpst.neg] 
 
 
 In any case, workers cannot go home if they do not finish work. 
 


 
 
18. Non-past form 


The verb is found in non-past tense. 
9) 
 ............
 

 
 Ijo de happyo wo owarimasu. 
 
 
 presentation [acc] to end [tr.act.pol.nonpst] 
 
 
 With this I end my presentation. 
 


 
 
19. Past form 


The verb is found in past tense. 
10) 
 .......
 

 
 Kosaka wa awatete hanashi wo owaraseta. 
 
 
 story [acc] to finish [tr.caus.pst] 
 
 
 Kosaka hurriedly finished the story. 
 


 
 
 
 
20. Past context 


The context of an analyzed sentence as a whole is placed in the past. The category was added in order to compare the use of tenses between owaru and oeru. For example, the gerundive form of oeru (oete) indicates neither past nor non-past on its own. However, it frequently appears within sentences where the main verb is used in past tense. 
11) 
 .......
 

 
 Hodonaku, kare wa hanashi wo oete modotte kita. 
 
 
 story [acc] to finish [tr.act.ger] 
 
 
 Soon after, he finished the story and came back. 
 


 
 
21. Adnominal use 


The verb is used adnominally in the structure V-ru + N. 
12) 
 ...............
 

 
 Koritsu yoku shigoto wo owaraseru kotsu. 
 
 
 work [acc] to finish [tr.caus.nonpst.adn] 
 
 
 The secret to finishing work effectively. 
 


 
 
22. Gerundive/-te form’ 


The gerundive form of all categories with the exception of active voice. In order to differentiate it from the main category, it is marked with the apostrophe sign ’. 
13) 
 ...........
 

 
 Ijo, watashi no shoken wo majie, shitsumon wo owarasete itadakimasu. 
 
 
 question [acc] to finish [tr.caus.ger] 
 
 
 With this, I have expressed my opinion and will now finish my question. 
 


 
 
23. Conditional form 


-ba and -tara forms. 
14) 
 ........
 

 
 Shokuji wo owattara, kimi no kyakushitsu e itte hanashiao ka? 
 
 
 meal [acc] to finish [tr.act.cond] 
 
 
 After we finish the meal, shall we head to your room and talk? 
 


 
Tables 12 and 13 present examples of point assignment during the analysis. The first row lists the category, while the second lists the subcategories. When the verb form corresponds to one of the subcategories, one point (‘1’) is assigned in the chart; when it does not, no point (‘0’) is assigned. 
Example for the sentence ‘allow me to finish my presentation’ (happyo wo owarasete itadakimasu ..............): 
 
Table 12: Point assignment 1 

Causative
Positive  Negative  Non-past  Past  Past context  Adnominal  Gerundive’  Conditional
1         0         0         0     0             0          1           0
 


 
Example for the sentence ‘after finishing work, I read a book’ (shigoto wo oeta ato hon wo yomimashita ..............): 
 
Table 13: Point assignment 2 

Active
Positive  Negative  Non-past  Past  Past context  Adnominal  Gerundive’  Conditional
1         0         0         1     1             1          0           0
 


 
Once each verb form was analyzed in line with this procedure, points were summed up and edited in the form of tables and graphs. These provide a picture of the morphological distribution in two corpora separately as well as a general picture (total) where all results are counted together. This made it possible to easily determine the frequency of morphological categories and subcategories, presented in the upcoming Section 7, and, additionally, determine their relation to the semantic fields (Section 8). 
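The point-counting procedure can be expressed as a simple tally. The sketch below is an illustration only: the original analysis was carried out manually in a spreadsheet, and the data structure and function name used here are assumptions. Each annotated sentence contributes one point to its category and one point to every subcategory it matches, using the two examples from Tables 12 and 13 as input.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

SUBCATEGORIES = (
    "positive", "negative", "non-past", "past",
    "past context", "adnominal", "gerundive'", "conditional",
)


def tally(annotations: Iterable[Tuple[str, Set[str]]]) -> Tuple[Counter, Counter]:
    """Sum points per category and per subcategory over all annotated sentences."""
    category_totals: Counter = Counter()
    subcategory_totals: Counter = Counter()
    for category, subcategories in annotations:
        unknown = set(subcategories) - set(SUBCATEGORIES)
        if unknown:
            raise ValueError(f"unknown subcategories: {sorted(unknown)}")
        category_totals[category] += 1
        subcategory_totals.update(subcategories)  # one point per matching subcategory
    return category_totals, subcategory_totals


annotated = [
    # happyo wo owarasete itadakimasu (Table 12)
    ("causative", {"positive", "gerundive'"}),
    # shigoto wo oeta ato hon wo yomimashita (Table 13)
    ("active", {"positive", "past", "past context", "adnominal"}),
]

categories, subcategories = tally(annotated)
print(dict(categories))
print({name: subcategories[name] for name in SUBCATEGORIES})
```

Subcategories that never occur simply keep a count of zero, which corresponds to the ‘0’ entries in the tables.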
7 Frequency of morphological verb categories and subcategories 
In this section, the results of the sentence analysis are provided. For practical reasons, the forms owarareru and oerareru are marked as ‘honorific’ in tables and figures, as the honorific reading is the most frequent (refer to Section 6.1). 
 
7.1 Owaru 
7.1.1 Categories 
Over half of all examples find owaru in its active form (55.2%). Causative use is placed second, with 19.3% of appearance frequency. In this case, the high number is to be expected. As pointed out in Section 2.3, as well as during the analysis of instructional materials and the public’s general perception of owaru, the verb is consistently being presented or perceived as solely intransitive. Two smaller groups consist of the gerundive owatte (10.7%) and the form expressing desire owaritai (13.8%). Meanwhile, the volitional form owaro (0.9%) and the honorific form owarareru (0.1%) are barely present (see Table 14 and Figure 5). 
 
Table 14: Verb forms (owaru) 

          owaru   owaraseru  owarareru  owaro     owaritai  owatte
          active  causative  honorific  volition  desire    gerundive
BCCWJ     301     99         0          5         97        84
          51.4%   16.9%      0.0%       0.9%      16.6%     14.3%
CSJ/NUCC  195     74         1          3         27        12
          62.5%   23.7%      0.3%       1.0%      8.7%      3.8%
Total     496     173        1          8         124       96
          55.2%   19.3%      0.1%       0.9%      13.8%     10.7%
 


 
 
 
[Chart: frequency (axis 0 to 500) of the verb forms owaru, owaraseru, owarareru, owaro, owaritai, and owatte for BCCWJ, CSJ/NUCC, and Total]


Figure 5: Verb forms (owaru) 
 
When comparing the corpora of written (BCCWJ) and spoken (CSJ/NUCC) Japanese, the most remarkable difference can be found in the causative use of owaru. The active form is the most notable in both corpora and encompasses over 60% of the forms in CSJ/NUCC. 
However, the causative use is more prevalent in the corpus of spoken Japanese (23.7%), whereas the frequency in BCCWJ reaches only 16.9%. The reason for this difference could be assigned to one specific collocation, ‘presentation, speech’ happyo ... It is ranked first in frequency and frequently co-occurs with the phrase ‘allow to finish’ owarasete itadaku .......... This structure consists of the causative morpheme -(s)ase- and the verb ‘to receive’ in its humble form itadaku. It can be used when the speaker is granted permission from the listener for a specific action, or as a phrase when there is no actual need for permission and the speaker simply wants to express politeness or humbleness when talking about a planned action (Shigemori Bučar, 2008, p. 76-77). 
BCCWJ, by contrast, exhibits a higher percentage of the form owaritai, which most frequently co-occurs with the collocation ‘question’ shitsumon .. in the phrase ‘I think I want to finish’ owaritai to omoimasu ........... 
 
7.1.2 Subcategories 
The verb owaru is found almost entirely in its positive form (99.2%) and non-past tense (86.5%). Even considering the context in its entirety, the past tense of the main verb can only be observed in 50 cases (Table 15 and Figure 6). 
Gerundive forms are visible especially in the category of causative use (owarasete), which is due to the frequently used phrase owarasete itadaku. In this research, this structure most often co-occurs with collocations in the semantic field of ‘speech’, such as ‘question’ shitsumon .., ‘presentation’ happyo .. or ‘story’ hanashi .. As this semantic group is the most prominent for owaru, the high percentage of causative and gerundive forms is not unexpected. The gerundive forms are particularly frequent in CSJ/NUCC. There are no other significant differences between the two corpora. 
 
Table 15: Subcategories of verb forms (owaru) 

          Positive  Negative  Non-past  Past  Past context  Adnominal  Gerundive’  Conditional
BCCWJ     581       5         482       20    37            35         73          2
          99.1%     0.9%      82.3%     3.4%  6.3%          6.0%       12.5%       0.3%
CSJ/NUCC  310       2         295       5     13            12         65          2
          99.4%     0.6%      94.6%     1.6%  4.2%          3.8%       20.8%       0.6%
Total     891       7         777       25    50            47         138         4
          99.2%     0.8%      86.5%     2.8%  5.6%          5.2%       15.4%       0.4%
 


 
 
[Chart: frequency (axis 0 to 900) per subcategory (positive, negative, non-past, past, past context, adnominal, gerundive’, conditional) for BCCWJ, CSJ/NUCC, and Total]


Figure 6: Subcategories of verb forms (owaru) 
 
7.2 Oeru 
7.2.1 Categories 
The distribution of categories for oeru differs significantly from that of owaru. Only two major groups can be identified: the active form, which accounts for almost 60% of all examples, and the gerundive form, which amounts to almost 40%. Counted together, the two groups make up 95.8% of all analyzed verb forms. 
No other form, including the causative use, stands out (Table 16 and Figure 7). 
 
Table 16: Categories of verb forms (oeru) 

          oeru    oesaseru   oerareru   oeyo      oetai   oete
          active  causative  honorific  volition  desire  gerundive
BCCWJ     369     0          6          7         10      231
          59.2%   0.0%       1.0%       1.1%      1.6%    37.1%
CSJ/NUCC  34      3          0          1         2       26
          51.5%   4.5%       0.0%       1.5%      3.0%    39.4%
Total     403     3          6          8         12      257
          58.5%   0.4%       0.9%       1.2%      1.7%    37.3%
 


 
 
[Chart: frequency (axis 0 to 450) of the verb forms oeru, oesaseru, oerareru, oeyo, oetai, and oete for BCCWJ, CSJ/NUCC, and Total]


Figure 7: Categories of verb forms (oeru) 
 
As explained in Section 7.1, causative use is primarily found co-occurring with collocations in the semantic field of ‘speech’. However, such collocations are not as frequent when discussing oeru. The lack of this category could be an explanation for the low percentage of causative forms when compared to owaru. 
There are no significant differences between the corpora of written and spoken Japanese. However, two minor discrepancies can be mentioned: all causative forms are observed in CSJ/NUCC, while all honorific forms are located in BCCWJ. 
 
7.2.2 Subcategories 
One major discrepancy can be observed when comparing the results with those of the verb owaru. 
While the absence of negative forms is characteristic of both verbs, the use of tense is significantly different. Oeru appears in the past tense far more often than in the non-past tense (271 versus 150 cases), and past context prevails as well (395 cases out of 689). Adnominal use is also often found in past forms. 
The gerundive’ group has an extremely low frequency (0.4%), which can, however, be explained by the gerundive of the active form being a separate category (Table 17 and Figure 8). 
The distribution of forms (active and gerundive forms covering over 95% of all verb forms) and the predominance of both past tense and past context suggest that oeru is a verb that tends to express ‘completion’. Past tense and past context on their own define an action that has already been finished. Furthermore, the gerundive also implies a sequence of two or more actions, in which the first one has to be completed before the next one begins. 
Other than a higher percentage of the ‘past context’ category observed in BCCWJ, no major differences between the corpora of written and spoken Japanese are present. 
 
Table 17: Subcategories of verb forms (oeru) 

          Positive  Negative  Non-past  Past   Past context  Adnominal  Gerundive’  Conditional
BCCWJ     623       0         139       246    368           178        0           8
          100.0%    0.0%      22.3%     39.5%  59.1%         28.6%      0.0%        1.3%
CSJ/NUCC  66        0         11        25     27            21         3           3
          100.0%    0.0%      16.7%     37.9%  40.9%         31.8%      4.5%        4.5%
Total     689       0         150       271    395           199        3           11
          100.0%    0.0%      21.8%     39.3%  57.3%         28.9%      0.4%        1.6%
 


 
 
 
[Chart: frequency (axis 0 to 700) per subcategory (positive, negative, non-past, past, past context, adnominal, gerundive’, conditional) for BCCWJ, CSJ/NUCC, and Total]


Figure 8: Subcategories of verb forms (oeru) 
 
8 Collocations in relation to verb forms 
This section provides the analysis of relations between semantic fields of collocations and morphological categories of verb forms. Only collocations of higher frequency co-occurring with analyzed verb forms are counted (refer to Section 4). 
The tables in this section list all semantic fields of collocations and their frequency of appearance, as well as the morphological categories of each verb. 
Full lists of collocations belonging to separate semantic fields can be found in Sections 5.1 and 5.2. Morphological categories are described in detail in Section 6. 
Each table corresponds to the graph located beneath it. Sections on the graphs represent the distribution of verbal forms within a specific semantic group of collocations. 
 
8.1 Owaru 
The most prominent semantic group of collocations for the verb owaru is that of ‘speech’ (refer to Section 5). Numbers in Table 18 show that collocations in this field most frequently (62.9%) co-occur with active forms of the verb. Additionally, in the same semantic field, there are two other emerging morphological categories: the causative form owaraseru (19.3%) and the form expressing desire owaritai (17.2%). 
Collocations classified into the semantic field of ‘work’ show a high co-occurrence of 85.2% with the gerundive form owatte, as well as the active form (12.5%). The remaining morphological categories fluctuate between 0–1.1%. 
Similarly, the semantic groups of ‘life’, ‘time’, and ‘relationship’, although lesser in frequency, also show that the most prominent morphological categories are those of active, causative, and gerundive forms. 
Other semantic groups of collocations are not common and show no particular relation to any morphological category of the verb. 
 
Table 18: Collocations in relation to verb forms (owaru) 

| Semantic field | Frequency | owaru (active) | owaraseru (causative) | owarareru (honorific) | owaro (volition) | owaritai (desire) | owatte (gerundive) |
|---|---|---|---|---|---|---|---|
| Time | 30 | 21 (70.0%) | 1 (3.3%) | 0 (0.0%) | 1 (3.3%) | 2 (6.7%) | 6 (20.0%) |
| Person | 2 | 0 (0.0%) | 2 (100.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| Work | 88 | 11 (12.5%) | 1 (1.1%) | 1 (1.1%) | 0 (0.0%) | 0 (0.0%) | 75 (85.2%) |
| Speech | 699 | 440 (62.9%) | 135 (19.3%) | 0 (0.0%) | 3 (0.4%) | 120 (17.2%) | 2 (0.3%) |
| Mental process | 7 | 3 (42.9%) | 1 (14.3%) | 0 (0.0%) | 1 (14.3%) | 0 (0.0%) | 2 (28.6%) |
| Relationship | 14 | 2 (14.3%) | 12 (85.7%) | 0 (0.0%) | 0 (0.0%) | 1 (7.1%) | 0 (0.0%) |
| Organization | 1 | 1 (100.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| Life | 51 | 18 (35.3%) | 21 (41.2%) | 0 (0.0%) | 3 (5.9%) | 1 (2.0%) | 11 (21.6%) |
 


 
 
 
[Chart: counts (axis 0–450) per semantic field (time, person, work, speech, mental process, relationship, organization, life); series: owaru, owaraseru, owarareru, owaro, owaritai, owatte]


Figure 9: Collocations in relation to verb forms (owaru) 
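The pattern described for Table 18 (active forms dominating in ‘speech’, gerundive forms in ‘work’) can be read off mechanically. The sketch below is a hedged illustration, not part of the original analysis: the names `table18` and `dominant_form` are hypothetical, and each percentage is assumed to be the form count divided by the frequency of the semantic field.

```python
# Counts transcribed from Table 18 (owaru); names and layout are assumptions.
FORMS = ["active", "causative", "honorific", "volition", "desire", "gerundive"]

# semantic field -> (frequency, counts per form in the order of FORMS)
table18 = {
    "time": (30, [21, 1, 0, 1, 2, 6]),
    "person": (2, [0, 2, 0, 0, 0, 0]),
    "work": (88, [11, 1, 1, 0, 0, 75]),
    "speech": (699, [440, 135, 0, 3, 120, 2]),
    "mental process": (7, [3, 1, 0, 1, 0, 2]),
    "relationship": (14, [2, 12, 0, 0, 1, 0]),
    "organization": (1, [1, 0, 0, 0, 0, 0]),
    "life": (51, [18, 21, 0, 3, 1, 11]),
}

def dominant_form(field):
    """Return the most frequent form in a field and its share in percent."""
    frequency, counts = table18[field]
    best = max(range(len(FORMS)), key=lambda i: counts[i])
    return FORMS[best], round(100 * counts[best] / frequency, 1)

for field in table18:
    print(field, dominant_form(field))
```

With the table's figures, this yields the active form for ‘speech’ (62.9%) and the gerundive for ‘work’ (85.2%), matching the description above.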
 
8.2 Oeru 
The results for collocations in relation to verb forms of oeru are consistent with the semantic and morphological analyses (Sections 5 and 7). 
In all semantic fields, the most noticeable morphological categories are the active form oeru and the gerundive form oete (see Table 19 and Figure 10).  
Only one exception can be observed. In line with the results for owaru, oeru also displays a higher percentage of the causative oesaseru, as well as of the form expressing desire oetai, in the semantic field of ‘speech’. The causative is observed only in this semantic field. Each of these two categories accounts for 7.9% of the verb forms analyzed in this field, whereas with owaru the causative is considerably more frequent (19.3%). 
This pattern of distribution can be anticipated, as the analysis of the morphological categories (Section 7) reveals that oeru is largely observed in either its active or its gerundive form. 
The remaining categories (causative, honorific, volition and desire) are, except for the abovementioned semantic group of ‘speech’, very low in percentage or not observed in several cases. 
 
Table 19: Collocations in relation to verb forms (oeru) 

| Semantic field | Frequency | oeru (active) | oesaseru (causative) | oerareru (honorific) | oeyo (volition) | oetai (desire) | oete (gerundive) |
|---|---|---|---|---|---|---|---|
| Time | 126 | 101 (80.2%) | 0 (0.0%) | 4 (3.2%) | 2 (1.6%) | 5 (4.0%) | 14 (11.1%) |
| Work | 120 | 88 (73.3%) | 0 (0.0%) | 1 (0.8%) | 3 (2.5%) | 2 (1.7%) | 26 (21.7%) |
| Speech | 38 | 18 (47.4%) | 3 (7.9%) | 0 (0.0%) | 0 (0.0%) | 3 (7.9%) | 14 (36.8%) |
| Mental process | 6 | 4 (66.7%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 2 (33.3%) |
| Relationship | 2 | 2 (100.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| Life | 397 | 190 (47.9%) | 0 (0.0%) | 1 (0.3%) | 3 (0.8%) | 2 (0.5%) | 201 (50.6%) |
 


 
 
 
[Chart: counts (axis 0–200) per semantic field (time, work, speech, mental process, relationship, life); series: oeru, oesaseru, oerareru, oeyo, oetai, oete]


Figure 10: Collocations in relation to verb forms (oeru) 
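The observation that oeru is largely limited to its active and gerundive forms can be checked against the counts in Table 19. The following Python sketch is illustrative only; the names `table19` and `active_gerundive_share` are assumptions introduced here, and only the active and gerundive columns of the table are used.

```python
# Counts transcribed from Table 19 (oeru); names and layout are assumptions.
# semantic field -> (frequency, active count, gerundive count)
table19 = {
    "time": (126, 101, 14),
    "work": (120, 88, 26),
    "speech": (38, 18, 14),
    "mental process": (6, 4, 2),
    "relationship": (2, 2, 0),
    "life": (397, 190, 201),
}

def active_gerundive_share(rows):
    """Combined share (in percent) of active and gerundive forms in the rows."""
    frequency = sum(freq for freq, _, _ in rows)
    covered = sum(active + gerundive for _, active, gerundive in rows)
    return round(100 * covered / frequency, 1)

for field, row in table19.items():
    print(field, active_gerundive_share([row]))

print("all fields", active_gerundive_share(table19.values()))
```

Summed over all semantic fields, the two forms account for 660 of 689 collocations, which is in line with the figure of over 95% reported for oeru; ‘speech’ has the lowest combined share.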
9 Conclusion 
The evidence from this research suggests that the transitive verbs owaru and oeru differ in their use. The semantic analysis of collocations, combined with the morphological analysis of co-occurring verbs, brings forth interesting results. It is, however, necessary to point out that major issues remain unresolved, despite numerous studies and discussions on verb transitivity. 
Section 2 elaborates on verb transitivity with an emphasis on owaru and oeru as a pair. Although it is correct to refer to owaru [intr.] as a verb forming two pairs, one with its transitive counterpart owaru [tr.] and one with oeru [tr.], the analysis presented in Section 3 highlights several inconsistencies in both representation and perception of the verbs. 
Firstly, some differences can be discerned in dictionary definitions of the verbs (Section 3.1), particularly regarding the transitivity of oeru. Owaru is listed as an ergative verb in all cases. These results stand out especially when the diachronic change in verb use is taken into consideration. The comparison of sample sentences from CHJ and BCCWJ shows that a significant shift in the use of owaru has occurred, despite the consistent dictionary definitions. In modern Japanese, owaru tends to lean towards its intransitive use, while oeru overall appears in a small number of cases (see 3.2). 
Furthermore, a similar pattern can be observed in the representation of the verbs within instructional materials, as owaru is most frequently used intransitively. Oeru is rarely seen at all (see 3.3). This also leads to a non-uniform perception of the verbs, as seen in examples gathered from two websites, where users express their doubts regarding owaru and question the correctness of its transitive use (see 3.4). 
With these points of concern in mind, the next part of this research deals with the analysis of example sentences gathered from three corpora: BCCWJ, CSJ, and NUCC (Sections 4–8). 
Section 5 examines collocations of owaru and oeru and categorizes them into limited semantic fields in order to spot similarities and differences between the verbs. The following Section 6 explains the morphological categories, used to later analyze verb forms in Section 7. Lastly, Section 8 points out structural forms of the verbs in relation to collocations. These analyses bring forth some noteworthy results. 
Firstly, owaru has been observed to co-occur most often with collocations classified into the semantic field of ‘speech’, which is also the largest semantic group found within collocations of owaru. Results also show that causative forms of the verb (owaraseru) are very common in this semantic field. The high frequency of causative forms, even more so in the corpora of spoken Japanese CSJ/NUCC, is not unexpected. This is partly due to the collocations being classified into the aforementioned semantic field of ‘speech’, which demonstrates a tendency to co-occur with the frequently used phrase ‘owarasete itadaku’ (e.g., happyo wo owarasete itadakimasu). A higher number of verb forms expressing desire (owaritai) has also been observed (e.g., happyo wo owaritai to omoimasu), although not as often as the causative. In some cases, gerundive forms have a higher frequency of appearance, for example in the semantic field of ‘work’. Other semantic fields show no particular patterns. When comparing written and spoken Japanese, the causative is found in even higher percentages in the latter (CSJ/NUCC). 
On the other hand, oeru illustrates a different picture. While the semantic fields overlap with owaru, their distribution differs. Most prominent are the semantic fields of ‘life’, ‘work’ and ‘time’, whereas the largest group within collocations of owaru, ‘speech’, amounts to only 38 examples for oeru. Similarly, causative is also found in much smaller numbers and is only present in the corpus of spoken Japanese, among verbs co-occurring with collocations semantically classified into the group of ‘speech’. Regarding morphological categories of the verb forms, the active oeru and gerundive oete combined cover over 95% of all forms. That makes the distribution within semantic fields quite uniform and generally split between the two categories. 
Another significant result is the correlation of oeru and the past tense. It is often observed in either past form, or within sentences set in the past. This characteristic, along with the high frequency of gerundive forms, which indicate a sequence of actions (one has to end before the other begins), signifies that oeru correlates to the meaning of completion. 
Interestingly, both verbs appear almost exclusively in positive forms. 
To sum up, it can be concluded that the distribution of semantic fields of collocations for each verb varies heavily. The semantic field of ‘speech’ covers most of the collocations for the verb owaru, whereas oeru mostly co-occurs with collocations relating to ‘life’, ‘work’, and ‘time’. 
Additionally, it has been observed that compared to owaru, oeru strongly gravitates towards active and gerundive forms as well as past tense, and displays a nuance of ‘completion’. 
However, it must be admitted that the discussion regarding the transitivity of owaru and oeru is still insufficient and in need of further research. To facilitate understanding and correct the perception of the verbs, it is necessary to focus on rare cases of ergative verbs during the educational process. This can be done with the help of dictionary definitions, practical examples, and the use of corpora, where special attention is given to owaru as a verb with two transitive pairs: the transitive owaru and oeru. Instructional materials should also provide detailed information, covering all aspects of the two verbs. 
As future research, I propose a questionnaire that focuses on the students’ perception of the two verbs. As noted for different languages, this is a topical issue at different levels of L2 acquisition (Pavlovič, 2020; Ito, 2021, etc.). Comparing the results with this paper could potentially be the next step towards a better understanding of intransitive and transitive verbs and their relations. 
Abbreviations 
[acc] accusative
[adn] adnominal use
[caus] causative
[cond] conditional
[des] desire/wish
[ger] gerundive
[inf] infinitive
[intr] intransitive
[neg] negation
[nom] nominative
[nonpst] non-past tense
[pst] past tense
[pol] polite
[pos] positive
[poten] potential
[pres] present tense
[top] topic
[tr] transitive
[vol] volition
 


 
References 
Books 
Adachi, A., Kurosaki, N., & Nakayama, Y. (2013). Kanji goi ga yowai anata e. Tokyo: Bonjinsha. 
Amano, M., Oshima, M., Sugimoto, T., Numata, Y., Masuoka, T., & Yazawa, M. (2013). Wakubukku nihon bunpo. Tokyo: Ofu. 
Bekeš, A. (2005). Pregled slovnice japonskega jezika [Lecture notes]. Ljubljana: Faculty of Arts, Department of Asian and African studies. 
Hmeljak Sangawa, K., Kobayashi, R., Kumagai, Y., Shigemori Bučar, C., Maeno, Y., & Shukuri, Y. (2007). Uvod v japonsko pisavo: hiragana, katakana in prvih 854 pismenk. Ljubljana: Faculty of Arts, Department of Asian and African studies. 
Hmeljak Sangawa, K., Ichimiya, Y., Ida, N., Kawashima, T., Koga, M., Moritoki Škof, N., & Ryu, H. (2012). Japonščina za začetnike 1. Ljubljana: Ljubljana University Press, Faculty of Arts. 
Hmeljak Sangawa, K., Ichimiya, Y., Ida, N., Kawashima, T., Koga, M., Moritoki Škof, N., & Ryu, H. (2012). Japonščina za začetnike 2. Ljubljana: Ljubljana University Press, Faculty of Arts. 
Hmeljak Sangawa, K., Ichimiya, Y., Ida, N., Kawashima, T., Koga, M., Moritoki Škof, N., & Ryu, H. (2016). Japonščina za začetnike 2. Ljubljana: Ljubljana University Press, Faculty of Arts. 
Ito, H. (2021). From Native-speaker Likeness to Self-representation in Language: Views from the Acquisition of Japanese Transitive and Intransitive Verbs. Acta Linguistica Asiatica, 11(1), 25-36. https://doi.org/10.4312/ala.11.1.25-36 
Izuhara, S. (2010). Q&A-7 'owaru' wa tadoshi? Nihongo kyōshi Izuhara Shōji webusaito. Retrieved from http://blog.livedoor.jp/s_izuha/archives/1924977.html 
Kokuritsu kokugo kenkyujo (2003). Bunrui goihyo: zoho kaitei-ban (Word list by semantic principles, revised and enlarged edition). http://doi.org/10.15084/00002282 
Makino, A., et al. (Eds.). (1998). Minna no nihongo shokyu 2 honsatsu. Tokyo: Surie nettowāku. 
Makino, A., et al. (Eds.). (2013). Minna no nihongo shokyu 2 honsatsu. Tokyo: Surie nettowāku. 
Morita, Y. (1994). Doshi no imironteki bunpo kenkyu. Tokyo: Meiji Shoin. 
Numata, Y. (1989). Nihongo dōshi jita no imiteki taiō (1): tagigo ni okeru taiō no ketsuraku kara (Semantic Correspondence between Transitive and Intransitive Verbs in Japanese (1): Correspondence Gaps in the Case of Polysemy). Kenkyu hokoku-shu, 10, 193-215. http://doi.org/10.15084/00001122 
Oka, M., Tsutsui, M., Kondo, J., Emori, S., Hanai, Y., & Ishikawa, S. (2009). Jokyu e no tobira: kontentsu to maruchimedia de manabu nihongo (Tobira: Gateway to Advanced Japanese Learning through Content and Multimedia: Textbook). Tokyo: Kuroshio shuppan. 
Okutsu, K. (1967). Jidōka tadōka oyobi ryōkyokuka tenkei – jitadōshi no taiō. Kokugogaku, 70, 46–66. 
Pavlovič, M. (2020). Grammar Errors by Slovenian Learners of Japanese: Corpus Analysis of Writings on Beginner and Intermediate Levels. Acta Linguistica Asiatica, 10(1), 87-104. https://doi.org/10.4312/ala.10.1.87-104 
Shigemori Bučar, C. (2008). Causative and Politeness. Asian and African Studies, 12(3), 71–82. 
Srdanović, I. (2016). Kolokacije in kolokacije na daljavo v japonskem jeziku: korpusni pristop. Ljubljana: Scientific Research Institute of the Faculty of Arts. 
Tanimori, M., & Sato, E. (2012). Essential Japanese Grammar: A Comprehensive Guide to Contemporary Usage. Tokyo: Tuttle. 
Dictionaries 
Hmeljak Sangawa, K. (2006). .... In Japonsko-slovenski slovar JaSlo. Retrieved from http://nl.ijs.si/jaslo/cgi/jaslo.pl. 
Hmeljak Sangawa, K. (2006). .... In Japonsko-slovenski slovar JaSlo. Retrieved from http://nl.ijs.si/jaslo/cgi/jaslo.pl. 
Kondō, I., & Takano, F. (Eds.). (1993). Oeru. In Shogakukan Progressive Japanese-English Dictionary. Tokyo: Shōgakukan. 
Kondō, I., & Takano, F. (Eds.). (1993). Owaru. In Shogakukan Progressive Japanese-English Dictionary. Tokyo: Shōgakukan. 
Nishio, M., Iwabuchi, E., & Mizutani, S. (Eds.). (1955). .... In Kokugo jiten. Tokyo: Iwanami shoten. 
Nishio, M., Iwabuchi, E., & Mizutani, S. (Eds.). (1955). .... In Kokugo jiten. Tokyo: Iwanami shoten. 
Masuda, Ko. (Ed.). (1998). Oeru. In Kenkyusha's New Japanese-English Dictionary. Tokyo: Kenkyusha. 
Masuda, Ko. (Ed.). (1998). Owaru. In Kenkyusha's New Japanese-English Dictionary. Tokyo: Kenkyusha. 
Matsumura, A. (Ed.). (2006). .... In Daijirin. Tokyo: Sanseidō. Retrieved from https://kotobank.jp/. 
Matsumura, A. (Ed.). (2006). .... In Daijirin. Tokyo: Sanseidō. Retrieved from https://kotobank.jp/. 
Shinmura, I. (Ed.). (1955). .... In Kojien. Tokyo: Iwanami shoten. 
Shinmura, I. (Ed.). (1955). .... In Kojien. Tokyo: Iwanami shoten. 
Yamada, T., Shibata, T., Sakai, K., Kuramochi, Y., Yamada, A., Uwano, Z., Ijima, M., & Sasahara, H. (Eds.). (2012). .... In Shin meikai kokugo jiten. Tokyo: Sanseidō. 
Yamada, T., Shibata, T., Sakai, K., Kuramochi, Y., Yamada, A., Uwano, Z., Ijima, M., & Sasahara, H. (Eds.). (2012). .... In Shin meikai kokugo jiten. Tokyo: Sanseidō. 
 
Corpora 
Fujimura, I., Chiba, S., & Oso, M. (2012). Lexical and grammatical features of spoken and written Japanese in contrast: Exploring a lexical profiling approach to comparing spoken and written corpora. Proceedings of the VIIth GSCP International Conference: Speech and Corpora, 393–98. 
Kokuritsu kokugo kenkyujo (2019). Gendai nihongo kakikotoba kinko kopasu (Balanced Corpus of Contemporary Written Japanese - BCCWJ). Available online at https://chunagon.ninjal.ac.jp/. 
Kokuritsu kokugo kenkyujo (2019). Nihongo hanashikotoba kopasu (Corpus of Spontaneous Japanese - CSJ). Available online at https://chunagon.ninjal.ac.jp/. 
Kokuritsu kokugo kenkyujo (2019). Nihongo rekishi kopasu (Corpus of Historical Japanese - CHJ). Available online at https://chunagon.ninjal.ac.jp/. 
 
CONTACT-INDUCED VARIATION IN TETUN DILI PHONOLOGY 
Andrei A. AVRAM 
University of Bucharest, Romania 
andrei.avram@lls.unibuc.ro 
Abstract 
The paper analyzes the Portuguese influence on Tetun Dili phonology, which can be identified at different levels. The phonemic inventory of Tetun Dili has been enriched via borrowing of several consonantal phonemes, triggering an increase in the number of phonological contrasts. Portuguese influence also accounts for the phonetic realizations of a number of consonantal and vocalic phonemes, with some allophonic rules extended even to words belonging to the native stock. Furthermore, the massive influx of Portuguese loanwords has greatly increased the number of permissible onset clusters, and lexical borrowings from Portuguese have led to the occurrence of antepenultimate stress. Finally, Portuguese influence also accounts for the considerable inter-speaker variation. These contact-induced phenomena are shown to correlate with the following factors: knowledge of Portuguese; the exo-normative vs. endo-normative orientation of speakers in the case of Portuguese, i.e. towards European or Brazilian Portuguese vs. the East Timorese variety of Portuguese. 
Keywords: Tetun Dili; Portuguese; variation; phonological restructuring 
Povzetek 
Prispevek analizira portugalski vpliv na fonologijo jezika tetun dili, ki ga je mogoče prepoznati na različnih ravneh. Popis fonemov tega jezika je obogaten z izposojo več soglasniških fonemov, kar je sprožilo povečanje števila fonoloških kontrastov. Portugalski vpliv so tudi fonetične realizacije številnih soglasniških in zlogotvornih fonemov, katerih alofonska pravila so razširjena tudi na domače besede. Poleg tega je močan pritok portugalskih izposojenk močno povečal število dovoljenih soglasniških nizov v zaprtem zlogu in spodbudil naglaševanje na predzadnjem zlogu. Nenazadnje pa portugalski vpliv predstavlja precejšnje razlike med govorci. Pokazalo se je, da omenjeni pojavi posledično vplivajo na znanje portugalščine ter na ekso-normativno oziroma endo-normativno usmerjenost govorcev v primeru portugalščine, tj. uporabo evropske ali brazilske portugalščine oziroma uporabo vzhodno-timorske različice portugalščine.  
Ključne besede: tetun dili; portugalščina; sprememba; fonološko prestrukturiranje 
1 Introduction 
Previous work on the Portuguese influence on Tetun Dili (e.g. Hajek, 2007; Williams-van Klinken & Hajek, 2016; Avram, 2018; Williams-van Klinken & Hajek, 2018) was mainly concerned with the morphology, syntax, and lexicon. The present paper analyzes the Portuguese impact on Tetun Dili phonology. 
The sources consulted for Portuguese are: (i) European Portuguese – Mateus & D’Andrade (2000); Massini-Cagliari et al. (2016); (ii) Brazilian Portuguese – Seara et al. (2011); Massini-Cagliari et al. (2016); (iii) Timor-Leste Portuguese – de Albuquerque (2010b, 2011b, 2011c, 2012, 2014a, 2014b, 2015). 
The corpus of Tetun Dili examples is from: grammars (Williams-van Klinken et al., 2002a, 2002b); dictionaries (Costa, 2001a; Hull, 2002; Loch & Tschanz, 2005; Hull, 2006; Manhitu, 2007); phrasebooks (Costa, 2001b; Saunders, 2004; Hajek & Tilman, 2008); theses and dissertations (de Araújo e Corte-Real, 1990; de Albuquerque, 2011a; Greksáková, 2018); papers (Hull, 2000; Esperança, 2001; Chen, 2015). The examples are kept at a reasonable minimum. The citation forms of Tetun Dili examples are given in the current standard orthography and their transcription is in IPA.  
The paper is organized as follows. Section 2 outlines the language situation in East Timor, with a focus on Tetun Dili and Portuguese. Section 3 is a brief overview of the phonologies of Tetun Terik and Tetun Dili. Section 4 is concerned with the imported consonantal phonemes. Section 5 looks at the new phonological contrasts. Section 6 deals with the phonetic realizations of the imported phonemes. Section 7 discusses a number of selected allophonic rules. Section 8 analyzes developments in syllable structure. Section 9 focuses on stress placement. Section 10 summarizes the findings. 
2 Language situation in East Timor: Tetun Dili and Portuguese 
Tetun Dili1 is spoken in East Timor2. Tetun Dili is one of the two official languages of East Timor, alongside Portuguese. Williams-van Klinken et al. (2002b, p. 5) write that Tetun Dili is spoken by “some 60–70% of the population of East Timor”, while de Albuquerque (2010a, p. 30) states that it is “falado por mais de 80% da população” [= spoken by more than 80% of the population, translation mine]. It is estimated that some 36% are first-language speakers and that some 60% speak Tetun Dili as a second language (Williams-van Klinken et al., 2002b, p. 5).  
1 Also known as Tetum-Praça/Tetun-Prasa. 
2 The official name of the country is Timor Loro Sa’e in Tetun and Timor-Leste in Portuguese, respectively. 
The status of Tetun Dili is a matter of some dispute in the literature (see also Avram, 2005a), with various authors employing different labels: “pidgin” (Smith, 1995, p. 360); “creole” (Ross, 2017); “an Austronesian language” (Williams-van Klinken et al., 2002a, 2002b); “an Austronesian language with many Portuguese loans” (Chen, 2015, p. 29); “a koiné with heavy Portuguese lexical influence” (Greksáková, 2018, p. 82). 
As for Portuguese, it is spoken by only 36% of East Timor’s population. In addition to the difference in the number of speakers, there is a clear asymmetrical power relationship between Tetun Dili and Portuguese (Taylor-Leech, 2009; de Albuquerque, 2010a, 2018; Greksáková, 2018). Although official language policies favour the promotion and development of Tetun Dili as a nation-building instrument, knowledge and use of Portuguese still carry considerable prestige (Taylor-Leech, 2007, 2009; Ross, 2017). From a sociolinguistic perspective, there is a continuum of Portuguese varieties. According to de Albuquerque (2011b, p. 70), this can be represented as follows: 
 
European Portuguese norm 
 popular Portuguese 
 


Figure 1: Portuguese continuum in Timor-Leste (de Albuquerque 2011b, p. 70) 
 
In the above representation, “popular Portuguese” is, to quote Thomaz (2010, p. 39), “das Portugiesische von Timor, das von Personen mit geringer Bildung gesprochen wird” [= the Portuguese of Timor, which is spoken by persons with little education, translation mine]. De Albuquerque (2011b, p. 75, n. 5) defines it as “a subvariedade do PTL [= português de Timor-Leste] que o falante aprendeu de maneira não formal e […] sofre maior influência da língua materna do falante, ou seja, mais distante da norma europeia” [= a subvariety of Portuguese which the speaker acquired in a non-formal manner and which […] undergoes more influence from the speaker’s mother tongue, namely more distant from the European norm, translation mine]. In fact, the picture is more complex. Neither Thomaz (2010) nor de Albuquerque (2011b) mentions Brazilian Portuguese. However, as noted by Hajek & Tilman (2008, p. 181), “you’ll hear at least three different Portuguese accents in East Timor […] Portuguese as spoken by most Timorese […] European Portuguese [and] Brazilian Portuguese”. The latter variety is a relatively new addition to the Portuguese continuum in East Timor, a consequence of post-independence developments, given that Brazil runs a wide range of support programs in the country, including for the teaching of Portuguese. 
3 Phonology of Tetun Terik and Tetun Dili 
Historically, Tetun Dili developed out of Tetun Terik, spoken in the south of the island of Timor as well as in the southwest, i.e. the area of the East Timor – West Timor border3. Tetun Terik and Tetun Dili differ in a number of respects in their phonology, morphology, syntax, and lexicon, as shown by e.g. Williams-van Klinken et al. (2002b, pp. 53-56). In what follows, the focus is on the differences between these two major varieties of Tetun in their inventories of phonemes. 
3 Which is part of the Republic of Indonesia. 
There is consensus among authors such as das Dores (1907), Troeboes et al. (1987, pp. 14-28), Taryono et al. (1993, pp. 25-34), van Klinken (1999), Hull (2000, p. 167, p. 189), Costa (2001a, pp. 23-22), Esperança (2001, pp. 50-60), and Thomaz (2002, p. 52) with respect to the inventory of consonantal and vocalic phonemes of Tetun Terik. As shown in Table 1, the system of vocalic phonemes of Tetun Terik is relatively simple, consisting of /i/, /u/, /e/, /ɔ/ and /a/: 
 
Table 1: Tetun Terik: Vocalic phonemes 

| | front | back |
|---|---|---|
| high | i | u |
| high mid | e | |
| low mid | | ɔ |
| low | a | |
  
 


 
 
Tetun Terik also has a relatively small number of consonantal phonemes. The system of consonant phonemes is set out in Table 2 below: 
 
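Table 2: Tetun Terik: Consonantal phonemes 

| | bilabial | alveolar | velar | glottal |
|---|---|---|---|---|
| stops | | t | k | ʔ |
| | b | d | | |
| fricatives | | s | | h |
| nasals | m | n | | |
| tap | | ɾ | | |
| approximants | w | l | | |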
  
  
 


 
 
The inventory of consonantal phonemes of Tetun Dili is discussed by Costa (2001a, p. 22), Hull (2000, p. 167), Williams-van Klinken et al. (2002a, p. 11, 2002b, p. 12), and de Albuquerque (2011a, p. 77, p. 83), among others. Several authors posit a so-called “minimal inventory” (Hull, 2000) or “Umgangslautung” [= colloquial pronunciation, translation mine]4 (Saunders, 2004). According to Hull (2000, p. 189), this consists of “11 consonantal phonemes, given the loss of /’/ and the assimilation of /w/ to /b/”, as shown in Table 3: 
4 Used “von der großen Mehrheit der Bevölkerung” [= by the large majority of the population, translation mine] (Saunders, 2004, p. 16). 
5 Saunders (2004, p. 16) writes that “die Hochlautung wird vorwiegend von gebildeten Bewohnern der Hauptstadt Dili verwendet” [= the standard pronunciation is mainly used by educated inhabitants of the capital Dili, translation mine].  
 
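Table 3: Tetun Dili: Consonantal phonemes (minimal inventory) 

| | bilabial | alveolar | velar | glottal |
|---|---|---|---|---|
| stops | p | t | k | |
| | b | d | | |
| fricatives | | s | | h |
| nasals | m | n | | |
| tap | | ɾ | | |
| approximants | | l | | |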
  
  
 


 
 
The so-called “maximal inventory” (Hull, 2000) or “Hochlautung” [= standard pronunciation, translation mine]5 (Saunders, 2004) contains 22 consonantal phonemes (Hull, 2002, p. 189; Williams-van Klinken et al., 2002a, p. 8, 2002b, p. 10): 
 
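Table 4: Tetun Dili: Consonantal phonemes (maximal inventory) 

| | bilabial | labio-dental | alveolar | alveo-palatal | palatal | velar | glottal |
|---|---|---|---|---|---|---|---|
| stops | p | | t | | | k | |
| | b | | d | | | g | |
| fricatives | | f | s | ʃ | | | h |
| | | v | z | ʒ | | | |
| nasals | m | | n | | ɲ | | |
| tap | | | ɾ | | | | |
| trill | | | r | | | | |
| liquid | | | l | | ʎ | | |
| glide | w | | | | j | | |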
  
  
 


 
 
The maximal inventory in Table 4 is the consequence of the fact that Tetun Dili has borrowed a number of consonantal phonemes, the topic of the next section. 
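The relation between the two inventories can be made explicit with a simple set comparison. The Python sketch below is an illustration only: the variable names are hypothetical, and the symbols ɾ, ʃ, ʒ, ɲ and ʎ are assumed readings of the corresponding cells in Tables 3 and 4, inferred from their place and manner of articulation.

```python
# Consonant inventories transcribed from Tables 3 and 4 (Tetun Dili).
# The IPA symbols for the tap, the alveo-palatal fricatives and the
# palatal nasal and liquid are assumed readings of the table cells.
minimal = set("p t k b d s h m n ɾ l".split())
maximal = set("p t k b d g f s ʃ h v z ʒ m n ɲ ɾ r l ʎ w j".split())

# Phonemes found only in the maximal inventory.
additional = maximal - minimal

print(len(minimal), len(maximal), len(additional))
print(sorted(additional))
print(minimal <= maximal)  # every minimal phoneme is also in the maximal set
```

Note that the difference between the two sets is not identical to the set of borrowed phonemes: /w/, for instance, is already part of the Tetun Terik inventory (Table 2) and is absent from the minimal inventory only because of its assimilation to /b/.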
4 Imported consonantal phonemes 
Tetun Dili has a larger number of consonantal phonemes than Tetun Terik, of which it is historically an offshoot. The additional consonantal phonemes are from Malay and, in particular, Portuguese. 
Costa (2001a, p. 24) writes that “foram, ainda, introduzidas no tétum […] consoantes, em especial devido à importação de palavras […], predominantemente de origem portuguesa” [= consonants were also introduced into Tetun, due especially to the import of words […] predominantly of Portuguese origin, translation mine]: /p/, /g/, /v/, /z/ and /./. However, according to other authors, the number of the imported consonantal phonemes is larger. Hull (2000, p. 189), for instance, lists nine “(Malay and/or Portuguese-derived) consonantal phonemes”: /g/, /./, /./, /p/, /r/, /v/, /z/, /./ and /./. Williams-van Klinken et al. (2002b, p. 10) attribute the increase in the number of consonantal phonemes of Tetun Dili exclusively to Portuguese, writing that “Portuguese loans are responsible for introducing the phonemes /p g v z . . . . r/”. De Albuquerque (2011a, p. 85) writes that “alguns dos sons [do] Malaio e [do] português […] foram incorporados à fonologia” [= some of the sounds of Malay and Portuguese […] were incorporated into the phonology, translation mine]: /p/, /v/, /z/ and /g/. 
As can be seen, there are discrepancies in the number of consonantal phonemes of foreign provenance, i.e. from Malay and Portuguese. One of the factors accounting for these discrepancies is the differential extent to which these imported consonantal phonemes have been integrated into the phonology of Tetun Dili. According to de Albuquerque (2011a, p. 86), “/p/ e /g/ já foram incorporados de maneira efetiva” [= /p/ and /g/ have already been effectively incorporated, translation mine], whereas /v/ and /z/ “encontram-se limitados a empréstimos lusófonos” [= are limited to Lusophone loanwords, translation mine]. Note, however, the inconsistency: /v/ and /z/ are also listed among the imported consonantal phonemes which are characterized as “sendo produtivos e aparecendo em alguns vocábulos nativos” [= being productive and occurring in some native words] (de Albuquerque, 2011a, p. 85). Other consonants, i.e. [ʃ, ʒ, ɲ, ʎ], “foram emprestadas da língua portuguesa e não foram incorporadas” [= have been borrowed from the Portuguese language and have not been incorporated, translation mine] and “permanecem limitadas somente aos itens lexicais de origem lusófona” [= remain confined to lexical items of Lusophone origin, translation mine] (de Albuquerque, 2011a, p. 87). On de Albuquerque’s (2011a) analysis, some of the imported consonantal phonemes have a restricted distribution, occurring only in Portuguese loanwords.  
A second factor is the considerable inter-speaker variation with respect to the occurrence of the imported phonemes. According to Williams-van Klinken et al. (2002b, p. 10), “many speakers do not have the full set of consonant phonemes”. The only reason for the absence of some of the imported consonantal phonemes in the Tetun Dili of such speakers mentioned in the literature is the influence of the L1s of the speakers (e.g. Williams-van Klinken et al., 2002b, p. 10; de Albuquerque, 2011a). However, the East Timor variety of Portuguese must also have played a role. Indeed, one of the characteristics of the phonology of East Timor Portuguese is the absence of the following phonemes from its phonemic inventory: /p/, /v/, /./, /./, /./, /./ (de Albuquerque, 2010b, pp. 276-277, 2011b, pp. 70-72, 2011c, pp. 234-235; Thomaz, 2010, p. 39). In other words, the local variety of Portuguese functions as a “filter” and the aforementioned consonantal phonemes do not make it into Tetun Dili. 
5 Phonological contrasts 
For Tetun Dili speakers with the maximal inventory of consonantal phonemes, the massive influx of Portuguese loanwords has led to the emergence of new phonological contrasts: 
 
(1) 
 a. /f/-/v/ 
 b. /f/-/b/ 
 c. /f/-/p/ 
 d. /s/-/z/ 
 e. /s/-/./ 
 f. /z/-/./ 
 g. /n/-/./ 
 h. /l/-/./ 
 


 
However, according to Williams-van Klinken et al. (2002b, p. 10), “especially for those who are not native speakers of Tetun Dili, there is the possibility of a merger for: /v/-/b/, /./-/s/, /./-/z/, /./-/n/, and /./-/l/”. In this case again, the absence of these phonological contrasts may be attributed to the absence of the consonantal phonemes /v/, /./, /./, /./, /./ in the locally spoken variety of Portuguese. 
6 Phonetic realizations of imported phonemes 
6.1 Nasal vowels 
Some speakers of Tetun Dili denasalize vowels6 in Portuguese loanwords. This is captured by the rule in (2) and illustrated by the example in (3): 
6 See also section 5.2. 
7 Where the occurrence of [m] instead of the expected [n] is an instance of spelling pronunciation. 
8 Also known as “segmentalization” (McColl Millar, 2015, p. 57). 
9 Nasal vowel unpacking is also taken as evidence that nasal vowels are underlyingly two segments (Paradis & Prunet, 2000). 
 
(2) 
 Ṽ › [-nasal]
 

(3) 
 jardín [.a.din] ‘garden’ < Portuguese jardim 
 


 
Denasalization of vowels is also a characteristic of East Timor Portuguese, as shown by de Albuquerque (2010b, p. 278, 2011c, p. 235). Consider the examples below: 
 
(4) 
 a. 
 East Timor Portuguese 
 

 
 b. 
 East Timor Portuguese ontem [.ntem]7 ~ [.nten] ‘yesterday’ 
 


 
As can be seen in the examples under (3) and (4), the denasalized vowel is followed by a non-etymological nasal. De Albuquerque (2011c, p. 235) attributes denasalization in East Timor Portuguese to the influence of Tetun, claiming that “há com frequência […] a inserção de um [n] epentético” [= the insertion of an epenthetic [n] frequently occurs, translation mine] since Tetun “há um grande número de substantivos terminados com um sufixo -n” [= has a large number of nouns ending in a suffix -n, translation mine]. In fact, denasalization is an instance of unpacking8, whereby “the phonetic features present in a single segment are split into a sequence of two segments” (McColl Millar, 2015, p. 57). In denasalization, as Crowley (1997, p. 46) puts it, “the original nasal and vowel features […] are distributed over two sounds”: 
 
(5) 
 Ṽ › V + C[+nasal]
 


 
Therefore, the Portuguese nasal vowels are reinterpreted as sequences made up of an oral vowel and a nasal consonant. As is well known, nasal vowel unpacking is widely attested9 (Crowley, 1997, p. 46; Paradis & Prunet, 2000): 
 
(6) 
 a. 
 French .
 

 
 b. 
 French avion [avi.] > Romanian avion [avion] ‘airplane’ 
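The unpacking rule in (5) can be sketched computationally. The following is a minimal illustration only, assuming that the input is an IPA string in which nasal vowels carry the combining tilde and that the unpacked nasal is [n]; the function name and the sample transcriptions are hypothetical and are not taken from the sources cited above.

```python
import unicodedata

COMBINING_TILDE = "\u0303"  # IPA nasalization diacritic


def unpack_nasal_vowels(ipa: str, nasal: str = "n") -> str:
    """Rewrite each nasal vowel as oral vowel + nasal consonant, as in rule (5)."""
    # NFD separates a base vowel from its combining tilde.
    decomposed = unicodedata.normalize("NFD", ipa)
    # The tilde follows its vowel, so replacing it with a consonant
    # yields the sequence V + C[+nasal].
    unpacked = decomposed.replace(COMBINING_TILDE, nasal)
    return unicodedata.normalize("NFC", unpacked)


# Illustrative inputs (assumed transcriptions, not quoted from the sources).
print(unpack_nasal_vowels("avj\u0254\u0303"))
print(unpack_nasal_vowels("\u00f5"))
```

The sketch treats every tilde as vowel nasalization, so it should only be applied to phonetic transcriptions, not to orthographic forms.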
 


 
 
6.2 Diphthongs 
Many of the Portuguese loanwords in Tetun Dili contain diphthongs. However, Williams-van Klinken et al. (2002b, p. 12) write that “speakers tend to reduce many of these to single vowels”. This is informally expressed by the rule in (7): 
 
(7) 
 V1V2 › V1
 


 
The following examples (from Williams-van Klinken et al., 2002b, p. 12) illustrate monophthongization: 
 
(8) 
 a. 
 padeiru.
 

 
 b. 
 tezoura [tez..a] ~ [tezo.a] ‘scissors’ < Portuguese tesoura 
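The informal rule in (7) can be illustrated with a short sketch. It operates on orthographic forms, assumes a five-vowel alphabet, and reduces every two-vowel sequence, whereas speakers are reported to reduce only many of them; the function name is hypothetical.

```python
import re

VOWELS = "aeiou"  # assumed orthographic vowel set


def monophthongize(word: str) -> str:
    """Apply V1V2 > V1: keep only the first vowel of a two-vowel sequence."""
    return re.sub(rf"([{VOWELS}])[{VOWELS}]", r"\1", word)


for form in ("padeiru", "tezoura"):
    print(form, "->", monophthongize(form))
```

Because the rule is optional and lexically variable, the output should be read as one possible variant alongside the unreduced form.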
 


 
Monophthongization is also attested in East Timor Portuguese (de Albuquerque, 2010b, p. 279), as shown below: 
 
(9) 
 a. 
 East Timor Portuguese 
 

 
 b. 
 East Timor Portuguese vassoura [baso.a] ‘broom’ 
 


 
 
6.3 Labio-dentals 
Both the voiceless and the voiced labio-dentals are subject to variation. In addition to the labio-dental pronunciation, these consonants are also phonetically realized as bilabial stops. Consider first /f/: 
 
(10) 
 /f/ › [f] ~ [p]
 

 
 fila [fila] ~ [pila] ‘to return’ 
 


 
The same variation has been observed in East Timor Portuguese (Thomaz, 2010, p. 39; de Albuquerque, 2011b, p. 73): 
 
(11) 
 East Timor Portuguese 
 


 
Consider next the phonetic realizations of /v/: 
 
(12) 
  
 /v/ › 
 

 
 a. 
 xavi [.ave] ~ [sabe] ‘key’ < Portuguese chave 
 
 
 b. 
 servisu [se.visu] ~ [se.bisu] ‘work’ < Portuguese serviço ‘service’ 
 


 
This again parallels the situation in East Timor Portuguese (de Albuquerque, 2010b, p. 277, 2011b, p. 72), as illustrated by the examples below: 
 
(13) 
 a. 
 East Timor Portuguese 
 

 
 b. 
 East Timor Portuguese livro [livru] ~ [libru] ‘book’ 
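The [f] ~ [p] and [v] ~ [b] alternations described in this section can be enumerated mechanically. The sketch below is illustrative only: it assumes one character per segment, and the names VARIANTS and realizations are hypothetical.

```python
from itertools import product

# Variable realizations of the labio-dentals, as described in 6.3.
VARIANTS = {"f": ("f", "p"), "v": ("v", "b")}


def realizations(segments: str) -> list[str]:
    """Return every phonetic variant licensed by the labio-dental alternations."""
    options = [VARIANTS.get(seg, (seg,)) for seg in segments]
    return ["".join(choice) for choice in product(*options)]


print(realizations("fila"))   # forms given in (10)
print(realizations("livru"))  # forms given in (13b)
```

A form containing both /f/ and /v/ yields four variants under this sketch; whether all of them are attested is an empirical question that the code does not address.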
 


 
 
6.4 Tap /./ 
According to de Albuquerque (2011a, p. 80), /./ is realized as [x] or [.]. Reproduced below are some of de Albuquerque’s (2011a) examples: 
 
(14) 
 a. 
 aeroportu
 

 
 b. 
 boraxa [bo.a.a] ~ [boxasja] ~ [buxasa] ‘rubber’ < Portuguese borracha 
 


 
(15) 
 farda
 


 
De Albuquerque (2011a, p. 80) further writes that these phonetic realizations of /./ are an instance of “hipercorreção baseada na língua portuguesa” [= hypercorrection based on the Portuguese language, translation mine], without specifying which variety of Portuguese. Massini-Cagliari et al. (2016, p. 59) write with respect to Brazilian Portuguese that “the fricatives [h] and [x] are currently the most frequent realizations of strong r”, as seen in the following examples: 
 
(16) 
 a. 
 Brazilian Portug
 

 
 b. 
 Brazilian Portuguese porta [p.xta] ‘door’ 
 


 
Moreover, Massini-Cagliari et al. (2016, p. 60) explicitly mention the fact that [h] and [x] as phonetic realizations of the rhotic are “exclusively found in BP and do not occur in EP [= European Portuguese]”. To conclude, the variation noted by de Albuquerque (2011a) in the phonetic realization of /./ reflects the influence of Brazilian Portuguese. 
 
6.5 Alveo-palatals 
The two alveo-palatals /./ and /./ found in Portuguese loanwords frequently undergo depalatalization. Hajek & Tilman (2008, p. 21), for instance, note that <s> is “sometimes pronounced as […] ‘sh’ at the end of a word or before a consonant”. Hajek & Tilman (2008, p. 21) further write that the [.] “is considered very refined and can be a good indicator that the speaker also speaks Portuguese”. The examples under (17) illustrate depalatalization of /./: 
 
(17) 
  
 /
 

 
 a. 
 festa [fe.ta] ~ [festa] ‘party, celebration’ < Portuguese festa 
 
 
 b. 
 xavi [.ave] ~ [sjave] ~ [sabe] ‘key’ < Portuguese chave 
 


 
The same phonetic realizations also occur in East Timor Portuguese (de Albuquerque, 2010b, p. 277, 2011b, p. 72): 
 
(18) 
 a. 
 East Timor Portuguese 
 

 
 b. 
 East Timor Portuguese chá [.a] ~ [sja] ~ [sa] ‘tea’ 
 


 
The next set of examples illustrates depalatalization of /./: 
 
(19) 
  
 /
 

 
 a. 
 janela [.an.la] ~ [zjan.la] ~ [zan.la] ‘window’ < Portuguese janela 
 
 
 b. 
 justisa [.ustisa] ~ [zjustisa] ~ [zustisa] ‘justice’ < Portuguese justiça 
 


 
East Timor Portuguese also exhibits depalatalization of /./ (de Albuquerque, 2010b, p. 277, 2011b, p. 72, 2011c, p. 235). Consider the examples below: 
 
(20) 
 a. 
 East Timor Portuguese 
 

 
 b. 
 East Timor Portuguese hoje [o.e] ~ [oze] ‘today’ 
 
 
 c. 
 East Timor Portuguese já [zja] ‘already’ 
 


 
The Tetun Dili [sj] and [zj] reflexes of /./ and /./ constitute examples of unpacking. As shown by Operstein (2010, pp. 150-151), unpacking of /./ in particular is cross-linguistically widely attested. Below are two of Operstein’s (2010, p. 151) examples: 
 
(21) 
 a. 
 Italian 
 

 
 b. 
 16th-c. English ash [a.] > Welsh <aiss> 
 


 
 
6.6 Palatals 
Depalatalization may also affect /./ and /./ in Portuguese loanwords. The former has up to four possible phonetic realizations: 
 
(22) 
  
 /
 

 
 a. 
 banhu [ba.u] ~ [banjo] ‘bath’ < Portuguese banho 
 
 
 b. 
 linha [li.a] ~ [lijna] ~ [lina] ‘line’ < Portuguese linha 
 


 
Depalatalized realizations of /./ are also reported to occur in East Timor Portuguese (de Albuquerque, 2010b, p. 277, 2011b, p. 72, and 2011c, p. 235). While in some cases, as in (23a), there is [.] ~ [nj] variation, in others, as in (23b), [nj] appears to be the only phonetic realization: 
 
(23) 
 a. 
 East Timor Portuguese 
 

 
 b. 
 East Timor Portuguese rascunho [raskunju] ‘sketch’ 
 


 
As for /./, its various phonetic realizations are illustrated in the following example:   
 
(24) 
 /
 

 
 pilha [pi.a] ~ [pijla] ~ [pila] ‘battery’ < Portuguese pilha 
 


 
Similar cases are attested in East Timor Portuguese (de Albuquerque, 2010b, p. 276, 2011b, p. 72, 2011c, pp. 234-235). 
 
(25) 
 a. 
 East Timor Portuguese 
 

 
 b. 
 East Timor Portuguese espelho [espelju] ~ [espelu] ‘mirror’ 
 


 
Depalatalization of both /./ and /./ is yet another instance of unpacking, very frequent cross-linguistically (Operstein, 2010, p. 150; McColl Millar, 2015, p. 57). The former is illustrated in (26) and the latter in (27): 
 
(26) 
 a. 
 16
 

 
 b. 
 Spanish canon [ka.on] > English canyon [khanj.n] ‘canyon’ 
 


 
(27) 
 a. 
 Spanish 
 

 
 b. 
 Spanish ollo [o.o] > Eastern Basque oilo [ojlo] ‘oil’ 
 


 
7 Allophonic rules 
7.1 Allophone [.] of /a/ 
In Portuguese loanwords one of the allophones of /a/ is [.]. The phonological context in which it may occur is described by the rule in (28) and exemplified in (29): 
 
(28) 
 /a/ 
 

(29) 
 pilha ['pi...] ‘battery’ < Portuguese pilha 
 


 
This particular allophone may be accounted for in terms of European Portuguese influence. According to Massini-Cagliari et al. (2016, p. 62), “one of the most salient features of the EP vowel system as compared to BP is the occurrence of [.]” in “unstressed syllables”, as illustrated below: 
 
(30) 
 European Portuguese 
 


 
 
7.2 Nasalized allophones of vowels 
Costa (2001a, p. 23) writes that in Tetun Dili “as consoantes nasais [m] e [n] nasalizam ligeiramente as vogais que as precedem” [= the nasal consonants [m] and [n] slightly nasalize the vowels which precede them, translation mine]. This is expressed by the following allophonic rule: 
 
(31) 
 V › [+nasal] / __ C[+nasal].
 


 
The rule in (31) applies to both Portuguese loanwords and to words from the native stock, as shown in (32) and (33) respectively: 
 
(32) 
 kintál
 

(33) 
 laran ['la..an] ‘inside’ 
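The allophonic rule in (31) can be sketched as a string rewrite. This is an illustration under simplifying assumptions: one character per segment, a five-vowel inventory, and nasalization marked with the combining tilde; the function name is hypothetical, and stress marks and orthographic accents are omitted from the inputs.

```python
import re
import unicodedata

COMBINING_TILDE = "\u0303"


def nasalize_before_nasal(segments: str) -> str:
    """Apply V > [+nasal] / __ C[+nasal] to a segment string."""
    # A vowel immediately followed by [m] or [n] receives the tilde.
    marked = re.sub(
        r"[aeiou](?=[mn])",
        lambda m: m.group(0) + COMBINING_TILDE,
        segments,
    )
    return unicodedata.normalize("NFC", marked)


print(nasalize_before_nasal("laran"))
print(nasalize_before_nasal("kintal"))
```

The rule is described as producing only slight nasalization, which a categorical rewrite of this kind cannot represent.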
 


 
According to Williams-van Klinken et al. (2002b, p. 12), nasalization of vowels may also occur “when vowels immediately […] follow nasals”, in which case “they are usually nasalized”, as in the example below: 
 
(34) 
 manu
 


 
Such examples suggest the allophonic rule in (35): 
 
(35) 
 V › [+nasal] / C[+nasal] __
 


 
However, nasalization also occurs when the vowel is not preceded by a nasal consonant, both in Portuguese loanwords (36) and in native Tetun words (37): 
 
(36) 
 banhu
 


 
(37) 
 a. 
 aman
 

 
 b. 
 inan ['i.nan] ‘mother’ 
 


 
Note that in the examples above the nasalized vowel precedes a nasal consonant in the onset of the following syllable. Therefore, the rule accounting for the cases in (34) and (36)-(37) can be formulated as follows: 
 
(38) 
 V [+stress] › [+nasal] / __ . C[+nasal]
 


 
The allophonic rule in (38) appears to have been borrowed from Brazilian Portuguese, in which “there is (almost) obligatory phonetic nasalization of a stressed vowel preceding a nasal onset consonant” (Massini-Cagliari et al., 2016, p. 63): 
 
(39) 
 a. 
 Brazilian Portuguese ano ['.
 

 
 b. 
 Brazilian Portuguese cima ['si.ma] ‘top’ 
 


 
 
7.3 Allophones of /s/ and /z/ 
The phonology of Tetun Dili as spoken by some of its users includes the following allophonic rule: 
 
(40) 
 /s/ 
 


 
As shown below, for such speakers the rule in (40) applies to both Portuguese loanwords and words from the native stock: 
 
(41) 
 pasta
 


 
(42) 
 a. 
 haas
 

 
 b. 
 loos [lo:.] ‘true, correct, right’ 
 
 
 c. 
 tanis [tani.] ‘to cry’ 
 


 
Williams-van Klinken et al. (2002b, p. 10) note that this is “a result of Portuguese influence”, without specifying the particular variety. 
Similarly, /z/ is phonetically realized as [.], in the phonological context specified by the allophonic rule in (43), as illustrated by the example in (44): 
 
(43) 
 /z/ 
 


 
(44) 
 dezmaia
 


 
Williams-van Klinken et al. (2002b, p. 10) state that the retraction of /z/ to [.] is “once again due to Portuguese influence”, with no reference, however, to a particular variety. The allophonic rules in (40) and (43) can only have been borrowed from European Portuguese, in which “the coronal fricatives [s, z] palatalize in coda position to [., .]” (Massini-Cagliari et al., 2016, p. 58 and 59 – table 4.2), as exemplified below: 
 
(45) 
 a. 
 European Portuguese 
 

 
 b. 
 European Portuguese vesgo [ve.gu] ‘squint-eyed’ 
 


 
Finally, some speakers also have the allophonic rule in (46):  
 
(46) 
 /s/ › [z] / __ # V
 


 
As illustrated by the following example, the rule also operates in compounds consisting of native Tetun words: 
 
(47) 
 /li:s/ + /asu/ › ['li:
 


 
The allophonic rule in (46) is borrowed from European and/or Brazilian Portuguese. 
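The voicing rule /s/ › [z] / __ # V can be sketched as an operation over a sequence of words. The sketch assumes a five-vowel inventory and a simple list-of-strings representation; the function name is hypothetical.

```python
VOWELS = set("aeiou")  # assumed vowel inventory


def voice_final_s(words: list[str]) -> list[str]:
    """Voice word-final /s/ to [z] when the next word begins with a vowel."""
    result = []
    for index, word in enumerate(words):
        following = words[index + 1] if index + 1 < len(words) else ""
        if word.endswith("s") and following[:1] in VOWELS:
            word = word[:-1] + "z"
        result.append(word)
    return result


# The compound members given in (47).
print(voice_final_s(["li:s", "asu"]))
```

Since only some speakers have this rule, the unvoiced output remains a possible realization as well.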
 
7.4 Allophones of /l/ 
As shown in the allophonic rule below, /l/ is optionally velarized: 
 
(48) 
 /l/ 
 


 
The domain of operation of the allophonic rule above includes not only Portuguese loanwords, but also native Tetun words, as illustrated in (49) and (50) respectively: 
 
(49) 
 finál
 


 
(50) 
 nanál
 


 
This is yet another instance of a borrowed allophonic rule. As shown by Massini-Cagliari et al. (2016, p. 57), “the velarized consonant [.] is the typical EP pronunciation” in coda position. Consider the following example: 
 
(51) 
 European Portuguese 
 


8 Syllable structure 
8.1 Word-initial consonant clusters 
Troeboes et al. (1987, p. 22) claim that “dalam bahasa Tetum […] terdapat konsonan ganda, yaitu /kb/, /kd/, /kl/, /km/, /kn/ dan /kr/”, which, “walaupun dituliskan dengan dua huruf, dianggap sebagai satu fonem” [= in the Tetun language there are double consonants, i.e. /kb/, /kd/, /kl/, /km/, /kn/ and /kr/, which, although written with two letters, are considered one phoneme, translation mine]. However, no evidence is produced in favour of their alleged mono-phonemic status. Moreover, cross-linguistically no such co-articulated consonants are reported to exist. Therefore, /kb/, /kd/, /kl/, /km/, /kn/ and /kr/ are consonant clusters (see also Taryono et al., 1993, p. 37). 
The clusters /kb/, /kd/, /kl/, /km/, /kn/ and /kr/ are the only ones which may occur in word-initial onsets in Tetun Terik, which explains why in Tetun Dili as well “in native Tetun words, word-initial consonant clusters always begin with /k/” (Williams-van Klinken et al., 2002b, p. 9). As for Tetun Dili, as shown by Williams-van Klinken et al. (2002b, p. 54), in many cases these clusters are simplified via deletion of /k/, as in (52), or epenthesis of [a], as in (53): 
 
(52) 
 Tetun Terik 
 


 
(53) 
 Tetun Terik 
 


 
The massive influx of Portuguese loanwords has led to the occurrence of a large number of new CC- clusters. 
Most of the new word-initial clusters contain a stop as C1. Particularly well represented are stop + liquid clusters. A first group consists of five stop + tap clusters: 
 
(54) 
 a. 
 /p
 

 
 b. 
 /b.-/ brinku ‘ear-ring’ < Portuguese brinco 
 
 
 c. 
 /t.-/ troka ‘to exchange’ < Portuguese troca 
 
 
 d. 
 /d.-/ droga ‘drug’ < Portuguese droga 
 
 
 e. 
 /g.-/ grupu ‘group’ < Portuguese grupo 
 


 
The second group is made up of three stop + lateral clusters: 
 
(55) 
 a. 
 /pl
 

 
 b. 
 /bl-/ bluza ‘blouse’ < Portuguese blusa 
 
 
 c. 
 /gl-/ glória ‘glory’ < Portuguese  glória 
 


 
Two clusters always have /p/ as their C1. One such cluster is /ps-/: 
 
(56) 
 ps
 


 
The second one is /pn-/: 
 
(57) 
 pn
 


10 The cluster /pn-/ may be broken up by epenthesis of [e] or [i]: [peneu] ~ [pineu] (Hajek & Tilman, 2008, p. 224). 
 
Contra de Albuquerque (2011a, p. 91), who claims that “na sílaba CCV, a C1 se restringe à série de oclusivas” [= in the CCV syllable, C1 is restricted to the series of stops], fricative-initial word-initial clusters are also found. In these clusters C1 is always /f/ and C2 is a liquid, i.e. the tap (58) or the lateral (59): 
 
(58) 
 /f
 


 
 
(59) 
 /fl
 


 
 
8.2 Word-medial consonant clusters 
As shown by Williams-van Klinken et al. (2002b, p. 9), “word-internal consonant sequences in underived words are restricted to /kC/ and /mC/”: 
 
(60) 
 a. 
 na
 

 
 b. 
 hamlaha ‘hungry’ 
 


 
Portuguese loanwords have introduced other word-medial clusters. Of these, Williams-van Klinken et al. (2002b, p. 9) only mention “(s)C+liquid sequences”, i.e.  -CCC- clusters, as in the following example: 
 
(61) 
 estrada
 


11 Where the occurrence of [e] is an instance of spelling pronunciation. 
12 Phonetically realized as [.] as in European Portuguese. 
 
However, several -CCCC- clusters occur in word-medial position: 
 
(62) 
 a. 
 abstratu
 

 
 b. 
 demonstrasaun [de.mon.st.a.'sa.un] ‘demonstration’ < Portuguese demonstraçao 
 


 
While a word-medial cluster such as /-bstr-/ directly reflects European and/or Brazilian Portuguese influence, those illustrated in (62b) and (62c) also reflect the phonetic realization of nasal vowels in Tetun Dili. Recall from section 6.1 that in Portuguese loanwords nasal vowels may undergo unpacking into V + C[+nasal] sequences; this therefore accounts for the occurrence of the /-nstr-/ cluster in (62b) and (62c). 
 
8.3 Resyllabification of word-medial consonant clusters 
Williams-van Klinken et al. (2002b, p. 12) note that word-initial unstressed [i] “before /sC/ clusters [is] often absent altogether”. The /s/12 in the coda of the word-initial syllable is resyllabified in the onset of the following syllable; such cases yield phonetic (i.e. non-phonological) CCC- clusters: 
 
(63) 
 estrondu
 


 
Resyllabification follows the model provided by European Portuguese, in which “vowels may […] be deleted when they occur in unstressed syllables” (Massini-Cagliari et al., 2016, p. 62). 
9 Stress placement 
In Tetun Terik, stress falls mostly on the penultimate syllable13, but word-final stress is also attested (Costa, 2001a, p. 24; Williams-van Klinken et al., 2002a, p. 12, 2002b, p. 9; de Albuquerque, 2011a, pp. 92-93). Portuguese loanwords have introduced a third possible type, i.e. antepenultimate stress. Consider the following examples: 
13 The Austronesian and Papuan languages spoken in Timor-Leste exhibit predominantly penultimate word stress (see Zanten & Goedemans, 2007). 
 
(64) 
 a. 
 múzika
 

 
 b. 
 polísia ‘police’ < Portuguese polícia 
 
 
 c. 
 úmidu ‘humid’ < Portuguese húmido 
 


 
According to de Albuquerque (2011a, p. 92), however, “falantes não-escolarizados” [= unschooled speakers] produce forms with penultimate stress, such as: 
 
(65) 
 a. 
 animál
 

 
 b. 
 ipóteze [i.po.'te.ze] ‘hypothesis’ < Portuguese hipótese 
 


 
The account suggested here is that antepenultimate stress reflects the influence of European and/or Brazilian Portuguese, whereas the rightward stress shift, i.e. the occurrence of penultimate stress, should be traced to East Timor Portuguese. As noted by de Albuquerque (2011b, p. 74; 2011c, p. 235; see also de Albuquerque 2014a), “até palavras que possuem a acentuação gráfica não-penúltima são realizadas como paroxítonas” [= even words which have a non-penultimate graphic stress are realized as paroxytones, translation mine], as seen in the example below: 
 
(66) 
 cômico
 


 
Summing up, cases such as those illustrated in (65) and (66) reflect the tendency of many speakers towards rightward stress shift, which brings Portuguese loanwords with either final stress or antepenultimate stress in line with words from the native stock. 
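The rightward stress shift can be sketched as a re-assignment of the stress mark to the penultimate syllable. The sketch assumes a syllabified transcription in which syllables are separated by "." and the stressed syllable is preceded by "'"; the function name is hypothetical.

```python
def shift_to_penultimate(syllabified: str) -> str:
    """Move primary stress to the penultimate syllable of a syllabified form."""
    # Remove any existing stress mark from each syllable.
    syllables = [syllable.lstrip("'") for syllable in syllabified.split(".")]
    # Monosyllables keep stress on their only syllable.
    target = -2 if len(syllables) >= 2 else -1
    syllables[target] = "'" + syllables[target]
    return ".".join(syllables)


# Antepenultimate stress, as indicated by the spelling ipóteze, shifted rightwards.
print(shift_to_penultimate("i.'po.te.ze"))
```

The same function applied to a form with final stress moves the stress leftwards to the penult, which matches the description of both stress types being brought in line with the native pattern.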
10 Discussion and conclusions 
The findings in sections 3 through 9 are summarized in Table 5: 
 
Table 5: Portuguese influence on the phonology of Tetun Dili 
Characteristic | Source
imported consonantal phonemes /p, v, ., ., ., ./ | European and/or Brazilian Portuguese
phonological contrasts /f/-/v/, /f/-/b/, /f/-/p/, /s/-/z/, /s/-/./, /z/-/./, /n/-/./, /l/-/./ | European and/or Brazilian Portuguese
denasalization of vowels | East Timor Portuguese
monophthongization | East Timor Portuguese
replacement of labio-dentals with bilabial stops | East Timor Portuguese
allophones [.] and [x] of /./ | Brazilian Portuguese
depalatalization of /./ and /./ | East Timor Portuguese
depalatalization of /./ and /./ | East Timor Portuguese
allophone [.] of /a/ | European Portuguese
nasalization of vowels preceding a nasal onset consonant | Brazilian Portuguese
allophone [.] of /s/ | European Portuguese
allophone [.] of /z/ | European Portuguese
allophone [z] of /s/ | European and/or Brazilian Portuguese
allophone [.] of /l/ | European Portuguese
new word-initial consonant clusters | All three varieties of Portuguese
new word-medial consonant clusters | European and/or Brazilian Portuguese
antepenultimate stress | European and/or Brazilian Portuguese
rightward stress shift | East Timor Portuguese
 


 
The Tetun and Portuguese components of Tetun Dili are less strictly separated on phonological grounds than hitherto assumed. The evidence of separation includes: imported consonantal phonemes, see section 4; new word-initial and word-medial consonant clusters, see section 8; antepenultimate stress, see section 9. Indeed, these are all attested only in Portuguese loanwords. Also, Portuguese loanwords cannot serve as bases for prosodically motivated partial reduplication and truncated compounds (Avram, 2007, 2008). However, unlike other languages, e.g. Japanese, in which loanwords are strictly separated from other lexical strata (Avram, 1993, 2005b), there is also evidence of partial integration of the two components of Tetun Dili, e.g. the extension of allophonic rules to words from the native stock, illustrated in 7.3 and 7.4. 
The variety of Tetun Dili which is most influenced by European and/or Brazilian Portuguese is illustrative of category (4) “strong cultural pressure: moderate structural borrowing” in Thomason & Kaufman’s (1988, pp. 74-75) scale of borrowing. The structural effects typical of this category include: new phonemes; new phones; new allophonic rules; new syllable structure features; new stress rules. These are precisely the structural effects amply illustrated in sections 4, 6, 7, 8 and 9 respectively. 
As repeatedly shown in sections 3 through 9, there is considerable inter-speaker variation in Tetun Dili phonology. In previous analyses of the Portuguese impact on the phonology of Tetun Dili, this variation has been correlated with the extent of Tetun Dili–Portuguese bilingualism (in particular, Greksáková, 2018, pp. 324-350; see also de Albuquerque, 2010b, 2011b, 2011c, 2014a). However, as shown in the present paper, it also reflects the coexistence of conflicting exo-normative and endo-normative orientations, the former towards European and/or Brazilian Portuguese, the latter towards East Timor Portuguese. This accounts for the occurrence of contradictory tendencies. The cases discussed in 6.1 and 7.2 strikingly illustrate the clash between conflicting norms. As shown in 6.1, under the influence of the East Timor variety of Portuguese, the nasal vowels in Portuguese loanwords undergo denasalization. On the other hand, some speakers appear to have borrowed the allophonic rule of Brazilian Portuguese, whereby stressed vowels preceding a nasal onset consonant are nasalized, as illustrated in 7.2. Similarly, while many speakers do not have /./ and /./ in their inventory of consonantal phonemes, as seen in 4, others have borrowed the allophonic rules of European Portuguese whereby /s/ and /z/ are realized in coda position as [.] and [.] respectively, and, as shown in 7.3, extend them even to words from the native stock. The situation is further compounded by the occasional occurrence of instances of spelling pronunciation, see 6.1 and 8.3, in which the phonetic realizations of some loanwords reflect their orthography in Portuguese. Summing up, the general picture that emerges is a complex one, and the intricacies of inter-speaker variation cannot therefore be reduced merely to variation between a more Portuguese-like phonology and a more Tetun-Dili-like one. 
References 
de Albuquerque, D. B. (2010a). Elementos para o estudo da ecolinguística de Timor Leste. Domínios da lingu@gem, 4(1), 21-36. 
de Albuquerque, D. B. (2010b). Peculiaridades prosódicas do português falado em Timor Leste. ReVEL, 8(15), 270-285. 
de Albuquerque, D. B. (2011a). Esboço gramatical do Tetum Prasa. MA thesis, Universidade de Brasília. 
de Albuquerque, D. B. (2011b). O português de Timor-Leste: contribuições para o estudo de uma variedade emergente. Papia, 21(1), 65-82. 
de Albuquerque, D. B. (2011c). O elemento luso-timorense no português de Timor-Leste. ReVEL, 9(17), 226-243. 
de Albuquerque, D. B. (2012). Esboço morfossintático do português falado em Timor-Leste. Moderna Språk, 1, 1-10. 
de Albuquerque, D. B. (2014a). Restrições métricas da língua Tetun no português falado em Timor-Leste: o acento e a variação. In J. S. Magalhães (ed.), Linguística in Focus 10: Fonologia, 73-90. Uberlândia: Editora UFU. 
de Albuquerque, D. B. (2014b). A língua portuguesa em Timor-Leste: uma abordagem ecolinguística. PhD dissertation, Universidade de Brasília. 
de Albuquerque, D. B. (2015). Contatos linguísticos em Timor-Leste: mudanças e reestruturação. Percursos Linguísticos, 5(11), 68-89. 
de Albuquerque, D. B. (2018). Ensaios de ecolinguística aplicada. Brasília: Anderson Nowogrodzki da Silva. 
de Araújo e Corte-Real, B. (1990). A Contrastive Analysis between Tetun and English Consonants – A Preliminary Study of Some Phonological Features of Both Languages. BA thesis, Universitas Kristen Satya Wacana, Salatiga. 
Avram, A. A. (1993). Împrumuturile recente şi fonologia limbii japoneze. Studii şi cercetări lingvistice, XLIV(3), 191-200.  
Avram, A. A. (2003). Influenţa portugheză asupra limbii tetum. Paper presented at the Symposium of the “Iorgu Iordan – Al. Rosetti” Institute of Linguistics, 4-5 November 2003, Bucharest.  
Avram, A. A. (2005a). Contacte lingvistice şi limbi mixte bilingve. In S.-M. Ardeleanu, G. Moldoveanu, G. Jernovei (eds.), Limbaje şi comunicare. Colocviul Internaţional de Ştiinţe ale Limbajului, ediţia a VII-a, Cernăuţi 2003, 193-204. Suceava: Editura Universităţii din Suceava. 
Avram, A. A. (2005b). Fonologia limbii japoneze contemporane. Bucharest: Editura Universităţii din Bucureşti. 
Avram, A. A. (2007). Reduplication in Tetun Dili. In A. Cuniţă (ed.), Concepts trans- et interculturels/Concepte trans- şi interculturale, Lingvistica, 165-187. Bucharest: Editura Universităţii din Bucureşti. 
Avram, A. A. (2008). An overview of reduplication and compounding in Tetun Dili. Revue roumaine de linguistique, LIII(4), 427-448. 
Avram, A. A. (2018). Some aspects of the Portuguese influence on the syntax of Tetun Dili. In C. Lupu, A. Ciolan & A. Zuliani (eds.), Omagiu Profesorilor Florica Dimitrescu şi Alexandru Niculescu la 90 de ani, 41-55. Bucharest: Editura Universităţii din Bucureşti. 
Chen, Y.-L. (2015). Tetun Dili and creoles: Another look. University of Hawai’i at Mānoa Working Papers in Linguistics, Department of Linguistics, 46(7), 1-33. 
Costa, L. (2001a). Dicionário de Tétum-Português. Lisbon: Edições Colibri. 
Costa, L. (2001b). Guia de conversação Português-Tétum. Lisbon: Edições Colibri. 
Crowley, T. (1997). An Introduction to Historical Linguistics, 3rd edition. Oxford: Oxford University Press. 
das Dores, R. (1907). Diccionario teto-português. Lisbon: Imprensa Nacional. 
Esperança, J. P. T. (2001). Estudos de linguística timorense. Aveiro: SUL – Associação de Cooperação para o Desenvolvimento. 
Greksáková, Z. (2018). Tetun in Timor-Leste: The Role of Language Contact in its Development. PhD dissertation, Universidade de Coimbra. 
Hajek, J. (2007). Language contact and convergence in East Timor: The case of Tetun Dili. In A. Y. Aikhenvald & R. M. W. Dixon (eds.), Grammars in Contact: A Cross-linguistic Typology, 163-178. Oxford: Oxford University Press. 
Hajek, J., & Tilman, A. V. (2008). East Timor phrasebook, 2nd edition. Footscray: Lonely Planet Publications. 
Hull, G. (2000). Historical phonology of Tetum. Studies in Languages and Cultures of East Timor, 3, 158-212. 
Hull, G. (2002). Standard Tetum English Dictionary, 3rd edition. Winston Hills: Sebastião Aparício da Silva Project & Instituto Nacional de Linguística (INL), Timor-Leste. 
Hull, G. (2006). Concise English-Tetum Dictionary. Disionáriu Inglés-Tetun. Winston Hills: Sebastião Aparício da Silva Publications. 
Loch, A. & Tschanz, M. (2005). Kleines Wörterbuch Tetum – Deutsch Deutsch – Tetum. Hamburg: Helmut Buske. 
Manhitu, Y. (2007). Kamus Indonesia-Tetun Tetun-Indonesia.  Jakarta:  PT Gramedia Pustaka Utama. 
Massini-Cagliari, G., Cagliari, L.-C. & Redenbarger, W. J. (2016). A comparative study of the sounds of European and Brazilian Portuguese: Phonemes and allophones. In W. L. Wetzels, J. Costa & S. Menuzzi (eds.), The Handbook of Portuguese Linguistics, 56-68. Malden, MA: Wiley Blackwell. 
Mateus, M. H. & D’Andrade, E. (2000). The Phonology of Portuguese. Oxford: Oxford University Press. 
McColl Millar, R. (ed.). (2015). Trask’s Historical Linguistics, 3rd edition. London and New York: Routledge. 
Operstein, N. (2010). Consonant Structure and Prevocalization. Amsterdam / Philadelphia: John Benjamins. 
Paradis, C. & Prunet, J.-F. (2000). Nasal vowels as two segments: Evidence from borrowings. Language, 76, 324-357. 
Ross, M. A. (2017). Attitudes toward Tetun Dili, A Language of East Timor. PhD dissertation, University of Hawai’i at Mānoa. 
Seara, I. C., Nunes, V. G. & Lazzarotto-Volcão, C. (2011). Fonética e fonologia do português brasileiro. Florianópolis: Universidade Federal de Santa Catarina. 
Smith, N. (1995). An annotated list of creoles, pidgins, and mixed languages. In J. Arends, P. Muysken & N. Smith (eds.), Pidgins and Creoles. An Introduction , 331-374. Amsterdam / Philadelphia: John Benjamins. 
Taryono, A. R., Ibrahim, A. S., Rusmadji, O. & Moehnilabib, M. (1993). Morfo-sintaksis Bahasa Tetum. Jakarta: Pusat Pembinaan dan Pengembangan Bahasa, Departemen Pendidikan dan Kebudayaan. 
Taylor-Leech, K. (2009). The language situation in Timor-Leste. Current Issues in Language Planning, 10(1), 1-68. 
Thomason, S. G. & Kaufman, T. (1988). Language Contact, Creolization, and Genetic Linguistics. Berkeley: University of California Press. 
Thomaz, L. F. F. R. (2002). Babel Loro Sa’e. O problema linguístico de Timor-Leste. Lisbon: Instituto Camoes. 
Thomaz, L. F. F. R. (2010). Das Portugiesische auf Timor. Quo vadis, Romania?, 36, 16-46. 
Troeboes, M., Khristian, T., Mboeik, S. J., Maryanto, S. & Wibowo, S.  (1987). Struktur Bahasa Tetum. Jakarta: Departemen Pendidikan dan Kebudayaan. 
van Klinken, C. L. (1999). A Grammar of the Fehan Dialect of Tetun. An Austronesian Language of West Timor.  
Williams-van Klinken, C. (2011). Tetun Language Course, 3rd edition, with revised spelling. Dili: Peace Corps East Timor, Canberra: Pacific Linguistics. 
Williams-van Klinken, C. (2015). Word-Finder English-Tetum Tetun-Ingles, 2nd edition. Dili: Sentru Lingua, Dili Institute of Technology. 
Williams-van Klinken, C., Hajek, J. (2016). Tetu-gés: Influésia portugés ba estrutura Tetun. In S. Smith, N. Canas Mendes, A. B. da Silva, A. da Costa Ximenes, C. Fernandes & M. Leach (eds.), Timor-Leste: iha kontextu lokal, rejional no global / O local, regional e global / The local, the regional and the global / Lokal, regional dan global 2015, 32-36. Hawthorn: Swinburne Press. 
Williams-van Klinken, C. & Hajek, J. (2018). Language contact and functional expansion in Tetun Dili: The evolution of a new press register. Multilingua, 37(6), 613-647. 
Williams-van Klinken, C., Hajek, J. & Nordlinger, R. (2002a). Tetun Dili: A Grammar of an East Timorese Language. Canberra: Pacific Linguistics. 
Williams-van Klinken, C., Hajek, J. & Nordlinger, R. (2002b). A Short Grammar of Tetun Dili. Munich: Lincom Europa. 
Zanten, E. & Goedemans, R. (2007). A functional typology of Austronesian and Papuan  stress systems. In V. J. Heuven & E. Zanten (eds.), Prosody in Indonesian Languages, 63-87. Utrecht: LOT. 
 
 
MARKED GEMINATES AS EVIDENCE OF SONORANTS IN SYLHETI BANGLA:  
AN OPTIMALITY ACCOUNT 
Arpita GOSWAMI 
Kalinga Institute of Industrial Technology, India 
arpig99@gmail.com 
Abstract 
This paper examines the cross-linguistic generalization that sonorants are marked geminates, as reflected in the gemination process of Sylheti Bangla (henceforth SHB). Evidence from SHB suggests that when SHB speakers encounter borrowed words with sonorant-initial or obstruent-initial heterosyllabic clusters, it is invariably the sonorant that gets assimilated. In addition, SHB data indicate that when faced with a choice between the two sonorants of a heterosyllabic cluster, speakers opt for the less sonorous one for gemination. Given this phenomenon, the proposal that sonorant gemination is absent in SHB cannot be maintained, a conclusion that receives additional support from the fact that SHB also possesses many underlying sonorant geminates. Based on this investigation, the constraint hierarchy *GG, *RR >> *LL, *NN is proposed for analyzing the gemination process in SHB. Finally, this paper illustrates some additional constraints found to be necessary in the SHB gemination process. 
Keywords: gemination; sonorant; obstruent; constraints; optimality theory 
Povzetek 
Članek analizira univerzalni koncept, da so zvočniki zaznamovani soglasniki v procesu podvojevanja v silheti bengalščini (odslej SHB). Podatki iz SHB kažejo, da so v izposojenkah z raznozložnim soglasniškim zaporedjem, vedno zvočniki tisti, ki so podvrženi prilikovanju (asimilaciji). Soglasniško podvojevanje zaradi prilikovanja se vedno zgodi v prid manj zvočnega soglasnika. Posledično torej predlog, da v SHB ni podvojenih zvočnikov, ni ustrezen, saj je podvojena zvočnika pojavljata v globinski podstavi. Na podlagi raziskave predlagamo naslednjo hierarhija omejitev *GG*RR>>*LL*NN za analizo procesa podvojevanja v SHB. Članek v zaključku ponazarja nekatere dodatne omejitve v postopkih geminacije SHB, za katere je bilo ugotovljeno, da so potrebne. 
Ključne besede: podvojevanje; zvočnik; nezvočnik; omejitve; optimalnostna teorija 
1 Introduction 
One of the most significant findings in the study of loanword adaptation is speakers’ distinct propensity to modify borrowed words by means of a range of phonological processes, such as epenthesis, deletion, and gemination, in order to obtain unmarked structures. This paper explores one such process, gemination, as applied by Sylheti speakers. Gemination has been defined by several linguists. Catford (1977, p. 277), for example, views the articulation of gemination as involving “a higher articulatory effort accompanying the act of moving and holding the articulators to maintain a longer occlusion time for the geminate contoid”, whereas Davis (2011) states that geminates or ‘double consonants’ contrast with their ‘singleton’ counterparts. Following Ladefoged & Maddieson (1996), Pajak writes that “cross-linguistically, geminates are on average between one-and-a-half to three times as long as singletons” (2009, p. 269). 
Many languages across the world have geminate consonants, among them Arabic, Berber, Estonian, Finnish, Cypriot Greek, Hindi, Hungarian, Italian, Japanese, Malayalam, Persian, Saami, Swiss German, and Turkish (Kubozono, 2017). Cross-linguistic evidence shows that geminates are very frequent in intervocalic position, whereas they are rare when not adjacent to a vowel (Kubozono, 2017). By way of explanation, Pajak (2009) claims that the contrast between singletons and geminates is readily perceptible in intervocalic position, whereas it is less perceptible when the geminate is adjacent to a consonant. 
A close investigation of SHB data indicates that a number of geminated words emerge in SHB through the modification of borrowed words that contain obs+son, son+obs, or son+son clusters. Another type of SHB gemination turns borrowed words with a CV.CV or CV.CVC structure into the geminate structure CVC.CV or CVC.CVC. In such instances, the onset of the final syllable is geminated and also serves as the coda of the first syllable. In all these gemination processes, SHB follows the typological trend of admitting geminates only in intervocalic position. Edge geminates are prohibited in SHB since the constraint *COMPLEX holds a prominent position in this variety of Bangla. Additionally, the facts of SHB gemination corroborate the cross-linguistically established view that sonorants are less preferred than obstruents as geminate consonants. 
The dispreference for sonorant geminates follows from the core principle of Adaptive Dispersion Theory (Lindblom, 1986; Flemming, 1996, 2004; Ito & Mester, 2006), which is “an attempt to model typology of phonological inventories as a set of elements evenly spaced (or ‘dispersed’) in an acoustic-perceptual way” (Ito & Mester, 2006, p. 666). According to Flemming, the selection of phonological contrasts is based on three main principles: (1) maximize the number of contrasts, (2) maximize the distinctiveness of contrasts, and (3) minimize articulatory effort. He adds that “the existence of such constraints implies that the well-formedness of a word cannot be evaluated in isolation, it must be evaluated regarding a set of forms that it contrasts with” (Flemming, 1996, p. 1). Further, Flemming (2004, p. 15) writes that “the auditory distinctiveness of the contrasts should be maximized so that the differences between words can easily be perceived by a listener, minimizing confusion”. If the realization of a contrast between phonemes is insufficiently distinct, the contrast can be neutralized or modified to make it more distinct. For instance, vowels that are well dispersed in the acoustic space are preferred as phonemes, whereas vowels whose dispersion in the acoustic space is only partial are less likely to be treated as phonemes. Taking this theory into account, previous works such as Kawahara (2007) and Kubozono (2017) explain that languages avoid sonorant geminates because the segmental boundaries of sonorants are not distinct, which makes their segmental duration difficult to perceive. Since the basis of a phonological geminacy contrast is the difference in constriction duration between singletons and geminates, and the constriction of sonorant segments is hard to perceive, singleton and geminate sonorants do not form a very perceptible minimal pair. In sum, because the contrast between singleton and geminate sonorants is difficult to discriminate perceptually, languages tend to avoid sonorant geminates. 
Turning now to SHB, the most geminable segments are obstruents; nasals and laterals are less geminable; and glides and rhotics are not geminable at all. Based on this hierarchy, the ranking of constraints proposed for SHB gemination is *GG, *RR >> *LL, *NN >> *GEMOBS. In this paper, I illustrate all the types of gemination that occur in SHB and the relevant constraints within Optimality Theory (Prince & Smolensky, 1993/2004; Kager, 1999). Data for this research were collected from the spontaneous speech of Sylheti speakers from in and around the Dharmanagar district of North Tripura and were then transcribed. The collected data were cross-checked against the researcher’s native-speaker knowledge and intuition. 
2 Sonorants are marked geminates 
The segmental composition of geminates has long been a topic of interest in linguistic research from both the phonetic and the phonological point of view. One of the most significant findings is that, in gemination, languages prefer obstruents over sonorants. After surveying geminate consonants in many languages, Taylor (1995, p. 122) reported that “[s]ince all 28 languages…. have at least one obstruent geminate…, if a language has at least one geminate sonorant, it will also have one geminate obstruent”. 
Having conducted a cross-linguistic survey of the nature of geminate consonants in the languages of the world, Podseva (2002) hypothesized that languages disprefer sonorant geminates since ‘the sonorant geminates are easily confused with corresponding singletons’, a problem that arises because ‘sonorants are spectrally continuous with flanking vowels, and consequently their constriction duration is difficult to perceive’ (Kubozono, 2017). Table 1 lists the languages surveyed by Podseva (2002) to investigate the status of geminate sonorants in the languages of the world. 
 
Table 1: Status of geminate sonorant in the languages of the world 

| Languages | Nasals | Liquids (laterals) | Liquids (rhotics) | Glides |
|---|---|---|---|---|
| 1. Finnish, Hindi, Icelandic, Karo Batak, Maithili, Persian, Ponapean, Somali, Tiyre, Toba Batak | . | . | . | * |
| 2. Punjabi, Selkup, Yakut, Fula | . | . | * | * |
| 3. Chaha, Japanese, LuGanda, Maranungku | . | *. | * | * |
| 4. !Xóõ | *. | * | * | * |
| 5. Biblical Hebrew, Wolof | . | . | * | . |
 
 
While Podseva’s work was based on a hypothesis, Kawahara (2007) conducted an experimental study on the nature of geminate consonants. Kawahara demonstrated how languages across the world apply phonological processes such as degemination, occlusivization, and coda nasalization to avoid sonorant geminates, and concluded that sonorants are less preferred segments for gemination. Explaining the reason behind this dispreference, he stated that the contrast of phonological geminacy is based “on a constriction duration difference between singletons and geminates”, and due to the “blurry transitions into and out of flanking vowels, sonorants have a disadvantage in signaling their duration” (Kawahara, 2007, p. 2). It is therefore difficult to perceive sonorant geminates accurately. Kawahara also pointed out that the blurriness of the segmental boundary is not the only reason why sonorant geminates are difficult to perceive. A further factor is amplitude changes, which ‘are steep for the stops but shallow for the sonorants’ and make the segmental boundary of sonorants harder to perceive. Yet another factor that inhibits the perception of sonorant boundaries is that the cues of sonorant segments are ‘stretched out’ (Kawahara, 2007). 
I will now briefly review the examples that Kawahara (2007) cites from different languages in which phonological processes are applied to resolve sonorant gemination, which shows that sonorants are marked candidates for gemination. Luganda allows obstruent gemination, as in /µ+kub/ › [kkubo] ‘path’, but when the syllable-initial consonant is a liquid or a glide, occlusivization is applied to avoid a sonorant geminate, as in /µ-wangal/ › [gg.aanga] ‘nation’. Following Whitney (1889), he mentions that Sanskrit completely disallows gemination of the retroflex untrilled liquid [r]. As a result, geminate approximants undergo degemination in Sanskrit, for example [punar+ramate] › [puna.ramate], *[punarramate]. However, fricative and stop gemination, as well as other types of sonorant gemination such as that of laterals and nasals, are allowed in Sanskrit, for example, asse. arkka. etc. Like Sanskrit, Greek applies degemination to avoid sonorant geminates; unlike Sanskrit, it also degeminates nasal geminates. In Japanese, when the mimetic suffix /-ri/ is attached together with a floating mora, gemination results, for example /bata-µ-ri/ › [batta-ri] ‘accidentally’ and /poka-µ-ri/ › [pokka-ri] ‘openly’. However, when the root-final syllable contains a liquid or a glide, gemination is avoided and a coda nasal appears instead, as in /kira+N+ri/ › [kiNra-ri] ‘shiningly’. In Selayarese, a geminate is formed when a root beginning with a voiceless obstruent is preceded by the prefix /taʔ-/, as in /taʔ+tuda/ › [tattuda] ‘bump against’ and /taʔ+kalup/ › [takkalup] ‘faint’, but when the root begins with a nasal or a liquid, gemination is blocked, for instance /taʔ+muri/ › [taʔmuri] ‘smile’. 
The processes applied to avoid sonorant geminates, as cited in Kawahara (2007), are listed below. 
a. occlusivization (Berber, Luganda) 
b. coda nasalization (Japanese) 
c. degemination (Greek, Sanskrit) 
d. floating mora flopping (Japanese) 
e. blocking of gemination (Ilokano, Selayarese) 

Misperception is the main reason for the phonological processes triggered by constraints against geminate sonorants. 
 
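Table 2: List of phonological processes used in the languages of the world 

| Processes | Language | Geminate types avoided: Obst. | Nasal | Lateral | Glide |
|---|---|---|---|---|---|
| Occlusivization | Berber | . | . | . | * |
| Occlusivization | LuGanda | . | . | * | * |
| Nasalisation | Chaha, Endenzen & Ezha | . | . | * | -- |
| Coda nasalization | Japanese | . | . | -- | * |
| Floating mora flopping | Japanese | . | *. | -- | * |
| Degemination | Sanskrit | . | . | . | * |
| Degemination | Greek | . | * | * | * |
| Blocking of gemination | Ilokano | . | ./* | ./* | * |
| Blocking of gemination | Selayarese | . | * | * | -- |
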
. indicates that the segment type occurs as a geminate in the language 
* indicates that the segment type is avoided as a geminate in the language 
-- indicates that the segment type is absent from the language, so it cannot be stated whether it undergoes gemination or not 
./* indicates marginal use of the segment type as a geminate 
 


3 An Overview of Sylheti Bangla 
Sylheti Bangla is primarily spoken in the Sylhet District in the north-eastern region of Bangladesh. It is also spoken in three states of India: Tripura (the North Tripura district), Assam (the Barak Valley), and Meghalaya. Outside Bangladesh and India, SHB is also widely spoken in the United Kingdom. For the current paper, the Sylheti spoken by the people of North Tripura was surveyed and examined. Tripura is a state of Northeast India bordered by Bangladesh to the north, south and west, and by the Indian states of Assam and Mizoram to the east. Around the time of the independence of Bangladesh (1971), a large number of Sylheti-speaking inhabitants of the Sylhet District entered India because of the political turmoil in Bangladesh, and many of them settled as refugees in the North District of Tripura. These people were gradually rehabilitated in Tripura as citizens of India. For that reason, SHB in the North Tripura District is spoken by people whose ancestors came from the Sylhet District of Bangladesh, and in this way this particular variety of Bangla became the sole language of communication in the northern part of Tripura, especially in and around Dharmanagar. 
SHB belongs to the south-east group of Bangla dialects. Many scholars also see an affiliation of Sylheti with the Kamrupi group, since some characteristics of this dialect are found only in the Kamrupi group, while other characteristics are exclusive to East Bangla. SHB was formerly written in its own script, Sylheti Nagari, which is similar in style to Kaithi (a script that belongs to the main group of North Indian scripts used in Bihar); nowadays, however, it is almost invariably written in Bangla script. Approximately 70% of the Sylheti vocabulary is considered to derive from Arabic, Persian, Hindi, Assamese, and some of the other Bangla dialects. 
A close observation of SHB data reveals that a significant number of geminated words emerged in SHB through the modification of borrowed words. When SHB speakers encounter a word-medial consonant cluster that consists of an obstruent followed by a sonorant, they tend to geminate the obstruent. Examples of this type of SHB gemination are given in Table 3. 
 
Table 3: Gemination from obst+son medial clusters 

| Borrowed words | Gemination in SHB | Gloss |
|---|---|---|
| at..ma | at..t.a | ‘soul’ |
| kon.ya | xoin.na | ‘would be bride’ |
| bon.ya | boin.na | ‘flood’ |
| cok.ro | sak.ka | ‘wheel’ |
| pod..mo | ..d..d.. | ‘lotus’ |
| c.ot..ro | sat..t.i | ‘umbrella’ |
| .uk.ro | huk.kur | ‘Friday’ |
| pot..ro | .at..t.a | ‘leaf’ |

The preference for obstruents over sonorants in Table 3 can be attributed to the Syllable Contact Law (Vennemann, 1988; Davis, 1998; Gouskova, 2002), which holds that a syllable contact is bad when sonority rises across the syllable boundary. Hence, to respect this law, the onset of the second syllable assimilates to the preceding coda. Besides these examples, other geminated words emerged in SHB from borrowed words that contain heterosyllabic clusters of a sonorant followed by an obstruent. Consider the examples in Table 4. 
 
Table 4: Gemination from son+obs medial clusters 

| Borrowed words | Gemination in SHB | Gloss |
|---|---|---|
| kir.t.on | kit..t.on | ‘devotional song’ |
| kar.t.ik | kat.t.ik | ‘name of a Hindu god’ |
| bor.d.i | b.d..d.i | ‘elder sister’ |
| .ort.a | h.t..t.a | ‘betel nut cutter’ |
| kur..i | ku...i | ‘chair’ |
| bor..a | bo...a | ‘name of a Bengali month’ |
| d.ur.ba | d.ub.ba | ‘grass’ |

 
 
In the data set in Table 4, the consonant clusters show either falling or equal sonority and thus obey the Syllable Contact Law. Nevertheless, as in Table 3, it is the sonorants that are assimilated, while the obstruents are not susceptible to any change. There must therefore be other reasons behind the dispreference for sonorant gemination in SHB, which are discussed later. 
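The Syllable Contact Law invoked here can be made concrete with a small sketch. The numeric sonority values, the class labels, and the function name below are illustrative assumptions rather than part of the analysis; only the relative order of the values matters, and it follows the standard scale on which obstruents are least sonorous.

```python
# Illustrative sonority values (assumption): only the relative order matters.
SONORITY = {"obstruent": 1, "nasal": 2, "liquid": 3, "glide": 4, "vowel": 5}


def violates_syllable_contact(coda: str, onset: str) -> bool:
    """Return True if sonority rises from a coda to the following onset."""
    return SONORITY[onset] > SONORITY[coda]


# Cluster types of the kind shown in Tables 3 and 4, as (coda, onset) classes.
clusters = {
    "obstruent + nasal (Table 3 type)": ("obstruent", "nasal"),
    "obstruent + liquid (Table 3 type)": ("obstruent", "liquid"),
    "liquid + obstruent (Table 4 type)": ("liquid", "obstruent"),
}

for label, (coda, onset) in clusters.items():
    bad = violates_syllable_contact(coda, onset)
    print(f"{label}: {'bad contact' if bad else 'acceptable contact'}")
```

On this sketch, the obstruent + sonorant clusters count as bad contacts, while the sonorant + obstruent clusters do not, which is why the Syllable Contact Law alone cannot account for the assimilation seen in Table 4.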
From Tables 3 and 4, it is evident that when a borrowed word contains a heterosyllabic cluster consisting of a sonorant + obstruent or an obstruent + sonorant combination, it is invariably the sonorant that undergoes assimilation. SHB shows no examples of glide or rhotic gemination. However, we cannot conclude that SHB is completely devoid of sonorant geminates. For instance, when a borrowed word contains a sonorant + sonorant sequence, the less sonorous segment prevails over the more sonorous one. The examples in Table 5 illustrate this point. 
 
Table 5: Gemination from son+son medial clusters 

| Borrowed words | Gemination in SHB | Gloss |
|---|---|---|
| pur..o | .un.n. | ‘complete’ |
| g.ur.ni | gun.ni | ‘whirl’ |
| kon.ya | xoin.na | ‘daughter’ |
| pur..i.ma | .un.ni | ‘full moon’ |

 
 
Apart from this, SHB also contains underlying nasal and lateral geminations as cited in the following examples in Table 6. 
 
Table 6: Lateral and nasal gemination in SHB 

| Lateral and nasal geminates in SHB | Gloss |
|---|---|
| gul.li | ‘bullet’ |
| .il.la | ‘hill’ |
| gul.la | ‘round’ |
| gin.na | ‘hate’ |
| hun.n. | ‘zero’ |
| ul.la. | ‘enjoyment’ |
| al.la | ‘God’ |

 
 
The question now arises why SHB speakers disallow rhotic and glide gemination while allowing nasal and lateral gemination. The Complexity Condition theory helps to elucidate this point. It states that the higher a segment's sonority value, the greater its complexity (Rice, 1992), and complex segments are more prone to violation. On the sonority hierarchy, obstruents are the least sonorous segments and vowels the most sonorous. The widely accepted sonority scale is given in Figure 1. 
 
vowels > glides > liquids > nasals > obstruents
 


Figure 1: Modal Sonority Hierarchy (e.g. Clements, 1990; Kenstowicz, 2004) 
 
In the light of the Complexity Condition theory, rhotics and glides can be regarded as more complex than laterals and nasals because of their greater sonority. A similar phenomenon is found in Pali (Dutta, 2017). In Pali, words borrowed from Sanskrit underwent gemination in which the more sonorous sound is assimilated and the less sonorous sound is retained in the syllable. As in SHB, when a borrowed word in Pali contains a rhotic or glide next to a nasal or lateral, it is always the rhotic or glide that is assimilated, and the lateral or nasal takes priority over it. Some instances of Pali gemination are cited here. 
 
k… > k… ‘work’ (Dutta, 2017) 
mulj. > mull. ‘price’ 

 
The above-mentioned Pali geminate instances point out that when the segment is more complex, it is more prone to violation. 
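The generalization described above, namely that the less sonorous member of a heterosyllabic cluster is the one that surfaces as a geminate and that rhotics and glides never do, can be sketched as follows. The numeric ranks are arbitrary, and splitting liquids into rhotics and laterals (with rhotics more sonorous) is an assumption based on the ranking *RR >> *LL rather than on Figure 1; the function name is hypothetical.

```python
from typing import Optional

# Sonority ranks, higher = more sonorous. The rhotic/lateral split is an
# assumption drawn from the ranking *RR >> *LL; the numbers are arbitrary.
SONORITY = {"obstruent": 1, "nasal": 2, "lateral": 3, "rhotic": 4, "glide": 5}

# Segment classes reported as never geminating in SHB.
NON_GEMINABLE = {"rhotic", "glide"}


def geminating_class(first: str, second: str) -> Optional[str]:
    """Return the class of the segment that surfaces as a geminate.

    The less sonorous member of the cluster wins. None is returned when the
    winner belongs to a class that SHB does not geminate.
    """
    winner = min(first, second, key=SONORITY.__getitem__)
    return None if winner in NON_GEMINABLE else winner


print(geminating_class("obstruent", "nasal"))   # obstruent + sonorant -> obstruent
print(geminating_class("rhotic", "obstruent"))  # sonorant + obstruent -> obstruent
print(geminating_class("rhotic", "nasal"))      # sonorant + sonorant  -> nasal
```

The sketch only encodes the descriptive pattern; the Optimality Theory analysis in the following section derives the same outcome from ranked constraints.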
Another type of gemination in SHB arises from the phonological alternation of borrowed words that have a CV.CV or CV.CVC syllable structure. In such cases, the onset of the final syllable is geminated and also serves as the coda of the first syllable, as in Table 7. 
 
Table 7: Gemination from borrowed words with CV.CV/CV.CVC 

| Borrowed words | Gemination in SHB | Gloss |
|---|---|---|
| gu.li | gul.li | ‘bullet’ |
| go.d.i | g.d..d.i | ‘mattress’ |
| ca.d.or | cad..d.or | ‘shawl’ |
| je.t.a | zit..t.a | ‘win’ |
| .i.la | .il.la | ‘hill’ |
| f.a.ka | .uk.ka | ‘hole’ |
| pa.ka | .ak.ka | ‘ripe’ |

 
4 Optimality Theory and SHB Gemination 
Optimality Theory (Prince & Smolensky, 1993/2004) is a development of classical generative phonology that replaces rule-based models. It posits a universal set of constraints, CON, which are ranked and violable. Their ranking, however, is not universal, and differences in ranking give rise to cross-linguistic variation. In other words, languages differ from each other in the priority they give to some constraints over others. Because of such differences, a constraint that is minimally violated in one language may be maximally violated in another. 
GEN is a formal mechanism of UG that generates a large set of logically possible competing candidates for a given input, while another formal mechanism, EVAL, evaluates each candidate against the constraint hierarchy to identify the most harmonic, or optimal, candidate as the output of the language. The candidate that best satisfies the higher-ranked constraints of the language is considered optimal even if it violates lower-ranked constraints. Two main types of constraints decide the optimal candidate: markedness constraints and faithfulness constraints. Markedness constraints have no access to the input; they evaluate only the well-formedness of output candidates. Faithfulness constraints, on the other hand, have access to both input and output and penalize candidates that are not faithful to the input. When a candidate violates a constraint, the violation is marked with an asterisk ‘*’, and a fatal violation is additionally marked with an exclamation mark ‘!’. A pointing hand marks the optimal candidate. 
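The way EVAL selects a winner under strict domination can be illustrated with a minimal sketch. The constraint names `M` and `F`, the two candidates, and the function name are hypothetical and serve only as an illustration; the sketch also simplifies by treating the ranking as a total order, whereas tableaux may leave some constraints unranked with respect to each other.

```python
from typing import Dict, List

Violations = Dict[str, int]  # constraint name -> number of violation marks


def eval_candidates(ranking: List[str], tableau: Dict[str, Violations]) -> str:
    """Return the optimal candidate under strict domination.

    Each candidate's violation counts are read off in ranking order and the
    resulting tuples are compared lexicographically, so one violation of a
    higher-ranked constraint outweighs any number of lower-ranked violations.
    """
    def profile(candidate: str):
        return tuple(tableau[candidate].get(c, 0) for c in ranking)

    return min(tableau, key=profile)


# Hypothetical constraints: M is a markedness constraint, F a faithfulness
# constraint. The candidates are placeholders, not SHB forms.
tableau = {
    "faithful": {"M": 1},  # identical to the input but marked
    "repaired": {"F": 1},  # unmarked but unfaithful to the input
}

print(eval_candidates(["M", "F"], tableau))  # M >> F  -> repaired
print(eval_candidates(["F", "M"], tableau))  # F >> M  -> faithful
```

Re-ranking the same two constraints changes the winner, which is the sense in which differences in ranking yield cross-linguistic variation.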
The universal ranking of constraints for gemination cited in Podseva (2002) is *GG >> *LL >> *NN (G = glide, L = liquid, N = nasal). With a slight modification, this ranking appears in Kawahara (2007) as *GG >> *LL >> *NN >> *GEMOBS (GEMOBS = geminate obstruent). The investigation of the SHB gemination process shows that in SHB the most geminable segments are obstruents, that nasals and laterals are less geminable, and that glides and rhotics are completely prohibited as geminate consonants. The ranking of constraints for SHB gemination based on this hierarchy is therefore *GG, *RR >> *LL, *NN >> *GEMOBS. Additional constraints necessary for this process are AGREECC, SYLLABLE CONTACT (SYLCONT), and IDENT C/_V. AGREECC rules out surface forms in which adjacent consonants are not identical to each other. SYLCONT penalizes a rise in sonority across a syllable boundary. The positional faithfulness constraint IDENT C/_V assigns a violation to surface forms in which the features of a prevocalic segment differ from those of the input. 
 
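Table 8: Representation of pod..mo > ..d..d.. in the optimality theory 

| pod..mo | AGREECC | SYLCONT | *GG | *RR | *LL | *NN | IDENT C/_V | *GEMOBS |
|---|---|---|---|---|---|---|---|---|
| a) pod..mo | *! | *! | | | | | | |
| b) ☞ ..d..d.. | | | | | | | * | * |
| c) ..m.m. | | | | | | *! | | |
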
 
 
Table 8 illustrates that the candidate with obstruent gemination is evaluated as optimal despite violating the constraints IDENT C/_V and *GEMOBS, because it satisfies all the higher-ranked constraints. The candidate with sonorant gemination is eliminated by the constraint *NN, while the faithful candidate is eliminated by the constraint AGREECC. Thus, the correct ranking of constraints is AGREECC, SYLCONT >> *GG >> *RR >> *LL >> *NN >> IDENT C/_V, *GEMOBS. 
The same constraints are relevant to gemination in sonorant + obstruent medial clusters, but their ranking differs. Here, IDENT C/_V is a higher-ranked constraint since the prevocalic segment of the winning output is the same as in the input. 
 
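Table 9: Representation of kir.t.on > kit..t.on in the optimality theory 

| kir.t.on | AGREECC | IDENT C/_V | *GG | *RR | *LL | *NN | *GEMOBS |
|---|---|---|---|---|---|---|---|
| a) kir.t.on | *! | | | | | | |
| b) ☞ kit..t.on | | | | | | | * |
| c) kir.ron | | *! | | * | | | |
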
 
 
Table 9 demonstrates that the candidate with obstruent gemination is the optimal candidate because the faithful candidate violates the higher-ranked constraint AGREECC, and the candidate with sonorant gemination violates the higher-ranked constraints IDENT C/_V and *RR. This justifies ranking the constraints AGREECC, IDENT C/_V, *GG >> *RR >> *LL >> *NN above *GEMOBS. 
Our concern is to demonstrate that when the SHB speakers encounter two sonorants as the elemental composition of the heterosyllabic clusters, the less sonorous sound gets priority over the more sonorous sound. Table 10 is set to analyze this phenomenon in the optimality theory framework. 
 
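Table 10: Representation of pur..o > .un.n. in the optimality theory 

| pur..o | AGREECC | IDENT C/_V | *GG | *RR | *LL | *NN | *GEMOBS |
|---|---|---|---|---|---|---|---|
| a) pur..o | *! | | | | | | |
| b) ☞ .un.n. | | | | | | * | |
| c) .ur.r. | | *! | | * | | * | |
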
 
 
Table 10 shows that the surface form /.un.n./ is evaluated as optimal because it satisfies all the higher-ranked constraints, whereas the faithful candidate /pur..o/ violates the higher-ranked constraint AGREECC, and the other surface form /.ur.r./ is ruled out by the higher-ranked constraints IDENT C/_V and *RR. The constraints AGREECC, IDENT C/_V, *GG >> *RR >> *LL therefore outrank the constraints *NN and *GEMOBS. 
However, the above-mentioned constraints are not adequate to explain the derivation of the geminate structures CVC.CV and CVC.CVC from the CV.CV and CV.CVC syllable structures. To establish the constraints for this type of gemination, we need to take into account metrical stress in SHB. The predominant stress pattern of SHB is disyllabic, with stress on the first syllable, and SHB speakers prefer stressed syllables to be heavy, for example /'huk.na/ ('HL) ('CVC.CV) ‘thin’ and /'gin.na/ ('HL) ('CVC.CV) ‘hate’. In SHB, a CVC syllable is treated as heavy because it has two moras. In borrowed words with a CV.CV or CV.CVC structure, the first syllable is light. This leads to the transformation of the first (stressed) syllable into a heavy syllable, resulting in a CVC structure in SHB. This phenomenon calls for the constraint stress by weight position (SWP), which assigns a violation if the stressed syllable is not heavy and thereby eliminates candidates that violate the principle that stressed syllables must be heavy. Additional relevant constraints are MAX-IO, DEP-IO, and *GEM. MAX-IO assigns a violation if a sound in the input has no correspondent in the output. Conversely, DEP-IO assigns a violation if a sound in the output has no correspondent in the input. The constraint *GEM disallows geminates. 
 
Table 11: Representation of gu.li > gul.li in the optimality theory 

| gu.li | SWP | MAX-IO | DEP-IO | *GEM |
|---|---|---|---|---|
| a) gu.li | *! | | | |
| b) ☞ gul.li | | | * | * |
| c) gul | | *! | | |

 
 
As seen in Table 11, the first surface form cannot be the optimal candidate because it violates the higher-ranked constraint SWP. The third surface form, with the deletion of the final syllable of the input, incurs a violation of the higher-ranked constraint MAX-IO. In the optimal candidate /gul.li/, lower-ranked constraints are violated in order to satisfy the higher-ranked constraints SWP and MAX-IO. Hence, the constraint ranking required for this phenomenon is SWP, MAX-IO >> DEP-IO, *GEM. 
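The evaluation in Table 11 can be reproduced with a short sketch that counts violations directly from the syllabified forms. Several simplifications are assumptions made only for this illustration: the vowel inventory is reduced to five letters, a syllable counts as heavy when it ends in a consonant, correspondence is approximated by comparing multisets of segments, and the ranking SWP, MAX-IO >> DEP-IO, *GEM is linearized as one total order consistent with it. All function names are hypothetical.

```python
from collections import Counter

VOWELS = set("aeiou")  # simplified vowel inventory (assumption)


def segments(form: str) -> Counter:
    """Multiset of segments, ignoring syllable boundaries ('.')."""
    return Counter(form.replace(".", ""))


def violations(inp: str, out: str) -> dict:
    """Count violation marks of one output candidate for a given input."""
    syllables = out.split(".")
    first = syllables[0]  # the stressed syllable in SHB disyllables
    return {
        # SWP: the stressed syllable must be heavy (here: end in a consonant).
        "SWP": int(first[-1] in VOWELS),
        # MAX-IO: input segments without an output correspondent.
        "MAX-IO": sum((segments(inp) - segments(out)).values()),
        # DEP-IO: output segments without an input correspondent.
        "DEP-IO": sum((segments(out) - segments(inp)).values()),
        # *GEM: identical consonants flanking a syllable boundary.
        "*GEM": sum(
            1
            for a, b in zip(syllables, syllables[1:])
            if a[-1] == b[0] and a[-1] not in VOWELS
        ),
    }


def optimal(inp: str, candidates: list, ranking: list) -> str:
    """Pick the candidate whose violation profile is lexicographically best."""
    def profile(cand: str):
        marks = violations(inp, cand)
        return tuple(marks[name] for name in ranking)

    return min(candidates, key=profile)


RANKING = ["SWP", "MAX-IO", "DEP-IO", "*GEM"]
CANDIDATES = ["gu.li", "gul.li", "gul"]

for cand in CANDIDATES:
    print(cand, violations("gu.li", cand))
print("winner:", optimal("gu.li", CANDIDATES, RANKING))  # winner: gul.li
```

Under these assumptions, the computed violation marks match those in Table 11, and the geminated form is selected whichever of SWP and MAX-IO is placed first.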
5 Conclusion 
This paper has demonstrated how systematically SHB speakers adapt loanwords by means of gemination. In preferring obstruents over sonorants, the SHB gemination process corroborates the universal view that sonorants are marked geminates. Because the constriction duration of a geminate sonorant is hard to distinguish from that of the corresponding singleton, the insufficient distinction leads to misperception and drives speakers to avoid geminate sonorants. An interesting observation, however, is that when the input cluster consists of two sonorants, SHB speakers prefer the less sonorous one, which shows that sonorant gemination is not completely absent from SHB. Hence, in SHB the most geminable segments are obstruents, followed by nasals and laterals, while glides and rhotics are not geminable. 
As far as the ranking of constraints within OT is concerned, in the case of consonant clusters of different sonority, the order is AGREECC, SYLCONT >> *GG >> *RR >> *LL >> *NN >> IDENT C/_V, *GEMOBS. With respect to consonant clusters of equal sonority, the order is AGREECC, IDENT C/_V >> *GG >> *RR >> *LL >> *NN >> *GEMOBS. It is also observed in SHB that when the cluster consists of a sonorant + sonorant sequence, the nasal retains its position; the ranking of constraints for this type is AGREECC, SYLCONT, IDENT C/_V >> *GG >> *RR >> *LL >> *NN. With respect to the gemination process in which input forms with a CV.CV or CV.CVC syllable structure are turned into surface forms with a CVC.CV or CVC.CVC structure, the ranking of constraints is SWP, MAX-IO >> DEP-IO, *GEM. 
References 
Anwar, M. (2013). Sylhet: bhasha boichitra o shabda shampad (A phenomenon of sylheti: Language and word bank). Ittadi Grantha Prokash. 
Beckman, J. (1998). Positional faithfulness. Doctoral Dissertation. University of Massachusetts, Amherst. 
Blevins, J. (1995). The syllable in phonological theory. In J. Goldsmith (Ed.), The Handbook of Phonological Theory. USA: Blackwell Publishing. 
Boersma, P. (1998). Functional typology: Formalizing the interaction between articulatory and perceptual drives. The Hague: Holland Academic Graphics. 
Catford, J. C. (1977). Fundamental Problems in Phonetics. Edinburgh: Edinburgh University Press. 
Clements, G. N. (1990). The Role of the Sonority Cycle in Core Syllabification. In J. Kingston & M. Beckman (Eds.), Papers in Laboratory Phonology I: Between the Grammar and Physics of Speech (pp. 283-333). Cambridge: Cambridge University Press. 
Davis, S. (1998). Syllable in optimality theory. Journal of Korean Linguistics, 23, 181-211. 
Davis, S. (2011). Geminates. In M. van Oostendorp et al. (Eds.), The Blackwell Companion to Phonology, (Vol. 2, pp. 873-897). Oxford: Wiley-Blackwell. 
Dutta, H. (2017). Strength asymmetries and Pali geminates: An OT account. The EFL journal 8. pp. 49-65.  
Flemming, E. (1996). Evidence for Constraints on Contrast: The dispersion theory of contrast. UCLA working papers in Phonology 1, pp. 86-106. 
Flemming, E. (2004). Contrast and perceptual distinctiveness. In B. Hayes, R. Kirchner & D. Steriade (Eds.), Phonetically based phonology (pp. 232-276). Cambridge: Cambridge University Press. 
Gouskova, M. (2002). Falling sonority onsets, loanwords and syllable contact. In M. Andronis, E. Debenport, C. Ball, H. Elston, & S. Neuvel (Eds.), CLS 37: The main session. Papers from the 37th Meeting of the Chicago Linguistic Society (Vol. 1, pp. 175-186). CLS. 
Ito, J., & Mester, A. (1999). The phonological lexicon. In N. Tsujimura (Ed.), The Handbook of Japanese Linguistics (pp. 62-100). Oxford: Blackwell. 
Ito, J., & Mester, A. (2006). Systematic markedness and faithfulness. In J. E. Cihlar et al. (Eds.), Papers from the 39th Annual Meeting of the Chicago Linguistic Society (pp. 665-689). Chicago: Chicago Linguistic Society.  
Kenstowicz, M. (1994). Phonology in Generative Grammar. Oxford: Blackwell. 
Kubozono, H. (2017). Introduction to the Phonetics and Phonology of Geminate Consonants. Oxford University Press. 
Ladefoged, P., & Maddieson, I. (1997). The Sounds of the World’s Languages. UK: Blackwell. 
Liljencrants, J., & Lindblom, B. (1972). Numerical simulation of vowel quality systems: The role of perceptual contrast. Language, 48, pp. 839-862. doi:10.2307/411991 
Lindblom, B. (1986). Phonetic universals in vowel system. In J. Ohala & J. Jaeger (Eds.), Experimental Phonology, 13-44. Orlando: Academic Press. 
Kawahara, S. (2007). Sonorancy and geminacy. In L. Bateman et al. (Eds.), University of Massachusetts Occasional Papers in Linguistics 32: Papers in OT.III, (pp. 145-186). Amherst, Mass.: GLSA  
Ohala, J. (1993). The phonetics of sound change. In C. Jones (Ed.), Historical Linguistics: Problems and Perspectives (pp. 237-278). London: Longman. 
Pajak, B. (2009). Contextual constraints on gemination: The case of Polish. BLS 35, 1, pp.269- 280. http://dx.doi.org/10.3765/bls.v35i1.3617 
Padgett, J. (2003). Contrast and Post-Velar Fronting in Russian. Natural Language & Linguistic Theory 21, 39-87. https://doi.org/10.1023/A:1021879906505 
Podesva, R. (2002). Segmental constraints on geminates and their implications for typology. Paper presented at the January 2002 Annual Meeting of the Linguistic Society of America, San Francisco, 3-6 Jan. 
Prince, A., & Smolensky, P. (1993/2004). Optimality theory: Constraint interaction in generative grammar. Oxford: Blackwell. 
Rice, K. (1992). On Deriving Sonority: A Structural Account of Sonority Relationships. Phonology, 9(1), 61-99. Retrieved from http://www.jstor.org/stable/4420046. 
Taylor, M. (1985). Some patterns of geminate consonants. The University of Chicago Working Papers in Linguistics 1, pp. 120-129. 
Vennemann, T. (1988). Preference laws for syllable structure and the explanation of sound change. Berlin: Mouton de Gruyter. 
STOP VOICING AND F0 PERTURBATION IN PAHARI 
Nazia RASHID 
University of Azad Jammu and Kashmir (UAJ&K), Pakistan 
naziarasheed09@gmail.com 
Abdul Qadir KHAN (UAJ&K, Pakistan; aqkhan8873@yahoo.com, abdul.qadir@ajku.edu.pk)
Ayesha SOHAIL (UAJ&K, Pakistan; ayesha.sohail68@gmail.com)
Bilal Ahmed ABBASI (UAJ&K, Pakistan; itsmeabbasi@gmail.com)
Abstract 
The present study has been carried out to investigate the perturbation effect of the voicing of initial stops on the fundamental frequency (F0) of the following vowels in Pahari. Results show that F0 values are significantly higher following voiceless unaspirated stops than voiced stops. F0 contours indicate an initially falling pattern for vowel [a:] after voiced and voiceless unaspirated stops. A rising pattern after voiced stops and a falling pattern after voiceless unaspirated stops is observed after [i:] and [u:]. These results match Umeda (1981) who found that F0 of a vowel following voiceless stops starts high and drops sharply, but when the vowel follows a voiced stop, F0 starts at a relatively low frequency followed by a gradual rise. The present data show no statistically significant difference between the F0 values of vowels with different places of articulation. Place of articulation is thus the least influencing factor. 
Keywords: Pahari; perturbation; fundamental frequency; voicing; place of articulation 
Povzetek 
V študiji smo v paharščini raziskovali učinek zvočne premene (perturbacije) na osnovno frekvenco samoglasnika, ki se pojavi zaradi prisotnosti zvenečega zapornika pred samoglasnikom. Rezultati kažejo, da so vrednosti F0 na samoglasnikih bistveno višje po nezvenečih nepridihnjenih zapornikih v primerjavi z njihovimi nezvenečimi zaporniškimi pari. Potek F0 v primeru samoglasnika [a:] izkazuje prvotno padajoči vzorec po zvenečih in nezvenečih nepridihnjenih zapornikih. V primeru samoglasnikov [i:] in [u:] opažamo naraščajoči vzorec po zvenečih zapornikih ter padajoči vzorec po nezvenečih nepridihnjenih zapornikih. Rezultati se ujemajo z Umeda (1981), ki pravi, da se F0 samoglasnika po nezvenečih zapornikih začne visoko in močno pade, ko pa samoglasnik sledi zvenečemu zaporniku, pa ima F0 razmeroma nizko vrednost, čemur sledi postopni dvig. Toda tokratni podatki ne kažejo statistično pomembnih razlik med vrednostmi F0 na samoglasnikih v primeru različnih zapornikov glede na mesto artikulacije. Iz teh rezultatov zaključujemo, da mesto artikulacije predhodnega zapornika najmanj vpliva na F0 samoglasnika.  
Ključne besede: paharščina; zvočna premena (perturbacija); osnovna frekvenca; zvenečnost; mesto artikulacije 
1 Introduction 
1.1 Background of the study 
Consonantal perturbation of the fundamental frequency (F0) is an important phenomenon in the field of linguistics. It is worth investigating because it provides the basis for theories of tonogenesis, the historical development of tones in a language. A rise or fall in F0 conditioned by the voicing distinction of initial consonants is posited to contribute to the development of contrastive tones (Chavez-Peon, 2005). In tonal languages, the same segmental sequence may convey different meanings if uttered with different tones. The F0 contrast at the stop release also serves as a cue for the perception of stop laryngeal features (Whalen, Abramson, Lisker & Mody, 1993). When other cues are ambiguous, F0 signals the voicing distinction in prevocalic position (Hanson, 2009; Kirby & Ladd, 2015). 
It is well recognized that in many languages, initial consonants characteristically perturb the onset F0 of the following vowels (Mirza, 1990). Li (1980) argues that the voicing of consonants in prevocalic position perturbs the F0 of the vowels. F0 at the onset of the vowel is associated with the phonological features of initial stops (Francis, Ciocca, Wong & Chan, 2006). 
 
1.2 Historical background of the Pahari language 
Pahari is a term used for a chain of dialects spoken in different regions, including the Great Himalayas and Nepal (Shakil, 2004). It is the mother tongue of millions of mountain-dwelling people. Shakil (2004) claims that Pahari is one of the ancient languages of Central and South Asia. The languages of the subcontinent belong to the Indo-European group of languages. According to Masica (1991), the Indo-Aryan family is a sub-branch of the Indo-European languages. Nigram (1972) divides the Indo-Aryan languages into an eastern group and a central northern group. Pahari belongs to the central northern group of the Indo-Aryan family. It is spoken in almost the whole of Azad Jammu and Kashmir (AJ&K) and is the mother tongue of most Kashmiri people.  
 
1.3 Pahari stops and vowels 
According to Khan (2011), there are twelve oral stops with four places of articulation in Pahari. These places are bilabial, dental, alveolar, and velar. Pahari stops exhibit a three-way laryngeal contrast: voiced, voiceless aspirated, and voiceless unaspirated. Voiced stops are /b, d., ., g/, voiceless unaspirated stops are /p, t., ., k/ and voiceless aspirated stops are /p., t., .., k./. There are twelve oral vowels in Pahari. Among them, six are long vowels [a:, a:, u:, i:, o:, e:], and six are short vowels [., e, a, ., o, .]. The present study deals with three long vowels [a:, i:, u:], taken from three positions: central, front, and back respectively. According to Khan (2014), [a:] is a central, mid, long, unrounded vowel; [i:] is a high, front, long, unrounded vowel; and [u:] is a high, back, long, rounded vowel. 
2 Literature review 
2.1 Fundamental frequency of vowels 
Fundamental frequency, perceived by the human ear as pitch, is one of the phonetic features associated with vowels. According to Xu and Xu (2003b), it is a key speech variable that provides linguistic information and plays a vital role in discourse. Pommerening and Volkner (n.d.) state that by changing the fundamental frequency, speakers convey significant linguistic and paralinguistic information to the listener. 
 
2.2 Stops and voicing 
Stops are the most important category of consonants. Schiefer (1986) states that in the majority of languages, the stop category can be analyzed more easily and more accurately than any other consonant category. This is because of the laryngeal features associated with stops, i.e. voicing, breathiness, and aspiration. Stops are the only category that includes all these features, so it is termed a universal category. Some languages have a three-way laryngeal contrast involving aspiration, voicing, and breathiness, whereas others, including most of the Indo-Aryan languages, have a four-way contrast (Dutta, 2007). According to Khan (2012), Pahari stops exhibit a three-way laryngeal contrast: voiced, voiceless aspirated, and voiceless unaspirated. 
Voicing is an important phonological feature of stops that makes two categories of stops: voiced and voiceless. According to Chen (2011), the main consonantal distinction lies in voicing which has a close association with the F0 perturbation. Carne (2008) states that voicing distinction at initial consonants results in intrinsic perturbations in the F0 of the following vowel. There is a great influence of voicing on the acoustical characteristics including the fundamental frequency of the vowels (House & Fairbanks, 1953). 
 
2.3 Stop voicing and F0 perturbation of vowels 
In most languages, when vowels are preceded by consonants, the fundamental frequency of the vowels is affected by the voicing of those consonants (Lofqvist, 1975). According to Hanson (2009), the initial few tens of milliseconds of the vowel are influenced by the voicing properties of the preceding consonant. In some languages, the effect extends further, even close to the end of the vowel, but this is less frequent. However, it is agreed that the F0 of a vowel at onset is significantly higher when it follows voiceless consonants and lower when it follows voiced ones. 
As far as the F0 movement along the vowel contour is concerned, Wong and Xu (2007, p. 1293) note that there are two opposing views. One is the ‘rise-fall dichotomy’ and the other is the ‘no-rise view’. The first view suggests that F0 is lowered after voiceless consonants and raised from a lower onset after voiced ones. According to Silverman (1986), this association between consonant voicing and the direction of F0 movement is called the ‘rise-fall dichotomy’. The ‘no-rise view’ suggests that F0 is lowered after all stops. 
The effect of stop voicing on F0 has been found in the majority of languages and has been documented widely. Major studies show that voiced stops lower, whereas voiceless stops raise, the F0 of the following vowels. A general trend of higher F0 after voiceless stops and lower F0 after voiced stops was observed by Shimizu (1989) in five Asian languages (Japanese, Korean, Burmese, Thai, and Hindi). 
F0 perturbation is strongest at the beginning of the vowel and lessens along the vowel. House and Fairbanks (1953) found that the average F0 was lower after voiced consonants and higher after voiceless consonants, and that this difference occurred at the onset of voicing rather than throughout the vowel. Carne (2008) likewise observed the greatest effect at the onset, diminishing across the duration of the vowel. 
3 Methodology 
3.1 Research design 
The study is purely quantitative. The speakers were provided with a word list prepared by the researcher. The quantitative analysis consists of acoustic measurements of F0 using the Praat script ProsodyPro 5.3.2. The data were entered into Microsoft Excel to obtain the required values and to present the results in the form of tables and figures. Statistical tests were run in SPSS to check the significance of the results. 
 
3.2 Participants 
Six adult native speakers of Pahari (three males, three females), aged between 20 and 50 years, were selected randomly. Their education level ranged from intermediate to master's. All the participants were born and raised in district Bagh and are permanent residents of this area. None of the participants reported any language impairment or any ailment that would have affected their speech during the recordings. They had normal voices and normal communicative ability. 
 
3.3 Stimuli 
A list of 36 monosyllabic words in the CVC context was prepared that contained Pahari stops in initial position. Each stop was followed by the vowels [a:], [i:] and [u:], giving twelve different combinations for each vowel. These are the edge vowels, which show most of the characteristics of vowel segments. They are taken from three positions: front, center, and back. Only real words were chosen; because of this constraint, there were five gaps in the stimuli. 
 
3.4 Data collection procedure 
The selected words were recorded in Praat at a sampling frequency of 44,100 Hz. A silent room was chosen, and the language samples were recorded in a neutral pitch. A high-quality Shure SM10A-CN low-impedance microphone was used for the recordings to avoid background noise. The participants were asked to utter each word three times with a pause after each utterance. The repetitions of each speaker were saved as wave files in Praat. 
The recorded sounds were processed with ProsodyPro 5.3.2 in Praat. The waveform of each recorded word was segmented by manually marking boundaries around the target vowel. The onset of the vowel was identified as the beginning of voicing after the burst of the stop. The boundaries were determined by repeatedly listening to the recordings and by inspecting the waveform. Each repetition by each participant was labeled separately, and then all were assembled. The speech analysis software automatically extracted the required F0 values, namely the mean F0 and mean-norm F0. These values were taken in Hertz (Hz), and the corresponding time locations were recorded in milliseconds (ms) when documented in Microsoft Excel. 
The segmented portions were measured using Praat scripts. Fundamental frequency values were measured at the onset after the release of the stop and at twelve intervals along the F0 contour using Praat's autocorrelation method. The required data were entered into Microsoft Excel to obtain the mean F0 and mean-norm F0 values. Onset F0 was taken at the very beginning of voicing of the vowel after the burst of the stop. The data were presented in tabular form, and the results were plotted as figures in MS Excel. 
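The idea behind an autocorrelation-based F0 measurement can be illustrated with a minimal sketch. It is a deliberately simplified version and not Praat's actual algorithm: the function name, the pitch floor and ceiling, and the synthetic 160 Hz test frame are all assumptions introduced here for illustration only.

```python
import math


def estimate_f0_autocorr(samples, sample_rate, f0_min=75.0, f0_max=500.0):
    """Estimate F0 (Hz) of one voiced frame from its autocorrelation peak.

    Simplified illustration only; Praat's own method is more elaborate.
    """
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]                   # remove any DC offset
    lag_min = int(sample_rate / f0_max)               # shortest period searched
    lag_max = min(int(sample_rate / f0_min), n - 1)   # longest period searched
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Biased autocorrelation (divided by n), which favours shorter lags
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / n
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag


# Hypothetical frame: 50 ms of a 160 Hz tone plus one harmonic, sampled at 8 kHz
SAMPLE_RATE = 8000
frame = [
    math.sin(2 * math.pi * 160 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 320 * t / SAMPLE_RATE)
    for t in range(400)
]
f0 = estimate_f0_autocorr(frame, SAMPLE_RATE)
print(round(f0, 1))
```

The lag with the strongest self-similarity corresponds to one glottal period, so the sampling rate divided by that lag gives the F0 estimate for the frame.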
 
3.5 Data analysis procedure 
After the F0 values had been compiled in Microsoft Excel, SPSS was used for the statistical analyses. Consonant was the independent variable and F0 was the dependent variable. The data were subjected to an independent t-test and a one-way ANOVA. After all the required values had been obtained, each research question was answered. The stop voicing effect was measured by comparing voiced stops with their voiceless counterparts. For the F0 contours, each vowel was analyzed separately in the context of the twelve stops. It was also checked whether the place of articulation of the stops (bilabial, dental, alveolar, and velar) made any difference to the F0 of the following vowel. 
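How a one-way ANOVA compares places of articulation can be sketched as follows. This is an illustration only and not the SPSS analysis reported in the paper: it groups the onset F0 values printed in Table 1 by place of articulation, pooling vowels and voicing categories, which is an assumption made here. The group key `third_place` is a hypothetical label, since the symbols for that stop pair are lost in the source text.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)              # number of groups
    n_total = len(all_values)    # total number of observations
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Onset F0 (Hz) from Table 1: voiceless then voiced stop, for [a:], [i:], [u:]
onset_by_place = {
    "bilabial": [164.09, 173.728, 162.812, 150.14, 154.588, 154.54],
    "dental": [167.469, 180.197, 176.285, 156.43, 157.503, 156.51],
    "third_place": [169.479, 166.923, 171.441, 155.779, 160.26, 163.156],
    "velar": [167.783, 178.415, 178.43, 154.137, 154.414, 154.893],
}
f_stat = one_way_anova_f(list(onset_by_place.values()))
print(f"F = {f_stat:.3f}")
```

An F value well below 1 means that the variation between places is smaller than the variation within each place, which is the pattern expected when place of articulation has little effect on F0.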
4 Results  
Perturbation by the voicing of initial stops on the F0 of the following vowels was analyzed by taking the onset F0, mean F0, and F0 contours within the first 100ms. 
 
4.1 Onset F0 
Onset F0 values of [a:, i:, u:] preceded by voiced and voiceless unaspirated stops were measured to find the maximum effect of the preceding stops on vowel F0. House and Fairbanks (1953) suggested that the greatest effect of the preceding consonant occurs at the onset of voicing and decreases along the vowel. The following results were obtained: 
 
Table 1: Onset F0 of [a:, i:, u:] following voiced and voiceless un-aspirated stops 
Stops    Onset F0 [a:] (Hz)    Onset F0 [i:] (Hz)    Onset F0 [u:] (Hz)
p        164.09                173.728               162.812
b        150.14                154.588               154.54
t.       167.469               180.197               176.285
d.       156.43                157.503               156.51
.        169.479               166.923               171.441
.        155.779               160.26                163.156
k        167.783               178.415               178.43
g        154.137               154.414               154.893
 


 
 
 

Figure 1: Onset F0 of [a:, i:, u: ] following voiced and voiceless unaspirated stops 
 
A comparison of the onset F0 of the three vowels following voiced and voiceless unaspirated stops shows that the F0 values of [a:] following voiceless unaspirated stops range from 160 to 170 Hz, whereas those of [i:] range from 165 to 180 Hz and those of [u:] from 162 to 178 Hz. The onset F0 of [a:] following voiced stops ranges from 150 to 160 Hz, that of [i:] from 154 to 160 Hz, and that of [u:] from 155 to 163 Hz. There is thus no large difference in the onset F0 of the three vowels when they are preceded by voiced stops. In the environment of voiceless unaspirated stops, on the other hand, the F0 values of [i:] are the highest and those of [a:] are the lowest. This shows that the vowels' intrinsic pitch plays a role in the context of voiceless unaspirated stops. As this is not the focus of the study, further investigation is left for future work.  
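The ranges discussed above can be recomputed directly from Table 1. The following is a minimal sketch; the dictionary layout and the helper name `summarize` are assumptions introduced here for illustration, and the values are those printed in the table, in the table's row order.

```python
from statistics import mean

# Onset F0 (Hz) from Table 1, one value per stop pair in the table's row order
onset_f0 = {
    "a:": {"voiceless": [164.09, 167.469, 169.479, 167.783],
           "voiced": [150.14, 156.43, 155.779, 154.137]},
    "i:": {"voiceless": [173.728, 180.197, 166.923, 178.415],
           "voiced": [154.588, 157.503, 160.26, 154.414]},
    "u:": {"voiceless": [162.812, 176.285, 171.441, 178.43],
           "voiced": [154.54, 156.51, 163.156, 154.893]},
}


def summarize(values):
    """Return the minimum, maximum, and mean of a list of F0 values."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}


summary = {
    vowel: {context: summarize(values) for context, values in contexts.items()}
    for vowel, contexts in onset_f0.items()
}

for vowel, contexts in summary.items():
    for context, stats in contexts.items():
        print(f"[{vowel}] {context}: {stats['min']:.1f}-{stats['max']:.1f} Hz, "
              f"mean {stats['mean']:.1f} Hz")
```

Comparing the two means for each vowel gives a quick view of how far the voiceless unaspirated stops raise the onset F0 relative to the voiced stops.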
 
4.2 Mean F0 
The mean F0 of the vowels was analyzed to find the F0 differences between vowels preceded by voiced and by voiceless stops. The following results were obtained. 
 
Table 2: Mean F0 of [a:, i:, u:] following voiced and voiceless unaspirated stops 
Stops    Mean F0 [a:] (Hz)    Mean F0 [i:] (Hz)    Mean F0 [u:] (Hz)
p        161.704              172.954              180.492
b        156.355              168.183              174.201
t.       164.774              178.527              181.866
d.       159.338              165.929              169.11
.        167.854              173.495              177.472
.        157.635              168.237              174.702
k        164.781              179.341              180.513
g        156.835              167.067              171.283
 


 

Figure 2: Mean F0 of vowels following voiced and voiceless unaspirated stops 
 
 
4.3 F0 contours of vowels following voiced and voiceless unaspirated stops 
F0 contours of vowels are also shaped by the voicing of the preceding stops. These contours were obtained by taking F0 values for each vowel over the first twelve time intervals from the onset. According to Mirza (1990), these twelve periods constitute approximately 100 ms, which is considered adequate time to exhibit any change in the F0 of vowels caused by preceding stops. Mohr (1971) claims that the influence of the preceding consonant on F0 is limited to the early portion of the vowel and does not extend across the entire vowel. Umeda (1981) also found that the effect of the preceding stop on the F0 of the following vowel continues for 100 ms. Moreover, in tonal languages, F0 perturbation persists for a shorter duration than in non-tonal languages. The effect of all the stops on the F0 track of each vowel was examined and the following patterns were found: 
 

Figure 3: Average F0 contours over all speakers in /p/ versus /b/, /t./ versus /d./, /./ versus /./, and /k/ versus /g/ at word initial position followed by vowels [a:, i:, u:]. X-axis represents the normalized time and y-axis represents the fundamental frequency in Hertz (Hz) 
 
 
4.3.1 F0 contours of [a:] 
The F0 path of [a:] over the twelve time intervals following /p/ and /b/, depicted in Figure 3, shows that /p/ raises the onset fundamental frequency to 164 Hz, which is much higher than that after /b/ (150 Hz). Over the course of the vowel, the F0 of /pa:/ falls sharply to 158 Hz during the initial five time intervals and then rises gradually to 166 Hz in the following intervals. The F0 of /ba:/ lowers slightly to 148 Hz and then rises gradually to 166 Hz. When data from all participants are included, the figure thus indicates a sharp fall of 6 Hz from a raised onset in the environment of the voiceless unaspirated stop and a narrower fall of 2 Hz from a lowered onset in the case of the voiced stop. 
Similarly, the F0 contour after /t./ shows a steep fall of 6 Hz from a higher onset level and then rises again to 169 Hz. Conversely, the F0 of [a:] following /d./ falls only slightly from a lower onset, by about 1 Hz, and then rises to 167 Hz. Here again, the fall after the voiceless stop is steeper. F0 contours of the vowel [a:] following /t./ and /d/ show that F0 after /d./ falls sharply by 6 Hz from a raised onset before rising again. On the other hand, F0 after /./ is 156 Hz, which is lower than that of /.a:/, and after a slight fall of 1 Hz it rises gradually to 163 Hz. The F0 trajectory of /ka:/ versus /ga:/ shows a similar fall-rise pattern. After /k/, there is a fall of 7 Hz from a raised onset, and after /g/, there is a fall of 3 Hz from a lower onset. 
 
4.3.2 F0 contours of [i:]  
In Figure 3, F0 contours of [i:] following voiced and voiceless unaspirated stops are also represented. These contours show that the F0 of /pi:/ falls by 4 Hz during the first five time intervals. The F0 of /bi:/ starts at 154 Hz and rises immediately in an almost straight contour up to 177 Hz. Here the voiced stop generates a gradual rising pattern. Likewise, the F0 path of /t.i:/ falls from a raised onset (180 Hz) to 174 Hz, a fall of 6 Hz, and then rises to 184 Hz. Conversely, instead of a fall-rise pattern, /d./ continuously raises the F0 track of the following [i:], from 157 Hz to 173 Hz. The F0 contours of [i:] following /t./ and /d./ show that the F0 of the vowel following /t./ starts higher (166 Hz) and rises gradually up to 176 Hz. Likewise, F0 after voiced /./ is 160 Hz, which is lower than that of /./. It also rises gradually to 176 Hz. F0 contours after /., ./ are somewhat different from the previous contours. These contours rise after both stops, although F0 is lower after the voiced than after the voiceless stop. On the other hand, the F0 contour of /ki:/ shows a fall-rise pattern (a fall of 4 Hz from 178 to 174 Hz), whereas that of /gi:/ shows a continuous rise from 154 Hz to 176 Hz.  
Data from all the participants show that there is a steep fall of F0 track from a raised onset in case of voiceless unaspirated stops (except for /t/) and a gradual rise from a lower onset in case of all voiced stops. 
 
4.3.3 F0 contours of [u:] 
Figure 3 also presents the F0 contours of [u:] following voiced and voiceless unaspirated stops. The F0 of /pu:/ rises gradually from 162 Hz to 188 Hz, and that of /bu:/ also rises continually from 154 Hz to 187 Hz. Both contours show a rising pattern: from a lower onset in the case of the voiced stop and from a higher onset in the case of the voiceless stop. The perturbation of F0 by /t./ and /d./ indicates that the F0 path following /t./ lowers slightly, by about 1 Hz, and then rises. In contrast, /d./ constantly raises the F0 track of [u:] from 156 Hz to 179 Hz. F0 contours of [u:] following /./ and /./ show that the F0 path of /.u:/ falls initially by 2 Hz. F0 after voiced /./ is 163 Hz and rises gradually. The F0 course of /ku:/ shows a fall-rise pattern with a fall of 5 Hz from 178 Hz to 173 Hz, whereas that of /gu:/ shows a continuous rise from 154 Hz to 181 Hz. 
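The contour shapes described in this section can be labeled mechanically. The sketch below is only an illustration: the function name and the 0.5 Hz tolerance are assumptions, and the two twelve-point contours are hypothetical, with only their onset, turning-point, and final values taken from the descriptions of /pa:/ and /bi:/ above.

```python
def classify_contour(f0_values, tolerance_hz=0.5):
    """Label an F0 contour as 'fall-rise', 'rising', 'falling', or 'level'."""
    onset, offset = f0_values[0], f0_values[-1]
    lowest = min(f0_values)
    initial_fall = onset - lowest   # drop from the onset to the lowest point
    final_rise = offset - lowest    # climb from the lowest point to the end
    if initial_fall > tolerance_hz and final_rise > tolerance_hz:
        return "fall-rise"
    if final_rise > tolerance_hz:
        return "rising"
    if initial_fall > tolerance_hz:
        return "falling"
    return "level"


# Hypothetical contours (Hz); intermediate points are invented for illustration
contour_pa = [164, 162, 160, 159, 158, 159, 160, 161, 162, 163, 165, 166]
contour_bi = [154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 175, 177]

print(classify_contour(contour_pa))
print(classify_contour(contour_bi))
```

The tolerance keeps very small dips from being counted as a fall, so the threshold chosen determines whether a contour such as the one after /t./ before [u:], with its fall of about 1 Hz, is labeled as fall-rise or as rising.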
 
4.4 Statistical analysis and conclusions 
To test the significance of the results, the data were analyzed statistically in SPSS. A paired-samples t-test was applied to examine the difference between F0 values after voiced and voiceless unaspirated stops. The two stops in each pair were identical in all features except voicing. The F0 values of the vowels obtained at the different intervals were subjected to the statistical analysis. Table 3 presents the statistical analysis in terms of the t-values and p-values of the obtained data. 
The correlation coefficient and mean difference were also analyzed to check the resemblance between the F0 values of vowels preceded by the minimal pairs of stops. Where the significance value (p) for a voiced versus voiceless unaspirated pair is less than or equal to 0.05, the F0 difference is considered significant. Where it is greater than 0.05, the stops are taken to have no significant effect on the F0 of the vowel. The paired-samples t-test gives the following results:  
 
Table 3: Statistical Analysis of the Voicing Effect on the F0 of [a:, i:, u:] 
 
Vowel   Stop   Mean F0    Std. dev.   Pair        Correlation   Mean difference   t-value   p-value
/a:/    p      161.173    2.857       pa:-ba:     0.753         6.540             4.671     0.001
        b      154.633    6.162
        t.     164.423    2.967       t.a:-d.a:   0.817         6.072             7.259     0.000
        d.     158.351    4.439
        .      167.153    3.771       .a:-.a:     0.931         10.376            22.976    0.000
        .      156.777    3.140
        k      164.281    3.477       ka:-ga:     0.801         8.469             10.706    0.000
        g      155.812    4.176
/i:/    p      172.578    0.815       pi:-bi:     0.695         6.323             3.175     0.011
        b      166.255    2.470
        t.     164.632    1.867       t.i:-d.i:   0.716         13.347            10.072    0.000
        d.     164.632    1.867
        .      172.015    0.957       .i:-.i:     0.944         5.117             4.960     0.001
        .      156.777    3.140
        k      166.898    1.887       ki:-gi:     0.844         13.440            9.079     0.000
        g      165.096    2.373
/u:/    p      176.219    2.671       pu:-bu:     0.987         4.895             5.151     0.001
        b      171.324    3.482
        t.     180.462    1.547       t.u:-d.u:   0.977         13.441            12.833    0.000
        d.     167.021    2.506
        .      175.852    1.396       .u:-.u:     0.982         4.595             3.016     0.005
        .      173.258    2.503
        k      179.430    1.595       ku:-gu:     0.798         10.318            5.969     0.000
        g      169.112    2.708
 


 
The rows for [a:] show the significance of the obtained results in the voiced versus voiceless context. The first row of Table 3 reveals that /pa:/ and /ba:/ are less correlated, as is evident from their correlation coefficient of 0.753. This means that the vowel [a:] following this pair of stops differs in F0. There is a large difference (6.540) in the means of the pair. The significant difference is shown by t=4.671 and p<0.05. Similarly, /t./ and /d./ have a large mean difference (6.072) that is significant (t=7.259, p<0.05). Comparing /.a:/ and /.a:/ along similar lines shows that this pair of consonants has less correlation and less mean difference, but there is a significant difference between their F0 values, as shown by t=22.976 and p<0.05. The case of /ka:/ and /ga:/ is similar, as is evident from t=10.706 and p<0.05. This pair has a highly significant difference in F0 values.  
The second group of rows presents the statistical analysis of the F0 of [i:] on similar grounds. /pi:/ and /bi:/ are less correlated, as their correlation coefficient (0.659) shows. There is a difference in their means (6.323) as well. The significant difference is shown by t=3.175 and p<0.05. A similar comparison of /t.i:/ and /d.i:/ shows that this minimal pair has a significant difference in F0, as indicated by t=10.072 and p<0.05. The pair /.i:/ and /.i:/ also has a large mean difference (5.117). Its t-value (4.960) and p<0.05 also indicate a significant difference. The same holds for /ki:/ and /gi:/, as is apparent from t=9.079 and p<0.05. This pair is less correlated, as its correlation coefficient (0.844) shows. The mean difference (13.440) is also large. 
The third group of rows reveals that the minimal pair /pu:/ and /bu:/ also has less correlation (0.987). There is a difference of 4.895 in their means. This pair has t=5.151 and p<0.05. Likewise, /t.u:/ and /d.u:/ have a large mean difference (13.441), a correlation coefficient of 0.977, and a significant result (t=12.833, p<0.05). There is also a significant difference between the F0 of /.u:/ and /.u:/, as shown by t=3.016 and p<0.05. The /ku:/ versus /gu:/ difference is also significant, as shown by t=5.969 and p<0.05. The statistical analysis confirms that there is a significant difference between the F0 of vowels preceded by voiced and voiceless unaspirated stops. 
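The arithmetic behind a paired-samples t-test can be sketched as follows. This is not the SPSS analysis reported in Table 3, which was run on F0 values at successive intervals; as an assumption made here for illustration, the sketch pairs the mean F0 values of [a:] from Table 2 across the four voiceless and voiced stop pairs. The function name is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Return (mean difference, t statistic, degrees of freedom) for paired data."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = mean(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)   # sample SD of the differences
    return mean_diff, mean_diff / standard_error, n - 1


# Mean F0 of [a:] (Hz) from Table 2, in the table's row order of stop pairs
voiceless = [161.704, 164.774, 167.854, 164.781]
voiced = [156.355, 159.338, 157.635, 156.835]

mean_diff, t_value, df = paired_t(voiceless, voiced)
print(f"mean difference = {mean_diff:.3f} Hz, t({df}) = {t_value:.3f}")
```

The p-value then follows from the t distribution with `df` degrees of freedom. The Python standard library does not provide that distribution, so a statistics package such as SPSS, or a function like SciPy's `scipy.stats.ttest_rel`, would be used for that step.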
Besides influencing the onset F0, preceding stops influence shaping the entire F0 contours of the vowels. The close examination of the F0 track of vowel [a:] shows that the F0 falls during the first five intervals and rises again during the next intervals after all stops. It is also observed that there is a steep from raised onset level after voiceless unaspirated stops /p, t., ., k/ and a shallower fall from the lowered onset after voiced stops /b, d., ., g/. A very similar F0 pattern of vowels having an initially falling and then rising pattern after voiceless consonants were observed by Lea (1973).  
Literature reveals two views about stop voicing and F0 perturbation; ‘rise-fall dichotomy’ and ‘no-rise view’ (Wong & Xu, 2007, p. 1293). F0 contours of [a:] align with ‘no-rise view’. This view states that F0 declines, after all, stops including voiced and voiceless stops. F0 contours of [a:] show the same pattern of F0 falling after both types of stops. Ohde (1984) also found ‘no-rise view’ as the F0 was falling in almost all the contexts. After voicing onset a considerable fall was observed for both voiced and voiceless stops and he also added that F0 after voiced stops was slightly falling which is exactly explicated by the F0 paths of [a:] after voiced stops. 
The F0 paths of the vowel [i:] after the voiceless unaspirated stops /p, t., k/ show a pattern similar to that of [a:]. These stops raise the onset F0 of [i:] to a higher level, after which F0 falls steeply and rises again after the first five intervals. Korean too shows an abrupt fall after voiceless tense stops (Shimizu, 1989). However, the F0 path after /./ differs in the current study, as it rises continuously. This also contradicts the ‘no-rise view’ that F0 falls after both types of stops. F0 after the voiced stops /b, d., ., g/, on the other hand, rises continuously from a lower onset. The F0 contours of [i:] after voiced stops agree closely with the general rising trend found in other studies. Except for /./, all stops confirm the ‘rise-fall dichotomy’. Quite similar results were reported by Umeda (1981, p. 350), who found that the F0 of a vowel following a voiceless stop starts high and drops sharply, whereas after a voiced stop F0 starts at a relatively low frequency and rises gradually. 
The F0 contours of [u:] following voiced and voiceless unaspirated stops are comparable with those of [i:], with the exception of /p/, which gradually raises F0 instead of lowering it. This runs counter to the ‘rise-fall dichotomy’. All the other voiceless unaspirated stops show the falling pattern from the raised onset. Except for /k/, the fall is not steep: F0 lowers slightly and then rises. /k/ causes a somewhat steeper fall. All the voiced stops, on the other hand, generate a continuous rising pattern of F0 from a lowered onset. According to Shimizu (1989), F0 curves after voiced stops also show a continuous rising pattern in Japanese and Hindi. 
To conclude, voicing is a distinctive feature in Pahari, and the results show that the voicing of initial stops has a strong influence on the F0 of the following vowel. There is a significant difference between the F0 values of the three vowels following voiced and voiceless stops. At the onset, F0 is raised by voiceless unaspirated stops and lowered by voiced stops. Moreover, the F0 contours of the vowels are also shaped by the preceding stop. The F0 track of the vowel [a:] falls after both types of stops, which aligns with the ‘no-rise view’. The F0 paths of [i:] and [u:], on the other hand, rise continuously after voiced stops and fall after voiceless stops, confirming the ‘rise-fall dichotomy’, with two unexpected results for /.i:/ and /pu:/: these two F0 paths rise constantly instead of falling. 
Pahari voiceless and voiced stops thus induce a high and a low pitch, respectively, on the following vowel. This pitch distinction in prevocalic position is taken to change the tone of the vowel and hence of the entire utterance. Tonal variation is considered one of the factors responsible for tonogenesis in a language. Pahari stops with a voicing distinction have a strong tone-inducing effect. 
References 
Ahmed, S. (2002). A Comparative study of English and Pahari (Unpublished M.A. Thesis). National University of Modern Languages, Islamabad. 
Bradshaw, M. M. (1999). A crosslinguistic study of consonant-tone interaction. Doctoral dissertation, The Ohio State University. 
Carne, M. J. (2008). Intrinsic consonantal F0 perturbation in 3-way VOT contrast and its implications for aspiration-conditioned tonal split: Evidence from Vietnamese. In INTERSPEECH, 2294-2297. 
Chavez-Peon, M. E. (2005). The effects of implosives and prenasalized stops on pitch in Shona. The Journal of the Acoustical Society of America, 117(4), 2461-2461. 
Chen, Y. (2011). How does phonology guide phonetics in segment–F0 interaction?. Journal of Phonetics, 39(4), 612-625. 
Dutta, I. (2007). Four-way stop contrasts in Hindi: An acoustic study of voicing, fundamental frequency and spectral tilt. Ph.D. dissertation, University of Illinois at Urbana Champaign. 
Francis, A., Ciocca, V., Wong, V., & Chan, J. (2006). Is fundamental frequency a cue to aspiration in initial stops?. The Journal of the Acoustical Society of America, 120(5), 2884. 
Hanson, H. M. (2009). Effects of obstruent consonants on fundamental frequency at vowel onset in English. The Journal of the Acoustical Society of America, 125(1), 425-441. 
House, A. S., & Fairbanks, G. (1953). The influence of consonant environment upon the secondary acoustical characteristics of vowels. The Journal of the Acoustical Society of America, 25(1), 105-113. 
Karnai, M. K. (2007). Parahri aur Urdu: Ik Taqabali Jaiza. Islamabad: National Language Authority. 
Khan, A. Q. (2014). An Acoustic Study of Pahari Oral Vowels. Lingua Posnaniensis, 56(2), 29-40. 
Khan, A. Q. (2012). Phonology of Pahari: A study of segmental and suprasegmental features of Poonch Dialect (Unpublished Doctoral Dissertation). University of AJ&K, Muzaffarabad. 
Khan, A. Q. (2011). An Acoustic Study of VOT in Pahari Stops. Kashmir Journal of Language Research, 14(1), 111-128. 
Kirby, J., & Ladd, D. R. (2015). Stop voicing and F0 perturbations; evidence from French and Italian. 18th International Congress of Phonetic Sciences, Glasgow, UK. 
Lea, W. A. (1973). Influences of phonetic sequences and stress on fundamental frequency contours of isolated words. The Journal of the Acoustical Society of America, 53(1), 346. 
Lehiste, I., & Peterson, G. E. (1961). Some basic considerations in the analysis of intonation. The Journal of the Acoustical Society of America, 33(4), 419-425. 
K“ D”B Institute of History and Philology Academia Sinica 51, 1-13. 
Lofqvist, A. (1975). Intrinsic and extrinsic F0 variations in Swedish tonal accents. Phonetica, 31(3-4), 228-247. 
Masica, C. P. (1991). The indo-aryan languages. Cambridge University Press. 
Mirza, J.S. (1990). F0 perturbation effects of prevocalic stops on Punjabi tones. Proceedings of the Australasian Speech Science and Technology Association, 90, 400-405. 
Mohr, B. (1971). Intrinsic variations in the speech signal. Phonetica, 23(2), 65-93. 
Nigram, R. C. (1972). Language handbook on mother tongue in census (Census of India, 1971). New Delhi: Government of India (Census Centenary Monograph No.10). 
Ohde, R. N. (1984). Fundamental frequency as an acoustic correlate of stop consonant voicing. The Journal of the Acoustical Society of America, 75(1), 224-230. 
Pommerening, K., & Volkner, M. Consonant-dependent F0 variation. Retrieved from http://www.ipds.uni-kiel.de/pub_exx/aipuk/aipuk_37/37_4_PommereningVoelkner.pdf 
Sapir, S. (1989). The intrinsic pitch of vowels: theoretical, physiological, and clinical considerations. Journal of Voice, 3(1), 44-51. 
Schiefer, L. (1986). F0 in the production and perception of breathy stops: Evidence from Hindi. Phonetica, 43(1-3), 43-69. 
Shakil, M. (Ed.).(2004). Chitka. International Pahari Literary Society. 
Shimizu, K. (1989). A cross-language study of voicing contrasts of stops. America, 66(4), 1001-1017. 
Silverman, K. (1986). F0 segmental cues depend on intonation: The case of the rise after voiced stops. Phonetica, 43(1-3), 76-91. 
Umeda, N. (1981). Influence of segmental factors on fundamental frequency in fluent speech. The Journal of the Acoustical Society of America, 70(2), 350-355. 
Whalen, D., Abramson, A., Lisker, L., & Mody, M. (1993). F0 gives voicing information even with unambiguous voice onset times. The Journal of the Acoustical Society of America, 93(4), 2152. 
Wong, Y. W., & Xu, Y. (2007). Consonantal perturbation of F0 contours of Cantonese tones. In Proceedings of the 16th International Congress of Phonetic Sciences, 1293-1296. 
Xu, C. X., & Xu, Y. (2003b, April). F0 perturbations by consonants and their implications on tone recognition. Acoustics, Speech, and Signal Processing, Proceedings. (ICASSP'03). IEEE, 1, 1-456. 
Xu, Y. (2006). Principles of tone research. Proceedings of International Symposium on tonal aspects of languages, 3-13. 
 
 
WORD STRESS SYSTEM OF THE SARAIKI LANGUAGE 
Firdos ATTA 
Lasbela University of Agriculture, Water and Marine Sciences, Pakistan 
firdosmalghani@gmail.com 
Abstract 
This study presents a first Optimality-Theoretic (OT) exploration of Saraiki word stress. Words in Saraiki are mostly short, and secondary stress plays no role here. Saraiki stress is quantity-sensitive, so a distinction must be made between short and long vowels, and between light and heavy syllables. A metrical foot can consist of one heavy syllable, two light syllables, or one light and one heavy syllable. Foot construction proceeds from right to left in prosodic words. The foot is trochaic and the last consonant in Saraiki words is extrametrical. These generalizations are best captured by using metrical phonology first and Optimality constraints later on. 
Keywords: Saraiki, quantity-sensitive, Optimality Theory, trochaic structure, Metrical Phonology 
Povzetek 
Ta študija predstavlja analizo besednega naglasa v sarajskem jeziku (sarajščina, angl. Saraiki) v okviru optimalnostne teorije. Besede v sarajščini so večinoma kratke; sekundarni stres ne igra nobene vloge. Besedni naglas je količinsko občutljiv, razlikujemo med kratkimi in dolgimi samoglasniki ter lahkimi in težkimi zlogi. Stopica je lahko sestavljena iz enega težkega zloga, dveh lahkih zlogov ali enega lahkega in enega težkega zloga, v zapisani prozodični besedi se začnejo na desni in se širijo proti levi. Stopica je vedno trohejska in zadnji soglasnik sarajski soglasnik v prozodični besedi je zunaj metričen. Omenjene posplošitve je najbolje zajeti tako, da najprej uporabimo metrično fonologijo in zatem omejitve optimalnostne teorije. 
Ključne besede: Saraiki, količinska občutljivost, optimalnostna teorija, trohejska zgradba, metrična fonologija 
1 Introduction 
The analysis of stress remains a matter of ‘hot debate’ in phonology. Stress refers to the phonetic prominence of one or more syllables in the prosodic word. One syllable in the prosodic domain of a word often seems more prominent than others, and this prominence can be signalled by different phonetic cues: pitch, length, and loudness, or a combination of these. Cross-linguistic variation makes stress complicated to analyze: relevant factors include, among others, the stress domain, syllable weight, the role of edges, and whether or not secondary stress occurs (see Beckman, 1986; Halle & Vergnaud, 1987; Hayes, 1982, 1995, among many others). In the past, such factors were analyzed by ‘parameter settings’ (Hayes, 1980), but this approach has largely been replaced by OT constraints taking over these functions. 
Kager (1999) lists several cross-linguistic properties of word stress: (i) culminativity, i.e. words tend to have only a single peak, (ii) demarcativity, i.e. stress is usually located at a word margin, (iii) rhythmicity, i.e. stress usually alternates and (iv) quantity-sensitivity, which refers to the fact that in some languages a heavy syllable in a word (i.e. a syllable with a long vowel, or a closed syllable) attracts stress. In other, quantity-insensitive languages, weight is irrelevant for stress assignment. Quantity-insensitive stress can be further divided into two categories: either stress is fixed on some syllable at or near the edge or it is rhythmically assigned. Tryon (1970) provides an example from the Australian language Maranungku, which has a rhythmic stress pattern. In this language, primary stress is located on the first syllable and secondary stresses are assigned to odd-numbered syllables thereafter. In some cases, a final syllable is always stressless, for example in Pintupi (Hansen & Hansen, 1978). 
A wide variety of stress systems has been reported, both fixed and free. Turkish is one of the documented languages with fixed primary stress on the final syllable of the word (Inkelas, 1999; Sezer, 1981). Likewise, Finnish places stress on the initial syllable (Anttila, 1997), without taking syllable weight or syllable structure into account. Such languages are thus quantity-insensitive and have an edge-oriented stress system. However, there are also languages with weight-sensitive edge-oriented stress systems, such as Murik (Kager, 2004). In free stress systems, stress is not tied to a fixed position within the word. In such languages, morphology may influence the prosodic structure, as in Pashto (Shafeev, 1964). Saraiki appears to be an edge-oriented, quantity-sensitive stress language. Not all details are known, and the influence of morphology has not yet been well analyzed. The phonetic cues of stress also do not appear to be quite the same as in a stress-timed language like English, where pitch, duration, and intensity are considered the basic cues of stress. In Saraiki, by contrast, the phonetic cues of stress are ‘pitch rise and rising intensity’ (Atta, van de Weijer, and Zhu, Accepted). This article should therefore be seen as a first step towards a phonological analysis of Saraiki stress. Other studies in the literature address different aspects of the Saraiki language (see Atta, 2019; Shackle, 1976). Saraiki belongs to the Indo-Aryan family, and this study is limited to one variety, central Saraiki, spoken in Pakistan. The OT constraints we use are discussed in the following sections. 
This article is arranged as follows. The next section gives a brief introduction to Saraiki syllable structure, with a specific view to its function in stress assignment. The following part covers the analysis of Saraiki word stress within the OT framework. The last section concludes.  
2 Syllable structure and the status of moras in Saraiki 
The role of syllable structure and syllabification is fundamental in shaping the stress system of quantity-sensitive languages. Saraiki is rich in syllable structure; the following are the possible syllable structures in Saraiki: 
 
(1)  V/VV      /./ ‘come’          /.o/ ‘come in’
     CV/CVV    /tu~/ ‘you’         /piu/ ‘father’
     VC/VVC    /u.h/ ‘camel’       /.okh/ ‘difficulty’
     CVC       /.e./ ‘sit’         /kh.s/ ‘snatch’
     CCVC      /.ruk/ ‘run’        /.ru./ ‘break’
     CVCC      /limb/ ‘plaster’    /p.nd./ ‘distance’
     CCV       /kh.i/ ‘stop’       /kri~/ ‘will do’
     VCC       /.mb/ ‘mango’       /uns/ ‘love’
     CCVCC     /.r.x./ ‘tree’      /d.rust./ ‘right’
 


 
Saraiki prohibits ‘CCC’ in initial and final position and structures of ‘VVCC’ or ‘CCVV’ are not permitted. What is crucial is that Saraiki has a phonemic contrast between long and short vowels. In the examples below long vowels are indicated by length mark (:) while short vowels are given without this length mark. The following examples illustrate this: 
 
(2)  pi:. ‘pain’      pi. ‘saint’
     t.u:l ‘long’     t..l ‘determined’
     m.l ‘goods’      m.l ‘dirt’
 


 
In English, the (phonetic) contrast between long and short vowels is often represented by a quality difference (tense/lax)1 without any length mark, and in Saraiki the peripheral vowels are longer than the central vowels (Shackle, 1976). In Saraiki, however, vowels are differentiated by quantity (long/short) and written with length marks. Long vowels in Saraiki thus have two morae and short vowels one. This matches the common treatment in metrical phonology, where short vowels have one mora and long vowels have two (Hayes, 1995): 
1 The vowel distinction is normally described as long versus short in British English, whereas in North America the terms tense and lax are common. In English, long-short and tense-lax go together; in other languages, the two may be independent.  
 
(3) 
  

 


 
Finally, in most languages closed syllables count as equally heavy as syllables with long vowels. In terms of mora, both are therefore represented with two morae: 
 
(4) 
  

 


 
Hence, Saraiki has a potential weight contrast between light and heavy syllables. Quantity here refers to either the weight or the length of the syllable. In metrical phonology, moraic theory (Hayes, 1982) is widely used to assign weight to the syllable, as weight is a crucial element in stress assignment in many quantity-sensitive languages. This theory holds that, in syllable structure, the onset does not carry weight, the nucleus always does, and the coda may. In this way, light syllables are distinguished from heavy ones (McCarthy, 1986). As suggested by McCarthy, open syllables with a short vowel are always light (i.e. have one mora), whereas closed syllables may be heavy or light depending on the language: in some languages they count as heavy (two morae), in others as light (one mora). Languages in which they are heavy are said to have “weight by position”. 
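The weight assignment just described can be sketched in a few lines of Python. This is a minimal illustration, not part of the analysis itself: the function names and the encoding of syllables as CV skeletons (e.g. 'CVC') are assumptions made here, and giving every coda consonant one mora follows the three-mora representation of CVCC syllables assumed in this article.

```python
def count_moras(skeleton: str, weight_by_position: bool = True) -> int:
    """Count the moras of one syllable given as a CV skeleton, e.g. 'CVC'."""
    first_v = skeleton.find("V")
    last_v = skeleton.rfind("V")
    if first_v == -1:
        raise ValueError("a syllable needs a vowel nucleus")
    nucleus = skeleton[first_v:last_v + 1]
    if set(nucleus) != {"V"} or set(skeleton) - {"C", "V"}:
        raise ValueError("expected a single syllable made of C and V slots")
    coda = skeleton[last_v + 1:]
    moras = len(nucleus)            # every vowel slot carries one mora
    if weight_by_position:
        moras += len(coda)          # coda consonants are moraic; onsets never are
    return moras


def weight(skeleton: str, weight_by_position: bool = True) -> str:
    """Classify a syllable as light ('L', one mora) or heavy ('H', more)."""
    return "L" if count_moras(skeleton, weight_by_position) == 1 else "H"


for syllable in ("CV", "CVV", "CVC", "CCVC", "CVCC"):
    print(syllable, count_moras(syllable), weight(syllable))
```

Calling `weight("CVC", weight_by_position=False)` models a language in which closed syllables count as light.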
Concerning syllable structure in Saraiki, it is of interest that no second syllable (which is usually also the final syllable) lacks an onset. Sometimes gemination occurs to satisfy this onset requirement, for instance in /.mm../ ‘mother’ and /.bb./ ‘father’. Gemination is also sometimes found where it satisfies stress requirements (see Shackle, 1976, p.27).  
Since syllables in Saraiki are either open or closed in moraic representation, Saraiki can differentiate syllables in terms of their weight, based on the phonemic contrast between long and short vowels and syllable structure.  A moraic representation to clarify the idea is illustrated here: 
 
(5) 
  

 


 
The above moraic representation thus suggests that a monomoraic syllable is light (L) and a bimoraic one is heavy (H). Standard German (Hall, 2002) additionally distinguishes syllables with three or more moras, which are known as super-heavy syllables.  
Some elements do not take part in prosodic structure; such units are considered extrametrical in the initial or final position of the prosodic word. The concept of extrametricality was first introduced by Liberman and Prince (1977) and later elaborated comprehensively by Hayes (1995):
a) Elements like a syllable, a foot, or a segment can be extrametrical.
b) Extrametricality occurs at the right or left edge of a word.
c) The right edge is unmarked for extrametricality. 
Though these rules apply in many languages, such as English (Hayes, 1982) and Arabic (McCarthy, 1979), questions may be raised in some situations. For instance, in a quantity-sensitive language, the last C in a CV.CVC structure may be extrametrical, but the last V (or vowel mora) in CV.CVV may not be, although the two final syllables are equal in weight. If the right edge is unmarked for extrametricality, then in a trochaic language a mora of a final VV might be extrametrical, as it has no role in the prosodic structure. We do not try to resolve this issue here, as it is beyond the scope of our study and requires further theoretical investigation. Moreover, Saraiki has no such final syllable structures that would give rise to the ambiguity. For the time being, we follow the existing practice regarding extrametricality. We thus need to be careful about the role of extrametricality in different languages, and we will examine its role carefully in Saraiki. 
Taking into consideration McCarthy’s (1979) observations on syllable weight in Arabic, we assume that in Saraiki all open syllables are light and closed syllables may be light or heavy, whereas closed syllables with VV (long vowel or diphthong) or VCC are heavy. The moraic representations of some Saraiki words are presented below: 
 
(6) 
  

 


 
The last example requires some discussion, as it contains three moras. According to Hayes (1980), a foot can contain maximally two moras. The point of interest here is the super-heavy syllable in Saraiki. Since moraic theory allows only two moras in a foot, the last mora is considered extrametrical. The above example also shows that in Saraiki the right edge of the prosodic word is extrametrical. 
Biconsonantal clusters are absent in word-medial position (especially as a coda, although they may form the onset of the next syllable). When such clusters occur medially, they are split between two syllables, e.g. /kh...../, [*kh.....] ‘cot’ and /su.`..~..r.~/, [*su.`..~..r.~] ‘moringa tree’. An interesting fact about medial syllable structure is that a consonant cluster becomes the onset of the following syllable only if the preceding syllable has a long (peripheral) vowel as its nucleus. The division of medial consonant clusters thus suggests that extrametricality might play a role at the right edge. We will therefore assume that in Saraiki the last consonant in CVC and VCC is extrametrical (<C>). This means that in Saraiki we find only two types of syllables, light (L) and heavy (H), and their moraic representations look like this: 
 
(7) 
  

 


 
After having established the above syllable structure, we move on towards foot construction in the next section. 
2.1 Foot construction and stress assignment in Saraiki 
In prosodic structure, the foot is crucial for stress assignment. There are two main types of feet, trochees (strong weak) and iambs (weak strong), which are further divided into subtypes, as proposed by Hayes (1995). Here ‘L’ denotes a light syllable and ‘H’ stands for a heavy syllable. 
 
(8)  (a)  (σ `σ)   (L L) (L `H)   syllabic iamb
     (b)  (µ `µ)   (H)            moraic iamb
     (c)  (`σ σ)   (L L) (H H)    syllabic trochee
     (d)  (`µ µ)   (H)            moraic trochee
 


 
Feet are represented by parentheses and the stress mark in the foot indicates its trochaic or iambic nature (Cohn & McCarthy, 1998; Selkirk, 1980). The hierarchy of prosodic categories is given as:  
 
(9) 
 Prosodic word 






 


 
Monosyllabic words, whether light or heavy, are always stressed; there is therefore no need to list such words for stress assignment in Saraiki.  
Now let us turn to disyllabic words. Data concerning stress is given in (10). Note that a trill sometimes occurs as a free variant of tap/flap and as a syllabic consonant after dental plosives (Atta, van de Weijer, and Zhu, 2020). Therefore, one can observe these three forms in the data below. 
 
(10) a. Disyllabic words with CV.CV
     `p..s.      ‘side’       (`L L)
     `p..l.      ‘cold’       (`L L)
     `ph..l.     ‘door’       (`L L)
     `kh..l.     ‘ford’       (`L L)

     b. Disyllabic words with VCCCV or VVCV or CVCCV
     `i~:.d..    ‘his/her’    (`H H)
     `uth.thi    ‘wake up’    (`H H)
     `us.t..i    ‘clever’     (`H H)
     `it..l.     ‘so much’    (`H H)
     `kh.....    ‘cot’        (`H H)
 


 
From the inspection of the above data, we noted that the foot type of Saraiki is a moraic trochee. Moraic representations of examples from (10) are given in (11). 
 
(11) 
 (a) 
 has two moras; 
 

 
  
  

 
 
 (b) 
 has three moras (2+1)  
 
 
  
  

 
 
 (c) 
 three moras (1+2) 
 
 
  
  

 


As expected, foot structure is quantity-sensitive in Saraiki, since heavy syllables construct a foot by themselves. These final heavy syllables also allow us to fix the direction of foot construction: the process starts from the right edge of the prosodic word, as is clear from the syllable structure of the disyllabic words below: 
Disyllabic words with CV.CVV<C> or CVC.CVV<C>
ph..`loo <.>    ‘explore’       L (`H)
m..`roo<.>      ‘twist’         L (`H)
s.k.`roo<.>     ‘crispy’        H (`H)
m.r.`d...<r>    ‘dead’          H (`H)
...`loo<.>      ‘waterspout’    L (`H)
To summarize so far, the following characteristics of Saraiki stress have been discovered: 
1. Saraiki is a quantity-sensitive language, since heavy syllables cannot serve in the weak position of a stress foot.
2. In the case of two light syllables, stress falls on the left (`L L).
3. If the foot structure is (L H), the heavy syllable will attract stress.
4. Syllables with schwa or light syllables never attract main stress, whereas heavy syllables always do.
5. The foot is trochaic and feet are assigned from right to left.
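The generalizations summarized above can be read as a small procedure. The Python sketch below is a simplified reading made for illustration, not the author's algorithm: it assumes one main stress per word, takes syllable weights ('L' or 'H') as input with the word-final consonant already set aside as extrametrical, and uses the hypothetical function name `stressed_syllable`.

```python
from typing import List


def stressed_syllable(weights: List[str]) -> int:
    """Return the 0-based index of the syllable that carries main stress.

    weights holds one 'L' (light) or 'H' (heavy) per syllable, computed
    after the word-final consonant has been treated as extrametrical.
    """
    if not weights or any(w not in ("L", "H") for w in weights):
        raise ValueError("weights must be a non-empty list of 'L'/'H'")
    if len(weights) == 1:
        return 0                     # monosyllables are always stressed
    if weights[-1] == "H":
        return len(weights) - 1      # a final heavy syllable is a foot by itself: (`H)
    return len(weights) - 2          # otherwise the penult heads the rightmost trochee


# Weight patterns attested in the data sets of this section.
for pattern in (["L", "L"], ["L", "H"], ["H", "H"], ["L", "L", "L"]):
    print(" ".join(pattern), "-> stress on syllable", stressed_syllable(pattern) + 1)
```

Because feet are built from the right edge, only the last two syllables need to be inspected in this simplified version.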


If these considerations are correct, we predict that stress would fall on the medial syllable in trisyllabic words. Data for such words are given in (12), noting that there are far fewer examples of this than disyllabic words. 
 
(12) Trisyllabic words with V
     u.`...l..       ‘haste’           L (`L L)
     su.`..~..r.~    ‘moringa tree’    L (`L L)
     ...`...ri       ‘broom’           L (`L L)
 


 
We see that our prediction is borne out. In fact, concerning trisyllabic words, there are no counterexamples (e.g. with different syllable structures) in Saraiki.  
 
(13) 
  

 


 
Now let’s turn to the OT analysis of these examples. 
3 OT analysis of Saraiki Stress 
This section shows how the Saraiki stress system is captured in Optimality Theory. The above characteristics of Saraiki stress can be ‘translated’ into a metrical constraint ranking. Before analyzing Saraiki stress, we introduce some of the relevant constraints. For example, in languages in which stress is sensitive to weight, the constraint WSP (Weight-to-Stress Principle) is ranked high. This constraint is defined as follows: 
 
(14) 
 WSP
 


 
Likewise, foot construction (whether based on syllables or moras) is an essential part of stress assignment. Feet typically consist of two units (see also above). This is captured by the constraint FOOT BINARITY (FT.BIN):  
 
(15) 
 FOOT BINARITY
 


 
We saw that consonant extrametricality played a role in Saraiki stress. When a language has extrametrical units, it violates WBP and MAX-IOµ and satisfies *FINAL-C-µ and *3µ (only for VCC# and VVC# but not for VC#). All these constraints are defined as follows: 
 
(16) *3µ           “no three moras in one syllable”
     WBP           “a coda consonant is moraic” (Hayes, 1989)
     *FINAL-C-µ    “the final mora is extrametrical” (Hayes, 1989)
     MAX-IOµ       “output must contain maximum input moras”
 


 
We start our analysis with monosyllabic words. Keeping in mind the basic elements of prosodic structure (µ) as given above, we assumed that the word limb ‘plaster’ has two moras. However, OT is free to consider other candidates (‘freedom of generation’), e.g. with three moras (i.e. without extrametricality), with two feet, or even without stress. Such candidates will fail because of other constraints, in particular *3µ and a general constraint that requires prosodic structure. The purpose of analysing monosyllabic words is to clarify the status of moraic feet in Saraiki. 
 
(17)
| Input: /liµmµbµ/  | *3µ | FT. BIN | *FINAL-C-µ | MAX | WBP |
| a.   (liµmµbµ)    | *!  | *       | *          |     |     |
| b. ☞ (liµmµ)<b>   |     |         |            | *   | *   |
 


 
The first candidate violates the high-ranked constraints and is thus excluded. The second candidate, although it incurs two violation marks, emerges as the winner. This suggests that the two constraints it violates are ranked low in the prosodic constraint hierarchy of Saraiki. The high position of *3µ confines structures like -VVC and -VCC- to word-medial position as extrametricality only occurs at the right edge. The examples from Saraiki, /us.t.ri/ ‘clever’ and /.o.t.ra/ ‘poor’, reflect the position of the *3µ constraint in the framework of OT. 
 
(18)
| Input:               | *3µ | FT. BIN |
| a.   ( uµsµt.µ.riµ)  | *!  |         |
| b. ☞ ( uµsµ.t.riµ)   |     |         |
| Input /.µoµt.raµ/    |     |         |
| a. ☞ (.µoµ.t.raµ)    |     |         |
| b.   (.µoµt.µ.raµ)   | *!  |         |
  
 


 
Let’s now analyze a disyllabic word with a simple CV.CV structure. Rhythmically, this simple structure has two possible outputs, i.e. stress on the ultimate or the penultimate syllable. Since the stress is on the left syllable, a left-headed foot must be involved. OT expresses this with a single constraint, ‘FOOT-FORM trochee’: 
 
(19) 
 FOOT
 


 
Since we have already argued that the constraint FT.BIN is ranked high, the interaction of the two constraints FT.BIN and ‘FOOT-FORM trochee’ is illustrated as follows: 
 
(20) 
 FOOT BINARITY
 

 
| Input /p.µs.µ/      | FT.BIN | FT-FORM trochee |
| a.   (`p.µ) s.µ     | *!     |                 |
| b. ☞ (`p.µ.s.µ)     |        |                 |
| c.   (p.µ`s.µ)      |        | *!              |
 


 
Let’s inspect why some candidates are defeated. Candidate (c) incurs a violation of the FT-FORM constraint, whereas candidate (a) fatally violates FT.BIN. Candidate (b) satisfies both constraints and emerges as the winner. If we compare the winners of the above two tableaux, a slight difference in foot formation can be noted: the winner in (17), (liµmµ)<b>, obeys moraic foot binarity (the foot consists of two morae, with stress on the left mora), while the winner in (20), (`p.µ.s.µ), obeys both syllabic and moraic binarity. One strong reason for this is that there are no monosyllabic words with a single mora (i.e. a short vowel) in Saraiki. This follows from the analysis proposed so far. Since a prosodic word has a foot, and a foot is binary (either in terms of moras or of syllables), a monosyllabic word must have two moras. It then also follows that a word consisting of a closed syllable (a short vowel followed by a consonant) is bimoraic. This shows that Saraiki is a language with “weight-by-position” (cf. above). 
Let’s test our analysis so far on another category of disyllabic words, with structures like CVC.CVVC or CV.CVVC. These data are special, as their analysis helps to address several issues related to Saraiki stress. The first notable point is stress assignment in such words, i.e. (H H) and (L H). Previously we saw only one kind of word, i.e. (L L), where no conflict arose; the following discussion deals with words of structures other than (L L). Examples of such structures and their moraic representations are given below: 
 
(21) ph..`loo <.>    ‘explore’    L (`H)
     m..`roo<.>      ‘twist’      L (`H)
     s.k.`roo<.>     ‘crispy’     H (`H)
     m.r.`d...<r>    ‘dead’       H (`H)
 


 
(22) 
 Moraic representation 
 

 
  

 


Recall the characteristics of Saraiki stress: it appeared to be quantity-sensitive, which means that a heavy syllable attracts stress (the constraint WSP is ranked high). Hence, in quantity-sensitive languages stress assignment is easy to predict when syllables are of unequal weight (L H), whereas with equal weight (H H) the syllables compete. In the first tableau, the word ‘limb’ would violate foot binarity, so the last C is considered extrametrical to avoid this violation. This indicates that the last C in CVVC and CVC is considered extrametrical in Saraiki, which means that such structures violate WBP and MAX-IO-µ, as shown above in (17). Two characteristics, quantity sensitivity and trochaic stress, suggest the priority of right-edge alignment in ‘LH’ structures. The alignment constraint for the right edge in OT is ALL-FT-R, and the constraint PARSE-SYL demands that all syllables be parsed into feet; these are given in (23). Let’s first take a word with (L H) structure for analysis: 
 
(23) 
 ALL-FT-R 
 “all feet must be right-aligned”
 

 
 PARSE-SYL 
 “syllables must be parsed into feet” 
 
 
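| Input: /..µ.loµµ.µ/ | *3µ | FT.BIN | FTFORM | *FINAL-C-µ | ALL FT–R | WSP | MAX-IO-µ | PARSE-SYL | WBP |
|---|---|---|---|---|---|---|---|---|---|
| a. ...µ.(`loµµ)<.> |  |  |  |  |  |  | * | * | * |
| b. (..µ.`loµµ)<.> |  |  | *! |  |  |  | * |  | * |
| c. (..µ).loµµ<.> |  | *! |  |  | * |  | * | * | * |
| d. (`..µ.loµµ)<.> |  |  |  |  |  | *! | * |  | * |
| e. (..µ.`loµµ.µ) | *! |  | * | * |  |  |  |  |  |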
 


 
The first candidate emerges as optimal: it violates only three low-ranked constraints. This winner also suggests that a foot need only satisfy binarity at either the syllabic or the moraic level. Satisfying FT-FORM requires building the foot at the moraic level. As the extrametrical consonant is associated with the following syllable in Saraiki, the resulting structure resembles a sequence of a stressed and an unstressed syllable. The optimal candidate therefore shows that a binary foot is favored even at the cost of leaving the remaining syllable unparsed. The second candidate also has three violation marks but does not win, because one of them is a fatal violation of a high-ranked constraint. None of the other candidates survives under this constraint ranking either, as each bears a fatal violation. The constraint ranking so far is as follows:
 
(24) 
 *3µ,
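How the tableau for the (L H) word selects its winner can be sketched in a few lines of Python. The violation marks below are copied from that tableau and the candidates are referred to by their letters only. Treating the column order as a strict total ranking, and the names `RANKING`, `TABLEAU`, `profile` and `optimal`, are assumptions made for this illustration rather than part of the analysis.

```python
# Constraint order follows the tableau columns; reading it as a strict
# total ranking is an assumption of this sketch.
RANKING = ["*3µ", "FT-BIN", "FT-FORM", "*FINAL-C-µ", "ALL-FT-R",
           "WSP", "MAX-IO-µ", "PARSE-SYL", "WBP"]

# Violation marks per candidate, copied from the tableau.
TABLEAU = {
    "a": {"MAX-IO-µ": 1, "PARSE-SYL": 1, "WBP": 1},
    "b": {"FT-FORM": 1, "MAX-IO-µ": 1, "WBP": 1},
    "c": {"FT-BIN": 1, "ALL-FT-R": 1, "MAX-IO-µ": 1, "PARSE-SYL": 1, "WBP": 1},
    "d": {"WSP": 1, "MAX-IO-µ": 1, "WBP": 1},
    "e": {"*3µ": 1, "FT-FORM": 1, "*FINAL-C-µ": 1},
}


def profile(marks, ranking):
    """Violation counts ordered from the highest- to the lowest-ranked constraint."""
    return tuple(marks.get(constraint, 0) for constraint in ranking)


def optimal(tableau, ranking):
    """Strict domination: the lexicographically smallest profile wins."""
    return min(tableau, key=lambda cand: profile(tableau[cand], ranking))


print(optimal(TABLEAU, RANKING))        # a
print(sum(TABLEAU["a"].values()),       # 3
      sum(TABLEAU["b"].values()))       # 3
```

Candidates ‘a’ and ‘b’ carry the same number of marks, so the total count does not decide between them; what matters is that the highest-ranked constraint violated by ‘b’ outranks every constraint violated by ‘a’.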
 


 
With this constraint ranking in place, we now examine a word of structure [H H]. Such words have five possible foot structures: (i) (`H)(`H), (ii) H(`H), (iii) (H `H), (iv) (`H)H, and (v) (`H H). Of these, the preferred structure in Saraiki is H(`H), with the foot built at the moraic level. The question is on what basis the other structures are disfavored. As discussed earlier, Saraiki is trochaic, so structures that contradict this, such as (H `H), are categorically ruled out. The structure (`H)(`H) contains a stress clash, which the language avoids, and (`H)H violates ALL-FT-R and is therefore also excluded. For the remaining candidates, H(`H) and (`H H), there is no obvious reason for exclusion at the surface level. Stress assignment in quantity-sensitive languages is subject to both quantity and rhythm (Kager, 2004). Although quantity is the main factor that attracts stress in a quantity-sensitive language, in some situations rhythm comes into play, as in the case of (H H) structures. Extrametricality in Saraiki does not merely regulate foot structure; it also helps to determine the rhythmic structure of prosodic words, which have a strong-weak rhythm. To regulate such structures, OT provides the constraints RHTYPE-T (feet have initial prominence) and RH-CONTOUR (a foot ends in a strong-weak contour at the moraic level), because WSP alone cannot handle the situation. In fact, these rhythmic constraints are related to vowel quantity. WSP is decisive only when an ‘L’ syllable receives stress in the presence of an ‘H’; in ‘HH’ structures WSP is violated whichever syllable is stressed, so it cannot select one ‘H’ over the other in Saraiki. In the data above, stress falls only on long vowels and never on short vowels: in all the examples of the structures ‘CVC.CVVC’ and ‘CV.CVVC’, the short vowel is unstressed. A constraint ‘*LONG-Vunstressed’ therefore dominates WSP. With this constraint added, we turn to the next tableau:
 
(25) 
 *LONG
 “no short vowels stressed in the presence of long vowel” 
 

 
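| Input: /s.µkµ.roµµ.µ/ | *3µ-. | FT.BIN | FTFORM trochee | *FINAL-C-µ | All FT–R | *LONG-Vunstressed | WSP | MAX-IO-µ | PARSE-SYL | WBP |
|---|---|---|---|---|---|---|---|---|---|---|
| a. .s.µkµ.(`roµµ)<.> |  |  |  |  |  |  |  | * | * | * |
| b. (s.µkµ.`roµµ)<.> |  |  | *! |  |  |  | * | * |  | * |
| c. (`s.µkµ).roµµ<.> |  |  |  |  | *! |  |  | * | * | * |
| d. (`s.µkµ.roµµ)<.> |  |  |  |  |  | *! | * | * |  | * |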
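The reasoning that narrows the five possible parses of an [H H] word down to H(`H) can be sketched as a sequence of filters. The encoding below is hypothetical: a parse is a list of units (feet or unparsed syllables), each syllable records only whether its vowel is long and whether it is stressed, and the first syllable is taken to be CVC and the second CVV, as in the word of the tableau above. The label `*CLASH` is an assumed name for the ban on adjacent stresses, which the text describes without naming.

```python
from typing import NamedTuple, Tuple


class Syl(NamedTuple):
    long_vowel: bool
    stressed: bool = False


class Unit(NamedTuple):
    syllables: Tuple[Syl, ...]
    is_foot: bool


def foot(*syls): return Unit(tuple(syls), True)
def unparsed(syl): return Unit((syl,), False)
def flat(parse): return [s for unit in parse for s in unit.syllables]


def violates_trochee(parse):
    # A trochaic foot carries its stress on the first syllable.
    return any(u.is_foot and not u.syllables[0].stressed for u in parse)

def violates_clash(parse):
    syls = flat(parse)
    return any(a.stressed and b.stressed for a, b in zip(syls, syls[1:]))

def violates_all_ft_r(parse):
    # Only the rightmost unit may be a foot.
    return any(u.is_foot for u in parse[:-1])

def violates_long_v_unstressed(parse):
    syls = flat(parse)
    return (any(s.long_vowel for s in syls)
            and any(s.stressed and not s.long_vowel for s in syls))


SHORT_V, LONG_V = False, True   # CVC first syllable, CVV second syllable
CANDIDATES = {
    "(`H)(`H)": [foot(Syl(SHORT_V, True)), foot(Syl(LONG_V, True))],
    "H(`H)":    [unparsed(Syl(SHORT_V)), foot(Syl(LONG_V, True))],
    "(H `H)":   [foot(Syl(SHORT_V), Syl(LONG_V, True))],
    "(`H)H":    [foot(Syl(SHORT_V, True)), unparsed(Syl(LONG_V))],
    "(`H H)":   [foot(Syl(SHORT_V, True), Syl(LONG_V))],
}
STRUCTURAL = [("FT-FORM (trochee)", violates_trochee),
              ("*CLASH", violates_clash),
              ("ALL-FT-R", violates_all_ft_r)]


def eliminate(candidates, checks):
    """Apply the checks in order; return the survivors and the reason each loser fell."""
    alive, ruled_out = dict(candidates), {}
    for name, violates in checks:
        for label in [lab for lab, parse in alive.items() if violates(parse)]:
            ruled_out[label] = name
            del alive[label]
    return list(alive), ruled_out


survivors, reasons = eliminate(CANDIDATES, STRUCTURAL)
print(survivors)   # ['H(`H)', '(`H H)']
winner, _ = eliminate({k: CANDIDATES[k] for k in survivors},
                      [("*LONG-Vunstressed", violates_long_v_unstressed)])
print(winner)      # ['H(`H)']
```

In this encoding the three structural checks leave exactly the two parses that the text describes as having no obvious reason for exclusion, and only the vowel-length condition separates them.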
 


 
Before identifying the optimal candidate, let us analyze the defeated candidates. The second candidate has four violation marks and is rejected because of its fatal violation of foot form, which does not conform to the requirements of the language. Candidate ‘c’ is eliminated by its violation of the alignment constraint (foot direction). The last candidate follows the basic prosodic structure of the language but still incurs a fatal violation: in Saraiki, stress is never assigned to a syllable with schwa, or to a syllable with a short vowel, in the presence of a long vowel. This pattern is also found in other languages, such as Dutch (van Oostendorp, 2012). Thus candidate ‘d’ cannot be the winner. The first candidate has three violation marks but wins. It does not incur fewer violations than every competitor, but it lacks any fatal violation. A direct comparison of candidates ‘a’ and ‘d’ shows that a single constraint decides between them: ‘*LONG-Vunstressed’, which selects the optimal candidate in such syllable structures. The role of WSP is thus problematic. Kager (2004) suggests that in an (H H) foot WSP is violated whether stress falls on the first syllable or on the second. This idea does not help in every situation: since a foot can carry only one stress, WSP is automatically violated by whichever heavy syllable remains unstressed. The violation could be avoided only if syllabic foot binarity were ranked low in the language, but in fact this constraint is ranked high in Saraiki. The deciding factor in Saraiki is therefore not WSP; rather, vowel quantity determines stress when syllables are of equal weight. We thus arrive at the final ranking hierarchy for disyllabic words in Saraiki:
 
(26) 
 *3µ,FT.BIN,FT
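The pairwise comparison of candidates ‘a’ and ‘d’ from the tableau in (25) can be made explicit with a short sketch. The marks are copied from that tableau. The constraint order is its column order, read as a strict ranking, and the helper name `deciding_constraint` is hypothetical; both are assumptions of this illustration.

```python
# Column order of the tableau in (25), read as a strict ranking (assumption).
RANKING = ["*3µ", "FT-BIN", "FT-FORM", "*FINAL-C-µ", "ALL-FT-R",
           "*LONG-Vunstressed", "WSP", "MAX-IO-µ", "PARSE-SYL", "WBP"]

# Violation marks of the two candidates, copied from the tableau.
CAND_A = {"MAX-IO-µ": 1, "PARSE-SYL": 1, "WBP": 1}
CAND_D = {"*LONG-Vunstressed": 1, "WSP": 1, "MAX-IO-µ": 1, "WBP": 1}


def deciding_constraint(first, second, ranking):
    """Return the highest-ranked constraint on which two candidates differ,
    together with the candidate it favors ('first' or 'second')."""
    for constraint in ranking:
        v_first = first.get(constraint, 0)
        v_second = second.get(constraint, 0)
        if v_first != v_second:
            return constraint, ("first" if v_first < v_second else "second")
    return None   # identical profiles: the ranking cannot separate them


print(deciding_constraint(CAND_A, CAND_D, RANKING))
# ('*LONG-Vunstressed', 'first')
```

All lower-ranked differences between the two candidates, including the PARSE-SYL mark that only ‘a’ incurs, are irrelevant once the higher-ranked constraint has made its choice.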
 


 
Since a constraint ranking represents the language as a whole, it should apply equally to all words of the language. As a first step, we extend it to words with three syllables. As discussed earlier, three-syllable words have a very simple structure and are limited in number. They are restricted to CV.CV.CCV and CV.CV.CV (there are no counterexamples among monomorphemic words) and bear stress on the penult. A word from this category is evaluated under the same constraint ranking in the tableau below:
 
(27) 
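| Input:/ | *3µ-. | FT.BIN | FTFO | All FT–R | *FINAL-C-µ | *LONG | WSP | MAX-IO-µ | PARSE | WBP |
|---|---|---|---|---|---|---|---|---|---|---|
| a. ...µ. (`..µ.riµ) |  |  |  |  |  |  |  |  | * |  |
| b. ( ..µ.`..µ).riµ |  |  | *! | * |  |  |  |  | * |  |
| c. ..µ. (..µ.`riµ) |  |  | *! |  |  |  |  |  | * |  |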
 


 
The constraint ranking is as appropriate for three-syllable words as it was for disyllabic structures. The analysis is as simple as the syllable structure itself. Candidate ‘a’ is optimal, as it incurs the fewest violations and none that is fatal. The other candidates bear fatal violations of high-ranked constraints and are thus rejected.
To summarize the above analysis, we conclude that Saraiki word prosody has the following constraint ranking and characteristics:
*3µ >> FT.BIN >> FT-FORMtrochee >> All-FT–R >> *FINAL-C-µ >> *LONG-Vunstressed >> WSP >> MAX-IO-µ >> PARSE-SYL >> WBP
29. Saraiki is a trochaic and quantity sensitive language. 

30. No short vowel is stressed in the presence of a long vowel. 

31. The right edge of the prosodic word must coincide with the right edge of the grammatical word. 

32. Words have only one foot: there is no secondary stress. 


4 Conclusion 
Saraiki word stress can be analyzed in a straightforward way both in metrical phonology and in OT. Both approaches lead to the conclusion that the language has a trochaic stress system and is quantity-sensitive: feet are constructed on the basis of moras. Consonant extrametricality operates at the right edge of the word. When more than one syllable could bear stress, the syllable with a long vowel wins. Finally, stress in morphologically derived words and at the sentence level requires further exploration.
References 
Anttila, A. (1997). Deriving variation from grammar: A study of Finnish genitives. In F. Hinskens, R. van Hout, & W. L. Wetzels (Eds.), Variation, change and phonological theory (pp. 35-68). Amsterdam & Philadelphia: John Benjamins Publishing.
Atta, F. (2019). Phonetics and Phonology of the Saraiki language: a descriptive exploration and an analysis from the perspective of Optimality Theory (Ph.D. dissertation), Shanghai International Studies University, Shanghai.    
Atta, F., Weijer, J. v. d., & Zhu, L. (2020). Illustrations of the IPA: Saraiki. Journal of the International Phonetic Association. 
Beckman, M. E. (1986). Stress and non-stress accent (Vol. 7). Holland/Riverton: Foris publications. 
Broselow, E. (1992). Parametric variation in Arabic dialect phonology. Paper presented at the Perspectives on Arabic linguistics IV, Amsterdam & Philadelphia. 
Cohn, A., & McCarthy, J. (1998). Alignment and parallelism in Indonesian phonology. Working Papers of the Cornell Phonetics Laboratory 12, 6, 53-137.  
Halle, M., & Vergnaud, J.-R. (1987). Stress and the cycle. Linguistic inquiry, 18(1), 45-84. 
Hall, T. A. (2002). The distribution of superheavy syllables in Standard German. The Linguistic Review, 19(4), 377-420.  
Hansen, K. C., & Hansen, L. E. (1978). The core of Pintupi grammar. Alice Springs: Institute for Aboriginal Development.
Hayes, B. (1980). A metrical theory of stress rules (Ph.D. dissertation). MIT. Published in 1985 by Garland Press, New York.
Hayes, B. (1982). Extrametricality and English stress. Linguistic inquiry, 13(2), 227-276.  
Hayes, B. (1989). Compensatory lengthening in moraic phonology. Linguistic inquiry, 20(2), 253-306.  
Hayes, B. (1995). Metrical stress theory: Principles and case studies. Chicago: University of Chicago Press. 
Inkelas, S. (1999). Exceptional stress-attracting suffixes in Turkish: representations versus the grammar. In R. Kager, H. v. d. Hulst, & W. Zonneveld (Eds.), The prosody-morphology interface (pp. 134-187). Cambridge: Cambridge University Press. 
Kager, R. (1999). Optimality Theory (1st ed.). Cambridge: Cambridge University Press. 
Kager, R. (2004). Optimality Theory (2nd ed.). Cambridge: Cambridge University Press.
Liberman, M. Y., & Prince, A. (1977). On stress and linguistic rhythm. Linguistic inquiry, 8(2), 249-336.  
McCarthy, J. (1979). On stress and syllabification. Linguistic inquiry, 10(3), 443-465.  
McCarthy, J. (1986). OCP effects: Gemination and antigemination. Linguistic inquiry, 17(2), 207-263.  
Prince, A., & Smolensky, P. (1993). Optimality Theory. London: Blackwell.
Selkirk, E. O. (1980). The role of prosodic categories in English word stress. Linguistic inquiry, 11(3), 563-605.  
Sezer, E. (1981). On non-final stress in Turkish. Journal of Turkish Studies, 5, 61-69.  
Shackle, C. (1976). The Siraiki language of central Pakistan: A reference grammar. London: School of Oriental and African Studies, University of London (SOAS).
Shafeev, D. (1964). A short grammatical outline of Pashto (Vol. 33). Bloomington: Indiana University. 
Tryon, D. T. (1970). An Introduction to Maranungku (Northern Australia). Canberra, Australian National University.  
Van Oostendorp, M. (2012). Quantity and the Three-Syllable Window in Dutch Word Stress. Language and linguistics compass, 6(6), 343-358.