Géraldine Walther
University Paris Diderot*

UDK 81'366

MEASURING MORPHOLOGICAL CANONICITY

1 INTRODUCTION

The question of regularity within morphological paradigms has previously been addressed by approaches falling within the scope of Canonical Typology. This work follows in the tradition of those approaches. It adopts a strictly lexicalist approach to morphology (Karttunen 1989; Bresnan 1982), and more specifically a lexeme-based approach in the sense of (Matthews 1974). As defined in (Fradin 2003) and references therein, the lexeme is considered to be an abstract entity, defined by its morphophonological properties, its meaning and its morphosyntactic category.1 Concrete forms are built by inflectional morphology. In particular, we follow the Word and Paradigm approach (Matthews 1972; Zwicky 1985; Anderson 1992; Aronoff 1994; Stump 2001) and adopt a representation of forms based on the notions of stems and exponents as used by (Robins 1959) and (Matthews 1974): stems are what remains of a form once all exponents have been removed from it.

In our approach, (concrete) forms are built from (abstract) lexemes by applying (form) realisation rules to them. Such rules are defined in inferential-realisational models such as Paradigm Function Morphology (PFM) (Stump 2001) or Network Morphology (Corbett/Fraser 1993).

The aim of this paper is to provide a means for assessing morphological canonicity through original measures developed within our new morphological framework PARSLI. In particular, we introduce measures for non-canonical phenomena such as heteroclisis, deponency, defectiveness and overabundance.

2 CANONICAL INFLECTION

The concept of canonical typology can be traced back to (Corbett 2003). It represents an attempt to better understand what exactly differs from a hypothetical ideal canonical state in the different occurrences of non-canonical phenomena.
Note that in this approach, canonical inflection must not be mistaken for prototypical inflection. Canonical inflection is rare. It corresponds to an ideal state that is seldom, if ever, met, but that defines a purely theoretical space against which deviant phenomena can be formally distinguished (Corbett 2007).

* Author's address: Laboratoire de Linguistique Formelle, Université Paris Diderot, 175 rue du Chevaleret, 75013 Paris, France. Email: geraldine.walther@linguist.jussieu.fr
1 In particular, we also consider morphosyntactic information such as argument structure to be specified within the lexicon.

Canonical inflection is supposed to represent complete regularity, as well as an ideal correspondence between form and function such that different forms can most efficiently be distinguished from each other. In particular, it is a notion that affects both the relation between the cells of a given lexeme's paradigm and the relation between corresponding cells belonging to two different lexemes' paradigms. Canonical inflection is thus defined through the comparison of both the cells of one given lexeme and the lexemes themselves. The cells of a given paradigm canonically share the lexeme's stem but vary in terms of exponence depending on the morphosyntactic feature structure2 expressed by the given form. Across lexemes, on the other hand, the stems canonically vary from one paradigm to the other, while the exponents remain the same for a given morphosyntactic feature structure. These relations are illustrated in Tables 1 and 2.

In this work, we consider an inflectional paradigm canonical if it satisfies the criteria given in Table 3 (Corbett 2007). To these criteria we add the ones in Table 4, which further define canonical paradigm shape.3 As stated for example in (Corbett 2007), deviation from these criteria leads to non-canonical paradigmatic properties. A paradigm is considered canonical if it matches the above-mentioned criteria.
The more it deviates from these criteria, the less canonical the paradigm. However, existing work on canonicity does not provide quantitative means to assess the degree of canonicity of a lexeme's paradigm. Such quantitative measures are the new contribution we propose in Section 4. We present measures for four types of non-canonical inflection phenomena, namely deponency, heteroclisis, defectiveness and overabundance. These measures are computed within PARSLI, an inferential-realisational model of inflectional morphology. In the following section, we first provide a short description of the major features of this model. A complete formal description can be found in Appendix B. Section 4 then presents the measures of canonicity developed within PARSLI, which allow for quantitatively assessing the canonicity of a given paradigm.

| FEATURES | 1st p. Sing | 2nd p. Sing | 3rd p. Sing | 1st p. Pl | 2nd p. Pl | 3rd p. Pl |
|---|---|---|---|---|---|---|
| LEXEME 1 | stem1 -ma | stem1 -sa | stem1 -ta | stem1 -mo | stem1 -so | stem1 -to |
| LEXEME 2 | stem2 -ma | stem2 -sa | stem2 -ta | stem2 -mo | stem2 -so | stem2 -to |

Legend (within a row): stem = same, exponent = different.
Table 1: Comparison over the cells of a given lexeme.

2 Or what (Stump 2001) would refer to as morphosyntactic property sets.
3 However, among the additional criteria, criterion 1 derives directly from criterion 2 in (Corbett 2007) and criterion 3 can be seen as derived from criterion 3 in (Corbett 2007).

| FEATURES | 1st p. Sing | 2nd p. Sing | 3rd p. Sing | 1st p. Pl | 2nd p. Pl | 3rd p. Pl |
|---|---|---|---|---|---|---|
| LEXEME 1 | stem1 -ma | stem1 -sa | stem1 -ta | stem1 -mo | stem1 -so | stem1 -to |
| LEXEME 2 | stem2 -ma | stem2 -sa | stem2 -ta | stem2 -mo | stem2 -so | stem2 -to |

Legend (within a column): stem = different, exponent = same.
Table 2: Comparison across lexemes.

| | | COMPARISON ACROSS CELLS OF A LEXEME | COMPARISON ACROSS LEXEMES |
|---|---|---|---|
| 1 | composition/structure | same | same |
| 2 | lexical material (= shape of stem) | same | different |
| 3 | inflectional material (= shape of inflection) | different | same |
| 4 | outcome (= shape of inflected word) | different | different |

Table 3: Criteria for Canonical Inflection according to (Corbett 2007).
| | CANONICAL INFLECTION |
|---|---|
| 1 STEMS AND FEATURES | There is no "mismatch between form and function" (Baerman 2007). Each lexeme has exactly one stem that combines with a series of exponents. |
| 2 COMPLETENESS | There exists exactly one form corresponding to the expression of a specific morphosyntactic feature structure. |
| 3 INFLECTION CLASS | All forms of a lexeme are built from one single inflection class. |

Table 4: Additional criteria for Canonical Inflection.

3 INTRODUCING PARSLI

The name PARSLI stands for "PARadigm Shape and Lexicon Interface". PARSLI is a formal model designed to represent the morphological information stored within the (morphological) lexicon on the one hand and the (morphological) grammar on the other, and to describe each lexeme of a given language with regard to its own paradigm structure. It is the paradigm structure that accounts for the various non-canonical inflectional phenomena mentioned above.

3.1 Defining the relevant notions

In PARSLI, a lexeme is considered from the point of view of its formal participation in the inflectional process. Thus, we do not consider any specific semantics or possible derivational properties. In other words, we are interested here in the behaviour of what Fradin and Kerleroux refer to as inflectemes (Fradin/Kerleroux 2003), as opposed to lexemes, for which a (very) simplified definition could be "a lexeme minus its semantic and argument-structural information."

At this stage of its development, PARSLI does not make any claims about how exactly the realisation of the forms should be modelled. It focuses solely on the distribution of the morphological information between the morphological lexicon and the morphological grammar. The realisation of the forms by the realisation rules contained within the morphological grammar can be represented by any suitable independent inferential-realisational formalism.
3.2 Describing the PARSLI model of inflectional morphology

PARSLI represents an inflecteme I (footnote 4) through seven defining elements:

1. the set of morphosyntactic feature structures I can express,
2. the lexeme's morphosyntactic category,
3. an inflection pattern,
4. a stem pattern,
5. a transfer rule for stem selection,
6. a transfer rule for form realisation,
7. a pattern representing the paradigm.

PARSLI relies on the concept of inflection class. Note that the definition of an inflection class in PARSLI is not the traditional one, that is, a particular paradigm type. In PARSLI, an inflection class is defined as a function associating morphosyntactic feature structures with corresponding realisation rules, i.e., a way to apply the specific exponents corresponding to a given morphosyntactic feature structure. Each inflection class is partitioned into one or more inflection zones, which are the core of PARSLI's representation of inflection. As shown below, it is the selection of these inflection zones that determines an inflecteme's paradigm shape. Inflection classes are the default associations of inflection zones that allow for computing the default paradigm structures of a language.5 Using inflection zones from different inflection classes results in heteroclisis, as shown in Section 4 below.

Similarly, PARSLI also uses the concept of stem class, i.e., a function associating morphosyntactic feature structures with corresponding stem formation rules. Each stem class is partitioned into stem zones.

As will be shown below, these elements allow for the realisation of the form of a given lexeme that corresponds to a given morphosyntactic feature structure. In order to illustrate the different steps of form realisation, we outline here the derivation of forms for the Italian adjectival inflecteme caro 'dear'.

4 I.e. the morphological part of a lexical entry.
5 Usually this corresponds to the most frequent combination for a language's lexical items.
However, the complete process will be clearest after reading the formal definitions and illustrations in Appendixes B and C.

The Italian inflecteme caro can express four distinct morphosyntactic feature structures:

{gender masc, number sg}
{gender masc, number pl}
{gender fem, number sg}
{gender fem, number pl}

A traditional representation of that paradigm would be as in Table 5.

| | MASC | FEM |
|---|---|---|
| SG | karo | kara |
| PL | kari | kare |

Table 5: The paradigm of caro 'dear' in Italian.

Stem formation

1. Each inflecteme is associated with a specific stem pattern. This stem pattern selects one particular stem zone corresponding to the morphosyntactic feature structure that is to be expressed. This stem zone is then used to obtain the stem formation rule.
   • For caro there is only one zone, associated with all four possible morphosyntactic feature structures.
2. The (stem formation) transfer rule associated with the inflecteme computes the morphosyntactic feature structure that should be given as input to the stem formation rule, given the morphosyntactic feature structure that is meant to be expressed for this inflecteme.
   • In the case of the inflecteme caro, the stem formation transfer rule is the identity function, i.e., its output equals its input.
3. The transformed feature structure is then associated with a specific stem formation rule through the stem zone computed at step 1 above.
   • This stem formation rule is the same for all morphosyntactic feature structures applicable to the inflecteme caro. It will always compute the same stem regardless of the morphosyntactic feature structure given as input.
4. The stem formation rule computes the correct stem form for the inflecteme (this rule may be expressed formally with any suitable realisation-based formalism).
   • The inflecteme caro has only one possible stem, [kar].

Inflection

1. In parallel, the inflection pattern associated with the inflecteme selects a specific inflection zone for the form realisation corresponding to a specific morphosyntactic feature structure.
   • Given the input feature structure {gender fem, number sg}, this zone will also be the inflection zone associated with the plural forms of the feminine. Whether or not this zone is the same for the other forms of the paradigm depends on the general structure of the language.6
2. The (form realisation) transfer rule associated with the inflecteme computes the morphosyntactic feature structure that should be given as input to the realisation rule, given the morphosyntactic feature structure that is meant to be expressed for this inflecteme.
   • In the case of the inflecteme caro, the form realisation transfer rule is the identity function, i.e., its output equals its input, just as for the stem formation transfer rule.
3. The transformed feature structure is then associated with a specific realisation rule through the inflection zone computed at step 1 above.
   • In the case of the inflecteme caro, for the input feature structure {gender fem, number sg}, this form realisation rule specifies the addition of the feminine singular exponent [a] to the stem.

Form generation

1. Finally, the realisation rule obtained in Inflection step 3 is applied to the stem computed in Stem formation step 4 and to the transformed morphosyntactic feature structure obtained in Inflection step 2. It computes the correct form for a given input feature structure of the inflecteme. This realisation rule may be expressed formally with any suitable realisation-based formalism.
   • In the case of the inflecteme caro, for the input feature structure {gender fem, number sg}, the realised form is thus [kara].

Note that transfer rules most often default to the identity function. Whenever they differ from the identity function, they express "a mismatch between form and function", as (Baerman 2007) puts it. They are used for modelling deponency.
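The derivation steps above can be traced in a minimal Python sketch. It is only an illustration under stated assumptions, not PARSLI's implementation: feature structures are encoded as frozensets of attribute-value pairs, the exponents are read off Table 5, a single inflection zone is assumed for all four cells, and all function names (`fs`, `stem_formation_rule`, `realisation_rule`, `realise`) are hypothetical.

```python
from typing import Callable, FrozenSet, Tuple

# Assumption: a feature structure is a frozenset of (attribute, value) pairs.
Features = FrozenSet[Tuple[str, str]]


def fs(**pairs: str) -> Features:
    """Build a feature structure from keyword arguments."""
    return frozenset(pairs.items())


# Exponents read off Table 5; grouping them in one zone is a simplification.
EXPONENTS = {
    fs(gender="masc", number="sg"): "o",
    fs(gender="fem", number="sg"): "a",
    fs(gender="masc", number="pl"): "i",
    fs(gender="fem", number="pl"): "e",
}


def identity(sigma: Features) -> Features:
    """Default transfer rule: output equals input."""
    return sigma


def stem_formation_rule(sigma: Features) -> str:
    """caro has a single stem, whatever feature structure is expressed."""
    return "kar"


def realisation_rule(stem: str, sigma: Features) -> str:
    """Add the exponent associated with the feature structure to the stem."""
    return stem + EXPONENTS[sigma]


def realise(
    sigma: Features,
    stem_transfer: Callable[[Features], Features] = identity,
    form_transfer: Callable[[Features], Features] = identity,
) -> str:
    """Stem formation, then inflection, then form generation."""
    stem = stem_formation_rule(stem_transfer(sigma))
    return realisation_rule(stem, form_transfer(sigma))


if __name__ == "__main__":
    print(realise(fs(gender="fem", number="sg")))
```

Passing a non-identity function as `form_transfer` is how a form-function mismatch would enter this sketch; with the default identity transfer rules, each cell is realised with its own exponent.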
6 For a more detailed representation thereof, see the representation of heteroclisis in Section 4.

The inflection zones used by a given inflecteme I, i.e., the set of zones associated with it by its inflection rule, are called its inflection pattern. They build the inflecteme's paradigm. The set of I's stem zones is called its stem pattern. Inflection classes are defined as the most natural combinations of inflection zones, i.e., those that are used together by a majority of inflectemes. They are default inflection patterns.

Sometimes a given morphosyntactic feature structure can be associated with more than one stem zone by the stem pattern and with more than one inflection zone by the inflection pattern. In such a case, however, nothing guarantees that each stem zone can be combined with each inflection zone. The situation is even more complex when transfer rules differ from the identity function. Therefore, we need a way to express the possible combinations of stem zones and inflection zones: these combinations are what we call subpatterns. Subpatterns are 4-tuples consisting of a stem zone, an inflection zone and two transfer rules. They express the possible combinations for a given inflecteme. A subpattern requires that the sets of morphosyntactic feature structures associated with the two zones have a non-empty intersection. The set of a given inflecteme's subpatterns is the inflecteme's pattern.

In the following section, we show how the measures developed within PARSLI allow for measuring the canonicity of paradigms in terms of deponency, heteroclisis, defectiveness and overabundance.7

4 EXPRESSING AND MEASURING NON-CANONICAL PARADIGM SHAPES WITH PARSLI

In this section we present non-canonical phenomena affecting paradigm structures, together with the associated measures.

4.1 Stem alternations, allomorphy and suppletion

Suppletion comes in two types: stem suppletion and form suppletion (Boyé 2006).
Stem suppletion occurs whenever, inside a paradigm, the forms' exponents remain regular but their stems vary. This is for example the case for the French verb aller 'to go', which has four different stems: all-, v-, i- and aill-. Form suppletion corresponds to cases where a whole form is inserted in a paradigm cell that should canonically be filled by a certain stem and the exponent corresponding to this specific cell. Form suppletion is described in (Bonami/Boyé 2002) for the French verb être 'to be' in the present indicative. For this verb, the 1st person plural form sommes, for example, is unique in not using the regular 1st person plural exponent -ons that canonically appears in the corresponding forms of other verbs (see Table 6).

7 For a list of the symbols used in the more formal definitions, please refer to Appendix A.

| | SINGULAR | PLURAL |
|---|---|---|
| P1 | suis | sommes |
| P2 | es | êtes |
| P3 | est | sont |

Table 6: Form suppletion in the present indicative paradigm of French être 'to be'

4.1.1 Formal definition of allomorphy

Let $I = (K_I, C_I, s_I, f_I, T_{\zeta_I}, T_{\xi_I}, P_I)$ be an inflecteme. Its stem pattern $s_I$ associates (at least) one stem zone $\zeta_{s_I,\sigma}$ with each morphosyntactic feature structure $\sigma \in K_I$.8 A stem selection rule $s$ allows for formally representing morphomic structures (in the sense of (Aronoff 1994)) in stem selection, such as can be observed for Latin verbs.

4.1.2 Example: Latin verb stems

In Latin, the distribution of the three stems available for all Latin verbs is morphomic in the sense that all verbs use the same stem pattern. This stem pattern is partitioned into three stem zones. Tables 7 and 8 give a schematic representation of the three stem zones.

Table 7: Stem zones in the Latin active (sub-)paradigm (zones: STEM1, STEM2, STEM3)
Table 8: Stem zones in the Latin passive (sub-)paradigm (zones: STEM1, STEM3)

| STEM | ACT. SUBPARADIGM | PASS. SUBPARADIGM |
|---|---|---|
| STEM1 | imperf. finite | imperf. finite |
| STEM2 | perf. finite | |
| STEM3 | active future part. | passive past part.; perf. finite (periphr.) |
Table 9: Morphomic combinations between morphosyntactic features and Latin verb stems

8 Usually $\zeta_{s_I,\sigma}$ associates all compatible morphosyntactic feature structures with one unique stem formation rule.

4.2 Deponency

Some Croatian nouns use singular forms to express the plural, as shown in the data presented in (Baerman 2006). This mismatch between form and function is what, following (Baerman 2007), we call deponency.

As shown in (Baerman 2006), Croatian nouns are inflected according to a number of different declension classes. Some classes that are relevant for our discussion are shown in Table 10. The data in Table 11 show that the nouns dete 'child' and tele 'calf' inflect in the plural according to the singular pattern of the A-STEM and I-STEM inflection classes respectively. Using singular inflection to express the plural results in this mismatch between form and function.9

| | (feminine) a-stem žena 'woman', singular | plural | (feminine) i-stem stvar 'thing', singular | plural |
|---|---|---|---|---|
| NOM | žen-a | žen-e | stvar | stvari |
| ACC | žen-u | žen-e | stvar | stvari |
| GEN | žen-e | žen-a | stvari | stvari |
| DAT | žen-i | žen-ama | stvari | stvar-ima |
| INS | žen-om | žen-ama | stvari | stvar-ima |

Table 10: Croatian noun inflection

| | dete 'child', singular | plural | tele 'calf', singular | plural |
|---|---|---|---|---|
| NOM | dete | deca | tele | telad |
| ACC | dete | decu | tele | telad |
| GEN | deteta | dece | teleta | telad |
| DAT | detetu | deci | teletu | teladi(ma) |
| INS | detetom | decom | teletom | teladi(ma) |

Table 11: Croatian deponent noun inflection

4.2.1 Formal definition of deponency

Let $I = (K_I, C_I, s_I, f_I, T_{\zeta_I}, T_{\xi_I}, P_I)$ be an inflecteme. The "mismatch between form and function" that (Baerman 2007) takes to define deponency occurs whenever the morphosyntactic features expressed by a given inflecteme's form $f$ do not match the morphosyntactic features $\sigma$ usually expressed by the realisation rule $\varphi$ used to build that form.
9 For more data on deponency, the reader may refer to the large database put together by the Surrey Morphology Group: http://www.smg.surrey.ac.uk/deponency.

As mentioned above, within PARSLI this means that the transfer rule differs from the identity function, or, in other words, that the morphosyntactic feature structure $T_{\xi_I}(\sigma)$ expressed by $f$ differs from the morphosyntactic feature structure $\sigma$ that has been associated with the appropriate inflection zone $\xi$ through $f_I$. More precisely, a given inflecteme I is said to be deponent iff there exists at least one form $f$ in its paradigm $P$ built in such a way that $\sigma \neq T_{\xi_I}(\sigma)$.

Let $P_{K_I} = \{K_{I,1}, \ldots, K_{I,n}\}$ be the smallest partition of $K_I$ such that $f_I$ associates each $\sigma \in K_{I,i}$ with the same zone $\xi$.10 An inflecteme is considered to be semi-deponent iff for at least one element $K_{I,i} \in P_{K_I}$, but not all, the restriction $T_{\xi_I}|_{K_{I,i}}$ of $T_{\xi_I}$ to $K_{I,i}$ is the identity function.

4.2.2 Deponency index

For the non-canonical phenomenon of deponency we can thus compute a measure of canonicity. We call this measure the deponency index. The deponency index of an inflecteme I is defined as the number of elements $K_{I,i}$ of $P_{K_I}$ such that $T_{\xi_I}|_{K_{I,i}} \neq \mathrm{id}$:

$$D_I = \left|\{K_{I,i} \in P_{K_I} : T_{\xi_I}|_{K_{I,i}} \neq \mathrm{id}\}\right|$$

Hence, an inflecteme is deponent iff $D_I > 0$. An inflecteme is semi-deponent iff $|P_{K_I}| > D_I > 0$. Conversely, a non-deponent inflecteme satisfies $D_I = 0$.

4.2.3 Example: Croatian nouns

The Croatian data presented above can be modelled within PARSLI with a transfer rule. An inflecteme building its paradigm as described above has a transfer rule $T_{\xi_I}$ within its definition for which $T_{\xi_I}(\{\text{number plural}\}) = \{\text{number singular}\}$. Hence it is a semi-deponent noun: the transfer rule differs from the identity function for the morphosyntactic feature structures containing the attribute-value pair {number plural}. For those containing {number singular}, $T_{\xi_I} = \mathrm{id}$.
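The deponency index of Section 4.2.2 can be sketched in a few lines of Python. This is a hedged illustration rather than an existing implementation: feature structures are simplified to (case, number) tuples, the partition is given by hand as a singular block and a plural block, and the names `deponency_index` and `dete_transfer` are hypothetical.

```python
from typing import Callable, FrozenSet, List, Tuple

# Assumption: a feature structure is reduced to a (case, number) pair.
Features = Tuple[str, str]
Block = FrozenSet[Features]

CASES = ("nom", "acc", "gen", "dat", "ins")


def deponency_index(
    partition: List[Block], transfer: Callable[[Features], Features]
) -> int:
    """Count the blocks on which the transfer rule is not the identity."""
    return sum(
        1 for block in partition if any(transfer(s) != s for s in block)
    )


def dete_transfer(sigma: Features) -> Features:
    """Hypothetical transfer rule for dete: plural cells use singular inflection."""
    case, number = sigma
    return (case, "sg") if number == "pl" else sigma


# Hand-built partition: one block of singular cells, one block of plural cells.
singular_block: Block = frozenset((c, "sg") for c in CASES)
plural_block: Block = frozenset((c, "pl") for c in CASES)
partition = [singular_block, plural_block]

d = deponency_index(partition, dete_transfer)
is_deponent = d > 0
is_semi_deponent = len(partition) > d > 0

if __name__ == "__main__":
    print(d, is_deponent, is_semi_deponent)
```

Testing the restriction of the transfer rule block by block, rather than cell by cell, mirrors the definition: the index counts elements of the partition, not individual forms.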
In other words, for both of these nouns, the smallest partition of $K_I$ such that $f_I$ associates each $\sigma \in K_{I,i}$ with the same zone $\xi$ consists of two subsets of $K_I$, one for singular feature structures (for which no deponency occurs) and one for plural ones. Thus, the deponency index for these nouns is equal to 1. Since the total number of elements of that partition is 2, these lexemes are semi-deponents.

10 In the case of overabundant inflectemes, this does not concern a unique zone but the same set of zones.

4.3 Heteroclisis

Heteroclisis refers to the phenomenon where a lexeme's paradigm is built out of (at least) two otherwise separate inflection classes. Examples of heteroclisis are (some) Slovak animal nouns. Indeed, in Slovak, most masculine animal nouns are inflected as masculine animate nouns in the singular, whereas they may (and for some lexemes, must) inflect as masculine inanimate nouns in the plural (except in specific cases, such as personification, which triggers the animate inflection even for plural forms) (Zauner 1973). Compare for example the inflection of chlap 'boy', dub 'oak' and orol 'eagle' in Table 12.11

| | MASCULINE ANIMATE chlap 'boy', SINGULAR | PLURAL | MASCULINE INANIMATE dub 'oak', SINGULAR | PLURAL | MASCULINE HETEROCLITE orol 'eagle', SINGULAR | PLURAL |
|---|---|---|---|---|---|---|
| NOM | chlap | chlap-i | dub | dub-y | orol | orl-y |
| GEN | chlap-a | chlap-ov | dub-a | dub-ov | orl-a | orl-ov |
| DAT | chlap-ovi | chlap-om | dub-u | dub-om | orl-ovi | orl-om |
| ACC | chlap-a | chlap-ov | dub | dub-y | orl-a | orl-y |
| LOC | chlap-ovi | chlap-och | dub-e | dub-och | orl-ovi | orl-och |
| INS | chlap-om | chlap-mi | dub-om | dub-mi | orl-om | orl-ami |

Table 12: Heteroclisis in the inflection of Slovak masculine animal nouns

4.3.1 Definition

If all inflection zones associated with a given inflecteme I belong to the same inflection class $\mathcal{F}$, the inflecteme is canonical in the dimension of inflection class constitution. Conversely, if its inflection zones belong to at least two distinct inflection classes, the inflecteme is said to be heteroclite.
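The definition in Section 4.3.1 lends itself to a short Python sketch. It assumes, purely for illustration, that an inflection zone can be encoded as a pair made of a label for the cells it covers and the name of the inflection class it belongs to; the zone inventories below are read off Table 12, and the function names are hypothetical.

```python
from typing import Set, Tuple

# Assumption: a zone is (label of the covered cells, name of its inflection class).
Zone = Tuple[str, str]


def inflection_classes(zones: Set[Zone]) -> Set[str]:
    """Collect the inflection classes to which the given zones belong."""
    return {cls for (_cells, cls) in zones}


def is_heteroclite(zones: Set[Zone]) -> bool:
    """An inflecteme is heteroclite iff its zones come from at least two classes."""
    return len(inflection_classes(zones)) >= 2


# Zone inventories following Table 12 (class names are illustrative labels).
chlap_zones: Set[Zone] = {("sg", "masc_animate"), ("pl", "masc_animate")}
dub_zones: Set[Zone] = {("sg", "masc_inanimate"), ("pl", "masc_inanimate")}
orol_zones: Set[Zone] = {("sg", "masc_animate"), ("pl", "masc_inanimate")}

if __name__ == "__main__":
    for name, zones in [("chlap", chlap_zones), ("dub", dub_zones), ("orol", orol_zones)]:
        print(name, is_heteroclite(zones))
```

Only orol draws its zones from two classes, so it is the only heteroclite inflecteme of the three under this encoding.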
4.3.2 Heteroclicity Index

We define an inflecteme's heteroclicity index $H_I$ from the zones used to build the inflecteme's paradigm that belong to the partitions of distinct inflection classes. In other words, it is the number of distinct inflection classes involved in the building of that inflecteme's paradigm, minus one. More precisely, $H_I$ is defined in the following way:

$$H_I = \left|\{\tilde{\xi} : \xi \in X_I\}\right| - 1$$

where $\tilde{\xi}$ stands for the inflection class to whose partition $\xi$ belongs, and $X_I$ for the set of zones associated with I through its inflection pattern. Thus, I is heteroclite iff $H_I > 0$.

11 Both chlap and dub have a regular inflection: chlap belongs to the standard inflection class for masculine animate stems ending with a consonant, whereas dub belongs to the standard inflection class for masculine inanimate stems ending with what is called a hard or neutral consonant in the Slavic linguistic tradition.

4.3.3 Example 1: Slovak animal nouns

For the Slovak animal nouns described in Table 12, the inflection zone used for building the singular forms of the noun orol 'eagle' is an element of the partition of the inflection class associated with animate nouns such as chlap 'boy', while the inflection zone used for the plural forms of such animal nouns belongs to the partition of the inflection class associated with inanimates like dub 'oak'. The heteroclicity index of such nouns is

$$H_{orol} = \left|\{\tilde{\xi} : \xi \in X_{orol}\}\right| - 1 = \left|\{\tilde{\xi}_{\mathcal{F}_{inanimate},pl}\}\right| + \left|\{\tilde{\xi}_{\mathcal{F}_{animate},sg}\}\right| - 1 = 1 + 1 - 1 = 1$$

4.3.4 Example 2: Croatian nouns

Similarly, the Croatian nouns from Table 11 show inflection patterns producing the inflection zones listed in Table 13. This table shows that the corresponding Croatian nouns are not only deponent, but also heteroclite. Thus, several non-canonical phenomena may sometimes occur simultaneously in non-canonical paradigms.

| INFLECTION CLASS | A: NEUTER STEM IN -ET | B: (FEMININE) STEM IN -A | C: (FEMININE) STEM IN -I |
|---|---|---|---|
| dete 'child' | SG: $\xi_{A,sg}$ | PL: $\xi_{B,sg}$ | |
| tele 'calf' | SG: $\xi_{A,sg}$ | | PL: $\xi_{C,sg}$ |

Table 13: Nominal inflection of Croatian heteroclite nouns

4.4 Defectiveness

Defectiveness (Baerman et al. 2010) refers to lexemes which display empty (missing) cells in their paradigm. Sometimes languages contain lexemes for which expected forms simply do not exist; native speakers always try to avoid having to build the corresponding forms. This is for example what we can observe with some French verbs such as paître 'to graze', for which there are no past tense forms available apart from the imperfect. Another example is the pluralia tantum described below.

4.4.1 Formal definition

A paradigm is considered defective iff there is at least one morphosyntactic feature structure $\sigma$ belonging to the set $K_{C_I}$ of the morphosyntactic feature structures of the category of an inflecteme I which $f_I$ does not associate with any inflection zone $\xi$. One can also say that an inflecteme I is defective iff the set $K_I$ of its morphosyntactic feature structures does not cover the set $K_{C_I}$ of the morphosyntactic feature structures of its category.

Example: Pluralia Tantum

Another example is the class of nouns called pluralia tantum, which only exist in the plural, cf. English trousers, French vivres 'food supplies' or Slovak vianoce 'Christmas'. Let us take the example of the French plurale tantum I = vivres 'food supplies'. If we only consider the number features, we get the following defined morphosyntactic feature structures: $K_I = \{\text{number plural}\}$, while $K_{C_I} = K_{nom} = \{\text{number singular}, \text{number plural}\}$, so that $K_I$ does not cover $K_{C_I}$.

4.5 Overabundance

The obvious counterpart to defectiveness is the concept of overabundance. Overabundance occurs when cells of a paradigm contain more than one form. The notion was introduced by Thornton and is discussed in (Thornton 2010) for Italian. Canonical overabundance characterises the case where the cell mates of one given cell compete without any morphological feature12 making it possible to choose one over the other. Table 14 shows examples thereof for Italian verbs.
| | | CELL-MATE 1 | CELL-MATE 2 |
|---|---|---|---|
| 'languish' | 3PL.PRS.SUBJ | languano | languiscano |
| 'possess' | 3PL.PRS.SUBJ | possiedano | posseggano |
| 'possess' | 3SG.PRS.SUBJ | possieda | possegga |
| 'possess' | 1SG.PRS.SUBJ | possiedo | posseggo |

Table 14: Overabundance in Italian (Thornton 2010)

In French, an example is given by the verb asseoir 'to sit', which has two different forms in most cells, as shown in Table 15.13 All French verbs in -ayer also exhibit systematic overabundance (see Table 16). Indeed, for some cells, these verbs may use two competing stems (in -ay- and in -ai-) and therefore have two different inflected forms, which are morphologically equivalent (although semantic, pragmatic, sociolinguistic and other constraints may interfere).

12 Or any other type of feature.
13 See for example (Bonami/Boyé 2010) for a longer discussion thereof.

| IND.PRES | SINGULAR | PLURAL |
|---|---|---|
| P1 | assois / assieds | assoyons / asseyons |
| P2 | assois / assieds | assoyez / asseyez |
| P3 | assoit / assied | assoient / asseyent |

Table 15: Overabundance in French asseoir 'to sit'

| IND.PRES | SINGULAR | PLURAL |
|---|---|---|
| P1 | balaye / balaie | balayons |
| P2 | balayes / balaies | balayez |
| P3 | balaye / balaie | balayent / balaient |

Table 16: Overabundance in French balayer 'to sweep'

4.5.1 Formal definition

A paradigm is considered overabundant iff there is at least one morphosyntactic feature structure $\sigma$ belonging to the set $K_{C_I}$ of the morphosyntactic feature structures of the category of an inflecteme I which $f_I$ associates with more than one inflection zone. In that case, $f_I$ is a generic binary relation and not a function: $f_I(\sigma) = S$, where $|S| > 1$.

Example: Italian overabundant verbs

Table 14 shows examples of overabundant Italian verbs.14 In this case, the inflecteme languire has an inflection pattern $f_{LANGUIRE}$ which associates the morphosyntactic feature structure $\sigma$ = {3pl.prs.subj} with two inflection zones, each producing a different realisation rule, $\varphi_1$ and $\varphi_2$. These two rules thus give rise to two distinct forms within the paradigm $P_{LANGUIRE}$ expressing $\sigma$: $f_1$ = languano and $f_2$ = languiscano.

14 The data is borrowed from (Thornton 2010).

4.6 Canonical Inflection

From the definitions of non-canonical phenomena above, we can deduce the following definition of Canonical Inflection.

Canonical inflection corresponds to the case where the inflection pattern $f_I$ of an inflecteme I associates the morphosyntactic feature structures belonging to the set $K_I$ for which I is defined with inflection zones that constitute the complete set of elements contained within the partition of one unique inflection class $\mathcal{F}$. In particular, this entails that for all morphosyntactic feature structures $\sigma$, the inflection pattern $f_I$ associates $\sigma$ with one unique element of the partition of $\mathcal{F}$.

Moreover, the stem pattern associates every $\sigma \in K_I$ with a stem zone $\zeta$ that belongs to a stem class $\mathcal{R}$ containing only this one stem zone $\zeta$ and that produces a unique stem formation rule $g$, whatever $\sigma$. In other words, I has a unique stem.

Finally, the transfer rule $T_{\xi_I}$ is the identity function, and the set of morphosyntactic feature structures $K_I$ defined for I equals the set of morphosyntactic feature structures $K_{C_I}$ defined for I's morphosyntactic category $C_I \in \mathcal{C}$. The same holds for the transfer rule $T_{\zeta_I}$.

5 CONCLUSION

We have presented PARSLI, a formal model of inflectional morphology. Since PARSLI is completely formalised, it can be implemented. Such an implementation allows for the comparison of complete morphological descriptions with regard to their complexity. Indeed, experiments on complexity evaluation with PARSLI and its implementation within the Alexina lexical framework (Sagot 2010) have already been conducted (Sagot/Walther 2011). The usefulness of PARSLI for building morphological descriptions with reduced descriptive complexity has also been shown in (Walther/Sagot 2011).
But most importantly, in the domain of Canonical Typology, PARSLI contains original measures that allow for quantitatively assessing the canonicity of paradigms in the sense of the qualitative characterisation proposed by the approaches developed within Canonical Typology (Corbett 2003).

A SUMMARY OF THE NOTATIONS IN PARSLI

We use the following notations in the formal definitions:

• a morphosyntactic feature structure is noted $\sigma$,
• an inflection rule $f$,
• an inflection class $\mathcal{F}$,
• an inflection zone $\xi$,
• a stem selection rule $s$,
• a stem class $\mathcal{R}$,
• a stem zone $\zeta$,
• a transfer rule $T$
  - a stem transfer rule $T_{\zeta}$
  - an inflection transfer rule $T_{\xi}$
• and a pattern $P$.

B FORMAL DEFINITIONS OF THE PARSLI MODEL

The next two appendixes provide the actual formalisation underlying the PARSLI model and a formalised representation of paradigm building within the model.

B.1 Phonological material

An elementary sequence of phonological material $e$ is a segmental or suprasegmental combination of sounds. The set of all elementary sequences of phonological material is noted $E$.

B.2 Morphosyntactic features

B.2.1 Features and feature structures

In this document, we define a morphosyntactic feature structure $\sigma$ as a set of attribute-value pairs. PARSLI makes no strong assumptions about how the feature structures are organised with regard to one another.

B.2.2 Morphosyntactic categories

The set of all feature structures used in a given complete morphological description is noted $K$. An inflecteme I will be assigned a category depending on the feature structures it has information about: an inflecteme I from a category $C$ will cover a subset $K_I$ of the morphosyntactic feature structure set $K_C \subseteq K$ specific to that category. The set of all categories is noted $\mathcal{C}$.

B.3 Stems

B.3.1 Definition

A stem $r$ is an elementary sequence of phonological material. The set of all stems is noted $R$.

$$r \in R \subseteq E$$

B.3.2 Stem formation rule

Stem formation is expressed through stem formation rules.
A stem formation rule $g$ is a function from $K$ to $E$ which takes a specific morphosyntactic feature structure $\kappa \in K$ as input so as to produce a phonological material $e'$ expressing that feature structure.

$$g : K \to E, \qquad g(\kappa) = e'$$

The set of all stem formation rules is noted $X$.

B.3.3 Stem class

A stem class $\rho$ is a function from $K_{\rho} \subseteq K$ to $X$.

B.3.4 Stem zones

Let $\rho$ be a stem class defined over a set $K_{\rho}$ of morphosyntactic feature structures. For each $\rho$ a unique partition of $K_{\rho}$ is defined, whose members are noted $K_{\rho,k}$, such that:15

$$K_{\rho} = \bigsqcup_{k} K_{\rho,k}$$

15 "$\bigsqcup$" denotes the union of disjoint sets.

A stem zone $\zeta$ for $\rho$ is then defined as a pair $\zeta = (K_{\rho,k}, \rho)$, where $K_{\rho,k}$ is one element of the partition. Let $\zeta = (K_{\rho,k}, \rho)$ be a zone for $\rho$. We define the operators $\tilde{\cdot}$ and $\ddot{\cdot}$ as follows: $\tilde{\zeta}$ is the second element of $\zeta$, i.e., its stem class $\rho$, and $\ddot{\zeta}$ is the first element of $\zeta$, i.e., the corresponding element of the partition of $K_{\rho}$. The set of zones for a stem class $\rho$ is noted $Z(\rho)$. The set of all stem zones for all stem classes is noted $Z$:

$$Z = \bigcup_{\rho} Z(\rho)$$

B.3.5 Stem pattern

A stem pattern is a binary relation $\mathfrak{s}$ associating an element from a given $K_{\mathfrak{s}} \subseteq K$ with one or more stem zones. A given morphosyntactic feature structure $\kappa \in K_{\mathfrak{s}}$ will be associated through $\mathfrak{s}$ with stem zones of the form $\zeta = (K_{\rho,k}, \rho)$. From there we can retrieve the stem formation rule $g \in X$ corresponding to a given $\kappa \in K_{\mathfrak{s}}$: if $(\kappa, \zeta) \in \mathfrak{s}$ and $\zeta = (K_{\rho,k}, \rho)$, then, provided we are given a certain $\kappa' \in K_{\rho,k}$ (be it equal to $\kappa$ or not), one of the corresponding stem formation rules $g$ satisfies

$$g = \tilde{\zeta}(\kappa') = \rho(\kappa')$$

B.4 Inflection

B.4.1 Realisation rule

Inflection is expressed through realisation rules. A realisation rule $\varphi$ is a function from $E \times K$ to $E$ which takes specific phonological material $e$16 as input so as to produce a modified phonological material $e'$ in order to express a specific morphosyntactic feature structure $\kappa \in K$.

$$\varphi : E \times K \to E, \qquad \varphi(e, \kappa) = e'$$

The set of all realisation rules is noted $\Phi$.
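The definitions of B.3 and B.4.1 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not part of PARSLI or of its Alexina implementation: feature structures are modelled as frozensets of attribute-value pairs, phonological material as plain strings, and the names `fs`, `is_partition` and `realise` are hypothetical. The stem and exponents are borrowed from the Italian example of Appendix C.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Feature = FrozenSet[Tuple[str, str]]        # a feature structure kappa
StemRule = Callable[[Feature], str]         # a stem formation rule g : K -> E
StemClass = Dict[Feature, StemRule]         # a stem class rho : K_rho -> X
StemZone = Tuple[FrozenSet[Feature], str]   # (block of the partition, class label)


def fs(**pairs: str) -> Feature:
    """Build a feature structure from attribute=value keyword arguments."""
    return frozenset(pairs.items())


def is_partition(blocks: List[FrozenSet[Feature]], domain: FrozenSet[Feature]) -> bool:
    """Check that the blocks are non-empty, pairwise disjoint and cover the domain."""
    seen: set = set()
    for block in blocks:
        if not block or seen & block:
            return False
        seen |= block
    return seen == domain


def realise(stem: str, kappa: Feature, exponents: Dict[Feature, str]) -> str:
    """A toy realisation rule phi : E x K -> E that suffixes an exponent to the stem."""
    return stem + exponents[kappa]


# Hypothetical data: a two-cell domain served by one constant stem formation rule.
SG, PL = fs(number="sg"), fs(number="pl")
constant_stem: StemRule = lambda kappa: "kar"
rho: StemClass = {SG: constant_stem, PL: constant_stem}
domain = frozenset(rho)                      # K_rho, the keys of the stem class
zones: List[StemZone] = [(frozenset({SG, PL}), "rho")]
```

With this data, `is_partition([block for block, _ in zones], domain)` holds, and `realise(rho[PL](PL), PL, {SG: "o", PL: "i"})` concatenates the stem produced by the stem formation rule with the plural exponent. Modelling a stem class as a finite dictionary is a simplification chosen here because the domain $K_{\rho}$ is finite in the example.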
B.4.2 Inflection class

An inflection class $\mathcal{I}$ is a function from $K_{\mathcal{I}} \subseteq K$ to $\Phi$.

16 Namely the stem produced by the corresponding stem formation rule.

B.4.3 Inflection zones

Let $\mathcal{I}$ be an inflection class defined over a set $K_{\mathcal{I}}$ of morphosyntactic feature structures. For each $\mathcal{I}$ a unique partition of $K_{\mathcal{I}}$ is defined, whose members are noted $K_{\mathcal{I},k}$, such that:

$$K_{\mathcal{I}} = \bigsqcup_{k} K_{\mathcal{I},k}$$

An inflection zone $\xi$ for $\mathcal{I}$ is then defined as a pair $\xi = (K_{\mathcal{I},k}, \mathcal{I})$, where $K_{\mathcal{I},k}$ is one element of the partition. Let $\xi = (K_{\mathcal{I},k}, \mathcal{I})$ be a zone for $\mathcal{I}$. We define the operators $\tilde{\cdot}$ and $\ddot{\cdot}$ as follows: $\tilde{\xi}$ is the second element of $\xi$, i.e., its inflection class $\mathcal{I}$, and $\ddot{\xi}$ is the first element of $\xi$, i.e., the corresponding element of the partition of $K_{\mathcal{I}}$. The set of zones for an inflection class $\mathcal{I}$ is noted $\Xi(\mathcal{I})$. The set of all inflection zones for all inflection classes is noted $\Xi$:

$$\Xi = \bigcup_{\mathcal{I}} \Xi(\mathcal{I})$$

B.4.4 Inflection pattern

An inflection pattern is a binary relation $\mathfrak{f}$ associating an element from a given $K_{\mathfrak{f}} \subseteq K$ with one or more inflection zones. A given morphosyntactic feature structure $\kappa \in K_{\mathfrak{f}}$ will be associated through $\mathfrak{f}$ with inflection zones of the form $\xi = (K_{\mathcal{I},k}, \mathcal{I})$. From there we can retrieve the inflectional function $\varphi \in \Phi$ corresponding to a given $\kappa \in K_{\mathfrak{f}}$: if $(\kappa, \xi) \in \mathfrak{f}$ and $\xi = (K_{\mathcal{I},k}, \mathcal{I})$, then, provided we are given a certain $\kappa' \in K_{\mathcal{I},k}$ (be it equal to $\kappa$ or not), one of the corresponding inflectional functions $\varphi$ satisfies

$$\varphi = \tilde{\xi}(\kappa') = \mathcal{I}(\kappa')$$

B.5 Transfer rules

We define a transfer rule $T$ as a function from its domain $K_T \subseteq K$ to $K$. Given an inflecteme $\mathfrak{I}$, there are two types of transfer rules: one, $T_{\zeta,\mathfrak{I}}$, for stem formation and one, $T_{\xi,\mathfrak{I}}$, for inflection.

B.6 Pattern

B.6.1 Subpattern

A subpattern is defined for a given inflecteme $\mathfrak{I}$. It is a 4-tuple consisting of a stem zone $\zeta$, an inflection zone $\xi$ and two transfer rules, $T_{\zeta,\mathfrak{I}}$ and $T_{\xi,\mathfrak{I}}$. To be valid, a subpattern requires that the sets of morphosyntactic feature structures $\ddot{\zeta} \subseteq K$ and $\ddot{\xi} \subseteq K$ associated respectively with $\zeta$ and $\xi$ have a non-empty intersection.
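$$\ddot{\zeta} \cap \ddot{\xi} \neq \emptyset$$

B.6.2 Pattern

A pattern $P$ is the set of all valid subpatterns defined for a given inflecteme $\mathfrak{I}$.

B.7 Inflectemes

B.7.1 Definition

An inflecteme $\mathfrak{I}$ is a 7-tuple $(K_{\mathfrak{I}}, C_{\mathfrak{I}}, \mathfrak{s}_{\mathfrak{I}}, \mathfrak{f}_{\mathfrak{I}}, T_{\zeta,\mathfrak{I}}, T_{\xi,\mathfrak{I}}, P_{\mathfrak{I}})$, where
• $K_{\mathfrak{I}}$ is the set of morphosyntactic feature structures $\kappa$ expressible by $\mathfrak{I}$,
• $C_{\mathfrak{I}}$ is $\mathfrak{I}$'s morphosyntactic category, with $C_{\mathfrak{I}} \in \mathcal{C}$, where $\mathcal{C}$ is the set of morphosyntactic categories that exist in a morphological description of a given language,
• $\mathfrak{s}_{\mathfrak{I}}$ is a stem pattern, a binary relation from $K_{\mathfrak{I}}$ to $Z_{\mathfrak{s}_{\mathfrak{I}}}$, the set of stem zones compatible with $\mathfrak{I}$; $Z_{\mathfrak{s}_{\mathfrak{I}}} \subseteq Z$, where $Z$ is the set of all stem zones in a morphological description of a given language,
• $\mathfrak{f}_{\mathfrak{I}}$ is an inflection pattern, a binary relation from $K_{\mathfrak{I}}$ to $\Xi_{\mathfrak{f}_{\mathfrak{I}}}$, the set of inflection zones according to which the inflecteme is inflected; $\Xi_{\mathfrak{f}_{\mathfrak{I}}} \subseteq \Xi$, where $\Xi$ is the set of all inflection zones in a morphological description of a given language,
• $T_{\zeta,\mathfrak{I}}$ is a transfer rule, i.e., a function defined over at least all morphosyntactic feature structures $\kappa \in K_{\mathfrak{I}}$, such that $T_{\zeta,\mathfrak{I}}(\kappa)$ belongs to the set of morphosyntactic feature structures realised through the stem zones defined for $K_{\mathfrak{I}}$,
• $T_{\xi,\mathfrak{I}}$ is a transfer rule, i.e., a function defined over at least all morphosyntactic feature structures $\kappa \in K_{\mathfrak{I}}$, such that $T_{\xi,\mathfrak{I}}(\kappa)$ belongs to the set of morphosyntactic feature structures realised through the inflection zones defined for $K_{\mathfrak{I}}$,
• $P_{\mathfrak{I}}$ is a pattern, i.e., a set of subpatterns, each defined as a 4-tuple of the form $(\zeta_{\kappa}, \xi_{\kappa}, T_{\zeta,\mathfrak{I}}, T_{\xi,\mathfrak{I}})$, where $\zeta_{\kappa}$ is a stem zone associated with a given morphosyntactic feature structure $\kappa$ through $\mathfrak{s}_{\mathfrak{I}}$ and $\xi_{\kappa}$ an inflection zone associated with $\kappa$ through $\mathfrak{f}_{\mathfrak{I}}$.

B.8 Paradigms

Let $\mathfrak{I} = (K_{\mathfrak{I}}, C_{\mathfrak{I}}, \mathfrak{s}_{\mathfrak{I}}, \mathfrak{f}_{\mathfrak{I}}, T_{\zeta,\mathfrak{I}}, T_{\xi,\mathfrak{I}}, P_{\mathfrak{I}})$ be an inflecteme.

B.8.1 Forms

A form $f$ is a combination of elementary sequences of phonological material. It expresses a morphosyntactic feature structure $\kappa$ for the inflecteme $\mathfrak{I}$ and is obtained from a stem $r_{\mathfrak{I}}$ of $\mathfrak{I}$ by the realisation rule $\varphi$ corresponding to one of the appropriate inflection zones obtained through the inflection pattern $\mathfrak{f}_{\mathfrak{I}}$.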
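The form $f$ is then equal to $f = \varphi(r_{\mathfrak{I}}, \kappa')$, where $\kappa'$ is the output of $T_{\xi,\mathfrak{I}}(\kappa)$. From there, we can also express $f$ in the following way, where $\tilde{\xi}$ and $\tilde{\zeta}$ are read as shorthand for the realisation rule and the stem formation rule selected through the zone's class:

$$f = \tilde{\xi}(r_{\mathfrak{I}}, T_{\xi,\mathfrak{I}}(\kappa)) = \tilde{\xi}(\tilde{\zeta}(T_{\zeta,\mathfrak{I}}(\kappa)), T_{\xi,\mathfrak{I}}(\kappa))$$

B.8.2 Definition

A paradigm $\mathcal{P}_{\mathfrak{I}}$ of a given inflecteme $\mathfrak{I}$ is the set of all pairs $(f, \kappa)$ of a form and a morphosyntactic feature structure such that $\kappa \in K_{\mathfrak{I}}$ and

$$f = \tilde{\xi}(\tilde{\zeta}(T_{\zeta,\mathfrak{I}}(\kappa)), T_{\xi,\mathfrak{I}}(\kappa))$$

B.8.3 Formal definition of canonical inflection

From the definitions of non-canonical phenomena above, we can deduce the following definition of canonical inflection. The inflection of an inflecteme $\mathfrak{I}$ is canonical if
• $\exists\, \mathcal{I}$ such that $\forall \kappa \in K_{\mathfrak{I}}$, $\mathfrak{f}_{\mathfrak{I}}(\kappa, \xi)$ with $\tilde{\xi} = \mathcal{I}$, which means that $|\{\tilde{\xi} \mid \xi \in \Xi_{\mathfrak{f}_{\mathfrak{I}}}\}| - 1 = 0$,
• $\exists\, \rho$ such that $\forall \kappa \in K_{\mathfrak{I}}$, $\mathfrak{s}_{\mathfrak{I}}(\kappa, \zeta)$ with $\tilde{\zeta} = \rho$, where $\rho$ is a function independent of $\kappa$,
• and $T_{\xi,\mathfrak{I}} = id$, $T_{\zeta,\mathfrak{I}} = id$ and $K_{\mathfrak{I}} = K_{C_{\mathfrak{I}}}$.

C BUILDING A PARADIGM WITH PARSLI

In this section we give a short example of how PARSLI can be used to model the building of a given inflecteme's paradigm. As an illustration we use the simple case of an Italian adjectival paradigm, that of the inflecteme caro 'dear'.

C.1 Definition of the inflecteme caro within the lexicon

The inflecteme caro is defined within the lexicon as the 7-tuple

$$\text{CARO} = (K_{\text{CARO}}, C_{\text{CARO}}, \mathfrak{s}_{\text{CARO}}, \mathfrak{f}_{\text{CARO}}, T_{\zeta,\text{CARO}}, T_{\xi,\text{CARO}}, P_{\text{CARO}})$$

where

$$K_{\text{CARO}} = \{\{\text{gender masc}, \text{number sg}\}, \{\text{gender masc}, \text{number pl}\}, \{\text{gender fem}, \text{number sg}\}, \{\text{gender fem}, \text{number pl}\}\}$$

and the inflecteme's morphosyntactic category is

$$C_{\text{CARO}} = \text{adjective}$$

Let $\zeta$ denote the unique stem zone used for building this Italian adjective's forms. The stem pattern $\mathfrak{s}_{\text{CARO}}$ of caro associates each morphosyntactic feature structure defined for caro with this unique stem zone $\zeta$.

$$\mathfrak{s}_{\text{CARO}} = \{(\{\text{gender masc}, \text{number sg}\}, \zeta), (\{\text{gender masc}, \text{number pl}\}, \zeta), (\{\text{gender fem}, \text{number sg}\}, \zeta), (\{\text{gender fem}, \text{number pl}\}, \zeta)\}$$

Let $\xi_{masc}$ and $\xi_{fem}$ denote the inflection zones used for building Italian adjective forms.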
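The components of CARO introduced up to this point can be written down as plain data. The following Python sketch is only an illustration under stated assumptions: feature structures are modelled as frozensets of attribute-value pairs, the stem zone is reduced to a label, and `uses_single_stem_zone` is a hypothetical helper rather than a PARSLI function.

```python
from itertools import product
from typing import FrozenSet, Set, Tuple

Feature = FrozenSet[Tuple[str, str]]  # a feature structure kappa

# K_CARO: the four feature structures defined for caro.
K_CARO: Set[Feature] = {
    frozenset({("gender", g), ("number", n)})
    for g, n in product(("masc", "fem"), ("sg", "pl"))
}

# C_CARO: the morphosyntactic category of the inflecteme.
C_CARO = "adjective"

# The unique stem zone, reduced to a label for this sketch.
ZETA = "zeta"

# s_CARO: a binary relation pairing every feature structure with ZETA.
S_CARO: Set[Tuple[Feature, str]] = {(kappa, ZETA) for kappa in K_CARO}


def uses_single_stem_zone(
    stem_pattern: Set[Tuple[Feature, str]], domain: Set[Feature]
) -> bool:
    """True if every feature structure of the domain is related to one and the same zone."""
    covered = {kappa for kappa, _ in stem_pattern}
    zones = {zone for _, zone in stem_pattern}
    return covered == domain and len(zones) == 1
```

Here `uses_single_stem_zone(S_CARO, K_CARO)` holds, which mirrors the unique-stem requirement of canonical inflection. Adding a pair that relates some feature structure to a second zone makes the check fail.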
The inflection pattern $\mathfrak{f}_{\text{CARO}}$ of caro associates any morphosyntactic feature structure defined for caro with either of these two inflection zones, depending on the corresponding gender feature.17

$$\mathfrak{f}_{\text{CARO}} = \{(\{\text{gender masc}\}, \xi_{masc}), (\{\text{gender fem}\}, \xi_{fem})\}$$

The inflecteme caro does not display form-function mismatches. Its transfer rules hence equal the identity function.

$$T_{\zeta,\text{CARO}} = id \qquad T_{\xi,\text{CARO}} = id$$

Having computed all the necessary elements, we can now express the inflecteme's pattern $P_{\text{CARO}}$.

$$P_{\text{CARO}} = \{(\zeta, \xi_{masc}, T_{\zeta,\text{CARO}}, T_{\xi,\text{CARO}}), (\zeta, \xi_{fem}, T_{\zeta,\text{CARO}}, T_{\xi,\text{CARO}})\}$$

17 The choice of two inflection zones for Italian adjectives is motivated by the properties of some Italian nouns. It is however clear that this is a descriptive choice made by the author and that other representations would be possible as well.

C.2 Building the paradigm of the inflecteme caro

The stem zone $\zeta$ is the unique element of the default stem class for Italian adjectives. It associates each morphosyntactic feature structure within $K_{\text{CARO}}$ with a unique stem formation rule $g$. Hence we can compute the stem formation rule for the inflecteme caro.

$$\forall \kappa \in K_{\text{CARO}},\ g(\kappa) = r_{\text{CARO}} \quad \text{where } r_{\text{CARO}} = [kar]$$

The two computed inflection zones $\xi_{masc}$ and $\xi_{fem}$ each produce two form realisation rules. The form realisation rules allow for building the four forms belonging to the inflecteme's paradigm.

$$\varphi_{masc}(\{\text{gender masc}, \text{number sg}\}) = r_{\text{CARO}} + m_{masc\text{-}sg} \quad \text{where } r_{\text{CARO}} + m_{masc\text{-}sg} = [karo]$$
$$\varphi_{masc}(\{\text{gender masc}, \text{number pl}\}) = r_{\text{CARO}} + m_{masc\text{-}pl} \quad \text{where } r_{\text{CARO}} + m_{masc\text{-}pl} = [kari]$$
$$\varphi_{fem}(\{\text{gender fem}, \text{number sg}\}) = r_{\text{CARO}} + m_{fem\text{-}sg} \quad \text{where } r_{\text{CARO}} + m_{fem\text{-}sg} = [kara]$$
$$\varphi_{fem}(\{\text{gender fem}, \text{number pl}\}) = r_{\text{CARO}} + m_{fem\text{-}pl} \quad \text{where } r_{\text{CARO}} + m_{fem\text{-}pl} = [kare]$$

Thus, the paradigm $\mathcal{P}_{\text{CARO}}$ of the Italian adjective caro is:

$$\mathcal{P}_{\text{CARO}} = \{([karo], \{\text{gender masc}, \text{number sg}\}), ([kari], \{\text{gender masc}, \text{number pl}\}), ([kara], \{\text{gender fem}, \text{number sg}\}), ([kare], \{\text{gender fem}, \text{number pl}\})\}$$

References

ANDERSON, Stephen R. (1992) A-Morphous Morphology. Cambridge, UK: Cambridge University Press.
ARONOFF, Mark (1994) Morphology by Itself. MIT Press.
BAERMAN, Matthew (2006) Deponency in Serbo-Croatian. Online Database: http://www.smg.surrey.ac.uk/deponency/Examples/Serbo-Croatian.htm. Typological Database on Deponency. Surrey Morphology Group, CMC, University of Surrey.
BAERMAN, Matthew (2007) "Morphological Typology of Deponency." In: M. Baerman/G. G. Corbett/D. Brown/A. Hippisley (eds.) Deponency and Morphological Mismatches, volume 145, p. 1-19. The British Academy, Oxford University Press.
BAERMAN, Matthew/CORBETT, Greville G./BROWN, Dunstan (2010) Defective Paradigms. Oxford, UK: Oxford University Press. Proceedings of the British Academy 145.
BONAMI, Olivier/BOYÉ, Gilles (2002) "Suppletion and dependency in inflectional morphology." In: F. V. Eynde/L. Hellan/D. Beerman (eds.) The Proceedings of the HPSG '01 Conference. Stanford, USA: CSLI Publications.
BONAMI, Olivier/BOYÉ, Gilles (2010) "La morphologie flexionnelle est-elle une fonction?" In: I. Choi-Jonin/M. Duval/O. Soutet (eds.) Typologie et comparatisme. Hommages offerts à Alain Lemaréchal, p. 21-35. Louvain, Belgique: Peeters.
BOYÉ, Gilles (2006) "Suppletion." In: K. Brown (ed.) Encyclopedia of Language and Linguistics (2nd ed.), volume 12, p. 297-299. Oxford, UK: Elsevier.
BRESNAN, Joan (ed.) (1982) The Mental Representation of Grammatical Relations. MIT Press.
CORBETT, Greville G. (2003) "Agreement: the range of the phenomenon and the principles of the Surrey database of agreement." Transactions of the Philological Society 101: 155-202.
CORBETT, Greville G. (2007) "Canonical typology, suppletion and possible words." Language 83: 8-42.
CORBETT, Greville G./FRASER, Norman (1993) "Network Morphology: a DATR account of Russian nominal inflection." Journal of Linguistics 29: 113-142.
FRADIN, Bernard (2003) Nouvelles approches en morphologie. Paris, France: Presses Universitaires de France.
FRADIN, Bernard/KERLEROUX, Françoise (2003) "Troubles with lexemes." In: G. Booij/A. R. Janet de Cesaris/Sergio Scalise (eds.) Selected papers from the Third Mediterranean Morphology Meeting, Topics in Morphology, p. 177-196. Barcelona, Spain: IULA-Universitat Pompeu Fabra.
KARTTUNEN, Lauri (1989) "Radical lexicalism." In: M. R. Baltin/A. S. Kroch (eds.) Alternative Conceptions of Phrase Structure, p. 43-65. Chicago: University of Chicago Press.
MATTHEWS, Peter H. (1972) Inflectional Morphology: a Theoretical Study Based on Aspects of Latin Verb Conjugation. Cambridge, UK: Cambridge University Press.
MATTHEWS, Peter H. (1974) Morphology. Cambridge, UK: Cambridge University Press.
ROBINS, Robert H. (1959) "In defense of WP." Transactions of the Philological Society 1959, p. 116-144.
SAGOT, Benoît (2010) "The Lefff, a freely available, accurate and large-coverage lexicon for French." In: Proceedings of the 7th Language Resource and Evaluation Conference. Valletta, Malta.
SAGOT, Benoît/WALTHER, Géraldine (2011) "Non-canonical inflection: data, formalisation and complexity measures." In: Proceedings of the workshop Systems and Frameworks in Computational Morphology (SFCM 2). Zurich, Switzerland.
STUMP, Gregory T. (2001) Inflectional Morphology. A Theory of Paradigm Structure. Cambridge, UK: Cambridge University Press.
THORNTON, Anna M. (2010) "Towards a typology of overabundance." Presented at the Décembrettes 7, Toulouse, France.
WALTHER, Géraldine/SAGOT, Benoît (2011) "Modélisation et implémentation de phénomènes non-canoniques." Revue Traitement Automatique des Langues 52(2).
ZAUNER, Alfonz (1973) Praktická príručka slovenského pravopisu. Martin, Slovakia: Vydavateľstvo Osveta.
ZWICKY, Arnold M. (1985) "How to describe inflection." In: M. Niepokuj et alii (eds.) Proceedings of the Eleventh Annual Meeting of the Berkeley Linguistic Society, p. 372-386. Berkeley, USA: Berkeley Linguistic Society.
Abstract

MEASURING MORPHOLOGICAL CANONICITY

The question of regularity within morphological paradigms has previously been addressed within approaches falling in the scope of Canonical Typology (Corbett 2003). The aim of this paper is to provide a means for assessing the notion of morphological canonicity through original measures developed within our new morphological framework PARSLI. In particular, we introduce original measures for non-canonical phenomena such as heteroclisis, deponency, defectiveness and overabundance. We introduce PARSLI, a new model for inflectional morphology using an inferential-realisational approach (Matthews 1974; Zwicky 1985; Anderson 1992). Our model provides a formal representation of the lexicon/grammar interface. It relies on a formal definition of a lexical entry and a complete formal apparatus for computing all relevant form realisation rules for each lexeme, including stem formation rules. Realisation rules themselves may be expressed through any suitable realisation-based formalism (e.g. PFM or Network Morphology). We introduce several formal innovations such as inflection zones, which constitute partitions of given inflection classes. They are in particular used in modelling heteroclisis.

Povzetek (Summary, translated from Slovenian)

MEASURING MORPHOLOGICAL CANONICITY

The question of regularity in morphological paradigms has already been addressed by approaches within the framework of Canonical Typology (Corbett 2003). The aim of the present article is to contribute original means which, on the basis of measures developed within our new morphological model PARSLI, make it possible to assess the notion of morphological canonicity. In particular, we introduce new ways of measuring non-canonical phenomena such as heteroclisis, deponency, defectiveness and overabundance. The article presents PARSLI, a new morphological model based on an inferential-realisational approach (Matthews 1974; Zwicky 1985; Anderson 1992).
Our model offers a formal representation of the lexicon/grammar interface. It is based on a formal definition of a lexical entry and a complete formal apparatus for deriving all relevant morphological realisation rules for each lexeme, including stem formation rules. Realisation rules can be formulated within any suitable formal model (for example, Paradigm Function Morphology or Network Morphology). We introduce a number of formal innovations, for example inflection zones, which form parts of a given inflection class. They are particularly useful in modelling heteroclisis.