Karmen Pižorn
University of Ljubljana*
UDK [811.111'243:373.3]:37.091.279.7
DOI: 10.4312/linguistica.54.1.241-259

* Author's address: Pedagoška fakulteta, Univerza v Ljubljani, Kardeljeva ploščad 16, 1000 Ljubljana, Slovenia. E-mail: karmen.pizorn@pef.uni-lj.si.

THE DEVELOPMENT OF A CEFR-BASED SCALE FOR ASSESSING YOUNG FOREIGN LANGUAGE LEARNERS' WRITING SKILLS

1 INTRODUCTION

In the last two decades, many countries across the globe have begun providing foreign language instruction at primary schools, typically with English as the target language. However, English-dominant countries have also started to introduce the statutory provision of foreign language education in primary schools (DfES 2002; Evans/Fisher 2012), with the aim of ensuring that their citizens become efficient lifelong language learners. As foreign language instruction at primary school has gained popularity worldwide, educational researchers, language specialists and policymakers have expressed concern over the accountability of these programmes, and especially over the inadequate training of their teachers.

Unfortunately, there are still many countries that lack appropriately trained teachers. In Vietnam, for instance, Nguyen (2011) reports that most primary school English teachers are not formally trained to teach English at the primary school level. Even where there are enough teachers, such as in Bangladesh or Nepal, many are not adequately trained, nor do they have adequate English language skills (Hamid 2010; Phyak 2011). Hasselgreen, Carlsen and Helness (2004) found that even teachers trained as language specialists expressed a great demand for training in various areas of assessment, such as "defining criteria" and "giving feedback". Thus it seems that teachers involved in primary school foreign language teaching require assistance and support both in teaching and in assessing young foreign language learners, especially when it comes to giving appropriate feedback.

While learners seem to be well-motivated by communicative, humanistic, and learner- and content-based teaching approaches, their language progress needs to be monitored and assessed. Some educational systems (e.g., Finland and Sweden) avoid traditional large-scale achievement tests at the primary school level and, instead, strongly promote classroom-based (teacher) forms of assessment. It has been noted, however, that the application of teacher assessment appears to vary tremendously from teacher to teacher (Goto Butler/Lee 2010). On the whole, teachers need to assess the performance of individual students in a way that leads to further learning. In this way, teachers are able to improve their own instruction and satisfy the different needs of young language learners. It is the purpose of this article to describe the process of developing an assessment instrument which should support foreign language teachers in assessing writing skills and giving helpful feedback in such a way that learners will be able to develop their language proficiency.

2 ASSESSMENT FOR LEARNING

Assessment covers all of those activities performed by teachers that enable the measurement of the effectiveness of teaching and learning processes. Any kind of assessment should provide a reliable answer to the question: Have the students learnt what they were supposed to learn?
There are three main purposes of assessment: (1) to make schools and teachers accountable for their work, (2) to issue certificates confirming students' attainment, and (3) to advance student learning and help students to improve (usually termed "assessment for learning" or "formative assessment") (Black/Harrison/Lee/Marshall/Wiliam 2004: 10). The present article focuses mainly on the third of these purposes, which reflects the main aim of the Assessment of Young Learner Literacy (AYLLIT) project, on which this article reports.

Assessment for learning has been defined as "any assessment for which the primary aim is to fulfil the purpose of enhancing students' learning" (Black/Harrison/Lee/Marshall/Wiliam 2004: 10). The information derived from the assessment process should be applied by teachers and students alike. In other words:

An assessment functions formatively to the extent that evidence about student achievement is elicited, interpreted and used, by teachers, learners or their peers to make decisions about the next steps in instruction that are likely to be better, or better founded, than the decisions they would have made in the absence of that evidence (Wiliam 2011: 43).

General principles that underlie assessment for learning, and thus enable students to improve, include:
• the provision of helpful and constructive feedback to students;
• the active involvement of students in their own learning;
• teacher adjustments to future instruction, based on the results of the assessment;
• making learners aware of the success criteria that need to be met in order to do well in the assessment activity (Faxon-Mills/Hamilton/Rudnick/Stecher 2013).

Assessment for learning is essential for several reasons: (1) a thoughtful and well-informed classroom assessment practice ensures that students are able to achieve their educational potential; (2) formative ways of assessing students take into account variation in students' needs, interests and learning styles, and attempt to integrate assessment and learning activities; (3) a number of research studies have shown that the use of assessment to develop students' future learning makes a substantial difference, not only to students' attainment but also to their attitude towards learning, their engagement with the subject matter, and their motivation to strive for better results at school (Black/Wiliam 1998; Hattie 2012; Murphy 1999); and (4) assessment for learning is viewed as closely related to instruction, and is needed to help teachers make decisions about learning and teaching processes. However, the success of any assessment process depends on the effective selection and use of appropriate tools and procedures, as well as on the proper interpretation of students' performance.

3 ASSESSING WRITING SKILLS OF YOUNG LANGUAGE LEARNERS AND THE IMPORTANCE OF VALID FEEDBACK

Writing seems to be a straightforward and easy skill to assess. It provides the teacher with documentation of what the student can produce at a given time. Corrective feedback on errors may be given, and the writing may be discussed with the student and retained, to allow for subsequent comparisons between earlier and later performances. However, without a thoughtful, planned and systematic way of carrying out this assessment, it may have little formative value and can lead to imprecise summative information.
For example, Stobart (2006: 141) explored conditions that may prevent assessment from leading to further learning, and underlined the quality of feedback as critical. He established five preconditions for valid feedback to occur in the classroom: (1) it is clearly linked to the learning goals; (2) the student is able to understand the success criteria; (3) it gives an indication, at appropriate levels, of how to bridge the gap; (4) it focuses on the task, rather than the student; and (5) it challenges and inspires students to do something about their progress, and it is achievable.

In order to help students develop their writing skills, teachers need to be able to provide appropriate (corrective) feedback. This should be based on criteria shared between the teacher and the students (Bitchener/Ferris 2012: 124). Moreover, the student should be able to assess his/her own performance using the set criteria, and to assess his/her progress by placing a piece of writing at a level or target point that consists of a description (descriptors) and a sample (benchmark), which illustrate the level in question. For both teacher and students, it is vital that the descriptors are interpreted in the same way. However, it is not just the criteria that influence students' progress in FL writing, but also the tasks set by the teacher. Writing tasks have to be designed in a way that allows students to demonstrate their writing abilities. For example, if a task is not close to young learners' life experiences and interests, it will not stimulate them to show their true communicative competence. The AYLLIT project, therefore, focused on the following vital issues: (1) the development of criteria, (2) the design of guidance for teachers on giving feedback to students, and (3) the design of guidelines for preparing writing tasks (AYLLIT 2007-2011).

4 WHY THE LINK WITH THE CEFR?

The Common European Framework of Reference (CEFR) is a useful, increasingly well-known and widely used tool for the assessment of languages in the classroom. It has two broad aims: to act as a stimulus for reform, innovation and reflection, and to provide Common Reference Levels to assist communication across institutional, regional and linguistic boundaries. Before the publication of the CEFR, dialogue about levels of language competence was hindered because each school, institution, testing centre or ministry described, targeted and achieved language levels in its own terms. The CEFR helps to overcome these barriers by providing a common framework for the description of levels, course planning, assessment and certification. It is used for specifying content (what is taught and assessed) and stating criteria (how performance is interpreted). It is a frame of reference, and must be adapted to fit a particular context. Linking to the CEFR means relating the features of a particular context of learning and teaching to it. Not everything in the CEFR is relevant to any given context, and there are features that may be important for a particular context but which are not addressed by the CEFR. This is particularly true for young learners, who are not very well covered in the descriptive scales, as these scales were developed with adults in mind and do not take into account the cognitive stages prior to adulthood. Adaptation of the CEFR for young learners has been undertaken in many ways: for example, through primary school versions of the European Language Portfolio (ELP), and through national language curricula (Little 2006; Pizorn 2009).
However, this is not a straightforward task. Papp and Salamoura (2009) report that assessors attempting to relate the Cambridge Young Learners English examinations to the CEFR found it difficult to map young language learners' performances and tasks against CEFR scales and descriptors. One of the main, and most influential, parts of the CEFR is its descriptive scheme, which embraces general competence (knowledge or skills, know-how or existential competence, and the ability to learn) and communicative language competence (linguistic, pragmatic, socio-linguistic and sociocultural). It distinguishes four categories of language activity (reception, production, interaction and mediation), four domains of language use (personal, public, occupational and educational), and three types of language use (situational context, text type, and conditions or constraints) (CEFR 2001; Little 2007). For the purposes of classroom assessment, it is necessary to be able to establish not only which tasks the student can perform but also, and importantly, how well s/he can perform them. One of the principal aims of this project was, therefore, to adapt the already-existing ELP scales, with their functional focus, by producing a CEFR-based scale with a linguistic focus.

5 THE PRE-PHASES OF THE AYLLIT PROJECT

The AYLLIT materials were developed in three phases. The first, known as the ECML's Bergen "Can Do" project, resulted in a scale that was a forerunner of the AYLLIT scale. The second phase was a preliminary project undertaken immediately prior to the AYLLIT project, and the third phase was the AYLLIT project itself.

In the first phase, two CEFR-based scales of descriptors were developed in Norway for the assessment of writing, as part of the National Testing of English (NTE) in 2004-2005 (Helness 2012). The first focused on the functional aspect of writing, while the second had a linguistic focus and was not task-specific. The latter consisted of four categories - textual structure, grammar, words and phrases, and spelling and punctuation - and was primarily based on the CEFR scales of descriptors. The bands of descriptors were only formulated for whole levels (A1, A2, B1, B2), and shaded areas between these levels were given (e.g., A1/A2). Next, teachers were asked to rate the scripts using the scale. Hasselgreen (2013) reports that, on the English tests for Grade 10, the inter-rater correlation between experts and teachers was 0.81. For Grade 7, the raters were generally close in their ratings: 34% were in complete agreement, while a further 40% differed by only half a CEFR level. Hasselgreen (2013) gives further evidence for using the scale by reporting on teachers' perceptions of the usefulness of the training in the use of the scale: only 3% answered that it was not useful, while all of the others found it (very) applicable. Teachers also commented that the scale would be very useful for classroom assessment of students' writing. Thus the NTE scale proved to have a high degree of near-agreement in placing students on a CEFR-based scale, and was regarded as useful to teachers in assessing writing. However, the NTE scale was not ready to be used in the AYLLIT project, due to the levels included, and the fact that the descriptors were primarily intended for testing purposes rather than classroom assessment.

The second phase refers to a preliminary project carried out one year before the AYLLIT project, involving two Grade 5 classes (10-11 years old) in Norway. The project had two purposes.
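For readers who wish to carry out similar agreement checks with their own ratings, the following minimal sketch shows how statistics of the kind reported above (exact agreement, the share of ratings half a CEFR level apart, and inter-rater correlation) can be computed once scale bands are encoded numerically. This is purely illustrative: it is not part of the NTE or AYLLIT materials, and the band encoding and the ratings are invented.

```python
from statistics import correlation  # available in Python 3.10+

# Six scale bands encoded so that adjacent bands
# (e.g. A2 vs. A2/B1) are half a CEFR level apart.
BAND_SCORE = {"A1": 1.0, "A1/A2": 1.5, "A2": 2.0,
              "A2/B1": 2.5, "B1": 3.0, "above B1": 3.5}

def agreement_stats(rater_a, rater_b):
    """Return exact agreement, the share of ratings exactly half a
    level apart, and the Pearson correlation, for two lists of bands."""
    a = [BAND_SCORE[band] for band in rater_a]
    b = [BAND_SCORE[band] for band in rater_b]
    n = len(a)
    exact = sum(x == y for x, y in zip(a, b)) / n
    half = sum(abs(x - y) == 0.5 for x, y in zip(a, b)) / n
    return exact, half, correlation(a, b)

# Invented ratings for ten scripts, for illustration only.
expert = ["A1", "A2", "A2", "A2/B1", "B1", "A1/A2", "A2", "B1", "A2/B1", "A2"]
teacher = ["A1", "A2", "A2/B1", "A2/B1", "B1", "A1/A2", "A2/B1", "B1", "B1", "A2"]

exact, half, r = agreement_stats(expert, teacher)
print(f"exact: {exact:.0%}, half a level apart: {half:.0%}, r = {r:.2f}")
```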
The first was to identify what students of this age could be expected to write, and what kind of assessment tools teachers would find useful. The second was to adapt the NTE scale into a form that both teachers and the project leader would find better suited to the classroom assessment of students' writing. On the NTE scale, each band of descriptors represented a whole CEFR level, from A1 to B2. In line with the research findings (Hasselgreen 2013), it was agreed that level B2 might be cognitively beyond the reach of students at this age. Furthermore, it was felt that, in order to provide meaningful feedback and allow progress to be shown, descriptors at in-between levels should be provided. As a result, the scale was revised and six bands of descriptors (A1, A1/A2, A2, A2/B1, B1, and above B1) were included. The decision was also made to adjust the categories to include some indication of the functions a student may be expected to perform. These categories were renamed Overall Structure and Range of Information, Sentence Structure and Grammatical Accuracy, Vocabulary and Choice of Phrase, and Misformed Words and Punctuation. This work resulted in a pre-scale leading to the final AYLLIT project scale.

6 THE AYLLIT PROJECT

6.1 Introduction

The third phase refers to the AYLLIT project itself, which was part of the 2008-2011 medium-term programme of the European Centre for Modern Languages (ECML), and was aimed at designing CEFR-linked guidelines and materials for primary school foreign language teachers to use in their classroom assessment of their students' reading and writing skills.

The guidelines and materials for teachers were finalised following a workshop with participants from 30 European countries. Although the research in the AYLLIT project was qualitative, the AYLLIT material was thoroughly discussed and revised in the project group, and with teachers and students from all of the participating countries, until it was perceived to be appropriate for the context of classroom assessment. The AYLLIT project team consisted of four experts representing Lithuania, Norway (coordinator), Slovenia and Spain. Two classes of students (aged 9 at the beginning) and their teachers in each country took part in the project, over a two-year period. The common foreign language for the main part of the project was English. In each of the four countries, it was assumed that children at this stage are able to read and write English. There was close cooperation and regular contact between the team members and the teachers in their respective countries. The role of the teachers was to be closely involved in the whole process: administering, assessing and commenting on writing tasks, and collecting the reactions of students. The role of the team members was to draft and assign writing tasks and procedures, to revise the scale of descriptors using samples of students' writing, to assess students' scripts already assessed by teachers, to send scripts to schools abroad, and to collect comments from teachers using the materials. The data consisted of tasks designed and revised by team members and teachers, as well as students' writing scripts, teachers' and experts' comments, and ratings of students' texts by teachers and experts. Finally, before the materials were finalised, a workshop with 30 participants (most of whom were not part of the project) was organised.

6.2 AYLLIT writing process

Curricula for literacy in English in the four countries proved to be quite diverse.
However, concerning foreign language writing skills, students were expected to be able to write communicatively, and at some length, on personal topics, in a descriptive and narrative way. Learners at this age should do tasks that are intrinsically motivating and challenging (McKay 2006: 250-251; Wilford 2000: 1). Cameron (2001: 156) argues in favour of writing for real communication. The idea that children are motivated when they are encouraged to talk about themselves, and to share such information with their peers from other countries through writing, was crucial to the way writing was conducted in the AYLLIT project.

The writing tasks that the team designed for students reflected "can do" statements for the appropriate levels in the countries' ELPs. The initial tasks were descriptive in nature, such as introducing oneself, and sending letters and postcards from the students' towns, with attached drawings. They did not require language ability higher than around A2 on the CEFR scale, which was a fairly typical upper level for the students involved in the project. Later, the tasks became more narrative in nature, such as describing one's summer holidays. Thus, students were able to demonstrate their ability as far as B1, or slightly beyond. The students wrote three or four tasks per year.

Guidelines, with rough procedural steps, were prepared for the teachers. The students were first involved in the pre-writing phase, in the form of classroom discussion and/or PowerPoint presentations, which helped to activate students' schematic knowledge. The pre-writing stage requires more activities for the activation of schematic knowledge than the other two stages: the writing stage and the post-writing stage (revising and editing). In the pre-writing process, the teacher should consciously activate the students' content and formal schemata (Zheng/Dai 2012: 86). After the first stage, the students received feedback and guidance from their teachers, and revised their texts to make them suitable to be sent to students from another country: for example, the Norwegian students sent their texts to the Slovene students, the Slovenes sent theirs to the Spanish, the Spanish to the Lithuanians, etc. Thus, as well as being a potential source of pleasure and discovery, writing can be a major source of language development. The actual assessment of the scripts was undertaken by the students' own teacher and a corresponding expert.

6.3 Revision of the assessment scale and feedback profile

The revision of the scale of descriptors was the other major task of the AYLLIT project (see Appendix 1). The most significant revisions occurred as a result of analysing individual students' writing. Sets of three or four scripts were collected longitudinally, from a large number of students over a two-year period. A selection was then made of several of these sets, representing different students, countries and relative levels. The texts were then closely analysed, with the team members constantly referring to the drafted descriptors and trying to answer the question: What has Student A demonstrated in his/her most recent text that s/he did not demonstrate in the previous text? In this way, valuable insight was gained into the development of the individual student's writing ability and his/her language progress. In revising the scale, other materials were used, including school curricula, comments collected from teachers, and the team members' own experiences in using the descriptors.
It was also essential to ensure that the essence of the CEFR levels was preserved. Similar to the findings of Papp and Salamoura (2009: 17), it was identified that a number of students were only able to copy words or write phonetically (see Figure 1), and thus did not satisfy the criteria for the A1 level. It was therefore necessary to introduce a new level labelled "Approaching A1", which in some other educational contexts is referred to as the pre-A1 level (Negishi/Takada/Tono 2012).

MAJNEJMIZ XXXX (a boy)
AJLIV IN XXXX
AJM 10 JERZ OLD
AJM IN 4 KLAS
AJHEB 1BRADR END 1 SISTER
AJHEB PEC: 1DOG, 2 KEC, 6 BRC IND 4 FIS.

Figure 1: Example of a text written phonetically

In order to give appropriate feedback, teachers need to be aware of the assessment criteria and learning goals. They also have to understand how to recognise and judge what constitutes writing ability, how students develop in writing, and how to use this feedback in such a way that it will actively help students to improve. Moreover, teachers need to be able to assess the overall level of students' writing ability, so that students can see how they are progressing. In the AYLLIT project, teachers were asked to decide on a rough level, and only refer to the part of the scale that extended slightly above and below the selected level. It was recommended that the teacher shade all of the descriptors that seemed to apply to the student's script, in order to construct a writing profile that demonstrated the student's writing abilities. By being presented with only the relevant part of the scale, the student was able to observe the degree to which s/he had developed his/her writing skills, compare his/her own writings, and identify where s/he was heading, without being pressured by the group's achievement. This profile was intended to be used as a basis for giving feedback to students, and for making learners aware of the success criteria (Faxon-Mills/Hamilton/Rudnick/Stecher 2013: 419). The feedback was intended to reflect the four scale criteria (Overall Structure and Range of Information, Sentence Structure and Grammatical Accuracy, Vocabulary and Choice of Phrase, and Misformed Words and Punctuation) and to draw the students' attention to what they could already do, and to what further work remained to be done. Teachers were also strongly encouraged to provide feedback, in spoken interaction with the student, in the most encouraging and positive way. This is in line with a study carried out by Bitchener and Ferris (2012), which found that the combination of written and conference feedback had a significant effect on the accuracy levels of specific grammatical structures. Furthermore, Fluckiger/Vigil/Pasco/Danielson (2010) claim that such feedback is typically formative and, as such, is intended to help students to develop, not merely to grade their performance in a task. The absence of a summative grade can reduce student anxiety and encourage risk-taking, as students perceive their errors merely as part of a work in progress. In addition, teachers were advised to give the student corrective communicative tasks related to the key weaknesses disclosed. A sample of writing, accompanied by its profile and written feedback, is given in Figure 2.

Summer Holiday (a girl)

This is about my summer holiday. First i travelled to xxx (a city) in xxx (a country), for one week. I travelled with my mom, dad and my hamster. But then we found out that we couldn't take the hamster with us to Denmark.
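As a purely illustrative aside, the shading procedure just described can be thought of as building a simple data structure: a mapping from the four scale categories to the descriptors a teacher judges to apply to a given script, from which feedback text can then be rendered. The sketch below is one possible representation; the class, its methods and the descriptor snippets are invented for illustration and are not part of the AYLLIT materials.

```python
from dataclasses import dataclass, field

# The four categories of the AYLLIT writing scale.
CATEGORIES = ("Overall Structure and Range of Information",
              "Sentence Structure and Grammatical Accuracy",
              "Vocabulary and Choice of Phrase",
              "Misformed Words and Punctuation")

@dataclass
class WritingProfile:
    """A 'shaded' profile: descriptors judged to apply to one script."""
    student: str
    rough_level: str                             # e.g. "A2/B1"
    shaded: dict = field(default_factory=dict)   # category -> descriptors

    def shade(self, category: str, descriptor: str) -> None:
        """Mark a descriptor as applying to the student's script."""
        self.shaded.setdefault(category, []).append(descriptor)

    def feedback(self) -> str:
        """Render the profile as simple feedback text, category by category."""
        lines = [f"{self.student} - rough level {self.rough_level}"]
        for category in CATEGORIES:
            for descriptor in self.shaded.get(category, []):
                lines.append(f"  {category}: {descriptor}")
        return "\n".join(lines)

# Invented usage example.
profile = WritingProfile("Student A", "A2/B1")
profile.shade(CATEGORIES[0], "Clauses are normally linked using connectors.")
profile.shade(CATEGORIES[3], "Misformed words may occur in more independent texts.")
print(profile.feedback())
```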
But fortunately we found a nice girl who worked in the animal hospital. She offered to take care of my hamster for one week, while we were in Denmark. We travelled with car and boat to Denmark. We rented a holiday house in Denmark. It was a nice house. After one or two days we drove to a beautiful beach. It was very windy. It is not mountains in Denmark so the wind just blew everywhere. Then we went to Legoland. It was so incredible! Many LEGO houses .... So cool! And a big, cool Rollercoster. It rained that day so I didn't do so much. Then we went to Odense zoo. It was fun but the animals had to little space to walk and play! And after a while we travelled to Germany. Just for a short visit. Then at the last day in Denmark we went to see the famous Moonfish. Then we travelled back to Bergen. From xxx

Above B1
• Overall Structure and Range of Information: Is able to create quite complicated texts, using effects such as switching tense and interspersing dialogue with ease. The more common linking words are used quite skilfully.
• Sentence Structure and Grammatical Accuracy: Sentences can contain a wide variety of clause types, with frequent complex clauses. Errors in basic grammar only occur from time to time.
• Vocabulary and Choice of Phrase: Vocabulary may be very wide, although the range is not generally sufficient to allow stylistic choices to be made.
• Misformed Words and Punctuation: Misformed words only occur from time to time.

B1
• Overall Structure and Range of Information: Is able to write texts on themes which do not necessarily draw only on personal experience and where the message has some complication. Common linking words are used.
• Sentence Structure and Grammatical Accuracy: Is able to create quite long and varied sentences with complex phrases, e.g., adverbials. Basic grammar is more often correct than not.
• Vocabulary and Choice of Phrase: Vocabulary is generally made up of frequent words and phrases, but this does not seem to restrict the message. Some idiomatic phrases used appropriately.
• Misformed Words and Punctuation: Most clauses do not contain misformed words, even when the text contains a wide variety and quantity of words.

A2/B1
• Overall Structure and Range of Information: Is able to make a reasonable attempt at texts on familiar themes that are not completely straightforward, including very simple narratives. Clauses are normally linked using connectors, such as and, then, because, but.
• Sentence Structure and Grammatical Accuracy: Sentences contain some longer clauses, and signs are shown of awareness of basic grammar, including a range of tenses.
• Vocabulary and Choice of Phrase: Vocabulary is made up of very common words, but is able to combine words and phrases to add colour and interest to the message (e.g., using adjectives).
• Misformed Words and Punctuation: Clear evidence of awareness of some spelling and punctuation rules, but misformed words may occur in most sentences in more independent texts.

This is quite a long narrative text, which has complicating factors, such as the episode with the hamster and how it was resolved. There is good linking, e.g., after a while, including the use of adverbs such as fortunately. We get a clear sense of what happened and her reactions, including her reservations: It was fun but the animals had to little space. She provides reasons for things: It is not mountains in Denmark so the wind just blew everywhere. Her grammar is generally correct, apart from the it/there error, and travelled with car. The text lacks a certain fluency, with many very short sentences which are not well linked to the adjacent ones. The vocabulary seems sufficient to allow her to fully tell her story, and there are a few quite idiomatic phrases, such as she offered to take care of.... Her spelling is good, the only errors being 'i' and to (too).
Figure 2: Sample script for B1 level, example of profile and feedback form

The actual assessment of students' scripts was performed by teachers and experts. All scripts were assessed independently by a teacher, a team member and the coordinator. It should be noted that the difference between the levels assigned to a student's script rarely exceeded half a CEFR level, or one band on the scale. As Hasselgreen (2013: 426) notes, "Any bigger differences tended to be sporadic rather than systematic, and the three raters were all given access to each other's ratings, which acted as a form of training for all involved".

In conjunction with this, a workshop was organised, attended by 30 participants from 30 countries, all of whom were directly involved in primary school language education. The focus of the workshop was to validate the scale of descriptors, discover its potential usefulness in assessing texts, and try out its appropriateness as a basis for providing feedback. The participants were asked to bring texts written by their students. Working in groups of five, the participants were first familiarised with the CEFR. They were asked to assign isolated AYLLIT descriptors to the levels set by the AYLLIT writing assessment scale. The participants agreed with the levels assigned to the descriptors by the AYLLIT team; thus, this activity served as a validating procedure for the descriptors and levels, as they all proved to be recognisable as belonging to the intended CEFR levels. Next, the participants were asked to assign seven texts (selected as benchmarks) to the AYLLIT levels, thus relating the descriptors to real texts. It was clear that the participants mostly agreed with the levels assigned by the AYLLIT team, as the overall levels never differed by more than one level above or below the level assigned by the AYLLIT team. This activity was followed by the participants working in groups with their own texts, and assigning them to the AYLLIT levels. They found this activity very useful, and were able to identify appropriate descriptors in the AYLLIT scale that mirrored their students' achievement in writing. Prior to the central workshop, an online workshop took place, in which participants, with no training other than reading the material provided, rated scripts according to the AYLLIT levels. It was not surprising that there was little agreement in rating the scripts, which underlines the importance of training teachers in the use of assessment scales (Becker/Pomplun 2006: 720).

The second part of the workshop aimed at providing feedback using the AYLLIT profiles. The AYLLIT team first designed samples of AYLLIT feedback (eight scripts with feedback), after which the samples were discussed in smaller groups of participants. The discussion within the groups proved beneficial in composing the final versions of the AYLLIT scale, feedback profiles and guidelines.

7 OUTCOMES OF THE AYLLIT PROJECT

The outcomes of the AYLLIT project consist of assessment material and guidelines for its use (Hasselgreen/Kaledaite/Pizorn/Maldonado 2011 and the ECML/AYLLIT project website [AYLLIT 2007-2011]). The key achievement is the scale of descriptors (Appendix 1), accompanied on the website by eight sample texts ranging from pre-A1 to above B1 levels. Each text is linked to its feedback profile.
The guidelines for assessing writing are found in Chapter 2 of the handbook (Hasselgreen/Kaledaite/Pizorn/Maldonado 2011), where teachers can find information on the assessment of young language learners' literacy, writing processes in primary school, their own needs regarding the assessment of writing, and the use of the materials and methods in the classroom. Teachers can learn how to construct a profile of the student's writing based on the AYLLIT scale, how to use this profile to stimulate learners to improve their writing abilities, how to give corrective feedback (see Figure 3), and how to use the criteria in self-assessment. As experienced teacher trainers themselves, the AYLLIT team members believe that many teachers prefer face-to-face training. Thus, a step-by-step guide for teacher trainers who wish to give workshops to novice and inexperienced teachers is available as part of the online downloadable handbook.

Summy! My summar holiday.

Aim hvas in Mallorca and am sunbrathling, that was very fun! That was a experienle of the live, and am stay as a camping place, wit my Grandmum and my Grandad, and we fising and have fun that summer. We also play Gitar and Singing and 1 day we go to shopping I don't bay so much.

- Spelling: copy these words carefully

Figure 3: Sample script with corrective feedback