An analysis of simplification strategies in a reading textbook of Japanese as a foreign language
Kristina HMELJAK SANGAWA
University of Ljubljana, Slovenia kristina.hmeljak@guest.arnes.si
Abstract
Reading is one of the bases of second language learning, and it can be most effective when the linguistic difficulty of the text matches the reader's level of language proficiency. The present paper reviews previous research on the readability and simplification of Japanese texts, and presents an analysis of a collection of simplified texts for learners of Japanese as a foreign language. The simplified texts are compared to their original versions to uncover different strategies used to make the texts more accessible to learners. The list of strategies thus obtained can serve as useful guidelines for assessing, selecting, and devising texts for learners of Japanese as a foreign language.
Keywords: readability; simplification; Japanese as a foreign language; textbook analysis; reading
Povzetek
Branje je eno od temeljev učenja drugega ali tujega jezika in je lahko posebej učinkovito, ko jezikovna težavnost besedila ustreza bralčevemu nivoju jezikovnega znanja. Članek nudi pregled dosedanjih raziskav na področju berljivosti in poenostavljanja japonskih besedil ter predstavlja analizo zbirke poenostavljenih besedil za učence japonščine kot tujega jezika. Iz primerjave poenostavljenih besedil z njihovo originalno različico izhaja seznam različnih strategij, ki so jih pisci uporabili, da izboljšajo dostopnost besedil za učence. Seznam teh strategij lahko služi kot iztočnica za ocenjevanje, izbiranje in sestavljanje besedil za učence japonščine kot tujega jezika.
Ključne besede: berljivost; poenostavljanje; japonščina kot tuj jezik; analiza učbenika; branje
1 Introduction
Reading is one of the bases of second language learning, and it can be most effective for the purpose of improving a reader's language skills when the text being read is not only appealing but also of the appropriate difficulty level for its reader. The development of reading skills through extensive reading can be supported on one hand by selecting appropriate material from existing texts and grading them according to objective or subjective readability criteria; and on the other hand by
Acta Linguistica Asiatica, 6(1), 2016. ISSN: 2232-3317, http://revije.ff.uni-lj.si/ala/ DOI: 10.4312/ala.6.1.9-33
adapting existing texts to the level of the intended audience, i.e. simplifying and abridging existing material. Both approaches have been extensively researched and implemented for major languages, especially English (see for example DuBay 2004 for an overview), and some research has been conducted on the readability and simplification of Japanese texts. However, factors affecting readability for learners of Japanese as a second language have not been thoroughly researched yet.
In the present paper, after reviewing previous research on the readability and simplification of Japanese texts, a collection of texts that have been simplified for L2 Japanese learners is analysed and compared to the original Japanese texts from which the simplified versions were adapted, in order to investigate text characteristics that could be considered important factors in determining the readability of a text for readers of Japanese as a foreign language.
2 Readability and simplification of Japanese texts: previous research
Research in this area stems from different backgrounds and is targeted at different groups of weak readers. Some early work does not clearly specify in which context and by whom the measure is meant to be used, but most research is targeted either at young native speakers or at persons with disabilities.
Probably the earliest work on readability of Japanese texts (Morioka, 1952), inspired by Flesch's Reading Ease formula, reports on preliminary research at the National Institute for the Japanese Language to determine the criteria needed to develop a similar formula for Japanese. Regrettably, the project seems to have been discontinued.
Another early attempt at measuring the readability of Japanese texts was made by Sakamoto (1962), who manually analysed Japanese language textbooks for elementary school grades 1 to 6, using school grades as the scale of difficulty, and found that the ratio of frequent vocabulary, sentence length and the proportion of kanji characters in the text correlate with school grades.
A similar way of estimating the difficulty of written sentences is also proposed in a writing stylebook by Yasumoto (1983), who uses the average number of characters per sentence and the percentage of Chinese characters as indicators of text difficulty, but does not combine these two factors into a single formula.
Two decades after Sakamoto's research, when computers were already available for lengthier calculations, Tateishi et al. (1988a, 1989b) proposed the first readability formula for Japanese on the basis of four surface characteristics: the proportion of types of characters (Roman letters, hiragana, katakana and kanji); the length of continuous strings of the same type of character; the length of sentences; and the number of commas per sentence.
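The four surface characteristics can be made concrete with a short sketch. The Python code below is an illustration only and not the formula of Tateishi et al.: it computes the raw features, without the weights that a readability formula would assign to them. The function names, the Unicode ranges used to classify characters, the sentence delimiters and the sample sentence are all assumptions introduced here.

```python
import re
from itertools import groupby

SCRIPTS = ("roman", "hiragana", "katakana", "kanji")

def char_type(ch: str) -> str:
    """Classify one character by script, using Unicode code point ranges."""
    code = ord(ch)
    if 0x3041 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    if ch.isascii() and ch.isalpha():
        return "roman"
    return "other"  # punctuation, digits, full-width letters, etc.

def surface_features(text: str) -> dict:
    """Compute the four kinds of surface features for a Japanese text."""
    # Assumption: sentences end with one of these three marks.
    sentences = [s for s in re.split(r"[。！？]", text) if s.strip()]
    types = [char_type(c) for c in text if not c.isspace()]
    if not sentences or not types:
        raise ValueError("text contains no sentences")

    # 1. Proportion of each character type among all non-space characters.
    proportions = {t: types.count(t) / len(types) for t in SCRIPTS}

    # 2. Mean length of continuous strings of the same character type.
    runs = {t: [] for t in SCRIPTS}
    for t, group in groupby(types):
        if t in runs:
            runs[t].append(len(list(group)))
    mean_run = {t: (sum(r) / len(r) if r else 0.0) for t, r in runs.items()}

    return {
        "proportions": proportions,
        "mean_run_length": mean_run,
        # 3. Mean sentence length in characters (commas included).
        "mean_sentence_length": sum(len(s) for s in sentences) / len(sentences),
        # 4. Number of commas per sentence.
        "commas_per_sentence": text.count("、") / len(sentences),
    }

# Invented sample text, not taken from any of the studies cited above.
features = surface_features("私は本を読んだ。きのう、テレビを見た。")
print(features["mean_sentence_length"], features["commas_per_sentence"])
```

The same features, minus the run lengths, also cover the indicators used by Yasumoto (characters per sentence and percentage of Chinese characters).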
A more recent and very productive stream of research is work on readability formulae to predict the difficulty level of texts for Japanese school-children, to be
used in mother tongue education (Shibasaki & Tamaoka, 2010). Other formulae have also been developed by Sato et al. (2008, for young native speakers of Japanese), and Lee & Hasebe (2016, for learners of Japanese as a foreign language).
Another approach to the readability of Japanese, proposed by Sano and Maruyama (2008), is based on Halliday's concept of lexical density within the framework of Systemic Functional Grammar (Halliday, 1993). In this approach, lexical complexity is defined as the ratio of content words to ranking clauses in a text.
A second stream of research on Japanese readability is work on information accessibility and text simplification, aimed at facilitating communication with handicapped and elderly readers (Ichikawa, 2006), and paraphrase generation to assist handicapped readers with limited linguistic capabilities (Yamamoto et al., 2000, Inui and Yamamoto 2001, Inui and Fujita 2004, Nakano et al. 2005, Sato et al. 2004).
A third stream of research which bears on readability is work on computer-aided text revision, where readability criteria are used to highlight potentially incomprehensible passages and suggest more readable substitutions (Hayashi, 1992, Inui and Okada 2000, Ono et al. 2006, Oono and Inazumi 2007). These projects often use advice on clear writing from style manuals such as Kabashima 1979, Kinoshita 1981, Honda 1982, Mishima 1990 etc., which do not deal with numerical measurements of readability, but give hints on what factors can affect readability and should be considered in its measurement.
Linguistic factors which have been found to correlate with text readability in previous research can be divided according to the traditional levels of linguistic analysis: script (ratio of character types, punctuation, phonetic guides etc.), vocabulary, syntax (sentence length, clause length, ellipsis etc.), text and discourse (length, cohesion etc.). Statistical correlations between these factors and collections of graded texts have been described in previous research. However, the factors influencing the readability of a text for learners of Japanese have not yet been thoroughly researched. The following sections present a comparison between simplified texts and their originals and an analysis of the strategies used in this process.
3 Data
The texts are reading passages in a textbook for intermediate learners of Japanese, the second in the set of textbooks developed by the International Student Center of Sanno University: Enjoyable Task Reading in Japanese: Intermediate, published by Bonjinsha in 1991 and reprinted multiple times, a popular textbook for reading instruction.
These texts were chosen because they are one of the very few available collections of pairs of authentic and simplified Japanese texts targeted at foreign learners of Japanese.
The reading passages included in the textbook were selected and simplified by the textbook authors, experienced teachers of Japanese as a foreign language. Selection criteria, as stated in the foreword, were content (which should be interesting to adult learners of Japanese: worth reading and intellectually challenging) and text type (as varied as possible, including narratives, expository and scientific writing, in order to offer learners the opportunity of practising different reading strategies). Another criterion that is not stated in the foreword, but was evidently applied, is length: no text exceeds the length that can be read in a 90-minute lesson. The longest texts are approximately 1300 characters long, spanning one to two pages.
The foreword mentions that texts were rewritten for their target audience, learners of Japanese, while the afterword mentions that all textbook material was developed and used for two years in the Japanese course of Sanno University before being re-edited for publication in book form. Only vocabulary is mentioned in the foreword as a simplification criterion, but it is conceivable that strategies applied to text rewriting were based on the authors' experience as language teachers and empirically verified or found to be useful in their language classes.
The textbook has been used by the present author for Japanese language instruction in a class of 2nd year students of Japanese and received a positive response from the students, indicating that it is a good example of readable writing for students of Japanese. The pairs of texts were also given to a group of eight advanced learners of Japanese, who were asked to choose the easier text in each pair of original and simplified version. All participating students indicated the simplified versions as the easier to read.
In the forewords to each textbook in the series (To the teachers using this book), the authors mention vocabulary as the main criterion used both when assessing the difficulty of texts included and when rewriting texts for their intended readers. All texts in this textbook series are graded from one to three stars (one star indicating the easiest texts and three stars indicating the most difficult texts) according to the percentage of vocabulary included in the text but not present in the vocabulary lists used as a yardstick, and thus expected not to be known by readers at the given level. These percentages and vocabulary lists are shown in Table 1.
Table 1: Vocabulary lists used as yardstick and percentage of new vocabulary
Textbook level	No. of expected known words	Vocabulary list used as yardstick	Words not covered by yardstick list in ☆ passages	Words not covered by yardstick list in ☆☆ passages	Words not covered by yardstick list in ☆☆☆ passages
Pre-intermediate (1996)	2000	list of 2030 words in Nihongo kyoiku no tame no kihon goi chosa (National Language Research Institute, 1984)	5-6%	6-10%	10-20%
Intermediate (1991)	3500	unpublished vocabulary list compiled by the authors' research team	5%	5-10%	10-15%
Pre-advanced (1993)	6000	list of 2030 words in Nihongo kyoiku no tame no kihon goi chosa (National Language Research Institute, 1984)	up to 30%	up to 30%	30% or more
		list of 6000 words in Nihongo kyoiku no tame no kihon goi chosa (National Language Research Institute, 1984)	up to 15%	15% or more	15% or more
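The grading principle behind Table 1 can be illustrated with a minimal sketch. It is not the textbook authors' procedure: the forewords do not say how words were segmented or whether the percentage was computed over running words or distinct words, so the code below assumes a pre-segmented list of running words. The function names are hypothetical, and the band boundaries follow the Intermediate row of Table 1, with the shared boundary values (5% and 10%) assigned to the easier band as a further assumption.

```python
def uncovered_ratio(words, yardstick):
    """Share of running words in a passage that are absent from the yardstick list."""
    if not words:
        raise ValueError("passage has no words")
    unknown = sum(1 for w in words if w not in yardstick)
    return unknown / len(words)

def star_grade(ratio):
    """Map a ratio to the star bands of the Intermediate volume (Table 1).

    Assumption: boundary values belong to the easier band.
    Returns None when the ratio exceeds the hardest band.
    """
    if ratio <= 0.05:
        return 1
    if ratio <= 0.10:
        return 2
    if ratio <= 0.15:
        return 3
    return None

# Placeholder tokens stand in for a segmented Japanese passage.
passage = [f"w{i}" for i in range(20)]
known_words = set(passage[:18])          # 18 of the 20 words are on the list
ratio = uncovered_ratio(passage, known_words)
print(ratio, star_grade(ratio))
```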
In the preface to the first volume the authors explicitly mention that vocabulary was overall the main criterion used in assessing text difficulty, while adding that the average length of sentences was also used as a secondary indicator of structural complexity, but no concrete data are given for these aspects of complexity. In the prefaces to the second and third volume in the series, only vocabulary is mentioned as the yardstick for assessing text difficulty.
Similarly, the level of proficiency which is expected from readers of each of the three volumes is defined in terms of hours (or months) of Japanese instruction received, which is supposed to reflect their vocabulary knowledge: readers who have studied Japanese for a certain period of time are expected to know a certain number of words, which should approximately correspond to the vocabulary prescribed for a certain level of the Japanese Language Proficiency Test (JF and AIEJ, 2004).
Table 2 shows the number of words which readers (learners of Japanese) are expected to know after different periods of study, as stated in the forewords.
Table 2: Expected proficiency of learners of Japanese using the textbook series Enjoyable Task Reading in Japanese
Volume	Expected Japanese instruction time	Expected No. of known words	Expected JLPT level of vocabulary knowledge
Pre-intermediate	7 months - 1 year or 400 - 600 hours	2000-3500	2nd-3rd level
Intermediate	9 months - 1 year or 450 - 600 hours	3500-5000	2nd-3rd level
Pre-advanced	1 year - 1 year and a half or 600 - 900 hours	5000-10000	1st level
As can be seen from the above descriptions, the authors of the textbooks have carefully controlled the vocabulary used in the reading passages, considering it the main factor of text difficulty.
Each reading passage is also preceded by lists of 10 to 20 keywords used in the text, with exercises to learn or reinforce vocabulary knowledge, including written form (Chinese characters), morphological, syntactic and collocational patterns, again emphasising the importance of depth and breadth of vocabulary knowledge for reading comprehension.
The textbook is divided into 9 chapters, each chapter containing one or two reading passages. All reading passages were analysed, except reading no. 4, where the original text was in English and only the simplified text used as reading material was in Japanese.
The following table presents the data used in this analysis: the titles of the simplified passages as they appear in the textbook, the title of their originals, the length of both (expressed in number of characters) and their sources.
Table 3: Analysed data: simplified texts in Enjoyable Task Reading - Intermediate and their originals (length expressed in number of characters)
Chap.	Title of simplified text	Length	Original title	Length	Source
1		449		352	T^flJT^tJ 1989 ^ 6 s^B^Mtt p. 342
2		682		527	hV3 —0 9 h^* jH (ffi»fHR) firnmfê p. 18
3		1327		1285	tt^Mtt pp. 255~256
5	myfozn	526	myfozn	909	l^lSl (SfflffiftKte) ffift^ pp. 188-189
6	¥ &	707 829	¥ &	702 922	^B&ffl 1985 ^ 2 ft 20 B^ f ^B&ffl 1985 ^ 3 ft 1 B^fJ
7	nx?	1277	nx?	2640	pp. 73~80.
8	i¥2	1147 851	i¥i 1¥2	2408 1199	l Biif^^—X^—^i M pp. 97~100 pp. 34~35
9	ffin*	1219	ffin*	1972	77 ( pp. 136~138
	Total:	9014	Total:	12916	
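The length figures in Table 3 allow a quick check of how much the texts were shortened overall. The sketch below only re-uses the character counts from the table; the labels 6a/6b and 8a/8b for the two passages in chapters 6 and 8 are introduced here for convenience and do not appear in the textbook.

```python
# (label, simplified length, original length), in characters, copied from Table 3
passages = [
    ("1", 449, 352), ("2", 682, 527), ("3", 1327, 1285), ("5", 526, 909),
    ("6a", 707, 702), ("6b", 829, 922), ("7", 1277, 2640),
    ("8a", 1147, 2408), ("8b", 851, 1199), ("9", 1219, 1972),
]

total_simplified = sum(simplified for _, simplified, _ in passages)
total_original = sum(original for _, _, original in passages)
overall_ratio = total_simplified / total_original

# Passages whose simplified version is longer than the original
longer_after_rewriting = [label for label, s, o in passages if s > o]

print(total_simplified, total_original, round(overall_ratio, 2))
print(longer_after_rewriting)
```

The totals reproduce the last row of the table (9014 and 12916 characters), so the simplified corpus is roughly 70% of the length of the originals, although four of the shorter passages became longer after rewriting.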
4 Procedure and results
All pairs of original and simplified texts were scanned and OCR-processed, and the resulting files were manually checked to correct OCR errors. Pairs of files were then automatically compared using the document comparison software JDiff X (Matsumoto, 2010). All differences found were transcribed into a spreadsheet file and marked according to type, linguistic level and content of modification. Modifications within the same sentence or clause which stem from different rewriting strategies, or are carried out at different linguistic levels, were counted separately. For example, the following rewriting of one original sentence into shorter sentences involved multiple strategies at distinct linguistic levels.
Original sentence:
Literally: Having made a promise to a drinking pal who's dead, I'm executing his will that says, "always order my part too and drink it" when I go drinking. (Quotation marks are not used in the Japanese original.)
Equivalent modified sentences:
Literally: A good friend of mine died last week, and having made a promise before he died ...
「はあ」
Literally: Oh.
Literally: ... it was decided that when I go drinking, I would always order his part too and drink it.
Firstly, one strategy was simplification at the syntactic level: both adnominal clauses in the first and last part of the original complex sentence ("the drinking pal who died" and "the will that states always order my part too and drink it") were split into separate simple sentences, avoiding adnominal modification, a known hurdle for learners of Japanese.
Secondly, a simplification at the discourse level retained the same entity (the first person narrator) as the subject of all clauses. This avoided a potentially confusing shift from the first person narrator (boku, "I") to the dead friend, who uses a more informal first person pronoun (ore, "I") within a short clause of reported speech not marked by quotes (ore no bun mo chuumon shite nonde kure, "always order my part too and drink it"), and then back to the first person narrator (yuigon o jissen shite iru, "I am executing his will"). This simplification also brought with it the omission of the very informal pronoun ore ("I"), thus resulting in a standardisation of register.
Thirdly, at the semantic level, two pieces of information were made more explicit: a concrete time setting (senshuu, "last week") was added, and the pronoun boku no ("my") was added to the noun (nomitomodachi / shin'yuu, "drinking pal" / "good friend").
Fourthly, at the level of vocabulary, three simplifications were made by substituting the less common word nomitomodachi ("drinking pal") with a less specific but more common one, shin'yuu ("good friend"), and the words yuigon o jissen ("execute a will") with yakusoku o jikkô ("keep a promise").
Fifthly, two explicitations occurred at the script and punctuation level: the word toki ("when"), written in hiragana (とき), was rewritten with its commonly used and unambiguous Chinese character (時), and a comma was added after the adverb kanarazu ("always").
One further boundary explicitation was carried out by inserting a back-channelling expression (haa, "Oh") by the other participant in the conversation, in the middle of the longest remaining sentence.
In all such cases of multiple modifications, each modification was counted and transcribed separately, resulting in a list of 815 modification occurrences in the whole corpus. All transcribed modifications are reported in Appendix 1. Repeated modifications of the same item were counted once per occurrence: the same rewriting of a word occurring three times in the same text, for example, was counted as 3 modifications. In such cases, the modification was transcribed once and the number of modifications was noted in the second column of the table in the appendix.
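The comparison and counting steps described in this section can be approximated in a few lines of code. The sketch below is not the procedure used in the study, which relied on JDiff X and manual transcription into a spreadsheet: it swaps in Python's standard `difflib` module for the comparison, and the sentence pair is invented for illustration. Assigning each difference to a type and linguistic level remains a manual step that the code does not attempt.

```python
from collections import Counter
from difflib import SequenceMatcher

def extract_modifications(original: str, simplified: str):
    """Return (original span, simplified span) pairs for every differing stretch."""
    # autojunk=False switches off difflib's popularity heuristic, which is
    # meant for long inputs and is not wanted for character-level comparison.
    matcher = SequenceMatcher(None, original, simplified, autojunk=False)
    return [
        (original[i1:i2], simplified[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

# Invented example pair (not from the corpus): two hiragana words are
# rewritten with Chinese characters, one of them twice.
original = "わたしは本をよんだ。わたしはねた。"
simplified = "私は本を読んだ。私はねた。"

# Identical modifications are stored once, with their number of occurrences,
# mirroring the "count" column of the appendix table.
tally = Counter(extract_modifications(original, simplified))
for (before, after), count in tally.items():
    print(f"{before} -> {after}: {count}")
```

Counting with `Counter` keeps each distinct modification once while still letting repeated occurrences contribute to the overall total.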
5 Analysis
Differences found between the simplified texts and their originals were grouped into three categories: simplification (including deletion), explicitation and standardisation (including visualisation). Strategies belonging to these categories were found to be used at different levels of linguistic analysis: from script, to vocabulary, morphology, syntax, semantics, to discourse and style. Let us consider each one in turn.
5.1 Simplification
Strategies of simplification were the most commonly used, amounting to 472 (of which 96 were deletions) out of the 815 modifications found.
5.1.1 Script simplification
Script simplification occurred a few times, where non-standard or low-frequency Chinese characters were rewritten with hiragana.
5.1.2 Vocabulary simplification
Vocabulary simplification was the most frequent of all changes, obtained by:
•	substituting less common with more common content words;
•	substituting less frequent functional vocabulary (which is usually learned later in courses of Japanese as a foreign language) with more basic expressions.
Difficult words are sometimes substituted with an explanation or definition, and sometimes even with words of a completely different meaning, if this does not change the overall gist of the text: in one case, a more common word replaces a less frequent one as an example of a hobby.
Often, the substitution of a vocabulary item also brings with it syntactic modifications, such as:
•	changes in part of speech, e.g. from noun to verb and vice versa, from verb to adverb to express a modal meaning, or from noun to adverb;
•	shortening from phrase to word;
•	changes in argument distribution, resulting in the use of different particles. In one case, the substitution of a difficult verb with a basic verb results in a change in subject, from inanimate to animate.
Sometimes, vocabulary simplification is carried out by means of a paraphrase, where part of the original meaning is lost.
Vocabulary simplification at times also involves a change in cohesive devices, such as deictics instead of synonyms or paraphrases.
5.1.3 Morphological simplification
Morphological simplification could be seen in the use of more basic grammatical forms instead of markedly formal or markedly colloquial forms, such as:
•	substituting the formal truncated connective form of affirmative predicates with the te-form;
•	substituting the formal negative connective form with the general negative connective form;
•	generally using basic instead of very formal forms.
5.1.4 Syntactic simplification
Syntactic simplification was also frequent, by means of:
•	dividing sentences with coordinate clauses into separate, shorter sentences;
•	separating subordinate clauses and turning them into separate sentences, especially in the case of adnominal modifiers.
The separation of complex sentences into shorter, simpler ones at times resulted in (or was motivated by the wish for) bringing the subject and predicate of the original complex sentence nearer to each other.
5.1.5 Discourse simplification
Discourse simplification was obtained in three cases by maintaining topic continuity and avoiding topic-shift from one agent to another.
5.1.6 Deletion
Deletion was the most drastic form of simplification, used quite often: 96 instances of deletion were found, including deletion of:
•	modal forms;
•	intensifiers;
•	aspectual forms;
•	extra-textual references;
•	rhetorical devices;
•	semantically redundant, non-essential information or details;
•	redundant paraphrases, or information that can be inferred from the context.
In some instances, whole paragraphs were omitted. One such passage includes culturally-bound terms, where not only the words but also the words' referents are probably not known to learners; being only exemplifications of the preceding general statement, these terms are not essential to convey the general meaning of the passage.
5.2 Explicitation
Strategies of explicitation were also very frequent: 203 instances of explicitation were found, encompassing all linguistic levels.
5.2.1 Semantic explicitation
Semantic explicitation was the most common, occurring in 110 cases, such as:
•	adding concrete time or place settings, sometimes even with an extensive description;
•	using a more specific word instead of a more general hypernym;
•	using a hypernym and adding a definition, or adding a hypernym to a word that readers might not know, instead of a definition;
•	in one case even adding loan-word synonyms as furigana.
Almost half of the semantic explicitations occurred at the script level (42 cases): words that were written in hiragana in the original text, but can be and usually are written with Chinese characters, were rewritten using these characters, which made them less ambiguous, both visually, in terms of word delimitation, and semantically, distinguishing between homophones.
5.2.2 Boundary explicitation
Boundary explicitation was the next most common type of explicitation (70 cases), mostly by means of added punctuation:
•	adding commas to separate ambiguous or just long strings of hiragana;
•	adding commas to separate ambiguous strings of kanji;
•	adding commas to separate phrases, often topical phrases;
•	adding commas between clauses;
•	adding parentheses for emphasis or reported speech;
•	dividing one paragraph into two;
•	inserting back-channelling expressions in dialogues.
Boundary explicitation was also obtained by splitting long complex sentences into shorter ones, which also implies syntactic simplification, as mentioned in Section 5.1.4, and by using Chinese characters instead of hiragana where possible, as mentioned in Section 5.2.1, to mark the delimitation between words, which is not marked by blank spaces in Japanese standard script.
5.2.3	Syntactic explicitation
Syntactic explicitation was obtained by:
•	adding an omitted argument to a predicate;
•	substituting an intransitive expression, where the agents are not all immediately obvious, with a transitive one, where agent and patient are clearer (in the case found, the semantic value of the verb is also more specific);
•	using polite or humble forms which disambiguate the subject of the predicate;
•	adding the particle の (no) to split a compound noun into a noun with a nominal modifier.
Other, less common cases of explicitation were cohesion and phonetic explicitation.
5.2.4	Cohesion
Cohesion explicitation was obtained by adding cohesive elements, e.g.:
•	deictics, or
•	temporal expressions.
5.2.5	Phonetic explicitation
Phonetic explicitation was obtained by using hiragana instead of Chinese characters where the pronunciation of the characters could be ambiguous, or by adding furigana to difficult Chinese characters.
5.3 Standardisation
The third strategy used in the adaptation of texts for foreign language learners was the use of more standard or basic linguistic forms, i.e. forms which are usually learned at the beginning of Japanese language courses, instead of stylistically marked forms, which are usually learned later. 128 modifications were counted in this category, including the following means.
5.3.1 Script standardisation
Script standardisation, using:
•	standard punctuation instead of non-standard brackets, question marks and other punctuation, or standard characters instead of non-standard ones.
5.3.2 Tense levelling
Changing single predicates in non-past form to past form in texts which are otherwise written in the past form, to make the tense uniform throughout the text.
5.3.3 Formality levelling
Formality levelling from de-aru to plain style, or from plain to formal style, in texts which are otherwise written in this style, to make the style uniform throughout the text.
5.3.4 Formality standardisation
Substituting colloquial or otherwise register-marked forms with standard (more polite) forms, which are generally learned earlier in Japanese language courses.
Exceptionally, in one case the opposite occurred: a colloquial, shortened form was used instead of a politer one, probably for stylistic effect.
5.3.5 Visualisation
Throughout the texts, numbers written in Chinese characters in the original texts, which are printed vertically, were replaced with Arabic numerals in the textbook reading passages, which are printed horizontally.
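The replacement of numbers written in Chinese characters with Arabic numerals is regular enough to be sketched in code. The function below is a hypothetical helper written for this illustration, not a tool used by the textbook authors; it assumes well-formed numerals below one trillion, and the sample inputs in the usage lines are invented rather than taken from the corpus.

```python
DIGITS = {"〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "億": 10**8}

def kanji_to_int(numeral: str) -> int:
    """Convert a number written in Chinese characters to an integer."""
    if not numeral:
        raise ValueError("empty numeral")
    if all(ch in DIGITS for ch in numeral):
        # Positional notation, as often used for years: 一九八五 -> 1985
        return int("".join(str(DIGITS[ch]) for ch in numeral))
    total = 0    # value of completed 万/億 groups
    section = 0  # value accumulated below the next large unit
    digit = 0    # pending digit waiting for its unit
    for ch in numeral:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A unit without a preceding digit counts once: 百 -> 100
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in LARGE_UNITS:
            total += (section + digit) * LARGE_UNITS[ch]
            section = 0
            digit = 0
        else:
            raise ValueError(f"not a kanji numeral: {ch!r}")
    return total + section + digit

for sample in ["百", "四十五", "六千", "二千五百", "一万二千"]:
    print(sample, kanji_to_int(sample))
```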
One modification was found that does not clearly belong to any of the three categories proposed, but could tentatively be categorised as standardisation, or could also be termed familiarisation or domestication. It was only used in one text: the setting of a story, originally happening in New York to a Mr. Steinberg, apparel vendor, was reset in Ginza, one of the most famous Tokyo districts, with Mr. Sato, publisher, as the main character.
5.4 Other modifications
Some modifications were found for which no clear motive could be guessed: they may have been made for stylistic purposes, according to the rewriter's tastes, or may be the results of multiple modifications, where the original motive became blurred in subsequent modifications. One substitution was probably just a spelling mistake, resulting in a colloquial form instead of the standard one.
Other cases where the motive for the substitution was not clear were:
•	one separation of a clause indicating reported speech into a separate paragraph, probably for dramatic effect;
•	the substitution of a more standard full stop with a less standard comma after a polite predicate;
•	one substitution of a causal connective with a more polysemic and not easier connective.
In six cases, commas separating phrases in short sentences were deleted, which is slightly surprising, given that in 39 other cases commas were added in such positions.
6 Discussion
As the previous section showed, multiple strategies were used to rewrite texts originally written for native speakers of Japanese so that they could be included in a reading textbook for intermediate learners of Japanese as a foreign language. The following tables summarise all modifications, counted by level of linguistic analysis and by type of strategy.
Table 4: Number of modifications by level of linguistic analysis
Level of linguistic analysis	No. of occurrences
script	175
punctuation	56
vocabulary:	
content words	193
function words	44
vocabulary + syntax	167
morphology	20
modality	18
syntax	42
semantics	45
cohesion	10
discourse	32
formality	12
intertextuality	1
Total	815
Table 5: Number of modifications by strategy type
Strategy type	No. of occurrences	Percentage
simplification (of which 96 deletions, 12% of total)	472	58%
explicitation	203	25%
standardisation (of which 80 visualisations)	128	16%
not categorised	12	1%
Total	815	100%
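The totals and percentages in Tables 4 and 5 can be rechecked with a few lines of Python. This is only an arithmetic sketch: the counts are copied from the two tables, while the variable names are our own and carry no special meaning.

```python
# Counts copied from Table 4 (by level of linguistic analysis).
by_level = {
    "script": 175, "punctuation": 56, "content words": 193,
    "function words": 44, "vocabulary + syntax": 167, "morphology": 20,
    "modality": 18, "syntax": 42, "semantics": 45, "cohesion": 10,
    "discourse": 32, "formality": 12, "intertextuality": 1,
}

# Counts copied from Table 5 (by strategy type).
by_strategy = {
    "simplification": 472,
    "explicitation": 203,
    "standardisation": 128,
    "not categorised": 12,
}

total_by_level = sum(by_level.values())
total_by_strategy = sum(by_strategy.values())

# Percentage of all modifications per strategy, rounded to whole numbers.
shares = {name: round(100 * n / total_by_strategy)
          for name, n in by_strategy.items()}

# Deletions are a subset of simplification (96 occurrences).
deletion_share = round(100 * 96 / total_by_strategy)

if __name__ == "__main__":
    print(total_by_level, total_by_strategy)
    print(shares, deletion_share)
```

Both tables add up to the same total, and the rounded shares agree with the percentages reported in Table 5.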
The numbers of modifications at different linguistic levels cannot be objectively compared, since they refer to linguistic elements that occur at different scales of magnitude (a text always contains more words than phrases, clauses, sentences or paragraphs). It is nevertheless noteworthy that modifications were made repeatedly at every level.
The modifications at different linguistic levels confirm the central role of vocabulary, as declared in the foreword to the textbook and as could be inferred from the structure of the textbook, which contains many vocabulary exercises. Vocabulary is not, however, the only level at which modifications were made: a considerable number of modifications were made at the script level, and all other linguistic levels were also affected. In many cases (167), vocabulary modifications also entailed syntactic changes, and other aspects of the text were modified irrespective of the vocabulary used: structural modifications at the levels of syntax and discourse simplified syntactic and discourse structures or made them more explicit, cohesive devices were introduced, and stylistic changes (standardisations) were made at the level of formality.
As for the type of strategy used, simplification was the most common, accounting for more than half of the occurrences. It occurred at the level of vocabulary, where less frequent words were replaced with more common synonyms, explanations, definitions or paraphrases; at the level of morphology, where less common predicate forms were replaced with more basic ones; at the level of syntax, where long sentences with coordinate and subordinate clauses were split into smaller units; and at the level of discourse, where roles were switched to maintain topic continuity, paragraphs of narrative were divided, and shorter conversation turns were introduced in dialogues.
However, explicitation should not be overlooked alongside simplification: it accounts for one quarter of all modifications, indicating that the rewriters considered it a useful device when adapting texts for their students.
Explicitation was used at the semantic level to add information that would otherwise have to be inferred from the text, to supply cultural background that is not likely to be known to learners, or simply to invent concrete settings that help readers reconstruct the narrative. Semantic disambiguation also occurred at the script level, where strings of hiragana were often rewritten in Chinese characters to disambiguate homophones.
Structural explicitation was also observed at different levels: boundaries between linguistic units were made clearer by the use of punctuation, script or layout; omitted predicate arguments were made explicit, polite forms were used to disambiguate the subject of the sentence, and some phonetic information was added by means of rewriting Chinese characters in hiragana or by adding furigana.
The third strategy, standardisation, was applied at the script level, to standardise punctuation and to make the text visually more familiar (using Arabic instead of Chinese numerals), and at the discourse level, both to make the use of tense or level of formality uniform within one text (choosing one of two possibilities, such as past/non-past or formal/informal, both of which are known to learners) and to standardise the text as a whole by removing marked forms which are typical of less standard registers (very colloquial, literary etc.) and not likely to be known by intermediate learners.
7 Conclusion and further work
Overall, the rewriters used some strategies that could be applied to the simplification of texts for most weak readers of Japanese, not only foreign learners of Japanese: short sentences and frequent vocabulary have been found to make texts easier to read in most research on readability in different languages.
However, it is also clear that the rewriters of these texts, who are teachers of Japanese as a foreign language, were conscious of the typical progression of formal Japanese language instruction, and tended to prefer vocabulary, morphology and syntactic structures that are learned earlier in language courses. These are very often also the most frequent forms in the Japanese language as a whole and are learned earlier by Japanese children (especially in the case of content words). Some linguistic elements, however, such as standard polite language (as opposed to very colloquial or very formal speech), are typical of beginning language courses for foreigners, while colloquial language (including vocabulary and contracted or otherwise colloquial morphology), which is learned quite early by Japanese children, is learned later in formal language instruction and is therefore relatively difficult for foreign learners of Japanese.
All the strategies highlighted in this analysis could serve as guidelines when assessing the readability of texts for foreign learners of Japanese, both in an overall assessment of readability and when devising methods and systems to pinpoint difficult aspects of particular texts as a first step towards text simplification. Especially in the first case, when assessing overall readability, i.e. grading multiple texts on one scale of readability, further and more extensive analysis of the weight of each of these aspects on overall readability is needed. In both cases, it would be useful to devise a system for automatic discovery and assessment of particular aspects of readability.
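As a very rough indication of what a first step towards such a system might look like, the following Python sketch flags two of the aspects discussed above: sentence length and the proportion of Chinese characters in a sentence. It is a minimal sketch under stated assumptions and not a validated readability measure: the threshold `MAX_CHARS`, the function names, and the use of the Unicode range U+4E00 to U+9FFF as an approximation of "Chinese characters" are all our own assumptions.

```python
import re

# Hypothetical threshold, not derived from the analysis in this paper.
MAX_CHARS = 40


def split_sentences(text: str) -> list[str]:
    """Split after each Japanese full stop, keeping it with its sentence."""
    return [s for s in re.split(r"(?<=。)", text) if s.strip()]


def kanji_ratio(sentence: str) -> float:
    """Share of non-space characters in the basic CJK ideograph range."""
    chars = [ch for ch in sentence if not ch.isspace()]
    kanji = [ch for ch in chars if "\u4e00" <= ch <= "\u9fff"]
    return len(kanji) / len(chars) if chars else 0.0


def flag_sentences(text: str, max_chars: int = MAX_CHARS) -> list[dict]:
    """Report length, a too-long flag and kanji ratio for each sentence."""
    report = []
    for sentence in split_sentences(text):
        report.append({
            "sentence": sentence,
            "length": len(sentence),
            "too_long": len(sentence) > max_chars,
            "kanji_ratio": round(kanji_ratio(sentence), 2),
        })
    return report


if __name__ == "__main__":
    for row in flag_sentences("今日は雨です。明日は晴れるでしょう。"):
        print(row)
```

A practical system would need to weigh many more aspects, such as vocabulary frequency and the order in which forms are taught, which is exactly the further analysis called for above.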
References
DuBay, W. H. (2004). The principles of readability. Costa Mesa, California: Impact Information.
Halliday, M. A. K. (1993). Some grammatical problems in scientific English. In Writing Science: Literacy and Discursive Power, Pittsburgh, 1993 (pp. 69-85). University of Pittsburgh Press.
32 Kristina HMELJAK SANGAWA
Hayashi, Y. (1992). A Three-level Revision Model for Improving Japanese Bad-styled Expressions. In COLING 1992 Volume 2: The 15th International Conference on Computational Linguistics (pp. 665-671).
Honda, K. [^HM^j (1982). Nihongo no sakubun gijutu	[Japanese
writing techniquesj. Tokyo: Asahi shimbun ^
Ichikawa, A.	(2006). Fukushi jouhou gaku towa nanika	Ü^^.
Gekkan gengo Hfflsl, 35(7), 26-34.
Inui, K.	& Fujita, A. [Sffl^j (2004). likae gijutsu ni kansuru kenkyuu doukou
W^ft^fiW^ M^S^^K^ [Paraphrase research trends]. Shizen gengo shori 11(5), 151-198.
Inui, H.	] & Okada, N. [MfflM^j (2000). Nagai bun wa tsune ni wakarinikui ka?
Wakarinikusa no youin to sono izon kankei - Is a long sentence always incomprehensible? A structural analysis of readability factors
:	. NL SIG Technical reports tWM
rn^gffiftmñs&WMMm, 2000(11), 63-70.
[http://id.nii.ac.jp/1001/00048682/]
Inui, K. & Satomi, Y. (2001). Corpus-based acquisition of sentence readability ranking models for deaf people. In Natural Language Processing Pacific Rim Symposium Tokyo (pp. 205-212).
Japan Foundation [H^^^S^] & Association of International Education Japan [ 0^ H^^Wféi^] (2004). Nihongo nouryoku shiken shutsudai kijun - Japanese language proficiency test: test content specifications 0 ^tnt^ff^^^S^. Tokyo: Bonjinsha ^Aii.
Kabashima, S.	(1979). Bunshou sahou jiten	Tokyo:
Tokyodo.
Kinoshita, K. [^T^^] (1981). Rikakei no sakubun gijutsu	Tokyo:
Chuokoron shinsha
Koide, K. phft ^ ] (1991). Nihongo o manabu hitotachi no tame no Nihongo o
tanoshiku yomu hon - Enjoyable Task Reading in Japanese 0 ^m^^^Afc^fflfc L< ifc^. Tokyo: Sanno University International Student Center
Lee, J.-H. & Hasebe, Y. (2016). Readability measurement for Japanese text based on leveled corpora. In Papers on Japanese Language from an Empirical Perspective, Ljubljana: Academic Publishing Division of the Faculty of Arts, Univ. of Ljubljana.
Matsumoto, S. (2010). JDiff X Document Comparison Plug-in for Jedit X. [http://www.artman21.com/en/jdiff_x/].
Mishima, H. [Hft^] (1990). Gijutsusha, gakusei no tame no technical writing	•
•	Tokyo: Kyouristu shuppan ^^ft^.
Morioka, K. [ftll®^] (1952). Yomiyasusa no kiso kenkyuu
In Kokuritsu kokugo kenkyuujo nenpou - Annual report of National Language Research Institute	(pp. 91-108).
An analysis of simplification strategies in a reading textbook of Japanese ... 33
Nakano, T. [ +	Endo, A. [Mr$], Sugawara Sh.	Inui, K. [?£#»],
& Fujita, A. [SfflB] (2005). Lexical Paraphrasing for Improving Accessibility to the Web - Web	IEICE
technical report Welfare Information technology WIT »ff^I^ 25(25), 11-14.
Ono, T. [/J^^ff], Suganuma, A. [^^BJ], & Taniguchi, R.	(2006).
Nihongo bunshou suikou shien ni okeru kakariuke o gokai sareru bun no chuushutsu -Extraction of the sentences whose modification relation is misunderstood for a writing tool	IPSJ
SIG technical reports ff^M^WM^ FI	2006(94), 99-104.
Oono, H. [^^ff^] & Inazumi, H. [Mt^^] (2007). - Development of an education support tool for improvement of ability for sentence making
In	m 18	x^
- DEWS2007.
Sakamoto, I. —(1962). Bunshou no goi hijuu no sateihou - Readability no kenkyuu no kokoromi £#®^^tfcM®4£&---Readability ®W^®fi<^---. Dokusho kagaku	6(1), 37-44.
Sano, M. & Maruyama, T. (2008). Lexical Density in Japanese Texts: classifying text samples in the Balanced Corpus of Contemporary Written Japanese (BCCWJ). In Proceedings of ISFC 35: Voices Around the World, Sydney, 2008 (pp. 359-364).
Sato, S., Utsuro, T., Tsuchiya, M., Asaoka, M., & Matsuhoshi, S. (2004). Natural Language Processing Technologies to Enhance Readability. In Proc. of International Conference on Informatics Research for Development of Knowledge Society Infrastructure (pp. 46-53).
Sato, S., Matsuyoshi, S., & Kondoh, Y. (2008). Automatic Assessment of Japanese Text Readability Based on a Textbook Corpus. In Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), Marrakech, Morocco, 2008.
Shibasaki, H. [^f^^] & Tamaoka, K.	(2010).	LfcJ
•	- Constructing a formula to predict school
grades 1-9 based on Japanese language school textbooks. Nihongo kyouiku kougakukai rombunshi - Japan journal of educational technology H^^Wl^^tt 33(4), 449-458.
Tateisi, Y., Ono, Y., & Yamada, H. (1988). A computer readability formula of Japanese texts for machine scoring. In Proceedings of the 12th conference on Computational linguistics (pp. 649--654). Association for Computational Linguistics.
Yamamoto, S. [^It^], Inui, K.	Nogami, M. [^±{f], Fujita, A. [SfflB],
& Inui, H.	(2000).	-
Exploring the Readability Criteria for Congenitally Deaf People: A Step toward Computer-Aided Text Reading. IPSJ SIG technical reports ff^^ifi^^W^^^ NL HsIM, 135(17), 127-134.
Yasumoto, B.	(1983). Settoku no bunsho gijutsu [Techniques of persuasive
writing]	Tokyo: Kodansha §Ss^±.