29 THE SOUNDS OF ENGLISH
Nataša Hirci
University of Ljubljana, Slovenia
2019, Vol. 16 (1), 29–45(164)
revije.ff.uni-lj.si/elope
doi: 10.4312/elope.16.1.29-45
UDC: 811.111'355:81'25
Trainee Translators’ Perceptions of the Role 
of Pronunciation and Speech Technologies in 
the Technology‑Driven Translation Profession
ABSTRACT
We live in a world of rapid technological advances which constantly affect the work of 
professional translators. Suitable training is therefore required for future translators to be 
able to compete on the translation market. With the rise of translation technologies, new 
ideas have been put forward on how to make translators faster and more efficient. Among 
the technologies that future translators may not be adequately familiar with are speech 
recognition tools; these enable translators to dictate their sight translation and have it typed 
out, allowing more time to focus on the content. However, as with all digital tools, the 
quality of input is important; a question thus arises on the role pronunciation assumes in 
such work. The present study aimed to establish how much awareness there is amongst the 
trainee translators of the possibilities afforded by speech technologies and to explore their 
perceptions of the role played by pronunciation.
Keywords: translator training; pronunciation; speech recognition tools; trainee translators’ 
perceptions; the future of translation work
Bodoči prevajalci o vlogi izgovarjave in govornih tehnologij 
v sodobnem prevajalskem poklicu
POVZETEK
Živimo v času vse hitrejšega tehnološkega razvoja, v kar je nenehno vpeto tudi delo 
profesionalnih prevajalcev. V luči tega je nujno sprotno prilagajanje izobraževanja bodočih 
prevajalcev, da bodo primerno usposobljeni in bodo konkurenčni na prevajalskem trgu. 
S porastom sodobnih prevajalskih tehnologij se pojavljajo ideje o tem, kako bi lahko bili 
prevajalci pri svojem delu hitrejši in učinkovitejši. Eden od tehnoloških pripomočkov, ki bi 
k temu lahko pripomogel, a ga bodoči prevajalci premalo poznajo, so govorne tehnologije. S 
pomočjo prevajanja na vpogled prevajalec lahko besedilo narekuje: s tem se izogne tipkanju, 
in se bolj osredotoča na vsebino. A kot pri vseh digitalnih orodjih je pomembna kakovost 
vnosa podatkov, zato se poraja vprašanje, kakšno vlogo igra pri tem izgovarjava. V pričujoči 
študiji smo želeli raziskati, v kolikšni meri se bodoči prevajalci zavedajo možnosti, ki jih 
ponujajo govorne tehnologije, in ali imajo predstavo o vlogi, ki jo pri tem igra izgovarjava.
Ključne besede: poučevanje prevajalcev; izgovarjava; govorne tehnologije za razpoznavo 
govora; zavedanje bodočih prevajalcev; prevajalsko delo v prihodnosti
1 Introduction
The impact of new technologies on translation work over the last few decades has 
significantly changed the way people perceive the work of professional translators. 
The usual translator’s workstation or translator’s workbench no longer involves 
working only with computers and computer-assisted (CAT) tools, but may, under 
certain conditions, also involve working with machine translation (MT) and speech 
recognition technologies. According to a Stanford study (cf. Weiner 2016¹), speaking 
is much faster than typing on a touchscreen, while typing on a computer keyboard is 
seemingly easier and faster. However, even a few years ago speech recognition software 
was criticised for its error-prone performance, which inevitably led to spending 
too much time correcting the mistakes. It therefore seemed reasonable to assume that 
professionals who use a keyboard as part of their daily routine, translators included, 
would not be inclined to integrate into their work technologies which actually slow 
them down. However, a lot has changed since then: Nuance has produced Dragon 
Speech Recognition software, one of the leading speech recognition technologies, 
and claims that it is now able to transcribe up to 160 words per minute, which is 
also about three times faster than typing, with an enviable 99% recognition accuracy 
(cf. Dragon NaturallySpeaking²). This suggests speech technologies are now much 
more effective, and can perhaps make translation work more efficient. Moreover, 
any technological advantage is worth exploring to ensure that professional translators 
remain competitive on the translation market.
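The speed claims above can be turned into a rough back-of-the-envelope comparison. The following minimal Python sketch uses only the figures quoted from Nuance (160 words per minute, roughly three times faster than typing, 99% recognition accuracy); the 1,000-word text length is a purely hypothetical value chosen for illustration, and the time needed to correct misrecognised words is not modelled.

```python
# Figures quoted from Nuance in the text above.
DICTATION_WPM = 160   # words per minute when dictating
SPEED_RATIO = 3       # dictation is claimed to be about 3x faster than typing
ACCURACY = 0.99       # claimed recognition accuracy

# Hypothetical text length, chosen only for illustration.
TEXT_WORDS = 1000


def dictation_minutes(words: int) -> float:
    """Time needed to dictate a text at the claimed dictation speed."""
    return words / DICTATION_WPM


def typing_minutes(words: int) -> float:
    """Time needed to type the same text at the implied typing speed."""
    return words * SPEED_RATIO / DICTATION_WPM


def expected_misrecognitions(words: int) -> int:
    """Expected number of misrecognised words at the claimed accuracy."""
    return round(words * (1 - ACCURACY))


print(f"dictation: {dictation_minutes(TEXT_WORDS):.2f} min")
print(f"typing:    {typing_minutes(TEXT_WORDS):.2f} min")
print(f"words to check: {expected_misrecognitions(TEXT_WORDS)}")
```

Under these assumptions, a 1,000-word text would take 6.25 minutes to dictate and 18.75 minutes to type, with around ten words left to correct; the actual gain depends on how long those corrections take and on the speaker's pronunciation.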
With the swift rise of digital innovations and artificial intelligence (AI), increasing 
effort is likely to be invested in speech technologies for 
translation, at least for fairly basic communication purposes and simple 
translation tasks, with the aim of establishing basic contact and easing communication 
for those who do not speak a particular language. Students might already be aware 
of the possibilities afforded by virtual AI speech assistants such as Amazon’s Alexa, 
Microsoft’s Cortana, Google’s Assistant or Apple’s Siri, and might have tried using 
such services. Large brands are all investing heavily in voice technologies, and they 
are associated with a growing number of applications (for more details on virtual 
assistants see Moren 2018). Armour (2018) reports on the data provided by Adobe 
Analytics, which indicates that “71% of owners of smart speakers like Amazon Echo 
and Google Home use voice assistants at least daily” [...] with “44% using them 
multiple times a day” while “[o]ver 76% of smart speaker owners increased their 
usage of voice assistants in the last year”. Armour (2018) also quotes Steve Rabuchin, 
VP of Amazon Alexa, who stated that the vision they have for their customers is to 
“be able to access Alexa whenever and wherever they want. This means customers may 
¹ Cf. https://www.popularmechanics.com/technology/a22684/phone-dictation-typing-speed/.
² Compare with data provided by Nuance at https://www.nuance.com/dragon/industry/education-solutions.html.
be able to talk to their cars, refrigerators, thermostats, lamps and all kinds of devices 
in and outside their homes”. Armour (2018) believes that “voice is the future of how 
brands will interact with their customers”. These virtual assistants are all monolingual, 
however, and do not engage in multi-lingual communication. Even so, “[t]o build 
a robust speech recognition experience, the artificial intelligence behind it has to 
become better at handling challenges such as accents and background noise. And as 
consumers are becoming increasingly more comfortable and reliant upon using voice 
to talk to their phones, cars, smart home devices, etc., voice will become a primary 
interface to the digital world and with it” (Armour 2018).
Virtual assistants no longer work only with English³; Cortana, for example, is currently 
also available in Chinese, French, German, Italian, Japanese, Portuguese and Spanish 
versions, making these voice technologies increasingly accessible to a much wider 
audience⁴. Even regular dictation services available to Windows and Mac users have 
the option of choosing between language varieties, with American, Australian, British 
or Canadian English, for example, already embedded while, depending on the tool, 
other varieties can easily be downloaded from the Internet. However, more time may 
be required to have languages of lesser diffusion⁵ successfully integrated into existing 
systems. Slovene is a language spoken by only about two million people, and thus is 
less likely to be automatically added to other major language options. However, there 
are some speech recognition tools available, such as Voice Notepad, which already 
have Slovene embedded, and the dictation performance is relatively accurate. This is in 
contrast to the Google Translate dictation option, as the quality of translation is often 
still highly questionable and the final output more often than not inadequate 
and unusable. There is even a virtual AI assistant, SecondEGO, designed by Amebis⁶, 
and several other systems available for Slovene, which were originally created on the 
basis of large corpora and other language resources⁷, such as the speech-to-speech 
communicator VoiceTRAN⁸ or eBralec⁹ (eReader): the direction, however, is speech-to-speech 
or written to spoken rather than spoken to written, which would be most 
suitable for translators. Moreover, these technologies are only available commercially 
or for research purposes (cf. Sepesy Maučec et al. 2009; Donaj and Kačič 2012; Žgank 
and Sepesy Maučec 2010; Žgank, Verdonik, and Sepesy Maučec 2016, to name just a 
few), while their non-commercial availability is still a matter for the future.
³ Other languages are also gaining ground on the Internet (cf. Internet World Stats 2017).
⁴ For more on English and its relative share online see Holly Young’s article available at http://labs.theguardian.com/digital-language-divide/ and Laura Gonzales’ article available at http://uxpamagazine.org/improving-digital-translation/.
⁵ Slovene included (cf. Pokorn 2005; Hirci 2012).
⁶ Cf. https://www.amebis.si/novice/npi-2015.
⁷ For more on Slovene in the digital age see Rehm and Uszkoreit (2012).
⁸ Cf. http://www.alpineon.si/voicetran/slovensko/html/index.html.
⁹ Cf. https://ebralec.si/?jezik=sl.
Still, none of these technologies are directly applicable to regular translation work as 
they are aimed at the general public to ease their daily routines. None of the virtual 
assistants can relieve translators of the tedious task of typing which they regularly have 
to undertake; translators thus need more specialised translation tools 
to facilitate their work (cf. Cronin 2013). One option that could possibly aid their 
daily routines and reduce the need for constant typing is dictation. Combined with 
sight translation it could change the way translation is habitually performed. It might 
thus be worth investigating the usability of speech-to-text technologies in translator 
training, foregrounding the time-efficiency ratio in particular. The awareness of trainee 
translators of the role of pronunciation and their familiarity with speech recognition 
technologies deserve research attention, in order to establish whether the application 
of such technologies could be motivating and beneficial for future translators. 
2 Literature Review
Professional translation work is usually associated with the written output. However, 
the spoken modality should not be neglected in today’s information society and its 
digital world, so heavily imbued with multimodality. It is therefore worth exploring the 
issues in translator training that address these modalities, spoken included, especially 
since – within the scope of interpreter training – Shlesinger (1995, 193–214) already 
maintained that “one modality can teach us about the constraints, conventions and 
norms of the other”. This suggests that sight translation, a bridge between the oral 
and written mode of translation (cf. Agrifoglio 2004), should perhaps play a more 
prominent role not only in professional translation, but also in translation pedagogy. 
So far, sight translation has been recognised as relevant in interpreting studies and 
interpreting pedagogy (cf. Agrifoglio 2004; Angelelli 1999; Li 2014; Gile [1995] 
2009; Gonzalez, Vásquez, and Mikkelson 2012; Jimenez Ivars 2008; Lambert 
2004; Mikkelson 1994; Moser-Mercer 1995; Pöchhacker 2004, 2010; Riccardi 
2002; Shlesinger 1995; Song 2010; Viaggio 1995; Viezzi 1990; Weber 1990). 
Although there is still a fairly small body of literature focusing on the advantages 
of sight translation for written translation (cf. Baxter 2016; Dragsted and Hansen 
2009; Dragsted, Hansen, and Sørensen 2009; Dragsted, Mees, and Hansen 2011; 
Gorszczyńska 2010; Mees et al. 2013), a recent study has shown (cf. Hirci, Mikolič 
Južnič, and Pisanski Peterlin forthcoming) that engaging in sight translation for the 
purposes of written translation can result in creative, novel translation solutions, 
which gives an added value to the translation process and can make the entire process 
of translating much faster and more efficient. Some scholars have already explored 
the application of dictation in sight translation and foregrounded its benefits for 
translation work in terms of time efficiency (cf. Biela-Wolonciej 2007). Possible 
advantages were also reported by Dragsted, Mees, and Hansen (2011), who compared 
written and sight translation output with and without speech recognition software. 
They concluded that with additional training and better familiarity with speech 
recognition tools, “greater time savings and higher quality are likely to be achieved 
as technical obstacles are either reduced or overcome” (Dragsted, Mees, and Hansen 
2011, 26). Baxter (2016) also investigated the application of sight translation skills 
to written translation combined with speech recognition; although there were no 
considerable time differences for the two studied groups, idiomaticity was enhanced, 
suggesting that combining sight translation with speech recognition “improves 
the spontaneity of the final text, thereby producing a more natural-sounding 
translation than the traditional W2W¹⁰ method” (Baxter 2016, 14). However, the 
most interdisciplinary approach was adopted in a study by Mees et al. (2013) where 
close collaboration among phoneticians, translators and interpreters yielded sound 
grounds for further interdisciplinary cooperation, proving that speech recognition 
technologies¹¹ can be successfully applied in translator training. 
In Slovenia, no study has been carried out on having speech recognition technology 
fully integrated into translation work, focusing on a hybrid which “involves crossing 
borders between translation and interpreting since the translation is produced 
orally, as in interpreting, but is visible on the screen, as in translation” (Mees et al. 
2013, 141). There is an introductory course on English phonetics and phonology for 
translators offered in year one of the undergraduate programme at the Department 
of Translation Studies at the University of Ljubljana to help students improve 
their pronunciation. As the advances in speech-to-text technology are relatively 
recent, students enrolled in the course may not be familiar with the relevance of 
pronunciation skills in technological applications, and may perceive pronunciation 
to be more important for interpreters than translators. Yet this issue is particularly 
relevant for those who may wish to use software which is heavily reliant on one’s 
pronunciation. As Nuance is claiming a 99% accuracy for its software, it needs to be 
acknowledged that such accuracy is only possible if one’s pronunciation is also highly 
accurate, otherwise the success rate of speech recognition is much lower. Near-native 
and intelligible pronunciation is required for the dictation systems to work well, at 
least for the time being, otherwise the rate of mistakes due to mispronunciation is too 
great to have such tools considered effective. However, so far the potential relevance 
of pronunciation skills for the trainee translators’ work in the translation modules 
offered later as part of the graduate programme in Translation/Interpreting has not 
yet been addressed, as none of the specialised translation courses involve working 
with speech recognition technologies. As there are built-in dictation options available 
on computers (both for Windows and Mac users) that enable working with English, 
translation modules focusing on translation from L1 to L2 could possibly benefit 
¹⁰ W2W means written-to-written translation.
¹¹ For more details on speech recognition technology see Jurafsky and Martin (2000).
from integrating this technology into their regular translation instruction. In their 
study, Mees et al. (2013) also report on working into L2 (cf. studies by Dragsted 
and Hansen 2009; Dragsted, Mees, and Hansen 2011). In Denmark, and the rest 
of Scandinavia, where, according to Phillipson (2003, 96) there are “good grounds 
for referring to English as a second language rather than a foreign language”, working 
into L2 is not perceived as unusual. Danish and Slovene are comparable 
in this respect, as both can be considered languages of lesser diffusion, so 
L1 to L2 translation (cf. Pokorn 2005; Hirci 2012) is not uncommon in Slovenia 
either. In fact, children in Slovenia start learning English as part of their primary 
school curriculum at the age of six. Films and TV shows are regularly subtitled rather 
than dubbed, and Slovene translators work in both directions, L2 to L1 as well 
as L1 to L2. Many professional translators in Slovenia find themselves in a position 
where they are required to undertake translation into L2, English in particular, on 
a regular basis, since there is a serious shortage of native English speakers working 
with Slovene. Training in the L2 direction is thus necessary and is offered as part of 
the translator training curriculum at the Department of Translation Studies at the 
University of Ljubljana. 
2.1 Future Prospects – More Work with Speech Recognition Systems?
So far, no research has been undertaken in Slovenia to explore working with speech 
recognition systems focusing on time efficiency in translation. However, a study was 
carried out on the possible benefits of applying speech recognition technologies in 
the pronunciation training of non-native speakers of English. Šuštaršič (2005, 87) 
investigated some software packages to explore their “usability within an English 
phonetics curriculum for EFL learners at the university level” that can be applied to 
pronunciation training. Šuštaršič (2005, 93–97) suggested that “speech recognition 
can be applied in phonetics (or more precisely, in pronunciation) teaching, and that 
a number of aspects of articulatory and auditory phonetic principles can be observed 
in the way that speech recognition programs transfer (or fail to transfer) the received 
speech signals into written form.” He pointed out that “using any speech recognition 
program with English pronunciation students has several other justifications. Firstly, 
the program needs to be trained to one’s voice, which requires a great deal of loud 
reading. […] The basic rule is: the more you train the program (i.e. the more you 
read), the higher will be the accuracy of recognition, and thus the usefulness of the 
program for any practical task.” Šuštaršič (2005, 98) also suggested that students 
can be encouraged to record their own speech and apply a speech recognition 
programme to convert it into a written text, an idea which in itself is closely related 
to sight translation from Slovene into English. Šuštaršič (2005) reported working 
with commercial speech recognition technologies such as Via Voice and Dragon’s 
NaturallySpeaking, which, however, are not freely available. A cost-free option 
nowadays is to simply activate the automatically built-in dictation option on the 
computer (either for Windows or Mac users), as it comes at no additional price, and 
explore its usability before obtaining some more sophisticated commercial software.
Drawing on Mees et al. (2013) and Šuštaršič (2005), a study was thus conceived to 
explore the possible benefits of using speech technologies in translator training for 
two reasons:
•	 to improve trainee translators’ pronunciation,
•	 to use speech instead of typing to speed up the process of translation.
3 Study Design and Methodology
The present study was designed to explore the trainee translators’ perceptions of the 
role of English pronunciation, as well as their familiarity with speech recognition 
tools, to establish whether or not it might be viable to introduce such technologies 
into translator training at the University of Ljubljana.
3.1 Methodology and Participants
An online questionnaire was designed for the purposes of the present study to 
foreground the perceptions of both undergraduate and graduate trainee translators 
studying at the Department of Translation Studies at the Faculty of Arts, University 
of Ljubljana, in the academic year of 2018/2019. 
3.2 Data Collection
The questionnaire was made available online for 18 days, between 4 January 2019 
and 22 January 2019, with a total of 94 participants taking part in the study. The 
questionnaire, designed using the online Google Forms survey mode, consists of 18 
questions. The first part of the survey aims to collect general information about the 
participants, eliciting data on their age, gender and year of study. The second part of 
the questionnaire explores the participants’ perceptions and self-awareness of their 
own pronunciation and their familiarity with the existing speech-to-text technologies 
that might prove to be useful in their future profession. 
The trainee translators were asked to respond to several statements referring to their 
perceptions of the role of pronunciation in English and their aspirations to improve it. 
These comprised nine yes/no questions and four statements rated on a five-point 
Likert-type scale:
•	 their motivation to have good pronunciation of English, from “totally unmotivated” = 1 to “extremely motivated” = 5,
•	 how important they find pronunciation in relation to other language skills, from “the least important” = 1 to “the most important” = 5,
•	 how they would rate their own pronunciation at the time of filling out the questionnaire, from “extremely poor” = 1 to “excellent” = 5,
•	 how much they aspire to have a near-native pronunciation of English, from “do not aspire to this at all” = 1 to “aspire to this 100%” = 5.
Figure 1. Participants in the study (N=94).
Additional information on the existing speech recognition tools and the students’ 
experience with the application of these technologies to their work was elicited 
using a number of multiple choice questions. The participants were also encouraged 
to provide additional comments on the possible benefits of using speech-to-text 
technologies in the final section of the questionnaire.
4 Results and Discussion
This section reports on the results of the questionnaire completed by the participants 
of the study. First, general demographic information on the participants is provided, 
followed by the data related to their pronunciation and awareness of speech 
recognition technologies. Due to the limited scope of this paper only those results 
that directly address the topic are discussed in detail.
4.1 General Information on the Participants
The study involved 94 participants, all of whom completed the questionnaire in 
full. All of the participants are either undergraduate BA students of Interlingual 
Mediation, or graduate MA students of Translation/Interpreting at the University of 
Ljubljana (cf. Figure 1). Of the 94 participants, 76 were female and 18 were male, 
and all were aged between 17 and 26 (average 21). 
Most participants (41, i.e. 43.6%) are enrolled in year 1 of the BA in Interlingual 
Mediation, with 14 (14.9%) respondents from year 2 of the BA in Interlingual 
Mediation, and 16 (17%) respondents from year 3 of the BA in Interlingual Mediation 
(cf. Figure 1). At the graduate level, there were 15 (16%) participants from MA I in 
Translation, three (3.2%) from MA I in Interpreting, and five (5.3%) from MA II in 
Translation (there is no MA II in Interpreting available for this academic year).
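The percentages reported above follow directly from the participant counts. As a minimal sketch, the following Python snippet recomputes them; the counts are those given in the text, while the dictionary keys are shorthand labels introduced here for illustration only.

```python
# Participant counts per programme as reported in Section 4.1.
# The keys are shorthand labels used only in this sketch.
counts = {
    "BA year 1": 41,
    "BA year 2": 14,
    "BA year 3": 16,
    "MA I Translation": 15,
    "MA I Interpreting": 3,
    "MA II Translation": 5,
}

total = sum(counts.values())


def share(count: int, n: int) -> float:
    """Percentage of the sample, rounded to one decimal place."""
    return round(100 * count / n, 1)


shares = {group: share(count, total) for group, count in counts.items()}

for group, pct in shares.items():
    print(f"{group}: {counts[group]} ({pct}%)")
print(f"N = {total}")
```

The recomputed shares match the reported figures, and the counts add up to the full sample of 94.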
4.2 Specific Information on Pronunciation
Importance of speaking English well
As evident from the results of the questionnaire, all of the participants believe that 
it is important to speak English well to make a good impression on their clients and 
employers, and all but one believe the same is important to be a successful interpreter, 
while 88 out of 94 participants (i.e. 93.6%) were of the opinion that this is also 
important for translators (cf. Hirci 2017). In addition, 90 (95.7%) respondents think 
that it is important to speak well to sound professional, and 83 (88.3%) to be able to 
use speech recognition tools more easily.
What speaking English well means
The participants seem to have rather diverse views on what speaking English well 
actually means. Most of the participants, i.e. 89 (94.7%), agreed that this meant 
having pronunciation which is intelligible and easy-to-understand, with 65 (69.1%) 
believing it meant speaking with an accent which is close to standard varieties of 
English. Fewer than half of the respondents (45 or 47.9%) believed that this 
meant having a native-like pronunciation.
Motivation to have a good pronunciation of English
The questionnaire yielded an insight into the participants’ motivation with regard to 
having good pronunciation: the results show that over half of the participants (52 or 
55.3%) are extremely motivated and an additional 30 (31.9%) are very motivated 
to have a good pronunciation of English (a mean score¹² of 4.4, cf. Table 1), which 
suggests that the respondents regard having good pronunciation in English as 
essential for their future profession.
Table 1. Mean scores for pronunciation.
Perceptions about pronunciation                                  Mean score
Motivation to have a good pronunciation of English               4.4
Importance of pronunciation compared to other language skills    3.8
Assessment of own pronunciation                                  3.5
Aspirations to improve their pronunciation                       4.3
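The mean score for motivation can be sanity-checked against the partial distribution reported in the text (52 respondents at 5, 30 at 4, N = 94). The following Python sketch is an illustration under one stated assumption: since the scores of the remaining 12 respondents are not reported, they are only assumed to lie between 1 and 3, which yields a feasible range for the mean rather than an exact value.

```python
def mean_bounds(known: dict[int, int], n_total: int, low: int, high: int) -> tuple[float, float]:
    """Feasible range of a Likert mean when only part of the distribution is known.

    known   maps a scale point to the number of respondents who chose it
    n_total is the full sample size
    low and high bound the unreported responses (an assumption of this sketch)
    """
    known_n = sum(known.values())
    known_sum = sum(score * count for score, count in known.items())
    remaining = n_total - known_n
    lower = (known_sum + remaining * low) / n_total
    upper = (known_sum + remaining * high) / n_total
    return lower, upper


# Motivation item: 52 "extremely motivated" (5) and 30 "very motivated" (4), N = 94.
lower, upper = mean_bounds({5: 52, 4: 30}, n_total=94, low=1, high=3)
print(f"feasible mean: {lower:.2f} to {upper:.2f}")
```

The resulting range of roughly 4.17 to 4.43 contains the reported mean of 4.4, which would be consistent with most of the remaining respondents choosing the middle of the scale.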
When asked about how important they find pronunciation compared to other 
language skills, the participants showed considerable agreement that pronunciation 
skills are quite important (a mean score of 3.8).
¹² The central tendency for each Likert-type statement was summarised using the mean score.
Figure 2. Speed of speaking v typing (N=94).
The participants’ replies furthermore revealed that they tend to aspire to have English 
pronunciation which is intelligible yet close to one of the English standards. They 
deemed their own pronunciation at the time of filling out the questionnaire as 
only “good” or “fairly good”, while only two participants considered it “excellent”. 
Three participants even believed their pronunciation was “extremely poor” or 
“rather poor” (mean score 3.5). The responses revealed that over half (54.3%) of 
the participants have extremely high aspirations to improve their pronunciation, 
and an additional 29.8% of the participants have high aspirations to improve it 
(mean score 4.3).
These results are quite valuable, as they reveal that most participants are aware of 
the significance of having a good pronunciation of English. Whether they see a 
correlation with speech-to-text technologies, however, is yet to be explored. As clear, 
accurate and intelligible pronunciation is required to have speech recognition systems 
work well, at least for the time being, improving non-native English pronunciation 
is undoubtedly worth investing time and effort into if we also wish to gain from the 
advantages afforded by such technologies. 
4.3 Specific Information on Speech Recognition Technologies
We wished to establish if the respondents were aware of the differences in speed 
as related to speech and typing. According to Nuance, the maker of the Dragon speech recognition 
software, speaking is three times faster than typing. The largest group of respondents in this study, 
i.e. 45 (47.9%), believed that speech was two times faster than typing, while 37 
(39.4%) participants responded that it was three times faster. Only 
two participants were of the opinion that speaking was slower than typing, three 
assumed that it was four times faster, while another four responded that the two 
activities were of equal speed (cf. Figure 2, where responses are provided as 
option Other, after the option 4x faster).
It was no surprise to see that almost half of the participants (44.7%) responded 
that they have already used the built-in dictation software on their smartphones; 
nevertheless, the number is much lower for computers, where only 11 out of 94 
participants reported using this technology. Interestingly enough, 28 of the participants 
reported that their dictation was successful, at least sometimes or to some extent. It 
is fair to assume that with more accurate pronunciation of English the perception of 
the success rate would most likely be even higher. Some participants also pointed out 
that they used the dictation option only on their smartphones, without ever realising 
that this was also possible on their computers.
In all, 70 (74.5%) of the participants responded that they would consider using 
dictation in their translation work; even more, i.e. 84 (89.4%) believed that it would 
be useful to work with speech recognition tools as part of their translator training 
at the university. In additional, individual comments, the participants provided 
a number of reasons why they assumed it would be useful to work with speech 
recognition tools as part of translator training (cf. Figure 3). 
P14: “It could improve the student’s pronunciation skills and, more importantly, the proper 
flow of speech.”
P8: “Speech recognition tools are great for improving ones pronunciation and I think we should 
focuse on that and phonetics in general more thoroughly.”
P4: “I believe that students should be familiar with any translation- or language-related 
technology. This can be useful in their careers.”
P53: “I think that such thing as a speech recognition tool would help me a lot with my poor 
pronunciation.”
P16: “Working with these tools would improve our pronounciation.”
P15: “I think we would be able to translate everything faster. And we would also practice 
our pronunciation and expand our vocabulary, because when we say something outloud, we 
remember it faster.”
P18: “So that we learn different approaches to translating and figure out for ourselves which 
best suits us. Also I think it is less time consuming than typing and prevents you from making 
spelling mistakes”
P23: “It’s a tool that is becoming increasingly popular and it could potentially make future 
work easier.”
P29: “Knowledge of new technologies is always useful, the more you know the more you can 
learn, new skills can easily improve our employability, variation of skills is important for 
adapting to the market”
P19: “The more education we get - conected to our studies and technology connected to 
languages – the better.”
P20: “Speech recognition tools are developing and becoming a bigger part of our everyday life”.
P42: “It would improve out studying and it would be a variation of “teaching” that is not 
often used.”
P49: “Because any aspect of the translation work that we are presented is welcome and useful. 
Anything that we learn might come in handy and we are better because of each of those 
experiences.”
P56: “So that we learn different techniques and figure out which approach best suits us.”
P59: “The advancement of technology will impose these tools sooner or later and it would be 
best if the new generations of translators and interpreters had mandatory training with them.”
Figure 3. Comments provided by the participants.¹³
These comments show that there is already some degree of awareness amongst the 
trainee translator population of the possible advantages associated with the integration 
of speech recognition technologies into translator training.
The results of the questionnaire related to the various types of speech recognition 
software that the participants might have heard of are specified in Figure 4. The 
most frequently recognised speech recognition technologies were Windows Speech 
Recognition (60), Apple’s dictation (49) and Google Docs Voice Typing (48), 
followed by IBM’s Speech to Text (38), Amazon’s Transcribe (27) and Speechnotes 
(25). The other speech-to-text tools (such as Via Voice, Dragon NaturallySpeaking 
or Voice Finger) were much less frequently recognised, while only one 
participant in this study had heard of Braina Pro.   
Figure 4. Familiarity with speech recognition technologies (N=94).
In addition, only three other online speech recognition tools were mentioned by 
the participants who were offered an option to list any other speech recognition 
technologies of which they might be aware: one of the participants noted using 
Google Keep, while another participant had not only heard of but had also tried Voice 
Notepad for Slovene (they reported, however, that their dictation work was not 
highly successful). It is interesting to note that 32 (i.e. 34%) of the participants 
responded that they had already tried using some of the tools mentioned, 
selecting mainly Apple Dictation, Google Docs Voice Typing and Windows 
Speech Recognition (only two participants selected Speechnotes, while only one 
mentioned IBM’s Speech to Text and another one Via Voice, cf. Figure 4). Most of 
the participants (62, or 66%) learnt about these speech tools online, by themselves, 
and only five (i.e. 5.3%) at the university.
¹³ All comments by the participants are provided in their original form, verbatim, with spelling mistakes and other 
errors left unchanged.
P13: “It’s a faster way of writing down what you need to translate and possibly a more fun 
and/or interesting way of translation.”
P15: “Good speech recognition tools could help us learn proper pronunciation.”
P20: “Because they could save quite a lot of time and work for translators (there would be also 
be no typos in the text etc).”
P23: “An extra aid one might find useful (like a dictionary or a thesaurus).”
P29: “They could be useful for translating things that need to be transcribed anyway, like 
speeches, directly, or just as an alternative to typing.”
P40: “For transcribing and general text formation – the limits of one’s typing skill can cause 
the occurence of getting lost in thought while typing and forgetting what you were about to say. 
In speech it happens less often”
P44: “It could be helpful if one has to translate videos or with subtitling.”
P48: “These tools can facilitate the translation of audio documents”
P58: “They could replace typing, which can be time-consuming and tiring.”
P65: “We could see where the problem with our speech is.”
P67: “Because it is useful knowing tools that can make the translation work easier. This 
presents us with what the translation work is like and prepares us for it.”
P68: “It is faster, so they can earn more money in a shorter period of time and thus have more 
free time. :)”
P70: “These tools could mean that translators would finish their work faster. Some may speak 
faster than they type so it could improve their working conditions.”
P75: “To facilitate transcribing spoken language, could be useful for making subtitles”
P81: “Since speech recognition tools are very accurate nowadays, I believe it would save a lot of 
time.”
Figure 5. Participants’ comments on the usefulness of speech recognition tools for translators 
(also verbatim).
Judging from the comments provided in the questionnaire, some participants are 
also aware of the drawbacks of the current speech recognition technologies and their 
reliability: 
P7: “Translators could work faster, but the speech recognition tools would need to be very 
good, especially when it comes to punctuation. Going over a text two or more times to correct 
punctuation that was wrongly placed by speech recognition tools is very time consuming.”
P37: “They might be useful for cases, when translators have to write subtitles e.g. a speech or 
movie, and are not sure about what a person is saying. However the speech recognition tools are 
not yet reliable enough to be completely sure of whether their result is correct.”
As can be observed from the participants’ comments, the predominant idea was 
that speech is “faster than typing”, and that the application of 
speech technologies could make translators more efficient. Some students are well 
aware of the current situation in the ever-evolving digital world, recognizing that “the 
use of speech recognition today is growing and many people use it on their phones 
(Siri) or have devices (Alexa) that help them with everyday tasks” (P34).
All this suggests it might be worth raising the awareness of the trainee translator 
population about the existence of such tools, and possibly even integrating speech 
recognition technologies into translator training. This could be achieved in several 
ways: either by incorporating information on speech recognition technologies into 
the already existing technology-related courses, or by introducing it as part of a new 
course focusing on this particular topic with hands-on training within L1 to L2 
translation modules. 
Some studies have already shown (cf. Mees et al. 2013; Désilets et al. 2008) that 
the implementation of speech technologies into translation work is something that 
could possibly be better addressed in the future. There are also interesting pedagogical 
implications of this: if dictation is soon to become an increasingly dominant mode 
of communication, it is important to gain an in-depth insight into the aspects of 
pronunciation that would be particularly relevant in translator training.
5 Conclusion
The present study explored the perceptions of trainee translators studying at the 
Department of Translation Studies at the University of Ljubljana on pronunciation 
and speech technologies. The results of the study offer good grounds for a more 
prominent role to be assigned to both pronunciation instruction and speech 
technologies in translator training. The study yielded results showing that an 
overwhelming majority of trainee translators (just under 90%) believe that having 
good pronunciation of English is important for their profession (cf. Hirci 2017), 
while over 80% also have aspirations to improve their pronunciation. In addition, 
the results show that all the participants believe it is important to speak English well 
to make a good impression on clients and employers; all but one find this important 
for interpreters, while 93.6% also find it important for translators. Moreover, 
95.7% of respondents stated that it is important to speak well to sound professional, 
and 88.3% believe this is important to be able to use speech recognition tools more 
easily. 
These results suggest that equipping trainee translators with pronunciation skills for 
speech recognition technologies is of relevance and would most likely be embraced 
by the students. This is in line with the study by Mees et al. (2013, 149), whose 
retrospective interviews revealed that “a number of students feel that they have 
become more aware of their pronunciation problems in the course of training the 
SR [speech recognition] program”. Their study also revealed that speech recognition 
“provides a potentially useful supplement to written translation, or indeed an 
alternative to it” (Mees et al. 2013, 140–42). The immediate time-efficiency aspect 
is therefore yet another reason why speech recognition technologies could be applied 
in translator training: a new modality could also enhance the learning experience in 
the translation classroom. As some participants of this study have observed, “Time is 
valuable. Every second saved from sitting in front of a screen and keyboard is warmly 
welcome” (P41) or “It is faster, so they can earn more money in a shorter period of 
time and thus have more free time. :)” (P68). 
With the increasingly rapid advances in voice-activated technologies, translator trainers 
should seize the opportunity to retain tech-savvy students’ interest and channel it into 
their regular coursework. Staying ahead is vital to remaining competitive; having that 
special ‘edge’ might be a deciding factor in having trainee translators turn into successful 
players on the professional translation market. Thus, aiming to have good pronunciation 
and speak English well enough to be able to work with speech recognition technologies 
could prove to have added value for translators’ professional careers.  
References
Agrifoglio, Marjorie. 2004. “Sight Translation and Interpreting: A Comparative Analysis of Constraints 
and Failure.” Interpreting. International Journal of Research and Practice in Interpreting 6 (1): 43–67. 
http://dx.doi.org/10.1075/intp.6.1.05agr.
Angelelli, Claudia V. 1999. “The Role of Reading in Sight Translation.” The ATA Chronicle (Translation 
Journal of the American Association of Translators) 28 (5): 27–30.
Armour, Britt. 2018. “7 Key Predictions for the Future of Voice Assistants and AI.” Accessed January 15, 
2019. https://clearbridgemobile.com/author/britt_clrbridge/.
Baxter, Neal R. 2016. “Exploring the Effects of Computerised Sight Translation on Written Translation 
Speed and Quality.” Perspectives 1: 1–18. https://doi.org/10.1080/0907676X.2016.1241287.
Biela-Wolonciej, Aleksandra. 2007. “A-vista: New Challenges for Tailor-Made Translation Types on the 
Example of Recorded Sight Translation.” Kalbotyra 57 (3): 30–39.
Cronin, Michael. 2013. Translation in the Digital Age. London: Routledge.
Désilets, Alain, Marta Stojanovic, Jean-François Lapointe, Rick Rose, and Aarthi Reddy. 2008. “Evaluating 
Productivity Gains of Hybrid ASR-MT Systems for Translation Dictation.” In IWSLT 2008, 
International Workshop on Spoken Language Translation, 20–21 October 2008, Waikiki, Hawai’i, USA, 
Waikiki, Hawai’i, 158–65.
Donaj, Gregor, and Zdravko Kačič. 2012. “Širjenje slovarja in dvoprehodni algoritem v razpoznavalniku 
tekočega govora UMB Broadcast News.” In Proceedings of the Eighth Language Technologies Conference, 
October 8th-12th, 2012, Ljubljana, Slovenija, 48–51. Ljubljana: Institut Jožef Stefan.
Dragsted, Barbara, and Inge G. Hansen. 2009. “Exploring Translation and Interpreting Hybrids. The Case 
of Sight Translation.” Meta: Journal des traducteurs/Meta: Translators’ Journal 54 (3): 588–604. 
https://doi.org/10.7202/038317ar. 
Dragsted, Barbara, Inge G. Hansen, and Henrik S. Sørensen. 2009. “Experts Exposed.” In Methodology, 
Technology and Innovation in Translation Process Research (Copenhagen Studies in Language 38), edited 
by Inger M. Mees, Fabio Alves and Susanne Göpferich, 293–317. Copenhagen: Samfundslitteratur.
Dragsted, Barbara, Inger M. Mees, and Inge G. Hansen. 2011. “Speaking Your Translation: Students’ First 
Encounter with Speech Recognition Technology.” Translation & Interpreting 3 (1): 10–43. 
Gile, Daniel. [1995] 2009. Basic Concepts and Models for Interpreter and Translator Training. Amsterdam/
Philadelphia: John Benjamins Publishing.
Gonzalez, Roseann D., Victoria F. Vásquez, and Holly Mikkelson. 2012. Fundamentals of Court 
Interpretation: Theory, Policy and Practice. 2nd ed. Durham, North Carolina: Carolina Academic Press.
Gonzales, Laura. 2017. “Improving Digital Translating: Research Findings from Multilingual 
Communicators.” User Experience Magazine 17 (5). 
http://uxpamagazine.org/improving-digital-translation/. 
Gorszczyńska, Paula. 2010. “The Potential of Sight Translation to Optimize Written Translation: The 
Example of the English-Polish Language Pair.” In Translation Effects. Selected Papers of the CETRA 
Research Seminar in Translation Studies 2009, edited by Omid Azadibougar, 1–12. Leuven: KU 
Leuven. https://www.arts.kuleuven.be/cetra/papers/files/paula-gorszczynska-the-potential-of-sight.pdf.
Jimenez Ivars, Maria. 2008. “Sight Translation and Written Translation. A Comparative Analysis of Causes 
of Problems, Strategies and Translation Errors within the PACTE Translation Competence Model.” 
FORUM. International Journal of Interpretation and Translation 6 (2): 79–104.
Hirci, Nataša. 2012. “Electronic Reference Resources for Translators. Implications for Productivity and 
Translation Quality.” The Interpreter and Translator Trainer 6 (2): 219–35. 
https://doi.org/10.1080/13556509.2012.10798837.
—. 2017. “Investigating Trainee Translators’ Views on the Pronunciation of English: a Slovene Perspective.” 
Linguistica: Sounds and Melodies Unheard: Essays in Memory of Rastislav Šuštaršič 57 (1): 93–106. 
https://doi.org/10.4312/linguistica.57.1.93-106.
Hirci, Nataša, Tamara Mikolič Južnič, and Agnes Pisanski Peterlin. (forthcoming). “Enriching Translator 
Training with Interpreting Tasks: Bringing Sight Translation into the Translation Classroom.” 
Convergence, Contact and Interaction in Translation and Interpreting Studies, edited by Eugenia Dal 
Fovo and Paola Gentile. Berlin: Peter Lang.
Internet World Stats. 2017. Accessed January 22, 2019. https://www.internetworldstats.com/stats7.htm 
Jurafsky, Dan, and James H. Martin. 2000. Speech and Language Processing: An Introduction to Natural 
Language Processing, Computational Linguistics, and Speech Recognition. New Jersey: Prentice Hall. 
Lambert, Sylvie. 2004. “Shared Attention During Sight Translation, Sight Interpretation and Simultaneous 
Interpretation.” Meta: journal des traducteurs/Meta: Translators’ Journal 49 (2): 294–306. 
https://doi.org/10.7202/009352ar. 
Li, Xiangdong. 2014. “Sight Translation as a Topic in Interpreting Research: Progress, Problems and 
Prospects.” Across Languages and Cultures 15 (1): 67–89. https://doi.org/10.1556/Acr.15.2014.1.4. 
Mees, Inger M., Barbara Dragsted, Inge G. Hansen, and Arnt Lykke Jakobsen. 2013. “Sound Effects in 
Translation.” Target 25 (1): 140–54. https://doi.org/10.1075/target.25.1.11mme. 
Mikkelson, Holly. 1994. “Text Analysis Exercises for Sight Translation.” In Vistas: Proceedings of the 31st 
Annual Conference of ATA, edited by Peter W. Krawutschke, 381–90. NJ: Learned Information.
Moser-Mercer, Barbara. 1995. “Sight Translation and Human Information Processing.” Basic Issues in 
Translation Studies. Proceedings of the Fifth International Conference 2: 159–66. 
Moren, Dan. 2018. “Alexa vs. Google Assistant vs. Siri: Google Widens Its Lead.”  Accessed January 15, 
2019. https://www.tomsguide.com/us/alexa-vs-siri-vs-google,review-4772.html
Phillipson, Robert. 2003. English-only Europe: Challenging Language Policy? London: Routledge.
Pokorn, K. Nike. 2005. Challenging the Traditional Axioms: Translation into a Non-Mother Tongue. 
Amsterdam/Philadelphia: John Benjamins Publishing.
Pöchhacker, Franz. 2004. Introducing Interpreting Studies. London: Routledge. 
—. 2010. “The Role of Research in Interpreter Education.” Translation & Interpreting 2 (1): 1–10.
Rehm, Georg, and Hans Uszkoreit, eds. 2012. The Slovene Language in the Digital Age. Berlin, Heidelberg: 
Springer. https://doi.org/10.1007/978-3-642-30636-5.
Riccardi, Alessandra. 2002. “Interpreting Research: Descriptive Aspects and Methodological Proposals.” 
In Interpreting in the 21st Century: Challenges and Opportunities, edited by Guiliana Garzone and 
Maurizio Viezzi, 73–82. Amsterdam/Philadelphia: John Benjamins Publishing.
Sepesy Maučec, Mirjam, Tomaž Rotovnik, Zdravko Kačič, and Janez Brest. 2009. “Using Data-Driven 
Subword Units in Language Model of Highly Inflective Slovenian Language.” International 
Journal of Pattern Recognition and Artificial Intelligence 23 (2): 287–312. 
https://doi.org/10.1142/S0218001409007119.
Shlesinger, Miriam. 1995. “Shifts in Cohesion in Simultaneous Interpreting.” The Translator 1 (2): 193–214. 
https://doi.org/10.1080/13556509.1995.10798957.
Šuštaršič, Rastislav. 2005. English-Slovene Contrastive Phonetic and Phonemic Analysis and Its Application in 
Teaching English Phonetics and Phonology. Ljubljana: Filozofska fakulteta.
Viaggio, Sergio. 1995. “The Praise of Sight Translation (and Squeezing the Last Drop Thereout of).” The 
Interpreters’ Newsletter 6: 33–42.
Viezzi, Maurizio. 1990. “Sight Translation, Simultaneous Interpretation and Information Retention.” In 
Aspects of Applied and Experimental Research on Conference Interpretation, edited by Laura Gran and 
Christopher Taylor, 54–60. Udine: Campanatto.
Weber, Wilhelm K. 1990. “The Importance of Sight Translation in an Interpreter Training Program.” 
In Interpreting: Yesterday, Today, and Tomorrow, edited by David Bowen and Margareta Bowen, 44–52. 
Amsterdam: John Benjamins.
Weiner, Sophie. 2016. “Study Says Speech-to-Text Is 3 Times Faster Than Typing On Your Phone.” 
Accessed January 15, 2019. 
https://www.popularmechanics.com/technology/a22684/phone-dictation-typing-speed/ 
Young, Holly. 2018. “The Digital Language Divide. How Does the Language You Speak Shape Your 
Experience of the Internet?” Accessed January 15, 2019. 
http://labs.theguardian.com/digital-language-divide/ 
Žgank, Andrej, and Mirjam Sepesy Maučec. 2010. “Razpoznavalnik tekočega govora UMB Broadcast 
News 2010: nadgradnja akustičnih in jezikovnih modelov.” In Proceedings of the Seventh Language 
Technologies Conference, Ljubljana, Slovenia, 28–31. Ljubljana: Institut Jožef Stefan.
Žgank, Andrej, Darinka Verdonik, and Mirjam Sepesy Maučec. 2016. “Razpoznavanje tekočega govora v 
slovenščini z bazo predavanj SI TEDx-UM.” In Proceedings of the Conference on Language Technologies 
& Digital Humanities, Ljubljana, Slovenija, 186–89. Ljubljana: Ljubljana University Press.