Prosody training benefits in perception vs. production skills in simultaneous interpreting: An experimental study

Abstract
The present study investigates the benefits of prosody training for interpreter trainees' perception vs. production skills in simultaneous interpreting. Two groups of student interpreters were formed, with participants assigned to groups at random. The control group received routine instruction in interpreting skills. The experimental group spent 20 minutes less per session on the routine curriculum and instead received awareness training on prosodic features of English. The total instruction time was the same for both groups, i.e., 15 hours. Students then took a posttest in interpreting skills. The results showed that the experimental group performed better than the control group in simultaneous interpreting. Moreover, the study revealed that prosody training enhances the students' perception skills more than their production skills. These results have pedagogical implications for curriculum designers, interpreter training programs, and all who are involved in language study and pedagogy.


Introduction
Language interpreting, or interpretation, is the intellectual activity of facilitating oral communication, either simultaneously or consecutively, between two or more users of different languages (e.g., Chen & Dong, 2010; Gile, 1995). A simultaneous interpreter renders the message in another language while the speaker continues speaking without interruption (e.g., Qianxi & Liang, 2019). This contrasts with consecutive interpreting, in which the interpreter waits for his or her turn and does not start speaking until the speaker pauses to allow time for the interpretation. Simultaneous interpreting is one of the most common kinds of interpreting, but also the most difficult (e.g., Qianxi & Liang, 2019).
The present study was set up to improve the quality of simultaneous interpreting between Persian and English by Iranian students of interpreting, i.e., with Persian as the native language and English as the foreign language. Perception skills were operationalized as the students' interpretation from L2 into L1, and production skills as their interpretation from L1 into L2. Earlier studies had shown that the quality of consecutive interpreting improved significantly when a portion of the instruction time was devoted to explicit teaching of the differences between the sound systems of Persian and English (Yenkimaleki & Van Heuven, 2020). Yenkimaleki and Van Heuven (2020) developed a series of instruction modules (Yenkimaleki, 2017) that targeted the segmental structure (i.e., differences in vowels, consonants, and syllable structure; see Yenkimaleki & Van Heuven, 2020 for details) as well as the prosodic structure (differences in word and sentence stress, melody, and rhythm; see Yenkimaleki, 2017, pp. 50-85 for a detailed description of the prosody modules). They concluded that explicit knowledge and awareness of the prosody of the nonnative language are especially helpful for consecutive interpreters in decoding the English input.
The key to building simultaneous interpreting expertise lies in improving the efficiency of the interpreter's perception and production skills to facilitate the communication of the message (e.g., Hu, 2010; Qianxi & Liang, 2019). Previous research (e.g., Yenkimaleki & Van Heuven, 2019a, 2020) has shown that prosody training enhances the quality of consecutive interpreting performance, but the effect of prosody training on simultaneous interpreting performance has not been studied systematically. Therefore, the present study addresses the benefits of prosody training for interpreter trainees' perception vs. production skills in simultaneous interpreting.

Background literature
Prosody is the ensemble of properties of speech that cannot be derived from the mere sequence of phonemes that make up a spoken sentence (Van Heuven, 1994). Prosody thus includes such phenomena as lexical tone, stress at the word and sentence level, boundary marking, and intonation. All these suprasegmental phenomena are characteristics of linguistic units larger than a single vowel or consonant, i.e., larger than a segment (Nooteboom, 1997; Van Heuven, 2017; Van Heuven & Sluijter, 1996). Although words are recognized mainly from the sequence of segments, word-level prosody plays a critical role in the recognition process when segmental quality is poor, as is typically the case in foreign-accented speech (e.g., Cutler, 2012; Van Heuven, 2008; Yenkimaleki, 2016). Moreover, sentence prosody is often indispensable for signaling the speaker's intention (e.g., O'Neal, 2010). Prosody therefore plays an important role in both decoding and encoding meaning. Segmenting continuous speech into syllables, words, and phrases, signaling syntactic structure, and emphasizing content words and other salient information are among the functions of prosody that facilitate speech processing (Whalley & Hansen, 2006). For successful decoding of input speech and encoding of speech output in the nonnative language, the L2 learner will benefit from an explicit comparison of the prosodic properties of his or her native language with those of the L2 (Yenkimaleki & Van Heuven, 2019b, 2020). Many researchers have emphasized the importance of awareness and 'consciousness-raising' for second language learning (e.g., Schmidt, 2010; Yenkimaleki & Van Heuven, 2020). Mainstream cognitive psychologists consider awareness a fundamental precondition for learning and even claim that learning is impossible without conscious awareness (Brewer, 1974; Dawson & Schell, 1987; Lewis & Anderson, 1985).
In foreign-language education, these views are reflected, for instance, in Bialystok (1978), who proposed a theoretical framework in which conscious knowledge plays a key role. Similarly, Rutherford and Sharwood Smith (1985) asserted that drawing the learner's conscious attention to the formal properties of the foreign language can be advantageous to second language learning. These perspectives can be applied to interpreter training: conscious knowledge of prosodic features may reduce the number of competing representations of the incoming structures that interpreters must keep in working memory while interpreting.
Prosody awareness training is among the most marginalized activities in interpreter training, even though prosody plays a critical role in communicating the message. This neglect may be due to the (apparent) complexity of the issue and to misconceptions about what content should be taught and how (Yenkimaleki, 2017, 2018). Another reason is that practitioners in EFL (English as a Foreign Language) contexts find it difficult to listen analytically to students' pronunciation, identify errors, and suggest remedies, or that they give priority to other aspects of communicative competence, such as the acquisition of vocabulary and morphosyntax. Jackson and O'Brien (2011) maintain that the relationships between prosody, second language speech production, and second language comprehension are understudied and need more investigation. Systematic studies are needed to determine how interpreters may exploit the relationships between prosody and meaning when decoding messages in the source language and encoding the same messages in the target language.
Hahn (2004) investigated the effect of primary stress (i.e., marking the focus of content with relatively high pitch) on native speakers' understanding of nonnative English, asking native speakers to listen to short lectures delivered by a single Korean EFL speaker in which the placement of sentence stress was manipulated. The results showed that correct placement of sentence stress significantly improved native listeners' comprehension and memorization of the content. Similarly, Field (2005) asked both native and nonnative listeners to transcribe various types of nonnative speech tokens and found a negative impact of incorrect word stress (wrong position and/or wrong phonetic realization). Saito and Saito (2016) investigated the effects of prosody-oriented instruction on the global comprehensibility and suprasegmental development (word stress, rhythm, and intonation) of Japanese EFL learners. Students in the experimental group received a total of three hours of instruction over six weeks, while those in the control group received meaning-oriented instruction without any focus on suprasegmentals. Speech samples elicited through reading-aloud tasks were assessed through native listeners' intuitive judgments and acoustic analyses. The pre-/posttest data showed significant gains for the experimental group in overall comprehensibility and in the use of word stress, rhythm, and intonation, in both trained and untrained lexical contexts. In particular, by explicitly addressing first language/second language differences, the instruction helped learners mark stressed syllables with longer and clearer vowels, reduce vowels in unstressed syllables, and use appropriate intonation patterns for yes/no and wh-questions. Cutler et al. (1997) reviewed the exploitation of prosodic information in the comprehension of spoken language.
They examined three areas: the use of prosody in spoken-word recognition, where most attention has been paid to whether the prosodic structure of a word plays a role in the initial activation of stored lexical representations; the use of prosody in computing syntactic structure, where the resolution of global and local ambiguities has been the central focus; and the role of prosody in processing discourse structure, where most work has addressed how accentuation and deaccentuation contribute to integrating concepts into an existing discourse model. They concluded that the listener's task is to reconstruct the speaker's message, which involves several subtasks: recognizing the individual words, extracting their syntactic relationships, and determining the semantic structure of the utterance and its relation to the discourse context. A coherent prosodic structure appropriate to the sentence facilitates the processing of speech input in several ways.
Listening to speech in a nonnative language is inherently noisy, since the linguistic code of the input speech does not match the deep-rooted expectations of the nonnative listener (Cutler, 2012). The hypothesis is that drawing the nonnative listener's attention to the specific characteristics of L2 prosody, through intensive exposure to words with unexpected stress patterns and/or by explicitly pointing out prosodic differences between the L1 and the L2, will help the listener process L2 input speech. It is further assumed that knowing how to exploit the redundancy provided by word and sentence prosody in the L2 input pays off especially when the speech processing task is aggravated by time pressure and a heavy working-memory load, which are unavoidable in interpreting.

Main aim
Considering the studies reviewed above, the benefits of prosody training for interpreter trainees have been established in general, but whether these benefits differ between perception and production skills has not. Yenkimaleki and Van Heuven (2016b) ran two separate experimental studies to investigate the effect of explicit teaching of prosody on the quality of both recto¹ and verso² interpreting. They concluded that prosody training has a positive impact on the interpreting performance of interpreter trainees. The present study investigates the effect size of the prosody training benefit for the perception vs. production skills of interpreter trainees. Because interpreter training curricula must decide how much time to spend on particular skills and what type of materials to develop for prosody training, it is important to know how large the training effect is for perception vs. production skills. The hypothesis is that training explicitly targeting English prosody will improve the interpreter trainees' perception skills (i.e., interpretation from their L2 into their L1) more than their production skills (i.e., interpretation from their L1 into their L2) (Yenkimaleki & Van Heuven, 2020).

Participants
Thirty-two interpreter trainees were chosen to participate in this study. All participants were undergraduate students at the University of Applied Science in Tehran, Iran, and none had studied or lived abroad at that point. They were randomly divided into two classes of 16 students (8 male and 8 female students per group). The participants were native speakers of Farsi aged 20-24 years, and all of them attended every session of the training program.
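The text states that the groups were formed at random and were balanced for gender (8 male and 8 female trainees per group), but it does not say how that balance was achieved. One standard way is stratified randomization: shuffle each gender stratum separately and split it in half. The sketch below assumes that procedure; the roster, the participant IDs, the seed, and the `assign_stratified` helper are all hypothetical.

```python
import random

def assign_stratified(participants, seed=None):
    """Split participants into two equal groups, randomizing within each stratum.

    participants: list of (participant_id, stratum) pairs, e.g. stratum = gender.
    Returns {"control": [...ids...], "experimental": [...ids...]}.
    """
    rng = random.Random(seed)  # seeded generator so the allocation is reproducible
    strata = {}
    for pid, stratum in participants:
        strata.setdefault(stratum, []).append(pid)
    groups = {"control": [], "experimental": []}
    for stratum, ids in sorted(strata.items()):
        if len(ids) % 2:
            raise ValueError(f"stratum {stratum!r} cannot be split evenly")
        shuffled = list(ids)  # copy, so the caller's list is left untouched
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        groups["control"].extend(shuffled[:half])
        groups["experimental"].extend(shuffled[half:])
    return groups

# Hypothetical roster: IDs 1-16 male ("M"), IDs 17-32 female ("F").
roster = [(i, "M" if i <= 16 else "F") for i in range(1, 33)]
groups = assign_stratified(roster, seed=2024)
print({name: len(ids) for name, ids in groups.items()})  # → {'control': 16, 'experimental': 16}
```

Randomizing within strata, rather than over the whole roster, guarantees the 8/8 gender split in each group while still leaving individual allocation to chance.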

Ethical issues
Approval for the present study was obtained from the ethics committee of the Dept. of Modern Languages. All participants gave informed consent to take part in the research project and received a small payment for their participation.

Procedures
First, both groups took a pretest of simultaneous interpreting consisting of eight 30-second extracts, which the participants interpreted. The recorded extracts used in the instructional sessions were authentic Farsi and English audio recordings of news, political discussions, and social interviews. In choosing the extracts, careful attention was paid to including sentences in which correct stress placement at the word and/or sentence level was vital to understanding the meaning. For instance, variable-stress words were included in which initial stress marks a noun and final stress a verb (the import vs. to import, the conduct vs. to conduct). Sentences were also included, as far as possible, in which referents mentioned in the preceding discourse should not receive sentence stress, since stressing them would prompt the listener to look for a new referent (see Hahn, 2004). For four of the test extracts, the interpreter trainees listened to authentic English speech and interpreted it into Farsi (the participants' mother tongue); the other four extracts were authentic Farsi speech, which the trainees interpreted into English.
The control group received routine instruction in interpreting, i.e., the routine curriculum and syllabus used in the English Translating and Interpreting Department for training interpreters in Iran. In this group, the techniques, different aspects, and types of interpreting were taught and practiced. The experimental group spent 20 minutes less per session on the routine curriculum and instead received awareness training on prosodic features of English (see Yenkimaleki, 2017 for details of the training program). Altogether, each group took part in 10 sessions, for a total of 15 hours of instruction (90 minutes per session, one session per week for 10 successive weeks). In both classes, authentic extracts of spoken Farsi and English were presented to the students, who then interpreted them into English and Farsi, respectively.
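The instruction-time figures given above can be checked with simple arithmetic; the split of the experimental group's time follows directly from the 20-minute substitution per session.

```latex
\begin{aligned}
T_{\text{total}}   &= 10 \times 90\ \text{min} = 900\ \text{min} = 15\ \text{h} && \text{(each group)}\\
T_{\text{prosody}} &= 10 \times 20\ \text{min} = 200\ \text{min} = 3\ \text{h}\ 20\ \text{min} && \text{(experimental group)}\\
T_{\text{routine}} &= 10 \times (90 - 20)\ \text{min} = 700\ \text{min} = 11\ \text{h}\ 40\ \text{min} && \text{(experimental group)}
\end{aligned}
```

About 22% of the experimental group's instruction time (200 of 900 minutes) was therefore devoted to prosody awareness training, with total contact time held equal across groups.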
Formative quizzes were administered from time to time during the training program to give the instructor feedback on the students' progress. At the end of the training, a posttest was administered to both the control and the experimental group to measure the quality of their (simultaneous) interpreting. The posttest had the same format and level of difficulty as the pretest but used different audio extracts, to rule out repetition effects.
All recordings produced by the participants were then evaluated independently by three expert judges of interpreting quality. The evaluation criteria (see Table 1), based on Sawyer (2004), were explained to the judges beforehand. The judges rated the students' performance in separate cubicles of a language laboratory, could not see one another, and were not allowed to compare or discuss their marks. All judges rated the 32 student interpreters in the same order. The eight fragments selected for each student were presented in immediate succession. After each fragment there was a 30-second pause during which the raters could fill in or check their marks; the next fragment started once all raters had signaled that they were ready. The materials were played back over small loudspeakers without interruption or repetition, and the judges noted down their marks (one for each criterion) on paper evaluation sheets as the fragments progressed. As indicated in Table 1, the maximum number of points a rater could award differs between scales; here I followed the recommendations of Sawyer (2004).
Note to Table 1. The numbers are the maximum scores that can be given to each component; component scores add up to 100. After Sawyer (2004). For a detailed explanation of these criteria, see Yenkimaleki (2017).

Results³
The design of this study has three independent variables (factors) and one dependent variable. The first independent variable is the Direction of interpreting: recto interpreting (into Farsi) targets perception skills in English, whereas verso interpreting (into English) involves production skills in English. The second factor is the Training group the student belonged to, i.e., the experimental group, with special attention to prosody, vs. the control group, with the routine training program. The third factor is the Moment of testing, i.e., pretest or posttest. This yields a 2 × 2 × 2 factorial design. The dependent variable is the sum of the eight component scores per student specified in Table 1, as rated by each of the three judges separately. I first assess the agreement among the three raters and, on that basis, combine the scores over raters into a single mean rating per student, which is a number between 0 and 100. The main analysis is a repeated measures analysis of variance (ANOVA) with Direction of interpreting and Moment of testing as within-subjects factors and Training as a between-subjects factor. To simplify the exposition, I then perform a second repeated measures ANOVA on the difference between each student's posttest and pretest score, i.e., the gain. This removes Moment of testing as a factor, so that only Training, Direction, and their interaction remain; in the gain analysis these correspond to the Training × Moment, Moment × Direction, and Training × Moment × Direction terms of the full analysis.
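The aggregation described above (component scores summed per rater, rater totals averaged into a single 0-100 rating, and the gain taken as posttest minus pretest) can be sketched as follows. Table 1's actual maxima are not reproduced in the text, so `MAX_SCORES` is a hypothetical set of eight maxima that merely sums to 100. The function names and the example scores are likewise illustrative. The sketch also assumes one set of component scores per rater for a given student, direction, and test moment.

```python
from statistics import mean

# Hypothetical maxima for the eight criteria (NOT the values of Table 1);
# the only constraint taken from the text is that they add up to 100.
MAX_SCORES = [20, 15, 15, 10, 10, 10, 10, 10]
assert sum(MAX_SCORES) == 100

def rater_total(component_scores, max_scores=MAX_SCORES):
    """Sum one rater's component scores, checking each against its maximum."""
    if len(component_scores) != len(max_scores):
        raise ValueError("expected one score per criterion")
    for score, maximum in zip(component_scores, max_scores):
        if not 0 <= score <= maximum:
            raise ValueError(f"score {score} is outside 0..{maximum}")
    return sum(component_scores)

def student_rating(scores_by_rater, max_scores=MAX_SCORES):
    """Average the raters' totals into a single 0-100 rating."""
    return mean(rater_total(scores, max_scores) for scores in scores_by_rater)

def gain(pre_rating, post_rating):
    """Gain score for the follow-up analysis: posttest minus pretest."""
    return post_rating - pre_rating

# Illustrative (made-up) component scores from three raters for one student.
pre = student_rating([
    [15, 12, 11, 8, 8, 7, 8, 9],   # rater 1: total 78
    [16, 12, 12, 8, 7, 7, 8, 9],   # rater 2: total 79
    [15, 11, 12, 8, 8, 8, 8, 8],   # rater 3: total 78
])
post = student_rating([
    [16, 13, 12, 8, 8, 8, 8, 8],   # rater 1: total 81
    [17, 13, 12, 8, 8, 8, 8, 8],   # rater 2: total 82
    [17, 13, 12, 9, 8, 8, 8, 8],   # rater 3: total 83
])
print(round(pre, 2), post, round(gain(pre, post), 2))
```

Validating each component against its maximum catches transcription errors from the paper evaluation sheets before they can distort a student's total.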
The mean intraclass correlation coefficient across the eight evaluation criteria, each rated by three raters, was .976, which indicates excellent agreement among the raters. On the basis of this result, the mean rating score is considered a reliable estimate of the students' performance in simultaneous interpretation.
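The text reports the mean intraclass correlation over the eight criteria but does not say which form of the ICC was used. The sketch below assumes the two-way, consistency, average-measures form, ICC(3,k) in Shrout and Fleiss's notation. It is computed per criterion from an n-students × k-raters table and then averaged over criteria. The function names are hypothetical, and other ICC forms (e.g., absolute agreement) would give somewhat different values.

```python
def icc3k(ratings):
    """ICC(3,k): two-way model, consistency, reliability of the mean of k raters.

    ratings[i][j] is the score that rater j gave to student i.
    ICC(3,k) = (MS_students - MS_error) / MS_students.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k table with n, k >= 2")
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA decomposition without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # students
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / ms_rows

def mean_icc(tables_by_criterion):
    """Average ICC(3,k) over several criteria (one n x k table per criterion)."""
    return sum(icc3k(t) for t in tables_by_criterion) / len(tables_by_criterion)
```

For example, `icc3k([[70, 72, 71], [80, 82, 81], [90, 92, 91]])` returns 1.0, because the three raters differ only by a constant offset, which a consistency-type ICC ignores.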
Table 2 (part A) summarizes the interpretation test scores of the control group and the experimental group in the pretest. A repeated measures ANOVA on the pretest scores (A), with Direction as a within-subjects factor and Training as a between-subjects factor, showed that the effects of Training, F(1, 30) < 1, Direction, F(1, 30) = 1.6, p = .241, ηp² = .051, and the interaction between Training and Direction, F(1, 30) < 1, were not significant. The groups can therefore be considered equal in interpreting skill before the start of the treatment, irrespective of the direction of the interpreting task. The next step was to determine whether the improvement from pretest to posttest was significant, and whether it differed depending on the Direction of interpreting and the type of Training. This was done with a repeated measures ANOVA on the pretest and posttest scores (A and B), with Moment of testing and Direction as within-subjects factors and Training as a between-subjects factor. The results show a significant main effect of Moment, F(1, 30) = 38.3, p < .001, ηp² = .561, with posttest scores 2.13 points higher than pretest scores (81.3 vs. 79.1). The effect of Direction was somewhat smaller but also significant, F(1, 30) = 22.3, p < .001, ηp² = .427, with a difference of 0.6 points in favor of recto interpreting (80.5 vs. 79.9). The main effect of Training was not significant, F(1, 30) < 1, and the Training × Direction interaction failed to reach significance, F(1, 30) = 3.6, p = .069, ηp² = .106. Importantly, the Training × Moment interaction was significant, F(1, 30) = 22.3, p < .001, ηp² = .427. The Moment × Direction interaction and the three-way Training × Moment × Direction interaction were also significant, F(1, 30) = 5.2, p = .029, ηp² = .148, and F(1, 30) = 7.5, p = .010, ηp² = .199, respectively. These interactions are illustrated in Figure 1.
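Each partial eta squared reported above can be recovered from its F ratio and degrees of freedom. Partial eta squared is SS_effect / (SS_effect + SS_error), which equals F·df_effect / (F·df_effect + df_error). The short check below applies this identity to the reported values. The dictionary simply transcribes the text, and small differences in the third decimal are expected because the F values are reported to one decimal place.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """eta_p^2 = F * df_effect / (F * df_effect + df_error)."""
    return f_value * df_effect / (f_value * df_effect + df_error)

# (F, reported partial eta squared), all with df = (1, 30), transcribed from the text.
REPORTED = {
    "Direction (pretest only)": (1.6, 0.051),
    "Moment": (38.3, 0.561),
    "Direction": (22.3, 0.427),
    "Training x Direction": (3.6, 0.106),
    "Training x Moment": (22.3, 0.427),
    "Moment x Direction": (5.2, 0.148),
    "Training x Moment x Direction": (7.5, 0.199),
}

for effect, (f_value, reported) in REPORTED.items():
    recomputed = partial_eta_squared(f_value, 1, 30)
    print(f"{effect:<30} F = {f_value:5.1f}  recomputed = {recomputed:.3f}  reported = {reported:.3f}")
```

All recomputed values agree with the reported ones to within 0.002, which is consistent with rounding of the F values.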
In Figure 1, the left-hand panel shows the overall mean scores obtained in the pretest (presented numerically in Table 2A), broken down by Training group, i.e., control vs. experimental group (green bars: interpretation into Farsi; red bars: interpretation into English). The right-hand panel shows the same breakdown for the gain from pretest to posttest (presented numerically in Table 2C, which lists the difference scores B minus A). As a post-hoc analysis, the gain was adopted as the dependent variable, thereby eliminating Moment of testing as a factor in the design. For the control group, the gain after treatment was less than half a point on the 0-100 assessment scale, which was not significant in a separate Bonferroni post-hoc test; within the control group, the gain did not differ between the two directions of interpreting, F(1, 15) < 1. This contrasts sharply with the 3.8-point gain found for the experimental group. In the gain analysis, the main effect of Training was significant, F(1, 30) = 22.3, p < .001, ηp² = .427. The gain obtained by the experimental group was significantly larger for recto interpreting (into Farsi) than for verso interpreting (into English), as shown by the interaction between Training and Direction, F(1, 30) = 7.5, p = .010, ηp² = .199.
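With two groups and two-level factors, each effect in the gain analysis reduces to a comparison of two independent groups on a per-student score. The Training main effect compares mean gains averaged over both directions. The Training × Direction interaction compares each student's recto gain minus verso gain. In both cases F(1, n1 + n2 − 2) equals the square of the pooled-variance t statistic. This equivalence is also why the same F values (22.3 and 7.5) appear for the Training × Moment and Training × Moment × Direction terms of the full analysis. The sketch below illustrates the computation with made-up gains (not the study's data); the helper names are hypothetical.

```python
from statistics import mean, variance

def pooled_F(group_a, group_b):
    """F(1, n_a + n_b - 2) for a two-group comparison, computed as the squared
    pooled-variance t statistic (equivalent to a one-way ANOVA on two groups)."""
    n_a, n_b = len(group_a), len(group_b)
    df_error = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df_error
    se = (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5
    t = (mean(group_a) - mean(group_b)) / se
    return t ** 2, (1, df_error)

def mean_gain(gains):
    """Per-student gain averaged over directions; gains = [(recto, verso), ...]."""
    return [(recto + verso) / 2 for recto, verso in gains]

def direction_contrast(gains):
    """Per-student recto gain minus verso gain."""
    return [recto - verso for recto, verso in gains]

# Made-up gains (recto, verso) for a few students per group; NOT the study's data.
control = [(0.5, 0.0), (0.0, 0.5), (1.0, 0.5), (0.0, 0.0)]
experimental = [(5.0, 2.0), (4.5, 3.0), (6.0, 2.5), (4.0, 3.5)]

F_training, df = pooled_F(mean_gain(experimental), mean_gain(control))
F_interaction, _ = pooled_F(direction_contrast(experimental), direction_contrast(control))
print(f"Training: F{df} = {F_training:.1f}; Training x Direction: F{df} = {F_interaction:.1f}")
```

With the study's 16 students per group, the same functions would yield F with df = (1, 30), matching the degrees of freedom reported above.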

Discussion
The results show that prosody training has a positive effect on students' interpreting skills. Moreover, the benefit of the training was significantly larger when students interpreted from their second language into their mother tongue (message perception skills) in simultaneous interpreting. The results are in line with Yenkimaleki and Van Heuven (2020), who concluded that interpreter trainees perform better when they have acquired conscious knowledge of word and sentence prosody and of the prosodic differences between their working languages. This suggests that the training is effective and that the students' improved performance is not merely a halo effect caused by the novelty of this part of the curriculum, since a general novelty effect would not be expected to favor one direction of interpreting over the other. Rather, it can be argued that the gain in performance arises for the reason proposed by Whalley and Hansen (2006), viz. that increased awareness of prosodic cues in the (nonnative) input speech facilitates the listener's task of segmenting the incoming stream of sound into syllables, words, and phrases, recovering syntactic structure, and identifying salient content words. The findings also converge with Pennington and Ellis (2000), who concluded that directing learners' attention to, and raising their awareness of, prosodic features of the second language during training improves the perception skills of EFL learners. Awareness of prosody facilitates the perceptual processing of nonnative input speech, but it does not automatically yield an equally large benefit for production tasks. Perception (necessarily) leads production: the learner recognizes prosodic phenomena when perceiving them but has not yet reached the point where (all) the newly acquired knowledge can be used to improve speech production.
Nonnative pronunciation affects both segmentals and suprasegmentals in L2 speech; it contributes to the perception of a foreign accent and may lower the intelligibility or comprehensibility of speech (Kang et al., 2010; Munro & Derwing, 2008). Moreover, nonnative production of suprasegmentals appears to be more detrimental to L2 comprehensibility and intelligibility than segmental errors are (Field, 2005; Kang et al., 2010; Yenkimaleki & Van Heuven, 2019a). To help L2 learners with these problems, training has proven beneficial for both speech perception and production (Gordon et al., 2013; Yenkimaleki & Van Heuven, 2019a, 2019c). Second language perceptual training is necessary not only because natural L2 perceptual acquisition is challenging for adult learners, who may fail to perceive foreign sounds, but also because perceptual training facilitates oral production (Qian et al., 2018). Research suggests that perceptual development precedes production achievement (e.g., Detey & Racine, 2015; Walden, 2014), such that perceptual deficits tend to inhibit production performance (Iverson et al., 2005). The role of perception in production is further supported by neurolinguistic findings that the ability to perceive is essential to accurate articulation (Golestani & Pallier, 2007). A mutually facilitative interaction between perception and production has been demonstrated in several studies (e.g., Linebaugh & Roche, 2013, 2015), including perceptual research conducted specifically at the segmental level (e.g., Okuno & Hardison, 2016; Yenkimaleki & Van Heuven, 2016a).
An important point to consider is that the importance of prosody teaching for learners of English as a foreign language (EFL) may differ depending on the learner's L1. If the prosodic system of English and that of the L1 are rather similar, there is less need to teach prosody. If the systems are quite different, then prosody teaching will be crucial (e.g., Yenkimaleki & Van Heuven, 2019b). This is the case for Farsi and English, since word stress is fixed in the vast majority of vocabulary in Farsi but it is complex and weight sensitive in English. Also, the rhythmic structure in Farsi is syllable timed, and in English it is stress timed (for details of prosody differences between English and Farsi see Yenkimaleki, 2016).
It has been shown before that a closer approximation of native English prosody yields better intelligibility and comprehensibility of nonnative speech (e.g., Saito & Saito, 2016; Yenkimaleki & Van Heuven, 2019a, 2019b). A recent study by the present author showed that the prosody training program was successful in boosting the quality of the speech output in so-called inverse consecutive interpreting, i.e., from native Farsi into nonnative English (Yenkimaleki & Van Heuven, 2018). In inverse interpreting, of course, increased awareness of the prosodic requirements of English is directly observable in the output of the interpreting process. The present experiment was designed to compare the effect of prosody training on the perception vs. production skills of interpreter trainees, and the size of the training benefit was compared systematically across the two. Interestingly, the effect of the prosody training program was significantly larger for perception skills. This finding is consistent with the cognitive view that perception is easier for EFL learners than production (Johnson-Laird, 2001).

Conclusion
Overall, the results showed that the benefit of the prosody training program was significantly larger for the perception skills of interpreter trainees than for their production skills.
The present study focused on comparing the perception vs. production skills of interpreter trainees because of the contribution such a comparison could make to developing a standard curriculum for training qualified interpreters, a need that has been pointed out in studies of practitioners' beliefs (Yenkimaleki & Van Heuven, 2019b). The findings underline the important role of prosody in the perception of nonnative speech (Derwing et al., 2012) and are consistent with reports that attention to prosodic features often produces promising results in speech recognition (Anderson-Hsieh et al., 1992; Yenkimaleki & Van Heuven, 2019c). It may also be pointed out that pronunciation materials that devote increased conscious attention to training students to monitor their own production, through the teaching of formal rules, noticing differences, constructive feedback, and reflective activities, enhance speaking skills (Yenkimaleki & Van Heuven, 2019c).
The present study was limited to 32 participants because the researcher did not have access to a larger group. Future studies could include a larger number of participants, as well as learners with different L1 backgrounds in other learning contexts. The pedagogical implications of the present study apply to interpreter training programs and the EFL curriculum: curriculum developers and practitioners may need to rethink their methodological approach to teaching prosody in interpreter training programs.
It is also suggested that the effectiveness of prosody training be investigated for other pairs of working languages.
¹ Interpreting from the nonnative language into the interpreter's native language is called direct or recto interpreting.
² Interpreting from the interpreter's native language into the foreign language is called verso interpreting.
³ I am most grateful to Prof. dr. Vincent van Heuven for doing the statistical analysis for this study.