Dutch listeners’ responses to Dutch, British and American English accents in three contexts

Author contributions: Warda Nejjari: conceptualization, methodology, data collection, data analyses, writing the original draft, writing, reviewing, editing; Marinel Gerritsen: supervision, conceptualization, writing, reviewing, editing; Roeland van Hout: supervision, conceptualization, methodology, data analyses, writing, reviewing, editing; Brigitte Planken: supervision, conceptualization, writing, reviewing, editing

example in a higher education context, are often about the perceived English fluency levels of lecturers and their accents (e.g. Bouma, 2016; Bronkhorst, 2015). Dutch students have indicated that the Dutch-accented English used by professors has hindered their understanding and resulted in less effective lectures (e.g. Huygen, 2017). Such perceptions were partly confirmed in an experiment on self-reported speech understanding of Dutch instructors' accents in English (Hendriks et al., 2016). As self-reported speech perceptions do not necessarily reflect actual speech understanding, our study investigated Dutch listeners' actual understanding of Dutch-accented English (versus British and American English). It also studied the speaker evaluations that Dutch-accented English elicits in various communication contexts.
L2 accents can also affect the perceived quality and effectiveness of information transfer (see Nejjari et al., 2012 for an overview). Information transfer involves a continuous evaluation of speech and speakers in interactions in terms of, among other things, the linguistic features of an utterance and the context in which these are used (Lev-Ari & Keysar, 2010). In addition, accents are one of the first signals used to assess someone as similar, intelligent, friendly or even untrustworthy (e.g. Nejjari et al., 2012; Purnell et al., 1999). Research on L2 accents in a lecturing setting shows that L2-accented speech can negatively affect listeners. For instance, lecturers' L2-accented speech can lead to lower perceptions of status, although it does not affect a lecturer's likeability (affect) (e.g. Dalton-Puffer et al., 1997; Hendriks et al., 2016). Furthermore, it can irritate students and negatively affect perceived lecture understanding (e.g. Bolton & Kuteeva, 2012; Hellekjaer, 2010; Hendriks et al., 2016).
Kachru and Smith (2008) offer a fine-grained definition of speech understanding consisting of three components.
The first component is intelligibility, which refers to the manner in which utterances are deciphered into words and sentence-level elements. Intelligibility can be measured by requiring listeners to orthographically transcribe individual words or sentences produced by speakers in recorded settings (see also Nejjari et al., 2012; Nelson, 2011; Yorkston et al., 1996). The second component is comprehensibility, which concerns how individual words are understood and how words combined into sentences express meaning within a specific context (see Nejjari et al., 2012). Kachru and Smith (2008) indicate that, unlike intelligibility, comprehensibility requires that a listener understands the syntax, the semantics, and even the physical context in which an utterance is heard. The third component is interpretability, which is difficult to distinguish from comprehensibility because both deal with meaning beyond the recognition of words and phrases. Interpretability refers to whether listeners are able to grasp a speaker's intentions and have the cultural baggage to understand and interpret discourse strategies (see Nejjari et al., 2012). Nejjari et al. (2012) operationalized Kachru and Smith's model to investigate standard British English speakers' reactions to a standard British English accent and a strong or light Dutch English accent. Both Dutch English accents led to lower intelligibility and comprehensibility, but did not affect interpretability, suggesting that L1 listeners are capable of understanding the intentions of a speaker with an L2 accent without having heard each word correctly or completely comprehending the message content. These results show that measuring speech understandability as a multicomponent construct can yield nuanced insights into listeners' understanding of L2 accents.
In addition to status and affect, accentedness research has investigated speaker dynamism, albeit less consistently. According to Zahn and Hopper (1985), dynamism refers to a person's activity level and enthusiasm. It taps into a listener's perception of the self-presentation of a speaker, which differs from perceptions of a speaker's status or affect, as confirmed by Grondelaers and Van Hout (2015). Their study investigated the perceived prestige of a non-standard use of an object pronoun in Dutch ('hun' (them) in subject position instead of 'zij' (they)). Even though the non-standard use evoked lower status, it did not affect speaker dynamism, suggesting that, while a speaker who uses non-standard features may be perceived as less cultured, their self-presentation is not perceived differently from that of a standard language user.
Since the aim of speaker evaluation research is to understand listeners' responses to specifically selected accents, it is important to determine whether listeners are able to identify the accents they are asked to evaluate. Research by Yook and Lindemann (2013) has shown that accent recognition triggers speaker evaluations associated with a particular group of people (e.g. nationality, culture, sub-culture), and therefore, speaker evaluations are connected to stereotypical ideas listeners have of certain groups. Consequently, to study evaluations of specific speaker groups (e.g. Dutch-accented English speakers) adequately, we need to determine whether listeners can correctly identify speakers' language background.
Most accentedness studies focus on one (professional) communication context, such as higher education (Bolton & Kuteeva, 2012; Hellekjaer, 2010; Hendriks et al., 2016) or business sales (Nejjari et al., 2012; Tsalikis et al., 1991). One exception is Cargile (1997), who investigated L1 and L2 listeners' perceptions of the suitability of Mandarin-accented English in a job interview compared with a higher education classroom. He found that listeners viewed Mandarin-accented English as acceptable in a job interview, but not in a higher education classroom. This difference in acceptability could be connected to the Expectancy theory of Burgoon and Burgoon (2001): people have expectations of, among other things, the verbal and non-verbal communication that is 'expected' and/or 'desired' in a certain context. By extension, a particular accent may violate what is expected or desired in a certain context, which may result in negative evaluations and perhaps even impair understanding (cf. Cargile, 1997; Dalton-Puffer et al., 1997). It is important to note that expectations in terms of context are connected to topic and content (what is stated and how). For example, listeners can have different communication expectations, in terms of content and topic, of a lecture on the coronavirus compared with a lecture on Dutch poetry. Although we acknowledge that a context may be linked to different topics and contents, for practical reasons only context was included as a variable, with a topic and content typical of each context.
A matched-guise experiment was set up (see 2.1) to investigate Dutch listeners' actual understanding of Dutch-accented English compared with standard British and American English accents, as well as their perceptions of the speakers of these accents, across three communication contexts. We also examined whether the three components of speech understandability correlate with the three speaker evaluation dimensions we investigated, as this has not been studied systematically. Three research questions were formulated:
RQ1: Do accent (standard British English = BrE, standard American English = AmE, Dutch English = DE) and context (Lecture, Audio Tour, Job Pitch) affect Dutch listeners' speech understandability (intelligibility, comprehensibility, interpretability)?
RQ2: Do accent (BrE, AmE, DE) and context (Lecture, Audio Tour, Job Pitch) affect Dutch listeners' speaker evaluations (status, affect, dynamism)?
RQ3: Are speech understandability and speaker evaluations correlated?

Method

Speakers: matched-guises, controls, and filler
A matched-guise technique was used, which meant that one male speaker produced all three tested accents. This technique has the advantage of controlling for the influence of paralinguistic features such as voice quality. In Nejjari et al. (2019), we discussed different voice evaluation paradigms and concluded that the matched-guise technique is the optimal speech evaluation paradigm. The matched-guise speaker in our study had been assessed in a prior accentedness experiment, which showed that he was the only speaker who could authentically produce all three accents: (1) BrE, (2) AmE, and (3) DE, the typical English accent of L1 speakers of Dutch (see Nejjari et al., 2019 for a detailed discussion of the matched-guise technique itself, the reasons for employing it, and the selection of the matched-guise speaker).
In the current study, "standard" British and American English accents refer to the accents generally associated with the national accent norms of the nations in which they originate. Both varieties are used as models in L2 English education around the world (Robinson, 2019). A typical Dutch English accent contains features that L1 speakers of Dutch, and others familiar with Dutch and Dutch English, will recognize as such. For example, because Dutch lacks the dental fricatives [ð] as in this, mother, breathe and [θ] as in think, Martha, breath, these sounds are often mispronounced by Dutch speakers of English as the stops [d] and [t], respectively. Dutch also lacks voiced fricatives and plosives in the coda, so the voiced obstruents of English are generally pronounced as voiceless in Dutch speakers' English (e.g. live, badge, bad, bag are pronounced with [f, tʃ, t, k]) (Gussenhoven & Broeders, 1997). As no standard has been defined for Dutch English, this accent is referred to as 'typical' in the present study (cf. Nejjari et al., 2019).
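The two substitution patterns described above can be summarized as a small rule sketch. It is an illustration under simplifying assumptions, not a model used in the study: words are given as hypothetical lists of IPA segments, the coda is approximated by the word-final segment only, final devoicing is applied to the original segment before th-stopping (so that breathe keeps the [d] described above), and the devoicing table extends the four cited examples to other voiced obstruents, in line with the general statement about English voiced obstruents.

```python
# Hypothetical rule tables based on the description above (not the study's materials).
TH_STOPPING = {"ð": "d", "θ": "t"}  # dental fricatives -> stops

# Voiced obstruent -> voiceless counterpart (assumed extension beyond the cited examples).
FINAL_DEVOICING = {
    "b": "p", "d": "t", "g": "k", "ɡ": "k",
    "v": "f", "z": "s", "ʒ": "ʃ", "dʒ": "tʃ",
}


def typical_dutch_english(segments: list[str]) -> list[str]:
    """Apply word-final devoicing, then th-stopping, to a list of IPA segments."""
    out = list(segments)  # do not modify the caller's list
    # Simplification: only the word-final segment is treated as a coda.
    if out and out[-1] in FINAL_DEVOICING:
        out[-1] = FINAL_DEVOICING[out[-1]]
    # Th-stopping applies in every position.
    return [TH_STOPPING.get(seg, seg) for seg in out]
```

For example, the segments of bag, ["b", "æ", "ɡ"], come out as ["b", "æ", "k"], and those of this, ["ð", "ɪ", "s"], as ["d", "ɪ", "s"]. A fuller treatment would need syllabification to locate word-internal codas.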
Each listener heard only one of the nine matched-guise speech samples, which prevented listeners from deducing that a matched-guise technique was used. We also included speech samples from six male control speakers as stimuli: two L1 speakers of BrE, two L1 speakers of AmE, and two DE speakers. All but one had been assessed on the representativeness of their accents in the same accentedness experiment as the matched-guise speaker (see Nejjari et al., 2019). The speaker who was not assessed was one of the DE control speakers; experienced linguists and Dutch language specialists generally regarded him as a representative speaker of DE. One further male speaker produced a speech sample that was presented to all listeners at the beginning of the experiment (the filler speech sample) to familiarize them with the task. The filler speaker had also been selected in the prior study as an L1 speaker of standard British English (Nejjari et al., 2019).

Instrumentation and participants
To ensure that the nine (plus one filler) matched-guise samples could be evaluated in each context for each accent, to avoid repeating the content of each context, and to limit any order effect, 18 listener groups were created, aiming to collect approximately 30 listeners per group (Table 1).
Listeners were highly educated (highest degree attained: 12.3% A-level; 51.2% bachelor; 27.6% master; 2.3% PhD; 6.3% other) native speakers of Dutch (mean age 39; 60% female, 40% male) with no background in linguistics. They were selected because they represent the part of Dutch society most likely to use English in educational and professional settings (Bouma, 2016; Lizzini et al., 2017). To determine whether self-reported English fluency affected speech understandability, listeners were asked to rate their English language skills (listening, reading, speaking, writing) on a 5-point Likert scale (1: very low; 2: low; 3: average; 4: high; 5: like a native speaker). Ninety-two percent indicated that their skills in all four areas were average or higher.
Participants were asked to indicate what they believed the speakers' country of origin was. Over 92% correctly indicated that the DE matched-guise speech sample was by a speaker from the Netherlands. For the BrE speech sample, 88% correctly indicated that it was produced by a speaker from Great Britain, and over 81% correctly indicated that the AmE matched-guise speech sample was by a speaker from the United States. Responses of listeners who were not able to correctly identify the speaker's country of origin were excluded, as these could reflect interfering associations with other speaker groups, and thus with accents other than those under study.
To answer the first two RQs, only the responses of listeners who had correctly identified the speaker's country of origin were included in the analyses (N = 392). To answer RQ3, however, correlations were calculated using all responses (N = 545), so as to provide a better assessment of the relationship between the speech understandability dimensions and English language skills, based on the wider range of English skills in the full sample.

Stimuli
Three texts, (1) an introduction to a marketing lecture, (2) an art gallery audio tour segment and (3) a job pitch for a retail management position, were used as the basis for the speech samples. One filler text on a general topic was used to start each questionnaire and allow listeners to get accustomed to the questions (links and texts: see appendix).
All texts except the filler reflect one of three contexts (the independent variable context) in which English is an important lingua franca: higher education, tourism, and international business (Gerritsen et al., 2016). Every listener evaluated all three contexts in the same order. To ensure realistic topics and content, the lecture, audio tour, and filler texts were taken from an IELTS Academic English listening test, and the job pitch text from a human resources webpage. The matched-guise speaker produced the three accents in all three contexts, resulting in nine speech samples. The six control speakers produced English in their L1 accents (DE, BrE, AmE) in the three contexts, resulting in 18 speech samples, and the additional speaker produced one filler speech sample on a general topic in BrE.

Speech understandability
Following Kachru and Smith (2008) and Nejjari et al. (2012), three questions measured speech understandability in terms of listeners' ability to (1) literally recognize words (intelligibility), (2) understand the meaning of the words within the context (comprehensibility), and (3) understand the intention of the speaker or the purpose of the message (interpretability). To measure intelligibility, listeners were presented with a speech sample consisting of the first 11-12 words of each of the four stimuli and asked to write down literally what was said. We counted the number of correctly transcribed words; the intraclass correlation coefficient (ICC) between the two raters was extremely high (.98). To measure comprehensibility, listeners indicated whether a statement about the topic of each speech sample was correct. Interpretability was measured by having listeners indicate whether a statement about the speaker's intention in each speech sample was correct.
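The word-count measure of intelligibility can be sketched as follows. The study does not specify its exact matching rules, so the ones here are assumptions: transcriptions are lower-cased, punctuation is ignored, word order is not checked, and each reference word is credited at most as often as it occurs in the reference. The function names and the example sentence are hypothetical (the sentence is not one of the stimuli), and the inter-rater ICC is not reproduced.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; punctuation is ignored (assumed scoring rule)."""
    return re.findall(r"[a-z']+", text.lower())


def correct_word_count(reference: str, transcription: str) -> int:
    """Count reference words that also appear in the listener's transcription.

    Counter & Counter is a multiset intersection, so a word is credited at most
    as often as it occurs in both texts; word order is not checked.
    """
    overlap = Counter(tokenize(reference)) & Counter(tokenize(transcription))
    return sum(overlap.values())


def proportion_correct(reference: str, transcription: str) -> float:
    """Correct words divided by the number of words in the reference."""
    return correct_word_count(reference, transcription) / len(tokenize(reference))
```

With the invented reference "the quick brown fox jumps over the lazy dog" and the transcription "The quick brown box jumps over a lazy dog.", correct_word_count returns 7 of 9 words: "fox" is missed, and "the" is credited only once because the listener wrote it only once.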

Speaker evaluations
To assess speaker evaluations, listeners indicated on 5-point Likert scales (1 = strongly disagree; 3 = neither agree nor disagree; 5 = strongly agree) to what extent they believed the speaker possessed 11 personality traits representing the three dimensions of speaker evaluation this experiment focuses on, namely status (competent, educated, having authority, intelligent, and cultured), affect (considerate, pleasant, and friendly), and dynamism (energetic, enthusiastic, and confident). The traits associated with status and affect are based on Nejjari et al. (2012). Status represents the degree to which a speaker is viewed as intelligent and well-educated, and affect represents the degree to which a speaker is perceived as likeable. Dynamism measured the self-presentation of the speaker and was based on Grondelaers et al. (2015). To confirm the dimensionality of the speaker evaluation items, a principal component analysis was applied with an eigenvalue > 1 criterion for factor extraction and varimax rotation. The personality items resolved into three factors, status, affect, and dynamism, as can be seen in Table 2.
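The eigenvalue > 1 (Kaiser) extraction criterion mentioned above can be illustrated with a small, dependency-free sketch. It computes the eigenvalues of an item correlation matrix with the classical cyclic Jacobi rotation method and counts those above 1. This covers only the extraction step: the varimax rotation is deliberately left out, the function names are hypothetical, and the toy correlation matrices below are invented, not the study's data.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=50):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations (descending)."""
    a = [list(map(float, row)) for row in matrix]  # work on a copy
    n = len(a)
    for _ in range(max_sweeps):
        # Sum of squared off-diagonal entries; (near) zero means a is diagonal.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                # a <- a P (update columns p and q)
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                # a <- P^T a (update rows p and q)
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def kaiser_retained(corr):
    """Number of principal components with eigenvalue > 1 (Kaiser criterion)."""
    return sum(1 for value in jacobi_eigenvalues(corr) if value > 1.0)
```

For an invented four-item matrix with two uncorrelated pairs of items correlating at .6, [[1, .6, 0, 0], [.6, 1, 0, 0], [0, 0, 1, .6], [0, 0, .6, 1]], the eigenvalues are 1.6, 1.6, 0.4 and 0.4, so kaiser_retained keeps two components. In practice a numerical library would be used; the pure-Python version only makes the retention rule explicit.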

Procedures
Listeners were recruited for an online survey via a student Facebook page, and the survey was administered with the online data collection service Qualtrics. This approach enabled the participation of many listeners, but had the disadvantage that many listeners did not complete the questionnaire or produced repetitive or nonsense response patterns. Approximately 41% of the original data was discarded, resulting in data from 545 listeners. Reasons for excluding data included incomplete questionnaires (approximately 25% of the original data) and nonsense answers, symbols, and/or only neutral answers (approximately 15%). Listeners whose L1 was not Dutch were also excluded (1%). The median questionnaire duration was a little over 16 minutes for all listeners, including the excluded responses.
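The exclusion figures can be cross-checked with simple arithmetic. The three exclusion reasons add up to the 41% total, and the 545 retained listeners imply roughly how many responses were originally collected; this is an estimate derived from the rounded percentages, not a number reported in the study.

```latex
\begin{aligned}
25\% + 15\% + 1\% &= 41\% \quad \text{(share of the original data discarded)}\\
N_{\text{original}} &\approx \frac{545}{1 - 0.41} = \frac{545}{0.59} \approx 924
\end{aligned}
```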

Statistics
Descriptive statistics and frequencies were calculated to establish means and percentages of listener characteristics and responses. We used ANOVAs when the dependent variable was continuous and logistic regression when it was binary (two values only). The factor analyses we applied to trace underlying dimensions in speaker evaluations were principal component analyses with varimax rotation.

Results
The findings are reported for the three RQs in turn (data files are available upon request). All means and frequencies for speech understandability and speaker evaluations across the three accents and three contexts can be found in Table 3 below. Only the matched-guise speaker results are reported here, not those of the control speakers, since individual differences in voice quality between speakers can obscure the impact of accent and context, which is precisely why the matched-guise technique was used.

Speech understandability, accent, context (RQ1)
In general, speech understandability was high: comprehensibility was 89.2% and interpretability 82.2%. The mean number of correctly transcribed words (intelligibility) was 9.23 (81.0%), which is fairly high given the maximum score of 11 (Lecture, Job Pitch) or 12 (Audio Tour) words. Results for intelligibility (proportion of correct words) in relation to context and accent are shown in Figure 1, with 95% confidence intervals. The confidence intervals overlap substantially, indicating that there are no strong differences between the accents and the contexts. An analysis of variance was applied to investigate the effects of context and accent and their interaction. There was an interaction effect (F(4,383) = 2.84, p = .02, ηp² = .03). There was a main effect for accent (F(2,383) = 4.30, p = .01, ηp² = .02), but not for context (F(2,383) = 1.27, p = .28, ηp² = .01). However, post-hoc comparisons (Tukey HSD) for accent did not show significant differences.
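Partial eta squared can be recovered from a reported F value and its degrees of freedom, which offers a quick check on the effect sizes reported in this section. A minimal sketch, assuming the standard definition ηp² = SS_effect / (SS_effect + SS_error); substituting F = (SS_effect / df_effect) / (SS_error / df_error) gives the formula in the code. The function name is illustrative.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom.

    eta_p^2 = SS_effect / (SS_effect + SS_error)
            = (F * df_effect) / (F * df_effect + df_error)
    """
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)
```

For the intelligibility interaction, partial_eta_squared(2.84, 4, 383) is about .029, consistent with the reported ηp² = .03; the accent main effect, partial_eta_squared(4.30, 2, 383), gives about .022.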
Results for comprehensibility in relation to context and accent are shown in Figure 2, with 95% confidence intervals. There are no error bars for the Lecture in British and American English because the comprehension scores were 100%. The remaining error bars overlap substantially, indicating that there are no strong differences between the three accents and the three contexts. A logistic regression (comprehensibility being a binary yes/no variable) was applied to test the effects of context and accent as well as their interaction. The interaction could be removed (deviance = 5.58, df = 4, p = .23). The same applied to accent in the next step (deviance = 1.80, df = 2, p = .41), but not to the remaining factor, context (Wald = 17.77, df = 2, p < .01). Post-hoc analysis (Bonferroni) revealed that the Lecture was comprehended better than the Audio Tour and the Job Pitch.
Figure 1 Mean proportions of correct intelligibility for accent (BrE, AmE, DE) and context (Audio Tour (12 words), Lecture (11 words), Job Pitch (11 words)). Error bars: 95% confidence intervals.
Results for interpretability in relation to context and accent are shown in Figure 3, with 95% confidence intervals. The confidence intervals overlap substantially, indicating that there are no strong differences between the three accents and the three contexts. A logistic regression (interpretability being a binary yes/no variable) was applied to test the effects of context and accent as well as their interaction. The interaction could be removed (the deviance difference between the models with and without the interaction was 3.00, df = 4, p = .56). The same applied to accent in the next step (deviance = .42, df = 2, p > .05), but not to the remaining factor, context (Wald = 33.85, df = 2, p < .01). Post-hoc analysis (Bonferroni) revealed that interpretability was higher for the Lecture than for the Audio Tour and the Job Pitch.
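The model comparisons in the logistic regressions above are likelihood-ratio (deviance) tests: the increase in deviance when a term is dropped is referred to a chi-square distribution whose degrees of freedom equal the number of removed parameters. Because every comparison reported here has df = 2 or df = 4, the p-values can be checked with the exact closed form for even degrees of freedom, sketched below with the standard library only. The function name is hypothetical; for odd df a library routine such as SciPy's scipy.stats.chi2.sf would be needed, and the Wald tests for context are not covered.

```python
import math


def chi2_sf_even_df(statistic: float, df: int) -> float:
    """Upper-tail probability P(X >= statistic) for a chi-square variable.

    Exact closed form, valid only for positive even degrees of freedom:
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = statistic / 2.0
    term, total = 1.0, 1.0  # k = 0 term of the sum
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total
```

chi2_sf_even_df(3.00, 4) gives about .558, in line with the reported p = .56 for dropping the interaction from the interpretability model, and chi2_sf_even_df(5.58, 4) gives about .233 for the comprehensibility interaction.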

Speaker evaluations, accent, context (RQ2)
Results for affect in relation to context and accent are shown in Figure 5, with 95% confidence intervals. An analysis of variance was applied to investigate the effects of context and accent and their interaction. There was no interaction effect (F(4,381) = 1.90, p = .11, ηp² = .02). There was no main effect for accent (F(2,381) = 1.07, p = .31, ηp² = .01), but there was one for context (F(2,381) = 20.29, p < .01, ηp² = .10). Post-hoc comparisons showed that the Job Pitch evoked lower affect than the Lecture and the Audio Tour.
The results for dynamism are shown in Figure 6, with 95% confidence intervals. An analysis of variance was applied to investigate the effects of context and accent and their interaction. There was an interaction effect (F(4,384) = 75.59, p < .01, ηp² = .06). There was no main effect for accent (F(2,384) = 1.42, p = .24, ηp² = .01), but there was one for context (F(2,384) = 8.19, p < .01, ηp² = .04). Post-hoc comparisons (HSD) showed that the Job Pitch (M = 2.97, SD = .78) evoked lower dynamism than the Audio Tour (M = 3.42, SD = .72) and the Lecture (M = 3.26, SD = .71, p < .01), but the interaction showed that this effect was driven by the negative evaluation of the Dutch English Job Pitch.

Correlations between speech understandability and speaker evaluations (RQ3)
Table 4 gives the correlations between speech understandability and speaker evaluations. The only significant correlation between the two was a weak negative correlation between comprehensibility and affect (r = -.11, p = .03).

Conclusion and discussion
The aim of our experiment was to study whether accent (standard BrE, standard AmE, Dutch-accented English) and context (Lecture, Audio Tour, Job Pitch) affect Dutch listeners' speech understandability and speaker evaluations, and whether speech understandability and speaker evaluations are correlated.

Speech understandability, accent, context (RQ1)
Of accent and context, only context significantly affected speech understandability (RQ1); accent did not. This suggests that, for Dutch listeners, Dutch-accented English is as understandable as standard British and American English.
Context had an effect on interpretability and comprehensibility: the Lecture was more interpretable and comprehensible than the Audio Tour and the Job Pitch, regardless of accent. Because most listeners were highly educated, they may have had more frequent experience with academic settings. This concurs with Cargile (1997), who found that when listeners were confronted with an accent in an unexpected context, their responses to the speaker were negatively affected.
Speech understandability was high for all accents and contexts. The experiment was conducted with highly educated Dutch listeners, who are generally regarded as having good English language skills (EF, 2018). This may mean that our listeners' English was proficient enough to deal easily with the content of our stimuli.
The high speech understandability caused a ceiling effect, which may have obscured effects on interpretability and also contributed to the weak, positive correlations between the three dimensions. These correlations suggest that when listeners could decipher words and phrases (intelligibility), this positively influenced their ability to understand them (comprehensibility) and to interpret the speaker's intention (interpretability). Comprehensibility and interpretability were measured by asking listeners to confirm or deny the validity of a statement on the content (comprehensibility) and communicative purpose (interpretability) of each sample. This method might have positively biased the results; however, the positive correlations between the three components suggest that higher intelligibility supports comprehensibility and interpretability.

Speaker evaluations, accent, context (RQ2)
With regard to the effects of accent and context on speaker evaluations (RQ2), it can be concluded that a Dutch English accent has a negative effect on a speaker's perceived status compared with standard British and American English. Context had no effect on the perceived status of a speaker. This indicates that accent matters more than context in perceptions of speaker status.
In terms of perceptions of affect and dynamism, context appears to be more important than accent: context affected both, with the Job Pitch evoking lower affect and dynamism than the Lecture and the Audio Tour. Although the means for all three contexts were mostly neutral, and although the significant context effect for dynamism was driven by the Dutch English Job Pitch, it is striking that the Job Pitch context evoked such a negative response.

Correlations between speech understandability and speaker evaluations (RQ3)
The last research question focused on correlations between speech understandability and speaker evaluations (RQ3). Status and dynamism were not correlated with speech understandability. This means that a listener's evaluation of the status and dynamism of a speaker is not related to their ability to understand that speaker. For affect, there was a weak, negative correlation with comprehensibility. These results contradict Nejjari et al. (2012), who found positive correlations between status and both intelligibility and comprehensibility, as well as between affect and comprehensibility. The explanation may be that their 2012 study involved British listeners, half of whom were not familiar with Dutch-accented English, who reacted to a standard British English accent and two degrees of Dutch-accented English. These listeners probably needed to put more effort into comprehending accents and content than the L1 speakers of Dutch in the current study, who were all familiar with all three accents. Our listeners achieved high levels of speech understandability, resulting in the ceiling effect discussed above.

Limitations and future research
This study has a number of limitations. First, we had only one matched-guise speaker, who was male, because it was extremely challenging to find a speaker who could produce representative native as well as non-native English accents. Future matched-guise studies could benefit from more diverse matched-guise speakers. Second, the contexts were presented in one order in all questionnaire versions (filler, Lecture, Audio Tour, Job Pitch) without counterbalancing. We did not apply counterbalancing because this would have resulted in more questionnaire versions, which would have required more respondents. We recommend counterbalancing for future studies. Third, the listeners were assumed to represent people who would likely be familiar with the three selected contexts. In hindsight, selecting professionals who regularly interact in these contexts, such as HR managers in the job interview context, would have been better.
The results for context add nuance to earlier L1 and L2 accentedness studies. Future research should investigate other contexts and L2 and L1 accents to help language learners understand the effects of their accents and to provide insights into which accents are deemed desirable in which contexts. The assessment of speech understandability in terms of Kachru and Smith's (2008) three components yielded useful insights into the levels at which listeners are able to understand speech. However, the specific method employed to measure comprehensibility and interpretability resulted in high scores, ceiling effects, and weak correlations between the speech understandability dimensions. This suggests that the questions might have positively affected the results. Therefore, this method should be further validated with listener groups with various English language levels and language backgrounds. While the use of the matched-guise technique contributed to the validity of our results, since they cannot be attributed to the voice characteristics of individual speakers (as might have been the case with verbal guises), future studies should aim to use more than one matched-guise speaker.
Although our methodology yielded insightful results in terms of the understanding and evaluations of speakers, future research should assess how L2 and L1 English accents affect L1 and L2 English speakers' behavior in specific contexts (see e.g. Purnell et al., 1999). With the increased globalization of academia and the job market, it is relevant to investigate how people view one another and behave towards each other on the basis of accents in these particular communication contexts.

Implications
For advanced Dutch learners of English, accent training aimed at sounding as native-like as possible need not be emphasized in language teaching when the aim is to be understandable and to evoke higher affect and dynamism. However, accent training to sound like an L1 speaker can be beneficial if the aim of language teaching is to evoke perceptions of high status. As context was shown to affect attitudes towards speakers, creating awareness of such potential effects can help learners understand the impression they make in English in different contexts.