Predictive contexts can facilitate word processing, in the sense of increasing reading speed (e.g., Rayner et al., 2011) and decreasing neuronal activation elicited during recognition of expected words (as reflected, for example, in the reduction of the N400 component of the event-related brain potential; e.g., Kutas & Federmeier, 2011; Kutas & Hillyard, 1984). These often-replicated findings have been linked to domain-general theories that postulate active, top-down (i.e., from hierarchically higher to lower processing levels) prediction of expected stimulus characteristics before actually perceiving the stimulus (predictive coding; cf. Friston, 2005; Rao & Ballard, 1999). In line with this, several neuro-cognitive models of visual word recognition (e.g., Carreiras et al., 2014; Seidenberg & McClelland, 1989) assume context-based prediction across multiple levels of linguistic processing, and it has been hypothesized that hierarchical predictions during reading involve the pre-activation of visual, pre-lexical (i.e., orthographic or phonological), and lexical-semantic representations of predicted words (Federmeier, 2007; Kuperberg & Jaeger, 2016). However, this proposal has not been systematically tested because the currently available evidence does not unambiguously differentiate between predictive pre-activation of representations at these different linguistic processing stages.
First studies showed that language-related brain regions can be activated before a highly expected stimulus appears (Bonhage et al., 2015; Dikker & Pylkkanen, 2011; Wang et al., 2017), but did not further assess the nature of pre-activated representations. More recent work showed pre-stimulus effects of semantic category (Heikel et al., 2018; Wang et al., 2020) using multivariate pattern analysis techniques (King et al., 2018; Kragel et al., 2018), as well as pre-stimulus effects of word frequency (Fruchter et al., 2015) using linear mixed models (LMMs; Baayen et al., 2008). These studies provide initial evidence that lexical-semantic representations are pre-activated before a predictable word appears. Also, several studies manipulating lexical-semantic context (e.g., the sentence context preceding the target word) found context-based modulations of brain activation in time windows associated with visual or pre-lexical processing (e.g., Brothers et al., 2015; Lee et al., 2012), which however provides only indirect evidence for predictive processing at other levels of linguistic representation than the lexical-semantic level. Studies that investigated pre-lexical context effects more directly found no (Eisenhauer et al., 2019) or only minimal effects (Nicenboim et al., 2020; Nieuwland et al., 2018), indicating that more statistically robust approaches may be needed for identifying the processes involved in pre-lexical pre-activation (see also Nieuwland, 2019, for a review).
Here, we combine behavioral and magnetoencephalography (MEG) data elicited during processing of words and (orthographically legal and pronounceable) pseudowords, to investigate the mechanisms of predictive pre-activation at multiple levels of linguistic representation, that is, visual, pre-lexical (orthographic and phonological), and lexical-semantic. Contextual predictability was explicitly controlled by using a repetition priming paradigm, which is a common approach for investigating predictive processing (e.g., Auksztulewicz & Friston, 2016; Grotheer & Kovács, 2016). In a first step, we conducted a behavioral experiment to determine which representational levels of visual word recognition are influenced by predictive processing. In detail, we determined whether context-dependent facilitation (priming) interacts with quantitative metrics from psycholinguistics representing different stages of word processing, that is, visual stimulus complexity (early visual processing; e.g., Pelli et al., 2006), orthographic word similarity (pre-lexical orthographic processing; e.g., Yarkoni, Balota, et al., 2008), the number of syllables (pre-lexical phonological processing; e.g., Álvarez et al., 2004), and word frequency and stimulus lexicality (lexical-semantic processing; e.g., Fiebach et al., 2002; Forster & Chambers, 1973). Subsequently, MEG activity measured in an independent experiment was explored strictly for those metrics that interacted in behavior with contextual predictability (i.e., whose effects differed between primed vs. unprimed words). This procedure prevented the investigation of neurophysiological effects without a behavioral counterpart, which would be difficult to interpret (Krakauer et al., 2017).
We investigated MEG data given its excellent temporal resolution (Gross, 2019) to separate effects of predictive pre-activation and stimulus processing. We first investigated the neurophysiological correlates of predictability across representational levels during stimulus processing by assessing the interaction of context effects with psycholinguistic metrics. Based on the observed pattern of context effects, we were able to differentiate whether predictive processing in visual word recognition is based on predictive coding (according to which predictable stimulus features are “explained away”) or on a “sharpening” mechanism (which postulates the suppression of noise for predictable stimuli; cf. Blank & Davis, 2016; Kok et al., 2012). Crucially, we also assessed effects of psycholinguistic metrics in the delay period prior to predictable target letter strings. Detecting these effects would provide direct evidence for predictive pre-activation at the associated representational level. As a critical test case for a mechanistic contribution of the respective linguistic processing stage to context-dependent predictability, we hypothesized that the strength of pre-activation effects in the delay should be inversely related to the strength of neural effects measured during processing of the target. Finally, we localized the brain regions underlying the observed effects to assess whether the brain regions implicated in letter string processing are also implicated in predictive pre-activation.
2 METHOD
2.1 Behavioral experiment
2.1.1 Participants
Forty-nine healthy, right-handed, native speakers of German recruited from university campuses (33 females, mean age 24.7 ± 4.9 years, range: 18–39 years) were included in the final data analyses. All participants had normal or corrected-to-normal vision, and normal reading abilities as assessed with the adult version of the Salzburg Reading Screening (the unpublished adult version of Mayringer & Wimmer, 2003). Further participants were excluded before the experiment due to low reading skills (i.e., reading test score below 16th percentile; N = 3) or participation in a similar previous experiment (N = 1), and during the course of the experiment due to failure to complete the experimental protocol (N = 8) and because of an experimenter error (mix-up of the pseudoword lists during the pre-experiment familiarization procedure; N = 1). All participants gave written informed consent according to procedures approved by the local ethics committee (Department of Psychology, Goethe University Frankfurt, application N° 2015-229) and received 10 € per hour or course credit as compensation.
2.1.2 Stimuli and presentation procedure
Sixty words and 180 pseudowords (five letters each) were presented (black on white background, 14 pt., 51 cm viewing distance) in a repetition priming experiment consisting of two priming blocks and two non-priming blocks (120 trials per block; Figure 1a) with a total duration of ~20 min. This paradigm allows for strong predictions while maintaining the aspect of natural reading that letter strings are processed sequentially. For the present study, we focused on a subset of 60 pseudowords that were unfamiliar to the participants (“novel pseudoword” condition). However, note that around 90 min prior to the priming experiment described here, the pseudowords had been presented to the participants in another experiment not of interest for the present study, without any learning instruction. In addition, the experiment contained two further sets of 60 pseudowords each, which were familiarized prior to the experiment as described in Eisenhauer et al. (2019; behavioral experiment). A description of the learning procedure and the results can be found in Supporting Results 2. Participants learned meanings for one of these pseudoword sets (i.e., “semantic pseudowords”); however, this set was not part of the present analysis as no comparable condition was included in the MEG dataset. The pseudowords from the final set were familiarized via repeated presentation and reading aloud without learning a meaning for these pseudowords. Thus, these “familiar pseudowords” were not associated with lexical-semantic representations, but were nevertheless familiarized at a pre-lexical level as participants gained familiarity with the orthographic and phonological structure of these pseudowords. These pseudowords were included in a control analysis (see below). The assignment of pseudoword sets to the three conditions (i.e., novel pseudowords, familiar pseudowords and semantic pseudowords) was varied across participants.
FIGURE 1. Experimental procedures. (a) Behavioral repetition priming paradigm. Two priming blocks (left) and two non-priming blocks (right) were presented in alternating order. In priming blocks, each trial consisted of a prime and a target stimulus presented for 800 ms each, separated by an interval of 800 ms during which a string of five hash marks was presented. Stimuli could be words or pseudowords (PW). 75% of trials were repetition trials with identical prime and target, while in the remaining 25% two different letter strings were presented (non-repetition trials; not analyzed). In this case, prime and target could be from the same or from two different conditions, with all combinations of conditions appearing equally often. In non-priming blocks, only one word or pseudoword was presented in each trial. Participants were instructed to respond on each trial whether or not they had a semantic association with the target in priming blocks or with the isolated item in non-priming blocks. Before onset of the prime or the isolated item, two black vertical bars presented for 800–1,200 ms indicated the center of the screen where participants were asked to fixate. Context effects were investigated by comparing isolated items from the non-priming blocks with the targets from the repetition trials. (b) Repetition priming paradigm during magnetoencephalography recording. The presentation procedure was identical to the priming blocks in (a). There were no non-priming blocks. Additionally, after target offset two grey vertical bars were presented for 1,000 ms indicating a blinking period. Before the onset of the next trial, a blank screen was presented for 500 ms. Participants were instructed to silently read presented letter strings and to respond only to rare catch trials (i.e., presentation of the word Taste, Engl. button). Context effects were investigated by comparing primes to repeated targets.
In priming blocks, a prime and a target stimulus were presented for 800 ms each, separated by a delay period of 800 ms during which a string of five hash marks was presented. Prime and target were identical in 75% of trials. In non-priming blocks, a single letter string was presented for 800 ms in each trial. We chose extended presentation durations for the MEG study to separate effects of stimulus processing from effects of pre-activation, which we expected during the delay period. For comparability, we adopted the same timing for the behavioral study. The inter-trial interval was jittered between 800 and 1,200 ms (mean: 1,000 ms). Participants were asked to fixate the space between two vertical black bars at the center of the screen. Upon presentation of a letter string between the two bars, they had to indicate as quickly and accurately as possible whether it had a semantic association or not (which was the case for 50% of items, i.e., for words and for one list of pseudowords that had been semantically familiarized prior to the experiment). For simplicity, these judgments will be called lexical decisions in the following. The novel and familiar pseudowords used for the present analyses had no semantic associations. In priming blocks, participants responded only to the second (i.e., the target) letter string. Response hands and the order of blocks were counterbalanced across participants. Each letter string was presented in one priming trial and one non-priming trial.
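As a rough plausibility check, the stated timings can be combined into an estimate of the session length. The sketch below assumes that the jittered interval (mean 1,000 ms) is the only gap between trials and that there are no pauses between blocks; neither assumption is stated in the text, and all names in the code are illustrative.

```python
# Rough session-length estimate from the timings given in the text.
# Assumptions (not stated in the source): the jittered interval is the only
# gap between trials, and there are no breaks between blocks.

STIMULUS_MS = 800        # duration of prime, target, or isolated item
DELAY_MS = 800           # hash-mark delay between prime and target
MEAN_ITI_MS = 1000       # mean of the 800-1,200 ms jitter
TRIALS_PER_BLOCK = 120
N_PRIMING_BLOCKS = 2
N_NON_PRIMING_BLOCKS = 2


def mean_trial_ms(priming: bool) -> int:
    """Mean duration of one trial, including the preceding interval."""
    if priming:
        return MEAN_ITI_MS + STIMULUS_MS + DELAY_MS + STIMULUS_MS
    return MEAN_ITI_MS + STIMULUS_MS


def session_minutes() -> float:
    """Estimated total duration of all four blocks in minutes."""
    priming_ms = N_PRIMING_BLOCKS * TRIALS_PER_BLOCK * mean_trial_ms(True)
    non_priming_ms = N_NON_PRIMING_BLOCKS * TRIALS_PER_BLOCK * mean_trial_ms(False)
    return (priming_ms + non_priming_ms) / 60_000


if __name__ == "__main__":
    print(f"Estimated session length: {session_minutes():.1f} min")
```

Under these assumptions the estimate is close to the reported total duration of ~20 min.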
Words and pseudowords were matched between lists (i) for orthographic similarity (word likeness) using the Orthographic Levenshtein Distance 20 (OLD20; Yarkoni, Balota, et al., 2008) based on the SUBTLEX-DE database (Brysbaert et al., 2011) and (ii) with respect to the number of syllables (computed via Balloon, cf. Reichel, 2012; see also Table 1). Other psycholinguistic metrics of interest for our analyses were logarithmic word frequency and trigram frequency (i.e., the mean frequency of each trigram per word; obtained from SUBTLEX-DE), as well as visual complexity measures (perimetric complexity and the number of simple features), which were obtained for each letter from the GraphCom database (Chang et al., 2018) and averaged across the five letters of each stimulus (Table 1). The other two visual complexity parameters from GraphCom, that is, the number of connected points and the number of disconnected components, were not chosen for our analyses, the former due to its high correlation with the number of simple features (>0.8 in our stimuli) and the latter due to its low variance across letters of the German language. For parameter correlations within words and pseudowords, see Figure S1 in Supporting Information.
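To illustrate the orthographic similarity metric, the following minimal sketch computes the mean Levenshtein distance from a letter string to its n closest entries in a lexicon (n = 20 for OLD20). It is not the implementation used in the study: the function names and the toy lexicon are hypothetical, and excluding the item itself from its own neighbors is an assumption of this sketch.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def old_n(item: str, lexicon: list[str], n: int = 20) -> float:
    """Mean Levenshtein distance from `item` to its n nearest lexicon entries.

    Assumption of this sketch: the item itself is not counted as a neighbor.
    """
    distances = sorted(levenshtein(item, word) for word in lexicon if word != item)
    if len(distances) < n:
        raise ValueError("lexicon has fewer than n candidate neighbors")
    return sum(distances[:n]) / n


if __name__ == "__main__":
    # Hypothetical toy lexicon; a real analysis would use a full word list.
    toy_lexicon = ["band", "land", "sand", "hund", "haus"]
    print(old_n("hand", toy_lexicon, n=5))
```

Lower values indicate that a letter string has many close orthographic neighbors, that is, that it is more word-like.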
TABLE 1. Stimulus parameters of words and pseudowords (PW) in the behavioral and magnetoencephalography (MEG) experiments
                 Minimum   1st Quartile   Median   3rd Quartile   Maximum   Mean    SE
Behavior: Logarithmic word frequency
  Words          0.000     1.518          1.971    2.189          3.301     1.933   0.095
Behavior: OLD20
  Words          1.000     1.288          1.650    1.750          1.950     1.538   0.038
  PW set 1       1.000     1.500          1.650    1.762          2.000     1.605   0.032
  PW set 2       1.000     1.288          1.650    1.850          2.100     1.542   0.045
  PW set 3       1.000     1.337          1.700    1.850          2.300     1.596   0.044
Behavior: Trigram frequency
  Words          1.806     2.556          2.793    3.162          3.773     2.837   0.056
  PW set 1       1.342     2.623          2.856    3.067          3.514     2.826   0.047
  PW set 2       1.176     2.567          2.833    3.136          3.136     2.792   0.063
  PW set 3       1.437     2.576          2.844    3.237          3.602     2.808   0.061
Behavior: Number of syllables
  Words          1.00      2.00           2.00     2.00           3.00      1.833   0.059
  PW set 1       1.00      2.00           2.00     2.00           2.00      1.95    0.028
  PW set 2       1.00      2.00           2.00     2.00           2.00      1.967   0.023
  PW set 3       1.00      2.00           2.00     2.00           2.00      1.9     0.039
Behavior: Perimetric complexity
  Words          4.400     5.800          6.400    6.600          8.200     6.354   0.093
  PW set 1       5.000     5.800          6.300    6.800          7.800     6.320   0.083
  PW set 2       5.200     6.200          6.600    7.000          7.800     6.553   0.083
  PW set 3       5.000     5.950          6.600    7.000          7.800     6.457   0.090
Behavior: Number of simple features
  Words          1.400     1.950          2.000    2.200          2.600     2.031   0.031
  PW set 1       1.200     1.800          2.000    2.200          2.600     1.950   0.038
  PW set 2       1.200     1.800          2.000    2.200          2.600     2.003   0.040
  PW set 3       1.400     1.800          2.000    2.200          2.400     1.973   0.040
MEG: Logarithmic word frequency
  Words          0.000     1.512          2.229    2.858          4.032     2.137   0.115
MEG: OLD20
  Words          1.600     1.750          1.850    1.900          2.050     1.825   0.013
  novel PW       1.250     1.637          1.750    1.863          2.300     1.743   0.027
  familiar PW    1.250     1.600          1.725    1.863          2.100     1.717   0.026
MEG: Trigram frequency
  Words          1.773     2.573          2.785    3.070          3.778     2.804   0.049
  novel PW       1.176     2.402          2.684    2.915          3.597     2.649   0.060
  familiar PW    1.342     2.585          2.684    2.919          3.425     2.670   0.407
MEG: Number of syllables
  Words          1.00      2.00           2.00     2.00           2.00      1.817   0.050
  novel PW       1.00      2.00           2.00     2.00           2.00      1.933   0.032
  familiar PW    1.00      2.00           2.00     2.00           2.00      1.95    0.028
MEG: Perimetric complexity
  Words          5.000     5.800          6.400    6.800          7.600     6.260   0.086
  novel PW       5.000     5.800          6.400    6.800          8.400     6.323   0.091
  familiar PW    5.200     5.800          6.200    6.800          7.800     6.327   0.627
MEG: Number of simple features
  Words          1.400     2.000          2.200    2.400          2.800     2.160   0.038
  novel PW       1.200     1.800          2.000    2.200          2.400     1.983   0.040
  familiar PW    1.200     1.800          2.000    2.200          2.400     1.913   0.298
Note: See Figure S1 for parameter correlations.
2.1.3 Analyses
Statistical modeling
Linear mixed models (LMMs) were used to investigate the three-way and subordinate two-way interactions of each of the four visual and pre-lexical word parameters (see below) with the factors context (i.e., the priming effect of primed vs. unprimed stimuli) and stimulus lexicality (words vs. novel pseudowords) in log-transformed response times, using the lmerTest package (Kuznetsova et al., 2017) of the statistical software package R, version 3.5.3, 2019-03-11 (R Development Core Team, 2008). The model structure is shown in the left panel of Figure 2. In the case of the word frequency parameter, only the interaction with context was included, as an interaction with lexicality is not possible (all pseudowords were assigned a word frequency of zero). Note that in the behavioral study, the factor context was operationalized as the contrast between repeated targets in priming blocks vs. single items in non-priming blocks. Therefore, “context” here represents the effect of the presence vs. absence of contextual information on processing of the target stimulus, whereby only valid contextual information was considered while trials in which the target was preceded by a non-identical prime were discarded (analogous to the MEG experiment; see below). As a consequence, the priming condition had fewer trials (0.75 × 60 = 45; minus errors) than the non-priming condition. However, LMMs with crossed random effects are optimal for the analysis of imbalanced data (Baayen et al., 2008).
The two-way interaction terms between word parameters (e.g., word frequency) and context were used to determine whether the effect of the respective word parameter was modulated by a predictive context. The three-way interaction with lexicality additionally revealed whether the “context by word parameter” modulations differed between words and pseudowords. This allowed us to assess whether context-based facilitation at the respective level relies on prior knowledge, which is available for words but not pseudowords. Note that we included these interaction terms, that is, with context and lexicality, for all word parameters within one single LMM. Trials in which errors occurred (9.8%) were excluded from analyses. Previous experience shows that trial order can have a strong effect on response times and neuronal activation. To explicitly account for this, trial order was included in the LMMs as a fixed effect. Participant and item were included as random effects on the intercept. For visualization of partial effects, that is, effects of a parameter of interest after partialling out all other effects in the LMM, we used the remef package (Hohenstein & Kliegl, 2015).
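The model structure described in this section can be written out schematically. The following is only a sketch under stated assumptions, not the exact specification that was fitted: the notation is introduced here for illustration ($x_{k,i}$ for the $k$-th visual or pre-lexical parameter of item $i$, $u_p$ and $w_i$ for the random intercepts of participant $p$ and item $i$), and each bracketed product with $\ast$ is shorthand for the interaction together with all of its subordinate terms, each with its own coefficient.

```latex
\begin{align*}
\log(\mathrm{RT}_{pi}) ={}& \beta_0 + \beta_1\,\mathrm{trial}_{pi}
  + \sum_{k} \bigl[\, x_{k,i} \ast \mathrm{context}_{pi} \ast \mathrm{lexicality}_{i} \,\bigr] \\
  &+ \bigl[\, \mathrm{frequency}_{i} \ast \mathrm{context}_{pi} \,\bigr]
  + u_{p} + w_{i} + \varepsilon_{pi}, \\
u_{p} \sim{}& \mathcal{N}(0, \sigma_u^2), \quad
w_{i} \sim \mathcal{N}(0, \sigma_w^2), \quad
\varepsilon_{pi} \sim \mathcal{N}(0, \sigma^2)
\end{align*}
```

Word frequency enters only through its interaction with context because all pseudowords were assigned a frequency of zero, so a frequency by lexicality interaction cannot be estimated.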
FIGURE 2. Analysis pipeline for behavioral (left) and magnetoencephalography (MEG) data (right). In the behavioral experiment, a linear mixed model assessed the interaction of context (isolated vs. repeated letter strings) with various psycholinguistic metrics associated with visual, orthographic, phonological or lexical-semantic representational levels. In the MEG experiment, linear mixed models were estimated for each time point and included only those psycholinguistic metrics that were found significant in behavior. For each significant effect in the MEG data, partial effects were estimated from the linear mixed models. Pairwise correlations between partial effects were computed across time points. Brain regions underlying the partial effects were identified via source localization. ERF, event-related field; n.s., not significant; RT, response time. * denotes interactions and (1| …) denotes random effects on the intercept.
Investigated word parameters
Different descriptors of words were chosen in order to isolate different “levels” of processing a word. Early visual processing of a written word depends on the complexity of its physical appearance, which we here characterize, following Chang et al. (2018), using perimetric complexity (describing the density of black pixels in relation to white background which has previously been associated with letter identification efficiency; see Pelli et al., 2006) and the number of simple features that make up a word (i.e., the number of strokes per letter). Pre-lexical processing of written words comprises phonological and orthographic processing (e.g., see Carreiras et al., 2014). Phonological processing can be captured by the number of syllables of a word (as syllables reflect sublexical units for sequential phonological processing; e.g., Álvarez et al., 2004; Chetail, 2014), and orthographic processing is captured by the Orthographic Levenshtein Distance 20 (OLD20, Yarkoni, Balota, et al., 2008) and by trigram frequency (e.g., Chen et al., 2015; Colegate & Eriksen, 1972). Lastly, lexical processes of word identification are often associated with word frequency (e.g., Fiebach et al., 2002; Forster & Chambers, 1973), so that logarithmic word frequency (Brysbaert et al., 2011) is included as a parameter representing lexical-semantic processing. Pseudowords were assigned a word frequency of zero as they did not appear in the SUBTLEX database. Finally, we included a binary lexicality contrast (words vs. pseudowords) comparing items with and without semantic associations investigating lexical-semantic processing.
Model comparisons
In cases with non-significant interaction effects at one or more processing levels, we repeated the analysis with a simpler model that included only the significant interactions and main effects. This sparser model was compared to the full model based on the difference in the Akaike information criterion (AIC), which allows comparing models of different complexity (i.e., with more or fewer parameters). A significantly lower AIC for the more complex model indicates that the additional parameters improve model fit. If the AIC of the more complex model is equal to or higher than that of the sparser model, the sparser model provides the better fit and adding the parameters is not advised. Our results and interpretations are based on the model with the set of parameters that leads to an optimal fit.
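The logic of this comparison can be sketched with the standard definition AIC = 2k − 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. The code below is a minimal illustration, not the procedure run in R: the function names and the numbers in the usage example are hypothetical, and the plain "lower or equal" rule omits any significance criterion on the AIC difference.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def prefer_sparser(full_loglik: float, full_k: int,
                   sparse_loglik: float, sparse_k: int) -> bool:
    """Return True if the sparser model's AIC is lower than or equal to
    the AIC of the full model, i.e., the extra parameters do not pay off."""
    return aic(sparse_loglik, sparse_k) <= aic(full_loglik, full_k)


if __name__ == "__main__":
    # Hypothetical values: dropping five parameters costs little likelihood.
    print(prefer_sparser(full_loglik=-100.0, full_k=10,
                         sparse_loglik=-102.0, sparse_k=5))
```

The penalty term 2k is what allows models with different numbers of parameters to be compared on a common scale.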
Control analyses
Besides our main analysis of interest described above, we performed two control analyses. First, we re-estimated the LMM with optimal fit while including the familiar as opposed to the novel pseudowords. This allowed us to assess whether the observed lexicality effects are driven by lexical-semantic information, which is available for words but for neither pseudoword group. If this is the case, lexicality effects should be observed both for words vs. novel pseudowords and for words vs. familiar pseudowords. In contrast, if lexicality effects are based on the difference in general familiarity between words and novel pseudowords, the lexicality effect should be diminished when contrasting words with familiar pseudowords.
The analyses so far focused on trials with repeated targets and isolated items. Trials with non-repeated targets, that is, in which prime and target were not identical, were rare (12.5% of trials). In a second cont