Walk-through for the analyses of a Generalizability Study
The purpose of this walk-through is to introduce and document the analyses in R.
For this walk-through the packages lme4
(Bates et al. 2015), tidyverse
(Wickham et al. 2019) and dplyr
(Wickham et al. 2021) are used.
library(dplyr)
library(tidyverse)
library(lme4)
The data for the analyses is available on the Open Science Framework (OSF) as a .csv
file. The file can be downloaded making use of the link below or can be loaded directly making use of the following code.
library(readr)
<- read_csv("https://osf.io/h2fcw/download") Data
The mixed effects model
In the manuscript we present several analyses that all have the same statistical mixed effects model at it’s core. The model can be written as
ywsp=β0∗Cons+β1∗Copytaskp+(ν0p+μ0ps+ψ0w+ϵ0psw)
with ywsp being the score of a pause time y for a certain word w, in a sentence s written by participant p. In the model we add the variable Copytaskp as a fixed effect to control for motor skills of participants.
We expect that pause times depend on the facets:
p (participant) reflected in the participant-specific deviation towards the intercept ν0p
s (sentence) reflected in the sentence-specific deviation towards the intercept μ0ps
w (word) reflected in the word-specific deviation towards the intercept ψ0w
All interactions between these facets are captured in the residual term ϵ0psw.
In the dataset the following variable names are used for these variables:
copy_task
participant_ID
sentence_ID
word_ID
In what follows we will present the code for the different analyses presented in the manuscript. The analyses for the first generalizability study concerning all words are described in more detail. For the other analyses similar steps are taken so we will simply present the code so all the analyses can be replicated.
All words
Between words
Estimate the variances
The variable between_words
contains the pause times between words. To estimate the model the function lmer()
from the package lme4
is used.
With print(VarCorr(Model_between_words), comp = "Variance", digits = 4)
the estimates of the variances based on the linear mixed effects model are extracted.
<-
Model_between_words lmer(between_words ~ 1 +
+
copy_task 1|participant_ID) +
(1|sentence_ID) +
(1|word_ID),
(data=Data,
REML = F)
print(VarCorr(Model_between_words),
comp = "Variance",
digits = 4)
Groups Name Variance
word_ID (Intercept) 4412600.4
participant_ID (Intercept) 110701.0
sentence_ID (Intercept) 936.3
Residual 1937654.3
Calculate the G coefficient
Now that we have estimated the variances we can use these variance estimates to calculate the Generalizability coefficient. For this calculation we make use of formula 2 from the manuscript. The calculation is as follows in R:
<- 110701.0 /
G_all_bw 110701.0 +
(936.3/27) +
(4412600.4/650) +
(1937654.3/(4.75*27))
(
)
G_all_bw
[1] 0.8346433
So the resulting generalizability score for the differences between the participants is 0.83 .
D study
The D study builds on the G study. Formula 2 from the manuscript is used. First we create an object called Nsentences
that contains integers from 1 to 40. In the calculations we replace the number 27 by this object to calculate 40 G coefficients dependent on the number of sentences going from 1 sentence to 40 sentences. The results are stored in the object `G_all_bw_sim`.
<- 1:40
Nsentences
<- 110701.0 /
G_all_bw_sim 110701.0 +
(936.3/Nsentences) +
(4412600.4/650) +
(1937654.3/(4.75*Nsentences))
( )
With this information we can create the plot that is used in the manuscript.
plot(x = Nsentences, y = G_all_bw_sim,
type = 'l',
xlab = "number of sentences",
ylab = "generalizability",
ylim = (0:1),
main = "Generalizability for pause times between all words
with an average of 4.75 words per sentence"
)
abline(h=0.7, lty=2)
Within words
Estimate the variances
<-
Model_within_words lmer(within_words ~ 1 +
+
copy_task 1|participant_ID) +
(1|sentence_ID) +
(1|word_ID),
(data = Data,
REML = F)
print(VarCorr(Model_within_words),
comp = "Variance",
digits = 4)
Groups Name Variance
word_ID (Intercept) 2504.24
participant_ID (Intercept) 4068.67
sentence_ID (Intercept) 36.48
Residual 26288.40
Calculate the G coefficient
<- 4068.67 /
G_all_within 4068.67 +
(36.48/27)+
(2504.24/697)+
(26288.40/(4.75*27))
(
)
G_all_within
[1] 0.9509367
D Study
<- 1:40
Nsentences
<- 4068.67 /
G_all_within_sim 4068.67 +
(36.48/Nsentences) +
(2504.24/697) +
(26288.40/(4.75*Nsentences))
( )
plot(Nsentences,
G_all_within_sim,type = 'l',
xlab = "number of sentences",
ylab = "generalizability",
ylim = (0:1),
main = "Generalizability for pause times within all words
with an average of 4.75 words per sentence")
abline(h=0.7,lty=2)
Nouns
An object is created with the data on Nouns only by the following code.
<- filter(Data, word_class == "NOUN") Nouns
Between words
Estimate the variances
<- lmer(
Model_between_words_nouns ~ 1 +
between_words +
copy_task 1|participant_ID) +
(1|sentence_ID) +
(1|target_noun),
(data = Nouns,
REML = F)
print(VarCorr(Model_between_words_nouns),
comp = "Variance",
digits = 7)
Groups Name Variance
participant_ID (Intercept) 217859.57
target_noun (Intercept) 63572.77
sentence_ID (Intercept) 0.00
Residual 2700891.49
Calculate the G coefficients
<- 217859.57 /
G_nouns_bw 217859.57 +
(0/27) +
(63572.77/70) +
(2700891.49/(1.75*27))
(
) G_nouns_bw
[1] 0.789548
D Study
<- 1:40
Nsentences
<-
G_nouns_bw_sim 217859.27 /
217859.27 +
(0/Nsentences) +
(63572.77/70) +
(2700891.49/(1.75*Nsentences))) (
plot(Nsentences,
G_nouns_bw_sim,type = 'l',
xlab = "number of sentences",
ylab = "generalizability",
ylim = (0:1),
main = "Generalizability for pause times between target nouns
with an average of 1.75 nouns per sentence")
abline(h=0.7,lty=2)
Within words
Estimate the variances
<-lmer(
Model_within_words_nouns ~ 1 +
within_words +
copy_task 1|participant_ID) +
(1|sentence_ID) +
(1|target_noun),
(data=Nouns,
REML = F)
print(VarCorr(Model_within_words_nouns),
comp = "Variance",
digits = 7)
Groups Name Variance
participant_ID (Intercept) 3403.78947
target_noun (Intercept) 1585.30854
sentence_ID (Intercept) 52.90428
Residual 16211.28364
Calculate the G coefficient
<- 3403.8/
G_nouns_wi 3403.8+
(52.9/27) +
(1585.3/70) +
(16211.3/(1.75*27))
(
)
G_nouns_wi
[1] 0.902505
D Study
<-1:40
Nsentences<- 3403.8 /
G_nouns_wi_sim 3403.8 +
(52.9/Nsentences) +
(1585.3/70) +
(16211.3/(1.75*Nsentences))
( )
plot(Nsentences,
G_nouns_wi_sim,type = 'l',
xlab = "number of sentences",
ylab = "generalizability",
ylim = (0:1),
main = "Generalizability for pause times within target nouns
with an average of 1.75 nouns per sentence")
abline(h=0.7,lty=2)
Verbs
<- filter(Data, word_class=="VERB") Verbs
Between words
Estimate the variances
<- lmer(
Model_between_words_verbs ~ 1 +
between_words +
copy_task 1|participant_ID) +
(1|sentence_ID) +
(1|target_verb),
(data=Verbs,
REML = F)
print(VarCorr(Model_between_words_verbs),
comp = "Variance",
digits = 6)
Groups Name Variance
participant_ID (Intercept) 72092.73
sentence_ID (Intercept) 2381.46
target_verb (Intercept) 21249.20
Residual 1638076.52
Calculate the G coefficient
<- 72093/
G_verbs_bw 72093 +
(2381/27) +
(21249/40) +
(1638077/(1*27))
(
) G_verbs_bw
[1] 0.5405005
D Study
<- 1:80
Nsentences
<- 72093/
G_verbs_bw_sim 72093 +
(2381/Nsentences) +
(21249/40) +
(1638077/(1*Nsentences))
( )
plot(Nsentences,
G_verbs_bw_sim,type = 'l',
xlab = "number of sentences",
ylab = "generalizability",
ylim = (0:1),
main = "Generalizability for pause times
between target verbs")
abline(h=0.7,lty=2)
Within words
Estimate the variances
<- lmer(
Model_within_words_verbs ~ 1 +
within_words +
copy_task 1|participant_ID) +
(1|sentence_ID) +
(1|target_verb),
(data=Verbs,
REML = F)
print(VarCorr(Model_within_words_verbs),
comp = "Variance",
digits = 6)
Groups Name Variance
participant_ID (Intercept) 3838.52
sentence_ID (Intercept) 0.00
target_verb (Intercept) 1472.12
Residual 14694.66
Calculate the G coefficient
<- 3838.52 /
G_verbs_wi 3838.52 +
(0/27) +
(1472.12/40) +
(14694.66/(1*27))
(
) G_verbs_wi
[1] 0.868528
D Study
<- 1:40
Nsentences
<- 3838.52 /
G_verbs_wi_sim 3838.52 +
(0/Nsentences) +
(1472.12/40) +
(14694.66/(1*Nsentences))
( )
plot(Nsentences,
G_verbs_wi_sim,type = 'l',
xlab = "number of sentences",
ylab = "generalizability",
ylim = (0:1),
main = "Generalizability for pause times
within target verbs")
abline(h=0.7,lty=2)