Walk-through for the analyses of a Generalizability Study
The purpose of this walk-through is to introduce and document the analyses in R.
For this walk-through the packages lme4 (Bates et al. 2015), tidyverse (Wickham et al. 2019) and dplyr (Wickham et al. 2021) are used.
library(dplyr)
library(tidyverse)
library(lme4)The data for the analyses is available on the Open Science Framework (OSF) as a .csv file. The file can be downloaded making use of the link below or can be loaded directly making use of the following code.
library(readr)
Data <- read_csv("https://osf.io/h2fcw/download")The mixed effects model
In the manuscript we present several analyses that all have the same statistical mixed effects model at it’s core. The model can be written as
ywsp=β0∗Cons+β1∗Copytaskp+(ν0p+μ0ps+ψ0w+ϵ0psw)
with ywsp being the score of a pause time y for a certain word w, in a sentence s written by participant p. In the model we add the variable Copytaskp as a fixed effect to control for motor skills of participants.
We expect that pause times depend on the facets:
p (participant) reflected in the participant-specific deviation towards the intercept ν0p
s (sentence) reflected in the sentence-specific deviation towards the intercept μ0ps
w (word) reflected in the word-specific deviation towards the intercept ψ0w
All interactions between these facets are captured in the residual term ϵ0psw.
In the dataset the following variable names are used for these variables:
copy_taskparticipant_IDsentence_IDword_ID
In what follows we will present the code for the different analyses presented in the manuscript. The analyses for the first generalizability study concerning all words are described in more detail. For the other analyses similar steps are taken so we will simply present the code so all the analyses can be replicated.
All words
Between words
Estimate the variances
The variable between_words contains the pause times between words. To estimate the model the function lmer() from the package lme4 is used.
With print(VarCorr(Model_between_words), comp = "Variance", digits = 4) the estimates of the variances based on the linear mixed effects model are extracted.
Model_between_words <-
lmer(between_words ~ 1 +
copy_task +
(1|participant_ID) +
(1|sentence_ID) +
(1|word_ID),
data=Data,
REML = F)
print(VarCorr(Model_between_words),
comp = "Variance",
digits = 4) Groups Name Variance
word_ID (Intercept) 4412600.4
participant_ID (Intercept) 110701.0
sentence_ID (Intercept) 936.3
Residual 1937654.3
Calculate the G coefficient
Now that we have estimated the variances we can use these variance estimates to calculate the Generalizability coefficient. For this calculation we make use of formula 2 from the manuscript. The calculation is as follows in R:
G_all_bw <- 110701.0 /
(110701.0 +
(936.3/27) +
(4412600.4/650) +
(1937654.3/(4.75*27))
)
G_all_bw[1] 0.8346433
So the resulting generalizability score for the differences between the participants is 0.83 .
D study
The D study builds on the G study. Formula 2 from the manuscript is used. First we create an object called Nsentences that contains integers from 1 to 40. In the calculations we replace the number 27 by this object to calculate 40 G coefficients dependent on the number of sentences going from 1 sentence to 40 sentences. The results are stored in the object `G_all_bw_sim`.
Nsentences <- 1:40
G_all_bw_sim <- 110701.0 /
(110701.0 +
(936.3/Nsentences) +
(4412600.4/650) +
(1937654.3/(4.75*Nsentences))
)With this information we can create the plot that is used in the manuscript.
plot(x = Nsentences, y = G_all_bw_sim,
type = 'l',
xlab = "number of sentences",
ylab = "generalizability",
ylim = (0:1),
main = "Generalizability for pause times between all words
with an average of 4.75 words per sentence"
)
abline(h=0.7, lty=2)Within words
Estimate the variances
Model_within_words <-
lmer(within_words ~ 1 +
copy_task +
(1|participant_ID) +
(1|sentence_ID) +
(1|word_ID),
data = Data,
REML = F)
print(VarCorr(Model_within_words),
comp = "Variance",
digits = 4) Groups Name Variance
word_ID (Intercept) 2504.24
participant_ID (Intercept) 4068.67
sentence_ID (Intercept) 36.48
Residual 26288.40
Calculate the G coefficient
G_all_within <- 4068.67 /
(4068.67 +
(36.48/27)+
(2504.24/697)+
(26288.40/(4.75*27))
)
G_all_within[1] 0.9509367
D Study
Nsentences <- 1:40
G_all_within_sim <- 4068.67 /
(4068.67 +
(36.48/Nsentences) +
(2504.24/697) +
(26288.40/(4.75*Nsentences))
)plot(Nsentences,
G_all_within_sim,
type = 'l',
xlab = "number of sentences",
ylab = "generalizability",
ylim = (0:1),
main = "Generalizability for pause times within all words
with an average of 4.75 words per sentence")
abline(h=0.7,lty=2)Nouns
An object is created with the data on Nouns only by the following code.
Nouns <- filter(Data, word_class == "NOUN")Between words
Estimate the variances
Model_between_words_nouns <- lmer(
between_words ~ 1 +
copy_task +
(1|participant_ID) +
(1|sentence_ID) +
(1|target_noun),
data = Nouns,
REML = F)
print(VarCorr(Model_between_words_nouns),
comp = "Variance",
digits = 7) Groups Name Variance
participant_ID (Intercept) 217859.57
target_noun (Intercept) 63572.77
sentence_ID (Intercept) 0.00
Residual 2700891.49
Calculate the G coefficients
G_nouns_bw <- 217859.57 /
(217859.57 +
(0/27) +
(63572.77/70) +
(2700891.49/(1.75*27))
)
G_nouns_bw[1] 0.789548
D Study
Nsentences <- 1:40
G_nouns_bw_sim <-
217859.27 /
(217859.27 +
(0/Nsentences) +
(63572.77/70) +
(2700891.49/(1.75*Nsentences)))plot(Nsentences,
G_nouns_bw_sim,
type = 'l',
xlab = "number of sentences",
ylab = "generalizability",
ylim = (0:1),
main = "Generalizability for pause times between target nouns
with an average of 1.75 nouns per sentence")
abline(h=0.7,lty=2)Within words
Estimate the variances
Model_within_words_nouns <-lmer(
within_words ~ 1 +
copy_task +
(1|participant_ID) +
(1|sentence_ID) +
(1|target_noun),
data=Nouns,
REML = F)
print(VarCorr(Model_within_words_nouns),
comp = "Variance",
digits = 7) Groups Name Variance
participant_ID (Intercept) 3403.78947
target_noun (Intercept) 1585.30854
sentence_ID (Intercept) 52.90428
Residual 16211.28364
Calculate the G coefficient
G_nouns_wi <- 3403.8/
(3403.8+
(52.9/27) +
(1585.3/70) +
(16211.3/(1.75*27))
)
G_nouns_wi[1] 0.902505
D Study
Nsentences<-1:40
G_nouns_wi_sim <- 3403.8 /
(3403.8 +
(52.9/Nsentences) +
(1585.3/70) +
(16211.3/(1.75*Nsentences))
)plot(Nsentences,
G_nouns_wi_sim,
type = 'l',
xlab = "number of sentences",
ylab = "generalizability",
ylim = (0:1),
main = "Generalizability for pause times within target nouns
with an average of 1.75 nouns per sentence")
abline(h=0.7,lty=2)Verbs
Verbs <- filter(Data, word_class=="VERB")Between words
Estimate the variances
Model_between_words_verbs <- lmer(
between_words ~ 1 +
copy_task +
(1|participant_ID) +
(1|sentence_ID) +
(1|target_verb),
data=Verbs,
REML = F)
print(VarCorr(Model_between_words_verbs),
comp = "Variance",
digits = 6) Groups Name Variance
participant_ID (Intercept) 72092.73
sentence_ID (Intercept) 2381.46
target_verb (Intercept) 21249.20
Residual 1638076.52
Calculate the G coefficient
G_verbs_bw <- 72093/
(72093 +
(2381/27) +
(21249/40) +
(1638077/(1*27))
)
G_verbs_bw[1] 0.5405005
D Study
Nsentences <- 1:80
G_verbs_bw_sim <- 72093/
(72093 +
(2381/Nsentences) +
(21249/40) +
(1638077/(1*Nsentences))
)plot(Nsentences,
G_verbs_bw_sim,
type = 'l',
xlab = "number of sentences",
ylab = "generalizability",
ylim = (0:1),
main = "Generalizability for pause times
between target verbs")
abline(h=0.7,lty=2)Within words
Estimate the variances
Model_within_words_verbs <- lmer(
within_words ~ 1 +
copy_task +
(1|participant_ID) +
(1|sentence_ID) +
(1|target_verb),
data=Verbs,
REML = F)
print(VarCorr(Model_within_words_verbs),
comp = "Variance",
digits = 6) Groups Name Variance
participant_ID (Intercept) 3838.52
sentence_ID (Intercept) 0.00
target_verb (Intercept) 1472.12
Residual 14694.66
Calculate the G coefficient
G_verbs_wi <- 3838.52 /
(3838.52 +
(0/27) +
(1472.12/40) +
(14694.66/(1*27))
)
G_verbs_wi[1] 0.868528
D Study
Nsentences <- 1:40
G_verbs_wi_sim <- 3838.52 /
(3838.52 +
(0/Nsentences) +
(1472.12/40) +
(14694.66/(1*Nsentences))
)plot(Nsentences,
G_verbs_wi_sim,
type = 'l',
xlab = "number of sentences",
ylab = "generalizability",
ylim = (0:1),
main = "Generalizability for pause times
within target verbs")
abline(h=0.7,lty=2)