vignettes/calculating-interrater-agreement.Rmd
Note: This vignette is not currently run because of changes in how the R package works.
First, load the {birdseyevyu} package, the {tidyverse} suite of packages, and the other packages used below:
library(birdseyevyu)
library(tidyverse) # install via install.packages("tidyverse")
library(irr) # install via install.packages("irr")
library(patchwork) # install via install.packages("patchwork")
library(here) # install via install.packages("here")
Next, let’s prepare the files we wish to analyze. To do so, we have to export them from the datavyu software, as follows:
.opf files: datavyu2csv.rb
.opf file should now be created.
This is the directory (folder) passed to the datavyu functions below.
The {here} package can be used to specify file paths flexibly (across computers and operating systems).
To save on typing, the directory can be set once for an entire R session.
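The code for these two steps is not shown here. As a minimal sketch, building a path with {here} and storing it for the session might look like the following. The folder names ("data", "datavyu-export") and the option name (my_datavyu_directory) are hypothetical; check the {birdseyevyu} documentation for the option it actually reads.

```r
# Build a path relative to the project root.
# "data" and "datavyu-export" are hypothetical folder names.
if (requireNamespace("here", quietly = TRUE)) {
  csv_dir <- here::here("data", "datavyu-export")
} else {
  # Fallback if {here} is not installed: relative to the working directory
  csv_dir <- file.path(getwd(), "data", "datavyu-export")
}

# Store the path for the rest of the R session under a made-up option name
# (an assumption for illustration, not necessarily what {birdseyevyu} uses).
options(my_datavyu_directory = csv_dir)

# Retrieve it later without retyping the path
getOption("my_datavyu_directory")
```

Because here::here() resolves paths from the project root, the same call should work on another computer or operating system as long as the project layout is the same.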
Frequency of codes. Note that a code is specified as the column name followed by a period and the code name (for example, LogClass_IS.i):
summarize_column(column = "LogClass_IS",
                 code = "LogClass_IS.i")
Frequency of codes by file:
summarize_column(column = "LogClass_IS",
                 code = "LogClass_IS.i",
                 by_file = TRUE)
Plot of duration (note that summary = "duration" can be added to any of the above) by file:
freq_summary <- summarize_column(column = "LogClass_IS",
                                 code = "LogClass_IS.i",
                                 by_file = TRUE,
                                 summary = "duration")
plot_column_summary(freq_summary)
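To make the two kinds of summary concrete, here is a small base R sketch of what a frequency and a duration summary by file amount to. It does not reproduce the internals of summarize_column(); the data frame, its column names (file, onset, offset, code), and the values are made up for illustration.

```r
# Made-up coded intervals; column names are assumptions for illustration
cells <- data.frame(
  file   = c("A", "A", "A", "B", "B"),
  onset  = c(0, 10, 25, 0, 5),
  offset = c(10, 25, 30, 5, 20),
  code   = c("x", "y", "x", "x", "x"),
  stringsAsFactors = FALSE
)

# Duration of each coded interval
cells$duration <- cells$offset - cells$onset

# Frequency: how many times each code occurs in each file
freq_by_file <- as.data.frame(table(file = cells$file, code = cells$code))

# Duration: total time each code is active in each file
dur_by_file <- aggregate(duration ~ file + code, data = cells, FUN = sum)

freq_by_file
dur_by_file
```

Frequency and duration can tell different stories: in this toy data, code "x" occurs twice in both files, but its total duration is 15 in file A and 20 in file B.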
prepared_time_series_tm <- prep_time_series(column = "LogClass_IS",
                                            code = "LogClass_IS.i",
                                            specified_file = "TM 14-12-03 T201 Content Log")
plot_time_series(prepared_time_series_tm)
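The prepared object is used later as a table with a time stamp column (ts) and a code column. As a rough sketch of how coded intervals can be turned into such a table, the base R example below expands made-up onset/offset intervals into one row per time step. This is an assumption about the general idea, not the implementation of prep_time_series(), and it assumes the intervals are sorted by onset and contiguous (no gaps or overlaps).

```r
# Made-up coded intervals, sorted by onset and contiguous (an assumption)
intervals <- data.frame(
  onset  = c(0, 3, 5),
  offset = c(3, 5, 8),
  code   = c("i", "o", "i"),
  stringsAsFactors = FALSE
)

# One time stamp per unit of time, from the first onset up to the last offset
ts <- seq(min(intervals$onset), max(intervals$offset) - 1)

# For each time stamp, find the interval whose onset most recently passed
idx <- findInterval(ts, intervals$onset)

# Long table: one row per time stamp, with the code active at that time
series <- data.frame(ts = ts, code = intervals$code[idx],
                     stringsAsFactors = FALSE)
series
```

A table in this shape is what makes the later join by time stamp possible: two coders' series can be lined up row by row on ts.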
prepared_time_series_hh <- prep_time_series(column = "LogClass_IS",
                                            code = "LogClass_IS.i",
                                            specified_file = "HH T201 14-12-03 Content Log")
plot_time_series(prepared_time_series_hh)
These could be composed together using the {patchwork} package:
plot_time_series(prepared_time_series_tm) +
  plot_time_series(prepared_time_series_hh) +
  plot_layout(ncol = 1)
First, let’s look at the data:
prepared_time_series_tm
prepared_time_series_hh
We’ll do a “full” join, retaining all time stamps for both files. First, we must rename one (or both) of the two code columns. Having done this, we can easily compare the two once joined:
prepared_time_series_tm <- rename(prepared_time_series_tm, code_tm = code)
prepared_time_series_hh <- rename(prepared_time_series_hh, code_hh = code)
joined_data <- prepared_time_series_tm %>%
  full_join(prepared_time_series_hh, by = "ts")
joined_data
We can calculate agreement using the {irr} package, passing only the second and third columns (which contain the codes) to the agree() function:
agree(joined_data[, 2:3])
We can do the same for Cohen’s Kappa using the kappa2() function:
kappa2(joined_data[, 2:3])
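To see what these two statistics measure, here is a minimal base R sketch that computes percent agreement and unweighted Cohen’s Kappa by hand on made-up data. The toy data frames and values are assumptions for illustration, and merge(..., all = TRUE) stands in for full_join(). Our understanding is that {irr} drops rows with missing values before computing these statistics, so time stamps coded in only one file would not count; the sketch makes that step explicit.

```r
# Made-up per-time-stamp codes for two coders (illustrative data only)
tm <- data.frame(ts = 0:5, code_tm = c("i", "i", "o", "o", "i", "o"),
                 stringsAsFactors = FALSE)
hh <- data.frame(ts = 1:6, code_hh = c("i", "o", "o", "o", "i", "i"),
                 stringsAsFactors = FALSE)

# Base R stand-in for a full join: keep every time stamp from both files
joined <- merge(tm, hh, by = "ts", all = TRUE)

# Keep only time stamps that both coders coded
complete <- joined[complete.cases(joined), ]

# Observed agreement: share of time stamps with identical codes
p_o <- mean(complete$code_tm == complete$code_hh)

# Chance agreement: for each code, the product of the two coders' proportions
lv <- union(complete$code_tm, complete$code_hh)
p_tm <- as.numeric(table(factor(complete$code_tm, levels = lv))) / nrow(complete)
p_hh <- as.numeric(table(factor(complete$code_hh, levels = lv))) / nrow(complete)
p_e <- sum(p_tm * p_hh)

# Cohen's Kappa: agreement beyond chance, scaled by the maximum possible
kappa <- (p_o - p_e) / (1 - p_e)

c(percent_agreement = 100 * p_o, kappa = kappa)
```

With this toy data, the coders agree on 3 of the 5 shared time stamps (60%), but because chance agreement is already 0.52, Kappa is only about 0.17. This is why it is worth reporting Kappa alongside raw agreement.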