Note: This vignette is not run now because of changes in how the R package works

Loading, setting up

First, load the {birdseyevyu} package and the {tidyverse} suite of packages:

library(birdseyevyu)
library(tidyverse) # install via install.packages("tidyverse")
library(irr) # install via install.packages("irr")
library(patchwork) # install via install.packages("patchwork")
library(here) # install.packages("here")

1. Preparing the data in datavyu (the software—not the package!)

Next, let’s prepare the files we wish to analyze. To do so, we have to export them from the datavyu software, as follows:

  1. Run the following Ruby script within the datavyu software by selecting Script and then Run Script; select a directory with one or more .opf files:

datavyu2csv.rb

2. Open the directory that the Ruby script created; a number of CSV files for each .opf file should now be created.

This is the directory (folder) passed to the datavyu functions below.

The {here} package can be used to flexibly (across computers/operating systems) specify file paths: To save on typing, the directory can be set for an entire R session via the following:

options(directory = here("irr-data", "datavyu_output_11-16-2020_14-13"))

Summarizing a column

Frequency of codes; note that the code is the code listed appended to the column name after a period.

summarize_column(column = "LogClass_IS",
                 code = "LogClass_IS.i")

Frequency of codes by file:

summarize_column(column = "LogClass_IS",
                 code = "LogClass_IS.i",
                 by_file = TRUE)

Plot of duration (note that summary = "duration" can be added to any of the above) by file:

freq_summary <- summarize_column(column = "LogClass_IS",
                                 code = "LogClass_IS.i",
                                 by_file = TRUE,
                                 summary = "duration")

plot_column_summary(freq_summary)

Inter-rater agreement

Plots

prepared_time_series_tm <- prep_time_series(column = "LogClass_IS",
                                            code = "LogClass_IS.i",
                                            specified_file = "TM 14-12-03 T201 Content Log")

plot_time_series(prepared_time_series_tm)

prepared_time_series_hh <- prep_time_series(column = "LogClass_IS",
                                            code = "LogClass_IS.i",
                                            specified_file = "HH T201 14-12-03 Content Log")

plot_time_series(prepared_time_series_hh)

These could be composed together using the patchwork library:

plot_time_series(prepared_time_series_tm) +
  plot_time_series(prepared_time_series_hh) +
  plot_layout(ncol = 1)

Agreement statistics

First, looking at data:

prepared_time_series_tm
prepared_time_series_hh

We’ll do a “full” join, re] taining all time stamps for both files. First, we must rename one (or both) of the two code columns. Having done this, we can easily compare the two once joined:

prepared_time_series_tm <- rename(prepared_time_series_tm, code_tm = code)
prepared_time_series_hh <- rename(prepared_time_series_hh, code_hh = code)

joined_data <- prepared_time_series_tm %>% 
  full_join(prepared_time_series_hh, by = "ts")

joined_data

We can calculate agreement using the {irr} package, passing only the 2nd and 3rd columns (with the codes) to the function agree() (from the {irr} function):

agree(joined_data[, 2:3])

We can do the same for Cohen’s Kappa using the kappa2 statistic:

kappa2(joined_data[, 2:3])