Survey data I/O with likert
This post is a tutorial on how to prepare different forms of Likert-style survey data for the R likert package and use its output to create 100% stacked-bar charts. I focus on preparing the data for likert() input and editing its output for the final chart. For exploring the package functionality more fully, I recommend the tutorials by Laura Mudge (2019) and Jake Chanenson (2021).
In a companion post I develop the R script for constructing the 100% stacked-bar chart and discuss the rationale for selecting it as a more effective design for Likert-style survey data.
I use data.table, ggplot2, and likert R packages. An appealing feature of likert is its compatibility with data.table and ggplot2 functionality. Note that to reproduce this work, likert must be at least version 1.3.6 (currently the development version).
The R code for the post is listed under the “R code” pointers.
R code
# packages
library("data.table")
library("ggplot2")
library("likert")

# function based on likert.plot to construct a 100% stacked bar chart
my_breaks <- seq(-100, 100, 10)
likert_100_pct_bar <- function(likert_list) {
  plot(likert_list,
       plot.percent.neutral = FALSE,
       plot.percent.high = FALSE,
       plot.percent.low = FALSE,
       neutral.color = "grey90",
       include.center = TRUE,
       centered = FALSE) +
    geom_hline(yintercept = my_breaks, color = "white", size = 0.25) +
    scale_y_continuous(limits = c(0, 100),
                       breaks = my_breaks,
                       sec.axis = sec_axis( # second scale
                         trans = function(z) z - 100,
                         breaks = my_breaks,
                         labels = as.character(abs(my_breaks)))) +
    theme(panel.background = element_blank(),
          legend.key.size = unit(4, "mm"),
          legend.title = element_blank(),
          axis.ticks = element_blank(),
          legend.justification = 0.5,
          legend.position = "top")
}

# labeling vectors
opinion_labels <- c("Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree")
question_labels <- c("Beyond the content", "Analyze errors", "Provide facts", "Develop writing", "Independent learning")

# functions for renaming columns
setnames_Item <- function(x) {
  setnames(x, old = "q_no", new = "Item", skip_absent = TRUE)
}
setnames_opinion_labels <- function(x) {
  setnames(x,
           old = c("str_disagree", "disagree", "neutral", "agree", "str_agree"),
           new = opinion_labels,
           skip_absent = TRUE)
}
Data
The practice data in my example are from an engineering education article by Ashanthi Maxworth (2021), selected because the data are compact and the survey includes a Neutral option. The table from the original article is shown below. There were 31 respondents.
Survey data are most likely to be reported in one of three forms: summary percentages (as above), summary counts, or row-records. The likert() function accepts any of these forms as input. The practice data, in all three forms, are available in the blog data directory as CSV files.
Summary counts
Read the prepared data file in summary count form.
R code
# read prepared data
dt <- fread("data/case-study-2021-count.csv")
q_no | str_disagree | disagree | neutral | agree | str_agree |
---|---|---|---|---|---|
Q1 | 2 | 0 | 8 | 12 | 9 |
Q2 | 2 | 2 | 7 | 14 | 6 |
Q3 | 1 | 1 | 5 | 9 | 15 |
Q4 | 0 | 2 | 10 | 12 | 7 |
Q5 | 2 | 0 | 6 | 11 | 12 |
likert() input
I rename the first column Item for consistency with the likert() function.
R code
# rename first column
setnames_Item(dt)

# examine the result
dt[]

     Item str_disagree disagree neutral agree str_agree
   <char>        <int>    <int>   <int> <int>     <int>
1:     Q1            2        0       8    12         9
2:     Q2            2        2       7    14         6
3:     Q3            1        1       5     9        15
4:     Q4            0        2      10    12         7
5:     Q5            2        0       6    11        12
The likert() function accepts input data frames having this structure. The salient characteristics are:
- one row per question
- first column is named Item and contains the question labels
- remaining columns are named for the opinion levels in increasing order left to right
- column values are the counts of respondents choosing that option
- the sum of row counts is the number of respondents answering that question
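As a rough illustration of the structure just listed, here is a minimal base R sketch that builds a small summary-count data frame by hand and checks the listed characteristics. The data frame name counts, the three opinion levels, and the counts themselves are hypothetical and are not part of the case-study data.

```r
# Hypothetical two-question survey with three opinion levels (toy counts)
counts <- data.frame(
  Item     = c("Q1", "Q2"),
  disagree = c(3L, 1L),
  neutral  = c(2L, 4L),
  agree    = c(5L, 5L)
)

# The first column must hold the question labels and be named "Item"
stopifnot(names(counts)[1] == "Item")

# The remaining columns, ordered from lowest to highest opinion, must be numeric
level_cols <- setdiff(names(counts), "Item")
stopifnot(all(vapply(counts[level_cols], is.numeric, logical(1))))

# Row sums give the number of respondents answering each question
n_by_item <- rowSums(counts[level_cols])
names(n_by_item) <- counts$Item
n_by_item
```

A check like this is cheap to run before handing a hand-built table to the summary argument.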
likert() output
To operate on this data frame, we assign it to the summary argument of the likert() function. The result is a list of various statistics about the Likert-style data. Note that the results output preserves the data.table structure of the input.
R code
# create the likert list
likert_list <- likert(summary = dt)

# examine its structure
str(likert_list)

List of 5
 $ results :Classes 'data.table' and 'data.frame': 5 obs. of 6 variables:
  ..$ Item        : chr [1:5] "Q1" "Q2" "Q3" "Q4" ...
  ..$ str_disagree: num [1:5] 6.45 6.45 3.23 0 6.45
  ..$ disagree    : num [1:5] 0 6.45 3.23 6.45 0
  ..$ neutral     : num [1:5] 25.8 22.6 16.1 32.3 19.4
  ..$ agree       : num [1:5] 38.7 45.2 29 38.7 35.5
  ..$ str_agree   : num [1:5] 29 19.4 48.4 22.6 38.7
  ..- attr(*, ".internal.selfref")=<externalptr>
 $ items   : NULL
 $ grouping: NULL
 $ nlevels : num 5
 $ levels  : chr [1:5] "str_disagree" "disagree" "neutral" "agree" ...
 - attr(*, "class")= chr "likert"
The components of the list are:
- results: Data frame. Percentage of responses by question, opinion level, and group.
- items: Data frame. Copy of original row-record input (NULL in this example).
- grouping: Copy of original grouping vector that subsets results (NULL in this example).
- nlevels: Integer. Number of opinion levels used in the calculations.
- levels: Character. Ordered vector of opinion level labels.
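The percentages in results look consistent with each count divided by its row total. The base R sketch below reproduces the Q1 row from the counts in the table above; it is my own check on the arithmetic and makes no claim about how likert computes the values internally. The vector names are hypothetical.

```r
# Counts for Q1 from the summary-count table
q1_counts <- c(str_disagree = 2, disagree = 0, neutral = 8, agree = 12, str_agree = 9)

# Percentage of respondents choosing each option
q1_pct <- 100 * q1_counts / sum(q1_counts)

# Compare with the first element of each column in the str() output
round(q1_pct, 2)
```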
Basic chart
To use this list to create a chart, we assign it as the first argument of the plot() function.
R code
# create the basic chart (default digits = 0 throws an error)
plot(likert_list, digits = 1)
100% stacked bar chart
The same list can be used to create a 100% stacked-bar chart by assigning it as the first argument of likert_100_pct_bar(), a function (defined at the top of the post) that wraps likert.plot and sets the likert arguments and ggplot2 functions that produce my preferred design.
R code
# customize the chart
likert_100_pct_bar(likert_list)
Legend key
The legend key is edited via the column names of likert_list$results. Viewing its column names,
R code
names(likert_list$results)
[1] "Item"         "str_disagree" "disagree"     "neutral"      "agree"
[6] "str_agree"
Using a vector of opinion labels defined at the top of the post, I rename the opinion columns of the data frame.
R code
# recode the opinion options
setnames_opinion_labels(likert_list$results)

# examine the result
str(likert_list)

List of 5
 $ results :Classes 'data.table' and 'data.frame': 5 obs. of 6 variables:
  ..$ Item             : chr [1:5] "Q1" "Q2" "Q3" "Q4" ...
  ..$ Strongly Disagree: num [1:5] 6.45 6.45 3.23 0 6.45
  ..$ Disagree         : num [1:5] 0 6.45 3.23 6.45 0
  ..$ Neutral          : num [1:5] 25.8 22.6 16.1 32.3 19.4
  ..$ Agree            : num [1:5] 38.7 45.2 29 38.7 35.5
  ..$ Strongly Agree   : num [1:5] 29 19.4 48.4 22.6 38.7
  ..- attr(*, ".internal.selfref")=<externalptr>
 $ items   : NULL
 $ grouping: NULL
 $ nlevels : num 5
 $ levels  : chr [1:5] "str_disagree" "disagree" "neutral" "agree" ...
 - attr(*, "class")= chr "likert"
The change can be seen in the structure above and in the revised figure.
R code
# create the chart
likert_100_pct_bar(likert_list)
Question labels
The question labels are edited via the values in the Item column of likert_list$results. Viewing the first column in vector form,
R code
likert_list$results[["Item"]]
[1] "Q1" "Q2" "Q3" "Q4" "Q5"
Using a vector of question labels defined at the top of the post, I substitute them for the values in the original Item column.
R code
# recode the question labels
likert_list$results[, Item := question_labels]

# examine the result
str(likert_list)

List of 5
 $ results :Classes 'data.table' and 'data.frame': 5 obs. of 6 variables:
  ..$ Item             : chr [1:5] "Beyond the content" "Analyze errors" "Provide facts" "Develop writing" ...
  ..$ Strongly Disagree: num [1:5] 6.45 6.45 3.23 0 6.45
  ..$ Disagree         : num [1:5] 0 6.45 3.23 6.45 0
  ..$ Neutral          : num [1:5] 25.8 22.6 16.1 32.3 19.4
  ..$ Agree            : num [1:5] 38.7 45.2 29 38.7 35.5
  ..$ Strongly Agree   : num [1:5] 29 19.4 48.4 22.6 38.7
  ..- attr(*, ".internal.selfref")=<externalptr>
 $ items   : NULL
 $ grouping: NULL
 $ nlevels : num 5
 $ levels  : chr [1:5] "str_disagree" "disagree" "neutral" "agree" ...
 - attr(*, "class")= chr "likert"
Again, the change is seen in the structure above and in the revised figure.
R code
# create the chart
likert_100_pct_bar(likert_list)
This approach is somewhat ad hoc, but it works as long as you are careful to write the substitutions in the correct order. If I were programming these steps, I would create additional tables (as in a database) and join the substitutions by clearly assigned key variables.
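One way the keyed approach might look is sketched below in base R. The lookup table question_key and the vector items are hypothetical names of my own; the point is only that labels are matched by key (the question code) rather than by position, so the order of the rows no longer matters.

```r
# Hypothetical lookup table keyed by the original question code
question_key <- data.frame(
  q_no  = c("Q1", "Q2", "Q3", "Q4", "Q5"),
  label = c("Beyond the content", "Analyze errors", "Provide facts",
            "Develop writing", "Independent learning")
)

# Item values in an arbitrary order, as they might appear in a results table
items <- c("Q3", "Q1", "Q5")

# Look up each label by key rather than by position
new_labels <- question_key$label[match(items, question_key$q_no)]

# Fail early if any key is missing from the lookup table
stopifnot(!anyNA(new_labels))
new_labels
```

The same idea carries over to a data.table join on the key column; I use match() here only to keep the sketch free of package dependencies.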
Or edit the labels first
Alternatively, one can produce the same result by editing the opinion labels and question labels of the initial data frame before submitting it to likert(). The row and column structure reflects the changes.
R code
# read prepared data
dt <- fread("data/case-study-2021-count.csv")

# rename columns
setnames_Item(dt)
setnames_opinion_labels(dt)

# recode the question labels
dt[, Item := question_labels]

# examine the result
dt[]

                   Item Strongly Disagree Disagree Neutral Agree Strongly Agree
                 <char>             <int>    <int>   <int> <int>          <int>
1:   Beyond the content                 2        0       8    12              9
2:       Analyze errors                 2        2       7    14              6
3:        Provide facts                 1        1       5     9             15
4:      Develop writing                 0        2      10    12              7
5: Independent learning                 2        0       6    11             12
The likert list that results is nearly identical to the previous version except the levels vector uses the new opinion labels.
R code
# create the likert list
likert_list <- likert(summary = dt)

# examine the result
str(likert_list)

List of 5
 $ results :Classes 'data.table' and 'data.frame': 5 obs. of 6 variables:
  ..$ Item             : chr [1:5] "Beyond the content" "Analyze errors" "Provide facts" "Develop writing" ...
  ..$ Strongly Disagree: num [1:5] 6.45 6.45 3.23 0 6.45
  ..$ Disagree         : num [1:5] 0 6.45 3.23 6.45 0
  ..$ Neutral          : num [1:5] 25.8 22.6 16.1 32.3 19.4
  ..$ Agree            : num [1:5] 38.7 45.2 29 38.7 35.5
  ..$ Strongly Agree   : num [1:5] 29 19.4 48.4 22.6 38.7
  ..- attr(*, ".internal.selfref")=<externalptr>
 $ items   : NULL
 $ grouping: NULL
 $ nlevels : num 5
 $ levels  : chr [1:5] "Strongly Disagree" "Disagree" "Neutral" "Agree" ...
 - attr(*, "class")= chr "likert"
R code
# create the chart
likert_100_pct_bar(likert_list)
Summary percentages
Read the prepared data file in summary percentage form. The percentages are taken directly from the table in the source article. As before, I rename the first column Item for consistency with the likert() function.
R code
# read prepared data
dt <- fread("data/case-study-2021-percent.csv")

# rename first column
setnames_Item(dt)
Item | str_disagree | disagree | neutral | agree | str_agree |
---|---|---|---|---|---|
Q1 | 6.5 | 0.0 | 25.8 | 38.7 | 29.0 |
Q2 | 6.5 | 6.5 | 22.6 | 45.2 | 19.4 |
Q3 | 3.2 | 3.2 | 16.1 | 29.0 | 48.4 |
Q4 | 0.0 | 6.5 | 32.3 | 38.7 | 22.6 |
Q5 | 6.5 | 0.0 | 19.4 | 35.5 | 38.7 |
Option 1: Convert percentages to counts
This option is the most direct approach, provided we know the number of respondents to each question, which is not always the case. In this case study there were 31 respondents, and all of them replied to every question.
R code
# number of respondents in this example
N_respondents <- 31

# identify the numeric columns
sel_cols <- names(dt)[sapply(dt, is.numeric)]

# convert percentages to integer counts
dt[, c(sel_cols) := lapply(.SD, function(x) round(N_respondents * x/100, 0)), .SDcols = sel_cols]
Item | str_disagree | disagree | neutral | agree | str_agree |
---|---|---|---|---|---|
Q1 | 2 | 0 | 8 | 12 | 9 |
Q2 | 2 | 2 | 7 | 14 | 6 |
Q3 | 1 | 1 | 5 | 9 | 15 |
Q4 | 0 | 2 | 10 | 12 | 7 |
Q5 | 2 | 0 | 6 | 11 | 12 |
This data structure is identical to the one we worked with in the previous section, so we know how to work with it.
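Because the reported percentages are rounded, it seems prudent to confirm that the recovered counts add up to the known number of respondents. The base R sketch below does this for the Q2 row of the percentage table; the variable names are my own, and the check is an addition rather than part of the original script.

```r
# Number of respondents in this example
N_respondents <- 31

# Reported percentages for Q2 from the source table
q2_pct <- c(6.5, 6.5, 22.6, 45.2, 19.4)

# Convert to counts, keeping the unrounded values for comparison
q2_raw    <- N_respondents * q2_pct / 100
q2_counts <- round(q2_raw)

# The recovered counts should sum to the number of respondents
stopifnot(sum(q2_counts) == N_respondents)

# Largest distance between an unrounded value and its rounded count
max(abs(q2_raw - q2_counts))
```

If the largest distance is close to 0.5, the assumed number of respondents is probably wrong for that question.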
Option 2: Use percentages as-is
This option might be necessary if we do not know the number of respondents replying to each question. Start by reading the data file and again rename the first column Item for consistency with the likert() function.
R code
# read prepared data
dt <- fread("data/case-study-2021-percent.csv")

# rename first column
setnames_Item(dt)
Item | str_disagree | disagree | neutral | agree | str_agree |
---|---|---|---|---|---|
Q1 | 6.5 | 0.0 | 25.8 | 38.7 | 29.0 |
Q2 | 6.5 | 6.5 | 22.6 | 45.2 | 19.4 |
Q3 | 3.2 | 3.2 | 16.1 | 29.0 | 48.4 |
Q4 | 0.0 | 6.5 | 32.3 | 38.7 | 22.6 |
Q5 | 6.5 | 0.0 | 19.4 | 35.5 | 38.7 |
With one row per question, the row percentages should sum to 100%. They do, to within a small error caused by rounding in the reported percentages.
R code
# check row totals of numeric columns
sel_cols <- names(dt)[sapply(dt, is.numeric)]
row_sum <- rowSums(dt[, .SD, .SDcols = sel_cols])

# examine result
dt[, row_total := row_sum]
dt[, rounding_error := row_sum - 100]
dt[, .(Item, row_total, rounding_error)]

     Item row_total rounding_error
   <char>     <num>          <num>
1:     Q1     100.0            0.0
2:     Q2     100.2            0.2
3:     Q3      99.9           -0.1
4:     Q4     100.1            0.1
5:     Q5     100.1            0.1
If we ignore the rounding error, it can introduce small but noticeable errors in the bar lengths in the chart. A simple remedy is to subtract each row's error from its neutral value so that every row sums to exactly 100%. The adjusted neutral values are shown below.
R code
# subtract error from neutral
dt[, adjusted_neutral := neutral - rounding_error]

# examine the result
dt[, .(Item, neutral, rounding_error, adjusted_neutral)]

     Item neutral rounding_error adjusted_neutral
   <char>   <num>          <num>            <num>
1:     Q1    25.8            0.0             25.8
2:     Q2    22.6            0.2             22.4
3:     Q3    16.1           -0.1             16.2
4:     Q4    32.3            0.1             32.2
5:     Q5    19.4            0.1             19.3
likert() input
Replacing neutral with the adjusted neutral and deleting the temporary information columns yields the data structure I need for the summary percentage form:
R code
# adjust neutral
dt[, neutral := adjusted_neutral]

# delete temporary information columns
dt[, c("row_total", "rounding_error", "adjusted_neutral") := NULL]

# examine the result
dt[]

     Item str_disagree disagree neutral agree str_agree
   <char>        <num>    <num>   <num> <num>     <num>
1:     Q1          6.5      0.0    25.8  38.7      29.0
2:     Q2          6.5      6.5    22.4  45.2      19.4
3:     Q3          3.2      3.2    16.2  29.0      48.4
4:     Q4          0.0      6.5    32.2  38.7      22.6
5:     Q5          6.5      0.0    19.3  35.5      38.7
Data structure:
- one row per question
- first column is named Item and contains the question labels
- remaining columns are named for the opinion levels in increasing order left to right
- column values are the percentages of respondents choosing that option
- the sum of row percentages is exactly 100%
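When verifying the last requirement in code, I would compare against 100 with a small tolerance rather than test for exact equality, since sums of decimal fractions are not always represented exactly in floating-point arithmetic. A minimal base R sketch using the adjusted Q2 row follows; the tolerance of 1e-8 is an arbitrary choice of mine.

```r
# Adjusted percentages for Q2 (neutral reduced from 22.6 to 22.4)
q2_adjusted <- c(6.5, 6.5, 22.4, 45.2, 19.4)

# Compare with a tolerance instead of testing exact equality
sums_to_100 <- abs(sum(q2_adjusted) - 100) < 1e-8
sums_to_100
```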
To prepare the data frame for graphing, I use the “edit the labels first” approach described earlier.
R code
# recode the opinion options
setnames_opinion_labels(dt)

# recode the question labels
dt[, Item := question_labels]

# examine the result
dt[]

                   Item Strongly Disagree Disagree Neutral Agree Strongly Agree
                 <char>             <num>    <num>   <num> <num>          <num>
1:   Beyond the content               6.5      0.0    25.8  38.7           29.0
2:       Analyze errors               6.5      6.5    22.4  45.2           19.4
3:        Provide facts               3.2      3.2    16.2  29.0           48.4
4:      Develop writing               0.0      6.5    32.2  38.7           22.6
5: Independent learning               6.5      0.0    19.3  35.5           38.7
likert() output
To operate on this data frame, we again use the summary argument of likert(). The result is a list similar to that produced when we operated on summary counts and the same familiar chart.
R code
# create the likert list
likert_list <- likert(summary = dt)

# examine its structure
str(likert_list)

List of 5
 $ results :Classes 'data.table' and 'data.frame': 5 obs. of 6 variables:
  ..$ Item             : chr [1:5] "Beyond the content" "Analyze errors" "Provide facts" "Develop writing" ...
  ..$ Strongly Disagree: num [1:5] 6.5 6.5 3.2 0 6.5
  ..$ Disagree         : num [1:5] 0 6.5 3.2 6.5 0
  ..$ Neutral          : num [1:5] 25.8 22.4 16.2 32.2 19.3
  ..$ Agree            : num [1:5] 38.7 45.2 29 38.7 35.5
  ..$ Strongly Agree   : num [1:5] 29 19.4 48.4 22.6 38.7
  ..- attr(*, ".internal.selfref")=<externalptr>
 $ items   : NULL
 $ grouping: NULL
 $ nlevels : num 5
 $ levels  : chr [1:5] "Strongly Disagree" "Disagree" "Neutral" "Agree" ...
 - attr(*, "class")= chr "likert"
R code
# 100% stacked bar chart
likert_100_pct_bar(likert_list)
Row records
In row-record form, everything we want to know about an individual is in one row, that is, a row-record for that individual. Thus the number of rows equals the number of respondents.
I made up a practice data set in row-record form with 31 rows and 6 columns. These are fictitious data I designed specifically to have the same summary characteristics as the published summary data used earlier.
Read the prepared data file in row-record form and view the data frame.
R code
# read observed data
dt <- fread("data/case-study-2021-row-record.csv")

# examine the result
dt[]

      obs    Q1    Q2    Q3    Q4    Q5
    <int> <int> <int> <int> <int> <int>
 1:     1     3     4     3     4     4
 2:     2     5     1     5     3     5
 3:     3     5     5     4     5     4
 4:     4     3     4     5     4     5
 5:     5     4     4     5     2     4
 6:     6     4     3     5     3     4
---
26:    26     5     5     5     4     5
27:    27     5     2     3     4     1
28:    28     3     4     5     3     5
29:    29     3     3     4     3     4
30:    30     4     4     5     3     1
31:    31     4     4     5     5     5
The first column is a fictitious respondent ID. The remaining columns represent responses to the survey questions. For basic charts like those shown here, all columns should be question responses, so I delete the ID. Though I don’t cover it here, additional non-question columns are allowed for grouping the results. See, for example, Mudge (2019).
R code
# delete the ID column
dt[, obs := NULL]

# examine the result
dt[]

       Q1    Q2    Q3    Q4    Q5
    <int> <int> <int> <int> <int>
 1:     3     4     3     4     4
 2:     5     1     5     3     5
 3:     5     5     4     5     4
 4:     3     4     5     4     5
 5:     4     4     5     2     4
 6:     4     3     5     3     4
---
26:     5     5     5     4     5
27:     5     2     3     4     1
28:     3     4     5     3     5
29:     3     3     4     3     4
30:     4     4     5     3     1
31:     4     4     5     5     5
likert() input
For the likert() function to accept data in this form, all question response columns must be factors with identical sets of levels. Reformatting the columns and checking the structure yields,
R code
# reformat columns as factors
sel_cols <- names(dt)
dt[, c(sel_cols) := lapply(.SD, function(x) factor(x, levels = 1:5)), .SDcols = sel_cols]

# examine the result
dt[]

        Q1     Q2     Q3     Q4     Q5
    <fctr> <fctr> <fctr> <fctr> <fctr>
 1:      3      4      3      4      4
 2:      5      1      5      3      5
 3:      5      5      4      5      4
 4:      3      4      5      4      5
 5:      4      4      5      2      4
 6:      4      3      5      3      4
---
26:      5      5      5      4      5
27:      5      2      3      4      1
28:      3      4      5      3      5
29:      3      3      4      3      4
30:      4      4      5      3      1
31:      4      4      5      5      5
Input data structure
- One row per respondent. The number of rows equals the number of respondents.
- One column per question. The column name is the question label. The number of columns equals the number of survey questions.
- Each column is a factor with an identical set of levels. The number of levels equals the number of answer options in the survey.
- Column values are the encoded opinions of the respondent: 1 (Strongly Disagree), 2 (Disagree), 3 (Neutral), etc.
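The factor requirement is easy to get wrong when columns are converted one at a time, so a quick check can help. The base R sketch below uses a toy data frame of three hypothetical respondents (not the case-study data) and confirms that every column is a factor and that all columns share one set of levels.

```r
# Toy row-record data: three hypothetical respondents, two questions
responses <- data.frame(
  Q1 = factor(c(3, 5, 4), levels = 1:5),
  Q2 = factor(c(4, 1, 5), levels = 1:5)
)

# Every column should be a factor
all_factors <- all(vapply(responses, is.factor, logical(1)))

# Every column should have the same levels as the first column
level_sets  <- lapply(responses, levels)
same_levels <- all(vapply(level_sets, identical, logical(1), level_sets[[1]]))

c(all_factors = all_factors, same_levels = same_levels)
```

Setting levels = 1:5 explicitly matters here: without it, a question that no one answered with 1 would end up with only four levels.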
likert() output
To operate on a row-record data frame, we assign it to the items argument of the likert() function. The result is again a list.
However, unlike the previous output lists, the data.table structure of the input has not been preserved. I use data.table syntax in subsequent operations, so I convert both results and items to data.tables.
R code
# create likert list
likert_list <- likert(items = dt)

# convert output data frames to data.tables
setDT(likert_list$results)
setDT(likert_list$items)

# examine the result
str(likert_list)

List of 6
 $ results :Classes 'data.table' and 'data.frame': 5 obs. of 6 variables:
  ..$ Item: chr [1:5] "Q1" "Q2" "Q3" "Q4" ...
  ..$ 1   : num [1:5] 6.45 6.45 3.23 0 6.45
  ..$ 2   : num [1:5] 0 6.45 3.23 6.45 0
  ..$ 3   : num [1:5] 25.8 22.6 16.1 32.3 19.4
  ..$ 4   : num [1:5] 38.7 45.2 29 38.7 35.5
  ..$ 5   : num [1:5] 29 19.4 48.4 22.6 38.7
  ..- attr(*, ".internal.selfref")=<externalptr>
 $ items   :Classes 'data.table' and 'data.frame': 31 obs. of 5 variables:
  ..$ Q1: Factor w/ 5 levels "1","2","3","4",..: 3 5 5 3 4 4 3 1 4 5 ...
  ..$ Q2: Factor w/ 5 levels "1","2","3","4",..: 4 1 5 4 4 3 2 5 5 4 ...
  ..$ Q3: Factor w/ 5 levels "1","2","3","4",..: 3 5 4 5 5 5 4 4 4 5 ...
  ..$ Q4: Factor w/ 5 levels "1","2","3","4",..: 4 3 5 4 2 3 4 3 5 5 ...
  ..$ Q5: Factor w/ 5 levels "1","2","3","4",..: 4 5 4 5 4 4 4 3 4 3 ...
  ..- attr(*, ".internal.selfref")=<externalptr>
 $ grouping: NULL
 $ factors : NULL
 $ nlevels : int 5
 $ levels  : chr [1:5] "1" "2" "3" "4" ...
 - attr(*, "class")= chr "likert"
The components of the list are:
- results: Data frame. Percentage of responses by question, opinion level, and group.
- items: Data frame. Copy of original row-record input.
- grouping: Copy of original grouping vector that subsets results (NULL in this example).
- factors: Copy of original vector matching columns to factors (NULL in this example).
- nlevels: Integer. Number of opinion levels used in the calculations.
- levels: Character. Ordered vector of opinion level labels.
Draft chart
With row-record data, the plot function requires both results and items from the output list. The chart is familiar, but the opinion labels are now the integers used to encode the survey results.
R code
likert_100_pct_bar(likert_list)
Legend key
As before, the legend key is edited via the column names of likert_list$results. Note the corresponding changes in the likert list and chart.
R code
# recode the opinion options
setnames(likert_list$results,
         old = as.character(1:5),
         new = opinion_labels,
         skip_absent = TRUE)

# examine the result
str(likert_list)

List of 6
 $ results :Classes 'data.table' and 'data.frame': 5 obs. of 6 variables:
  ..$ Item             : chr [1:5] "Q1" "Q2" "Q3" "Q4" ...
  ..$ Strongly Disagree: num [1:5] 6.45 6.45 3.23 0 6.45
  ..$ Disagree         : num [1:5] 0 6.45 3.23 6.45 0
  ..$ Neutral          : num [1:5] 25.8 22.6 16.1 32.3 19.4
  ..$ Agree            : num [1:5] 38.7 45.2 29 38.7 35.5
  ..$ Strongly Agree   : num [1:5] 29 19.4 48.4 22.6 38.7
  ..- attr(*, ".internal.selfref")=<externalptr>
 $ items   :Classes 'data.table' and 'data.frame': 31 obs. of 5 variables:
  ..$ Q1: Factor w/ 5 levels "1","2","3","4",..: 3 5 5 3 4 4 3 1 4 5 ...
  ..$ Q2: Factor w/ 5 levels "1","2","3","4",..: 4 1 5 4 4 3 2 5 5 4 ...
  ..$ Q3: Factor w/ 5 levels "1","2","3","4",..: 3 5 4 5 5 5 4 4 4 5 ...
  ..$ Q4: Factor w/ 5 levels "1","2","3","4",..: 4 3 5 4 2 3 4 3 5 5 ...
  ..$ Q5: Factor w/ 5 levels "1","2","3","4",..: 4 5 4 5 4 4 4 3 4 3 ...
  ..- attr(*, ".internal.selfref")=<externalptr>
 $ grouping: NULL
 $ factors : NULL
 $ nlevels : int 5
 $ levels  : chr [1:5] "1" "2" "3" "4" ...
 - attr(*, "class")= chr "likert"
R code
# create the chart
likert_100_pct_bar(likert_list)
Question labels
With row-record data, both results and items data frames must be revised to edit the question labels. Note the corresponding changes in the likert list and chart.
R code
# recode Item column of $results
likert_list$results[, Item := question_labels]

# recode column names of $items
setnames(likert_list$items,
         old = c("Q1", "Q2", "Q3", "Q4", "Q5"),
         new = question_labels,
         skip_absent = TRUE)

# examine the result
str(likert_list)

List of 6
 $ results :Classes 'data.table' and 'data.frame': 5 obs. of 6 variables:
  ..$ Item             : chr [1:5] "Beyond the content" "Analyze errors" "Provide facts" "Develop writing" ...
  ..$ Strongly Disagree: num [1:5] 6.45 6.45 3.23 0 6.45
  ..$ Disagree         : num [1:5] 0 6.45 3.23 6.45 0
  ..$ Neutral          : num [1:5] 25.8 22.6 16.1 32.3 19.4
  ..$ Agree            : num [1:5] 38.7 45.2 29 38.7 35.5
  ..$ Strongly Agree   : num [1:5] 29 19.4 48.4 22.6 38.7
  ..- attr(*, ".internal.selfref")=<externalptr>
 $ items   :Classes 'data.table' and 'data.frame': 31 obs. of 5 variables:
  ..$ Beyond the content  : Factor w/ 5 levels "1","2","3","4",..: 3 5 5 3 4 4 3 1 4 5 ...
  ..$ Analyze errors      : Factor w/ 5 levels "1","2","3","4",..: 4 1 5 4 4 3 2 5 5 4 ...
  ..$ Provide facts       : Factor w/ 5 levels "1","2","3","4",..: 3 5 4 5 5 5 4 4 4 5 ...
  ..$ Develop writing     : Factor w/ 5 levels "1","2","3","4",..: 4 3 5 4 2 3 4 3 5 5 ...
  ..$ Independent learning: Factor w/ 5 levels "1","2","3","4",..: 4 5 4 5 4 4 4 3 4 3 ...
  ..- attr(*, ".internal.selfref")=<externalptr>
 $ grouping: NULL
 $ factors : NULL
 $ nlevels : int 5
 $ levels  : chr [1:5] "1" "2" "3" "4" ...
 - attr(*, "class")= chr "likert"
R code
# create the chart
likert_100_pct_bar(likert_list)
Or edit the labels first
As before, we have an alternative approach: one can produce the same result by editing the opinion labels and question labels of the data frame before submitting it to likert(). Question labels are substituted for the column names. Opinion levels (as text) are substituted for the encoded integers, i.e., 1 = Strongly Disagree through 5 = Strongly Agree.
To illustrate, I start with a fresh row-record data set.
R code
# read prepared data
dt <- fread("data/case-study-2021-row-record.csv")

# delete the ID column
dt <- subset(dt, select = -c(obs))

# recode the question labels in the column names
setnames(dt,
         old = c("Q1", "Q2", "Q3", "Q4", "Q5"),
         new = question_labels,
         skip_absent = TRUE)

# recode integer values with opinion options
sel_cols <- names(dt)
dt[, (sel_cols) := lapply(.SD, function(x) fcase(
  x == 1, opinion_labels[1],
  x == 2, opinion_labels[2],
  x == 3, opinion_labels[3],
  x == 4, opinion_labels[4],
  x == 5, opinion_labels[5])),
  .SDcols = sel_cols]

# convert columns to factors
dt <- dt[, lapply(.SD, function(x) factor(x, levels = opinion_labels)), .SDcols = sel_cols]

# examine the result
dt[]

    Beyond the content    Analyze errors  Provide facts Develop writing
                <fctr>            <fctr>         <fctr>          <fctr>
 1:            Neutral             Agree        Neutral           Agree
 2:     Strongly Agree Strongly Disagree Strongly Agree         Neutral
 3:     Strongly Agree    Strongly Agree          Agree  Strongly Agree
 4:            Neutral             Agree Strongly Agree           Agree
 5:              Agree             Agree Strongly Agree        Disagree
 6:              Agree           Neutral Strongly Agree         Neutral
---
26:     Strongly Agree    Strongly Agree Strongly Agree           Agree
27:     Strongly Agree          Disagree        Neutral           Agree
28:            Neutral             Agree Strongly Agree         Neutral
29:            Neutral           Neutral          Agree         Neutral
30:              Agree             Agree Strongly Agree         Neutral
31:              Agree             Agree Strongly Agree  Strongly Agree
    Independent learning
                  <fctr>
 1:                Agree
 2:       Strongly Agree
 3:                Agree
 4:       Strongly Agree
 5:                Agree
 6:                Agree
---
26:       Strongly Agree
27:    Strongly Disagree
28:       Strongly Agree
29:                Agree
30:    Strongly Disagree
31:       Strongly Agree
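A more compact alternative, sketched here in base R, relies on the labels argument of factor(), which maps each level to a label in one step, so the integer codes can be recoded and converted to factors together. This is offered as an alternative sketch rather than the approach used in the post; the vector codes is hypothetical.

```r
# Opinion labels in increasing order, as defined at the top of the post
opinion_labels <- c("Strongly Disagree", "Disagree", "Neutral",
                    "Agree", "Strongly Agree")

# Hypothetical integer-coded responses to one question
codes <- c(3L, 5L, 1L, 4L)

# Map codes 1..5 to the labels and convert to a factor in one call
opinions <- factor(codes, levels = 1:5, labels = opinion_labels)

as.character(opinions)
levels(opinions)
```

All five labels are retained as levels even though code 2 does not occur in the toy vector, which is what likert() needs when columns must share an identical set of levels.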
Input to likert() produces the familiar chart.
R code
# create the likert list
likert_list <- likert(items = dt)

# examine the result
str(likert_list)

List of 6
 $ results :'data.frame': 5 obs. of 6 variables:
  ..$ Item             : chr [1:5] "Beyond the content" "Analyze errors" "Provide facts" "Develop writing" ...
  ..$ Strongly Disagree: num [1:5] 6.45 6.45 3.23 0 6.45
  ..$ Disagree         : num [1:5] 0 6.45 3.23 6.45 0
  ..$ Neutral          : num [1:5] 25.8 22.6 16.1 32.3 19.4
  ..$ Agree            : num [1:5] 38.7 45.2 29 38.7 35.5
  ..$ Strongly Agree   : num [1:5] 29 19.4 48.4 22.6 38.7
 $ items   :'data.frame': 31 obs. of 5 variables:
  ..$ Beyond the content  : Factor w/ 5 levels "Strongly Disagree",..: 3 5 5 3 4 4 3 1 4 5 ...
  ..$ Analyze errors      : Factor w/ 5 levels "Strongly Disagree",..: 4 1 5 4 4 3 2 5 5 4 ...
  ..$ Provide facts       : Factor w/ 5 levels "Strongly Disagree",..: 3 5 4 5 5 5 4 4 4 5 ...
  ..$ Develop writing     : Factor w/ 5 levels "Strongly Disagree",..: 4 3 5 4 2 3 4 3 5 5 ...
  ..$ Independent learning: Factor w/ 5 levels "Strongly Disagree",..: 4 5 4 5 4 4 4 3 4 3 ...
 $ grouping: NULL
 $ factors : NULL
 $ nlevels : int 5
 $ levels  : chr [1:5] "Strongly Disagree" "Disagree" "Neutral" "Agree" ...
 - attr(*, "class")= chr "likert"
R code
# create the chart
likert_100_pct_bar(likert_list)
Data table
The results component can also be used to construct a summary data table.
R code
likert_list$results
                  Item Strongly Disagree Disagree  Neutral    Agree
1   Beyond the content          6.451613 0.000000 25.80645 38.70968
2       Analyze errors          6.451613 6.451613 22.58065 45.16129
3        Provide facts          3.225806 3.225806 16.12903 29.03226
4      Develop writing          0.000000 6.451613 32.25806 38.70968
5 Independent learning          6.451613 0.000000 19.35484 35.48387
  Strongly Agree
1       29.03226
2       19.35484
3       48.38710
4       22.58065
5       38.70968
Rounding the digits, we produce a publication-ready table. I’m assuming the abbreviated question labels are OK—if not, each could be replaced with its complete assertion. In this form, the rows of the table are in the same order as the rows of the chart—a structure that could be useful to the reader.
Item | Strongly Disagree | Disagree | Neutral | Agree | Strongly Agree |
---|---|---|---|---|---|
Beyond the content | 6.5 | 0.0 | 25.8 | 38.7 | 29.0 |
Analyze errors | 6.5 | 6.5 | 22.6 | 45.2 | 19.4 |
Provide facts | 3.2 | 3.2 | 16.1 | 29.0 | 48.4 |
Develop writing | 0.0 | 6.5 | 32.3 | 38.7 | 22.6 |
Independent learning | 6.5 | 0.0 | 19.4 | 35.5 | 38.7 |
The values in this table were computed by likert from the fictitious row-record data. The numbers agree with the source data table.
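For readers who want to see the arithmetic behind such a table, the base R sketch below tabulates one question's row records and converts the counts to percentages. The four responses are toy values made up for illustration, not the case-study data, and the code is not taken from likert itself.

```r
# Opinion labels in increasing order, as defined at the top of the post
opinion_labels <- c("Strongly Disagree", "Disagree", "Neutral",
                    "Agree", "Strongly Agree")

# Toy responses to a single question (hypothetical)
q <- factor(c("Agree", "Neutral", "Agree", "Strongly Agree"),
            levels = opinion_labels)

# Count each level (unused levels count as zero), then convert to percentages
pct <- setNames(as.numeric(100 * prop.table(table(q))), opinion_labels)

round(pct, 1)
```

Repeating this for each question column and stacking the rows gives a table with the same layout as the one above.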
Additional software credits
likert for manipulating and plotting Likert-style data
References