
O is for Overview Reports

[This article was first published on Deeply Trivial, and kindly contributed to R-bloggers.]
O is for Overview Reports with dataMaid

One of the best things you can do is to create a study codebook to accompany your dataset. In it, you should include information about the study variables and how they were created/computed. It’s also nice to have a summary of your dataset, all in one place, so you can quickly check for any issues and begin cleaning your data, and/or plan your analysis. But creating and formatting that report, and running the various descriptive statistics and plots, can be a rather tedious process. What if an R package could do much of that for you?

dataMaid to the rescue! In addition to having some great functions to help streamline data cleaning, dataMaid can create an overview report of your dataset, containing the information you request, and generate an R Markdown file to which you could add descriptive information, like functions used to calculate variables, item text, and so on.

For today’s post, I’ll use the simulated Facebook dataset I created and shared. You can replicate the exact results I get if you use that dataset. After importing the file, I want to make certain all variables are of the correct type. If necessary, I can make some changes. This becomes important when we go to generate our report. I also want to score all my scales, so I have total and subscale scores in the file.

Facebook<-read.delim(file="simulated_facebook_set.txt", header=TRUE)
str(Facebook)

## 'data.frame': 257 obs. of  111 variables:
##  $ ID            : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ gender        : int  1 1 1 0 1 0 1 0 1 1 ...
##  $ Rum1          : int  3 2 3 2 2 2 0 4 1 0 ...
##  $ Rum2          : int  1 2 4 2 4 1 1 2 2 2 ...
##  $ Rum3          : int  3 3 2 2 2 2 0 1 2 2 ...
##  $ Rum4          : int  1 2 4 0 2 4 1 0 2 2 ...
##  $ Rum5          : int  3 1 2 2 1 3 1 0 0 2 ...
##  $ Rum6          : int  2 3 4 3 2 2 1 1 3 2 ...
##  $ Rum7          : int  3 1 4 3 0 3 3 4 3 2 ...
##  $ Rum8          : int  1 2 4 1 0 1 1 1 3 3 ...
##  $ Rum9          : int  3 0 2 0 1 0 3 2 2 0 ...
##  $ Rum10         : int  1 1 2 2 2 2 2 0 1 1 ...
##  $ Rum11         : int  1 0 0 3 1 2 3 0 4 3 ...
##  $ Rum12         : int  0 2 2 1 0 1 0 2 2 0 ...
##  $ Rum13         : int  4 2 3 3 3 2 1 1 2 2 ...
##  $ Rum14         : int  0 1 3 1 2 2 2 2 4 2 ...
##  $ Rum15         : int  2 2 1 2 2 2 2 1 3 0 ...
##  $ Rum16         : int  2 4 4 0 1 2 0 1 2 4 ...
##  $ Rum17         : int  1 2 2 2 1 3 2 1 2 3 ...
##  $ Rum18         : int  2 2 4 1 2 2 2 1 1 1 ...
##  $ Rum19         : int  0 2 2 1 2 4 2 2 1 0 ...
##  $ Rum20         : int  1 1 2 2 1 1 1 2 4 2 ...
##  $ Rum21         : int  2 2 1 1 1 1 1 3 4 0 ...
##  $ Rum22         : int  2 1 2 2 1 2 0 1 1 1 ...
##  $ Sav1          : int  5 6 7 4 5 6 6 7 6 7 ...
##  $ Sav2          : int  3 2 6 2 2 6 3 6 3 2 ...
##  $ Sav3          : int  7 7 7 6 7 5 6 6 7 7 ...
##  $ Sav4          : int  4 5 5 4 3 5 1 5 2 5 ...
##  $ Sav5          : int  7 6 7 7 5 6 6 6 4 6 ...
##  $ Sav6          : int  4 0 6 4 2 2 3 4 6 4 ...
##  $ Sav7          : int  3 6 5 6 7 7 7 7 7 6 ...
##  $ Sav8          : int  2 2 3 3 2 4 3 3 3 5 ...
##  $ Sav9          : int  6 4 6 6 6 6 6 7 6 5 ...
##  $ Sav10         : int  2 4 1 2 1 3 2 5 1 1 ...
##  $ Sav11         : int  3 3 6 2 6 6 4 1 3 4 ...
##  $ Sav12         : int  0 3 3 3 4 4 3 4 5 3 ...
##  $ Sav13         : int  3 7 7 4 4 3 5 5 7 4 ...
##  $ Sav14         : int  2 2 5 0 3 2 2 2 3 2 ...
##  $ Sav15         : int  5 6 5 5 4 7 4 6 7 7 ...
##  $ Sav16         : int  3 2 2 6 2 3 1 3 2 2 ...
##  $ Sav17         : int  6 3 6 6 5 4 6 6 6 5 ...
##  $ Sav18         : int  2 2 2 3 2 6 3 2 1 2 ...
##  $ Sav19         : int  6 7 6 6 6 7 2 4 6 4 ...
##  $ Sav20         : int  3 2 3 4 6 6 6 3 3 7 ...
##  $ Sav21         : int  6 3 3 6 4 7 6 6 7 4 ...
##  $ Sav22         : int  1 4 2 2 2 2 2 2 3 3 ...
##  $ Sav23         : int  7 7 6 4 6 7 6 4 6 5 ...
##  $ Sav24         : int  2 1 5 1 1 1 3 1 2 2 ...
##  $ LS1           : int  3 5 6 4 7 4 4 6 6 7 ...
##  $ LS2           : int  6 2 6 5 7 5 6 5 6 4 ...
##  $ LS3           : int  7 4 6 4 3 6 5 2 6 6 ...
##  $ LS4           : int  2 6 6 3 6 6 5 6 7 5 ...
##  $ LS5           : int  3 6 7 5 4 4 4 1 4 2 ...
##  $ Extraverted   : int  5 4 6 6 3 3 7 4 4 6 ...
##  $ Critical      : int  5 5 5 3 4 5 1 6 6 5 ...
##  $ Dependable    : int  6 6 6 5 6 6 6 6 7 7 ...
##  $ Anxious       : int  6 6 6 4 6 6 5 6 5 4 ...
##  $ NewExperiences: int  7 6 6 6 6 6 7 6 3 6 ...
##  $ Reserved      : int  3 4 6 5 5 5 3 4 7 2 ...
##  $ Sympathetic   : int  7 6 7 6 5 6 3 7 6 6 ...
##  $ Disorganized  : int  6 5 6 5 5 5 3 4 7 5 ...
##  $ Calm          : int  6 7 6 5 6 7 6 3 5 6 ...
##  $ Conventional  : int  3 4 2 3 2 2 2 2 3 3 ...
##  $ Health1       : int  1 1 4 2 1 3 2 3 3 0 ...
##  $ Health2       : int  2 1 1 0 2 1 0 2 2 1 ...
##  $ Health3       : int  2 1 2 0 1 2 3 2 2 1 ...
##  $ Health4       : int  0 2 0 0 1 0 0 1 0 0 ...
##  $ Health5       : int  1 1 1 0 1 1 1 0 3 0 ...
##  $ Health6       : int  0 0 2 0 0 0 1 0 1 0 ...
##  $ Health7       : int  0 0 3 2 0 1 1 2 1 0 ...
##  $ Health8       : int  2 3 4 2 1 1 1 0 2 2 ...
##  $ Health9       : int  2 3 3 1 2 4 4 2 2 3 ...
##  $ Health10      : int  0 0 1 1 0 1 1 1 0 0 ...
##  $ Health11      : int  0 1 2 1 2 0 0 0 2 0 ...
##  $ Health12      : int  0 2 1 2 0 1 1 0 0 0 ...
##  $ Health13      : int  0 2 2 0 2 3 1 1 0 1 ...
##  $ Health14      : int  2 2 3 0 0 2 1 2 2 0 ...
##  $ Health15      : int  2 1 1 0 1 0 0 0 1 0 ...
##  $ Health16      : int  1 1 3 0 2 3 2 1 0 0 ...
##  $ Health17      : int  0 0 0 2 2 3 0 2 1 0 ...
##  $ Health18      : int  1 4 1 1 0 0 0 1 2 0 ...
##  $ Health19      : int  0 2 2 0 0 1 0 0 2 1 ...
##  $ Health20      : int  2 1 2 0 1 1 0 0 0 0 ...
##  $ Health21      : int  1 0 1 1 0 2 0 1 1 1 ...
##  $ Health22      : int  3 1 2 0 2 4 2 2 0 2 ...
##  $ Health23      : int  1 0 3 2 2 0 2 3 2 2 ...
##  $ Health24      : int  0 0 1 1 1 0 0 2 1 0 ...
##  $ Health25      : int  0 3 1 2 2 0 2 0 1 0 ...
##  $ Health26      : int  1 0 0 0 0 0 2 1 1 2 ...
##  $ Health27      : int  2 1 0 1 1 0 0 1 0 1 ...
##  $ Health28      : int  0 3 2 0 1 3 0 2 3 2 ...
##  $ Health29      : int  1 2 1 0 1 1 2 1 1 2 ...
##  $ Health30      : int  0 0 0 2 0 0 0 0 0 0 ...
##  $ Health31      : int  1 0 0 0 0 2 1 0 0 0 ...
##  $ Health32      : int  2 1 2 1 2 2 2 1 2 0 ...
##  $ Dep1          : int  0 0 2 0 1 1 1 1 0 0 ...
##  $ Dep2          : int  0 1 0 0 0 0 0 0 1 2 ...
##  $ Dep3          : int  0 1 0 0 0 0 1 0 2 2 ...
##  $ Dep4          : int  1 0 1 1 0 0 0 1 1 0 ...
##   [list output truncated]

Facebook$ID<-as.character(Facebook$ID)
Facebook$gender<-factor(Facebook$gender, labels=c("Male","Female"))
Rumination<-Facebook[,3:24]
Savoring<-Facebook[,25:48]
SatwithLife<-Facebook[,49:53]
CHIPS<-Facebook[,64:95]
CESD<-Facebook[,96:111]
Facebook$RRS<-rowSums(Facebook[,3:24])
Facebook$RRS_D<-rowSums(Facebook[,c(3,4,5,6,8,10,11,16,19,20,21,24)])
Facebook$RRS_R<-rowSums(Facebook[,c(9,13,14,22,23)])
Facebook$RRS_B<-rowSums(Facebook[,c(7,12,15,17,18)])
# Reverse-code x on a rating scale that runs from min to max
reverse<-function(max,min,x) {
  y<-(max+min)-x
  return(y)
}
Facebook$Sav2R<-reverse(7,1,Facebook$Sav2)
Facebook$Sav4R<-reverse(7,1,Facebook$Sav4)
Facebook$Sav6R<-reverse(7,1,Facebook$Sav6)
Facebook$Sav8R<-reverse(7,1,Facebook$Sav8)
Facebook$Sav10R<-reverse(7,1,Facebook$Sav10)
Facebook$Sav12R<-reverse(7,1,Facebook$Sav12)
Facebook$Sav14R<-reverse(7,1,Facebook$Sav14)
Facebook$Sav16R<-reverse(7,1,Facebook$Sav16)
Facebook$Sav18R<-reverse(7,1,Facebook$Sav18)
Facebook$Sav20R<-reverse(7,1,Facebook$Sav20)
Facebook$Sav22R<-reverse(7,1,Facebook$Sav22)
Facebook$Sav24R<-reverse(7,1,Facebook$Sav24)
Facebook$SBI<-Facebook$Sav2R+Facebook$Sav4R+Facebook$Sav6R+
  Facebook$Sav8R+Facebook$Sav10R+Facebook$Sav12R+Facebook$Sav14R+
  Facebook$Sav16R+Facebook$Sav18R+Facebook$Sav20R+Facebook$Sav22R+
  Facebook$Sav24R+Facebook$Sav1+Facebook$Sav3+Facebook$Sav5+
  Facebook$Sav7+Facebook$Sav9+Facebook$Sav11+Facebook$Sav13+Facebook$Sav15+
  Facebook$Sav17+Facebook$Sav19+Facebook$Sav21+Facebook$Sav23
Facebook$SavPos<-Facebook$Sav2R+Facebook$Sav4R+Facebook$Sav6R+
  Facebook$Sav8R+Facebook$Sav10R+Facebook$Sav12R+Facebook$Sav14R+
  Facebook$Sav16R+Facebook$Sav18R+Facebook$Sav20R+Facebook$Sav22R+
  Facebook$Sav24R
Facebook$SavNeg<-Facebook$Sav1+Facebook$Sav3+Facebook$Sav5+
  Facebook$Sav7+Facebook$Sav9+Facebook$Sav11+Facebook$Sav13+Facebook$Sav15+
  Facebook$Sav17+Facebook$Sav19+Facebook$Sav21+Facebook$Sav23
Facebook$Anticipating<-Facebook$Sav1+Facebook$Sav4R+Facebook$Sav7+
  Facebook$Sav10R+Facebook$Sav13+Facebook$Sav16R+Facebook$Sav19+Facebook$Sav22R
Facebook$Moment<-Facebook$Sav2R+Facebook$Sav5+Facebook$Sav8R+
  Facebook$Sav11+Facebook$Sav14R+Facebook$Sav17+Facebook$Sav20R+Facebook$Sav23
Facebook$Reminiscing<-Facebook$Sav3+Facebook$Sav6R+Facebook$Sav9+
  Facebook$Sav12R+Facebook$Sav15+Facebook$Sav18R+Facebook$Sav21+Facebook$Sav24R
Facebook$SWLS<-rowSums(SatwithLife)
Facebook$CritR<-reverse(7,1,Facebook$Critical)
Facebook$AnxR<-reverse(7,1,Facebook$Anxious)
Facebook$ResR<-reverse(7,1,Facebook$Reserved)
Facebook$DisR<-reverse(7,1,Facebook$Disorganized)
Facebook$ConvR<-reverse(7,1,Facebook$Conventional)
Facebook$Extraversion<-(Facebook$Extraverted+Facebook$ResR)/2
Facebook$Agree<-(Facebook$CritR+Facebook$Sympathetic)/2
Facebook$Consc<-(Facebook$Dependable+Facebook$DisR)/2
Facebook$EmoSt<-(Facebook$AnxR+Facebook$Calm)/2
Facebook$Openness<-(Facebook$NewExperiences+Facebook$ConvR)/2
Facebook$Health<-rowSums(CHIPS)
Facebook$Dep4R<-reverse(3,0,Facebook$Dep4)
Facebook$Dep8R<-reverse(3,0,Facebook$Dep8)
Facebook$Dep12R<-reverse(3,0,Facebook$Dep12)
Facebook$CESD<-Facebook$Dep1+Facebook$Dep2+Facebook$Dep3+Facebook$Dep4R+Facebook$Dep5+
  Facebook$Dep6+Facebook$Dep7+Facebook$Dep8R+Facebook$Dep9+Facebook$Dep10+Facebook$Dep11+
  Facebook$Dep12R+Facebook$Dep13+Facebook$Dep14+Facebook$Dep15+Facebook$Dep16
library(dataMaid)

## Loading required package: ggplot2
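
The scoring code above spells out every item by name. As a minimal sketch of the same reverse-then-sum pattern, here is a more compact version on a made-up four-item scale. The data frame toy and its item names are hypothetical and are not columns from the Facebook file.

```r
# Same helper as above: reverse-code x on a scale running from min to max.
reverse <- function(max, min, x) {
  (max + min) - x
}

# Hypothetical 4-item scale scored 1-7; items 2 and 4 are reverse-worded.
toy <- data.frame(
  item1 = c(7, 5, 1),
  item2 = c(1, 3, 7),
  item3 = c(6, 4, 2),
  item4 = c(2, 4, 6)
)

rev_items <- c("item2", "item4")

# Reverse all flagged items in one step instead of one line per item.
# The new columns get an "R" suffix, matching the naming used above.
toy[paste0(rev_items, "R")] <- lapply(toy[rev_items], reverse, max = 7, min = 1)

# Total score: forward items plus the reversed versions.
toy$total <- rowSums(toy[c("item1", "item2R", "item3", "item4R")])
toy$total
```

Listing the reverse-worded items once in rev_items means there is a single place to check if a total looks off.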

To generate an overview report, we’ll use the function, makeDataReport, which gives you a great overview of your dataset, with summary statistics, some analysis to assist with data cleaning, and customization options. And you can very easily add codebook information to your data report once it’s in R Markdown format.

The makeDataReport function has many arguments, which you can read all about here. Today, we’ll focus on some of the key arguments in the function. The very first argument is the dataset you want to perform the function on, in this case Facebook. You could run the function with only this argument if you wanted, and this might be fine if you have a small dataset and want to take a quick look.

Next, define what kind of file you want the function to output: “pdf”, “word”, or “html”. The default, NULL, picks one kind of output based on three checks: 1) Is there a LaTeX installation available? If yes, PDF. If no: 2) Is the computer running Windows? If yes, Word. If no: 3) HTML. After that is render, which controls whether R renders the output file type you selected and saves that file. If you plan on making changes to the R Markdown file after it is created, set this to FALSE. The .Rmd file will open automatically for you to edit, and will be saved in your working directory.
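
The three checks behind the default output can be written out as a small decision function. This is only an illustration of the fallback order described above, not dataMaid's own code; choose_output and its two arguments are hypothetical names.

```r
# Hypothetical helper mirroring the fallback order described above:
# LaTeX available -> "pdf"; otherwise Windows -> "word"; otherwise "html".
choose_output <- function(has_latex, is_windows) {
  if (has_latex) {
    "pdf"
  } else if (is_windows) {
    "word"
  } else {
    "html"
  }
}

choose_output(has_latex = FALSE, is_windows = TRUE)   # "word"
choose_output(has_latex = FALSE, is_windows = FALSE)  # "html"
```

If you don't want the result to depend on what happens to be installed, set output explicitly.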

You can also specify a file name with file, and/or a volume number (for updated reports) with vol=#. If you’ve already generated a report and try to create a new one without specifying a new file name, you’ll get an error; you can override that and force dataMaid to save over the existing file with replace=TRUE.

Putting some of these arguments together will generate a report that includes all variables in the dataset, with default summaries and visualizations.

makeDataReport(Facebook, output="html", render=FALSE, file="report_Facebook.Rmd")

You can take a look at the example report I generated here. In fact, by running this report, I noticed some problems with the simulated dataset. Instead of using R to generate it for me, I used Winsteps (the Rasch program I use). I didn’t notice until now that some of the items have values out of range for the rating scale used, such as Rumination item scores greater than 3. Thanks to this report, I identified (and fixed) these problems, then updated the simulated dataset available here. (This is the same link used in previous posts; regardless of where you’re clicking from, this will take you to the updated file.) I also noticed that I miscoded gender when creating that factor, and switched Male and Female. So I can go back up to that code and fix that as well.
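
Both problems described above are also easy to check for directly. Here is a small sketch on made-up values: the vectors are invented, the 0-3 range is taken from the text, and the 0 = Female, 1 = Male coding is an assumption inferred from the correction described above.

```r
# Invented item scores on a 0-3 rating scale; the 4 is out of range.
rum <- data.frame(Rum1 = c(3, 2, 0, 1), Rum2 = c(1, 4, 2, 2))

# Count values outside the allowed range for each item.
out_of_range <- sapply(rum, function(x) sum(x < 0 | x > 3, na.rm = TRUE))
out_of_range  # Rum1: 0, Rum2: 1

# Invented 0/1 codes. Naming the levels explicitly puts each code next to
# its label, so the mapping is visible rather than implied by sort order.
# ASSUMPTION: 0 = Female, 1 = Male.
gender_code <- c(1, 1, 0, 1, 0)
gender <- factor(gender_code, levels = c(0, 1), labels = c("Female", "Male"))

# Cross-tabulate codes against labels to confirm the mapping.
table(gender_code, gender)
```

Cross-tabulating the raw codes against the new factor is a quick way to catch a swapped label before it reaches a report.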

But a report with every single variable might be more than I actually want. I may want to specify only certain variables, such as those that appear problematic based on the built-in cleaning analysis dataMaid does. I can set that with onlyProblematic=TRUE:

makeDataReport(Facebook, output="html", render=FALSE, file="report_Facebook_sum.Rmd",
               onlyProblematic = TRUE)

You can check out that report here.
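
If you already know which variables you want, makeDataReport also has a useVar argument that takes a character vector of variable names. The sketch below is only an illustration: the data frame toy, the variable names, and the file name are made up, the file goes to a temporary folder, and the report is only generated if dataMaid is installed.

```r
# Made-up data frame, only to keep the example small.
toy <- data.frame(
  id     = as.character(1:6),
  gender = factor(c("Female", "Male", "Male", "Female", "Male", "Female")),
  score  = c(2, 4, 4, 3, 5, 1),
  stringsAsFactors = FALSE
)

# Variables to include in the report, given as a character vector.
keep_vars <- c("gender", "score")

if (requireNamespace("dataMaid", quietly = TRUE)) {
  library(dataMaid)
  # Write the .Rmd to a temporary folder; don't render it or open it.
  makeDataReport(toy, output = "html", render = FALSE,
                 file = file.path(tempdir(), "report_toy_subset.Rmd"),
                 useVar = keep_vars, replace = TRUE, openResult = FALSE)
}
```

Here replace=TRUE lets you rerun the example in the same session without hitting the existing-file error mentioned earlier.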

I always used “render=FALSE” so I could make changes to the report before knitting it to an HTML document. Since this opens an R Markdown file, I could add whatever information I wanted to the report, just by clicking to that point in the document and typing it in. For instance, I could add a note on the reverse function I used to reverse-code variables. I could include code for how I defined the gender factor, just to make certain I don’t do it wrong again! You can add code that is only displayed (rather than run) by putting a backtick (`) on either side of it. I generated one last data report, using the cleaned data, with the gender factor correctly coded, and added some codebook details. You can take a look at that document here.
