[This article was first published on Deeply Trivial, and kindly contributed to R-bloggers.]
dataMaid to the rescue! In addition to having some great functions to help streamline data cleaning, dataMaid can create an overview report of your dataset, containing the information you request, and generate an R Markdown file to which you could add descriptive information, like functions used to calculate variables, item text, and so on.
For today’s post, I’ll use the simulated Facebook dataset I created and shared. You can replicate the exact results I get if you use that dataset. After importing the file, I want to make certain all variables are of the correct type. If necessary, I can make some changes. This becomes important when we go to generate our report. I also want to score all my scales, so I have total and subscale scores in the file.
Facebook<-read.delim(file="simulated_facebook_set.txt", header=TRUE)
str(Facebook)
## 'data.frame': 257 obs. of 111 variables:
## $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ gender : int 1 1 1 0 1 0 1 0 1 1 ...
## $ Rum1 : int 3 2 3 2 2 2 0 4 1 0 ...
## $ Rum2 : int 1 2 4 2 4 1 1 2 2 2 ...
## $ Rum3 : int 3 3 2 2 2 2 0 1 2 2 ...
## $ Rum4 : int 1 2 4 0 2 4 1 0 2 2 ...
## $ Rum5 : int 3 1 2 2 1 3 1 0 0 2 ...
## $ Rum6 : int 2 3 4 3 2 2 1 1 3 2 ...
## $ Rum7 : int 3 1 4 3 0 3 3 4 3 2 ...
## $ Rum8 : int 1 2 4 1 0 1 1 1 3 3 ...
## $ Rum9 : int 3 0 2 0 1 0 3 2 2 0 ...
## $ Rum10 : int 1 1 2 2 2 2 2 0 1 1 ...
## $ Rum11 : int 1 0 0 3 1 2 3 0 4 3 ...
## $ Rum12 : int 0 2 2 1 0 1 0 2 2 0 ...
## $ Rum13 : int 4 2 3 3 3 2 1 1 2 2 ...
## $ Rum14 : int 0 1 3 1 2 2 2 2 4 2 ...
## $ Rum15 : int 2 2 1 2 2 2 2 1 3 0 ...
## $ Rum16 : int 2 4 4 0 1 2 0 1 2 4 ...
## $ Rum17 : int 1 2 2 2 1 3 2 1 2 3 ...
## $ Rum18 : int 2 2 4 1 2 2 2 1 1 1 ...
## $ Rum19 : int 0 2 2 1 2 4 2 2 1 0 ...
## $ Rum20 : int 1 1 2 2 1 1 1 2 4 2 ...
## $ Rum21 : int 2 2 1 1 1 1 1 3 4 0 ...
## $ Rum22 : int 2 1 2 2 1 2 0 1 1 1 ...
## $ Sav1 : int 5 6 7 4 5 6 6 7 6 7 ...
## $ Sav2 : int 3 2 6 2 2 6 3 6 3 2 ...
## $ Sav3 : int 7 7 7 6 7 5 6 6 7 7 ...
## $ Sav4 : int 4 5 5 4 3 5 1 5 2 5 ...
## $ Sav5 : int 7 6 7 7 5 6 6 6 4 6 ...
## $ Sav6 : int 4 0 6 4 2 2 3 4 6 4 ...
## $ Sav7 : int 3 6 5 6 7 7 7 7 7 6 ...
## $ Sav8 : int 2 2 3 3 2 4 3 3 3 5 ...
## $ Sav9 : int 6 4 6 6 6 6 6 7 6 5 ...
## $ Sav10 : int 2 4 1 2 1 3 2 5 1 1 ...
## $ Sav11 : int 3 3 6 2 6 6 4 1 3 4 ...
## $ Sav12 : int 0 3 3 3 4 4 3 4 5 3 ...
## $ Sav13 : int 3 7 7 4 4 3 5 5 7 4 ...
## $ Sav14 : int 2 2 5 0 3 2 2 2 3 2 ...
## $ Sav15 : int 5 6 5 5 4 7 4 6 7 7 ...
## $ Sav16 : int 3 2 2 6 2 3 1 3 2 2 ...
## $ Sav17 : int 6 3 6 6 5 4 6 6 6 5 ...
## $ Sav18 : int 2 2 2 3 2 6 3 2 1 2 ...
## $ Sav19 : int 6 7 6 6 6 7 2 4 6 4 ...
## $ Sav20 : int 3 2 3 4 6 6 6 3 3 7 ...
## $ Sav21 : int 6 3 3 6 4 7 6 6 7 4 ...
## $ Sav22 : int 1 4 2 2 2 2 2 2 3 3 ...
## $ Sav23 : int 7 7 6 4 6 7 6 4 6 5 ...
## $ Sav24 : int 2 1 5 1 1 1 3 1 2 2 ...
## $ LS1 : int 3 5 6 4 7 4 4 6 6 7 ...
## $ LS2 : int 6 2 6 5 7 5 6 5 6 4 ...
## $ LS3 : int 7 4 6 4 3 6 5 2 6 6 ...
## $ LS4 : int 2 6 6 3 6 6 5 6 7 5 ...
## $ LS5 : int 3 6 7 5 4 4 4 1 4 2 ...
## $ Extraverted : int 5 4 6 6 3 3 7 4 4 6 ...
## $ Critical : int 5 5 5 3 4 5 1 6 6 5 ...
## $ Dependable : int 6 6 6 5 6 6 6 6 7 7 ...
## $ Anxious : int 6 6 6 4 6 6 5 6 5 4 ...
## $ NewExperiences: int 7 6 6 6 6 6 7 6 3 6 ...
## $ Reserved : int 3 4 6 5 5 5 3 4 7 2 ...
## $ Sympathetic : int 7 6 7 6 5 6 3 7 6 6 ...
## $ Disorganized : int 6 5 6 5 5 5 3 4 7 5 ...
## $ Calm : int 6 7 6 5 6 7 6 3 5 6 ...
## $ Conventional : int 3 4 2 3 2 2 2 2 3 3 ...
## $ Health1 : int 1 1 4 2 1 3 2 3 3 0 ...
## $ Health2 : int 2 1 1 0 2 1 0 2 2 1 ...
## $ Health3 : int 2 1 2 0 1 2 3 2 2 1 ...
## $ Health4 : int 0 2 0 0 1 0 0 1 0 0 ...
## $ Health5 : int 1 1 1 0 1 1 1 0 3 0 ...
## $ Health6 : int 0 0 2 0 0 0 1 0 1 0 ...
## $ Health7 : int 0 0 3 2 0 1 1 2 1 0 ...
## $ Health8 : int 2 3 4 2 1 1 1 0 2 2 ...
## $ Health9 : int 2 3 3 1 2 4 4 2 2 3 ...
## $ Health10 : int 0 0 1 1 0 1 1 1 0 0 ...
## $ Health11 : int 0 1 2 1 2 0 0 0 2 0 ...
## $ Health12 : int 0 2 1 2 0 1 1 0 0 0 ...
## $ Health13 : int 0 2 2 0 2 3 1 1 0 1 ...
## $ Health14 : int 2 2 3 0 0 2 1 2 2 0 ...
## $ Health15 : int 2 1 1 0 1 0 0 0 1 0 ...
## $ Health16 : int 1 1 3 0 2 3 2 1 0 0 ...
## $ Health17 : int 0 0 0 2 2 3 0 2 1 0 ...
## $ Health18 : int 1 4 1 1 0 0 0 1 2 0 ...
## $ Health19 : int 0 2 2 0 0 1 0 0 2 1 ...
## $ Health20 : int 2 1 2 0 1 1 0 0 0 0 ...
## $ Health21 : int 1 0 1 1 0 2 0 1 1 1 ...
## $ Health22 : int 3 1 2 0 2 4 2 2 0 2 ...
## $ Health23 : int 1 0 3 2 2 0 2 3 2 2 ...
## $ Health24 : int 0 0 1 1 1 0 0 2 1 0 ...
## $ Health25 : int 0 3 1 2 2 0 2 0 1 0 ...
## $ Health26 : int 1 0 0 0 0 0 2 1 1 2 ...
## $ Health27 : int 2 1 0 1 1 0 0 1 0 1 ...
## $ Health28 : int 0 3 2 0 1 3 0 2 3 2 ...
## $ Health29 : int 1 2 1 0 1 1 2 1 1 2 ...
## $ Health30 : int 0 0 0 2 0 0 0 0 0 0 ...
## $ Health31 : int 1 0 0 0 0 2 1 0 0 0 ...
## $ Health32 : int 2 1 2 1 2 2 2 1 2 0 ...
## $ Dep1 : int 0 0 2 0 1 1 1 1 0 0 ...
## $ Dep2 : int 0 1 0 0 0 0 0 0 1 2 ...
## $ Dep3 : int 0 1 0 0 0 0 1 0 2 2 ...
## $ Dep4 : int 1 0 1 1 0 0 0 1 1 0 ...
## [list output truncated]

Facebook$ID<-as.character(Facebook$ID)
Facebook$gender<-factor(Facebook$gender, labels=c("Male","Female"))

Rumination<-Facebook[,3:24]
Savoring<-Facebook[,25:48]
SatwithLife<-Facebook[,49:53]
CHIPS<-Facebook[,64:95]
CESD<-Facebook[,96:111]

Facebook$RRS<-rowSums(Facebook[,3:24])
Facebook$RRS_D<-rowSums(Facebook[,c(3,4,5,6,8,10,11,16,19,20,21,24)])
Facebook$RRS_R<-rowSums(Facebook[,c(9,13,14,22,23)])
Facebook$RRS_B<-rowSums(Facebook[,c(7,12,15,17,18)])

reverse<-function(max,min,x) {
  y<-(max+min)-x
  return(y)
}

Facebook$Sav2R<-reverse(7,1,Facebook$Sav2)
Facebook$Sav4R<-reverse(7,1,Facebook$Sav4)
Facebook$Sav6R<-reverse(7,1,Facebook$Sav6)
Facebook$Sav8R<-reverse(7,1,Facebook$Sav8)
Facebook$Sav10R<-reverse(7,1,Facebook$Sav10)
Facebook$Sav12R<-reverse(7,1,Facebook$Sav12)
Facebook$Sav14R<-reverse(7,1,Facebook$Sav14)
Facebook$Sav16R<-reverse(7,1,Facebook$Sav16)
Facebook$Sav18R<-reverse(7,1,Facebook$Sav18)
Facebook$Sav20R<-reverse(7,1,Facebook$Sav20)
Facebook$Sav22R<-reverse(7,1,Facebook$Sav22)
Facebook$Sav24R<-reverse(7,1,Facebook$Sav24)

Facebook$SBI<-Facebook$Sav2R+Facebook$Sav4R+Facebook$Sav6R+
  Facebook$Sav8R+Facebook$Sav10R+Facebook$Sav12R+Facebook$Sav14R+
  Facebook$Sav16R+Facebook$Sav18R+Facebook$Sav20R+Facebook$Sav22R+
  Facebook$Sav24R+Facebook$Sav1+Facebook$Sav3+Facebook$Sav5+
  Facebook$Sav7+Facebook$Sav9+Facebook$Sav11+Facebook$Sav13+Facebook$Sav15+
  Facebook$Sav17+Facebook$Sav19+Facebook$Sav21+Facebook$Sav23
Facebook$SavPos<-Facebook$Sav2R+Facebook$Sav4R+Facebook$Sav6R+
  Facebook$Sav8R+Facebook$Sav10R+Facebook$Sav12R+Facebook$Sav14R+
  Facebook$Sav16R+Facebook$Sav18R+Facebook$Sav20R+Facebook$Sav22R+
  Facebook$Sav24R
Facebook$SavNeg<-Facebook$Sav1+Facebook$Sav3+Facebook$Sav5+
  Facebook$Sav7+Facebook$Sav9+Facebook$Sav11+Facebook$Sav13+Facebook$Sav15+
  Facebook$Sav17+Facebook$Sav19+Facebook$Sav21+Facebook$Sav23
Facebook$Anticipating<-Facebook$Sav1+Facebook$Sav4R+Facebook$Sav7+
  Facebook$Sav10R+Facebook$Sav13+Facebook$Sav16R+Facebook$Sav19+Facebook$Sav22R
Facebook$Moment<-Facebook$Sav2R+Facebook$Sav5+Facebook$Sav8R+
  Facebook$Sav11+Facebook$Sav14R+Facebook$Sav17+Facebook$Sav20R+Facebook$Sav23
Facebook$Reminiscing<-Facebook$Sav3+Facebook$Sav6R+Facebook$Sav9+
  Facebook$Sav12R+Facebook$Sav15+Facebook$Sav18R+Facebook$Sav21+Facebook$Sav24R

Facebook$SWLS<-rowSums(SatwithLife)

Facebook$CritR<-reverse(7,1,Facebook$Critical)
Facebook$AnxR<-reverse(7,1,Facebook$Anxious)
Facebook$ResR<-reverse(7,1,Facebook$Reserved)
Facebook$DisR<-reverse(7,1,Facebook$Disorganized)
Facebook$ConvR<-reverse(7,1,Facebook$Conventional)
Facebook$Extraversion<-(Facebook$Extraverted+Facebook$ResR)/2
Facebook$Agree<-(Facebook$CritR+Facebook$Sympathetic)/2
Facebook$Consc<-(Facebook$Dependable+Facebook$DisR)/2
Facebook$EmoSt<-(Facebook$AnxR+Facebook$Calm)/2
Facebook$Openness<-(Facebook$NewExperiences+Facebook$ConvR)/2

Facebook$Health<-rowSums(CHIPS)

Facebook$Dep4R<-reverse(3,0,Facebook$Dep4)
Facebook$Dep8R<-reverse(3,0,Facebook$Dep8)
Facebook$Dep12R<-reverse(3,0,Facebook$Dep12)
Facebook$CESD<-Facebook$Dep1+Facebook$Dep2+Facebook$Dep3+Facebook$Dep4R+Facebook$Dep5+
  Facebook$Dep6+Facebook$Dep7+Facebook$Dep8R+Facebook$Dep9+Facebook$Dep10+Facebook$Dep11+
  Facebook$Dep12R+Facebook$Dep13+Facebook$Dep14+Facebook$Dep15+Facebook$Dep16

library(dataMaid)
## Loading required package: ggplot2
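If you want to convince yourself that the reverse-scoring arithmetic above does what you expect, it can help to try it on a tiny example first. The sketch below uses a made-up data frame (toy) with hypothetical item names and an arbitrary choice of which items are reverse-keyed; none of it comes from the Facebook dataset. It also shows one possible alternative to the long chains of +, using rowSums on the scored items.

```r
# Same arithmetic as the reverse() helper in the scoring code above.
reverse <- function(max, min, x) {
  (max + min) - x
}

# Made-up toy data: three respondents, four items on a 1-7 scale.
toy <- data.frame(
  Item1 = c(7, 4, 1),
  Item2 = c(2, 4, 6),
  Item3 = c(6, 5, 2),
  Item4 = c(1, 3, 7)
)

# Hypothetical choice for illustration: items 2 and 4 are reverse-keyed.
reversed_items <- c("Item2", "Item4")

# Reverse-score those columns in a copy of the data.
scored <- toy
scored[reversed_items] <- lapply(toy[reversed_items], reverse, max = 7, min = 1)

# Total score with a single rowSums() call instead of a chain of `+`.
scored$Total <- rowSums(scored[, c("Item1", "Item2", "Item3", "Item4")])

print(scored)
```

On a 1-7 scale, a 2 becomes a 6 and a 7 becomes a 1, so the first respondent's total is 7 + 6 + 6 + 7 = 26.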
To generate an overview report, we’ll use the makeDataReport function, which summarizes your dataset with summary statistics, some analysis to assist with data cleaning, and customization options. And you can very easily add codebook information to your data report once it’s in R Markdown format.
The makeDataReport function has many arguments, which you can read all about in the function’s help page (?makeDataReport). Today, we’ll focus on some of the key arguments in the function. The very first argument is the dataset you want to perform the function on, in this case Facebook. You could run the function with only this argument if you wanted, and this might be fine if you have a small dataset and want to take a quick look.
Next, use output to define what kind of file you want the function to produce: “pdf”, “word”, or “html”. The default, NULL, picks one kind of output based on three checks: 1) Is there a LaTeX installation available? If yes, PDF. 2) If not, is the computer running Windows? If yes, Word. 3) If the answer to both checks is no, HTML. After that is render, which controls whether R renders the output file you selected and saves it. If you plan on making changes to the R Markdown file after it is created, set this to FALSE. The .Rmd file will automatically open for you to edit, and will be saved in your working directory.
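To make the three checks behind the default output concrete, here is a small sketch of the decision order as described above. The helper choose_default_output is hypothetical and written only for illustration; it is not dataMaid’s internal code, and the two stand-in checks at the end (looking for pdflatex and checking the OS type) are my assumptions about how one might approximate them.

```r
# Hypothetical helper mirroring the three checks described in the text.
# NOT dataMaid's own implementation; the caller supplies both flags.
choose_default_output <- function(has_latex, is_windows) {
  if (has_latex) {
    "pdf"        # check 1: LaTeX available
  } else if (is_windows) {
    "word"       # check 2: no LaTeX, but running Windows
  } else {
    "html"       # check 3: neither of the above
  }
}

choose_default_output(has_latex = TRUE,  is_windows = FALSE)  # "pdf"
choose_default_output(has_latex = FALSE, is_windows = TRUE)   # "word"
choose_default_output(has_latex = FALSE, is_windows = FALSE)  # "html"

# Rough stand-ins for the two checks on the current machine (assumptions):
has_latex  <- unname(nzchar(Sys.which("pdflatex")))
is_windows <- identical(.Platform$OS.type, "windows")
guess <- choose_default_output(has_latex, is_windows)
```

If you would rather not depend on what happens to be installed, setting output explicitly sidesteps the checks altogether.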
You can also specify a file name with file=, and/or a volume number (for updated reports) with vol=#. If you’ve already generated a report and try to create a new one without specifying a new file name, you’ll get an error; you can override that and force dataMaid to save over the existing file with replace=TRUE.
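Here is a minimal sketch of the overwrite behavior, assuming dataMaid is installed. It uses a made-up toy data frame rather than the Facebook data, works in a temporary directory so nothing in your project is touched, and sets openResult=FALSE (an argument I am assuming from the package documentation) so the .Rmd file is not opened automatically.

```r
# Sketch only: the calls run when dataMaid is installed; otherwise nothing happens.
if (requireNamespace("dataMaid", quietly = TRUE)) {
  library(dataMaid)

  # Made-up toy data standing in for the Facebook dataset.
  toy <- data.frame(ID = as.character(1:6),
                    score = c(3, 5, 2, 4, 4, 1))

  # Work in a temporary directory and start from a clean slate.
  old_wd <- setwd(tempdir())
  unlink("report_toy.Rmd")

  # First report: writes report_toy.Rmd without rendering it.
  makeDataReport(toy, output = "html", render = FALSE,
                 file = "report_toy.Rmd", openResult = FALSE)

  # Same file name again: replace = TRUE allows the overwrite.
  makeDataReport(toy, output = "html", render = FALSE,
                 file = "report_toy.Rmd", replace = TRUE,
                 openResult = FALSE)

  setwd(old_wd)
}
```

Dropping replace=TRUE from the second call should reproduce the error mentioned above, since the file already exists.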
Putting some of these arguments together will generate a report that includes all variables in the dataset, with default summaries and visualizations.
makeDataReport(Facebook, output="html", render=FALSE, file="report_Facebook.Rmd")