T is for tibble
[This article was first published on Deeply Trivial, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
According to the tibble overview on the tidyverse website:
“Tibbles are data.frames that are lazy and surly: they do less (i.e. they don’t change variable names or types, and don’t do partial matching) and complain more (e.g. when a variable does not exist). This forces you to confront problems earlier, typically leading to cleaner, more expressive code.”

What does this mean? Well, remember when I noted that a character variable in my measures data frame had been changed to a factor? I manually changed it back to character. But had I simply created a tibble with that information, I wouldn’t have had to do anything, because tibbles never convert strings to factors. (Base R’s data.frame() only stopped doing this by default in version 4.0.0.)

Data frames will also do partial matching on variable names with $. If I requested Facebook$R and exactly one variable name started with R, I would silently get that variable back. Because many variables in this set start with R, the match is ambiguous and I would quietly get NULL instead. A tibble matches variable references literally, so Facebook$R returns NULL along with a warning that the column is unknown.
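To make the difference concrete, here is a minimal sketch on made-up data. The toy columns (Rum1, Rum2, Sav1) are hypothetical stand-ins for the Facebook set, and stringsAsFactors = TRUE is passed explicitly so the example behaves the same on old and new versions of R.

```r
library(tibble)

# Hypothetical toy data standing in for the Facebook set
df <- data.frame(Rum1 = 1:3, Rum2 = 4:6, Sav1 = 7:9)
tb <- as_tibble(df)

# Base data frame: a unique partial match silently returns the column
unique_match <- df$S        # only Sav1 starts with "S"

# Base data frame: an ambiguous partial match silently returns NULL
ambiguous_match <- df$R     # Rum1 and Rum2 both start with "R"

# Tibble: only exact names match; anything else gives NULL plus a warning
# (the warning is suppressed here to keep the output tidy)
tibble_match <- suppressWarnings(tb$S)

# Strings: tibble() keeps them as character; data.frame() converts them
# to factor only when stringsAsFactors = TRUE (the default before R 4.0.0)
df_names <- data.frame(name = c("a", "b"), stringsAsFactors = TRUE)
tb_names <- tibble(name = c("a", "b"))

print(unique_match)
print(is.null(ambiguous_match))
print(is.null(tibble_match))
print(class(df_names$name))
print(class(tb_names$name))
```

Running tb$S without suppressWarnings() is a quick way to see the complaint the tibble overview is talking about.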
There are a few ways to create a tibble: you can build or convert one with the tibble package, or read one in from a file with the readr package. Fortunately, you don’t need to worry about which package does what, because we’re just going to use the tidyverse package, which installs and loads those two and more.
install.packages("tidyverse")
## Installing package into '~/R/win-library/3.4'
## (as 'lib' is unspecified)
library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag():    dplyr, stats
First, let’s create a new tibble from scratch. The syntax is almost exactly the same as it was in the data frame post.
measures <- tibble(
  meas_id = c(1:6),
  name = c("Ruminative Response Scale", "Savoring Beliefs Inventory",
           "Satisfaction with Life Scale", "Ten-Item Personality Measure",
           "Cohen-Hoberman Inventory of Physical Symptoms",
           "Center for Epidemiologic Studies Depression Scale"),
  num_items = c(22, 24, 5, 10, 32, 16),
  rev_items = c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE)
)
measures
## # A tibble: 6 x 4
##   meas_id                                              name num_items
##     <int>                                             <chr>     <dbl>
## 1       1                         Ruminative Response Scale        22
## 2       2                        Savoring Beliefs Inventory        24
## 3       3                      Satisfaction with Life Scale         5
## 4       4                      Ten-Item Personality Measure        10
## 5       5     Cohen-Hoberman Inventory of Physical Symptoms        32
## 6       6 Center for Epidemiologic Studies Depression Scale        16
## # ... with 1 more variables: rev_items <lgl>
As you can see, the name variable is character, not factor. I didn’t have to do anything. Alternatively, you could convert an existing data frame, whether it’s one you created or one that comes with R or an R package.
car <- as_tibble(mtcars)
car
## # A tibble: 32 x 11
##      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
##  * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1  21.0     6 160.0   110  3.90 2.620 16.46     0     1     4     4
##  2  21.0     6 160.0   110  3.90 2.875 17.02     0     1     4     4
##  3  22.8     4 108.0    93  3.85 2.320 18.61     1     1     4     1
##  4  21.4     6 258.0   110  3.08 3.215 19.44     1     0     3     1
##  5  18.7     8 360.0   175  3.15 3.440 17.02     0     0     3     2
##  6  18.1     6 225.0   105  2.76 3.460 20.22     1     0     3     1
##  7  14.3     8 360.0   245  3.21 3.570 15.84     0     0     3     4
##  8  24.4     4 146.7    62  3.69 3.190 20.00     1     0     4     2
##  9  22.8     4 140.8    95  3.92 3.150 22.90     1     0     4     2
## 10  19.2     6 167.6   123  3.92 3.440 18.30     1     0     4     4
## # ... with 22 more rows
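One thing to watch when converting: mtcars stores the car models as row names, and a tibble prints without them (recent versions of tibble drop them on conversion). A small sketch of how to keep them, using the rownames argument of as_tibble; the column name "model" is an arbitrary choice made for this example.

```r
library(tibble)

# Plain conversion: the car models stored as row names are not shown
car <- as_tibble(mtcars)

# Keep the row names by moving them into a regular column;
# "model" is just a name picked for this illustration
car_named <- as_tibble(mtcars, rownames = "model")

print(dim(car))
print(names(car_named)[1])
print(car_named$model[1])
```

The tibble package also offers rownames_to_column() if you would rather do this as a separate step on an existing data frame.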
But chances are you’ll be reading in data from an external file. The readr package can handle delimited and fixed width files. For instance, to read in the Facebook dataset I’ve been using, I just need the function read_tsv.
Facebook <- read_tsv("small_facebook_set.txt", col_names = TRUE)
## Parsed with column specification:
## cols(
##   .default = col_integer()
## )
## See spec(...) for full column specifications.
Facebook
## # A tibble: 257 x 111
##       ID gender  Rum1  Rum2  Rum3  Rum4  Rum5  Rum6  Rum7  Rum8  Rum9
##    <int>  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
##  1     1      1     3     1     3     2     3     1     2     1     1
##  2     2      1     1     1     1     1     1     1     0     0     1
##  3     3      1     4     3     3     4     3     4     2     3     3
##  4     4      0     4     0     0     2     0     0     4     0     2
##  5     5      1     2     2     2     1     2     1     1     1     1
##  6     6      0     2     4     3     4     2     3     2     2     3
##  7     7      1     1     2     3     2     0     2     3     1     2
##  8     8      0     2     1     1     2     0     2     3     3     3
##  9     9      1     4     1     4     4     3     2     2     1     1
## 10    10      1     4     2     0     3     4     2     4     1     2
## # ... with 247 more rows, and 100 more variables: Rum10 <int>,
## #   Rum11 <int>, Rum12 <int>, Rum13 <int>, Rum14 <int>, Rum15 <int>,
## #   Rum16 <int>, Rum17 <int>, Rum18 <int>, Rum19 <int>, Rum20 <int>,
## #   Rum21 <int>, Rum22 <int>, Sav1 <int>, Sav2 <int>, Sav3 <int>,
## #   Sav4 <int>, Sav5 <int>, Sav6 <int>, Sav7 <int>, Sav8 <int>,
## #   Sav9 <int>, Sav10 <int>, Sav11 <int>, Sav12 <int>, Sav13 <int>,
## #   Sav14 <int>, Sav15 <int>, Sav16 <int>, Sav17 <int>, Sav18 <int>,
## #   Sav19 <int>, Sav20 <int>, Sav21 <int>, Sav22 <int>, Sav23 <int>,
## #   Sav24 <int>, LS1 <int>, LS2 <int>, LS3 <int>, LS4 <int>, LS5 <int>,
## #   Extraverted <int>, Critical <int>, Dependable <int>, Anxious <int>,
## #   NewExperiences <int>, Reserved <int>, Sympathetic <int>,
## #   Disorganized <int>, Calm <int>, Conventional <int>, Health1 <int>,
## #   Health2 <int>, Health3 <int>, Health4 <int>, Health5 <int>,
## #   Health6 <int>, Health7 <int>, Health8 <int>, Health9 <int>,
## #   Health10 <int>, Health11 <int>, Health12 <int>, Health13 <int>,
## #   Health14 <int>, Health15 <int>, Health16 <int>, Health17 <int>,
## #   Health18 <int>, Health19 <int>, Health20 <int>, Health21 <int>,
## #   Health22 <int>, Health23 <int>, Health24 <int>, Health25 <int>,
## #   Health26 <int>, Health27 <int>, Health28 <int>, Health29 <int>,
## #   Health30 <int>, Health31 <int>, Health32 <int>, Dep1 <int>,
## #   Dep2 <int>, Dep3 <int>, Dep4 <int>, Dep5 <int>, Dep6 <int>,
## #   Dep7 <int>, Dep8 <int>, Dep9 <int>, Dep10 <int>, Dep11 <int>,
## #   Dep12 <int>, Dep13 <int>, Dep14 <int>, Dep15 <int>, Dep16 <int>
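If you don’t have small_facebook_set.txt on hand, the same call can be tried on a throwaway file. This is only a sketch: the three rows below are invented, and the col_types argument is added here to spell out the all-integer column specification from the parsing message, since recent readr versions typically guess double for whole numbers.

```r
library(readr)

# Write a tiny tab-separated file to a temporary location (made-up values)
tsv_path <- tempfile(fileext = ".txt")
writeLines(c("ID\tgender\tRum1", "1\t1\t3", "2\t1\t1", "3\t0\t4"), tsv_path)

# Declare every column as integer instead of letting readr guess
small <- read_tsv(tsv_path, col_names = TRUE,
                  col_types = cols(.default = col_integer()))

print(small)
```

For other delimited formats, readr has sibling functions such as read_csv() and read_delim(), and read_fwf() for fixed-width files.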
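Finally, if you’re working with SAS, SPSS, or Stata files, you can read those in with another tidyverse package, haven, using the functions read_sas, read_sav, and read_dta, respectively. Note that haven is installed along with the tidyverse but is not attached by library(tidyverse), so load it separately with library(haven).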
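As a rough illustration of the haven workflow, the sketch below writes a small invented data set to a temporary SPSS file with write_sav() and reads it straight back with read_sav(). The data and file path are made up so that the example runs without an existing .sav file.

```r
library(haven)

# Made-up data written to a temporary SPSS file so the example is self-contained
survey <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))
sav_path <- tempfile(fileext = ".sav")
write_sav(survey, sav_path)

# read_sav() returns a tibble
survey_tbl <- read_sav(sav_path)

print(class(survey_tbl))
print(survey_tbl)
```

The Stata equivalent follows the same pattern with write_dta() and read_dta().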
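If for some reason you need a data frame rather than a tibble, you can convert a tibble to a data frame with as.data.frame(tibble_name). Wrapping that call in class(), as in class(as.data.frame(tibble_name)), doesn’t return the data frame; it only confirms that the result is a plain data.frame.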
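A quick check of the conversion, using a small tibble invented for the purpose:

```r
library(tibble)

# A throwaway tibble for illustration
tb <- tibble(x = 1:3, y = c("a", "b", "c"))

# Convert it back to a base data frame
df <- as.data.frame(tb)

print(class(tb))   # "tbl_df" "tbl" "data.frame"
print(class(df))   # "data.frame"
```

Because a tibble already carries the data.frame class, most functions that expect a data frame accept a tibble as-is; the conversion mainly matters for code that relies on base behaviors such as partial matching or row names.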
You can learn more about tibbles here and here.
To leave a comment for the author, please follow the link and comment on their blog: Deeply Trivial.