Introduction to tibbles
Introduction
A tibble, or tbl_df, is a modern reimagining of the data.frame, keeping what time has proven to be effective, and throwing out what is not. Tibbles are data.frames that are lazy and surly: they do less (i.e. they don’t change variable names or types, and don’t do partial matching) and complain more (e.g. when a variable does not exist). This forces you to confront problems earlier, typically leading to cleaner, more expressive code. Tibbles also have an enhanced print() method which makes them easier to use with large datasets containing complex objects.
Source: https://tibble.tidyverse.org/
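To make the "do less, complain more" description above concrete, here is a minimal sketch contrasting $ on a data.frame and on a tibble. The column name order_value is made up for illustration and does not come from the quoted documentation.

```r
library(tibble)

# Hypothetical one-column data set, built both ways.
df  <- data.frame(order_value = 10)
tbl <- tibble(order_value = 10)

# data.frame: $ partially matches "order" to "order_value".
df$order

# tibble: no partial matching. The column "order" does not exist,
# so the result is NULL and a warning is raised.
tbl$order
```

The tibble behaviour makes a mistyped or abbreviated column name visible straight away instead of silently returning a different column.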
In this post, we will explore tibbles. To be more precise, we will learn:
- how tibbles differ from data frames
- how to create tibbles
- how to manipulate tibbles
Libraries, Code & Data
We will use the following packages:
The code can be found here.
library(tibble)
library(dplyr)
Creating tibbles
A tibble can be created using any of the following:
- tibble()
- as_tibble()
- tribble()
Let us start with tibble().
tibble(x = letters, y = 1:26, z = sample(100, 26))
## # A tibble: 26 x 3
##    x         y     z
##    <chr> <int> <int>
##  1 a         1    74
##  2 b         2    83
##  3 c         3   100
##  4 d         4     7
##  5 e         5    95
##  6 f         6    80
##  7 g         7    42
##  8 h         8    88
##  9 i         9    20
## 10 j        10    72
## # ... with 16 more rows
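We specified each column name followed by its data. If you do not specify the column names, tibble() will supply them, using the expression that produced each column as its name. Ensure that all columns have the same length; the only exception is a vector of length 1, which is recycled.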
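As a small sketch of the naming behaviour described above (the vectors and the names id and letter are made up for illustration), unnamed columns take their names from the expressions passed to tibble():

```r
library(tibble)

# Columns are passed without names, so tibble() derives the
# names from the expressions themselves.
unnamed <- tibble(1:3, letters[1:3])
names(unnamed)

# Named columns keep the names you give them.
named <- tibble(id = 1:3, letter = letters[1:3])
names(named)
```

Names derived this way are not valid R variable names, so naming columns explicitly is usually the more convenient choice.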
tibble features
- never changes input’s types
tibble() will never alter the input’s type. For example, if you supply a character vector, it will not be converted to a factor. This differs from data.frame() in R versions before 4.0.0, where you had to set stringsAsFactors to FALSE to get the same behaviour.
tibble(x = letters, y = 1:26, z = sample(100, 26))
## # A tibble: 26 x 3
##    x         y     z
##    <chr> <int> <int>
##  1 a         1    86
##  2 b         2    65
##  3 c         3    88
##  4 d         4    57
##  5 e         5    77
##  6 f         6    19
##  7 g         7    10
##  8 h         8    43
##  9 i         9    27
## 10 j        10    80
## # ... with 16 more rows
- never adjusts variable names
tibble() will never modify the column names. In the example below, you can observe that data.frame() replaces the space with a ., while tibble() retains the column name as is.
names(data.frame(`order value` = 10))
## [1] "order.value"
names(tibble(`order value` = 10))
## [1] "order value"
- never prints all rows
A large tibble will never print all of its rows and clutter your console. By default, it prints only the first 10 rows and only as many columns as fit the width of the console.
x <- 1:100
y <- letters[1]
z <- sample(c(TRUE, FALSE), 100, replace = TRUE)
tibble(x, y, z)
## # A tibble: 100 x 3
##        x y     z
##    <int> <chr> <lgl>
##  1     1 a     TRUE
##  2     2 a     TRUE
##  3     3 a     TRUE
##  4     4 a     FALSE
##  5     5 a     TRUE
##  6     6 a     FALSE
##  7     7 a     TRUE
##  8     8 a     TRUE
##  9     9 a     TRUE
## 10    10 a     FALSE
## # ... with 90 more rows
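When the default of 10 rows is not enough, the print method accepts arguments that control how much is shown. The following is a minimal sketch; the tibble big is made up for illustration.

```r
library(tibble)

big <- tibble(x = 1:100, y = 'a')

# Ask for 15 rows instead of the default 10.
print(big, n = 15)

# width = Inf prints every column, even if the console is narrow.
print(big, n = 5, width = Inf)
```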
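- never recycles vectors of length greater than 1
Recycling vectors of length greater than 1 often leads to errors, so tibble() will only recycle vectors of length 1. In the example below, y has 26 elements while x and z have 100, so the call fails. The exact error message depends on the tibble version; recent releases report that the columns have incompatible sizes.
x <- 1:100
y <- letters
z <- sample(c(TRUE, FALSE), 100, replace = TRUE)
tibble(x, y, z)
Error in overscope_eval_next(overscope, expr) : object 'y' not found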
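The recycling rule can be checked without stopping the script. This is a minimal sketch that wraps the failing call in tryCatch(); the string "recycling failed" is just a placeholder value chosen for this example.

```r
library(tibble)

# A length-1 vector is recycled to match the longest column.
ok <- tibble(x = 1:100, y = letters[1])
dim(ok)

# A length-26 vector cannot be recycled to 100 rows, so tibble() errors.
# tryCatch() turns the error into a value so the script keeps running.
result <- tryCatch(
  tibble(x = 1:100, y = letters),
  error = function(e) 'recycling failed'
)
result
```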
Membership Testing
We can test if an object is a tibble using is_tibble().
is_tibble(mtcars)
## [1] FALSE
is_tibble(as_tibble(mtcars))
## [1] TRUE
Tribble
Another way to create tibbles is using tribble():
- it is short for transposed tibble
- it is customized for data entry in code
- column names start with ~
- values are separated by commas
tribble(
  ~x, ~y,    ~z,
  #--|--|----
  1,  TRUE,  'a',
  2,  FALSE, 'b'
)
## # A tibble: 2 x 3
##       x y     z
##   <dbl> <lgl> <chr>
## 1     1 TRUE  a
## 2     2 FALSE b
Column Names
Names of the columns in tibbles need not be valid R variable names. They can contain unusual characters like a space or a smiley, but such names must be enclosed in backticks.
tibble(
  ` ` = 'space',
  `2` = 'integer',
  `:)` = 'smiley'
)
## # A tibble: 1 x 3
##   ` `   `2`     `:)`
##   <chr> <chr>   <chr>
## 1 space integer smiley
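The same backtick rule applies when reading such columns back. A minimal sketch, reusing the column names from the example above (the object name odd is made up):

```r
library(tibble)

odd <- tibble(` ` = 'space', `2` = 'integer', `:)` = 'smiley')

# With $, non-syntactic names need backticks.
odd$`:)`

# With [[ ]], the name is an ordinary string.
odd[['2']]
```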
Add Rows
Let us add data related to the Safari browser to the web traffic data using add_row().
browsers <- enframe(c(chrome = 40, firefox = 20, edge = 30))
browsers
## # A tibble: 3 x 2
##   name    value
##   <chr>   <dbl>
## 1 chrome     40
## 2 firefox    20
## 3 edge       30
add_row(browsers, name = 'safari', value = 10)
## # A tibble: 4 x 2
##   name    value
##   <chr>   <dbl>
## 1 chrome     40
## 2 firefox    20
## 3 edge       30
## 4 safari     10
If we want to add the data at a particular row, we can specify the row number using the .before argument. Let us add the data related to the Safari browser in the second row instead of the last row.
add_row(browsers, name = 'safari', value = 10, .before = 2)
## # A tibble: 4 x 2
##   name    value
##   <chr>   <dbl>
## 1 chrome     40
## 2 safari     10
## 3 firefox    20
## 4 edge       30
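add_row() also has an .after argument, which is the mirror image of .before. The sketch below rebuilds the browsers data from above and shows that both arguments can describe the same position; the object names with_after and with_before are made up for illustration.

```r
library(tibble)

browsers <- enframe(c(chrome = 40, firefox = 20, edge = 30))

# .after = 1 places the new row after the first row,
# which is the same position as .before = 2.
with_after  <- add_row(browsers, name = 'safari', value = 10, .after = 1)
with_before <- add_row(browsers, name = 'safari', value = 10, .before = 2)

# Both calls put safari in the second row.
identical(with_after$name, with_before$name)
```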
Add Columns
add_column() adds a new column to tibbles.
browsers <- enframe(c(chrome = 40, firefox = 20, edge = 30, safari = 10))
add_column(browsers, visits = c(4000, 2000, 3000, 1000))
## # A tibble: 4 x 3
##   name    value visits
##   <chr>   <dbl>  <dbl>
## 1 chrome     40   4000
## 2 firefox    20   2000
## 3 edge       30   3000
## 4 safari     10   1000
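By default the new column is appended at the end. Like add_row(), add_column() takes .before and .after arguments, which here refer to a column position or column name. A minimal sketch using the same data (the object name placed is made up):

```r
library(tibble)

browsers <- enframe(c(chrome = 40, firefox = 20, edge = 30, safari = 10))

# Insert visits right after the name column instead of at the end.
placed <- add_column(
  browsers,
  visits = c(4000, 2000, 3000, 1000),
  .after = 'name'
)
names(placed)
```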
Rownames
The tibble package provides a set of functions to deal with rownames. Remember that, unlike a data.frame, a tibble does not have rownames.
To check whether a data set has rownames, use has_rownames().
has_rownames(mtcars)
## [1] TRUE
Remove Rownames
remove_rownames(mtcars)
##     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## 1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## 2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## 3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## 4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## 5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## 6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## 7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## 8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## 9  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## 10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## 11 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## 12 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## 13 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## 14 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## 15 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## 16 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## 17 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## 18 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## 19 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## 20 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## 21 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## 22 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## 23 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## 24 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## 25 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## 26 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## 27 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## 28 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## 29 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## 30 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## 31 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## 32 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
Rownames to Column
head(rownames_to_column(mtcars))
##             rowname  mpg cyl disp  hp drat    wt  qsec vs am gear carb
## 1         Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## 2     Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## 3        Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## 4    Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## 5 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## 6           Valiant 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
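The new column is called rowname by default. The sketch below uses the var argument of rownames_to_column() and the rownames argument of as_tibble() to pick a more descriptive name; the name model is an arbitrary choice for this example.

```r
library(tibble)

# Name the new column "model" instead of the default "rowname".
cars_df <- rownames_to_column(mtcars, var = 'model')
names(cars_df)[1]

# as_tibble() can do the same while converting to a tibble.
cars_tbl <- as_tibble(mtcars, rownames = 'model')
is_tibble(cars_tbl)
```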
Column to Rownames
To convert a column in the data set to rownames, use column_to_rownames(). By default it uses the column named rowname; a different column can be chosen with the var argument.
mtcars_tbl <- rownames_to_column(mtcars)
column_to_rownames(mtcars_tbl)
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
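When the column holding the names is not called rowname, it has to be named explicitly. The following sketch does a round trip through a column called model (an arbitrary name chosen for this example) using the var argument of both functions.

```r
library(tibble)

# Move the rownames into a column called "model" ...
cars_df <- rownames_to_column(mtcars, var = 'model')

# ... and move that column back into the rownames.
restored <- column_to_rownames(cars_df, var = 'model')

has_rownames(restored)
head(rownames(restored), 3)
```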
Glimpse
Use glimpse() to get an overview of the data.
glimpse(mtcars)
## Observations: 32
## Variables: 11
## $ mpg  <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19....
## $ cyl  <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, ...
## $ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 1...
## $ hp   <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, ...
## $ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.9...
## $ wt   <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3...
## $ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 2...
## $ vs   <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, ...
## $ am   <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, ...
## $ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, ...
## $ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, ...
Check Column
has_name() can be used to check if a tibble has a specific column.
has_name(mtcars, 'cyl')
## [1] TRUE
has_name(mtcars, 'gears')
## [1] FALSE
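has_name() also accepts several names at once and returns one result per name, which is handy for validating a data set before using it. A minimal sketch; the vector of names and the object found are made up for illustration.

```r
library(tibble)

# Check several column names in a single call.
# 'gears' is a deliberate misspelling of 'gear'.
found <- has_name(mtcars, c('cyl', 'gears', 'gear'))
found

# all() tells us whether every requested column is present.
all(found)
```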
Summary
Creating tibbles
- use tibble() to create tibbles
- use as_tibble() to coerce other objects to tibble
- use enframe() to coerce vector to tibble
- use tribble() to create tibble using data entry
Modifying tibbles
- use add_row() to add a new row
- use add_column() to add a new column
- use remove_rownames() to remove rownames from data
- use rownames_to_column() to coerce rownames to the first column
- use column_to_rownames() to coerce a column (by default the one named rowname) to rownames
Testing tibbles
- use is_tibble() to test if an object is a tibble
- use has_rownames() to check whether a data set has rownames
- use has_name() to check if tibble has a specific column
- use glimpse() to get an overview of data