Data types in R
This article presents the different data types in R. To learn about the different variable types from a statistical point of view, read “Variable types and examples”.
What data types exist in R?
There are five main data types in R:
- Numeric
- Integer
- Complex
- Character
- Logical
Datasets in R are often a combination of these five data types. Below we explore each data type in more detail, one by one, except the “complex” type: we focus on the main ones, and this type is rarely used in practice.
Numeric
The most common data type in R is numeric. A variable or a series is stored as numeric data if its values are numbers, with or without decimals. For example, the following two series are stored as numeric by default:
# numeric series without decimals
num_data <- c(3, 7, 2)
num_data
## [1] 3 7 2
class(num_data)
## [1] "numeric"

# numeric series with decimals
num_data_dec <- c(3.4, 7.1, 2.9)
num_data_dec
## [1] 3.4 7.1 2.9
class(num_data_dec)
## [1] "numeric"

# also possible to check the class thanks to str()
str(num_data_dec)
## num [1:3] 3.4 7.1 2.9
In other words, if you assign one or several numbers to an object in R, it will be stored as numeric by default (that is, as numbers that can hold decimals), unless specified otherwise.
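As a small aside, the following sketch (base R only, with illustrative object names) shows that class() reports "numeric" for both kinds of series, while typeof() reveals the underlying storage mode, which is double precision in both cases:

```r
# Illustrative vectors; the names are not from the article
whole <- c(3, 7, 2)
decimal <- c(3.4, 7.1, 2.9)

# class() reports "numeric" whether or not decimals are present
class(whole)
## [1] "numeric"

# typeof() shows how the values are stored internally
typeof(whole)
## [1] "double"
typeof(decimal)
## [1] "double"

# is.numeric() is TRUE for both series
is.numeric(whole)
## [1] TRUE
is.numeric(decimal)
## [1] TRUE
```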
Integer
The integer data type is a special case of numeric data: integers are numeric data without decimals. It can be used if you are sure that the numbers you store will never contain decimals. For example, let’s say you are interested in the number of children in a sample of 10 families. This variable is a discrete variable (see a reminder on the variable types if you do not remember what a discrete variable is) and will never have decimals. Therefore, it can be stored as integer data thanks to the as.integer() command:
# number of children in each of the 10 families
children <- c(1, 3, 2, 2, 4, 4, 1, 1, 1, 4)
children
## [1] 1 3 2 2 4 4 1 1 1 4
children <- as.integer(children)
class(children)
## [1] "integer"
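Note that numbers typed without decimals are still stored as numeric by default, as seen with num_data in the previous section. To obtain an integer, use as.integer() or append L to the number (e.g., 3L). Functions that import data, such as read.csv(), may however store a column without decimals as integer automatically.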
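A quick way to check which type a value actually has is class(). The sketch below, in base R with illustrative object names, compares a plain number with the L suffix, the colon operator and as.integer():

```r
# A whole number typed as-is is stored as numeric
x <- 3
class(x)
## [1] "numeric"

# The L suffix creates an integer directly
y <- 3L
class(y)
## [1] "integer"

# The colon operator also returns integers
z <- 1:3
class(z)
## [1] "integer"

# as.integer() drops the decimal part (truncation toward zero, no rounding)
as.integer(2.9)
## [1] 2
as.integer(-2.9)
## [1] -2
```

Because as.integer() truncates rather than rounds, it is worth applying round() first if rounding is what you intend.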
Character
The data type character is used when storing text, known as strings in R. The simplest way to store data in the character format is to put "" around the piece of text:
char <- "some text"
char
## [1] "some text"
class(char)
## [1] "character"
If you want to force any kind of data to be stored as character, you can do it by using the command as.character():
char2 <- as.character(children)
char2
## [1] "1" "3" "2" "2" "4" "4" "1" "1" "1" "4"
class(char2)
## [1] "character"
Note that everything inside "" is treated as character, whether or not it looks like text. For example:
chars <- c("7.42")
chars
## [1] "7.42"
class(chars)
## [1] "character"
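If a number has been stored as character, arithmetic on it will fail until it is converted back. A minimal sketch in base R (object names are illustrative) uses as.numeric() for the conversion, and shows what happens when the text does not look like a number:

```r
# A number stored as text
chars <- "7.42"

# Convert it back to numeric before doing arithmetic
num <- as.numeric(chars)
class(num)
## [1] "numeric"
num + 1
## [1] 8.42

# Text that cannot be read as a number becomes NA.
# R also emits a warning, silenced here with suppressWarnings()
bad <- suppressWarnings(as.numeric("seven"))
bad
## [1] NA
```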
Furthermore, as soon as there is at least one character value inside a variable or vector, the whole variable or vector will be considered as character:
char_num <- c("text", 1, 3.72, 4)
char_num
## [1] "text" "1" "3.72" "4"
class(char_num)
## [1] "character"
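This behaviour is an instance of a more general rule: a vector can hold only one type, so c() converts all elements to the most flexible type present, in the order logical, integer, numeric, character. A short sketch in base R:

```r
# logical combined with integer gives integer
class(c(TRUE, 2L))
## [1] "integer"

# integer combined with numeric gives numeric
class(c(2L, 3.5))
## [1] "numeric"

# numeric combined with character gives character
class(c(3.5, "text"))
## [1] "character"

# all four together: character wins
class(c(TRUE, 2L, 3.5, "text"))
## [1] "character"
```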
Last but not least, although spaces do not matter in numeric data, they do matter in character data:
num_space <- c(1 )
num_nospace <- c(1)
# is num_space equal to num_nospace?
num_space == num_nospace
## [1] TRUE

char_space <- "text "
char_nospace <- "text"
# is char_space equal to char_nospace?
char_space == char_nospace
## [1] FALSE
As you can see from the results above, a space within character data (i.e., within "") makes it a different string in R!
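When such stray spaces are unintended (for instance after importing data), one possible remedy is the base R function trimws(), which removes leading and trailing whitespace. A brief sketch, reusing the two strings from the example above:

```r
char_space <- "text "
char_nospace <- "text"

# nchar() counts characters, including the trailing space
nchar(char_space)
## [1] 5
nchar(char_nospace)
## [1] 4

# after trimming, the two strings are identical
trimws(char_space) == char_nospace
## [1] TRUE
```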
Logical
A logical variable is a variable with only two values, TRUE or FALSE:
value1 <- 7
value2 <- 9

# is value1 greater than value2?
greater <- value1 > value2
greater
## [1] FALSE
class(greater)
## [1] "logical"

# is value1 less than or equal to value2?
less <- value1 <= value2
less
## [1] TRUE
class(less)
## [1] "logical"
It is also possible to transform logical data into numeric data. After the transformation from logical to numeric with the as.numeric() command, FALSE values become 0 and TRUE values become 1:
greater_num <- as.numeric(greater)
sum(greater_num)
## [1] 0

less_num <- as.numeric(less)
sum(less_num)
## [1] 1
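In practice, functions such as sum() and mean() perform this conversion implicitly, which makes it easy to count or compute the proportion of TRUE values in a logical vector. A small sketch with made-up data (the ages vector is purely illustrative):

```r
# Hypothetical ages of five people
ages <- c(12, 25, 31, 8, 47)

# Logical vector: is each person an adult?
adults <- ages >= 18
adults
## [1] FALSE  TRUE  TRUE FALSE  TRUE

# Number of TRUE values (TRUE counts as 1, FALSE as 0)
sum(adults)
## [1] 3

# Proportion of TRUE values
mean(adults)
## [1] 0.6
```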
Conversely, numeric data can be converted to logical data, with FALSE for all values equal to 0 and TRUE for all other values.
x <- 0
as.logical(x)
## [1] FALSE

y <- 5
as.logical(y)
## [1] TRUE
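The same rule applies element by element to a whole vector, including negative numbers and decimals. A minimal sketch in base R (the values are arbitrary):

```r
# Only 0 becomes FALSE; every other number becomes TRUE
values <- c(0, 1, -2, 0.5)
as.logical(values)
## [1] FALSE  TRUE  TRUE  TRUE
```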
Thanks for reading. I hope this article helped you to understand the basic data types in R and their particularities. If you would like to learn more about the different variable types from a statistical point of view, read “Variable types and examples”. As always, if you find a mistake/bug or if you have any questions do not hesitate to let me know in the comment section below, raise an issue on GitHub or contact me.