Data types in R

[This article was first published on R on Stats and R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This article presents the different data types in R. To learn about the different variable types from a statistical point of view, read “Variable types and examples”.

What data types exist in R?

There are the 6 most common data types in R:

  1. Numeric
  2. Integer
  3. Complex
  4. Character
  5. Factor
  6. Logical

Datasets in R are often a combination of these 6 different data types. Below we explore in more detail each data types one by one, except the data type “complex” as we focus on the main ones and this data type is rarely used in practice.

Numeric

The most common data type in R is numeric. A variable or a series will be stored as numeric data if the values are numbers or if the values contains decimals. For example, the following two series are stored as numeric by default:

# numeric series without decimals
num_data <- c(3, 7, 2)
num_data
## [1] 3 7 2
class(num_data)
## [1] "numeric"
# numeric series with decimals
num_data_dec <- c(3.4, 7.1, 2.9)
num_data_dec
## [1] 3.4 7.1 2.9
class(num_data_dec)
## [1] "numeric"
# also possible to check the class thanks to str()
str(num_data_dec)
##  num [1:3] 3.4 7.1 2.9

In other words, if you assign one or several numbers to an object in R, it will be stored as numeric by default (numbers with decimals), unless specified otherwise.

Integer

Integer data type is actually a special case of numeric data. Integers are numeric data without decimals. It can be used if you are sure that the numbers you store will never contains decimals. For example, let’s say you are interested in the number of children in a sample of 10 families. This variable is a discrete variable (see a reminder on the variable types if you do not remember what is a discrete variable) and will never have decimals. Therefore, it can be stored as integer data thanks to the as.integer() command:

children
##  [1] 1 3 2 2 4 4 1 1 1 4
children <- as.integer(children)
class(children)
## [1] "integer"

Note that if your variable does not have decimals, R will automatically set the type as integers instead of numeric.

Character

The data type character is used when storing text, known as strings in R. The simplest ways to store data under the character format is by using "" around the piece of text:

char <- "some text"
char
## [1] "some text"
class(char)
## [1] "character"

If you want to force any kind of data to be stored as character, you can do it by using the command as.character():

char2 <- as.character(children)
char2
##  [1] "1" "3" "2" "2" "4" "4" "1" "1" "1" "4"
class(char2)
## [1] "character"

Note that everything inside "" will be considered as character, no matter if it looks like character or not. For example:

chars <- c("7.42")
chars
## [1] "7.42"
class(chars)
## [1] "character"

Furthermore, as soon as there is at least one character value inside a variable or vector, the whole variable or vector will be considered as character:

char_num <- c("text", 1, 3.72, 4)
char_num
## [1] "text" "1"    "3.72" "4"
class(char_num)
## [1] "character"

Last but not least, although space does not matter in numeric data, it does matter for character data:

num_space <- c(1)
num_nospace <- c(1)
# is num_space equal to num_nospace?
num_space == num_nospace
## [1] TRUE
char_space <- "text "
char_nospace <- "text"
# is char_space equal to char_nospace?
char_space == char_nospace
## [1] FALSE

As you can see from the results above, a space within character data (i.e., within "") makes it a different string in R!

Factor

Factor variables are a special case of character variables in the sense that it also contains text. However, factor variables are used when there are a limited number of unique character strings. It often represents a categorical variable. For instance, the gender will usually take on only two values, “female” or “male” (and will be considered as a factor variable) whereas the name will generally have lots of possibilities (and thus will be considered as a character variable). To create a factor variable use the factor() function:

gender <- factor(c("female", "female", "male", "female", "male"))
gender
## [1] female female male   female male  
## Levels: female male

To know the different levels of a factor variable, use levels():

levels(gender)
## [1] "female" "male"

By default, the levels are sorted alphabetically. You can reorder the levels with the argument levels in the factor() function:

gender <- factor(gender, levels = c("male", "female"))
levels(gender)
## [1] "male"   "female"

Character strings can be converted to factors with as.factor():

text <- c("test1", "test2", "test1", "test1") # create a character vector
class(text) # to know the class
## [1] "character"
text_factor <- as.factor(text) # transform to factor
class(text_factor) # recheck the class
## [1] "factor"

The character strings have been transformed to factors, as shown by its class of the type factor.

Logical

A logical variable is a variable with only two values; TRUE or FALSE:

value1 <- 7
value2 <- 9

# is value1 greater than value2?
greater <- value1 > value2
greater
## [1] FALSE
class(greater)
## [1] "logical"
# is value1 less than or equal to value2?
less <- value1 <= value2
less
## [1] TRUE
class(less)
## [1] "logical"

It is also possible to transform logical data into numeric data. After the transformation from logical to numeric with the as.numeric() command, FALSE values equal to 0 and TRUE values equal to 1:

greater_num <- as.numeric(greater)
sum(greater)
## [1] 0
less_num <- as.numeric(less)
sum(less)
## [1] 1

Conversely, numeric data can be converted to logical data, with FALSE for all values equal to 0 and TRUE for all other values.

x <- 0
as.logical(x)
## [1] FALSE
y <- 5
as.logical(y)
## [1] TRUE

Thanks for reading. I hope this article helped you to understand the basic data types in R and their particularities. If you would like to learn more about the different variable types from a statistical point of view, read the article “Variable types and examples”.

As always, if you have a question or a suggestion related to the topic covered in this article, please add it as a comment so other readers can benefit from the discussion.

To leave a comment for the author, please follow the link and comment on their blog: R on Stats and R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)