Site icon R-bloggers

Missing values and column types when reading data into R

[This article was first published on indiacrunchin » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Reading data into R when dealing with column types and values that need to be considered as NA

Below are code snippets to introduce a few arguments of the read.csv function in R

# Create sample data
strVals <- do.call("c",lapply(1:1000,function(x)paste(sample(letters,sample(5:20,1)),collapse="")))
miscVals <- sample(c("","999","—-","MISS"),100,replace=T)
numVals <- rnorm(1000)

# Scenario 1 : Pure numeric and strings
dataTemp<-data.frame(numericVals = numVals, stringVals = strVals)
write.csv(dataTemp,file="inputData.csv",quote=F,row.names=F)
inData <- read.csv("inputData.csv",header=T)
sapply(inData,class)
# Col: stringVals is type factor

# Using the function argument stringsAsFactors = FALSE mitigates character columns
# being turned into factor type
inData <- read.csv("inputData.csv",header=T,stringsAsFactors=FALSE)
sapply(inData,class)

# Using function argument colClasses
# predefine the column types in the input file
inData <- read.csv("inputData.csv",header=T,colClasses = c("numeric","character"))
sapply(inData,class)

# If you have data values that need to be considered as NA
# Add values from miscVals ( "","999","—-","MISS" ) to numVals and strVals
numMiscVals <- sample(c(numVals,miscVals),1000)
strMiscVals <- sample(c(strVals,miscVals),1000)

dataTemp<-data.frame(numericVals = numMiscVals, stringVals = strMiscVals)
write.csv(dataTemp,file="inputData.csv",quote=F,row.names=F)
inData 0

# Use na.strings argument
inData <- read.csv("inputData.csv",header=T,stringsAsFactors=FALSE,na.strings = c("","999","—-","MISS"))
sapply(inData,class)
# The columns have the right type numericVals is numeric and stringVals is character
sum(c("","999","—-","MISS") %in% inData$numericVals)
# should return 0


To leave a comment for the author, please follow the link and comment on their blog: indiacrunchin » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.