Missing values and column types when reading data into R
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This post covers reading data into R when you need to control column types and specify which values should be treated as NA.
Below are code snippets that introduce a few arguments of the read.csv function in R: stringsAsFactors, colClasses and na.strings.
# Create sample data
strVals <- do.call("c",lapply(1:1000,function(x)paste(sample(letters,sample(5:20,1)),collapse="")))
miscVals <- sample(c("","999","----","MISS"),100,replace=TRUE)
numVals <- rnorm(1000)
# Scenario 1 : Pure numeric and strings
dataTemp<-data.frame(numericVals = numVals, stringVals = strVals)
write.csv(dataTemp,file="inputData.csv",quote=FALSE,row.names=FALSE)
inData <- read.csv("inputData.csv",header=TRUE)
sapply(inData,class)
# Col: stringVals is type factor in R versions before 4.0.0; since R 4.0.0 the
# default is stringsAsFactors = FALSE, so it is read as character
# Using the function argument stringsAsFactors = FALSE prevents character columns
# from being turned into factor type
inData <- read.csv("inputData.csv",header=TRUE,stringsAsFactors=FALSE)
sapply(inData,class)
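Since the default has changed over time, passing stringsAsFactors explicitly makes the result independent of the R version. A minimal sketch, reading a tiny made-up CSV through the text argument of read.csv() instead of a file (csvText, asFactor and asChar are illustrative names, not from the original post):

```r
# Small made-up CSV so the example does not need a file on disk
csvText <- "numericVals,stringVals
1.5,abc
-0.3,xyz"

# Explicitly ask for character columns to become factors
asFactor <- read.csv(text = csvText, stringsAsFactors = TRUE)
# Explicitly keep character columns as character vectors
asChar <- read.csv(text = csvText, stringsAsFactors = FALSE)

print(sapply(asFactor, class))
print(sapply(asChar, class))
```

stringsAsFactors only affects character columns; numericVals is numeric in both cases.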
# predefine the column types in the input file
inData <- read.csv("inputData.csv",header=TRUE,colClasses = c("numeric","character"))
sapply(inData,class)
# If you have data values that need to be considered as NA
# Add values from miscVals ("", "999", "----", "MISS") to numVals and strVals
numMiscVals <- sample(c(numVals,miscVals),1000)
strMiscVals <- sample(c(strVals,miscVals),1000)
dataTemp<-data.frame(numericVals = numMiscVals, stringVals = strMiscVals)
write.csv(dataTemp,file="inputData.csv",quote=FALSE,row.names=FALSE)
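Before choosing na.strings, it can help to see which values keep a column from being read as numeric. One possible approach, sketched here and not taken from the original post: read every column as character with colClasses = "character", then list the entries that as.numeric() cannot parse. The helper findNonNumeric and the inline data are hypothetical.

```r
# Hypothetical helper: distinct entries of x that as.numeric() cannot parse
findNonNumeric <- function(x) {
  # as.numeric() returns NA (with a warning) for strings it cannot parse
  parsed <- suppressWarnings(as.numeric(x))
  # Keep entries that failed to parse but were not NA to begin with
  unique(x[is.na(parsed) & !is.na(x)])
}

# Made-up inline CSV mirroring the placeholder values used in this post
csvText <- "numericVals,stringVals
1.2,abc
MISS,def
999,ghi
,jkl
----,mno
-0.5,pqr"

# Read every column as character so no type conversion happens yet
rawData <- read.csv(text = csvText, colClasses = "character")
findNonNumeric(rawData$numericVals)
```

Here the helper reports "MISS", "" and "----" but not "999": that value parses as a valid number, so numeric-looking codes like it have to come from knowledge of the data rather than from a parsing check.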
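inData <- read.csv("inputData.csv",header=TRUE,stringsAsFactors=FALSE)
sapply(inData,class)
# Without na.strings, numericVals is read as character because of the placeholder values
sum(c("","999","----","MISS") %in% inData$numericVals)
# should return > 0
# Use na.strings argument
inData <- read.csv("inputData.csv",header=TRUE,stringsAsFactors=FALSE,na.strings = c("","999","----","MISS"))
sapply(inData,class)
# The columns now have the right types: numericVals is numeric and stringVals is character
sum(c("","999","----","MISS") %in% inData$numericVals)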
# should return 0
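na.strings can also be combined with colClasses. The minimal sketch below uses a small made-up inline CSV (csvText and placeholders are illustrative names): without na.strings, colClasses = "numeric" cannot parse "MISS" and read.csv() stops with an error, while with na.strings the placeholders become NA first. It also shows that na.strings applies to every column, so a "999" or "MISS" in stringVals becomes NA as well, plus one possible workaround when such a value is legitimate in a character column.

```r
# Made-up inline CSV with placeholder values in both columns
csvText <- "numericVals,stringVals
1.5,abc
MISS,999
999,def
-0.3,MISS"
placeholders <- c("", "999", "----", "MISS")

# Without na.strings, colClasses = "numeric" cannot parse "MISS", so read.csv() errors
failed <- tryCatch(
  {
    read.csv(text = csvText, colClasses = c("numeric", "character"))
    FALSE
  },
  error = function(e) TRUE
)

# With na.strings, placeholders become NA before numericVals is parsed as numeric
inData <- read.csv(text = csvText, colClasses = c("numeric", "character"),
                   na.strings = placeholders)
sapply(inData, class)
# na.strings applies to every column, so "999" and "MISS" in stringVals are NA too
inData$stringVals

# One way to restrict NA handling to a single column: read everything as character,
# then mark placeholders as NA only where they mean "missing"
rawData <- read.csv(text = csvText, colClasses = "character")
rawData$numericVals[rawData$numericVals %in% placeholders] <- NA
rawData$numericVals <- as.numeric(rawData$numericVals)
```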