Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I have different sets of monthly data that I want to align and evaluate once a month. The different sources report the timestamp of the monthly data differently – one reports the date without the day, another as the last business day, and another as the last day of the month. For what I want to do, I’m content to align the data to the last day of the month. How do I do that in R?
The solution for this isn’t obvious in Excel or OpenOffice, either, but I thought it would be at least similarly simple in R. I’d looked around, and the best solution I’d found on r-help was something of a kludge:
index(x) = as.POSIXct(as.Date(as.yearmon(index(x)),frac=1), tz="UTC")
I said it wasn’t pretty, didn’t I? You are reading that right – convert the index first to yearmon
, then to Date
(with frac
representing the fraction of a month with a number between 1 and 0), then finally to POSIXct
.
That’s worked fine for me until I recently found strange, eight-second misalignments between my data sets that seemed to be caused by the transformation above. Timezone issues, perhaps? On Jeff Ryan’s advice to keep all my date transformations in POSIX
, I finally found a better solution in r-help that had been posted by Whit Armstrong years and years ago. I’m not sure why I didn’t find it earlier, but his solution was somewhat more general than what I have been looking for. I’ve stripped his answer down a bit to the following utility function:
eom <- function(date) { # date character string containing POSIXct date date.lt <- as.POSIXlt(date) # add a month, then subtract a day: mon <- date.lt$mon + 2 year <- date.lt$year year <- year + as.integer(mon==13) # if month was December add a year mon[mon==13] <- 1 iso = ISOdate(1900+year, mon, 1, hour=0, tz=attr(date,"tz")) result = as.POSIXct(iso) - 86400 # subtract one day result + (as.POSIXlt(iso)$isdst - as.POSIXlt(result)$isdst)*3600 }
This wasn’t as simple as I was hoping for, but once functionalized it becomes simple. The premise is the same as the spreadsheet solution – add a month then subtract a day.
# Whit's example x <- seq(as.POSIXct("2001-01-10"),as.POSIXct("2005-12-10"),by="months") data.frame(before=x,after=eom(x)) before after 1 2001-01-10 2001-01-31 2 2001-02-10 2001-02-28 3 2001-03-10 2001-03-31 4 2001-04-10 2001-04-30 5 2001-05-10 2001-05-31 6 2001-06-10 2001-06-30 ... snip ...
Ah, much better. Thanks, Whit!
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.