I am working on v.0.4.0 of the padr package this summer. Two new features that will be added are wrappers around seq.Date and seq.POSIXt. Since it is going to take a while before the new release is on CRAN, I will go ahead and give an early presentation of these functions.

Date and datetime parsing in base R is powerful and comprehensive, but also tedious. It can slow you down in your programming or analysis. Luckily, good wrappers and alternatives exist, most notably the ymd{_h}{m}{s} suite from lubridate and Dirk Eddelbuettel’s anytime. These functions remove much of the overhead of date and datetime parsing, allowing for quick parsing of vectors in all kinds of formats. They also alleviate the pain of using seq.Date() and seq.POSIXt() a little, because the from and to arguments must be parsed dates or datetimes. Take the following example.
seq(as.POSIXct("2017-07-25 00:00:00"), as.POSIXct("2017-07-25 03:00:00"), by = "hour")
## [1] "2017-07-25 00:00:00 CEST" "2017-07-25 01:00:00 CEST"
## [3] "2017-07-25 02:00:00 CEST" "2017-07-25 03:00:00 CEST"
library(lubridate)
seq(ymd_h("20170725 00"), ymd_h("20170725 03"), by = "hour")
## [1] "2017-07-25 00:00:00 UTC" "2017-07-25 01:00:00 UTC"
## [3] "2017-07-25 02:00:00 UTC" "2017-07-25 03:00:00 UTC"
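The two outputs above differ in time zone: as.POSIXct() falls back to the local time zone of the session (CEST here), while ymd_h() defaults to UTC. As a small aside, here is a minimal base R sketch that pins the time zone explicitly, so the base version lines up with the lubridate one. The variable names are only illustrative.

```r
# Base R only: set tz explicitly instead of relying on the session's time zone.
from <- as.POSIXct("2017-07-25 00:00:00", tz = "UTC")
to   <- as.POSIXct("2017-07-25 03:00:00", tz = "UTC")

# seq() dispatches to seq.POSIXt and keeps the time zone of `from`.
hours <- seq(from, to, by = "hour")
print(hours)
```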
I think, however, that there is still some overhead in the second specification. By overhead I mean specifying things that feel redundant, things that could be set to some kind of default. Since the whole idea behind padr is automating away redundant and tedious actions in preparing datetime data, providing alternative functions that ask for as little as possible is a natural addition. This resulted in span_date and span_time. They remove overhead by:
- allowing for specification of from and to directly as integer or character in lubridatish format.
- setting the unspecified datetime parts to a default of 1 for month and day, and 0 for hour, minute, and second.
- assuming the desired interval (the by argument in seq.Date and seq.POSIXt) to be the lowest of the datetime parts specified in either from or to.
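To make the three rules concrete for dates, here is a rough base R sketch. It is not padr's implementation: pad_partial_date, infer_interval, and span_date_sketch are hypothetical helpers made up for illustration, and they only handle inputs of the form yyyy, yyyymm, or yyyymmdd.

```r
# Hypothetical helper (not part of padr): turn a partial date given as
# integer or character into a Date, defaulting month and day to 01.
pad_partial_date <- function(x) {
  x <- format(x, scientific = FALSE, trim = TRUE)
  n <- nchar(x)
  stopifnot(n %in% c(4, 6, 8))
  # Append "01" once for each unspecified part (month and/or day).
  padded <- paste0(x, strrep("01", (8 - n) / 2))
  as.Date(padded, format = "%Y%m%d")
}

# Hypothetical helper: the interval is the lowest datetime part that was
# specified in either from or to.
infer_interval <- function(from, to) {
  n <- max(nchar(format(from, scientific = FALSE, trim = TRUE)),
           nchar(format(to, scientific = FALSE, trim = TRUE)))
  c("4" = "year", "6" = "month", "8" = "day")[[as.character(n)]]
}

# Hypothetical stand-in for the idea behind span_date, built on seq.Date.
span_date_sketch <- function(from, to) {
  seq(pad_partial_date(from), pad_partial_date(to),
      by = infer_interval(from, to))
}

print(span_date_sketch(2010, 2014))    # yearly steps
print(span_date_sketch(201601, 2017))  # monthly steps
```

The sketch skips validation and the time parts entirely; it only illustrates how defaults and the inferred interval can take over what would otherwise be typed out by hand.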
If this is still a little abstract, let me give some examples. The example above becomes:
devtools::install_github("EdwinTh/padr") # download the dev version
library(padr)
span_time("20170725 00", "20170725 03")
## [1] "2017-07-25 00:00:00 UTC" "2017-07-25 01:00:00 UTC"
## [3] "2017-07-25 02:00:00 UTC" "2017-07-25 03:00:00 UTC"
We can simplify this even further: specifying the 00 for the hour in from is not strictly necessary. Since the hour is already specified in to, the interval will remain hour if we omit it.
span_time("20170725", "20170725 03")
## [1] "2017-07-25 00:00:00 UTC" "2017-07-25 01:00:00 UTC"
## [3] "2017-07-25 02:00:00 UTC" "2017-07-25 03:00:00 UTC"
We can even use an integer instead of a character for from. When there are no time parts involved, a character is not necessary. Since we use it in span_time, it will be parsed to POSIXct, not to Date.
span_time(20170725, "20170725 03")
## [1] "2017-07-25 00:00:00 UTC" "2017-07-25 01:00:00 UTC"
## [3] "2017-07-25 02:00:00 UTC" "2017-07-25 03:00:00 UTC"
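For intuition on how an integer without time parts can still end up as a POSIXct, here is a hedged base R sketch. pad_partial_time is a hypothetical helper, not padr code; it assumes input shaped like "yyyymmdd HH" (date and time parts separated by a single space) and fills missing time parts with 00.

```r
# Hypothetical helper (not part of padr): parse a partial datetime given as
# integer or character to POSIXct, defaulting unspecified parts.
pad_partial_time <- function(x) {
  x <- format(x, scientific = FALSE, trim = TRUE)
  parts <- strsplit(x, " ", fixed = TRUE)[[1]]
  date_part <- parts[1]
  time_part <- if (length(parts) > 1) parts[2] else ""
  # Month and day default to 01; hour, minute, and second default to 00.
  date_part <- paste0(date_part, strrep("01", (8 - nchar(date_part)) / 2))
  time_part <- paste0(time_part, strrep("00", (6 - nchar(time_part)) / 2))
  as.POSIXct(paste(date_part, time_part),
             format = "%Y%m%d %H%M%S", tz = "UTC")
}

start <- pad_partial_time(20170725)       # integer, no time parts
end   <- pad_partial_time("20170725 03")  # character, hour specified
print(seq(start, end, by = "hour"))
```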
to does not have to be specified; we can use len_out instead. The interval is then derived only from from. To get January 1st of each year from 2010 to 2014, we can do both
span_date(2010, 2014)
## [1] "2010-01-01" "2011-01-01" "2012-01-01" "2013-01-01" "2014-01-01"
and
span_date(2010, len_out = 5)
## [1] "2010-01-01" "2011-01-01" "2012-01-01" "2013-01-01" "2014-01-01"
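For comparison, the same two results in plain base R, where the start date, the by argument, and either the end date or length.out all have to be spelled out. This is only a reference sketch; the variable names are illustrative.

```r
# Base R equivalents using seq.Date: an explicit end date ...
with_to <- seq(as.Date("2010-01-01"), as.Date("2014-01-01"), by = "year")

# ... or a fixed number of elements via length.out.
with_length <- seq(as.Date("2010-01-01"), by = "year", length.out = 5)

print(with_to)
print(identical(with_to, with_length))
```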
If you want the interval to be different from the default, you can specify it.
span_date(2016, 2017, interval = "month")
## [1] "2016-01-01" "2016-02-01" "2016-03-01" "2016-04-01" "2016-05-01"
## [6] "2016-06-01" "2016-07-01" "2016-08-01" "2016-09-01" "2016-10-01"
## [11] "2016-11-01" "2016-12-01" "2017-01-01"
Note, however, that you can often also specify the interval by providing more information in from or to.
span_date(201601, 2017)
## [1] "2016-01-01" "2016-02-01" "2016-03-01" "2016-04-01" "2016-05-01"
## [6] "2016-06-01" "2016-07-01" "2016-08-01" "2016-09-01" "2016-10-01"
## [11] "2016-11-01" "2016-12-01" "2017-01-01"
I hope you find these little wrappers around seq.Date and seq.POSIXt useful and that they will enable you to conquer dates and datetimes a little quicker. You can obtain the functions by downloading the dev version of padr, as I did above. If you can think of improvements to the functions before they hit CRAN, please tell me. Issues filed, pull requests, emails, and tweets are much appreciated.