Easily Converting Strings to Times and Dates in R with flipTime
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Date conversion in R can be a real pain. However, it is an important first step when you get your data into R: making sure the data have the correct type (e.g. Date) gives you the right functionality for working with data of that type. R provides a number of handy features for working with date-time data. However, the sheer number of options and packages available can make things seem overwhelming at first; more than 10 packages provide support for working with date-time data in R. In this post, I will introduce the functionality R offers for converting strings to dates. Along the way, I discuss common pitfalls and give tips to make working with dates in R less painful. Finally, I introduce some code that my colleagues and I wrote to make things a bit easier (in the flipTime package).
Background
When dates are provided in the format of year followed by month followed by day, such as 2017-12-02, you can use the as.Date function. This tells R to treat them as dates. For example:
- months(as.Date("2017-12-02")) returns "December".
- weekdays(as.Date("2017-12-02")) returns "Saturday".
- as.Date("2017-06-09") - as.Date("2016-05-01") returns a difference of 404 days and prints Time difference of 404 days.
- difftime(as.Date("2017-06-09"), as.Date("2016-05-01"), units = "hours") returns a difference of 9696 hours and prints Time difference of 9696 hours.
- format(as.Date("2017-01-02"), "%A, %d-%b. %Y") returns "Monday, 02-Jan. 2017".
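Unfortunately, more often than not, our dates are in some other format, and as.Date does not work as expected. For example, as.Date("17-12-2010") returns 0017-12-20: R reads the string as year-month-day, takes 17 as the year and 20 as the day, and silently ignores the remaining characters.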
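If you already know the layout of your strings, base R can cope with them: as.Date() accepts a format argument built from the strptime() conversion codes. A quick sketch for the string above (the second call matches a month name, so it assumes an English or C locale):

```r
# %d = day, %m = numeric month, %Y = four-digit year
as.Date("17-12-2010", format = "%d-%m-%Y")
## [1] "2010-12-17"

# %b = abbreviated month name (matched using the current locale)
as.Date("17/Dec/2010", format = "%d/%b/%Y")
## [1] "2010-12-17"
```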
Things get even more complicated when input data contain times, as we then need to handle issues like time zones and leap seconds. R provides the classes POSIXct and POSIXlt for working with date-time data. POSIXct corresponds to the POSIX standard for calendar time (stored as the number of seconds since 1970-01-01 UTC), and POSIXlt corresponds to the POSIX standard for local time (stored as a list of components such as year, month and hour). Because each POSIXct value is a single number, the POSIXct class is also more convenient for inclusion in R data frames.
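To make the difference between the two classes concrete, here is a small base R sketch (the time stamp and the data frame are made up for illustration):

```r
# A made-up time stamp, parsed with an explicit format and time zone
ct <- as.POSIXct("2017-12-02 14:30:00", format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
lt <- as.POSIXlt(ct)

# POSIXct: a single number, the seconds since 1970-01-01 00:00:00 UTC
is.numeric(unclass(ct))
## [1] TRUE

# POSIXlt: a list of components (year counts from 1900, mon from 0)
lt$year + 1900
## [1] 2017
lt$mon + 1
## [1] 12
lt$hour
## [1] 14

# POSIXct fits naturally into a data frame column
times <- data.frame(id = 1:2, when = ct + c(0, 3600))
class(times$when)
## [1] "POSIXct" "POSIXt"
```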
I am now going to review some of the more useful packages. However, if you are a bit of a guru, just skip to the section on flipTime, where I have documented the work we have done.
lubridate
The lubridate package provides a number of useful functions for reading, manipulating, and doing arithmetic with dates in R. It provides the functions parse_date_time() and parse_date_time2(), which can be used to quickly convert strings to date-time objects. Their convenience stems from letting the user specify the order of the date components (e.g. "dmy" for day, month, year) without having to specify the characters that separate them.
The parse_date_time() function allows the user to specify multiple orders at once and determines internally which one is best for converting the input strings. It does this by training itself on a subset of the input strings and ranking the supplied orders by how often they successfully convert the strings in that subset. By contrast, the parse_date_time2() function does not allow multiple orders to be specified at once, since it supports fewer orders overall. However, it is faster when you need to convert a large number of strings.
For example, if we use:
library(lubridate)
parse_date_time("10-31/2010", orders = c("ymd", "dmy", "mdy"))
we get 2010-10-31 UTC, because only the "mdy" order fits this string.
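To give a feel for what "training" on a subset and ranking the candidates means, here is a deliberately simplified base R sketch of the idea. The helpers RoundTrips(), RankFormats() and ParseWithBestFormat() are hypothetical and are not how lubridate works internally; in particular, they use strptime() formats rather than lubridate orders, and they assume zero-padded input.

```r
# Hypothetical helpers: a simplified illustration of ranking candidate
# formats on a training subset. NOT lubridate's actual algorithm.

# TRUE where format `f` parses x[i] AND formatting the result gives x[i]
# back. The round trip rejects strptime()'s partial matches; it assumes
# zero-padded input such as "01-11-2010".
RoundTrips <- function(x, f) {
  d <- as.Date(x, format = f)
  (!is.na(d) & format(d, f) == x) %in% TRUE
}

# Rank formats by how many of the first `train.size` strings they parse;
# ties keep the order in which the formats were supplied
RankFormats <- function(x, formats, train.size = 100L) {
  train <- head(x[!is.na(x)], train.size)
  hits <- vapply(formats, function(f) sum(RoundTrips(train, f)), integer(1))
  formats[order(-hits)]
}

# Parse each string with the best-ranked format that fits it
ParseWithBestFormat <- function(x, formats, train.size = 100L) {
  out <- as.Date(rep(NA_character_, length(x)), format = "%Y-%m-%d")
  for (f in RankFormats(x, formats, train.size)) {
    idx <- which(is.na(out))
    if (length(idx) == 0L) break
    ok <- RoundTrips(x[idx], f)
    out[idx[ok]] <- as.Date(x[idx[ok]], format = f)
  }
  out
}

x <- c("31-10-2010", "01-11-2010", "15-12-2010")
fmts <- c("%Y-%m-%d", "%d-%m-%Y", "%m-%d-%Y")
RankFormats(x, fmts)
## [1] "%d-%m-%Y" "%m-%d-%Y" "%Y-%m-%d"
ParseWithBestFormat(x, fmts)
## [1] "2010-10-31" "2010-11-01" "2010-12-15"
```

The round-trip check matters because strptime() stops reading early without complaint: without it, "%Y-%m-%d" would count "31-10-2010" as a success (year 31).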
For shorter input vectors, lubridate can give strange results because it is so “aggressive” when performing the conversion. (To make things easier for skim reading, for the rest of the post I will put the output immediately beneath the code, with ## indicating the result of running the code.)
parse_date_time("July/1998", orders = c("bdy", "bY"))
## [1] "1998-07-19 UTC"
parse_date_time2("Jan 128", orders = "mdy")
## [1] "2008-01-12 UTC"
parse_date_time2("3.122", orders = "ymd")
## [1] "2003-12-02 UTC"
anytime
Another popular package for reading date strings into R is anytime, which uses the Boost date_time C++ library. It provides the functions anytime() and anydate() for date conversion. In addition to character strings, the package supports converting other R classes, such as integer and factor, to dates. The user does not need to specify any orders or formats, as anytime() and anydate() guess the format from a default list of supported formats. Additional formats can be added using the addFormats() function.
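For comparison, base R's as.Date() has a much more modest form of format guessing through its tryFormats argument. A short sketch, which also shows its main limitation: only the first non-NA string is used to choose the format.

```r
# The candidate formats are tried on the first non-NA string only; the
# first one that works is then applied to every element
as.Date(c("02.12.2017", "31.12.2017"), tryFormats = c("%Y-%m-%d", "%d.%m.%Y"))
## [1] "2017-12-02" "2017-12-31"

# Mixed layouts are not handled: "%Y-%m-%d" is chosen from the first string,
# so the second one becomes NA
as.Date(c("2017-12-02", "31.12.2017"), tryFormats = c("%Y-%m-%d", "%d.%m.%Y"))
## [1] "2017-12-02" NA
```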
As with lubridate, anytime can give strange results because of how aggressively it tries to convert strings. It also has some limitations: it does not support formats with two-digit years by default, it does not support strings containing “AM/PM” indicators at all, and it is inconvenient, sometimes impossible, to specify whether a numeric month comes before or after the day in a date string.
Additionally, there may be situations where there is ambiguity (e.g. is “01/02” January 2nd or February 1st?). In these situations, we’d like to be able to tell the function whether the day comes before the month. Where we are not sure, it’s helpful to get a warning. Unfortunately, we don’t get that here.
library(anytime)
anydate("3.145")
## [1] "1400-03-14"
anydate(3.145)
## [1] "1970-01-04"
anytime(c("10-11-2011 5:30AM", "16-10-2011 10:10pm"))
## [1] "2011-10-10 23:00:00 AEDT" NA
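Here is a minimal base R sketch of the behaviour we would like. ParseDayMonth() is a hypothetical helper (not part of anytime or flipTime) for numeric dates such as "16-10-2011": the caller can state day.first, and when they don't, it infers the order from any field greater than 12 or, failing that, warns and assumes the U.S. (month-first) order.

```r
# Hypothetical helper (not part of anytime or flipTime). Parses numeric
# dates like "16-10-2011" or "01/02/2017" (separators "/", "-" or ".").
ParseDayMonth <- function(x, day.first = NA) {
  # Split each string into its first field, second field and 4-digit year
  m <- regmatches(x, regexec("^([0-9]{1,2})[/.-]([0-9]{1,2})[/.-]([0-9]{4})$", x))
  ok <- lengths(m) == 4L
  a <- b <- rep(NA_integer_, length(x))
  a[ok] <- as.integer(vapply(m[ok], `[`, character(1), 2L))
  b[ok] <- as.integer(vapply(m[ok], `[`, character(1), 3L))

  if (is.na(day.first)) {
    # A field above 12 cannot be a month, which settles the order
    first.is.day <- any(ok & a > 12L)
    second.is.day <- any(ok & b > 12L)
    day.first <- first.is.day
    if (!first.is.day && !second.is.day && any(ok & a != b))
      warning("Day/month order is ambiguous; assuming month first (US format).")
  }
  fmt <- if (day.first) "%d/%m/%Y" else "%m/%d/%Y"
  # Normalise separators to "/" so a single format string covers them all
  as.Date(gsub("[.-]", "/", x), format = fmt)
}

ParseDayMonth("16-10-2011")
## [1] "2011-10-16"
ParseDayMonth("01/02/2017", day.first = TRUE)
## [1] "2017-02-01"
ParseDayMonth("01/02/2017")   # warns, then assumes January 2nd
```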
flipTime
The package flipTime provides utilities for working with time series and date-time data. The package can be installed from GitHub using:
require(devtools)
install_github("Displayr/flipTime")
I will discuss only two functions from the package in this post, AsDate() and AsDateTime(). These are used to convert date and date-time strings, respectively. They build on the convenience and speed of the lubridate functions described above and provide additional functionality that makes them easier to use. The functions are smart about identifying the proper format, so the user doesn’t need to specify the format(s) as inputs. At the same time, both AsDate() and AsDateTime() are careful not to convert strings that are not formatted as dates. They also warn the user when the date format is ambiguous.
AsDate() and AsDateTime() are very flexible with respect to the characters they permit as separators between the components of the date strings.
library(flipTime)
AsDate("Jan. 10, 2016")
## [1] "2016-01-10"
AsDate("Jan/10 - 2016")
## [1] "2016-01-10"
However, they are also careful not to convert strings that are clearly not dates:
AsDate("Jan 128")
## Error in AsDate("Jan 128"): Could not parse "Jan 128" into a valid
## date in any format.
AsDate("3.122")
## Error in AsDate("3.122"): Could not parse "3.122" into a valid date in
## any format.
The example above also demonstrates the functions’ default behaviour of throwing an error when the strings cannot be interpreted as dates. Both functions have an argument, on.parse.failure, which controls this behaviour.
AsDate("foo", on.parse.failure = "warn")
## Warning in AsDate("foo", on.parse.failure = "warn"): Could not parse
## "foo" into a valid date in any format.
## [1] NA
AsDateTime("foo", on.parse.failure = "silent")
## [1] NA
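The error/warn/silent pattern is easy to reproduce in your own wrappers. SafeAsDate() below is a hypothetical base R sketch that requires an explicit format; it is not flipTime’s code, just one way an on.parse.failure-style argument can be implemented with match.arg().

```r
# Hypothetical wrapper (not flipTime's code): as.Date() with an explicit
# format plus an on.parse.failure-style switch
SafeAsDate <- function(x, format,
                       on.parse.failure = c("error", "warn", "silent")) {
  on.parse.failure <- match.arg(on.parse.failure)
  out <- as.Date(x, format = format)
  # A failure is a non-missing input that produced a missing date
  failed <- !is.na(x) & is.na(out)
  if (any(failed)) {
    msg <- paste0("Could not parse ",
                  paste0("\"", x[failed], "\"", collapse = ", "),
                  " into a valid date.")
    if (on.parse.failure == "error") stop(msg, call. = FALSE)
    if (on.parse.failure == "warn") warning(msg, call. = FALSE)
  }
  out
}

SafeAsDate("2017-12-02", format = "%Y-%m-%d")
## [1] "2017-12-02"
SafeAsDate("foo", format = "%Y-%m-%d", on.parse.failure = "silent")
## [1] NA
```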
Both functions provide an argument, us.format, that lets the user specify whether the date strings are in U.S. or international format. U.S. format puts the month before the day, as in Jan. 2, 1988, whereas international format puts the day before the month, as in 21-10-1999. The default behaviour is to check both formats. In this case, if the format is ambiguous, the date strings are converted assuming U.S. format, and the user receives a warning.
AsDateTime("9/10/2010 10:20PM")
## Warning: Date formats are ambiguous, US format has been used.
## [1] "2010-09-10 22:20:00 UTC"
AsDateTime("9/10/2010 10:20PM", us.format = FALSE)
## [1] "2010-10-09 22:20:00 UTC"
We can also combine the flipTime functions with functions from lubridate. (By the way, due to a WordPress bug, I am using = instead of the more righteous assignment operator <-; please write to them rather than me, as I am a good guy.)
library(lubridate)
dt = AsDateTime("10/30/08 11:10AM")
dt + dminutes(6)
## [1] "2008-10-30 11:16:00 UTC"
birthday = "Dec. 8, 86"
# The result depends on the date on which the code is run
days.alive = (AsDate(Sys.time()) - AsDate(birthday)) / ddays(1)
days.alive
## [1] 11322
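If you would rather avoid extra packages for this kind of arithmetic, base R can do the same once the strings are parsed with explicit formats. A sketch, assuming an English or C locale (needed to match %p and %b):

```r
# Base R only; %p and %b are matched using the current locale (English/C)
dt <- as.POSIXct("10/30/08 11:10AM", format = "%m/%d/%y %I:%M%p", tz = "UTC")
dt + 6 * 60   # arithmetic on POSIXct is in seconds
## [1] "2008-10-30 11:16:00 UTC"

# %y maps 69-99 to 19xx and 00-68 to 20xx
birthday <- as.Date("Dec. 8, 86", format = "%b. %d, %y")
birthday
## [1] "1986-12-08"

# Whole days between two Dates (the value depends on when you run it)
days.alive <- as.numeric(difftime(Sys.Date(), birthday, units = "days"))
```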
The function AsDate() can also interpret date intervals or periods, which can be useful when working with aggregated data. If the function encounters a date period, it converts the start of the period to a date and returns it.
AsDate("10/20/2015-12/02/2016")
## [1] "2015-10-20"
AsDate("may 2017-september 2017")
## [1] "2017-05-01"
AsDate("Dec/Apr 16")
## [1] "2015-12-01"
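For simple, consistently formatted periods, you can get a similar effect in base R by keeping the text before the separating hyphen. PeriodStart() is a hypothetical helper, not flipTime’s implementation, and it assumes "-" appears only between the two ends of the period (so it will not work with ISO dates such as 2015-10-20). Month/year periods need a day prepended, because strptime() requires a day whenever a month is given.

```r
# Hypothetical helper (not flipTime's implementation): keep the text before
# the first "-" and parse it with a known format. Assumes "-" only separates
# the two ends of the period.
PeriodStart <- function(x, format) {
  start <- trimws(sub("-.*$", "", x))
  as.Date(start, format = format)
}

PeriodStart("10/20/2015-12/02/2016", format = "%m/%d/%Y")
## [1] "2015-10-20"

# Month/year periods have no day, so prepend one before parsing
PeriodStart(paste0("01/", c("01/2007-02/2007", "02/2007-03/2007")), format = "%d/%m/%Y")
## [1] "2007-01-01" "2007-02-01"
```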
The following example shows how AsDate() can be useful when working with dates inside a custom function. Say we have the following data on monthly returns of Apple and Yahoo. (A full copy of the data frame can be found on this Displayr page.)
head(df)
##                        YHOO         AAPL
## 01/2007-02/2007  0.09516441 -0.006488240
## 02/2007-03/2007  0.09007418 -0.013061224
## 03/2007-04/2007  0.01393390  0.097601323
## 04/2007-05/2007 -0.10386705  0.074604371
## 05/2007-06/2007  0.02353780  0.213884993
## 06/2007-07/2007 -0.05470383  0.006932409
We can plot these data as time series with formatted axis labels. For instance, we might write the following function, which draws both return series on one plot:
PlotSeries = function(data, max.labels = 20, ...) {
    n = nrow(data)
    # Never ask for more axis labels than there are rows
    max.labels = min(max.labels, n)
    # Try to interpret the row names as dates; NA where they are not dates
    xlabs = AsDate(rownames(data), on.parse.failure = "silent")
    if (any(is.na(xlabs))) {
        # No dates present: use a subset of the original row names
        xlabs = rownames(data)[seq.int(1, n, length.out = max.labels)]
    } else {
        # Dates present: space the labels evenly and format them nicely
        xlabs = seq(xlabs[1L], xlabs[n], length.out = max.labels)
        xlabs = format(xlabs, "%b, %Y")
    }
    matplot(data, type = "l", xaxt = "n", ...)
    axis(1, labels = xlabs, at = seq.int(1, n, length.out = max.labels), las = 2)
    legend("bottomright", names(data), lty = 1:ncol(data), col = 1:ncol(data))
}
PlotSeries(df, lwd = 2, ylab = "Return")
References
The source code for flipTime can be viewed and downloaded on GitHub (Displayr/flipTime).
More information about working with dates and times in R can be found in the following sources.
- Bonnie Ross – Using Dates and Times in R
- Phil Spector – Dates and Times in R
- Cole Beck – Handling date-times in R
- Garrett Grolemund and Hadley Wickham – R for Data Science (an introduction to lubridate)
- Dirk Eddelbuettel – blog (an introduction to anytime)