The R package data.table has become a tool of choice for working with big tabular data thanks to its versatility and performance. Its Python counterpart, py datatable, matches its R cousin in performance and is steadily catching up in functionality. A notable omission, temporal data types, was addressed in version 1.0 with two new types:
- datatable.Type.date32 to represent and store a particular calendar date without a time component, and
- datatable.Type.time64 to store a specific moment in time (i.e. a date with a time component),
and the datatable.time family of functions: https://datatable.readthedocs.io/en/latest/api/time.html
Let’s have a brief overview of how to use them.
datatable.Type.date32
This type represents a calendar date without a time component and internally stores the date as a 32-bit signed integer counting the number of days since (positive) or before (negative) the epoch (1970-01-01). The type therefore covers dates within approximately ±5.88 million years of the epoch, which places the earliest storable date in the Late Miocene Epoch and the latest one almost 5.9 million years into the future, well beyond the reach of even science fiction.
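The range quoted above follows from simple arithmetic on a 32-bit day counter. Here is a minimal sketch in plain Python rather than datatable itself; the helper name `to_date32_days` is hypothetical, and the estimate assumes the Gregorian average of 146,097 days per 400 years:

```python
from datetime import date

EPOCH = date(1970, 1, 1)
DAYS_PER_400_YEARS = 146_097  # length of one full Gregorian cycle


def to_date32_days(d: date) -> int:
    """Hypothetical helper: days since the epoch, the kind of value date32 stores."""
    return (d - EPOCH).days


# Largest offset a signed 32-bit integer can hold, converted to years.
max_days = 2**31 - 1
span_years = max_days * 400 / DAYS_PER_400_YEARS

print(to_date32_days(date(1970, 1, 2)))    # one day after the epoch: 1
print(to_date32_days(date(1969, 12, 31)))  # one day before the epoch: -1
print(f"{span_years:,.0f} years")          # roughly 5.88 million years
```

Python's own datetime.date stops at year 9999, so the extremes of this range cannot be represented as date objects; only the integer arithmetic is shown.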
There are several ways to create a date32 column in a datatable frame, for example from Python datetime.date objects or from dates written as strings.
Remember to use the ISO 8601 format (YYYY-MM-DD) when representing dates as strings; otherwise parsing fails silently.
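Because a silent failure is easy to miss, it may help to check date strings before loading them. The following is a plain-Python sketch, not part of the datatable API; `parse_iso_date` is a hypothetical helper that returns None for anything that is not a valid ISO 8601 calendar date:

```python
from datetime import date
from typing import Optional


def parse_iso_date(text: str) -> Optional[date]:
    """Hypothetical helper: a date for valid ISO 8601 input, None otherwise."""
    try:
        return date.fromisoformat(text)
    except ValueError:
        return None


raw = ["2022-01-31", "01/31/2022", "2022-02-30"]
parsed = [parse_iso_date(s) for s in raw]

# Collect the inputs that did not parse so they can be inspected or fixed.
bad = [s for s, d in zip(raw, parsed) if d is None]
print(parsed)
print(bad)
```

The second string uses a non-ISO layout and the third names a day that does not exist, so both end up in `bad`.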
If a frame already contains dates as strings, a combination of the datatable.time.ymd() constructor (to create the date32 value), datatable.as_type() (to convert str to int), and datatable.str.slice() (to extract the date elements as substrings) is enough to parse the strings and create the corresponding date32 values entirely within the datatable API.
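The slice, convert, construct sequence described above can be illustrated on a single string in plain Python. This is only an analogy for the column-wise datatable calls; the `MM/DD/YYYY` layout and the helper name `parse_mdy` are assumptions made for the example:

```python
from datetime import date


def parse_mdy(text: str) -> date:
    """Hypothetical helper for strings laid out as 'MM/DD/YYYY'."""
    month = int(text[0:2])   # substring, then str -> int
    day = int(text[3:5])
    year = int(text[6:10])
    # Building the date from its parts plays the role of the ymd() constructor.
    return date(year, month, day)


print(parse_mdy("01/31/2022"))  # 2022-01-31
```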
datatable.Type.time64
This type represents a specific moment in time and is stored internally as a 64-bit integer containing the number of nanoseconds since the epoch (1970-01-01) in UTC.
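To make the nanosecond representation concrete, here is a small plain-Python sketch (not datatable code). The helper name `to_time64_ns` is hypothetical, and because Python's datetime only resolves microseconds, the result is exact to the microsecond:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_time64_ns(moment: datetime) -> int:
    """Hypothetical helper: nanoseconds since the epoch for a timezone-aware datetime."""
    micros = (moment - EPOCH) // timedelta(microseconds=1)  # exact integer division
    return micros * 1_000


one_sec = to_time64_ns(datetime(1970, 1, 1, 0, 0, 1, tzinfo=timezone.utc))
print(one_sec)  # 1000000000

# Latest moment a signed 64-bit nanosecond counter can reach,
# truncated to microseconds (the finest unit datetime supports).
latest = EPOCH + timedelta(microseconds=(2**63 - 1) // 1_000)
print(latest.year)  # 2262
```

By the same arithmetic, a 64-bit nanosecond counter spans roughly 292 years on either side of the epoch, a far narrower window than the day-based date32.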
Similarly, time64 values can be created in the same fashion as the date32 values above.
As before, time strings should use the ISO 8601 format. To create a time from its components, use datatable.time.ymdt().
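What assembling a moment from its components amounts to can be sketched in plain Python. The function below is a hypothetical stand-in, not datatable.time.ymdt() itself; its name and parameter list are assumptions chosen for illustration:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def components_to_ns(year, month, day, hour=0, minute=0, second=0, nanosecond=0):
    """Hypothetical helper: combine date and time parts into nanoseconds since the epoch (UTC)."""
    whole = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
    seconds = (whole - EPOCH) // timedelta(seconds=1)  # exact whole seconds
    # The nanosecond part is added separately because datetime cannot hold it.
    return seconds * 1_000_000_000 + nanosecond


print(components_to_ns(1970, 1, 1, 0, 0, 1, 5))  # 1000000005
```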
datatable.time.* Functions
To use the date32 and time64 types effectively, datatable includes special functions that make up the datatable.time family:
- constructors ymd() and ymdt() and
- date and time part functions: day(), day_of_week(), hour(), minute(), month(), nanosecond(), second(), year()
The constructors were showcased above; the part functions come in handy when, for example, filtering data.
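The idea of filtering on date parts can be shown with the standard library alone. This sketch uses plain Python lists rather than datatable frames, and it assumes ISO weekday numbering (Monday is 1, Sunday is 7); how datatable's day_of_week() numbers the days should be confirmed in its documentation:

```python
from datetime import date

dates = [date(2022, 1, 1), date(2022, 1, 3), date(2022, 2, 5), date(2022, 2, 7)]

# Keep only January dates (the month part).
january = [d for d in dates if d.month == 1]

# Keep only weekend dates (the day-of-week part, ISO numbering).
weekend = [d for d in dates if d.isoweekday() >= 6]

print(january)
print(weekend)
```

In datatable the same conditions would be expressed with the part functions applied to a whole column inside a row filter, rather than one element at a time.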