The EARL SF 2017 conference was just held June 5 – 7 in San Francisco, CA. There were some amazing presentations illustrating how R is truly being embraced in enterprises. We gave a three-part presentation on tidyquant for financial data science at scale, timekit for time series machine learning, and Business Science enterprise applications. We’ve uploaded the EARL presentation to YouTube. Check out the presentation, and don’t forget to check out our announcements and to follow us on social media to stay up on the latest Business Science news, events and information!
EARL 2017 Presentation
If you’re interested in financial analysis, forecasting, and business applications, check out our 30-minute presentation from EARL SF 2017! The presentation is three-in-one:
- Financial data science at scale with tidyquant (0:45)
- Time series machine learning with timekit (9:10)
- Enterprise applications with Business Science (23:00)
Forecasting daily CRAN downloads
One of the big areas of interest on Twitter leading up to the presentation was this tweet from Hadley showing that daily CRAN downloads have grown to as many as 1.25M per day:
Total daily CRAN downloads for the RStudio mirror for the last 3 years. #rstats pic.twitter.com/Wo5zz3xZyc
— Hadley Wickham (@hadleywickham) June 2, 2017
…and our response showing that it’s quite possible to exceed 2M downloads per day by the end of the year!
What the future may bring… pic.twitter.com/NObnNTXDcv
— Matt Dancho (@mdancho84) June 2, 2017
How we made the CRAN daily download forecast graph
Several in the #rstats community wanted to know how this forecast was made.
It turns out that it’s actually a combination (or ensemble) of four separate predictions:
- prophet with linear growth
- prophet with logistic growth
- timekit using a linear regression on the time series signature
- timekit using a spline first to track trend and then a linear regression on the augmented data frame including the time series signature and the spline
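To give a rough feel for what "a linear regression on the time series signature" means, here is a minimal base R sketch. It is not the timekit implementation or the code behind the forecast: the helper make_signature() is hypothetical, it builds only three calendar features by hand, and the data is synthetic rather than the CRAN download series.

```r
# Minimal sketch: linear regression on hand-built calendar features.
# All data here is synthetic; it is NOT the CRAN download series.
set.seed(42)
dates <- seq(as.Date("2016-01-01"), by = "day", length.out = 730)

# Hypothetical helper: expand dates into a small "signature" of features.
make_signature <- function(d) {
  lt <- as.POSIXlt(d)
  data.frame(
    index = as.numeric(d),                    # days since 1970-01-01
    wday  = factor(lt$wday, levels = 0:6),    # 0 = Sunday
    month = factor(lt$mon + 1, levels = 1:12)
  )
}

# Synthetic target on the log scale: upward trend plus a weekend dip.
sig <- make_signature(dates)
is_weekend <- as.POSIXlt(dates)$wday %in% c(0, 6)
train <- cbind(
  log_y = 10 + 0.002 * (sig$index - sig$index[1]) - 0.4 * is_weekend +
    rnorm(length(dates), sd = 0.05),
  sig
)

# Ordinary linear regression on the signature columns.
fit <- lm(log_y ~ index + wday + month, data = train)

# Forecast 30 days ahead by building the same features for future dates,
# then back-transform from the log scale.
future <- seq(max(dates) + 1, by = "day", length.out = 30)
forecast_log <- predict(fit, newdata = make_signature(future))
forecast <- exp(forecast_log)
```

The point of the sketch is that once dates are expanded into ordinary columns, forecasting reduces to calling predict() on the same columns computed for future dates.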
We first made a log transformation and then fit the four separate models. The key takeaway is that individually, none of the forecasts was a silver bullet! Each had issues with either the training set or the test set. The prophet models tended to detect trend better while the timekit models tended to detect pattern better.
However, when combined via a simple average of the models, the ensemble prediction exhibited both low training and test error.
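The averaging step can be sketched in a few lines of base R. The numbers below are made up for illustration and the column names are hypothetical labels for the four models. The sketch also assumes the average is taken on the log scale before back-transforming, which the description above does not spell out.

```r
# Hypothetical log-scale forecasts from four models over three future days.
# The numbers are made up for illustration only.
log_preds <- cbind(
  prophet_linear   = c(14.00, 14.05, 14.10),
  prophet_logistic = c(13.90, 13.92, 13.94),
  timekit_lm       = c(14.10, 14.20, 14.15),
  timekit_spline   = c(14.00, 14.03, 14.11)
)

# Simple (equal-weight) average across models, still on the log scale.
ensemble_log <- rowMeans(log_preds)

# Back-transform to the original scale (downloads per day).
ensemble <- exp(ensemble_log)

# Root mean squared error, usable on either the training or the test window.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
```

Averaging on the log scale and then exponentiating is equivalent to taking the geometric mean of the back-transformed forecasts. Averaging after back-transforming would give a slightly different (arithmetic mean) result.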
If you’d like to take a deep dive into the code, the cran_dload_prediction.R file is available for download on the Business Science GitHub site.
Download Presentation and Code on GitHub
The slide deck and code from the EARL SF 2017 presentation can be downloaded from the Business Science GitHub site.
Announcements
- We have completed the new package, sweep, which aims at “tidying” up the forecast workflow by applying broom concepts to the various model functions (auto.arima(), ets(), etc.) and forecast() output. It’s not on CRAN quite yet, but we are encouraging testing. You can download it from GitHub: devtools::install_github("business-science/sweep"). Please provide feedback on the sweep GitHub site!
- We are working on a name change of the timekit package. While we love the name, there’s also timekit.io, which shares a product by the same name. To make it easier to differentiate the software products, we are considering a change to timekitr. This transition is expected to take place in July.
Follow Business Science on Social Media
- @bizScienc is on twitter!
- Check us out on LinkedIn!
- Sign up for our insights blog to stay updated!
- If you like our software, star our GitHub packages 🙂