Twitter’s new R package for anomaly detection
For Twitter, finding anomalies (sudden spikes or dips) in a time series is important for keeping the microblogging service running smoothly. A sudden spike in shared photos may signify a “trending” event, whereas a sudden dip in posts might indicate a failure in one of the back-end services that needs to be addressed. To detect such anomalies, the engineering team at Twitter created the AnomalyDetection R package, which they recently released as open source. (Late last year Twitter released a separate but related R package to detect “breakouts” in time series.)
Finding spikes and dips is relatively easy when they are extreme enough to extend beyond the natural seasonal variation in the time series. (Twitter calls these “global anomalies”.) The real trick is identifying “local anomalies”: small deviations from the seasonal pattern that don't extend beyond the usual range of values.
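To make the distinction concrete, here is a minimal base-R sketch on simulated data. Everything in it is an assumption for illustration: the hourly series, the two injected anomalies, the hypothetical helper `robust_flags()`, and the cutoff of 5. It is not the method used by the AnomalyDetection package. It only shows why a rule applied to raw values tends to miss a local anomaly, while the same rule applied after subtracting a per-hour median can pick it up.

```r
# Simulated hourly counts: two weeks with a daily (24-hour) cycle.
# All values are made up for illustration.
set.seed(42)
period <- 24
n      <- 14 * period
hour   <- (seq_len(n) - 1) %% period
counts <- 100 + 50 * sin(2 * pi * hour / period) + rnorm(n, sd = 3)

# Inject a global anomaly (far above anything the cycle produces) and a
# local anomaly (a bump in the daily trough that stays in the usual range).
global_idx <- 6 * period + 7    # hour 6 of day 7, the daily peak
local_idx  <- 9 * period + 19   # hour 18 of day 10, the daily trough
counts[global_idx] <- counts[global_idx] + 300
counts[local_idx]  <- counts[local_idx] + 40

# Hypothetical helper: flag points more than k robust standard deviations
# (scaled MAD) away from the median.
robust_flags <- function(x, k = 5) {
  abs(x - median(x)) / mad(x) > k
}

# Rule applied to the raw values.
naive_hits <- which(robust_flags(counts))

# Remove a crude seasonal component (the median for each hour of the day),
# then apply the same rule to what is left.
seasonal      <- ave(counts, hour, FUN = median)
adjusted_hits <- which(robust_flags(counts - seasonal))

print(naive_hits)
print(adjusted_hits)
```

The sketch is set up so that `naive_hits` contains only the injected global anomaly, because the bump in the trough still sits inside the series' overall range, while `adjusted_hits` contains both injected points once each hour is compared against its own typical level.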
The AnomalyDetection package uses the Seasonal Hybrid ESD (S-H-ESD) algorithm, which combines seasonal decomposition with robust statistical methods to identify both local and global anomalies. The package can also be used to detect anomalies in non-time-series (unordered) data, though in that case the concept of “local” anomalies doesn't apply. You can find out more about the package and how it's used at Twitter at the link below, or install it from GitHub for use with R.
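For readers who want to try the package, the following sketch follows the example in the project's README on GitHub. The repository path `twitter/AnomalyDetection`, the bundled `raw_data` set, and the argument values (`max_anoms = 0.02`, `direction = "both"`, `period = 1440`) come from that README rather than from this post, so treat them as assumptions and check the current documentation, since the interface may have changed.

```r
# One-time setup: the package is installed from GitHub, not CRAN.
install.packages("devtools")
devtools::install_github("twitter/AnomalyDetection")

library(AnomalyDetection)

# Example data shipped with the package: a timestamp column and a count column.
data(raw_data)

# Time-series case: flag at most 2% of points, looking for spikes and dips.
res <- AnomalyDetectionTs(raw_data, max_anoms = 0.02,
                          direction = "both", plot = TRUE)
res$anoms   # data frame of detected anomalies
res$plot    # plot of the series with the anomalies marked

# Vector case (no timestamps): the seasonal period is given explicitly;
# 1440 is the number of minutes in a day for this minute-level data.
res_vec <- AnomalyDetectionVec(raw_data[, 2], max_anoms = 0.02,
                               period = 1440, direction = "both", plot = TRUE)
res_vec$anoms
```

Here `max_anoms` caps the share of observations that may be reported as anomalous, and `direction` selects spikes (`"pos"`), dips (`"neg"`), or both.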
Twitter Engineering Blog: Introducing practical and robust anomaly detection in a time series