I’m excited to announce the release of tidyquant version 0.4.0! The release is yet again sizable. It includes integration with the PerformanceAnalytics package, which now enables full financial analyses to be performed without ever leaving the “tidyverse” (i.e. with data frames). The integration includes the ability to perform performance analysis and portfolio attribution at scale (i.e. with many stocks or many portfolios at once)! But wait, there’s more… In addition to an introduction vignette, we created five (yes, five!) topic-specific vignettes designed to reduce the learning curve for financial data scientists. We also have new ggplot2 themes to assist with creating beautiful and meaningful financial charts. We included tq_get support for “compound getters” so multiple data sources can be brought into a nested data frame all at once. Last, we have added new tq_index() and tq_exchange() functions to make collecting stock data with tq_get even easier. I’ll briefly touch on several of the updates. The package is open source, and you can view the code on the tidyquant GitHub page.
Prerequisites
First, update to tidyquant v0.4.0.
Next, load tidyquant.
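Both steps can be sketched as follows. This assumes the package is installed from CRAN, which delivers the latest published release (possibly newer than 0.4.0), so some of the functions described in this post may behave differently there.

```r
# Install tidyquant from CRAN if it is missing.
# To update an existing install, run install.packages("tidyquant") directly.
if (!requireNamespace("tidyquant", quietly = TRUE)) {
    install.packages("tidyquant")
}

# Attach the package and confirm which version is in use.
library(tidyquant)
packageVersion("tidyquant")
```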
Load the FANG data set, which will be used in the examples. The FANG data set contains the historical stock prices for FB, AMZN, NFLX, and GOOG from the beginning of 2013 through the end of 2016.
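A short sketch of loading the data set that ships with the package:

```r
library(tidyquant)

# FANG is bundled with tidyquant, so no download is needed.
data("FANG")

# One row per symbol per trading day.
FANG
```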
I also recommend the open-source RStudio IDE, which makes R programming easy and efficient, especially for financial analysis.
Overview
tidyquant: Bringing financial analysis to the tidyverse
Before I dive into the updates, if you are new to tidyquant there are a few core functions that you need to be aware of:
- Getting Financial Data from the web: tq_get(). This is a one-stop shop for getting web-based financial data in a “tidy” data frame format. Get data for daily stock prices (historical), key statistics (real-time), key ratios (historical), financial statements, dividends, splits, economic data from FRED, and FOREX rates from Oanda.
- Manipulating Financial Data: tq_transmute() and tq_mutate(). Integration for many financial functions from the xts, zoo, quantmod and TTR packages. tq_mutate() is used to add a column to the data frame, and tq_transmute() is used to return a new data frame, which is necessary for periodicity changes. Important: In v0.4.0, tq_transmute() replaces tq_transform() for consistency with dplyr::transmute().
- Coercing Data To and From xts and tibble: as_tibble() and as_xts(). There are a ton of Stack Overflow articles on converting data frames to and from xts. These two functions can be used to answer 99% of these questions.
- Performance Analysis and Portfolio Analysis: tq_performance() and tq_portfolio(). The newest additions to the tidyquant family integrate PerformanceAnalytics functions. tq_performance() converts investment returns into performance metrics. tq_portfolio() aggregates a group (or multiple groups) of asset returns into one or more portfolios.
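The difference between tq_mutate() and tq_transmute() is easiest to see side by side. The sketch below uses the bundled FANG data; the 15-day moving average, the monthly period, and the column names sma_15 and monthly_return are illustrative choices, not taken from the release notes.

```r
library(tidyquant)
library(dplyr)
data("FANG")

# tq_mutate() keeps every original row and column and appends the result,
# here a 15-day simple moving average of the closing price (TTR::SMA).
fang_sma <- FANG %>%
    group_by(symbol) %>%
    tq_mutate(select = close, mutate_fun = SMA, n = 15,
              col_rename = "sma_15")

# tq_transmute() returns only the new columns, which is what is needed when
# the periodicity changes (daily prices in, monthly returns out).
fang_monthly <- FANG %>%
    group_by(symbol) %>%
    tq_transmute(select = adjusted, mutate_fun = periodReturn,
                 period = "monthly", col_rename = "monthly_return")
```

The first result has the same number of rows as FANG, while the second has one row per symbol per month.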
To learn more, browse the new and improved vignettes.
v0.4.0 Updates
We’ve got some neat examples to show off the new capabilities:
- PerformanceAnalytics Integration
- New User-Friendly Vignettes
- New ggplot2 Themes
- “Compound Getters” in tq_get
- tq_index and tq_exchange
1: PerformanceAnalytics Integration
The PerformanceAnalytics package does two things very well. First, it enables performance analysis of investment returns using a wide variety of metrics that are detailed in the text, “Practical Portfolio Performance Measurement and Attribution” by Carl Bacon. Second, it enables portfolio aggregation, the process of aggregating a weighted group of stocks or investments into a single set of returns. When combined, this functionality enables portfolio attribution, a set of techniques used to explain a portfolio’s performance versus a benchmark.
The next few examples show off some of the basic capability. These examples scratch the surface of the full capability. Below is a figure demonstrating multiple portfolio analysis, which is an advanced topic discussed in the vignette.
A: Stock Performance Analysis
The Sharpe ratio is commonly used in finance as a measure of return per unit of risk. The larger the value, the better the reward-to-risk trade-off. The PerformanceAnalytics package contains a function SharpeRatio (and SharpeRatio.modified) that can be used to quickly calculate it from a set of returns. We’ll use tq_performance to calculate the Sharpe ratio in a “tidy” way, using the PerformanceAnalytics integration. Call tq_performance_fun_options() to see a full list of integrated functions. Spoiler alert: there are 128 functions divided into 14 categories.
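A quick way to browse the integrated functions is sketched below. The exact counts and category names depend on the installed versions of tidyquant and PerformanceAnalytics, so they may differ from the figures quoted above.

```r
library(tidyquant)

# Returns a list of character vectors, one element per function category.
perf_funs <- tq_performance_fun_options()

# Compact overview of the categories and the functions in each.
str(perf_funs)
```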
tq_performance() allows us to apply SharpeRatio to “tidy” data frames. The tq_performance() function uses Ra and Rb to specify the asset returns and baseline returns, respectively. These values get passed to the performance_fun, which in our case will be SharpeRatio. The ... allows the user to pass additional arguments to the underlying PerformanceAnalytics function. The full argument list can be inspected with args(tq_performance).
To understand the end goal, we need to analyze the SharpeRatio function, whose arguments can be inspected with args(SharpeRatio). It takes R, a set of returns; Rf, the risk-free rate; p, the confidence level; FUN, the value of the denominator (the default returns the Sharpe ratio using all three); and a few other arguments that are not used in this example. It’s important to recognize that R in the SharpeRatio() function is specified using asset returns (Ra) in the tq_performance() function. The baseline returns argument (Rb) in the tq_performance() function is not required since the baseline is not required to calculate SharpeRatio. Just keep in mind that you will either see R or the combination of Ra, Rb in the PerformanceAnalytics function arguments, which indicates whether or not Rb is required in tq_performance().
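Both argument lists can be printed with base R's args(). The sketch below calls SharpeRatio through its namespace so that it does not depend on PerformanceAnalytics being attached.

```r
library(tidyquant)

# Arguments of the tidy wrapper: data, Ra, Rb, performance_fun, ...
args(tq_performance)

# Arguments of the underlying PerformanceAnalytics function.
args(PerformanceAnalytics::SharpeRatio)
```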
Now that we understand the function, we can easily begin the task of getting the Sharpe ratios for the “FANG” stocks. It involves three steps:
- Get data with tq_get (already done since we have FANG loaded). Make sure to group by symbol if the tibble includes prices for multiple stocks.
- Transmute to period returns with tq_transmute(mutate_fun = periodReturn)
- Calculate Sharpe ratio with tq_performance(performance_fun = SharpeRatio)
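The three steps can be sketched as one pipeline. The monthly period and the column name Ra are assumptions made for this illustration; Rf and p are left at the SharpeRatio defaults.

```r
library(tidyquant)
library(dplyr)
data("FANG")

fang_sharpe <- FANG %>%
    # Step 1: data is already loaded; group so each stock is handled separately.
    group_by(symbol) %>%
    # Step 2: convert adjusted prices to period returns.
    tq_transmute(select = adjusted, mutate_fun = periodReturn,
                 period = "monthly", col_rename = "Ra") %>%
    # Step 3: Sharpe ratios per stock; Rb is omitted because no baseline is needed.
    tq_performance(Ra = Ra, performance_fun = SharpeRatio)

fang_sharpe
```

The result has one row per symbol, with one column for each denominator used by SharpeRatio.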
It’s very easy to get performance metrics for multiple stocks. Next, we’ll take a look at portfolio performance.
B: Basic Portfolio Performance
Combining a group of assets into a portfolio is one of the most useful techniques for controlling risk versus reward. The blending of assets naturally diversifies and can reduce downside risk. Further, portfolio attribution is a set of techniques used to analyze a portfolio or set of portfolios against a benchmark. The newest vignette, Performance Analysis with tidyquant, breaks the process into several steps shown in the workflow diagram below.
The process for a single portfolio aggregation without a benchmark is as follows. Portfolio aggregation requires a set of weights that can be applied to the various assets (stocks) in the portfolio. Our portfolio consists of FB, AMZN, NFLX, and GOOG. Passing the weights of 50%, 25%, 25%, and 0% blends and aggregates the asset returns into one set of portfolio returns.
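A hedged sketch of the aggregation follows. It passes the weights as a two-column table (asset, weight) rather than a bare numeric vector so the mapping to each stock is explicit; it assumes the symbols appear in FANG in the order listed above, and the names weights_tbl, Ra and Rp are illustrative.

```r
library(tidyquant)
library(dplyr)
data("FANG")

# Asset names in the first column, weights in the second (they sum to 1).
# unique() keeps the order in which the symbols appear in the data.
weights_tbl <- tibble(
    symbol = unique(FANG$symbol),
    weight = c(0.50, 0.25, 0.25, 0)
)

portfolio_returns <- FANG %>%
    group_by(symbol) %>%
    # Monthly returns for each stock.
    tq_transmute(select = adjusted, mutate_fun = periodReturn,
                 period = "monthly", col_rename = "Ra") %>%
    # Blend the four return series into a single portfolio return series.
    tq_portfolio(assets_col = symbol, returns_col = Ra,
                 weights = weights_tbl, col_rename = "Rp")

portfolio_returns
```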
At this point, it’s nice to visualize using a wealth index, which shows the growth of the portfolio. The wealth index is actually an option in tq_portfolio, but it can also be created by converting the portfolio returns using the cumprod() function.
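The conversion itself is a cumulative product of one plus each period's return. The sketch below uses three made-up monthly returns purely to show the arithmetic; with real data the same mutate() call would be applied to the portfolio return column.

```r
library(dplyr)

# Made-up monthly portfolio returns, for illustration only.
toy_returns <- tibble(
    date = as.Date(c("2013-01-31", "2013-02-28", "2013-03-28")),
    Rp   = c(0.05, -0.02, 0.03)
)

# Growth of one unit of currency invested at the start.
toy_wealth <- toy_returns %>%
    mutate(wealth.index = cumprod(1 + Rp))

toy_wealth$wealth.index
# 1.05000 1.02900 1.05987
```

Multiplying the index by an initial investment amount rescales it to currency terms.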
We can even get some performance metrics using PerformanceAnalytics functions. The table functions are the most useful since they calculate groups of portfolio attribution metrics. Eighteen different table functions are available. We’ll use the table.Stats function, which returns a “tidy” set of 15 summary statistics on the stock returns including arithmetic mean, standard deviation, skewness, kurtosis, and more.
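A sketch of applying table.Stats to the portfolio returns is below. It rebuilds the same 50/25/25/0 portfolio so the block runs on its own; as before, the monthly period and the names weights_tbl, Ra and Rp are assumptions of this example.

```r
library(tidyquant)
library(dplyr)
data("FANG")

# Explicit asset-to-weight mapping (assumes the symbol order listed in the post).
weights_tbl <- tibble(
    symbol = unique(FANG$symbol),
    weight = c(0.50, 0.25, 0.25, 0)
)

portfolio_stats <- FANG %>%
    group_by(symbol) %>%
    tq_transmute(select = adjusted, mutate_fun = periodReturn,
                 period = "monthly", col_rename = "Ra") %>%
    tq_portfolio(assets_col = symbol, returns_col = Ra,
                 weights = weights_tbl, col_rename = "Rp") %>%
    # No baseline is needed for summary statistics, so Rb stays NULL.
    tq_performance(Ra = Rp, Rb = NULL, performance_fun = table.Stats)

portfolio_stats
```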
There’s also capability for performance attribution (comparing portfolio performance against a benchmark) and scaling analyses to multiple portfolios. For those interested in furthering the analysis, please visit the new vignette, Performance Analysis with tidyquant.
2: New User-Friendly Vignettes
Financial analysis can be overwhelming due to the depth and breadth of various topics. Add to it a new package with new functions and workflows, and the task can seem impossible. The good news is we understand.
We are actively taking steps to reduce the learning curve so you can get up to speed quickly. While the work is not done yet, we believe that the vignettes are a good place to start. The goal is to break down complex tasks without overloading the user with everything at once. There is now one main “introduction” that links to five topic-specific vignettes. Each topical vignette covers the basics behind the package including real-world examples so you can see how the package can be implemented. You can access the new vignettes here.
3: New ggplot2 Themes
tidyquant ships with some new themes to assist with creating beautiful and meaningful financial charts: theme_tq() and some extra fun ones including theme_tq_dark() and theme_tq_green(). To coordinate aesthetic colors and fills with the appropriate theme, we’ve added scale_color_tq(theme = "light"). You can modify the theme arg to get the colors to correspond with the different themes. In addition, we have palette_light(), palette_dark() and palette_green() for those interested in using the color palette. Here’s a quick example.
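A minimal sketch using the FANG data; the line chart, title and axis labels are choices made for this illustration.

```r
library(tidyquant)
library(dplyr)
library(ggplot2)
data("FANG")

fang_plot <- FANG %>%
    ggplot(aes(x = date, y = adjusted, color = symbol)) +
    geom_line() +
    labs(title = "FANG adjusted prices", x = NULL, y = "Adjusted price") +
    # Light theme with the matching color scale.
    theme_tq() +
    scale_color_tq(theme = "light")

fang_plot
```

Swapping in theme_tq_dark() with scale_color_tq(theme = "dark") gives the dark variant.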
For those interested in learning more about the tidyquant charting capabilities, please visit the updated vignette, Charting with tidyquant.
4: “Compound Getters” in tq_get
Compound getters are a nice tool for those looking to get multiple data sets for one stock symbol. For example, one may want the “key.ratios” and the “key.stats”, which provide key fundamental and financial ratio data on a historical and real-time basis, respectively. You can now pull this information in one call to tq_get using a “compound getter”.
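A sketch of a compound getter call as described for v0.4.0 is below. The helper name get_fundamentals and the default symbol "AAPL" are hypothetical. The call is wrapped in a function so nothing is downloaded until it is invoked, and the "key.ratios" and "key.stats" sources depend on third-party web data that may no longer be available in later tidyquant releases.

```r
library(tidyquant)

# Hypothetical helper: passing a vector to `get` requests several data
# sources at once and returns them as list columns in one nested tibble.
get_fundamentals <- function(symbol = "AAPL") {
    tq_get(symbol, get = c("key.ratios", "key.stats"))
}

# Usage (requires network access and sources supported by your version):
# fundamentals <- get_fundamentals("AAPL")
```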
Let’s examine what’s in the “key.ratios” column using unnest().
Like peeling away layers, we can see what’s inside. Let’s do one more unnest.
We can do the same thing with the “key.stats”. Set .drop = TRUE to remove the “key.ratios” column.
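The layered unnesting can be tried offline on a small stand-in. The nested tibble below is made up to mimic the two-level shape described above (a “key.ratios” list column whose tables carry their own list column); the symbol, sections and values are not real data. It assumes tidyr 1.0 or later, where the column to unnest is named via cols.

```r
library(dplyr)
library(tidyr)

# Innermost tables: one per section (made-up values).
financials    <- tibble(category = c("Revenue", "Revenue"),
                        date     = as.Date(c("2015-12-31", "2016-12-31")),
                        value    = c(100, 110))
profitability <- tibble(category = c("Margin", "Margin"),
                        date     = as.Date(c("2015-12-31", "2016-12-31")),
                        value    = c(0.20, 0.22))

# Two levels of nesting: symbol -> sections -> data.
nested <- tibble(
    symbol     = "XYZ",
    key.ratios = list(tibble(section = c("Financials", "Profitability"),
                             data    = list(financials, profitability)))
)

# First layer: one row per section, still holding a nested `data` column.
level_1 <- nested %>% unnest(cols = key.ratios)

# Second layer: fully flat, one row per section, category and date.
level_2 <- level_1 %>% unnest(cols = data)
level_2
```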
The benefit to “compound getters” is that all your data is stored in one data frame. To access it, you can simply unnest the list columns. Additionally, the “compound getters” can be scaled in the same way that a single get can be scaled: with a vector of stock symbols or a data frame of stock symbols with the symbols in the first column. See the next section for scaling using the new tq_index() and tq_exchange() functions.
5: tq_index and tq_exchange
We got some really good feedback from a certain someone at RStudio on combining two calls to tq_get() in a row for retrieving an index of stock symbols (e.g. “SP500”) and then scaling the retrieval of data for the stock symbols. The advice was really good because (1) it was ugly having two calls to tq_get() in a row and (2) more importantly it got us thinking about how we can improve scaling data collection. Here’s the significant change from the “old way” to the “new way”.
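A sketch of the change is below. The old form appears only as a comment and reflects my understanding of the earlier interface (get = "stock.index"); the helper get_index_prices and its n_symbols limit are hypothetical, added so that a trial run downloads only a handful of stocks. The function is defined but not called, so nothing is downloaded until it is invoked.

```r
library(tidyquant)
library(dplyr)

# Old way (assumed earlier interface): two tq_get() calls in a row.
#   tq_get("SP500", get = "stock.index") %>% tq_get(get = "stock.prices")

# New way: tq_index() supplies the stock list, tq_get() retrieves the prices.
get_index_prices <- function(index = "SP500", n_symbols = 3) {
    tq_index(index) %>%
        slice(seq_len(n_symbols)) %>%   # keep only the first few symbols
        tq_get(get = "stock.prices")
}

# Usage (requires network access):
# sp500_sample <- get_index_prices("SP500", n_symbols = 3)
```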
The separation of a stock list from a call to retrieve the data for each of the stocks is fundamentally a good idea because now we can have more lists. For example, if you want to download stock prices for every stock covered on the NASDAQ exchange, you can use the new tq_exchange("NASDAQ") to retrieve the stock list and then pipe (%>%) the list to tq_get.
Piping to tq_get. (Warning: this could take 10-20 minutes to download the stock prices for all 3169 stock symbols.)
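Because of the long download, the sketch below wraps the pipeline in a hypothetical helper, get_exchange_prices, so that loading the code does not start the retrieval. It requires network access, and the number of symbols returned will vary over time.

```r
library(tidyquant)
library(dplyr)

# Hypothetical helper: stock list from the exchange, then prices for each symbol.
get_exchange_prices <- function(exchange = "NASDAQ") {
    tq_exchange(exchange) %>%
        tq_get(get = "stock.prices")
}

# Usage (slow: downloads prices for every listed symbol):
# nasdaq_prices <- get_exchange_prices("NASDAQ")
```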
The combination of tq_index and tq_exchange now gives the user access to a wide range of stock lists. To get the full list of options, use tq_index_options() and tq_exchange_options(), respectively.
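Both option functions run offline and return character vectors; the available choices depend on the installed tidyquant version.

```r
library(tidyquant)

# Index codes accepted by tq_index().
index_options <- tq_index_options()
index_options

# Exchange codes accepted by tq_exchange().
exchange_options <- tq_exchange_options()
exchange_options
```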
Conclusions
This is an exciting release for a few reasons. First, the PerformanceAnalytics integration fills a big gap and now allows full financial analysis to be performed within the “tidyverse” (i.e. using data frames only). You can start a workflow with a symbol or set of symbols and, by piping (%>%) to tq_get, tq_transmute, and tq_performance, end with performance metrics in just a few lines of code. Previously this was impossible.
Second, portfolio attribution and performance analysis are now possible in the “tidyverse”. This is very interesting because with the data science workflow discussed in R for Data Science the scale at which portfolios can be modeled and analyzed is limitless (refer to many models and the purrr package).
Third, data science is a rapidly evolving field with new people joining the community by the second. With this influx we recognize it’s important to reduce the learning curve for “financial data scientists”, those looking to apply data science to finance. As a result, we are actively taking steps to reduce the learning curve. The first step of providing a set of improved vignettes is complete. We will continue to focus on this area in the future.
Recap
This post was meant to give users and potential users a flavor for the new additions to tidyquant v0.4.0. We took a peek at the new PerformanceAnalytics integration, which enables performance analysis and portfolio aggregation. We introduced the new vignettes, which are topical and are designed to get users up to speed quickly. We discussed several other important new features such as new ggplot2 themes, the new support for “compound getters” in tq_get, and the new tq_index and tq_exchange functions for retrieving stock lists. There are a number of other changes not specifically addressed. Those interested can view the NEWS here.
Further Reading
- Tidyquant Vignettes: This overview just scratches the surface of tidyquant. The vignettes explain much, much more!
- R for Data Science: A free book that thoroughly covers the “tidyverse”. A prerequisite for maximizing your abilities with tidyquant.