tidyquant 0.5.0: select, rollapply, and Quandl
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
We’ve got some good stuff cooking over at Business Science. Yesterday, we had the fifth official release (0.5.0) of tidyquant
to CRAN. The release includes some great new features. First, the Quandl integration is complete, which now enables getting Quandl data in “tidy” format. Second, we have a new mechanism to handle selecting which columns get sent to the mutation functions. The new argument name is… select
, and it provides increased flexibility which we show off in a rollapply
example. Finally, we have added several PerformanceAnalytics
functions that deal with modifying returns to the mutation functions. In this post, we’ll go over a few of the new features in version 5.
Quandl integration
We couldn’t wait to talk about the new Quandl integration for tidyquant
! Two weeks ago, we went ahead and released a Quandl blog post using the GitHub development version. However, it is worth mentioning again now that it is officially being released. Definitely go check out that post to learn more!
We would like to highlight one feature that that post didn’t cover: data tables. These are data sets from Quandl that aren’t in a time series format, but are more of a treasure trove of other information. Many of them are premium datasets, meaning they cost money, but a few are free or have free samples. One of them is the Fundamentals Condensed datatable from Zacks. According to Quandl,
“Updated daily, this database contains over 70 fundamental indicators, including income statement, balance sheet, cash flow line items and precalculated ratios, for over 17,000 US and Canadian Equities, including 9,000+ delisted stocks.”
Without a premium Quandl account, we won’t be able to access all of it, but Quandl still offers a sample of the data. Let’s see how to grab this. First, we will want to set the Quandl API key linked to our account so that we have more than 50 calls per day.
Now, we can use tq_get(get = "quandl.datatable")
with the ZACKS/FC
Quandl code. There’s a lot of data here, including mailing address, company officers, and numerous fundamentals stats!
m_ticker | ticker | comp_name | comp_name_2 | exchange | currency_code | per_end_date | per_type | per_code | per_fisc_year | per_fisc_qtr | per_cal_year | per_cal_qtr | data_type_ind | filing_type | qtr_nbr | zacks_sector_code | zacks_x_ind_code | zacks_metrics_ind_code | fye_month | comp_cik | per_len | sic_code | filing_date | last_changed_date | state_incorp_name | bus_address_line_1 | bus_city | bus_state_name | bus_post_code | bus_phone_nbr | bus_fax_nbr | mail_address_line_1 | mail_city | mail_state_name | mail_post_code | country_name | country_code | home_exchange_name | emp_cnt | emp_pt_cnt | emp_ft_cnt | emp_other_cnt | comm_share_holder | auditor | auditor_opinion | comp_url | email_addr | nbr_shares_out | shares_out_date | officer_name_1 | officer_title_1 | officer_name_2 | officer_title_2 | officer_name_3 | officer_title_3 | officer_name_4 | officer_title_4 | officer_name_5 | officer_title_5 | rpt_0_date | tot_revnu | cost_good_sold | gross_profit | tot_deprec_amort | int_exp_oper | int_invst_income_oper | res_dev_exp | in_proc_res_dev_exp_aggr | tot_sell_gen_admin_exp | rental_exp_ind_broker | pension_post_retire_exp | other_oper_income_exp | tot_oper_exp | oper_income | non_oper_int_exp | int_cap | asset_wdown_impair_aggr | restruct_charge | merger_acq_income_aggr | rental_income | spcl_unusual_charge | impair_goodwill | litig_aggr | gain_loss_sale_asset_aggr | gain_loss_sale_invst_aggr | stock_div_subsid | income_loss_equity_invst_other | pre_tax_minority_int | int_invst_income | other_non_oper_income_exp | tot_non_oper_income_exp | pre_tax_income | tot_provsn_income_tax | income_aft_tax | minority_int | equity_earn_subsid | invst_gain_loss_other | other_income | income_cont_oper | income_discont_oper | income_bef_exord_acct_change | exord_income_loss | cumul_eff_acct_change | consol_net_income_loss | non_ctl_int | net_income_parent_comp | pref_stock_div_other_adj | net_income_loss_share_holder | eps_basic_cont_oper | eps_basic_discont_oper | eps_basic_acct_change | eps_basic_extra | eps_basic_consol | eps_basic_parent_comp | basic_net_eps | eps_diluted_cont_oper | eps_diluted_discont_oper | eps_diluted_acct_change | eps_diluted_extra | eps_diluted_consol | eps_diluted_parent_comp | diluted_net_eps | dilution_factor | avg_d_shares | avg_b_shares | norm_pre_tax_income | norm_aft_tax_income | ebitda | ebit | rpt_1_date | cash_sterm_invst | note_loan_rcv | rcv_est_doubt | rcv_tot | invty | prepaid_expense | def_charge_curr | def_tax_asset_curr | asset_discont_oper_curr | other_curr_asset | tot_curr_asset | gross_prop_plant_equip | tot_accum_deprec | net_prop_plant_equip | net_real_estate_misc_prop | cap_software | lterm_invst | adv_dep | lterm_rcv | invty_lterm | goodwill_intang_asset_tot | def_charge_non_curr | def_tax_asset_lterm | asset_discont_oper_lterm | pension_post_retire_asset | other_lterm_asset | tot_lterm_asset | tot_asset | note_pay | acct_pay | div_pay | other_pay | accrued_exp | other_accrued_exp | curr_portion_debt | curr_portion_cap_lease | curr_portion_tax_pay | defer_revnu_curr | defer_tax_liab_curr | liab_discont_oper_curr | other_curr_liab | tot_curr_liab | tot_lterm_debt | defer_revnu_non_curr | pension_post_retire_liab | defer_tax_liab_lterm | mand_redeem_pref_sec_subsid | pref_stock_liab | min_int | liab_disc_oper_lterm | other_non_curr_liab | tot_lterm_liab | tot_liab | tot_pref_stock | comm_stock_net | addtl_paid_in_cap | retain_earn_accum_deficit | equity_equiv | treas_stock | compr_income | def_compsn | other_share_holder_equity | tot_comm_equity | tot_share_holder_equity | tot_liab_share_holder_equity | comm_shares_out | pref_stock_shares_out | tang_stock_holder_equity | rpt_2_date | net_income_loss | tot_deprec_amort_cash_flow | other_non_cash_item | tot_non_cash_item | change_acct_rcv | change_invty | change_acct_pay | change_acct_pay_accrued_liab | change_income_tax | change_asset_liab | tot_change_asset_liab | oper_activity_other | cash_flow_oper_activity | net_change_prop_plant_equip | net_change_intang_asset | net_acq_divst | net_change_sterm_invst | net_change_lterm_invst | net_change_invst_tot | invst_activity_other | cash_flow_invst_activity | net_lterm_debt | net_curr_debt | debt_issue_retire_net_tot | net_comm_equity_issued_repurch | net_pref_equity_issued_repurch | net_tot_equity_issued_repurch | tot_comm_pref_stock_div_paid | fin_activity_other | cash_flow_fin_activity | fgn_exchange_rate_adj | disc_oper_misc_cash_flow_adj | incr_decr_cash | beg_cash | end_cash | stock_based_compsn | comm_stock_div_paid | pref_stock_div_paid | tot_deprec_amort_qd | stock_based_compsn_qd | cash_flow_oper_activity_qd | net_change_prop_plant_equip_qd | comm_stock_div_paid_qd | pref_stock_div_paid_qd | tot_comm_pref_stock_div_qd | wavg_shares_out | wavg_shares_out_diluted | eps_basic_net | eps_diluted_net |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AAPL | AAPL | APPLE INC | Apple Inc. | NSDQ | USD | 2011-09-30 | A | NA | 2011 | 4 | 2011 | 3 | F | 10-K | 2011 | 10 | 199 | 9 | 9 | 0000320193 | 12 | 3571 | 2011-10-26 | 2011-10-26 | CA | ONE INFINITE LOOP | CUPERTINO | CA | 95014 | 408-996-1010 | 408-996-0275 | ONE INFINITE LOOP | CUPERTINO | CA | 95014 | UNITED STATES | X1 | NA | 63300 | 2900 | 60400 | NA | 28543 | Ernst & Young LLP | Fair | www.apple.com | [email protected] | 6505863000 | 2011-10-14 | Timothy D. Cook | Chief Executive Officer and Director | Peter Oppenheimer | Senior Vice President and Chief Financial Officer | Betsy Rafael | Vice President and Corporate Controller | William V. Campbell | Director | Millard S. Drexler | Director | 2013-10-30 | 108249 | 64431 | 43818 | NA | NA | NA | 2429 | NA | 7599 | NA | NA | NA | 74459 | 33790 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 415 | 415 | 34205 | 8283 | 25922 | NA | NA | NA | NA | 25922 | NA | 25922 | NA | NA | 25922 | NA | 25922 | NA | 25922 | 4.0066 | NA | NA | NA | 4.0066 | 4.0066 | 4.0071 | 3.9536 | NA | NA | NA | 3.9536 | 3.9536 | 3.9543 | 4.3340 | 6556.515 | 6469.806 | 34205 | 25922 | 35604 | 33790 | 2012-10-31 | 25952 | NA | NA | 11717 | 776 | NA | NA | 2014 | NA | 4529 | 44988 | 11768 | 3991 | 7777 | NA | NA | 55618 | NA | NA | NA | 4432 | NA | NA | NA | NA | 3556 | 71383 | 116371 | NA | 14632 | NA | NA | 9247 | NA | NA | NA | NA | 4091 | NA | NA | NA | 27970 | NA | 1686 | NA | NA | NA | NA | NA | NA | 10100 | 11786 | 39756 | NA | 13331 | NA | 62841 | NA | NA | 443 | NA | NA | 76615 | 76615 | 116371 | 6504.939 | NA | 72183 | 2013-10-30 | 25922 | 1814 | 4036 | 5850 | 143 | 275 | 2515 | NA | NA | 2824 | 5757 | NA | 37529 | -4260 | -3192 | -244 | -32464 | NA | -32464 | -259 | -40419 | NA | NA | NA | 831 | NA | 831 | NA | 613 | 1444 | NA | NA | -1446 | 11261 | 9815 | 1168 | 0 | NA | NA | NA | NA | NA | NA | NA | NA | 6469.806 | 6556.515 | 4.0071 | 3.9543 |
AAPL | AAPL | APPLE INC | Apple Inc. | NSDQ | USD | 2012-09-30 | A | NA | 2012 | 4 | 2012 | 3 | F | 10-K | 2012 | 10 | 199 | 9 | 9 | 0000320193 | 12 | 3571 | 2012-10-31 | 2012-10-31 | CA | ONE INFINITE LOOP | CUPERTINO | CA | 95014 | 408-996-1010 | 408-996-0275 | ONE INFINITE LOOP | CUPERTINO | CA | 95014 | UNITED STATES | X1 | NA | 76100 | 3300 | 72800 | NA | 27696 | Ernst & Young LLP | Fair | www.apple.com | [email protected] | 6584844000 | 2012-10-19 | Timothy D. Cook | Chief Executive Officer and Director | Peter Oppenheimer | Senior Vice President and Chief Financial Officer | William V. Campbell | Director | Millard S. Drexler | Director | Al Gore | Director | 2014-10-27 | 156508 | 87846 | 68662 | NA | NA | NA | 3381 | NA | 10040 | NA | NA | NA | 101267 | 55241 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 522 | 522 | 55763 | 14030 | 41733 | NA | NA | NA | NA | 41733 | NA | 41733 | NA | NA | 41733 | NA | 41733 | NA | 41733 | 6.3776 | NA | NA | NA | 6.3776 | 6.3776 | 6.3771 | 6.3065 | NA | NA | NA | 6.3065 | 6.3065 | 6.3071 | 23.3164 | 6617.485 | 6543.726 | 55763 | 41733 | 58518 | 55241 | 2013-10-30 | 29129 | NA | NA | 18692 | 791 | NA | NA | 2583 | NA | 6458 | 57653 | 21887 | 6435 | 15452 | NA | NA | 92122 | NA | NA | NA | 5359 | NA | NA | NA | NA | 5478 | 118411 | 176064 | NA | 21175 | NA | NA | 11414 | NA | NA | NA | NA | 5953 | NA | NA | NA | 38542 | NA | 2648 | NA | NA | NA | NA | NA | NA | 16664 | 19312 | 57854 | NA | 16422 | NA | 101289 | NA | NA | 499 | NA | NA | 118210 | 118210 | 176064 | 6574.456 | NA | 112851 | 2014-10-27 | 41733 | 3277 | 6145 | 9422 | -5551 | -15 | 4467 | NA | NA | 800 | -299 | NA | 50856 | -8295 | -1107 | -350 | -38427 | NA | -38427 | -48 | -48227 | NA | NA | NA | 665 | NA | 665 | -2488 | 125 | -1698 | NA | NA | 931 | 9815 | 10746 | 1740 | -2488 | NA | NA | NA | NA | NA | NA | NA | NA | 6543.726 | 6617.483 | 6.3800 | 6.3100 |
AAPL | AAPL | APPLE INC | Apple Inc. | NSDQ | USD | 2013-09-30 | A | NA | 2013 | 4 | 2013 | 3 | F | 10-K | 2013 | 10 | 199 | 9 | 9 | 0000320193 | 12 | 3571 | 2013-10-30 | 2013-10-29 | CA | ONE INFINITE LOOP | CUPERTINO | CA | 95014 | 408-996-1010 | 408-974-2483 | ONE INFINITE LOOP | CUPERTINO | CA | 95014 | UNITED STATES | X1 | NA | 84400 | 4100 | 80300 | NA | 24710 | Ernst & Young LLP | Fair | www.apple.com | [email protected] | 6298166000 | 2013-10-18 | Timothy D. Cook | Chief Executive Officer and Director | Peter Oppenheimer | Senior Vice President and Chief Financial Officer | Luca Maestri | Vice President and Corporate Controller | William V. Campbell | Director | Millard S. Drexler | Director | 2015-10-28 | 170910 | 106606 | 64304 | NA | NA | NA | 4475 | NA | 10830 | NA | NA | NA | 121911 | 48999 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1156 | 1156 | 50155 | 13118 | 37037 | NA | NA | NA | NA | 37037 | NA | 37037 | NA | NA | 37037 | NA | 37037 | NA | 37037 | 5.7180 | NA | NA | NA | 5.7180 | 5.7180 | 5.7186 | 5.6791 | NA | NA | NA | 5.6791 | 5.6791 | 5.6786 | 5.8789 | 6521.634 | 6477.317 | 50155 | 37037 | 55756 | 48999 | 2014-10-27 | 40546 | NA | NA | 20641 | 1764 | NA | NA | 3453 | NA | 6882 | 73286 | 28519 | 11922 | 16597 | NA | NA | 106215 | NA | NA | NA | 5756 | NA | NA | NA | NA | 5146 | 133714 | 207000 | NA | 22367 | NA | NA | 13856 | NA | NA | NA | NA | 7435 | NA | NA | NA | 43658 | 16960 | 2625 | NA | NA | NA | NA | NA | NA | 20208 | 39793 | 83451 | NA | 19764 | NA | 104256 | NA | NA | -471 | NA | NA | 123549 | 123549 | 207000 | 6294.494 | NA | 117793 | 2015-10-28 | 37037 | 6757 | 3394 | 10151 | -2172 | -973 | 2340 | NA | NA | 7283 | 6478 | NA | 53666 | -8165 | -911 | -496 | -24042 | NA | -24042 | -160 | -33774 | 16896 | NA | 16896 | -22330 | NA | -22330 | -10564 | -381 | -16379 | NA | NA | 3513 | 10746 | 14259 | 2253 | -10564 | NA | NA | NA | NA | NA | NA | NA | NA | 6477.320 | 6521.634 | 5.7200 | 5.6800 |
Quandl is a very extensive data source. Finding the right series can sometimes be a challenge, but their documentation and search function has greatly improved from a few years ago. In addition, with quandl_search()
users can even find data sets from within the R console. Let’s search for the WTI crude prices within the FRED database.
id | dataset_code | database_code | name | refreshed_at | newest_available_date | oldest_available_date | column_names | frequency | type | premium | database_id |
---|---|---|---|---|---|---|---|---|---|---|---|
34399442 | WTISPLC | FRED | Spot Crude Oil Price: West Texas Intermediate (WTI) | 2017-03-18T04:39:31.083Z | 2017-02-01 | 1946-01-01 | DATE, VALUE | monthly | Time Series | FALSE | 118 |
120621 | ACOILWTICO | FRED | Crude Oil Prices: West Texas Intermediate (WTI) – Cushing, Oklahoma | 2016-12-11T19:35:20.310Z | 2015-01-01 | 1986-01-01 | DATE, VALUE | annual | Time Series | FALSE | 118 |
120619 | MCOILWTICO | FRED | Crude Oil Prices: West Texas Intermediate (WTI) – Cushing, Oklahoma | 2017-03-17T15:46:22.490Z | 2017-02-01 | 1986-01-01 | DATE, VALUE | monthly | Time Series | FALSE | 118 |
select
ing columns: A flexible approach
Previously, when selecting columns with tq_mutate()
and tq_transmute()
, we would use the ohlc_fun
argument to select the columns to mutate on. ohlc_fun
leveraged the numerous functions from quantmod
that extract specific columns from OHLC (Open High Low Close) price data such as Ad()
(adjusted) or Cl()
(close). While this was great for OHLC data, it was not ideal for datasets not in the OHLC form. It also kept users from performing rolling regressions using custom functions with rollapply()
. To fix this, we’ve come up with a new approach that replaces ohlc_fun
through a new argument, select
. This new approach uses dplyr
for selecting data frame columns instead of relying on quantmod
, and as a result it has a “tidyverse” feel. While this seems like a small change, in the following examples you will see just how much more flexibility this brings.
select
VS ohlc_fun
Here’s a quick example demonstrating the difference in the approaches. Let’s first grab the stock price data for IBM.
What if you wanted to calculate the daily returns of the adjusted stock price? Previously, you might have done something like this…
date | open | high | low | close | volume | adjusted | daily.returns |
---|---|---|---|---|---|---|---|
2007-01-03 | 97.18 | 98.40 | 96.26 | 97.27 | 9196800 | 77.73997 | 0.0000000 |
2007-01-04 | 97.25 | 98.79 | 96.88 | 98.31 | 10524500 | 78.57116 | 0.0106919 |
2007-01-05 | 97.60 | 97.95 | 96.91 | 97.42 | 7221300 | 77.85985 | -0.0090530 |
2007-01-08 | 98.50 | 99.50 | 98.35 | 98.90 | 10340000 | 79.04270 | 0.0151920 |
2007-01-09 | 99.08 | 100.33 | 99.07 | 100.07 | 11108200 | 79.97778 | 0.0118301 |
At this point, ohlc_fun
will still work, but you’ll notice that a warning appears notifying you that we’ve deprecated ohlc_fun
. In version 6, we will remove the argument completely. Let’s see the new approach with select
.
date | open | high | low | close | volume | adjusted | daily.returns |
---|---|---|---|---|---|---|---|
2007-01-03 | 97.18 | 98.40 | 96.26 | 97.27 | 9196800 | 77.73997 | 0.0000000 |
2007-01-04 | 97.25 | 98.79 | 96.88 | 98.31 | 10524500 | 78.57116 | 0.0106919 |
2007-01-05 | 97.60 | 97.95 | 96.91 | 97.42 | 7221300 | 77.85985 | -0.0090530 |
2007-01-08 | 98.50 | 99.50 | 98.35 | 98.90 | 10340000 | 79.04270 | 0.0151920 |
2007-01-09 | 99.08 | 100.33 | 99.07 | 100.07 | 11108200 | 79.97778 | 0.0118301 |
As you can see, the new approach is similar to dplyr::mutate
in that you simply specify the name of the columns that get mutated. You can even select multiple columns with the dplyr shorthand. Here we succinctly select the columns open, high, low, and close and change to a weekly periodicity. Since we change periodicity, we have to use tq_transmute()
to drop the other columns.
date | open | high | low | close |
---|---|---|---|---|
2007-01-05 | 97.60 | 97.95 | 96.91 | 97.42 |
2007-01-12 | 98.99 | 99.69 | 98.50 | 99.34 |
2007-01-19 | 95.00 | 96.85 | 94.55 | 96.17 |
2007-01-26 | 97.52 | 97.83 | 96.84 | 97.45 |
2007-02-02 | 99.10 | 99.73 | 98.88 | 99.17 |
One final note. If you want to mutate the entire dataset, you can leave off the select
argument so that it keeps its default value of NULL
which passes the entire data frame to the mutate function.
Working with non-OHLC data
The select
argument is more than a surface level change. The change from the OHLC functions allows us to work with non-OHLC data straight from tq_mutate()
. This used to be a little awkward. Since the columns of non-OHLC data could not be selected with a quantmod
function, we had to use tq_mutate_xy()
to find them. The intended use of the *_xy()
functions is for working with functions that require mutating both x and y in parallel, so this definitely wasn’t best practice. Check out this old example of working with some crude oil prices from FRED, and applying a 1 period lag.
tq_mutate_xy()
was the only thing that worked in the old version, but it just felt wrong!
Enter, select
. This works much better!
Getting things rolling with rollapply
Something that we are very excited about with this new version is the ability to perform custom rolling calculations with rollapply
. This is a byproduct of the new select
argument, and has some pretty cool use cases. In the example below, we will walk through a rolling CAPM analysis of Apple stock.
CAPM stands for Capital Asset Pricing Model, and is a way to determine the theoretically appropriate expected stock return, based on exposure to market risk. Essentially, the CAPM model performs a linear regression with Apple’s excess return as the dependent variable, and the “market’s” excess return as the independent variable. We will pull from the Fama French website for the risk free rate and the market return. Here is the regression formula for CAPM:
Where r_f is the risk free return, and epsilon is the normally distributed error term. The key terms of interest are alpha and beta, the coefficients estimated from the regression. Alpha is the excess return of a stock relative to a market index, and beta is a measure of the volatility of a stock in relation to the market. It would be interesting to evaluate the stability of the model, and one way of doing this is by performing a rolling regression to see how alpha and beta change throughout time. First, we will need two datasets: Fama French factors and Apple stock returns.
Fama French factors
Quandl actually has a cleaned up series for the Fama French factors going back to 1926! As of writing this, the most recent data point in their weekly series is 02/24/2017, so for reproducibility purposes we will stop the series there.
We only need two columns for the CAPM analysis, mkt.rf
and rf
, so let’s select them.
Apple weekly stock returns
This one is easy, we will use tq_get()
to pull the stock prices, and then use tq_transmute()
for the returns. For this analysis, we will pull 15 years of prices.
Now, let’s get the weekly returns. We have to use tq_transmute()
because the periodicity changes, and we will use the periodReturn()
function from quantmod
as our mutate_fun
. Lastly, we change the return to a percentage because that is the form that the Fama French factors are in.
Merge the two datasets
Alright, now that we have the two data sets, let’s join them together. We want to keep all of apple_return
, and just match the fama_french
data to it. A left_join
will serve that purpose.
Remembering that the left side of the CAPM formula is apple return minus the risk free rate, we calculate that as well.
Rolling CAPM
We are almost ready for the rolling CAPM. The last thing we need is a function that is applied at each iteration of our roll, which here will be a regression. We are really only interested in the coefficients from the regression, so that is what we will return. Note that the incoming data_xts
will be an xts
object, so we need to convert to a data.frame
in the lm()
function. The output should be a numeric vector. tq_mutate
will take care of adding the appropriate columns provided the custom function (in our case regr_fun()
) is handled in this manner.
Now let’s work some magic. We will leave select
out of this call to tq_mutate()
since we can just pass the entire dataset into rollapply
. To have enough data for a good analysis, we will set width
equal to 260 for a 5 year rolling window of data per call. The final argument, by.column
, is especially important so that rollapply
doesn’t try and apply our regression to each column individually.
We had to scroll down to row 260 before we find any values for alpha and beta, but it’s more useful to create a plot. Looking at the rolling coefficient for alpha, it seems to be steadily trending downward over time.
The rolling coefficient for beta is just as interesting, with a huge drop around 2008 likely having to deal with the financial crisis, but consistently over or near 1.
Hopefully you can see the power that tidyquant
gives to the user by allowing them to perform these rolling calculations with data frames!
As an aside, there are some improvements to this model that could be of interest to some users:
-
This says nothing to how well the CAPM model fits Apple’s return data. It only monitors the stability of the coefficients.
-
The
mkt.rf
data pulled from the Fama-French data set was only one part of a more detailed extension of CAPM, the Fama French 3 Factor Model. It would be interesting the stability of this model over time as well.
PerformanceAnalytics Mutation Functions
We have added a few new functions from the PerformanceAnalytics
package to tq_mutate()
and tq_transmute()
, which can be reviewed using tq_mutate_fun_options()$PerformanceAnalytics
. These differ from the PerformanceAnalytics
functions available for use with tq_performance()
(refer to tq_performance_fun_options()
) in that the deal with transforming the returns data rather than calculating performance metrics.
We can use the functions to clean and format returns for the FANG
data set. The example below uses three progressive applications of tq_transmute()
to apply various quant functions to the grouped stock prices from the FANG data set. First, we calculate daily returns using periodReturn
from the quantmod
package. Next, we use Return.clean
to clean outliers from the return data. The alpha
parameter is the percentage of outliers to be cleaned. Finally, Return.excess
is used to calculate the excess returns using a risk-free rate of 3% (divided by 252 for 252 trade days in one year). Finally, we calculate general statistics on the returns by passing to tq_performance(performance_fun = table.Stats)
.
symbol | ArithmeticMean | GeometricMean | Kurtosis | LCLMean(0.95) | Maximum | Median | Minimum | NAs | Observations | Quartile1 | Quartile3 | SEMean | Skewness | Stdev | UCLMean(0.95) | Variance |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
FB | 0.0015 | 0.0013 | 35.2725 | 2e-04 | 0.2960 | 1e-03 | -0.0695 | 0 | 1008 | -0.0098 | 0.0119 | 7e-04 | 2.9098 | 0.0220 | 0.0029 | 0.0005 |
AMZN | 0.0011 | 0.0009 | 9.6476 | -1e-04 | 0.1412 | 6e-04 | -0.1101 | 0 | 1008 | -0.0082 | 0.0111 | 6e-04 | 0.4804 | 0.0194 | 0.0023 | 0.0004 |
NFLX | 0.0026 | 0.0021 | 35.9090 | 6e-04 | 0.4221 | -1e-04 | -0.1938 | 0 | 1008 | -0.0119 | 0.0149 | 1e-03 | 3.0745 | 0.0324 | 0.0046 | 0.0011 |
GOOG | 0.0007 | 0.0006 | 22.1306 | -2e-04 | 0.1604 | 0e+00 | -0.0533 | 0 | 1008 | -0.0067 | 0.0078 | 5e-04 | 2.2165 | 0.0148 | 0.0017 | 0.0002 |
Getting, cleaning and calculating excess returns is that easy with the additional PerformanceAnalytics
mutation functions.
Further Reading
That’s all for now. We hope you enjoy the new release, and be on the lookout for more to come!
-
Tidyquant Vignettes: The Core Functions in tidyquant vignette has a nice section on the features related to the Quandl integration! The R Quantitative Analysis Package Integrations in tidyquant vignette includes another example of working with
rollapply
. -
R for Data Science: A free book that thoroughly covers the “tidyverse”. A prerequisite for maximizing your abilities with
tidyquant
.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.