
Major update to BatchGetSymbols

[This article was first published on R on msperlin, and kindly contributed to R-bloggers.]

I just released a long-overdue update to the package BatchGetSymbols. The files are under review at CRAN and you should get the update soon. Meanwhile, you can install the new version from GitHub:

if (!require(devtools)) install.packages('devtools')
devtools::install_github('msperlin/BatchGetSymbols')

The main innovations are:

In the next chunks of code I show some of the innovations:

library(BatchGetSymbols)
## Loading required package: rvest
## Loading required package: xml2
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## 
# get tickers of S&P 500 stocks
my.tickers <- GetSP500Stocks()$tickers[1:5] # let's keep it light

# set dates
first.date <- '2017-01-01'
last.date <- '2018-01-01'

# set folder for cache system
my.temp.cache.folder <- 'BGS_CACHE'

# get data and time it
time.nocache <- system.time({
my.l <- BatchGetSymbols(tickers = my.tickers, first.date, last.date, 
                        cache.folder = my.temp.cache.folder, do.cache = FALSE)
})
## 
## Running BatchGetSymbols for:
##    tickers = MMM, ABT, ABBV, ABMD, ACN
##    Downloading data for benchmark ticker
## MMM | yahoo (1|5) - OK!
## ABT | yahoo (2|5) - OK!
## ABBV | yahoo (3|5) - Boa!
## ABMD | yahoo (4|5) - Nice!
## ACN | yahoo (5|5) - Nice!
time.withcache <- system.time({
my.l <- BatchGetSymbols(tickers = my.tickers, first.date, last.date, 
                        cache.folder = my.temp.cache.folder, do.cache = TRUE)
})
## 
## Running BatchGetSymbols for:
##    tickers = MMM, ABT, ABBV, ABMD, ACN
##    Downloading data for benchmark ticker | Not Cached
## MMM | yahoo (1|5) | Not Cached - Youre doing good!
## ABT | yahoo (2|5) | Not Cached - OK!
## ABBV | yahoo (3|5) | Not Cached - You got it!
## ABMD | yahoo (4|5) | Not Cached - Good job!
## ACN | yahoo (5|5) | Not Cached - Well done!
cat('\nTime with no cache:', time.nocache['elapsed'])
## 
## Time with no cache: 5.146
cat('\nTime with cache:', time.withcache['elapsed'])
## 
## Time with cache: 1.693
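
To make the idea behind the cache more concrete, here is a minimal sketch of a file-based cache in base R. It is not the code used inside BatchGetSymbols: the helper `cached_fetch()` and the stand-in `slow_fetch()` are hypothetical names I made up for illustration, and a real cache would also have to take the requested dates into account.

```r
# Hypothetical helper, NOT part of BatchGetSymbols: illustration only.
# Looks for <cache.folder>/<key>.rds and only calls fetch.fun on a miss.
cached_fetch <- function(key, fetch.fun, cache.folder) {
  if (!dir.exists(cache.folder)) dir.create(cache.folder, recursive = TRUE)
  cache.file <- file.path(cache.folder, paste0(key, '.rds'))

  if (file.exists(cache.file)) {
    return(readRDS(cache.file)) # cache hit: read from disk
  }

  out <- fetch.fun(key)         # cache miss: do the slow work
  saveRDS(out, cache.file)      # store the result for next time
  out
}

# Stand-in for a slow download (assumption: made-up data, no network access)
n.calls <- 0
slow_fetch <- function(ticker) {
  n.calls <<- n.calls + 1
  Sys.sleep(0.2)
  data.frame(ticker = ticker, price.close = c(10, 11, 12),
             stringsAsFactors = FALSE)
}

# Start from an empty demo folder so the first call is a guaranteed miss
demo.folder <- file.path(tempdir(), 'BGS_CACHE_DEMO')
unlink(demo.folder, recursive = TRUE)

first.run  <- cached_fetch('MMM', slow_fetch, demo.folder) # miss, calls slow_fetch
second.run <- cached_fetch('MMM', slow_fetch, demo.folder) # hit, reads the .rds file

cat('Calls to slow_fetch:', n.calls, '\n')
```

The second call never reaches `slow_fetch()`, which is the kind of saving a cache system is meant to provide.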

Now let’s check the default output with data in the long format:

dplyr::glimpse(my.l)
## List of 2
##  $ df.control:'data.frame':  5 obs. of  6 variables:
##   ..$ ticker              : Factor w/ 5 levels "MMM","ABT","ABBV",..: 1 2 3 4 5
##   ..$ src                 : Factor w/ 1 level "yahoo": 1 1 1 1 1
##   ..$ download.status     : Factor w/ 1 level "OK": 1 1 1 1 1
##   ..$ total.obs           : int [1:5] 251 251 251 251 251
##   ..$ perc.benchmark.dates: num [1:5] 1 1 1 1 1
##   ..$ threshold.decision  : Factor w/ 1 level "KEEP": 1 1 1 1 1
##  $ df.tickers:'data.frame':  1255 obs. of  10 variables:
##   ..$ price.open         : num [1:1255] 179 178 178 177 178 ...
##   ..$ price.high         : num [1:1255] 180 179 179 179 178 ...
##   ..$ price.low          : num [1:1255] 177 178 177 176 177 ...
##   ..$ price.close        : num [1:1255] 178 178 178 178 177 ...
##   ..$ volume             : num [1:1255] 2509300 1542000 1447800 1625000 1617800 ...
##   ..$ price.adjusted     : num [1:1255] 171 171 170 171 170 ...
##   ..$ ref.date           : Date[1:1255], format: "2017-01-03" ...
##   ..$ ticker             : chr [1:1255] "MMM" "MMM" "MMM" "MMM" ...
##   ..$ ret.adjusted.prices: num [1:1255] NA 0.00152 -0.00342 0.00293 -0.00539 ...
##   ..$ ret.closing.prices : num [1:1255] NA 0.00152 -0.00342 0.00293 -0.00539 ...
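
The return columns in the output above are consistent with simple arithmetic returns calculated separately for each ticker, which is why the first observation is `NA`. The sketch below reproduces that calculation in base R on a few adjusted prices copied from this post. It is my own illustration, not the package's internal code, and `calc_ret()` is a hypothetical helper name.

```r
# Small long-format dataframe; prices taken from the output shown in this post
df.long <- data.frame(
  ref.date = as.Date(rep(c('2017-01-03', '2017-01-04', '2017-01-05'), times = 2)),
  ticker = rep(c('MMM', 'ABT'), each = 3),
  price.adjusted = c(170.6262, 170.8849, 170.3004,
                     37.47772, 37.77525, 38.10156),
  stringsAsFactors = FALSE
)

# Hypothetical helper: arithmetic return P_t / P_{t-1} - 1, NA for the first date
calc_ret <- function(p) c(NA, p[-1] / p[-length(p)] - 1)

# Apply the calculation within each ticker (rows must be sorted by date)
df.long$ret.adjusted.prices <- ave(df.long$price.adjusted, df.long$ticker,
                                   FUN = calc_ret)

print(df.long)
```

Because the printed prices are rounded, the results match the package output only up to the first few significant digits.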

We can also change the format of the long dataframe to wide. The result is a list with one wide dataframe per column of the original data:

l.wide <- reshape.wide(my.l$df.tickers) 

Now we check the matrix of prices:

print(head(l.wide$price.adjusted))
##     ref.date     ABBV   ABMD      ABT      ACN      MMM
## 1 2017-01-03 57.95297 112.36 37.47772 112.1350 170.6262
## 2 2017-01-04 58.77012 115.74 37.77525 112.4046 170.8849
## 3 2017-01-05 59.21585 114.81 38.10156 110.7197 170.3004
## 4 2017-01-06 59.23442 115.42 39.13807 111.9810 170.7987
## 5 2017-01-09 59.62442 117.11 39.09969 110.7293 169.8787
## 6 2017-01-10 59.49442 112.24 39.62754 110.7870 169.2175

And the matrix of returns:

print(head(l.wide$ret.adjusted.prices))
##     ref.date          ABBV         ABMD           ABT           ACN
## 1 2017-01-03            NA           NA            NA            NA
## 2 2017-01-04  0.0141002957  0.030081853  0.0079387696  0.0024043154
## 3 2017-01-05  0.0075841938 -0.008035252  0.0086381959 -0.0149904655
## 4 2017-01-06  0.0003135985  0.005313126  0.0272039787  0.0113922416
## 5 2017-01-09  0.0065840607  0.014642203 -0.0009808097 -0.0111780039
## 6 2017-01-10 -0.0021803315 -0.041584860  0.0135001340  0.0005217229
##            MMM
## 1           NA
## 2  0.001516438
## 3 -0.003420717
## 4  0.002925953
## 5 -0.005386183
## 6 -0.003892418
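
For readers curious about what the long-to-wide step involves, here is a rough equivalent for a single column using `reshape()` from base R. This is only a sketch under my own assumptions and not how `reshape.wide` is implemented. The data is a small subset of the prices shown above.

```r
# Small long-format dataframe; prices taken from the output shown in this post
df.long <- data.frame(
  ref.date = as.Date(rep(c('2017-01-03', '2017-01-04'), times = 2)),
  ticker = rep(c('MMM', 'ABT'), each = 2),
  price.adjusted = c(170.6262, 170.8849, 37.47772, 37.77525),
  stringsAsFactors = FALSE
)

# One row per date, one column per ticker
df.wide <- reshape(df.long, idvar = 'ref.date', timevar = 'ticker',
                   v.names = 'price.adjusted', direction = 'wide')

# reshape() names the new columns 'price.adjusted.<ticker>'; keep only the ticker
names(df.wide) <- sub('^price\\.adjusted\\.', '', names(df.wide))

# Order ticker columns alphabetically, as in the output above
df.wide <- df.wide[, c('ref.date', sort(setdiff(names(df.wide), 'ref.date')))]
rownames(df.wide) <- NULL

print(df.wide)
```

Repeating the same steps for each numeric column and storing the results in a list would give a structure similar to `l.wide`.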