When Yahoo Finance doesn’t have de-listed tickers needed

[This article was first published on R on Redwall Analytics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Click to see R set-up code

# Libraries
if(!require("pacman")) {
  install.packages("pacman")
}
pacman::p_load(
  data.table
  )

# Set knitr params
knitr::opts_chunk$set(
  comment = NA,
  fig.width = 12,
  fig.height = 8,
  out.width = '100%'
)

Introduction

As we discussed in our last post Introducing the Redwall ‘Red Flag’ Explorer with New Constructs Data, we were able to test the response of 125,000 quarterly and annual financial statements to incidence of “red flag” ratios, but some of the most interesting results may have been hidden in de-listed tickers, often not available to the default R {quantmod} pricing sources (Yahoo Finance and Alpha Vantage). As a result, 15-25% of financial statements in New Constructs the first five years of data, could not be matched to return history, exposing our analysis to “survivor bias”. Redwall’s mission is multifaceted: to conduct research on topic of interest in available financial and government data, but also to explore and share how to best solve problems encountered while using R.

We would love to see market sponsors make weekly or monthly prices available to the open source community, as has become almost expected in so many other areas where open data is used. Even though every closing price was once publicly disclosed in the newspaper, that doesn’t mean that those prices are available for analysis, and it takes a lot of work to collect them. One of the only options suggested online, was to go to the local library and look through the stacks, but this was not feasible for a personal project involving over 900 securities. It turns out that there are several providers of financial statement data for a price. Since many other R users will likely face this same problem, one objective of this post will be to lay out the options for others looking for older and de-listed securities.

Financial Data API’s for De-Listed Securities

Just to make clear, this is not the first article about what to do when Yahoo Finance fails. In our case, we looked at four providers, the first being Sharadar, as above mentioned. Sharadar had a substantial listing of 16,512 tickers, going back to 1986 and including ADRs, Canadian and US companies. We discovered that we could match the large majority of our still-missing tickers, but the cost for the level of access allowing the full history needed would be a full year’s access at their recently reduced price of $299 (through the Quandl API). We also looked at Algoseek, but their data only went back to January 2007, so that wouldn’t solve the problem. Norgate offered access to de-listed securities back to 1990 for $346 under their 6-month “Platinum” plan option. Norgate also offered a free 3-week trial, but that only included 2 years of data, which would involve time to figure out the API and wouldn’t solve the problem, unless we went for the subscription. Lastly, there was EOD Historical Data, which offered $20 per month for world prices going back 30+ years.

Finding Matching Prices

Sharadar’s comprehensive spreadsheet, offering certainty that needed prices would actually be available without having to download through the API (a helpful feature while deciding how to proceed). Given that this is only a personal project, we thought we would try EOD first to see if we could access the data we needed for only $20, though this forced us to invest time to get the API working from R without knowing if what we were looking for would even be there.

EOD gave a code example for accessing the API, but we struggled at first to use it with the suggested default .csv format. The error messages were confusing, and sometimes unclear if the data we needed was not available (404) or if there was a problem with the API preventing the download. Customer support was very responsive, considering they are not charging a lot for it, and after some wheel spinning, helped us to understand that using JSON would work better. By adding “&fmt=json” to the set-up string and parsing with {jsonlite} as in the code example below worked perfectly, and we were able to collect over 800 of our 930 missing tickers.

As we were doing this, we also discovered other problems in our Yahoo Finance prices, so decided to collect the tickers we previously thought we would use from Yahoo. The All-World package allows up to 100,000 calls per day, which was more than enough, and we would be able to expand beyond the US to 60 overseas exchanges. EOD was felt quite a bit faster than the traditional {quantmod}, even when we used {BatchGetSymbols} for collecting a lot of tickers. Unlike with Alpha Vantage, these were adjusted prices. All in all, EOD seemed to be perfect for our use-case.

Click to see R code sample API call

# Tickers and API token
api.token <- "OeAFFmMliFG5orCUuwAKQ8l4WWFQ67YX"
tickers <- c("AAPL.US")
ticker_list <- list()

# Loop to collect prices
for ( ticker in tickers ) {
  
  # Set up API string
  ticker.link <-
    paste(
      "http://nonsecure.eodhistoricaldata.com/api/eod/",
      ticker,
      "?api_token=",
      api.token,
      "&period=w&order=d&fmt=json&from=1997-01-01",
      sep = ""
    )
  
  # Call to API
  ticker_prices <- try(jsonlite::fromJSON(ticker.link))
  
  # Wait to avoid overloading API
  Sys.sleep(2)
  
  # Append new ticker data to list
  ticker_list <-
    append(ticker_list, list(ticker_prices))
  
}

# Name list by ticker
names(ticker_list) <- tickers
ticker_list$AAPL.US[1:10,]
         date    open     high    low  close adjusted_close    volume
1  2021-08-23 148.310 150.8600 147.80 148.36       148.3600 166528933
2  2021-08-16 148.535 151.6800 144.50 148.19       148.1900 429022231
3  2021-08-09 146.200 149.4444 145.30 149.10       149.1000 299579344
4  2021-08-02 146.360 148.0450 145.18 146.14       146.1400 284559336
5  2021-07-26 148.270 149.8300 142.54 145.86       145.6418 423324004
6  2021-07-19 143.750 148.7177 141.67 148.56       148.3378 441563672
7  2021-07-12 146.210 150.0000 143.63 146.39       146.1710 504249353
8  2021-07-06 140.070 145.6500 140.07 145.11       144.8929 418559704
9  2021-06-28 133.410 140.0000 133.35 139.96       139.7506 321360121
10 2021-06-21 130.300 134.6400 129.21 133.11       132.9109 354155886

Some thoughts on the collected data

We collected pricing data for an additional 400 tickers than with our original sources, so we are still missing price histories for about 500. Of these, only about 200 had 10 or more filings, so many of these were short-lived listings, and possibly not as relevant for our analysis. We also learned that in cases where the ticker had been used more than once, EOD price history would generally have the most recent, but not for the previous entity. For example, “DELL” went private in 2014 and was re-listed in 2016, so only data for the latter entity was available with EOD. In these cases, we were often able to use the Yahoo data.

In addition, we learned that Yahoo sometimes gave surprising results. In the case if ticker “EDO” below (not to be mistaken with EOD who have provided the data), the reliability of the Yahoo data on the same dates during the early period, is shown to be very bad. Here we are showing for prices when there should be none, very small and unchanging prices in some periods, and then jumping all over the place when the history should have ended.

Click to see R function used to generate output

get_ticker <- function(ticker) {
  # Load prices for ticker from EOD and Yahoo
  path <- "~/Desktop/David/Projects/new_constructs_targets/data/"
  eod <-
    fread(cmd = paste0("grep ", ticker, " ", path, "eod_weekly_prices.csv"))
  setnames(eod, c("ticker", "date", "adjusted.close"))
  yahoo <-
    fread(cmd = paste0("grep ", ticker, " ", path, "historical_prices/nc_complete_prices2.csv"))
  yahoo <- yahoo[, c(1:2, 8)]
  setnames(yahoo, c("ticker", "date", "adjusted.close"))
  
  # Rbind, order by date and dcast for comparison
  prices <- list(eod, yahoo)
  names(prices) <- c("eod", "yahoo")
  prices <-
    rbindlist(prices, idcol = "source")[order(date)]
  
  # Return
  prices[
    , dcast(
      .SD, 
      date ~ source,
      fun = identity,
      fill = NA_real_)]
}
get_ticker("EDO")
            date   eod yahoo
   1: 1997-12-31 8.688    NA
   2: 1998-01-02    NA  0.51
   3: 1998-01-05 8.438  0.51
   4: 1998-01-12 8.625  0.51
   5: 1998-01-20 8.500  0.51
  ---                       
1217: 2021-04-12    NA  1.82
1218: 2021-04-19    NA  1.82
1219: 2021-04-26    NA  1.82
1220: 2021-05-03    NA  1.82
1221: 2021-05-10    NA  1.82

Here is another example of SKP illustrating the potential problems, considering our desire to find accurate return data after a given date. These price sequences would give a lot of cases of zero returns, and others with very positive or negative returns.

get_ticker("SKP")
            date eod yahoo
   1: 1998-01-02  NA 20018
   2: 1998-01-05  NA 20018
   3: 1998-01-12  NA 20018
   4: 1998-01-20  NA 20018
   5: 1998-01-26  NA 20018
  ---                     
1216: 2021-04-12  NA 52380
1217: 2021-04-19  NA 52380
1218: 2021-04-26  NA 52380
1219: 2021-05-03  NA 52380
1220: 2021-05-10  NA 52380

For this reason, we favored EOD prices, followed by Alpha Vantage when not available, and finally Yahoo Finance as the last resort, which meant the large majority were coming from EOD as shown in the table below.

path <- "~/Desktop/David/Projects/redwall-analytics/content/post/2021-08-19-when-yahoo-finance-doesn-t-have-de-listed-tickers-needed/"
source(paste0(path, "prices_source_table.R"))
prices_source_table()
   source       N
1:    eod 3680589
2:  yahoo  267814
3:     av  406068

Now, we find that we are missing about 200 tickers out of the 5,500 we set out to match. Many of these were ending in “Q” (so were already on the pink sheets), are foreign or may have bad tickers. In any case, we have probably covered the bulk of the meaningful companies with our analysis.

  [1] "AABA"       "AAMRQ"      "AANI"       "ABCWQ"      "ABII"      
  [6] "ABKFQ"      "ABLSQ"      "ACHI"       "ACME_33987" "ACPIQ"     
 [11] "ACTT"       "AEMI"       "AFR"        "AGCCQ"      "AHMMQ"     
 [16] "AKRXQ"      "ALTV"       "AMOA"       "ANCCQ"      "ANLD"      
 [21] "ANRZQ"      "ANSVQ"      "ANVGQ"      "APAC"       "APXSQ"     
 [26] "ARDI"       "ARMP_1725"  "ASCL"       "ASYTQ"      "ATNY"      
 [31] "ATPAQ"      "AWA"        "BBAO"       "BCKDY"      "BFCF"      
 [36] "BHEL.BO"    "BKUNQ"      "BLGM"       "BOW"        "BRLC"      
 [41] "BSBN"       "CAMD"       "CBCG"       "CBSS"       "CCOWQ"     
 [46] "CDWC"       "CEMJQ"      "CHMT"       "CHZS_8434"  "CLK"       
 [51] "CNVX"       "COHM"       "CPGVY"      "CRGIY"      "CRNM"      
 [56] "CSKEQ"      "CTCI"       "CTGI"       "CTRA"       "CUNO"      
 [61] "CZZ"        "DCGNQ"      "DDIC"       "DEU.F"      "DHI.KS"    
 [66] "DIVX"       "DOLE"       "DRTE"       "EBHIQ"      "EGLSQ"     
 [71] "EKDKQ"      "EMRG"       "ENMC"       "EPEXQ"      "ERPLQ"     
 [76] "EZEM"       "FBNIQ"      "FCE.A"      "FCHDQ"      "FCSE"      
 [81] "FFKY"       "FILE"       "FNBP"       "FSNM"       "FWMHQ"     
 [86] "GCORE"      "GDPMQ"      "GDYS"       "GISX"       "GNKOQ"     
 [91] "GVHR"       "HECL.F"     "HEVV"       "HOFF"       "HRVE"      
 [96] "HSTN"       "HYDP"       "IBCPD"      "IION"       "IMNR"      
[101] "INFS"       "ISOOE"      "ITWO"       "JHTXQ"      "JRCO"      
[106] "JWLR"       "KNGGY"      "KVPHQ"      "LAWE"       "LCAV"      
[111] "LEHMQ"      "LFGRQ"      "LGFTY"      "LINEQ"      "LNX"       
[116] "LPHIQ"      "LPS"        "LQI"        "MCEL"       "MDRIQ_5781"
[121] "MECAQ"      "MHRCQ"      "MILLQ"      "MKTS"       "MODT"      
[126] "MSNW"       "MSSN"       "MTLQQ"      "MUSE"       "MYG"       
[131] "NCOC"       "NHR"        "NMGA"       "NNDS"       "NRVHQ"     
[136] "NUI"        "NVTP"       "NWACQ"      "NWEC"       "OLAB"      
[141] "OO"         "OSCIQ"      "PENX"       "PILLQ"      "PMRY"      
[146] "PRFS"       "PRLI"       "PRTLQ"      "PRXZ"       "QDHC"      
[151] "QRCP"       "RCOCQ"      "RCRC"       "REVUQ"      "RGAA"      
[156] "RGFC"       "RITA"       "RSTO"       "SBIT"       "SBLKE"     
[161] "SEN.ETR"    "SEPR"       "SGGHU"      "SHRPQ"      "SHZ"       
[166] "SKRRF"      "SMF"        "SPCBQ"      "SPNVD"      "SQAA"      
[171] "SRNAE"      "SUZ.DEU"    "SVNT"       "SZ"         "TARRQ"     
[176] "TCPTF"      "THQIQ"      "TLCR"       "TMG"        "TNSIQ"     
[181] "TOBC"       "TORCQ"      "TRMS"       "TWLL"       "TWTRQ"     
[186] "UNISQ"      "UPFC"       "USFC"       "USPI"       "UTA"       
[191] "UVSLQ"      "VARI"       "VIONQ"      "VLCY"       "VRAIE"     
[196] "VRLKQ"      "VRST"       "VSE"        "VTAI"       "VTRO"      
[201] "WAVXQ"      "WC"         "WFTIQ_8616" "WGATQ"      "WG.L"      
[206] "WGRD"       "WLS"        "WMANQ"      "WNNB"       "WPSL"      
[211] "WWCA"       "XYBRQ"      "YAKC"      

Conclusion

In the end, our return data is still likely not perfect, but is pretty comprehensive, and certainly the data visualization reflected in our Red Flag App should be close to an unbiased reflection. Most of the four million stock returns we collected did not cost an unreasonable amount (as long as not used for commercial purposes). In our next post, we will explore the life and death of companies since 1986 using some of the data we collected in this project.

To leave a comment for the author, please follow the link and comment on their blog: R on Redwall Analytics.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)