Economic Calendar

[This article was first published on R - datawookie, and kindly contributed to R-bloggers.]

I needed an offline copy of an economic calendar covering all of the major international economic events. After grubbing around the internet I found the Economic Calendar on Myfxbook, which had everything I needed.

Here’s a screenshot of that calendar.

This seemed like a good candidate for some simple web scraping. However, the big orange button at the bottom hinted at some minor challenges. To get the full calendar you need to press this button repeatedly, retrieving an additional page of data each time, and for web scraping that clicking has to be automated.
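A minimal sketch of the "keep clicking" logic, written against two plain callables so that it doesn't depend on any particular browser library. The function name, the 2-second pause and the 500-click safety cap are assumptions for illustration, not details of the original scraper. It also assumes that the button disappears (or is hidden) once there is no more data to load.

```python
import time
from typing import Callable


def click_until_exhausted(
    has_more: Callable[[], bool],
    click: Callable[[], object],
    pause: float = 2.0,
    max_clicks: int = 500,
) -> int:
    """Click a "load more" control until it goes away or a safety cap is hit.

    Hypothetical helper. Assumes has_more() returns False once the button is
    hidden, i.e. once all of the data has been loaded. Returns the click count.
    """
    clicks = 0
    while clicks < max_clicks and has_more():
        click()
        clicks += 1
        # Liberal pause: give the new rows time to load and go easy on the server.
        time.sleep(pause)
    return clicks
```

With Playwright's sync API you might bind it to a locator, for example `button = page.locator(LOAD_MORE).first` followed by `click_until_exhausted(button.is_visible, button.click)`, where `LOAD_MORE` is a hypothetical selector for the orange button. Stopping as soon as the button vanishes, with a hard cap as a backstop, keeps the loop from spinning forever if the page layout changes.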

In addition, a couple of modal dialogs pop up when you first visit the site. I'd like to get those out of the way too.

Choice of Tools

I had to choose between (1) reverse-engineering the API behind the site and (2) using a browser automation tool to interact with the site.

After inspecting the network requests passing between my browser and the server, I concluded that the second approach would be best. My preferred tools for this are Selenium and Playwright, and recently I have been leaning towards the latter.

Implementation

The scraper script (implemented in Python) ultimately consisted of a few components:

  1. spin up Playwright and launch a Chromium instance;
  2. navigate to the calendar URL;
  3. deal with the various popups on initial page launch (probably not strictly necessary but good to emulate real user interaction);
  4. keep smashing the button (with liberal pauses) until all of the data have been retrieved; and
  5. parse the resulting table, then dump to CSV.
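The five steps above could be sketched roughly as follows. This is a minimal illustration assuming Playwright's synchronous Python API, not the author's actual script. The URL and every CSS selector are hypothetical parameters (the post doesn't give them), the pause and click cap are arbitrary, and the table is parsed with the standard library's `html.parser` from `page.content()`, which may well differ from how the original scraper does it. As before, it assumes the load-more button is hidden once everything has loaded.

```python
import csv
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped into <tr> rows."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so "&amp;" arrives as "&"
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace left over from the page layout.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without any cells
                self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Step 5a: turn every table row in the page HTML into a list of strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def write_csv(rows, path):
    """Step 5b: dump the parsed rows to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


def scrape(url, load_more_selector, popup_selectors=(), pause_ms=2000, max_clicks=500):
    """Steps 1-4: drive Chromium with Playwright and return the final page HTML."""
    # Imported here so that the parsing helpers work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()  # 1. launch Chromium (headless by default)
        page = browser.new_page()
        page.goto(url)  # 2. open the calendar page
        for selector in popup_selectors:  # 3. dismiss any popups that are showing
            close = page.locator(selector).first
            if close.is_visible():
                close.click()
        # 4. Click "load more" until it is hidden (assumed to mean all data loaded).
        button = page.locator(load_more_selector).first
        clicks = 0
        while clicks < max_clicks and button.is_visible():
            button.click()
            clicks += 1
            page.wait_for_timeout(pause_ms)  # liberal pause between clicks
        html = page.content()
        browser.close()
    return html
```

A run might then look like `write_csv(parse_table(scrape(CALENDAR_URL, LOAD_MORE, [POPUP_CLOSE])), "calendar.csv")`, where the capitalised names are hypothetical placeholders to be filled in from the live page. Note that `parse_table()` collects rows from every table on the page; in practice you would probably narrow it down to the calendar table first.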

After developing and testing the script, I deployed it as a job that runs daily via GitLab CI/CD.

Results

The resulting CSV file contains all of the columns shown in the screenshot above. Here, for example, I load the file into R and display the first 20 records.

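library(dplyr)

calendar <- read.csv("calendar.csv") |>
  # Currency codes (CNY, AUD, ...) are ISO 4217, hence the shorter name.
  rename(iso = currency) |>
  # Parse the timestamp strings into date-times.
  mutate(date = strptime(date, "%Y-%m-%d %H:%M:%S")) |>
  # Drop the previous, consensus and actual columns to keep the output narrow.
  select(-previous, -consensus, -actual)

head(calendar, n = 20)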

                  date iso                                 event impact
1  2024-10-02 00:00:00 CNY              National Day Golden Week   None
2  2024-10-02 00:01:00 AUD   CoreLogic Dwelling Prices MoM (Sep)   None
3  2024-10-02 05:00:00 JPY             Consumer Confidence (Sep)   High
4  2024-10-02 07:00:00 EUR             Unemployment Change (Sep)   High
5  2024-10-02 07:00:00 EUR            Tourist Arrivals YoY (Aug)    Low
6  2024-10-02 07:15:00 EUR                    ECB Guindos Speech   High
7  2024-10-02 08:00:00 EUR               Unemployment Rate (Aug)   High
8  2024-10-02 09:00:00 EUR                Retail Sales YoY (Aug)    Low
9  2024-10-02 09:00:00 EUR               Unemployment Rate (Aug)    Low
10 2024-10-02 09:00:00 EUR               Unemployment Rate (Aug)   High
11 2024-10-02 09:00:00 GBP          5-Year Treasury Gilt Auction    Low
12 2024-10-02 09:30:00 EUR                  10-Year Bund Auction Medium
13 2024-10-02 09:30:00 EUR                       ECB Lane Speech    Low
14 2024-10-02 09:45:00 EUR                       ECB Buch Speech    Low
15 2024-10-02 10:00:00 EUR               Unemployment Rate (Sep)    Low
16 2024-10-02 10:10:00 EUR                  3-Month Bill Auction    Low
17 2024-10-02 10:10:00 EUR                  6-Month Bill Auction    Low
18 2024-10-02 10:30:00 EUR                  Budget Balance (Aug)    Low
19 2024-10-02 11:00:00 USD MBA Mortgage Refinance Index (Sep/27)    Low
20 2024-10-02 11:00:00 USD           MBA Purchase Index (Sep/27)    Low

In the interests of brevity I have omitted the previous, consensus, and actual columns, but they are included in the CSV data. You can slice and dice these data as required. For example, here are the high-impact events during the first two trading days of November 2024 (Friday 1 and Monday 4 November).

# High-impact events from Friday 1 to Monday 4 November 2024 (upper bound exclusive).
calendar |>
  filter(
    impact == "High",
    date >= "2024-11-01",
    date < "2024-11-05"
  )

                  date iso                              event impact
1  2024-11-01 01:45:00 CNY     Caixin Manufacturing PMI (Oct)   High
2  2024-11-01 08:00:00 EUR            Unemployment Rate (Oct)   High
3  2024-11-01 08:30:00 CHF procure.ch Manufacturing PMI (Oct)   High
4  2024-11-01 09:00:00 EUR S&P Global Manufacturing PMI (Oct)   High
5  2024-11-01 09:30:00 GBP S&P Global Manufacturing PMI (Oct)   High
6  2024-11-01 12:30:00 USD     Nonfarm Payrolls Private (Oct)   High
7  2024-11-01 12:30:00 USD              U-6 Unemployment Rate   High
8  2024-11-01 12:30:00 USD            Non Farm Payrolls (Oct)   High
9  2024-11-01 12:30:00 USD            Unemployment Rate (Oct)   High
10 2024-11-01 13:30:00 CAD S&P Global Manufacturing PMI (Oct)   High
11 2024-11-01 13:45:00 USD S&P Global Manufacturing PMI (Oct)   High
12 2024-11-01 14:00:00 USD        ISM Manufacturing PMI (Oct)   High
13 2024-11-04 08:15:00 EUR       HCOB Manufacturing PMI (Oct)   High
14 2024-11-04 08:45:00 EUR       HCOB Manufacturing PMI (Oct)   High
15 2024-11-04 08:50:00 EUR       HCOB Manufacturing PMI (Oct)   High
16 2024-11-04 08:55:00 EUR       HCOB Manufacturing PMI (Oct)   High
17 2024-11-04 09:00:00 EUR       HCOB Manufacturing PMI (Oct)   High
18 2024-11-04 22:00:00 AUD       Judo Bank Services PMI (Oct)   High

The CSV file with these data can be downloaded here and will be updated daily.
