Making Of: A Free API For COVID-19 Data

Posted on April 1, 2020 by Sebastian Heinz in R bloggers | 0 Comments

[This article was first published on r-bloggers | STATWORX, and kindly contributed to R-bloggers.]

Recently, some colleagues and I attended the two-day COVID-19 hackathon #wirvsvirus, organized by the German government. There, we developed FlatCurver, an application for simulating COVID-19 curves based on estimates of how effective governmental measures are. As there are many COVID-related dashboards and visualizations out there, I thought that gathering the underlying data from a single point of truth would be a minor issue. However, I soon realized that there are plenty of different data sources, most of them relying on the Johns Hopkins University (JHU) COVID-19 case data. At first, I thought that was great, but on second glance I revised my opinion. The JHU datasets have some quirks that make them cumbersome to prepare and analyze:

weird column names including special characters
countries and states mixed together
wide format, which is unwieldy for data analysis
import problems due to line break issues
etc.

For all of you who have been or are working with COVID-19 time series data and want to step up your data-pipeline game, let me tell you: we have an API for that! The API uses official data from the European Centre for Disease Prevention and Control (ECDC) and delivers a clear and concise data structure for further processing and analysis.

Overview of our COVID-19 API

Our brand-new COVID-19 API brings the latest case number time series right into your application or analysis, regardless of your development environment. For example, you can easily import the data into Python using the requests package:

import requests
import json
import pandas as pd

# POST the query to the API as a JSON body
payload = {'country': 'Germany'} # or {'code': 'DE'}
URL = 'https://api.statworx.com/covid'
response = requests.post(url=URL, data=json.dumps(payload))
response.raise_for_status()  # fail early on an HTTP error status

# Convert the JSON response to a data frame
df = pd.DataFrame.from_dict(response.json())

Or, if you’re an R aficionado, use httr and jsonlite to grab the latest data and turn it into a cool plot.

library(httr)
library(dplyr)
library(jsonlite)
library(ggplot2)

# Post to API
payload <- list(code = "ALL")
response <- httr::POST(url = "https://api.statworx.com/covid",
                       body = toJSON(payload, auto_unbox = TRUE), encode = "json")

# Convert to data frame
content <- rawToChar(response$content)
df <- data.frame(fromJSON(content))

# Make a cool plot
df %>%
  mutate(date = as.Date(date)) %>%
  filter(cases_cum > 100) %>%
  filter(code %in% c("US", "DE", "IT", "FR", "ES")) %>%
  group_by(code) %>%
  mutate(time = 1:n()) %>%
  ggplot(., aes(x = time, y = cases_cum, color = code)) +
  xlab("Days since 100 cases") + ylab("Cumulative cases") +
  geom_line() + theme_minimal()
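
For readers following along in Python, the alignment step behind the plot (numbering the days after a country has passed 100 cumulative cases) can be sketched with the standard library alone. The field names date, code and cases_cum are taken from the R code above; the rows are made-up placeholder values, not real case counts, and the function name days_since_threshold is hypothetical. Unlike the R pipeline, this sketch sorts each country's rows by date explicitly instead of relying on the order of the response.

```python
from collections import defaultdict
from datetime import date

# Made-up illustrative rows, NOT real case counts.
rows = [
    {"date": "2020-03-01", "code": "DE", "cases_cum": 90},
    {"date": "2020-03-02", "code": "DE", "cases_cum": 130},
    {"date": "2020-03-03", "code": "DE", "cases_cum": 160},
    {"date": "2020-03-02", "code": "IT", "cases_cum": 250},
    {"date": "2020-03-01", "code": "IT", "cases_cum": 120},
]


def days_since_threshold(rows, threshold=100):
    """Group rows by country code and number the days on which the
    cumulative case count exceeds the threshold, starting at 1."""
    by_code = defaultdict(list)
    for row in rows:
        if row["cases_cum"] > threshold:
            by_code[row["code"]].append(row)

    aligned = {}
    for code, group in by_code.items():
        # Sort chronologically so that day 1 is the first day above the threshold.
        group.sort(key=lambda r: date.fromisoformat(r["date"]))
        aligned[code] = [
            (day, r["cases_cum"]) for day, r in enumerate(group, start=1)
        ]
    return aligned


aligned = days_since_threshold(rows)
```

Each value in aligned is a list of (day, cumulative cases) pairs per country code, which can be handed to any plotting library.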

Developing the API using Flask

Developing a simple web app in Python is straightforward with Flask, a web framework that lets you create websites and web applications right from Python. It is widely used to develop web services and APIs. A simple Flask app looks something like this:

from flask import Flask
app = Flask(__name__)

@app.route('/')
def handle_request():
  """ This code gets executed """
  return 'Your first Flask app!'

In the example above, the app.route decorator defines the URL at which our function is triggered. You can decorate several functions to trigger a different one for each URL. You might want to check out our code in the GitHub repository to see how we built the API using Flask.
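
To connect the toy app with the client calls shown earlier, here is a minimal sketch of a POST route that accepts a JSON payload with a code or country key. It is not the implementation from the repository: the route body, the helper filter_rows, the handling of code = "ALL" and the placeholder rows are assumptions made for illustration only.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Made-up placeholder rows; a real service would load the ECDC data instead.
ROWS = [
    {"date": "2020-03-01", "country": "Germany", "code": "DE", "cases_cum": 150},
    {"date": "2020-03-01", "country": "Italy", "code": "IT", "cases_cum": 300},
]


def filter_rows(rows, payload):
    """Return the rows matching the 'code' or 'country' key of the payload."""
    code = payload.get("code")
    country = payload.get("country")
    if code == "ALL":
        return list(rows)
    if code is not None:
        return [r for r in rows if r["code"] == code]
    if country is not None:
        return [r for r in rows if r["country"] == country]
    return []


@app.route("/covid", methods=["POST"])
def covid():
    # force=True parses the body as JSON even without a JSON Content-Type
    # header; silent=True returns None instead of raising on a bad body.
    payload = request.get_json(force=True, silent=True)
    if not isinstance(payload, dict):
        return jsonify({"error": "expected a JSON object"}), 400
    return jsonify(filter_rows(ROWS, payload))
```

Keeping the filtering in a plain function separates the data logic from the HTTP layer, so it can be tested without starting a server.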

Deployment using Google Cloud Run

Developing the API using Flask is straightforward. However, building the infrastructure and auxiliary services around it can be challenging, depending on your specific needs. A couple of things you have to consider when deploying an API:

Authentication
Security
Scalability
Latency
Logging
Connectivity

We decided to use Google Cloud Run (GCR), a container-based serverless computing platform on Google Cloud. Basically, GCR is a fully managed service, based on the open-source Knative project, that allows you to deploy scalable web services or other serverless functions based on your container. This is what our Dockerfile looks like:

# Use the official image as a parent image
FROM python:3.7

# Copy the file from your host to your current location
COPY ./main.py /app/main.py
COPY ./requirements.txt /app/requirements.txt

# Set the working directory
WORKDIR /app

# Run the command inside your image filesystem
RUN pip install -r requirements.txt

# Inform Docker that the container is listening on the specified port at runtime.
EXPOSE 80

# Run the specified command within the container.
CMD ["python", "main.py"]
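
The Dockerfile starts the app with python main.py, but main.py itself is not shown in this post. The following is only a sketch of what such an entry point could look like, not the actual file. It assumes that the server should listen on the port given in the PORT environment variable, which Cloud Run sets for the container, and fall back to port 80 as exposed in the Dockerfile. The names resolve_port and main are hypothetical.

```python
import os

from flask import Flask

app = Flask(__name__)


@app.route("/")
def handle_request():
    return "Your first Flask app!"


def resolve_port(environ, default=80):
    """Read the listening port from the PORT variable, falling back to the
    port exposed in the Dockerfile."""
    return int(environ.get("PORT", default))


def main():
    # Bind to all interfaces so the server is reachable from outside the
    # container; binding to 127.0.0.1 would only accept local connections.
    app.run(host="0.0.0.0", port=resolve_port(os.environ))
```

In an actual main.py, main() would be called at the bottom of the file so that python main.py starts the server.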

You can develop your container locally and then push it to the container registry of your GCP project. To do so, tag your local image using docker tag according to the following scheme: [HOSTNAME]/[PROJECT-ID]/[IMAGE]. The hostname is one of the following: gcr.io, us.gcr.io, eu.gcr.io, asia.gcr.io. Afterward, you can push the image using docker push, followed by your image tag. From there, you can easily connect the container to the Google Cloud Run service.
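
The tagging scheme can be sketched as a small Python helper that builds the image tag and the corresponding Docker commands. The project ID my-project and the image name covid-api are hypothetical placeholders, and the helper names are made up for this example. The sketch uses docker tag and docker push for the two steps and only constructs the commands; it does not run them.

```python
# Registry hostnames listed in the text above.
ALLOWED_HOSTNAMES = ("gcr.io", "us.gcr.io", "eu.gcr.io", "asia.gcr.io")


def build_image_tag(hostname, project_id, image):
    """Build a tag following the scheme [HOSTNAME]/[PROJECT-ID]/[IMAGE]."""
    if hostname not in ALLOWED_HOSTNAMES:
        raise ValueError(f"unknown registry hostname: {hostname!r}")
    return f"{hostname}/{project_id}/{image}"


def docker_commands(local_image, remote_tag):
    """Return the tag and push commands as argument lists."""
    return [
        ["docker", "tag", local_image, remote_tag],
        ["docker", "push", remote_tag],
    ]


# Hypothetical project ID and image name, for illustration only.
tag = build_image_tag("eu.gcr.io", "my-project", "covid-api")
commands = docker_commands("covid-api", tag)
```

Each entry in commands can be passed to subprocess.run(cmd, check=True) on a machine where Docker is installed and authenticated against the registry.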

When deploying the service, you can define parameters for scaling and more, but that is out of scope for this post. Furthermore, GCR allows you to map custom domains to services. That’s why we have the neat API endpoint https://api.statworx.com/covid.

Conclusion

Building and deploying a web service is easier than ever. We hope that you find our new API useful for your projects and analyses regarding COVID-19. If you have any questions or remarks, feel free to contact us or to open an issue on GitHub. Lastly, if you make use of our free API, please add a link to our website, https://www.statworx.com, to your project. Thanks in advance and stay healthy!

About the author

Sebastian Heinz

I am the founder and CEO of STATWORX. I enjoy writing about machine learning and AI, especially about neural networks and deep learning. In my spare time, I love to cook, eat and drink, as well as to travel the world.

ABOUT US

STATWORX is a consulting company for data science, statistics, machine learning and artificial intelligence located in Frankfurt, Zurich and Vienna. Sign up for our newsletter and receive reads and treats from the world of data science and AI. If you have questions or suggestions, please write us an e-mail addressed to blog(at)statworx.com.

The post Making Of: A Free API For COVID-19 Data first appeared on STATWORX.
