
R and GPX – How to Read and Visualize GPX Files in R

[This article was first published on Tag: r - Appsilon | Enterprise R Shiny Dashboards, and kindly contributed to R-bloggers.]

Geospatial data is everywhere around us, and it’s essential for data professionals to know how to work with it. One common way to store this type of data is in GPX files. Today you’ll learn the essentials, from theory and common questions to parsing and visualizing GPX files in R.

We’ll start simple – with just a bit of theory and commonly asked questions. This is needed to get a deeper understanding of how storing geospatial data works. If you’re already familiar with the topic, feel free to skip the first section.

New to geomapping in R? Follow this guide to make stunning geomaps in R with Leaflet.


Introduction to R and GPX

Online route mapping services such as Strava and Komoot let you export routes in the GPX file format. It’s a convenient way to store and exchange geospatial data, such as geolocation (latitude, longitude), elevation, and more, which you can then analyze and visualize.

For example, take a look at the following image. It represents a Strava cycling route in Croatia I plan to embark on later this summer. It’s the highest paved road in the country, and I expect the views to be breathtaking:

Why is this relevant? Because Strava allows you to export any route or workout in GPX file format. But what is GPX anyway?

What is a GPX file?

Put simply, GPX stands for GPS eXchange Format, and it’s nothing but a simple text file with geographical information, such as latitude, longitude, elevation, time, and so on. If you plot these points on a map, you’ll know exactly where you need to go, and what sort of terrain you might expect, at least according to the elevation.

The Strava route we’ll analyze today is just a plain route and has 1855 latitude, longitude, and elevation data points. If I were to complete this route and export the file from my workouts, it would also include timestamps.

These data points are ridiculously easy to load into R. You don’t need a dedicated package to combine R and GPX – all is done with an XML parser. More on that in a bit.
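
To make the structure concrete, here’s a minimal sketch of what a GPX file can look like, written to a temporary file from R. The coordinates, elevations, timestamps, track name, and creator value are all made up for illustration; real exports from Strava or similar services carry more metadata.

```r
# A tiny, hand-written GPX 1.1 document (all values are made up for illustration)
gpx_lines <- c(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">',
  '  <trk>',
  '    <name>Sample route</name>',
  '    <trkseg>',
  '      <trkpt lat="45.1000" lon="14.2000"><ele>12.5</ele><time>2023-06-01T08:00:00Z</time></trkpt>',
  '      <trkpt lat="45.1010" lon="14.2015"><ele>15.0</ele><time>2023-06-01T08:00:30Z</time></trkpt>',
  '      <trkpt lat="45.1022" lon="14.2031"><ele>18.2</ele><time>2023-06-01T08:01:00Z</time></trkpt>',
  '    </trkseg>',
  '  </trk>',
  '</gpx>'
)

# Write the document to a temporary .gpx file
sample_path <- tempfile(fileext = ".gpx")
writeLines(gpx_lines, sample_path)

# GPX is plain text, so base R can read it back like any other text file
sample_text <- readLines(sample_path)
n_points <- sum(grepl("<trkpt", sample_text, fixed = TRUE))

cat(sample_text, sep = "\n")
```

Each trkpt element is one recorded point. The time child is what a workout export would add on top of a plain route.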

What is the difference between GPS and GPX?

This is a common question beginners have. GPS stands for Global Positioning System, a satellite-based system that provides users with positioning, navigation, and timing services. GPX, on the other hand, is a file format used to exchange GPS data by storing geographical information recorded at given intervals. These data include waypoints, tracks, routes, and elevation.

If you’re working on GPS programs or plan to build navigation applications, you’ll run into GPX files often. GPX is an open standard in the geospatial world that has been around for two decades, so it’s important to know how to work with it.

What program opens a GPX file?

Because GPX is plain text, any text editor can open it, but you’ll need dedicated software or a programming language to view it as a route on a map. Downloadable software includes Google Earth Pro and Garmin BaseCamp, just to name a few.

If you’re into coding, you should know that any major programming language can load and parse GPX files, R and Python included.

How to Load and Parse GPX files in R

Now you’ll learn how to combine R and GPX. First things first, we’ll load a GPX file into R. To do so, we’ll have to install a package for parsing XML files, since GPX is an XML-based format:

install.packages("XML")

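We can now use the XML::htmlTreeParse() function to read a GPX file. It runs the file through a lenient HTML parser that ignores XML namespaces, which keeps the XPath queries used later short. Make sure you know where your file is saved beforehand: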

library(XML)

gpx_parsed <- htmlTreeParse(file = "croatia_bike.gpx", useInternalNodes = TRUE)
gpx_parsed

The gpx_parsed variable contains the following:

Image 2 – Contents of a GPX file loaded into R

If you think that looks like a mess, you are not wrong. The file is pretty much unreadable in this form, but you can spot a structure if you focus for long enough.

Each trkpt element stores the latitude and longitude of one point as attributes (lat and lon), and its ele child element contains the elevation.

Use the following R code to extract and store them in a more readable data structure – data.frame:

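# lat/lon are attributes of every <trkpt>; the result is a matrix with rows "lat" and "lon"
coords <- xpathSApply(doc = gpx_parsed, path = "//trkpt", fun = xmlAttrs)
# Elevation is the text content of the <ele> child of every <trkpt>
elevation <- xpathSApply(doc = gpx_parsed, path = "//trkpt/ele", fun = xmlValue)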

df <- data.frame(
  lat = as.numeric(coords["lat", ]),
  lon = as.numeric(coords["lon", ]),
  elevation = as.numeric(elevation)
)

head(df, 10)
tail(df, 10)

Image 3 – First 10 rows of the GPX file

Image 4 – Last 10 rows of the GPX file
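
The extraction above works with plain paths like //trkpt because the HTML parser drops namespaces. If you’d rather use a strict XML parser, a rough sketch could look like the following. It assumes a GPX 1.1 file, whose root element declares the http://www.topografix.com/GPX/1/1 namespace. The prefix g, the variable names, and the two sample points are made up for illustration.

```r
library(XML)

# A two-point GPX 1.1 document with made-up values, parsed from a string
gpx_text <- '<?xml version="1.0" encoding="UTF-8"?>
<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="45.10" lon="14.20"><ele>12.5</ele></trkpt>
    <trkpt lat="45.11" lon="14.21"><ele>15.0</ele></trkpt>
  </trkseg></trk>
</gpx>'

# Strict XML parsing keeps the default namespace declared on <gpx>
doc <- xmlParse(gpx_text, asText = TRUE)

# Bind an arbitrary prefix ("g") to the GPX 1.1 namespace for use in XPath
ns <- c(g = "http://www.topografix.com/GPX/1/1")

coords_ns <- xpathSApply(doc, "//g:trkpt", xmlAttrs, namespaces = ns)
elevation_ns <- xpathSApply(doc, "//g:trkpt/g:ele", xmlValue, namespaces = ns)

df_ns <- data.frame(
  lat = as.numeric(coords_ns["lat", ]),
  lon = as.numeric(coords_ns["lon", ]),
  elevation = as.numeric(elevation_ns)
)
df_ns
```

With a strict parser, an unprefixed //trkpt matches nothing in a namespaced document, which is a common source of empty results.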

The route represents a roundtrip, so starting and ending data points will be almost identical. The fun part happens in the middle, but we can’t know that for sure before inspecting the data further.
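
One way to put a number on "almost identical" is to measure the great-circle distance between the first and the last point. The sketch below uses the haversine formula in base R. The helper name haversine_m, the Earth radius of 6,371 km, and the three sample points are assumptions for illustration; with the real data you’d pass the first and last rows of df.

```r
# Great-circle distance in meters between two points given in decimal degrees
# (haversine formula, assuming a spherical Earth with a 6,371 km radius)
haversine_m <- function(lat1, lon1, lat2, lon2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical stand-in for df: start, a far point, and an end near the start
route <- data.frame(
  lat = c(45.1000, 45.1500, 45.1001),
  lon = c(14.2000, 14.2600, 14.2001)
)

# Distance between the first and last point of the route
n <- nrow(route)
gap_m <- haversine_m(route$lat[1], route$lon[1], route$lat[n], route$lon[n])
gap_m
```

A gap of a few meters suggests a closed loop, while a gap of kilometers would mean the route is point-to-point.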

The best way to do so is graphically, so next, we’ll go over a couple of options for visualizing GPX data in R.

How to Visualize GPX files in R

When it comes to data visualization and GPX files, you have options. You can go as simple as using a built-in plot() function or you can pay for custom solutions.

A popular approach is the ggmap package, but its Google Maps backend requires a Google Cloud API key tied to a billing account. We won’t cover it in this article, but we’ll go over the next best thing.

For starters, let’s explore the most basic option. It boils down to plotting a line chart that has all individual data points connected:

plot(x = df$lon, y = df$lat, type = "l", col = "black", lwd = 3,
     xlab = "Longitude", ylab = "Latitude")

Image 5 – Plotting GPX data points with R’s built-in function

The route looks on point, but the visualization isn’t very useful on its own. There’s no map underneath it, so we have no idea where this route takes place.
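
Even without a basemap, the plain plot can be made less distorted. A degree of longitude covers less ground than a degree of latitude once you move away from the equator, so a common approximation is to set the plot’s aspect ratio to 1 / cos(latitude). The sketch below assumes a small hypothetical data frame named pts standing in for df.

```r
# Hypothetical stand-in for df: a few made-up points forming a loop
pts <- data.frame(
  lat = c(45.10, 45.12, 45.15, 45.12, 45.10),
  lon = c(14.20, 14.23, 14.26, 14.28, 14.20)
)

# y/x aspect ratio that roughly equalizes ground distance on both axes
asp_ratio <- 1 / cos(mean(pts$lat) * pi / 180)

plot(x = pts$lon, y = pts$lat, type = "l", col = "black", lwd = 3,
     asp = asp_ratio, xlab = "Longitude", ylab = "Latitude")
```

This only fixes the shape of the route. It still doesn’t tell you where the route is, which is what the next option addresses.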

The other, significantly better alternative is the leaflet package. It’s designed for visualizing geospatial data, so it won’t have any trouble working with our data frame:

library(leaflet)

leaflet() %>%
  addTiles() %>%
  addPolylines(data = df, lat = ~lat, lng = ~lon, color = "#000000", opacity = 0.8, weight = 3)

Image 6 – Plotting GPX data points with Leaflet

Now we’re getting somewhere! The route looks almost identical to the one shown earlier on Strava, but we don’t have to stop here. You can invest hours into producing a perfect geospatial visualization, but for the purpose of this article, we’ll display one additional thing – elevation.

Leaflet doesn’t ship with an easy way of using elevation data (numeric) for coloring purposes, so we have to be somewhat creative. The get_color() function will return one of four colors, depending on the elevation group. Then, data points for groups are added manually to the chart inside a for loop:

get_color <- function(elevation) {
  if (elevation < 500) {
    return("green")
  }
  if (elevation < 1000) {
    return("yellow")
  }
  if (elevation < 1500) {
    return("orange")
  }
  return("red")
}

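library(dplyr)

# New dataset with a color column derived from each point's elevation
df_color <- df %>%
  rowwise() %>%
  mutate(color = get_color(elevation)) %>%
  ungroup()

# Color of the previous point, so neighboring color groups connect without gaps;
# the first point has no predecessor, so it reuses its own color
df_color$last_color <- dplyr::lag(df_color$color, default = df_color$color[1])

# Map: add one polyline per color group, drawn in that group's color
map <- leaflet() %>% addTiles()
for (color in levels(as.factor(df_color$color))) {
  map <- addPolylines(map, lat = ~lat, lng = ~lon, data = df_color[df_color$color == color | df_color$last_color == color, ], color = color)
}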
map

Image 7 – Plotting GPX data points and elevation with Leaflet

The map isn’t perfect, but it informs us which route segments have a higher elevation than the others.
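
The if chain in get_color() handles one elevation at a time, which is why the code above needs rowwise(). As a possible alternative, base R’s cut() can assign all groups in one vectorized call. The function name get_color_vec is made up for illustration, and the breaks mirror the thresholds used above.

```r
# Vectorized elevation-to-color mapping with the same thresholds as get_color():
# below 500 green, below 1000 yellow, below 1500 orange, otherwise red
get_color_vec <- function(elevation) {
  groups <- cut(
    elevation,
    breaks = c(-Inf, 500, 1000, 1500, Inf),
    labels = c("green", "yellow", "orange", "red"),
    right = FALSE  # intervals are [lower, upper), matching the "<" comparisons
  )
  as.character(groups)
}

# Made-up elevations, including values right on the thresholds
sample_elevation <- c(0, 499.9, 500, 999, 1000, 1500, 2000)
sample_colors <- get_color_vec(sample_elevation)
sample_colors
```

With a function like this, the color column could be added with a plain mutate(color = get_color_vec(elevation)) and no rowwise() step.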


Summary of R and GPX

And that’s the basics of R and GPX! You’ve learned the basic theory behind this file format, and how to work with it in the R programming language. We’ve only scratched the surface, as there’s plenty more you can do. For example, plotting the elevation profile or making the polyline interactive would be an excellent next step.
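
As a starting point for the elevation profile, here’s a rough sketch in base R. It computes the distance between consecutive points with the haversine formula, accumulates it, and plots elevation against distance. The data frame track and its values are made up and stand in for df, and the 6,371 km Earth radius is an assumption.

```r
# Hypothetical stand-in for df with made-up coordinates and elevations
track <- data.frame(
  lat = c(45.10, 45.12, 45.15, 45.12, 45.10),
  lon = c(14.20, 14.23, 14.26, 14.28, 14.20),
  elevation = c(10, 250, 600, 420, 15)
)

# Haversine distance in meters between each point and the next one
to_rad <- pi / 180
n <- nrow(track)
lat1 <- track$lat[-n] * to_rad
lat2 <- track$lat[-1] * to_rad
dlat <- lat2 - lat1
dlon <- (track$lon[-1] - track$lon[-n]) * to_rad
a <- sin(dlat / 2)^2 + cos(lat1) * cos(lat2) * sin(dlon / 2)^2
step_m <- 2 * 6371000 * asin(pmin(1, sqrt(a)))

# Cumulative distance in kilometers, starting at zero for the first point
track$dist_km <- c(0, cumsum(step_m)) / 1000

# Total ascent: the sum of all positive elevation changes
total_ascent_m <- sum(pmax(diff(track$elevation), 0))

plot(x = track$dist_km, y = track$elevation, type = "l", lwd = 3,
     xlab = "Distance (km)", ylab = "Elevation (m)")
```

On real GPS data the raw elevation can be noisy, so the total ascent computed this way tends to overestimate unless the values are smoothed first.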

Now it’s time for the homework assignment. We encourage you to play around with any GPX file you can find and use R to visualize it. Feel free to explore other visualization libraries and make something truly amazing. When done, please share your results with us on Twitter – @appsilon. We’d love to see what you can come up with.

Want to build interactive maps with R and R Shiny? Try Leaflet and Tmap.

The post R and GPX – How to Read and Visualize GPX Files in R appeared first on Appsilon | Enterprise R Shiny Dashboards.
