Plotting data over a map with R
[This article was first published on analytics for fun, and kindly contributed to R-bloggers.]
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
After a few hours of searching the web, I got my R code working and plotted breast cancer data on a world map. It might not be the best-looking map possible (R graphics are incredible!), but I am happy with it for now.
To produce the map I used the “maps” package, available from the CRAN repository. I also needed longitude and latitude coordinates for each country, which I found on the web and added to my original data set. Here are the steps I followed:
1) Load a .csv file containing lat/long coordinates for all countries
> countryCoord <- read.csv("~/Rworkdir/data/countryCoord.csv")
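Before going further, it can help to confirm that the coordinates file has the columns the later steps rely on. The sketch below is only an illustration: it reads made-up CSV text instead of the real file, and it assumes the columns are named `country`, `lat` and `lon`, which is what the later commands in this post suggest.

```r
# Made-up CSV content standing in for countryCoord.csv (values are illustrative).
csvText <- "country,lat,lon
Spain,40,-4
Japan,36,138"

coordToy <- read.csv(text = csvText, stringsAsFactors = FALSE)

# The merge and plotting steps assume these three columns are present.
needed <- c("country", "lat", "lon")
missingCols <- setdiff(needed, names(coordToy))
if (length(missingCols) > 0) {
  stop("Missing columns: ", paste(missingCols, collapse = ", "))
}

# Show the column types; lat and lon should be numeric.
str(coordToy)
```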
2) Add lat/long coordinates to my original breast cancer data set (called “gapCleaned”). To do this I used the function “merge”, merging the two data sets by the variable “country”, which both have in common. Setting all.x=TRUE makes it a left outer join, so every row of “gapCleaned” is kept even if no coordinates are found for that country.
> mergedCleaned <- merge(gapCleaned, countryCoord, by = "country", all.x = TRUE)
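Here is a minimal sketch of what the left outer join does, using two small made-up data frames in place of `gapCleaned` and `countryCoord` (the names `gapToy`, `coordToy` and all values are invented for illustration). A country with no match in the coordinates table stays in the result, but with `NA` for `lat` and `lon`, which is worth checking because country names are often spelled differently across sources.

```r
# Toy stand-ins for gapCleaned and countryCoord (all values are made up).
gapToy <- data.frame(
  country = c("Spain", "Japan", "Atlantis"),
  breastcancer = c(50, 30, 10),
  stringsAsFactors = FALSE
)
coordToy <- data.frame(
  country = c("Spain", "Japan"),
  lat = c(40, 36),
  lon = c(-4, 138),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every row of the left data set (a left outer join).
mergedToy <- merge(gapToy, coordToy, by = "country", all.x = TRUE)

# Countries that found no match end up with NA coordinates.
unmatched <- mergedToy$country[is.na(mergedToy$lat)]
print(mergedToy)
print(unmatched)
```

Running the same `is.na()` check on the real merged data would list any countries whose names need fixing before plotting.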
All right, now I have two new columns in my data set, indicating lat and long coordinates for each country 🙂 Cool, next step is finally drawing a map with the data.
3) Draw a world map and tell R where to plot breast cancer data
> library(maps)
> map("world", col = "gray90", fill = TRUE)
I size each circle according to the breast cancer value for that country in my data set
> radius <- 3^sqrt(mergedCleaned$breastcancer)
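To get a feel for how this formula behaves, the sketch below compares it with a plain square-root scaling on a few hypothetical values (not taken from the real data). When `inches` is a positive number, `symbols()` rescales the symbols so that the largest one matches that size, so what matters is each radius relative to the maximum. The square-root alternative is my own addition for comparison, not something used in this post.

```r
# Hypothetical values, not taken from the real data set.
value <- c(20, 50, 100)

radiusExp  <- 3^sqrt(value)  # the formula used in this post
radiusArea <- sqrt(value)    # alternative: circle area proportional to value

# Only the radii relative to the largest one affect the drawn sizes.
relExp  <- radiusExp / max(radiusExp)
relArea <- radiusArea / max(radiusArea)

print(round(relExp, 3))
print(round(relArea, 3))
```

The exponential formula strongly exaggerates differences, so countries with low values shrink to very small dots, while the square-root version keeps them visible. Which one is preferable depends on whether the goal is to highlight the extremes or to compare all countries.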
Finally, I give R instructions to plot my breast cancer data over the world map
> symbols(mergedCleaned$lon, mergedCleaned$lat, bg = "blue", fg = "red", lwd = 1, circles = radius, inches = 0.175, add = TRUE)
New cases of breast cancer in the world, 2002
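For anyone who wants to try the whole sequence without my data files, here is a self-contained sketch on made-up data. The data frame `toy` and its values are invented, and the step that drops rows without coordinates is an extra precaution I did not use above. If the maps package is not installed, the sketch falls back to an empty longitude/latitude frame so it still runs.

```r
# Toy data with one missing coordinate (all values are made up).
toy <- data.frame(
  country = c("Spain", "Japan", "Atlantis"),
  lon = c(-4, 138, NA),
  lat = c(40, 36, NA),
  breastcancer = c(50, 30, 10),
  stringsAsFactors = FALSE
)

# Keep only rows that have both coordinates and a value to plot.
plottable <- toy[complete.cases(toy[, c("lon", "lat", "breastcancer")]), ]

# Draw the base map if the maps package is installed; otherwise fall back
# to an empty longitude/latitude frame so the sketch still runs.
if (requireNamespace("maps", quietly = TRUE)) {
  maps::map("world", col = "gray90", fill = TRUE)
} else {
  plot(NA, xlim = c(-180, 180), ylim = c(-90, 90),
       xlab = "lon", ylab = "lat")
}

# Same sizing formula and symbol settings as in the steps above.
radius <- 3^sqrt(plottable$breastcancer)
symbols(plottable$lon, plottable$lat, circles = radius, inches = 0.175,
        bg = "blue", fg = "red", lwd = 1, add = TRUE)
```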
I am sure we can make prettier plots with R (I know there are other interesting packages suitable for this, such as ggplot2), but I am happy for now. I’ve learned something new and been able to visualize and communicate the data more effectively than with just a scatterplot.
Conclusions:
Looking at the map, we can quickly identify the countries and areas with the highest number of breast cancer cases and hypothesize patterns. As reported in my last post, these are the United States, New Zealand, Israel and Central/Northern Europe: in general, highly developed economies rather than developing countries.