Do You Know What You Show at Your Map?
[This article was first published on Misanthrope's Thoughts, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
As access to the GIS and mapping is becoming easier every year the more people and companies create maps. Unfortunately often they just do not know what they are actually showing at their maps. This issue is being mentioned over and over again.
Here is the example that I discovered recently: Cyberthreat Real-Time Map by Kaspersky antivirus company. Here how it looks like:
Amongst the other info they show the Infection rank for each country… based on total threats detected…. You may have already guessed what is the fail, but I let me explain it anyway.
See, the №1 infected country is Russia, which is the home country for Kaspersky and where this antivirus is quite popular. So we can conclude that the rankings that supposed to demonstrate the severity of virus activities merely demonstrates the number of Kaspersky software installations across the globe.
Lets test this hypothesis. I don’t have the data about the number of installation of Kaspersky software per country, but it is safe to assume that this number is proportional to the population of the given country. Also it is easier to get infection rankings for countries from the map than the number of the threats detected. If I had total threats data per country I would compare it to the population. Having infection rankings it is more rational to compare it to the population rankings instead. So I picked 27 random countries and compared their infection and population rankings. The result is demonstrated at the plot below:
Infection rank vs. Population rank |
The linear model is fairly close to Inrection rank = Population rank. It is clear that the phenomena that is presented as an Infection rank just reflects a total software installations per country and not the severity of the ‘cyberthreat’. In order to get the actual Infection rank the number of detected threats have to be normalised by the number of software installations.
To leave a comment for the author, please follow the link and comment on their blog: Misanthrope's Thoughts.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.