TreeMap World Population visualisation

[This article was first published on ipub » R, and kindly contributed to R-bloggers.]
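This example is inspired by the examples of the treemap package. You’ll learn how to convert a data.frame to a data.tree structure, navigate the tree, aggregate and cumulate an attribute, prune the tree, and convert it back to a data.frame for plotting.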
This code builds on version 0.2.4 of the data.tree package, which you can get from CRAN or from GitHub. More posts on data.tree are available on this blog. You will also find this example in the package’s applications vignette.

Original treemap Example (to be improved)

The original example, as available in the treemap package documentation, visualises the world population as a tree map.
library(treemap)
data(GNI2010)
# Box area encodes population, colour encodes GNI
treemap(GNI2010,
        index = c("continent", "iso3"),
        vSize = "population",
        vColor = "GNI",
        type = "value")
There are many countries, so the chart gets cluttered with many very small boxes. In this example, we will limit the number of countries shown, and sum the remaining population in a catch-all country called “Other”. We use the data.tree package to do this aggregation.

Conversion from data.frame

First, let’s convert the population data into a data.tree structure:
library(data.tree)
# Build one path per country, e.g. "world/Europe/Switzerland"
GNI2010$pathString <- paste("world", GNI2010$continent, GNI2010$country, sep = "/")
n <- as.Node(GNI2010[,])
print(n, pruneMethod = "dist", limit = 20)
##                        levelName
## 1  world
## 2   ¦--North America
## 3   ¦   ¦--Aruba
## 4   ¦   ¦--Antigua and Barbuda
## 5   ¦   ¦--Bahamas
## 6   ¦   °--... 30 nodes w/ 0 sub
## 7   ¦--Asia
## 8   ¦   ¦--Afghanistan
## 9   ¦   ¦--United Arab Emirates
## 10  ¦   °--... 45 nodes w/ 0 sub
## 11  ¦--Africa
## 12  ¦   ¦--Angola
## 13  ¦   ¦--Burundi
## 14  ¦   °--... 52 nodes w/ 0 sub
## 15  ¦--Europe
## 16  ¦   ¦--Albania
## 17  ¦   ¦--Austria
## 18  ¦   °--... 41 nodes w/ 0 sub
## 19  ¦--South America
## 20  ¦   ¦--Argentina
## 21  ¦   ¦--Bolivia
## 22  ¦   °--... 10 nodes w/ 0 sub
## 23  °--Oceania
## 24      ¦--American Samoa
## 25      ¦--Australia
## 26      °--... 16 nodes w/ 0 sub
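To make the pathString convention concrete, here is a small base R sketch on a toy data.frame. The names `toy` and `parts` are made up for illustration, and the countries are taken from the tree printed above. It only shows how the paths are built and how they split into levels; it does not reproduce what as.Node does internally.

```r
# Toy subset: continent/country pairs taken from the tree printed above
toy <- data.frame(
  continent = c("Europe", "Europe", "Oceania"),
  country   = c("Albania", "Austria", "Australia"),
  stringsAsFactors = FALSE
)

# Same construction as for GNI2010: root / continent / country
toy$pathString <- paste("world", toy$continent, toy$country, sep = "/")
print(toy$pathString)

# Splitting on "/" gives one element per tree level
parts <- strsplit(toy$pathString, "/", fixed = TRUE)
print(lengths(parts))
```

In this sketch, a name that itself contains "/" would produce an extra level, so such names are worth checking before building the paths.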
We can easily navigate the tree to find the population of a specific country. Luckily, RStudio is quite helpful with its code completion (use CTRL + SPACE):
n$Europe$Switzerland$population
## [1] 7826
Or, we can look at a sub-tree:
northAm <- n$`North America`
northAm$Sort("GNI", decreasing = TRUE)
print(northAm, "iso3", "population", "GNI", limit = 12)
 
##                       levelName iso3 population   GNI
## 1  North America                             NA    NA
## 2   ¦--United States of America  USA     309349 47340
## 3   ¦--Canada                    CAN      34126 43250
## 4   ¦--Bahamas                   BHS        343 22240
## 5   ¦--Puerto Rico               PRI       3978 15500
## 6   ¦--Trinidad and Tobago       TTO       1341 15380
## 7   ¦--Antigua and Barbuda       ATG         88 13280
## 8   ¦--Saint Kitts and Nevis     KNA         52 11830
## 9   ¦--Mexico                    MEX     113423  8930
## 10  ¦--Panama                    PAN       3517  6970
## 11  ¦--Grenada                   GRD        104  6960
## 12  °--... 23 nodes w/ 0 sub                 NA    NA
 

Aggregate and Cumulate

We now want to aggregate the population. For non-leaves, this will recursively iterate through children, and cache the result in the population field. The main reason why we do this is not to calculate the population of the world, but to store each continent’s total on its node via the cacheAttribute, because the pruning step below compares countries against their continent’s population.
Aggregate(node = n,
          attribute = "population",
          aggFun = sum,
          cacheAttribute = "population")
## [1] 6727766
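The recursion behind this aggregation can be sketched in base R, with a nested list standing in for the tree. This is an illustration only, not data.tree’s implementation: `aggregatePop` and the `world` list are hypothetical, the continent totals are copied from the output printed further down, and Oceania’s "Other" entry is simply the remainder after Australia and Papua New Guinea.

```r
# Hypothetical stand-in for the tree: numbers are leaves, lists are inner nodes
world <- list(
  Asia            = 4089247,
  Africa          = 954502,
  Europe          = 714837,
  `North America` = 540446,
  `South America` = 392162,
  Oceania = list(Australia = 22299, `Papua New Guinea` = 6858, Other = 7415)
)

# A leaf returns its own value; an inner node sums the results of its children
aggregatePop <- function(node) {
  if (!is.list(node)) return(node)
  sum(vapply(node, aggregatePop, numeric(1)))
}

total <- aggregatePop(world)
print(total)
```

Calling `aggregatePop(world$Oceania)` on its own gives the continent total, which is the kind of intermediate value the cached attribute keeps on each non-leaf node.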
Next, we sort the children of each node by population, in decreasing order:
n$Sort(attribute = "population", decreasing = TRUE, recursive = TRUE)
Finally, we cumulate among siblings, and store the running sum in an attribute called cumPop:
n$Do(function(x) Cumulate(x, "population", sum, "cumPop"))
The tree now looks as follows. Note the new attribute cumPop, as well as the sort order:
print(n, "population", "cumPop", pruneMethod = "dist", limit = 20)
##                           levelName population  cumPop
## 1  world                               6727766 6727766
## 2   ¦--Asia                            4089247 4089247
## 3   ¦   ¦--China                       1338300 1338300
## 4   ¦   ¦--India                       1224615 2562915
## 5   ¦   ¦--Indonesia                    239870 2802785
## 6   ¦   °--... 44 nodes w/ 0 sub            NA      NA
## 7   ¦--Africa                           954502 5043749
## 8   ¦   ¦--Nigeria                      158423  158423
## 9   ¦   ¦--Ethiopia                      82950  241373
## 10  ¦   °--... 52 nodes w/ 0 sub            NA      NA
## 11  ¦--Europe                           714837 5758586
## 12  ¦   ¦--Russian Federation           141750  141750
## 13  ¦   ¦--Germany                       81777  223527
## 14  ¦   °--... 41 nodes w/ 0 sub            NA      NA
## 15  ¦--North America                    540446 6299032
## 16  ¦   ¦--United States of America     309349  309349
## 17  ¦   ¦--Mexico                       113423  422772
## 18  ¦   °--... 31 nodes w/ 0 sub            NA      NA
## 19  ¦--South America                    392162 6691194
## 20  ¦   ¦--Brazil                       194946  194946
## 21  ¦   ¦--Colombia                      46295  241241
## 22  ¦   °--... 10 nodes w/ 0 sub            NA      NA
## 23  °--Oceania                           36572 6727766
## 24      ¦--Australia                     22299   22299
## 25      ¦--Papua New Guinea               6858   29157
## 26      °--... 16 nodes w/ 0 sub            NA      NA
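Conceptually, sorting and then cumulating amounts to a running sum over siblings in descending order. The base R sketch below reproduces the cumPop values of Asia’s three largest countries from the output above. The variable names are chosen for illustration, and this is not how data.tree implements Cumulate.

```r
# Populations of Asia's three largest countries, deliberately out of order
pop <- c(India = 1224615, Indonesia = 239870, China = 1338300)

# Sort descending, then take the running sum among these siblings
sortedPop <- sort(pop, decreasing = TRUE)
cumPop <- cumsum(sortedPop)
print(cumPop)
```

The running sum restarts for every group of siblings, which is why Nigeria’s cumPop in the output equals its own population rather than continuing from Asia’s total.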
 

Prune

The previous steps were done to define our threshold: big countries should be displayed, while small ones should be grouped together. This lets us define a pruning function that allows a maximum of 7 countries per continent. Additionally, it keeps a country only while the cumulative population, counted from the largest country down, stays below 90% of the continent’s population; every country after that point is pruned:
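myPruneFun <- function(x, cutoff = 0.9, maxCountries = 7) {
  # Always keep the root and the continents
  if (isNotLeaf(x)) return (TRUE)
  # Keep at most maxCountries countries per continent
  if (x$position > maxCountries) return (FALSE)
  # Keep a country while the running sum is below the cutoff share
  return (x$cumPop < (x$parent$population * cutoff))
}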
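To see the rule in action without a tree, the leaf part of the logic can be written as a plain function of three numbers. The helper `keepCountry` below is hypothetical and only mirrors the two leaf checks of myPruneFun. The Oceania figures come from the output above; the last of Oceania’s 18 countries necessarily has a cumulative sum equal to the continent total.

```r
# Hypothetical helper mirroring the leaf checks of myPruneFun
keepCountry <- function(position, cumPop, parentPop,
                        cutoff = 0.9, maxCountries = 7) {
  if (position > maxCountries) return(FALSE)
  cumPop < parentPop * cutoff
}

oceaniaPop <- 36572  # the cutoff is 0.9 * 36572 = 32914.8

kept <- c(
  Australia          = keepCountry(1, 22299, oceaniaPop),
  `Papua New Guinea` = keepCountry(2, 29157, oceaniaPop),
  lastCountry        = keepCountry(18, 36572, oceaniaPop)
)
print(kept)
```

Australia and Papua New Guinea stay because their running sums are below the cutoff, while the last country fails the position check before the population check is even reached.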
 
We prune a clone rather than the tree itself. The reason is that data.tree uses reference semantics, and we want to keep the original tree, because we might want to play around later with different parameters:
n2 <- Clone(n, pruneFun = myPruneFun)
print(n2$Oceania, "population", pruneMethod = "simple", limit = 20)
##              levelName population
## 1 Oceania                   36572
## 2  ¦--Australia             22299
## 3  °--Papua New Guinea       6858
Finally, we need to sum countries that we pruned away into a new “Other” node:
n2$Do(function(x) {
  # Population of the pruned countries: continent total minus what is left
  missing <- x$population - sum(sapply(x$children, function(x) x$population))
  other <- x$AddChild("Other")
  other$iso3 <- "OTH"
  other$country <- "Other"
  other$continent <- x$name
  other$GNI <- 0
  other$population <- missing
},
# Apply to continents only (level 2 of the tree)
filterFun = function(x) x$level == 2
)
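For Oceania, the size of the new “Other” node can be checked by hand from the numbers printed earlier. This is a plain arithmetic sketch with illustrative variable names, not part of the original code.

```r
# Values for Oceania as printed above
oceaniaPop <- 36572
keptPop    <- c(Australia = 22299, `Papua New Guinea` = 6858)

# Population of the pruned countries, i.e. the size of the "Other" node
otherPop <- oceaniaPop - sum(keptPop)
print(otherPop)

# Share of the continent that ends up in "Other"
print(round(otherPop / oceaniaPop, 3))
```

“Other” covers roughly a fifth of Oceania’s population here, more than the 10% the cutoff might suggest, because the country whose running sum crosses the 90% line is itself pruned.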

Plotting the treemap

In order to plot the treemap, we need to convert the data.tree structure back to a data.frame:
df <- ToDataFrameTable(n2, "iso3", "country", "continent", "population", "GNI")
treemap(df,
        index = c("continent", "iso3"),
        vSize = "population",
        vColor = "GNI",
        type = "value")
 
And here we go: our treemap now shows at most 7 countries per continent. Within each continent, the largest countries are kept as long as their cumulative population stays below 90% of the continent’s total, and all others are grouped into “Other”.
If you have enjoyed this example, I recommend you read the package’s vignettes, or have a look at the other data.tree posts in this blog.
