[This article was first published on Wiekvoet, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I downloaded counts of deaths by age, year and mayor cause from the Dutch statistics site. In this post I do some plots to look at causes and changes between the years.Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Data
Data from CBS. I downloaded the data in Dutch, hence the first thing to do was provide some kind of translation. The coding used seems slightly different from IDC-10 main categories (and has been alphabetically disordered compared to that). I used Google translate and IDC-10 to obtain the translationsPlots
Preparation
In the following I will be using both percentage of population and percentage of deaths by age cohort. The need for the percentage of deaths is because in some cohorts the percentages of deaths are much higher, thereby hiding anything happening in other cohorts. In addition I should mention that for visual purposes only the most important eight causes are used in the plotsYoung
It seems that most of risks are associated with birth. In addition, these risks have steadily been decreasing.
Looking at the age cohorts above 0 years, it seems accidents are most important. Most remarkable is a spike at 1953, which occurs for all four ages. After some consideration, I link this to the North Sea flood of 1953. It is remarkable that this is visible in the plot. It says a lot about how safe we are from accidents that it does. In the age category 15 to 20 there is also a relatively large bump during the 1970 to 1975. This is more difficult to explain, but I suspect traffic, especially the moped. A light motorcycle which preferably would be boosted to run much faster than the legal speed. 1975 saw the requirement to wear a helmet. It was much hated at the time, but in hindsight I can see that government felt compelled to do something and that it did have effect.
Looking at the plots, it seems the next big cause are Neoplasms. This is not because these become more deadly, it is because accidents are getting under control.
Elderly
For the elderly, diseases of the circulatory system are the main cause and decreasing quite a bit. The number of Symptoms and Abnormal Clinical Observations seems to decrease too. Since this seems to be a nice name for the ‘other’ category, this may be better diagnostics.
What is less visible is the increase in mental and behavioral disorders, especially after 1980 and at oldest age. It also seems that Neoplasms are getting lower very slowly.
Code
data reading
library(dplyr)
library(ggplot2)
txtlines <- readLines(‘Overledenen__doodsoo_170615161506.csv’)
txtlines <- grep(‘Centraal’,txtlines,value=TRUE,invert=TRUE)
#txtlines[1:5]
#cat(txtlines[4])
r1 <- read.csv(sep=’;’,header=FALSE,
col.names=c(‘Causes’,’Causes2′,’Age’,’year’,’aantal’,’count’),
na.strings=’-‘,text=txtlines[3:length(txtlines)]) %>%
select(.,-aantal,-Causes2)
transcauses <- c(
“Infectious and parasitic diseases”,
“Diseases of skin and subcutaneous”,
“Diseases musculoskeletal system and connective “,
“Diseases of the genitourinary system”,
“Pregnancy, childbirth”,
“Conditions of perinatal period”,
“Congenital abnormalities”,
“Sympt., Abnormal clinical Observations”,
“External causes of death”,
“Neoplasms”,
“Illness of blood, blood-forming organs”,
“Endocrine, nutritional, metabolic illness”,
“Mental and behavioral disorders”,
“Diseases of the nervous system and sense organs”,
“Diseases of the circulatory system”,
“Diseases of the respiratory organs”,
“Diseases of the digestive organs”,
“Population”,
“Total all causes of death”)
#cc <- cbind(transcauses,levels(r1$Causes))
#options(width=100)
levels(r1$Causes) <- transcauses
levels(r1$Age) <-
gsub(‘jaar’,’year’,levels(r1$Age)) %>%
gsub(‘tot’,’to’,.) %>%
gsub(‘of ouder’,’+’,.)
mutate(.,Population=count) %>%
select(.,-count,-Causes) %>%
merge(.,r1) %>%
filter(.,Causes %in% transcauses[1:17]) %>%
mutate(.,Percentage=100*count/Population,
Causes = factor(Causes),
year = as.numeric(gsub(‘*’,”,year,fixed=TRUE))
)
perc.of.pop <- filter(r1,Causes==’Population’) %>%
mutate(.,Population=count) %>%
select(.,-count,-Causes) %>%
merge(.,r1) %>%
filter(.,Causes %in% transcauses[1:17]) %>%
mutate(.,Percentage=100*count/Population,
Causes = factor(Causes),
year = as.numeric(gsub(‘*’,”,year,fixed=TRUE))
)
tmp1 <- perc.of.pop %>% filter(.,Age %in% levels(perc.of.pop$Age)[c language=”(1,2,11,3)”][/c],
!is.na(Percentage)) %>%
mutate(.,Age=factor(Age,levels=levels(perc.of.pop$Age)[c language=”(1,2,11,3)”][/c]),
Causes =factor(Causes))
# select ‘important’ causes (which somewhen got over 15%)
group_by(tmp1,Causes)%>%
summarize(.,mp = max(Percentage)) %>%
mutate(.,rk=rank(-mp)) %>%
merge(.,tmp1) %>%
filter(.,rk<=8) %>%
ggplot(.,
aes(y=Percentage,x=year,col=Causes)) +
geom_line()+
guides(col=guide_legend(ncol=2)) +
facet_wrap( ~Age ) +
theme(legend.position=”bottom”)+
ylab(‘Percentage of Cohort’)
dev.off()
###
png(‘youngpop2.png’)
tmp1 <- perc.of.pop %>% filter(.,Age %in% levels(perc.of.pop$Age)[c language=”(2,11,3,4)”][/c],
!is.na(Percentage)) %>%
mutate(.,Age=factor(Age,levels=levels(perc.of.pop$Age)[c language=”(2,11,3,4)”][/c]),
Causes =factor(Causes))
# select ‘important’ causes (which somewhen got over 15%)
group_by(tmp1,Causes)%>%
summarize(.,mp = max(Percentage)) %>%
mutate(.,rk=rank(-mp)) %>%
merge(.,tmp1) %>%
filter(.,rk<=8) %>%
ggplot(.,
aes(y=Percentage,x=year,col=Causes)) +
geom_line()+
guides(col=guide_legend(ncol=2)) +
facet_wrap( ~Age ) +
theme(legend.position=”bottom”)+
ylab(‘Percentage of Cohort’)
# https://en.wikipedia.org/wiki/North_Sea_flood_of_1953
dev.off()
tmp2 <- perc.of.pop %>% filter(.,Age %in% levels(perc.of.pop$Age)[18:21],
!is.na(Percentage)) %>%
mutate(.,Age=factor(Age),
Causes =factor(Causes))
group_by(tmp2,Causes)%>%
summarize(.,mp = max(Percentage)) %>%
mutate(.,rk=rank(-mp)) %>%
merge(.,tmp2) %>%
filter(.,rk<=8) %>%
ggplot(.,
aes(y=Percentage,x=year,col=Causes)) +
geom_line()+
guides(col=guide_legend(ncol=2)) +
facet_wrap( ~Age ) +
theme(legend.position=”bottom”)+
ylab(‘Percentage of Cohort’)
dev.off()
# rj.GD
# 2
png(‘oldpop2.png’)
tmp2 <- perc.of.pop %>%
filter(.,
Age %in% levels(perc.of.death$Age)[18:21],
year>=1980,
!is.na(Percentage)) %>%
mutate(.,Age=factor(Age),
Causes =factor(Causes))
group_by(tmp2,Causes)%>%
summarize(.,mp = max(Percentage)) %>%
mutate(.,rk=rank(-mp)) %>%
merge(.,tmp2) %>%
filter(.,rk<=8) %>%
ggplot(.,
aes(y=Percentage,x=year,col=Causes)) +
geom_line()+
guides(col=guide_legend(ncol=2)) +
facet_wrap( ~Age ) +
theme(legend.position=”bottom”)+
ylab(‘Percentage of Cohort’)
dev.off()
library(ggplot2)
txtlines <- readLines(‘Overledenen__doodsoo_170615161506.csv’)
txtlines <- grep(‘Centraal’,txtlines,value=TRUE,invert=TRUE)
#txtlines[1:5]
#cat(txtlines[4])
r1 <- read.csv(sep=’;’,header=FALSE,
col.names=c(‘Causes’,’Causes2′,’Age’,’year’,’aantal’,’count’),
na.strings=’-‘,text=txtlines[3:length(txtlines)]) %>%
select(.,-aantal,-Causes2)
transcauses <- c(
“Infectious and parasitic diseases”,
“Diseases of skin and subcutaneous”,
“Diseases musculoskeletal system and connective “,
“Diseases of the genitourinary system”,
“Pregnancy, childbirth”,
“Conditions of perinatal period”,
“Congenital abnormalities”,
“Sympt., Abnormal clinical Observations”,
“External causes of death”,
“Neoplasms”,
“Illness of blood, blood-forming organs”,
“Endocrine, nutritional, metabolic illness”,
“Mental and behavioral disorders”,
“Diseases of the nervous system and sense organs”,
“Diseases of the circulatory system”,
“Diseases of the respiratory organs”,
“Diseases of the digestive organs”,
“Population”,
“Total all causes of death”)
#cc <- cbind(transcauses,levels(r1$Causes))
#options(width=100)
levels(r1$Causes) <- transcauses
levels(r1$Age) <-
gsub(‘jaar’,’year’,levels(r1$Age)) %>%
gsub(‘tot’,’to’,.) %>%
gsub(‘of ouder’,’+’,.)
Preparation for plots
perc.of.death <- filter(r1,Causes==’Total all causes of death’) %>%mutate(.,Population=count) %>%
select(.,-count,-Causes) %>%
merge(.,r1) %>%
filter(.,Causes %in% transcauses[1:17]) %>%
mutate(.,Percentage=100*count/Population,
Causes = factor(Causes),
year = as.numeric(gsub(‘*’,”,year,fixed=TRUE))
)
perc.of.pop <- filter(r1,Causes==’Population’) %>%
mutate(.,Population=count) %>%
select(.,-count,-Causes) %>%
merge(.,r1) %>%
filter(.,Causes %in% transcauses[1:17]) %>%
mutate(.,Percentage=100*count/Population,
Causes = factor(Causes),
year = as.numeric(gsub(‘*’,”,year,fixed=TRUE))
)
young
png(‘youngpop1.png’)tmp1 <- perc.of.pop %>% filter(.,Age %in% levels(perc.of.pop$Age)[c language=”(1,2,11,3)”][/c],
!is.na(Percentage)) %>%
mutate(.,Age=factor(Age,levels=levels(perc.of.pop$Age)[c language=”(1,2,11,3)”][/c]),
Causes =factor(Causes))
# select ‘important’ causes (which somewhen got over 15%)
group_by(tmp1,Causes)%>%
summarize(.,mp = max(Percentage)) %>%
mutate(.,rk=rank(-mp)) %>%
merge(.,tmp1) %>%
filter(.,rk<=8) %>%
ggplot(.,
aes(y=Percentage,x=year,col=Causes)) +
geom_line()+
guides(col=guide_legend(ncol=2)) +
facet_wrap( ~Age ) +
theme(legend.position=”bottom”)+
ylab(‘Percentage of Cohort’)
dev.off()
###
png(‘youngpop2.png’)
tmp1 <- perc.of.pop %>% filter(.,Age %in% levels(perc.of.pop$Age)[c language=”(2,11,3,4)”][/c],
!is.na(Percentage)) %>%
mutate(.,Age=factor(Age,levels=levels(perc.of.pop$Age)[c language=”(2,11,3,4)”][/c]),
Causes =factor(Causes))
# select ‘important’ causes (which somewhen got over 15%)
group_by(tmp1,Causes)%>%
summarize(.,mp = max(Percentage)) %>%
mutate(.,rk=rank(-mp)) %>%
merge(.,tmp1) %>%
filter(.,rk<=8) %>%
ggplot(.,
aes(y=Percentage,x=year,col=Causes)) +
geom_line()+
guides(col=guide_legend(ncol=2)) +
facet_wrap( ~Age ) +
theme(legend.position=”bottom”)+
ylab(‘Percentage of Cohort’)
# https://en.wikipedia.org/wiki/North_Sea_flood_of_1953
dev.off()
old
png(‘oldpop.png’)tmp2 <- perc.of.pop %>% filter(.,Age %in% levels(perc.of.pop$Age)[18:21],
!is.na(Percentage)) %>%
mutate(.,Age=factor(Age),
Causes =factor(Causes))
group_by(tmp2,Causes)%>%
summarize(.,mp = max(Percentage)) %>%
mutate(.,rk=rank(-mp)) %>%
merge(.,tmp2) %>%
filter(.,rk<=8) %>%
ggplot(.,
aes(y=Percentage,x=year,col=Causes)) +
geom_line()+
guides(col=guide_legend(ncol=2)) +
facet_wrap( ~Age ) +
theme(legend.position=”bottom”)+
ylab(‘Percentage of Cohort’)
dev.off()
# rj.GD
# 2
png(‘oldpop2.png’)
tmp2 <- perc.of.pop %>%
filter(.,
Age %in% levels(perc.of.death$Age)[18:21],
year>=1980,
!is.na(Percentage)) %>%
mutate(.,Age=factor(Age),
Causes =factor(Causes))
group_by(tmp2,Causes)%>%
summarize(.,mp = max(Percentage)) %>%
mutate(.,rk=rank(-mp)) %>%
merge(.,tmp2) %>%
filter(.,rk<=8) %>%
ggplot(.,
aes(y=Percentage,x=year,col=Causes)) +
geom_line()+
guides(col=guide_legend(ncol=2)) +
facet_wrap( ~Age ) +
theme(legend.position=”bottom”)+
ylab(‘Percentage of Cohort’)
dev.off()
To leave a comment for the author, please follow the link and comment on their blog: Wiekvoet.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.