PISA 2015 – how to read/process/plot the data with R
Yesterday the OECD published the results and data from the PISA 2015 study (Programme for International Student Assessment). It's a very cool study: every 3 years, over 500 000 pupils (15-year-olds) are examined. The raw data is publicly available, so one can easily access detailed information about pupils' academic performance, together with detailed survey data for students, parents and school officials (~2 000 variables). Lots of stories to be found.
You can download the dataset in the SPSS format from this webpage. Then use the foreign package to read the sav files and the intsvy package to calculate aggregates/averages/tables/regression models (for the 2015 data you should use the GitHub version of the package).
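Installing the GitHub version could look roughly like the sketch below. The repository name "eldafani/intsvy" is an assumption (it is not given in this post), and the remotes package is just one common way to install from GitHub, so please verify both before running it.

```r
# Sketch: install the development version of intsvy from GitHub.
# ASSUMPTION: the repository is "eldafani/intsvy"; check the package's
# CRAN page for the current development URL.
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("eldafani/intsvy")
```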
Below you will find a short example of how to read the data, calculate weighted averages by gender and country, and plot the results with ggplot2. Here you will find other use cases for the intsvy package.
library("foreign")
library("intsvy")
library("dplyr")
library("ggplot2")
library("tidyr")

# Read the student questionnaire file (SPSS format) into a data frame
stud2015 <- read.spss("CY6_MS_CMB_STU_QQQ.sav",
                      use.value.labels = TRUE, to.data.frame = TRUE)

# Weighted mean math score by country (CNT) and gender (ST004D01T)
genderMath <- pisa2015.mean.pv(pvlabel = "MATH",
                               by = c("CNT", "ST004D01T"),
                               data = stud2015)
# Keep only the columns needed below
genderMath <- genderMath[, c(1, 2, 4, 5)]

# Reshape to one row per country with separate Female and Male columns
genderMath %>%
  select(CNT, ST004D01T, Mean) %>%
  spread(ST004D01T, Mean) -> genderMathWide

# Countries to highlight and label in the plot
genderMathSelected <- genderMathWide %>%
  filter(CNT %in% c("Austria", "Japan", "Switzerland", "Poland",
                    "Singapore", "Finland", "Korea", "United States"))

# Scatter plot of girls' vs. boys' scores; the dashed lines mark a 20-point gap
pl <- ggplot(genderMathWide, aes(Female, Male)) +
  geom_point() +
  geom_point(data = genderMathSelected, color = "red") +
  geom_text(data = genderMathSelected, aes(label = CNT), color = "grey20") +
  geom_abline(slope = 1, intercept = 0) +
  geom_abline(slope = 1, intercept = 20, linetype = 2, color = "grey") +
  geom_abline(slope = 1, intercept = -20, linetype = 2, color = "grey") +
  geom_text(x = 425, y = 460, label = "Boys +20 points", angle = 45, color = "grey", size = 8) +
  geom_text(x = 460, y = 425, label = "Girls +20 points", angle = 45, color = "grey", size = 8) +
  coord_fixed(xlim = c(400, 565), ylim = c(400, 565)) +
  theme_bw() +
  ggtitle("PISA 2015 in Math - Gender Gap") +
  xlab("PISA 2015 Math score for girls") +
  ylab("PISA 2015 Math score for boys")

# Display the plot
pl
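To make the "weighted averages" step less of a black box, here is a minimal base R sketch of the idea behind a plausible-value mean: compute the weighted mean of each plausible value within a group, then average those means. It is not intsvy's implementation. The toy data frame and the helper pv_weighted_mean are made up for illustration, only two plausible values are used instead of the ten in the 2015 files, and standard errors (which rely on replicate weights) are skipped entirely.

```r
# Toy data: six hypothetical students, two plausible values each.
# Column names mimic the PISA file (CNT, ST004D01T, W_FSTUWT, PVxMATH),
# but all values are invented.
toy <- data.frame(
  CNT       = c("A", "A", "A", "B", "B", "B"),
  ST004D01T = c("Female", "Male", "Female", "Female", "Male", "Male"),
  W_FSTUWT  = c(1, 2, 3, 2, 2, 1),
  PV1MATH   = c(500, 480, 520, 450, 470, 460),
  PV2MATH   = c(510, 490, 510, 440, 480, 470)
)

pv_cols <- c("PV1MATH", "PV2MATH")

# Hypothetical helper: weighted mean of each plausible value,
# then the plain average of those per-PV means.
pv_weighted_mean <- function(df, pv_cols, weight_col = "W_FSTUWT") {
  per_pv <- vapply(
    pv_cols,
    function(pv) weighted.mean(df[[pv]], df[[weight_col]]),
    numeric(1)
  )
  mean(per_pv)
}

# One group per country/gender combination
groups <- split(toy, list(toy$CNT, toy$ST004D01T), drop = TRUE)
toy_means <- vapply(groups, pv_weighted_mean, numeric(1), pv_cols = pv_cols)
toy_means
```

For the group "A.Female", for example, the two weighted means are 515 and 510, so the reported value is 512.5. With the real data you would stick to pisa2015.mean.pv, which also returns standard errors.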