Analyzing exam scores
📖 Background
The principal of a large school wants to know whether the test preparation courses are helpful, and also how parental education level affects test scores.
💪 Objectives
- What are the average reading scores for students with/without the test preparation course?
- What are the average scores for the different parental education levels?
- Create plots to visualize findings for questions 1 and 2.
- Look at the effects within subgroups. Compare the average scores for students with/without the test preparation course for different parental education levels (e.g., faceted plots).
- The principal wants to know if kids who perform well on one subject also score well on the others. Look at the correlations between scores.
- Summarize the findings.
💾 The data
The data has the following fields:
- “gender” – male / female
- “race/ethnicity” – one of 5 combinations of race/ethnicity
- “parent_education_level” – highest education level of either parent
- “lunch” – whether the student receives free/reduced or standard lunch
- “test_prep_course” – whether the student took the test preparation course
- “math” – exam score in math
- “reading” – exam score in reading
- “writing” – exam score in writing
    library(tidyverse)
    data <- read_csv("C:/Users/Adejumo/Downloads/exams.csv")
    head(data)
    ## # A tibble: 6 x 8
    ##   gender `race/ethnicity` parent_education~ lunch test_prep_course  math reading
    ##   <chr>  <chr>            <chr>             <chr> <chr>            <dbl>   <dbl>
    ## 1 female group B          bachelor's degree stan~ none                72      72
    ## 2 female group C          some college      stan~ completed           69      90
    ## 3 female group B          master's degree   stan~ none                90      95
    ## 4 male   group A          associate's degr~ free~ none                47      57
    ## 5 male   group C          some college      stan~ none                76      78
    ## 6 female group B          associate's degr~ stan~ none                71      83
    ## # ... with 1 more variable: writing <dbl>
    skimr::skim(data)
Name | data |
Number of rows | 1000 |
Number of columns | 8 |
_______________________ | |
Column type frequency: | |
character | 5 |
numeric | 3 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
gender | 0 | 1 | 4 | 6 | 0 | 2 | 0 |
race/ethnicity | 0 | 1 | 7 | 7 | 0 | 5 | 0 |
parent_education_level | 0 | 1 | 11 | 18 | 0 | 6 | 0 |
lunch | 0 | 1 | 8 | 12 | 0 | 2 | 0 |
test_prep_course | 0 | 1 | 4 | 9 | 0 | 2 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
math | 0 | 1 | 66.09 | 15.16 | 0 | 57.00 | 66 | 77 | 100 | ▁▁▅▇▃ |
reading | 0 | 1 | 69.17 | 14.60 | 17 | 59.00 | 70 | 79 | 100 | ▁▂▆▇▃ |
writing | 0 | 1 | 68.05 | 15.20 | 10 | 57.75 | 69 | 79 | 100 | ▁▂▅▇▃ |
Exploratory Data Analysis
Average reading scores for students with/without the test preparation course
Students who took the test preparation course
    # Mean reading score of students who completed the course
    mean_reading_tpc <- data %>%
      filter(test_prep_course == "completed") %>%
      summarise(mean(reading)) %>%
      as_vector()

    # Density of their reading scores, with the mean marked by a labelled red line
    data %>%
      filter(test_prep_course == "completed") %>%
      ggplot(aes(x = reading)) +
      geom_density(fill = "skyblue", alpha = 0.5) +
      geom_vline(xintercept = mean_reading_tpc, size = 0.5, color = "red") +
      annotate(x = mean_reading_tpc, y = +Inf, label = round(mean_reading_tpc, 2),
               vjust = 2, geom = "label") +
      xlab("Students exam scores") +
      ggtitle("Average scores of students who took the test preparation course")
Students who did not take the test preparation course
    # Mean reading score of students who did not take the course
    mean_reading <- data %>%
      filter(test_prep_course == "none") %>%
      summarise(mean(reading)) %>%
      as_vector()

    # Density of their reading scores, with the mean marked by a labelled red line
    data %>%
      filter(test_prep_course == "none") %>%
      ggplot(aes(x = reading)) +
      geom_density(fill = "pink", alpha = 0.5) +
      geom_vline(xintercept = mean_reading, size = 0.5, color = "red") +
      annotate(x = mean_reading, y = +Inf, label = round(mean_reading, 2),
               vjust = 2, geom = "label") +
      xlab("Students exam scores") +
      ggtitle("Average scores of students who did not take the test preparation course")
Students who took the test preparation course have the higher average reading score, at 73.89, and their density plot suggests that the majority of these students scored above average.
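As a rough sketch of how the two averages could be compared side by side, the example below computes both group means in one grouped summary and then runs a Welch two-sample t-test on the difference. The `toy_scores` table is hypothetical illustration data, not rows from `exams.csv`, and the t-test is an addition that the original analysis does not include.

```r
library(dplyr)

# Hypothetical toy data (NOT from exams.csv); it only mirrors the column names.
toy_scores <- tibble(
  test_prep_course = c("completed", "completed", "completed", "none", "none", "none"),
  reading          = c(80, 74, 68, 70, 62, 60)
)

# One grouped summary returns both averages, so no separate filter() per group is needed.
reading_by_prep <- toy_scores %>%
  group_by(test_prep_course) %>%
  summarise(mean_reading = mean(reading), n = n(), .groups = "drop")

reading_by_prep

# Welch two-sample t-test (the default for t.test) on the difference in means.
welch <- t.test(reading ~ test_prep_course, data = toy_scores)
welch$p.value
```

With only three made-up rows per group the p-value carries no meaning; on the real data, replacing `toy_scores` with `data` would show whether the difference is statistically distinguishable from zero.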
Average scores on the different parental educational levels
    # Average score per subject for each parental education level, drawn as a heatmap
    data %>%
      group_by(parent_education_level) %>%
      summarize(Mathematics = round(mean(math), 1),
                Reading = round(mean(reading), 1),
                Writing = round(mean(writing), 1)) %>%
      pivot_longer(cols = c("Mathematics", "Reading", "Writing"),
                   names_to = "subject",
                   values_to = "scores") %>%
      ggplot(aes(subject, parent_education_level)) +
      geom_tile(aes(fill = scores), colour = "white") +
      scale_fill_gradient(low = "white", high = "steelblue") +
      geom_text(aes(label = scores)) +
      theme(legend.position = "none") +
      xlab("Average scores in each Subject") +
      ylab("Parent Level of Education")
Children of parents who have achieved a higher level of education recorded higher average test scores.
Average scores for students with/without the test preparation course for different parental education levels
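Because `parent_education_level` is a character column, summaries and plot axes sort it alphabetically, which can hide the trend described above. A minimal sketch of one way to order it, assuming the six levels are the ones listed in `education_order` (an assumption; check it against `unique(data$parent_education_level)`), using hypothetical toy rows rather than the real data:

```r
library(dplyr)

# Assumed ordering of the six levels, lowest to highest (not confirmed by the post).
education_order <- c("some high school", "high school", "some college",
                     "associate's degree", "bachelor's degree", "master's degree")

# Hypothetical toy rows (NOT from exams.csv).
toy_scores <- tibble(
  parent_education_level = c("master's degree", "high school", "some college", "high school"),
  math                   = c(80, 60, 70, 64)
)

# Converting to a factor with explicit levels makes group_by() and ggplot2
# follow the educational order instead of the alphabetical one.
avg_by_education <- toy_scores %>%
  mutate(parent_education_level = factor(parent_education_level, levels = education_order)) %>%
  group_by(parent_education_level) %>%
  summarise(mean_math = mean(math), .groups = "drop")

avg_by_education
```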
Mathematics
    # One mean per course/education combination, so each box is drawn from a single value
    data %>%
      group_by(test_prep_course, parent_education_level) %>%
      summarize(Mathematics = round(mean(math), 1),
                Reading = round(mean(reading), 1),
                Writing = round(mean(writing), 1),
                .groups = "drop") %>%
      ggplot(aes(test_prep_course, Mathematics, colour = test_prep_course)) +
      geom_boxplot() +
      facet_wrap(vars(parent_education_level)) +
      xlab("Test Preparation Course") +
      ylab("Mathematics Test Score") +
      ggtitle("Mathematics") +
      labs(colour = "Test Preparation Course")
Reading
    # Same grouped means as above, plotted for reading
    data %>%
      group_by(test_prep_course, parent_education_level) %>%
      summarize(Mathematics = round(mean(math), 1),
                Reading = round(mean(reading), 1),
                Writing = round(mean(writing), 1),
                .groups = "drop") %>%
      ggplot(aes(test_prep_course, Reading, colour = test_prep_course)) +
      geom_boxplot() +
      facet_wrap(vars(parent_education_level)) +
      xlab("Test Preparation Course") +
      ylab("Reading Test Score") +
      ggtitle("Reading") +
      labs(colour = "Test Preparation Course")
Writing
    # Same grouped means as above, plotted for writing
    data %>%
      group_by(test_prep_course, parent_education_level) %>%
      summarize(Mathematics = round(mean(math), 1),
                Reading = round(mean(reading), 1),
                Writing = round(mean(writing), 1),
                .groups = "drop") %>%
      ggplot(aes(test_prep_course, Writing, colour = test_prep_course)) +
      geom_boxplot() +
      facet_wrap(vars(parent_education_level)) +
      xlab("Test Preparation Course") +
      ylab("Writing Test Score") +
      ggtitle("Writing") +
      labs(colour = "Test Preparation Course")
Students whose parents attained a higher education level performed better than those whose parents attained a lower level, regardless of whether they took the test preparation course.
Relationship between students' test scores
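To put numbers on the subgroup comparison, one option is to tabulate the gap between the "completed" and "none" means within each parental education level. The sketch below does this for mathematics on hypothetical toy rows (not the real `exams.csv`); the column name `gap` is introduced here purely for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy rows (NOT from exams.csv); only the column names match the post.
toy_scores <- tibble(
  parent_education_level = rep(c("high school", "master's degree"), each = 4),
  test_prep_course       = rep(c("completed", "completed", "none", "none"), times = 2),
  math                   = c(66, 70, 60, 64, 82, 86, 78, 80)
)

# Mean per subgroup, then one row per education level with the two
# course groups side by side and their difference.
prep_gap <- toy_scores %>%
  group_by(parent_education_level, test_prep_course) %>%
  summarise(mean_math = mean(math), .groups = "drop") %>%
  pivot_wider(names_from = test_prep_course, values_from = mean_math) %>%
  mutate(gap = completed - none)

prep_gap
```

A positive `gap` in every row would indicate that the course is associated with higher scores at each education level, not only overall.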
    data %>%
      select(math, reading, writing) %>%
      cor() %>%
      corrplot::corrplot(method = "number")
The test scores are strongly positively correlated, especially reading and writing. A student who performs well in one subject is likely to perform well in the others.
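The numbers behind such a correlation plot can also be inspected directly. A minimal base R sketch, using hypothetical toy scores rather than the real data: `cor()` returns the Pearson correlation matrix, and `cor.test()` adds a confidence interval and p-value for a single pair.

```r
# Hypothetical toy scores (NOT from exams.csv).
toy_scores <- data.frame(
  math    = c(50, 60, 70, 80, 90),
  reading = c(55, 62, 75, 78, 92),
  writing = c(52, 60, 74, 80, 90)
)

# Pearson correlation matrix for the three subjects.
score_cor <- cor(toy_scores)
round(score_cor, 2)

# Test of the reading/writing correlation, including a 95% confidence interval.
rw_test <- cor.test(toy_scores$reading, toy_scores$writing)
rw_test$estimate
rw_test$conf.int
```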
Summary
From the above analysis, we can conclude that the test preparation course has a clear positive effect on student performance, and that children of parents with higher educational qualifications score higher than those whose parents have lower qualifications, whether or not they took the test preparation course.