Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
📖 Background
The principal of a large school wants to know whether the test preparation courses are helpful and what effect parental education level has on test scores.
💪 Objectives
- What are the average reading scores for students with/without the test preparation course?
- What are the average scores for the different parental education levels?
- Create plots to visualize findings for questions 1 and 2.
- Look at the effects within subgroups. Compare the average scores for students with/without the test preparation course for different parental education levels (e.g., faceted plots).
- The principal wants to know if kids who perform well on one subject also score well on the others. Look at the correlations between scores.
- Summarize the findings.
💾 The data
The data has the following fields:
- “gender” – male / female
- “race/ethnicity” – one of 5 combinations of race/ethnicity
- “parent_education_level” – highest education level of either parent
- “lunch” – whether the student receives free/reduced or standard lunch
- “test_prep_course” – whether the student took the test preparation course
- “math” – exam score in math
- “reading” – exam score in reading
- “writing” – exam score in writing
library(tidyverse)

data <- read_csv("C:/Users/Adejumo/Downloads/exams.csv")
head(data)
## # A tibble: 6 x 8
##   gender `race/ethnicity` parent_education~ lunch test_prep_course  math reading
##   <chr>  <chr>            <chr>             <chr> <chr>            <dbl>   <dbl>
## 1 female group B          bachelor's degree stan~ none                72      72
## 2 female group C          some college      stan~ completed           69      90
## 3 female group B          master's degree   stan~ none                90      95
## 4 male   group A          associate's degr~ free~ none                47      57
## 5 male   group C          some college      stan~ none                76      78
## 6 female group B          associate's degr~ stan~ none                71      83
## # ... with 1 more variable: writing <dbl>

skimr::skim(data)
Name | data |
Number of rows | 1000 |
Number of columns | 8 |
_______________________ | |
Column type frequency: | |
character | 5 |
numeric | 3 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
gender | 0 | 1 | 4 | 6 | 0 | 2 | 0 |
race/ethnicity | 0 | 1 | 7 | 7 | 0 | 5 | 0 |
parent_education_level | 0 | 1 | 11 | 18 | 0 | 6 | 0 |
lunch | 0 | 1 | 8 | 12 | 0 | 2 | 0 |
test_prep_course | 0 | 1 | 4 | 9 | 0 | 2 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
math | 0 | 1 | 66.09 | 15.16 | 0 | 57.00 | 66 | 77 | 100 | ▁▁▅▇▃ |
reading | 0 | 1 | 69.17 | 14.60 | 17 | 59.00 | 70 | 79 | 100 | ▁▂▆▇▃ |
writing | 0 | 1 | 68.05 | 15.20 | 10 | 57.75 | 69 | 79 | 100 | ▁▂▅▇▃ |
Exploratory Data Analysis
Average reading scores for students with/without the test preparation course
Students who took the test preparation course
# Mean reading score of students who completed the course
mean_reading_tpc <- data %>%
  filter(test_prep_course == "completed") %>%
  summarise(mean(reading)) %>%
  as_vector()

# Density of reading scores, with the mean marked by a red line and label
data %>%
  filter(test_prep_course == "completed") %>%
  ggplot(aes(x = reading)) +
  geom_density(fill = "skyblue", alpha = 0.5) +
  geom_vline(xintercept = mean_reading_tpc, size = 0.5, color = "red") +
  annotate(x = mean_reading_tpc, y = +Inf, label = round(mean_reading_tpc, 2), vjust = 2, geom = "label") +
  xlab("Students exam scores") +
  ggtitle("Average scores of students who took the test preparation course")
Students who did not take the test preparation course
# Mean reading score of students who did not take the course
mean_reading <- data %>%
  filter(test_prep_course == "none") %>%
  summarise(mean(reading)) %>%
  as_vector()

# Density of reading scores, with the mean marked by a red line and label
data %>%
  filter(test_prep_course == "none") %>%
  ggplot(aes(x = reading)) +
  geom_density(fill = "pink", alpha = 0.5) +
  geom_vline(xintercept = mean_reading, size = 0.5, color = "red") +
  annotate(x = mean_reading, y = +Inf, label = round(mean_reading, 2), vjust = 2, geom = "label") +
  xlab("Students exam scores") +
  ggtitle("Average scores of students who did not take the test preparation course")
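Both averages can also be computed in one step by grouping on the course column instead of filtering twice. The following is a minimal sketch under the assumption that a small invented tibble, here called `scores`, stands in for `exams.csv`; the numbers are made up and are not the results of this analysis.

```r
library(dplyr)

# Toy data standing in for exams.csv; the values are invented for illustration.
scores <- tibble(
  test_prep_course = c("completed", "completed", "completed", "none", "none", "none"),
  reading          = c(80, 74, 77, 62, 70, 66)
)

# One row per group: number of students and mean reading score.
reading_by_prep <- scores %>%
  group_by(test_prep_course) %>%
  summarise(
    n            = n(),
    mean_reading = mean(reading),
    .groups      = "drop"
  )

print(reading_by_prep)
```

Keeping the group size next to the mean makes it easier to judge how much weight each average deserves.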
Average scores by parental education level
# Mean score per subject for each parental education level
data %>%
  group_by(parent_education_level) %>%
  summarize(Mathematics = round(mean(math), 1),
            Reading = round(mean(reading), 1),
            Writing = round(mean(writing), 1)) %>%
  # Reshape to one row per education level and subject
  pivot_longer(cols = c("Mathematics", "Reading", "Writing"),
               names_to = c("subject"),
               values_to = "scores") %>%
  # Heatmap: darker tiles mean higher average scores
  ggplot(aes(subject, parent_education_level)) +
  geom_tile(aes(fill = scores), colour = "white") +
  scale_fill_gradient(low = "white", high = "steelblue") +
  geom_text(aes(label = scores)) +
  theme(legend.position = "none") +
  xlab("Average scores in each Subject") +
  ylab("Parent Level of Education")
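Because `parent_education_level` is a character column, ggplot2 arranges the heatmap rows alphabetically rather than by attainment. One possible fix is to convert the column to an ordered factor first. The sketch below uses a short invented vector in place of the real column, and the ordering of the six levels is an assumption: only four of them are visible in the output above, so "high school" and "some high school" are assumed names that should be checked against the data.

```r
# Assumed ordering of the six levels, lowest to highest attainment.
education_levels <- c(
  "some high school", "high school", "some college",
  "associate's degree", "bachelor's degree", "master's degree"
)

# Toy values standing in for the parent_education_level column.
parent_education_level <- c(
  "master's degree", "some college", "high school", "bachelor's degree"
)

# An ordered factor carries the ranking along with the labels.
education <- factor(parent_education_level, levels = education_levels, ordered = TRUE)

# Sorting now follows attainment rather than the alphabet.
print(sort(education))
```

Applying the same `factor()` call to the real column before plotting should give a heatmap whose rows run from the lowest to the highest education level.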
Average scores for students with/without the test preparation course for different parental education levels
Mathematics
# Mean score per subject for each combination of course status and education level
data %>%
  group_by(test_prep_course, parent_education_level) %>%
  summarize(Mathematics = round(mean(math), 1),
            Reading = round(mean(reading), 1),
            Writing = round(mean(writing), 1),
            .groups = "drop") %>%
  # Each group is reduced to one mean, so every box collapses to a line at that mean
  ggplot(aes(test_prep_course, Mathematics, colour = test_prep_course)) +
  geom_boxplot() +
  facet_wrap(vars(parent_education_level)) +
  xlab("Test Preparation Course") +
  ylab("Mathematics Test Score") +
  ggtitle("Mathematics") +
  labs(colour = "Test Preparation Course")
Reading
# Same summary as for mathematics, plotting the mean reading score
data %>%
  group_by(test_prep_course, parent_education_level) %>%
  summarize(Mathematics = round(mean(math), 1),
            Reading = round(mean(reading), 1),
            Writing = round(mean(writing), 1)) %>%
  ggplot(aes(test_prep_course, Reading, colour = test_prep_course)) +
  geom_boxplot() +
  facet_wrap(vars(parent_education_level)) +
  xlab("Test Preparation Course") +
  ylab("Reading Test Score") +
  ggtitle("Reading") +
  labs(colour = "Test Preparation Course")
Writing
# Same summary as for mathematics, plotting the mean writing score
data %>%
  group_by(test_prep_course, parent_education_level) %>%
  summarize(Mathematics = round(mean(math), 1),
            Reading = round(mean(reading), 1),
            Writing = round(mean(writing), 1)) %>%
  ggplot(aes(test_prep_course, Writing, colour = test_prep_course)) +
  geom_boxplot() +
  facet_wrap(vars(parent_education_level)) +
  xlab("Test Preparation Course") +
  ylab("Writing Test Score") +
  ggtitle("Writing") +
  labs(colour = "Test Preparation Course")
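The subgroup comparison can also be read off as a single number per education level: the mean score with the course minus the mean score without it. The following sketch is one way to compute that, assuming an invented tibble `scores` with two education levels; the column `difference` and the values are illustrative only and are not results from `exams.csv`.

```r
library(dplyr)
library(tidyr)

# Toy data standing in for exams.csv; the values are invented for illustration.
scores <- tibble(
  parent_education_level = rep(c("high school", "master's degree"), each = 4),
  test_prep_course       = rep(c("completed", "completed", "none", "none"), times = 2),
  math                   = c(66, 70, 60, 62, 78, 82, 74, 76)
)

prep_gain <- scores %>%
  group_by(parent_education_level, test_prep_course) %>%
  summarise(mean_math = mean(math), .groups = "drop") %>%
  # One column per course status, so the two means sit side by side
  pivot_wider(names_from = test_prep_course, values_from = mean_math) %>%
  # Positive values mean students who completed the course scored higher
  mutate(difference = completed - none)

print(prep_gain)
```

A table like this shows at a glance whether the gap between the two groups is similar across education levels or concentrated in a few of them.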
Relationship between students' test scores
data %>%
  select(math, reading, writing) %>%
  cor() %>%
  corrplot::corrplot(method = "number")
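To see what `cor()` hands to the plot, it can help to look at the matrix itself and to test a single pair of subjects. The sketch below uses only the six math and reading scores printed by `head(data)` above, so it illustrates the mechanics and says nothing reliable about the full 1,000 rows; the writing column is left out because its values are not shown in that output.

```r
# Math and reading scores of the six students shown by head(data).
scores <- data.frame(
  math    = c(72, 69, 90, 47, 76, 71),
  reading = c(72, 90, 95, 57, 78, 83)
)

# Pearson correlation matrix: symmetric, with ones on the diagonal.
cor_matrix <- cor(scores)
print(round(cor_matrix, 2))

# Test for one pair, with a confidence interval for the correlation.
math_reading <- cor.test(scores$math, scores$reading)
print(math_reading$conf.int)
```

With only six observations the confidence interval is wide, which is a reminder that a correlation coefficient should be read together with the sample size behind it.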
Summary
From the analysis above, the test preparation course appears to have a positive effect on student performance: students who completed it scored higher on average. In addition, children of parents with higher educational qualifications scored higher than those whose parents have lower qualifications, whether or not they took the test preparation course.
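The analysis above compares averages without a formal test. One way to check whether a difference in means is larger than chance alone would explain is Welch's two-sample t-test, which is the default behaviour of base R's `t.test()`. This is a sketch on invented numbers, not a result from `exams.csv`; the vectors `reading_completed` and `reading_none` are hypothetical stand-ins for the reading scores of the two groups.

```r
# Toy reading scores; the values are invented for illustration.
reading_completed <- c(80, 74, 77, 85, 71, 79)
reading_none      <- c(62, 70, 66, 73, 58, 69)

# var.equal defaults to FALSE, so this is Welch's test (unequal variances).
welch <- t.test(reading_completed, reading_none)

print(welch$estimate)  # the two group means
print(welch$conf.int)  # confidence interval for the difference in means
print(welch$p.value)
```

On the real data, the two vectors would be the `reading` column filtered by `test_prep_course`. Since students were not randomly assigned to the course, even a small p-value would describe an association rather than prove that the course caused the higher scores.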