Having investigated individuals elsewhere, let’s now take a look at the schools.
NOTE: Although I began the examinations of competitions and individuals by looking at volume of participation (to provide context), I’ll skip an analogous discussion here because the participation of schools is shown indirectly through those analyses.
School Scores
Let’s begin by looking at some of the same metrics shown for individual students, but aggregated across all students for each school. To give the reader some insight into school performance, I’ll rank and show schools by a single performance metric. To be consistent, I’ll use the same metric used for ranking the individuals: the summed percentile rank of scores (`prnk_sum`).
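To make the metric concrete, here is a minimal Python sketch of how a summed percentile rank could be computed per school. It is an illustration only, not the code used for this analysis. It assumes the percentile rank is the minimum rank rescaled to [0, 1] within each competition, i.e. (rank - 1) / (n - 1); the record layout, the school and competition names, and the function names are all hypothetical.

```python
from collections import defaultdict


def percent_rank(scores):
    """Rescale minimum ranks to [0, 1] as (rank - 1) / (n - 1).

    Tied scores share the lowest rank among them. Returning 0.0 for a
    single entrant is a choice made for this sketch.
    """
    n = len(scores)
    if n < 2:
        return [0.0] * n
    ordered = sorted(scores)
    # ordered.index(s) counts the scores strictly below s, i.e. min rank - 1.
    return [ordered.index(s) / (n - 1) for s in scores]


def summarize_schools(records):
    """Aggregate percentile ranks per school into n, prnk_sum and prnk_mean.

    `records` holds hypothetical (competition_id, school, score) tuples.
    """
    by_competition = defaultdict(list)
    for competition, school, score in records:
        by_competition[competition].append((school, score))

    ranks_by_school = defaultdict(list)
    for entries in by_competition.values():
        ranks = percent_rank([score for _, score in entries])
        for (school, _), rank in zip(entries, ranks):
            ranks_by_school[school].append(rank)

    return {
        school: {
            "n": len(ranks),
            "prnk_sum": sum(ranks),
            "prnk_mean": sum(ranks) / len(ranks),
        }
        for school, ranks in ranks_by_school.items()
    }


# Made-up data: three schools, two competitions.
records = [
    ("comp_a", "SCHOOL_X", 90), ("comp_a", "SCHOOL_Y", 70), ("comp_a", "SCHOOL_Z", 50),
    ("comp_b", "SCHOOL_X", 40), ("comp_b", "SCHOOL_Y", 80), ("comp_b", "SCHOOL_Z", 60),
]
summary = summarize_schools(records)
ranking = sorted(summary, key=lambda s: summary[s]["prnk_sum"], reverse=True)
```

Because the metric is a sum, a school that enters many competitions can outrank a school with a higher average, which is why it helps to read `prnk_sum` alongside `n` and `prnk_mean`.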
NOTE: For the same reason stated before for showing my own scores among the individuals, I’ll include the numbers for my high school (“CLEMENS”) in applicable contexts.
rnk | school | city | n | prnk_sum | prnk_mean | n_defeat_sum | n_defeat_mean | n_advanced_sum | n_state_sum |
---|---|---|---|---|---|---|---|---|---|
1 | ARGYLE | ARGYLE | 168 | 159.01 | 0.95 | 867 | 5.16 | 109 | 53 |
2 | CLEMENTS | SUGAR LAND | 174 | 149.88 | 0.86 | 936 | 5.38 | 109 | 47 |
3 | LINDSAY | LINDSAY | 154 | 134.39 | 0.87 | 791 | 5.14 | 93 | 40 |
4 | KLEIN | KLEIN | 152 | 131.13 | 0.86 | 783 | 5.15 | 87 | 30 |
5 | DULLES | SUGAR LAND | 155 | 129.02 | 0.83 | 825 | 5.32 | 90 | 37 |
6 | WYLIE | ABILENE | 156 | 124.70 | 0.80 | 636 | 4.08 | 91 | 31 |
7 | GARDEN CITY | GARDEN CITY | 144 | 122.77 | 0.85 | 823 | 5.72 | 85 | 33 |
8 | HIGHLAND PARK | DALLAS | 149 | 121.71 | 0.82 | 655 | 4.40 | 85 | 25 |
9 | SALADO | SALADO | 127 | 103.31 | 0.81 | 605 | 4.76 | 73 | 30 |
10 | WESTWOOD | AUSTIN | 130 | 102.67 | 0.79 | 546 | 4.20 | 67 | 9 |
231 | CLEMENS | SCHERTZ | 77 | 43.35 | 0.56 | 233 | 3.03 | 17 | 0 |
Note: Total number of rows: 1,436
Admittedly, there’s not a lot of insight to extract from this summary regarding individual schools. Nonetheless, it provides some useful context regarding the magnitude of performance metric values aggregated at the school level.
To gain a better understanding of this list of top-performing schools, let’s break down school performance by year.
Also, let’s combine the performance metric values with coordinate data to visualize where the best schools are located.
Now, let’s visualize school dominance across years.
We saw elsewhere that there is no significant temporal trend for competition types or competition level, but is there some kind of temporal trend for schools? My intuition says that there should not be any significant relationship between year and performance. Rather, going with the theory that certain schools tend to do well all of the time, I would guess that the school itself has some non-trivial relationship with performance. (If so, this would imply that the top-performing schools have students who are better suited for these academic competitions, perhaps due to a strong support group of teachers, demographics, household income, or some other factor not quantified directly here.) Also, I hypothesize that recent performance is probably the strongest indicator of current performance, as it is in many other contexts. I should note that these things may only hold when also factoring in competition type: it seems more likely that schools are “elite” for certain competition types than for all competitions in aggregate.
To put these ideas together more plainly, I am curious to know if the success of a school in any given year can be predicted as a function of the school itself, the year, and the school’s performance in the previous year. [1] As before, my preference for quantifying performance is the percent rank sum (`prnk_sum`) of team score (relative to other schools at a given competition level). Also, I think it’s a good idea to “re-scale” the year value so that the first year in the scraped data (2004) has a value of 1, with subsequent years taking on subsequent integer values. (This variable is named `year_idx`.)
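The two predictors described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code behind the analysis: the input is assumed to be a mapping from year to `prnk_sum` for one school and competition type, and the function name and output keys (including `prnk_sum_prev`) are hypothetical. The lagged value is taken from the most recent year in which the school competed, which is not necessarily the calendar year before.

```python
def add_lag_features(yearly, first_year=2004):
    """Build regression rows for one school and competition type.

    `yearly` maps year -> prnk_sum. Each row pairs a year's value with the
    value from the most recent earlier year present in the data, so gaps in
    participation are skipped over rather than treated as missing.
    """
    rows = []
    previous_value = None
    for year in sorted(yearly):
        if previous_value is not None:
            rows.append({
                "year": year,
                "year_idx": year - first_year + 1,  # 2004 -> 1, 2005 -> 2, ...
                "prnk_sum": yearly[year],
                "prnk_sum_prev": previous_value,
            })
        previous_value = yearly[year]
    return rows


# Made-up values; the school did not compete in 2006.
demo_rows = add_lag_features({2004: 3.0, 2005: 2.5, 2007: 4.0})
```

The first year for a school produces no row, because there is no earlier value to use as a predictor.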
So, to be explicit, a linear regression model of the following form is calculated for each unique school and competition type. (Accounting for competition type allows us to properly model the reality that a given school may excel in some competition types but not others.)
$$ \mathit{prnk\_sum} = \mathit{intercept} + \mathit{prnk\_sum}_{year-1} \cdot \beta_{1} + \mathit{year\_idx} \cdot \beta_{2} $$
Note that, because this formula is applied to each school-competition type pair, the intercept term corresponds to the school entity itself.
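For readers who want to see the mechanics, the following Python sketch fits a model of this form by ordinary least squares, once per group. It is a simplified illustration and not the code used here: it solves the normal equations directly, it returns only the coefficients (no p-values), and the function names, the group keys, and the data are made up.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows):
    """Fit prnk_sum = intercept + b1 * prnk_sum_prev + b2 * year_idx.

    `rows` holds (prnk_sum_prev, year_idx, prnk_sum) tuples.
    Returns [intercept, b1, b2].
    """
    design = [[1.0, prev, idx] for prev, idx, _ in rows]
    target = [value for _, _, value in rows]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(design, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical group: values generated exactly from
# intercept = 1.0, b1 = 0.5, b2 = 0.1, so the fit should recover them.
groups = {
    ("SCHOOL_X", "type_a"): [
        (2.0, 2, 2.2), (1.0, 3, 1.8), (4.0, 4, 3.4), (3.0, 5, 3.0), (2.5, 6, 2.85),
    ],
}
fits = {key: fit_ols(rows) for key, rows in groups.items()}
```

In practice a statistics library would be the better choice, since it also reports standard errors and p-values for each term and handles groups with too few observations more gracefully.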
The distribution of p-values for each term in the model provides some insight regarding the predictive power of the variables. Visually, it does seem like two of my hypotheses are valid:
- Recent performance does seem to be predictive of school performance in a given competition type in any given year.
- Year itself is not predictive (meaning that there is no temporal trend indicating that performance improves or worsens over time).

However, my other thought, that the school itself has some kind of predictive value, does not appear to be true. [2]
Perhaps the deduction that, in general, individual schools do not tend to dominate the rest of the competition can be seen in another way. The distribution of the percentage of possible opponent schools defeated at each competition level for each school should reinforce this inference.
Indeed, the histograms do not show any noticeable skew to the right, which supports the notion that, in general, individual schools are not dominating specific competition types. If schools were dominating, we would see some non-trivial right-hand skew. The District level of competition (i.e. the lowest level) comes closest to showing such skew, albeit not that close. This is not all that surprising: if schools do dominate at some level of competition, it is most likely to be at the lowest level.
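The visual check can also be expressed numerically. The Python sketch below computes the share of possible opponents defeated and a moment-based sample skewness, where a clearly positive value would indicate a long right tail. It is a hypothetical illustration with made-up numbers and function names, not a computation taken from the data in this post.

```python
def pct_defeated(n_defeated, n_opponents):
    """Share of possible opponent schools defeated, between 0 and 1."""
    return n_defeated / n_opponents if n_opponents else 0.0


def sample_skewness(values):
    """Moment-based skewness m3 / m2**1.5; positive means a longer right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


# Made-up shares: a symmetric spread versus one with a single dominant school.
symmetric = [0.2, 0.4, 0.5, 0.6, 0.8]
one_dominant = [0.1, 0.1, 0.2, 0.2, 0.9]
skew_symmetric = sample_skewness(symmetric)
skew_dominant = sample_skewness(one_dominant)
```

A skewness near zero at every competition level would be consistent with the reading above, while a large positive value at one level would point to a few schools beating most of their opponents there.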
Wrap-Up
Certainly, the analysis of schools in these academic UIL competitions deserves more attention than that given here, but I think some of the biggest questions about school performance have been answered.
[1] Actually, I don’t specifically enforce the criterion that the previous year is used. Rather, I use the most recent year’s value, which may or may not be the previous year if the school did not compete in the previous year.
[2] For more information regarding interpretation of p-value distributions, I recommend reading David Robinson’s very helpful blog post on the topic.