
An Analysis of Texas High School Academic Competition Results, Part 2 – Competitions

[This article was first published on r on Tony ElHabr, and kindly contributed to R-bloggers.]

Competition Participation

Some of the first questions that might come to mind are those regarding the number of schools in each level of competition (District, Region, and State) and each conference classification level (1A, 2A, … 6A).

It seems fair to say that the distribution of schools among Districts, Regions, and Conferences is relatively even. 1 2 This is to be expected since the UIL presumably tries to divide schools evenly among each grouping (to the extent possible) in order to stimulate fair competition.
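Although the underlying data isn't shown here, a count like this is straightforward to produce with dplyr. Below is a minimal sketch, assuming a tidy data frame named `schools` (a hypothetical name) with one row per school entry and columns `school`, `conf` (conference), `district`, and `region` (also assumed names):

```r
library(dplyr)

# Assumed data frame `schools`: one row per school entry, with columns
# `school`, `conf` (1A-6A), `district`, and `region` (hypothetical names).
schools %>%
  distinct(school, conf) %>%
  count(conf, name = "n_schools")

schools %>%
  distinct(school, region) %>%
  count(region, name = "n_schools")
```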

With some context regarding the number of schools in each competition level and conference, let’s now consider the number of distinct individual competitors and schools for each competition type (e.g. Calculator Applications, Computer Science).
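As a rough illustration of how such participation counts might be tallied, here is a dplyr sketch assuming a hypothetical data frame `results` with one row per individual entry and columns `comp` (competition type), `name` (competitor), and `school` (assumed names):

```r
library(dplyr)

# Assumed data frame `results`: one row per individual entry, with columns
# `comp` (competition type), `name` (competitor), and `school`.
results %>%
  group_by(comp) %>%
  summarise(
    n_competitors = n_distinct(name),
    n_schools     = n_distinct(school)
  ) %>%
  arrange(desc(n_competitors))
```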

Science stands out when evaluating participation by competition type alone.

But what about when considering competition level as well?

Science seems to prevail again.

Now, what about when also considering conferences?

Once again, Science seems to be the most popular. So, in all, Science appears to be the answer to the numerous variations of the “Which competition type/level has the most …?” question.
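The same kind of tally extends naturally to the other groupings. Here is a sketch reusing the assumed `results` data frame, now also assuming hypothetical `complvl` (District/Region/State) and `conf` (1A–6A) columns:

```r
library(dplyr)

# Distinct competitors by competition type, level, and conference
# (same assumed `results` data frame and hypothetical column names as above).
results %>%
  group_by(comp, complvl, conf) %>%
  summarise(n_competitors = n_distinct(name), .groups = "drop") %>%
  arrange(desc(n_competitors))
```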

Competition Scores

With an understanding of the participation in the UIL competitions, let’s now consider the scores. What does the distribution of scores for each competition type look like?
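A visual along these lines might be generated with ggplot2; here is a minimal sketch, again assuming a `results` data frame with a numeric `score` column (hypothetical names):

```r
library(ggplot2)

# Score distribution by competition type (assumed `results` data frame
# with a numeric `score` column and a categorical `comp` column).
ggplot(results, aes(x = score)) +
  geom_density() +
  facet_wrap(~ comp, scales = "free")
```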

Take what you will from the above visual, but it is interesting to see which competition types have wider distributions. This could imply a number of different things:

To try to understand the score distributions better, let’s break them down by year, competition level, and conference.
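One way such a breakdown might be drawn is to facet on competition level and year and map conference to color; a sketch with the same assumed column names (`score`, `comp`, `complvl`, `conf`, `year`), shown for a single competition type:

```r
library(dplyr)
library(ggplot2)

# Score distributions broken down by year, competition level, and conference,
# shown here for one competition type (assumed data frame and column names).
results %>%
  filter(comp == "Science") %>%
  ggplot(aes(x = score, colour = factor(conf))) +
  geom_density() +
  facet_grid(complvl ~ year)
```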

One might observe the following from the above plot:

So maybe there isn’t some broad, temporal trend to the scores. But is there a trend in scores among the competition levels (for a given competition type)? One would think that, assuming that test difficulty is constant across competition levels, there would be an aggregate increase in scores with increasing competition level (because only the top scoring individuals and schools advance).

In fact, it does appear that score distributions skew higher with increasing levels of competition. If this were not true, then I would suspect that tests are made to be more difficult at each competition stage. 3
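A quick numeric check of that skew is to summarize a central value of the scores by competition type and level (same assumed `results` data frame and column names as in the earlier sketches):

```r
library(dplyr)
library(tidyr)

# Median score by competition type and level, spread wide so the
# District -> Region -> State progression is easy to scan
# (assumed data frame and column names).
results %>%
  group_by(comp, complvl) %>%
  summarise(median_score = median(score, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(names_from = complvl, values_from = median_score)
```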

Notably, it appears that Number Sense demonstrates the largest “jumps” in aggregate scores with increasing competition levels. Having competed in this competition before, I do not find this all too surprising. More than in any other competition type, those who succeed in Number Sense seem to rely on natural abilities (as opposed to training) to beat out the competition. (I would go so far as to say that for some individuals, this natural ability is “savant-like”.) Consequently, with increasing competition level, it is more likely that these “superior” competitors stand out and, as observed here, cause the scoring distributions to skew higher.

So there does seem to be a trend of higher scores with higher competition level, but has that trend changed over time? 4


As with the raw scores, there does not appear to be any noticeable trend in the change in the level of difficulty of the tests between each competition level over time. (Aside from a steep drop-off in the Science scores in 2014, there’s really nothing too interesting about this plot.) Along with the previous temporal visual, in which competition level is not distinguished, this plot is a strong indication that the tests (and, most likely, the skills of the competitors) have not changed over time.

Such visual inferences are supported quantitatively. Observe the p-values of the terms in linear regression models fit for each competition type, where the average year-to-year score difference (as a percentage) is estimated from the average individual competitor score and the year. Taking the customary p-value threshold of 0.05 as the level of significance (where one may deduce that a term is not statistically significant if its p-value is greater than the threshold), the year term is shown to be insignificant.

$$ \text{mean year diff pct} = \text{intercept} + \beta_{1} \cdot \text{mean} + \beta_{2} \cdot \text{year} $$

comp                      (Intercept)   mean     year
Calculator Applications   0.1589        0.0048   0.1290
Computer Science          0.1885        0.0239   0.1957
Mathematics               0.5551        0.0146   0.5137
Number Sense              0.1761        0.3733   0.1807
Science                   0.4813        0.0321   0.4958
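For reference, here is a sketch of how per-competition models of this form might be fit and tidied with broom. It assumes a summarized data frame `scores_by_year` (a hypothetical name) with one row per competition type and year and columns `comp`, `year`, `mean` (average individual score), and `mean_year_diff_pct` (year-over-year percentage change in that average):

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Assumed summary data frame `scores_by_year`: one row per competition type
# and year, with columns `comp`, `year`, `mean`, and `mean_year_diff_pct`.
scores_by_year %>%
  nest(data = -comp) %>%
  mutate(
    fit    = map(data, ~ lm(mean_year_diff_pct ~ mean + year, data = .x)),
    tidied = map(fit, tidy)
  ) %>%
  unnest(tidied) %>%
  select(comp, term, p.value) %>%
  pivot_wider(names_from = term, values_from = p.value)
```

In a sketch like this, the wide output has one row per competition type and one column of p-values per term, mirroring the layout of the table above.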

There could be a number of confounding factors that explain the relatively even level of competition over time, including the following.

If the latter is true, at the very least we can say that, despite how the media or popular culture may portray younger generations, the prowess of the academic “elite” of the current generation of high school students in Texas has not dropped off. (Nonetheless, nothing definitive should be deduced about students who do not compete in UIL academic competitions, or about those outside of Texas.)

Wrap-up

In this write-up, the participation and scores of Texas high school academic UIL competitions are illustrated in a number of different ways: by competition type, by competition level, by year, and by combinations of these. Next, we’ll take a closer look at some more specific questions, focusing on individual competitors.


  1. The State competition level is not shown because there is no “sub-grouping” of State (like there is 1, 2, 3, … for each of the groupings shown). ^
  2. As a technical note, Districts, Regions, and Conferences are not really all of the same “type” of data. Nevertheless, these different “groupings” each stratify the sample population in some manner. ^
  3. Actually, it might be true that there is some increase in test difficulty with advancing competitions, but such an adjustment, if it is truly made, does not quite offset the skill level of the advancing competitors. ^
  4. Recall that there is no apparent trend when not distinguishing by competition level. ^

To leave a comment for the author, please follow the link and comment on their blog: r on Tony ElHabr.
