
Web Scraping the NFL Draft

[This article was first published on R – NYC Data Science Academy Blog, and kindly contributed to R-bloggers.]

Introduction

The National Football League (NFL) is big business. How big? Its 32 teams are worth an average of $3.2 billion each.

Clearly, Americans love pro football. Indeed, the author of this blog is a football fanatic. Don’t believe me? See below.

Figure 1. The author preparing to attend a Minnesota Vikings game years ago. The cane was for style. 

NFL teams are based in cities and regions across the country; examples include the Miami Dolphins, Carolina Panthers, and Pittsburgh Steelers.

NFL players come in different sizes and mentalities. Let’s take a look at the different positions, which will set up our discussion of the draft below.

Anatomy of an NFL Team

Each NFL team has 53 players, mainly comprising the offensive and defensive positions. The main positions are given below.

The figure below shows how the offensive and defensive players line up against each other.

Figure 2: The basic football positions on offense (red) and defense (blue). 

The players at each position have specific physical and mental attributes. For example, some players are huge:

Figure 3. Green Bay Packers offensive lineman Daryn Colledge (#73) is a big dude.

Some players are fast:

Figure 4. Arizona Cardinals wide receiver Larry Fitzgerald (#11) is tall, fast, and has tremendous hand-eye coordination.

Some players are strong, athletic, and fast:

Figure 5. Denver Broncos linebacker Von Miller (#58) is a defensive stud who is strong and fast. 

Some players are smart and poised:

Figure 6. New England Patriots QB Tom Brady (#12) is unflappable under pressure and can read defensive alignments and make adjustments on the fly.

Where Do the Players Come From?

The players are drafted from college teams, which are organized into largely regional conferences.

For example, the Big Ten conference consists mainly of teams from the Midwest, with a couple of East Coast teams thrown in.

Other prominent conferences include the Southeastern Conference (SEC), Atlantic Coast Conference (ACC), Big 12, Pac 10, Pac 12, Big East, and so on.

The NFL Draft

Every spring, the NFL holds its draft. Teams select college players to supplement their rosters for the upcoming season.

Figure 7. The NFL draft is a spectacle of sports beauty.

The Draft Decision Making Process

NFL teams invest a lot of money in trying to select players who will make them better.

Even given these measurables, it is incredibly difficult to forecast how well a college player will perform in the NFL.

Figure 8. The typical NFL team has a “war room” during the draft. Scouts, coaches, the general manager and others discuss the players selected by other teams and prepare to make their picks. Note the big board in the background, which contains the team’s ranking of college players. 

Enter Data Science

What can data science tell us about the NFL draft?

It can tell us a lot, but I want to start with one fundamental question: What positions and conferences are selected preferentially within the 7 rounds?

Can data science help us to evaluate such intuitions? And can data science show us new patterns in the selection of players in the NFL draft?

Data Science Methods

Using Scrapy, a web scraping package for the Python language, I acquired data on the past ten years of NFL drafts from Wikipedia (e.g., https://en.wikipedia.org/wiki/2016_NFL_Draft). These data allowed me to construct a spreadsheet with the following information about every player selected in the drafts over those ten years: round number, pick number, NFL team, player name, position, college team, and conference.
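The spider itself is not shown in this post. As a rough, dependency-free sketch of the row-to-record step, the snippet below parses a small inline table with Python's standard `html.parser` module instead of Scrapy. The sample markup, the column order, the `FIELDS` names, and the `parse_draft_rows` helper are all assumptions made for illustration; the real Wikipedia tables are more complex, and in a Scrapy spider the cells would typically be pulled out with selectors such as `response.css(...)`.

```python
from html.parser import HTMLParser

# Hypothetical, simplified stand-in for one draft table. The real Wikipedia
# markup is more complex and its column order may differ. All values below
# are placeholders, not real draft data.
SAMPLE_HTML = """
<table>
<tr><th>Rnd.</th><th>Pick</th><th>NFL team</th><th>Player</th><th>Pos.</th><th>College</th><th>Conf.</th></tr>
<tr><td>1</td><td>1</td><td>Team A</td><td>Player One</td><td>QB</td><td>College X</td><td>Pac-12</td></tr>
<tr><td>1</td><td>2</td><td>Team B</td><td>Player Two</td><td>OT</td><td>College Y</td><td>Big Ten</td></tr>
</table>
"""

# Assumed column order, matching the spreadsheet fields listed in the text.
FIELDS = ["round", "pick", "team", "player", "position", "college", "conference"]


class DraftTableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # one list of cell strings per data row
        self._row = None       # cells of the row currently being read
        self._in_cell = False  # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            # Header rows hold only <th> cells, so their row list is empty
            # and they are skipped here.
            if self._row:
                self.rows.append([cell.strip() for cell in self._row])
            self._row = None


def parse_draft_rows(html, year):
    """Turn table rows into dicts keyed by FIELDS, tagged with the draft year."""
    parser = DraftTableParser()
    parser.feed(html)
    records = []
    for cells in parser.rows:
        if len(cells) != len(FIELDS):
            continue  # skip rows that do not match the assumed layout
        record = dict(zip(FIELDS, cells))
        record["round"] = int(record["round"])
        record["pick"] = int(record["pick"])
        record["year"] = year
        records.append(record)
    return records


records = parse_draft_rows(SAMPLE_HTML, 2016)
for record in records:
    print(record)
```

Records in this shape can be written to a spreadsheet-style file with the standard library's `csv.DictWriter`, one row per drafted player.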

Given this detailed spreadsheet, I used the R programming language for analysis and visualization of the data. The figure below illustrates the draft selection process by position and conference.

 

Figure 9. Violin plot, color coded by conference and arranged by position. The circles represent individual observations, i.e., individual players selected at that pick across the past decade. The top of the Y axis (0) corresponds to the first player selected, and the axis runs downward to the last player selected in the 7th round. Therefore, players in the topmost part of each plot are the highest picks and are considered the most valuable.
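The analysis in this post was done in R, and that code is not shown. As a rough Python sketch of the grouping that underlies a plot like Figure 9, the snippet below collects pick numbers by position and conference and reports a few summary statistics for each group. The `picks` records and the `summarize` helper are hypothetical, and the values are placeholders rather than real draft data.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records in the spreadsheet's layout (placeholder values only).
picks = [
    {"position": "QB", "conference": "Big 12", "pick": 1},
    {"position": "QB", "conference": "Big 12", "pick": 98},
    {"position": "QB", "conference": "Pac 12", "pick": 40},
    {"position": "OT", "conference": "Big Ten", "pick": 5},
    {"position": "OT", "conference": "Big Ten", "pick": 33},
    {"position": "OT", "conference": "SEC", "pick": 12},
]


def summarize(picks):
    """Group pick numbers by (position, conference) and summarize each group."""
    groups = defaultdict(list)
    for p in picks:
        groups[(p["position"], p["conference"])].append(p["pick"])
    return {
        key: {"n": len(values), "earliest": min(values), "median": median(values)}
        for key, values in groups.items()
    }


summary = summarize(picks)
for (position, conference), stats in sorted(summary.items()):
    print(position, conference, stats)
```

Each group here corresponds to one colored violin within a position panel: a low `earliest` or `median` pick number means players from that conference tend to be taken near the top of the draft.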

Observations

From this plot, we see that some intuitions are borne out. For example, the Big Ten conference has produced a relatively large number of offensive tackles who were selected in the early rounds. We also see a strong representation of the SEC at this position.

At cornerback (CB), a skill position in which speedy players excel, the ACC and SEC dominate. This finding again matches our intuition, since an informed fan believes that many excellent CBs come from colleges in these conferences, such as Alabama and Florida.

Interestingly, and less intuitively, other trends can be discerned in this plot. Consider quarterback (QB), the most valuable position on any team. Here, the conference with the most QBs taken in the early rounds is the Big 12. Also note that there is a cluster of QB picks in the middle rounds for this conference, but hardly any selections in the later rounds.

This application of data science suggests other questions. For example, suppose my favorite team, the Minnesota Vikings, is considering drafting a QB from the Pac 12 conference. I might look at the figure (P=12, gold in Figure 9) and note that not a single QB from this conference has been selected in the first round over the past ten years.

Of course, this does not mean that the QB in question will not be good. Indeed, this figure tells us nothing about how specific college players actually performed in the NFL. Rather, it shows us trends in the data, which may help us to better understand how players have been valued and selected in the past. We can also create other figures that include metrics on player performance once they play in the NFL; this is something I am working on.
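A question like the first-round QB check above can also be answered directly from the spreadsheet, without reading it off the plot. The snippet below is a minimal sketch, assuming records with `round`, `position`, and `conference` fields; the `count_selected` helper and the sample rows are hypothetical placeholders, not real draft data.

```python
# Hypothetical records in the spreadsheet's layout (placeholder values only).
draft_records = [
    {"round": 1, "position": "QB", "conference": "Big 12"},
    {"round": 2, "position": "QB", "conference": "Pac 12"},
    {"round": 1, "position": "OT", "conference": "Pac 12"},
    {"round": 4, "position": "QB", "conference": "Pac 12"},
]


def count_selected(records, position, conference, round_number):
    """Count players at a position from a conference taken in a given round."""
    return sum(
        1
        for r in records
        if r["position"] == position
        and r["conference"] == conference
        and r["round"] == round_number
    )


first_round_pac12_qbs = count_selected(draft_records, "QB", "Pac 12", 1)
print("Pac 12 QBs taken in round 1:", first_round_pac12_qbs)
```

Because conference names can change over a ten-year span (the list above includes both Pac 10 and Pac 12), a real query would likely need to normalize those labels before counting.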

Overall, a project like this one may help an NFL team build a better model of how player positions and college conferences are represented in the draft over time. As we noted above, there is big money in the NFL, and data science is a goldmine.

The post Web Scraping the NFL Draft appeared first on NYC Data Science Academy Blog.
