Analyzing Golden State Warriors’ passing network using GraphFrames in Spark
Databricks recently announced GraphFrames, an awesome Spark extension for graph processing with DataFrames.
I performed a graph analysis and visualized the Golden State Warriors’ beautiful ball-movement network using the rich data provided by NBA.com’s stats.
Pass network of Warriors
Passes received & made
The league’s MVP, Stephen Curry, received the most passes, and the team’s MVP, Draymond Green, made the most passes.
We’ve seen most of the offense start with their pick & roll or with Curry’s off-ball cuts, with Green as the passer.
inDegree
id | inDegree |
---|---|
CurryStephen | 3993 |
GreenDraymond | 3123 |
ThompsonKlay | 2276 |
LivingstonShaun | 1925 |
IguodalaAndre | 1814 |
BarnesHarrison | 1241 |
BogutAndrew | 1062 |
BarbosaLeandro | 946 |
SpeightsMarreese | 826 |
ClarkIan | 692 |
RushBrandon | 685 |
EzeliFestus | 559 |
McAdooJames Michael | 182 |
VarejaoAnderson | 67 |
LooneyKevon | 22 |
outDegree
id | outDegree |
---|---|
GreenDraymond | 3841 |
CurryStephen | 3300 |
IguodalaAndre | 1896 |
LivingstonShaun | 1878 |
BogutAndrew | 1660 |
ThompsonKlay | 1460 |
BarnesHarrison | 1300 |
SpeightsMarreese | 795 |
RushBrandon | 772 |
EzeliFestus | 765 |
BarbosaLeandro | 758 |
ClarkIan | 597 |
McAdooJames Michael | 261 |
VarejaoAnderson | 94 |
LooneyKevon | 36 |
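The two tables above are degree counts on the pass graph: inDegree is passes received and outDegree is passes made. In GraphFrames these come from a graph's `inDegrees` and `outDegrees` attributes. The plain-Python sketch below only illustrates the arithmetic. It assumes each edge carries a pass count, and the players `p1` to `p3` and their numbers are made up for illustration, not taken from the real data.

```python
from collections import Counter

# Hypothetical (passer, receiver, number_of_passes) rows; illustrative values only.
passes = [
    ("p1", "p2", 5),
    ("p2", "p1", 3),
    ("p1", "p3", 2),
    ("p3", "p2", 4),
]

in_degree = Counter()   # passes received per player
out_degree = Counter()  # passes made per player
for passer, receiver, count in passes:
    out_degree[passer] += count
    in_degree[receiver] += count

# List players from most to fewest passes received, like the inDegree table.
for player, received in in_degree.most_common():
    print(player, received)
```

Every pass is counted once as made and once as received, so the two columns always add up to the same total.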
Label Propagation
Label Propagation is an algorithm for finding communities in a graph.
The algorithm nicely splits the players into backcourt and frontcourt without being given any labels!
name | label |
---|---|
Thompson, Klay | 3 |
Barbosa, Leandro | 3 |
Curry, Stephen | 3 |
Clark, Ian | 3 |
Livingston, Shaun | 3 |
Rush, Brandon | 7 |
Green, Draymond | 7 |
Speights, Marreese | 7 |
Bogut, Andrew | 7 |
McAdoo, James Michael | 7 |
Iguodala, Andre | 7 |
Varejao, Anderson | 7 |
Ezeli, Festus | 7 |
Looney, Kevon | 7 |
Barnes, Harrison | 7 |
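To give a feel for how labels spread, here is a minimal plain-Python sketch of label propagation. It is not GraphFrames' implementation (there the call is `labelPropagation(maxIter=...)`); the weighted voting, the smallest-label tie-break, and the toy players `g1` to `g3` and `f1` to `f3` are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical pass counts: two tight groups joined by one weak link.
edges = [
    ("g1", "g2", 10), ("g1", "g3", 10), ("g2", "g3", 10),
    ("f1", "f2", 10), ("f1", "f3", 10), ("f2", "f3", 10),
    ("g3", "f1", 1),
]

def label_propagation(edges, max_iter=5):
    # Treat passes as undirected ties and sum the weights per pair.
    neighbours = defaultdict(dict)
    for src, dst, weight in edges:
        neighbours[src][dst] = neighbours[src].get(dst, 0) + weight
        neighbours[dst][src] = neighbours[dst].get(src, 0) + weight

    # Every node starts in its own community.
    labels = {node: i for i, node in enumerate(sorted(neighbours))}
    for _ in range(max_iter):
        new_labels = {}
        for node, nbrs in neighbours.items():
            votes = defaultdict(int)
            for other, weight in nbrs.items():
                votes[labels[other]] += weight
            # Adopt the heaviest label; break ties with the smallest label.
            new_labels[node] = min(votes, key=lambda lab: (-votes[lab], lab))
        if new_labels == labels:
            break
        labels = new_labels
    return labels

labels = label_propagation(edges)
for node in sorted(labels):
    print(node, labels[node])
```

The label numbers themselves carry no meaning; only which players end up sharing a label matters.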
PageRank
PageRank can detect important nodes (players in this case) in a network.
It’s no surprise that Stephen Curry, Draymond Green and Klay Thompson are the top three.
The algorithm also detects that Shaun Livingston and Andre Iguodala play key roles in the Warriors’ passing game.
name | pagerank |
---|---|
Curry, Stephen | 2.17 |
Green, Draymond | 1.99 |
Thompson, Klay | 1.34 |
Livingston, Shaun | 1.29 |
Iguodala, Andre | 1.21 |
Barnes, Harrison | 0.86 |
Bogut, Andrew | 0.77 |
Barbosa, Leandro | 0.72 |
Speights, Marreese | 0.66 |
Clark, Ian | 0.59 |
Rush, Brandon | 0.57 |
Ezeli, Festus | 0.48 |
McAdoo, James Michael | 0.27 |
Varejao, Anderson | 0.19 |
Looney, Kevon | 0.16 |
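For intuition, PageRank can be sketched as a short power iteration: each player repeatedly hands their score to the teammates they pass to. In GraphFrames the call is `pageRank(resetProbability=..., maxIter=...)`. The version below is a plain-Python approximation under my own assumptions (scores split in proportion to pass counts, every node starting at 1.0), run on made-up players `a`, `b` and `c`.

```python
from collections import defaultdict

def pagerank(edges, reset_probability=0.15, max_iter=20):
    # Total outgoing weight per node, used to split each node's score.
    out_weight = defaultdict(float)
    nodes = set()
    for src, dst, weight in edges:
        out_weight[src] += weight
        nodes.update((src, dst))

    ranks = {node: 1.0 for node in nodes}
    for _ in range(max_iter):
        incoming = defaultdict(float)
        for src, dst, weight in edges:
            incoming[dst] += ranks[src] * weight / out_weight[src]
        ranks = {
            node: reset_probability + (1 - reset_probability) * incoming[node]
            for node in nodes
        }
    return ranks

# Hypothetical (passer, receiver, passes) rows; every player passes at least once,
# so no score leaks out of the graph.
passes = [("a", "b", 3), ("a", "c", 1), ("b", "a", 1), ("b", "c", 1), ("c", "b", 2)]
ranks = pagerank(passes)
for player in sorted(ranks, key=ranks.get, reverse=True):
    print(player, round(ranks[player], 2))
```

In this toy graph `b` comes out on top because both teammates send most of their passes there.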
Everything together
Here is a network visualization that combines the results above.
- Node size: pagerank
- Node color: community
- Link width: passes received & made
Workflow
Calling API
I used the endpoint playerdashptpass and saved the data for every player on the team to local JSON files.
The data records who passed to whom, and how many times, during the 2015-16 season.
JSON -> pandas DataFrame
Then I combined all the individual JSON files into a single DataFrame for later aggregation.
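The combining step might look roughly like the sketch below. It uses only the standard library so it runs anywhere; the resulting list of dicts could be passed to `pandas.DataFrame`. The JSON layout (`resultSets` with `name`, `headers` and `rowSet`), the result-set name `PassesMade`, the column names and the file names are assumptions for illustration, and the sample rows are made up.

```python
import glob
import json
import os
import tempfile

def load_pass_rows(folder, result_set_name="PassesMade"):
    """Read every *.json file in `folder` and return one flat list of row dicts."""
    rows = []
    for path in sorted(glob.glob(os.path.join(folder, "*.json"))):
        with open(path, encoding="utf-8") as handle:
            payload = json.load(handle)
        for result_set in payload["resultSets"]:
            if result_set["name"] != result_set_name:
                continue
            headers = result_set["headers"]
            rows.extend(dict(zip(headers, row)) for row in result_set["rowSet"])
    return rows

# Tiny made-up files standing in for the saved API responses.
headers = ["PLAYER_NAME_LAST_FIRST", "PASS_TO", "PASS"]
samples = {
    "player_a.json": [["Player, A", "Player, B", 12], ["Player, A", "Player, C", 4]],
    "player_b.json": [["Player, B", "Player, A", 7]],
}

with tempfile.TemporaryDirectory() as folder:
    for filename, row_set in samples.items():
        payload = {"resultSets": [{"name": "PassesMade", "headers": headers, "rowSet": row_set}]}
        with open(os.path.join(folder, filename), "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
    combined = load_pass_rows(folder)

print(len(combined), "rows")
```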
Prepare vertices and edges
GraphFrames in Spark needs the data in a specific format: vertices and edges.
Vertices are the list of nodes in a graph, each with an ID.
Edges are the relationships between the nodes.
You can pass additional features such as weight, but I couldn’t find a way to make good use of these features in the later analysis.
The workaround I used below is brute force and not even a proper graph operation, but it works (suggestions/comments are very welcome).
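As a rough illustration of the two tables, the sketch below builds them as plain Python lists. GraphFrames expects an `id` column in the vertices and `src` and `dst` columns in the edges. The repeated-row expansion is one possible reading of a brute-force workaround for weights (one edge row per pass, so plain edge counts equal pass counts) and may differ from the actual code; the input rows and their column names are made up.

```python
# Hypothetical combined pass rows; names and counts are illustrative only.
rows = [
    {"PLAYER_NAME_LAST_FIRST": "Player, A", "PASS_TO": "Player, B", "PASS": 3},
    {"PLAYER_NAME_LAST_FIRST": "Player, B", "PASS_TO": "Player, A", "PASS": 2},
]

def make_id(name):
    # "Last, First" -> "LastFirst", the same shape as the ids in the degree tables.
    return name.replace(", ", "")

names = sorted({row[key] for row in rows for key in ("PLAYER_NAME_LAST_FIRST", "PASS_TO")})
vertices = [{"id": make_id(name), "name": name} for name in names]

# One weighted edge per passer/receiver pair.
edges = [
    {"src": make_id(row["PLAYER_NAME_LAST_FIRST"]), "dst": make_id(row["PASS_TO"]), "weight": row["PASS"]}
    for row in rows
]

# Brute force: repeat each edge once per pass so unweighted counts reflect volume.
expanded_edges = [
    {"src": edge["src"], "dst": edge["dst"]}
    for edge in edges
    for _ in range(edge["weight"])
]

print(len(vertices), "vertices,", len(expanded_edges), "edge rows")
```

The expansion grows with the total number of passes, which is fine for one team's season but would not scale to a much larger graph.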
Graph analysis
Bring the local vertices and edges to Spark and let it spark.
Visualise the network
When you run gsw_passing_network.py from my GitHub repo, it writes passes.csv, groups.csv and size.csv to your working directory.
I used the networkD3 package in R to make a cool interactive D3 chart.
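One detail worth knowing when preparing data for networkD3 is that its force-directed network expects links to refer to nodes by zero-based index rather than by name. The sketch below shows that conversion in Python; the column names `source`, `target` and `value` and the sample edges are hypothetical and are not meant to reproduce the exact layout of passes.csv.

```python
import csv
import io

# Hypothetical named edges: (passer, receiver, passes).
edges = [("PlayerA", "PlayerB", 3), ("PlayerB", "PlayerC", 1), ("PlayerC", "PlayerA", 2)]

# Give every player a zero-based index in a stable (sorted) order.
names = sorted({name for src, dst, _ in edges for name in (src, dst)})
index = {name: i for i, name in enumerate(names)}
links = [(index[src], index[dst], weight) for src, dst, weight in edges]

# Write the link table as CSV text (an in-memory buffer stands in for a file).
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["source", "target", "value"])
writer.writerows(links)
links_csv = buffer.getvalue()

print(links_csv)
```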
Code
The full code is available on GitHub.
Analyzing Golden State Warriors' passing network using GraphFrames in Spark was originally published by Kirill Pomogajko at Opiate for the masses on March 15, 2016.