Insight from FIFA 14’s Player Attributes (Using R)

[This article was first published on You Know, and kindly contributed to R-bloggers.]
FIFA 14 is a video game by EA Sports that mimics the experience of managing and playing for a soccer team. The game uses the likenesses and attributes of real players and this is part of the appeal. Although I rarely play video games, I am an avid soccer player and got curious about what could be learned by taking a closer look at the game-assigned player attributes.

www.futhead.com is a good source of FIFA 14 data. I scraped the HTML from the two hundred-plus pages of player attributes and then munged it into a usable table. Players have an overall rating and six specific stats (pace, shooting, passing, dribbling, defending, and heading). Each player has an assigned position; I collapsed the positions into a “type” category (Defense, Midfield, Forward). The modern game effectively has four lines of players, but the position names still carry the naming conventions from the days of three-line formations such as 4-4-2.
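
Collapsing positions into types can be done with a named lookup vector. The sketch below is only an illustration: the position codes, the grouping (including placing wingers with the midfield), and the toy `players` table are assumptions, since the exact mapping used for the analysis is not listed here.

```r
# Hypothetical position codes and grouping; the article does not list its exact mapping.
position_type <- c(
  GK  = "Goalkeeper",
  CB  = "Defense",  LB = "Defense",  RB  = "Defense",
  LWB = "Defense",  RWB = "Defense",
  CDM = "Midfield", CM = "Midfield", CAM = "Midfield",
  LM  = "Midfield", RM = "Midfield", LW  = "Midfield", RW = "Midfield",
  CF  = "Forward",  ST = "Forward"
)

# Toy stand-in for the scraped table (names and positions are made up).
players <- data.frame(
  name     = c("Player A", "Player B", "Player C", "Player D"),
  position = c("CB", "CAM", "ST", "RWB"),
  stringsAsFactors = FALSE
)

# Look up each player's type by position code; unname() drops the lookup names.
players$type <- unname(position_type[players$position])
print(players)
```

A lookup vector keeps the mapping in one place, so regrouping a position is a one-line change.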

Player Positions and Position Types

Below is a chart summarizing player rating by position. The chart is sorted in ascending order of median rating. There is a great deal of spread, but generally the center midfielders and fullbacks are rated a bit lower than the wingers and wingbacks.
[Figure: PlayerRatingsByPosition.png]
The collapsed view below corresponds with the above chart: ratings show a slight upward bias as the position type becomes more offensive-minded.
[Figure: PlayerRatingsByType.png]
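
Sorting a chart's categories by median is a one-liner with `reorder()`. The sketch below uses simulated ratings and a handful of hypothetical position codes, not the scraped data, just to show the ordering step.

```r
set.seed(1)
# Simulated ratings for a few hypothetical position codes (not the real data).
players <- data.frame(
  position = rep(c("CM", "LB", "RW", "ST"), each = 50),
  rating   = c(rnorm(50, 62, 7), rnorm(50, 63, 7),
               rnorm(50, 66, 7), rnorm(50, 65, 7))
)

# Reorder the factor levels so positions run from lowest to highest median rating.
players$position <- reorder(factor(players$position), players$rating, FUN = median)

# Median rating per position, in the order a chart would use.
medians <- tapply(players$rating, players$position, median)
print(round(medians, 1))
```

With the levels reordered, `boxplot(rating ~ position, data = players)` would draw the boxes in ascending median order.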

Modeling Player Ratings

I built a linear model for each position “type” and found R-squared values ranging from 88% to 99%. Each model used all six attributes as predictors, with overall rating as the dependent variable. I speculate that player age/experience may account for the unexplained variance. Below is a look at the performance of each position type’s model. Both images visually support the models’ validity.
[Figure: ModelPerformance.png]
[Figure: ModelResiduals.png]
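
For readers who want to try the per-type modeling themselves, here is a minimal sketch. Everything in it is simulated: the attribute values, the generating weights in `w`, and the noise level are assumptions made up for illustration, so the R-squared values it prints say nothing about the real data.

```r
set.seed(42)
attrs <- c("pace", "shooting", "passing", "dribbling", "defending", "heading")

# Simulated attributes on a 30-99 scale for 300 players (not the scraped data).
n <- 300
players <- as.data.frame(matrix(sample(30:99, n * 6, replace = TRUE), ncol = 6,
                                dimnames = list(NULL, attrs)))
players$type <- rep(c("Defense", "Midfield", "Forward"), each = n / 3)

# Made-up weights per type, used only to generate an overall rating to recover.
w <- list(Defense  = c(.10, .00, .10, .05, .60, .15),
          Midfield = c(.10, .15, .30, .30, .10, .05),
          Forward  = c(.20, .40, .05, .20, .00, .15))
players$overall <- NA_real_
for (ty in names(w)) {
  rows <- players$type == ty
  players$overall[rows] <-
    drop(as.matrix(players[rows, attrs]) %*% w[[ty]]) + rnorm(sum(rows), sd = 2)
}

# One linear model per position type: overall rating on all six attributes.
form <- reformulate(attrs, response = "overall")
fits <- lapply(split(players, players$type), function(d) lm(form, data = d))

# R-squared for each type's model.
r2 <- sapply(fits, function(f) summary(f)$r.squared)
print(round(r2, 3))
```

Calling `coef(fits$Forward)` returns the fitted attribute weights for one type, and `plot(fitted(fits$Forward), residuals(fits$Forward))` gives a quick residual check.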

Position Type Models

Each position type’s model naturally has a different mix of attribute weights. Below are charts showing these weights.
[Figure: WeightingDefense.png]
Forwards need to be good at shooting, and the forward weights chart (WeightingForward.png) shows this. Interestingly, passing actually carries a negative weight in the forward model. I can think of several great forwards I have played with who fit this category!
Midfield ratings are more balanced than those of defense, but dribbling and passing are the two most important skills for this position type.
[Figure: WeightingForward.png]
It is clear that defending is far and away the most important skill for defenders; this is less an insight than an indictment of the game developers for not breaking defense into its own attributes such as tackling and positioning.
Goalkeepers are specialists so these skills are not as directly relevant, but I included them for completeness.

Mismatches

Each player’s position is assigned in the database. This leads to the possibility of having a player being theoretically higher rated in a different position. I found some evidence of this. Below is a table of the top three mismatches by position.
| Best Rating | Assigned: Defense | Assigned: Midfield | Assigned: Forward |
|---|---|---|---|
| Forward | Craig Gardner, Guillaume Gillet, Steven Reed | Cristiano Ronaldo, Arjen Robben, Thomas Müller | |
| Midfield | Philipp Lahm, Dani Alves, Marcelo | | Neymar, Antonio Cassano, Sebast. Giovinco |
| Defense | | Yaya Touré, Sergio Busquets, Xabi Alonso | Karim Guédé, Lee McCulloch, Mikael Dahlberg |

Defenders better as midfielders are an impressive crew: Lahm, Alves, and Marcelo are three of the top players. Midfielders better as defenders are known for their holding prowess and their enforcer reputation. Midfielders better as forwards are often impressive wingers who can use their speed as a weapon in the open wide spaces. Forwards better as midfielders are represented by two Italians and Neymar, which is surprising since he is viewed as a potent striker. As someone who has watched countless matches, I venture that the positions should be thought of in terms of where the player is expected to defend, not necessarily where they are expected to attack; it is common for wingers to cut inside and act like forwards once the opponent’s defenders are occupied by the true forwards. Likewise, the rise of offensive-minded wingbacks can cause trouble for defenses that have to cope with a late runner joining the attack.
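
One way to look for mismatches like these is to score every player under every position type's model and compare the best prediction with the prediction for the assigned type. The sketch below does that on simulated data; the generating weights, the `gain` measure, and the column names are assumptions for illustration and may differ from how the table above was actually built.

```r
set.seed(7)
attrs <- c("pace", "shooting", "passing", "dribbling", "defending", "heading")
types <- c("Defense", "Midfield", "Forward")

# Simulated players (not the scraped data): six attributes and an assigned type.
n <- 300
players <- as.data.frame(matrix(sample(30:99, n * 6, replace = TRUE), ncol = 6,
                                dimnames = list(NULL, attrs)))
players$type <- rep(types, each = n / 3)

# Made-up weights, used only to generate ratings for the sketch.
w <- cbind(Defense  = c(.10, .00, .10, .05, .60, .15),
           Midfield = c(.10, .15, .30, .30, .10, .05),
           Forward  = c(.20, .40, .05, .20, .00, .15))
scores <- as.matrix(players[, attrs]) %*% w
assigned_col <- match(players$type, types)
players$overall <- scores[cbind(seq_len(n), assigned_col)] + rnorm(n, sd = 2)

# Fit one model per assigned type, then score every player under every model.
form <- reformulate(attrs, response = "overall")
fits <- lapply(split(players, players$type), function(d) lm(form, data = d))
pred <- sapply(fits[types], predict, newdata = players)

# Best type = model with the highest predicted rating;
# gain = best prediction minus the prediction at the assigned type.
players$best_type <- types[max.col(pred, ties.method = "first")]
players$gain <- apply(pred, 1, max) - pred[cbind(seq_len(n), assigned_col)]

# Top three mismatches for each (assigned type, best type) pair.
mismatch <- players[players$best_type != players$type, ]
mismatch <- mismatch[order(-mismatch$gain), ]
cell <- paste(mismatch$type, "->", mismatch$best_type)
top3 <- do.call(rbind, lapply(split(mismatch, cell), head, n = 3))
print(top3[, c("type", "best_type", "gain")])
```

Because the simulated attributes are independent of the assigned type, this toy data produces far more mismatches than a real player database would.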

Model Outliers

The model does a good job of predicting a player’s overall rating, but there are a few exceptions.

| | At Assigned Position | At Best Position |
|---|---|---|
| Better than Predicted | Raoul Cedric Loé, Stefan Reinartz, Cañas | Mesut Özil, Franck Ribéry, Luca Toni |
| Worse than Predicted | Greg Tempest, Musharraf Al Ruwaili, Jacob Shoop | Nicholas Gotfredsen, Don Anding, Josh Ford |
Most of these players are lesser known, with the exception of the top right box. These players must have magic not captured in the regular six attributes; one might call this the X Factor.
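
Outliers of this kind fall out of the model residuals: the actual rating minus the predicted rating. Here is a small sketch on simulated data; the two-attribute model and the player names are placeholders, not the six-attribute models described above.

```r
set.seed(3)
# Simulated data (not the scraped table): a rating driven by two attributes plus noise.
players <- data.frame(
  name     = paste("Player", 1:200),
  shooting = sample(30:99, 200, replace = TRUE),
  passing  = sample(30:99, 200, replace = TRUE),
  stringsAsFactors = FALSE
)
players$overall <- 0.6 * players$shooting + 0.4 * players$passing + rnorm(200, sd = 2)

fit <- lm(overall ~ shooting + passing, data = players)

# Residual = actual rating minus predicted rating.
players$residual <- residuals(fit)
ranked <- players[order(players$residual, decreasing = TRUE), ]

better_than_predicted <- head(ranked$name, 3)  # largest positive residuals
worse_than_predicted  <- tail(ranked$name, 3)  # largest negative residuals
print(better_than_predicted)
print(worse_than_predicted)
```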

Clustering

There is some evidence that the player attributes lead to a few common clusters. Below is a chart showing the within-cluster sum of squares (WSS) for a given cluster count. This offers some visual confirmation that there are three or four general styles of player; past that point the WSS decreases much more slowly.
[Figure: WSS.png]
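
A WSS-by-cluster-count curve like this can be produced with `kmeans()`. The sketch below assumes k-means clustering and uses a simulated three-group attribute matrix, so the location of the elbow is built into the fake data rather than discovered.

```r
set.seed(10)
# Simulated attribute matrix with three loose groups (not the scraped data).
centers <- rbind(c(80, 40, 60), c(50, 75, 70), c(60, 60, 40))
x <- do.call(rbind, lapply(1:3, function(i) {
  matrix(rnorm(100 * 3, mean = rep(centers[i, ], each = 100), sd = 5), ncol = 3)
}))
colnames(x) <- c("shooting", "defending", "pace")

# Total within-cluster sum of squares for k = 1..8; nstart reruns k-means from
# several random starts and keeps the best fit.
ks  <- 1:8
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 20)$tot.withinss)
print(round(wss))
```

`plot(ks, wss, type = "b")` then draws the elbow chart; the point where the curve flattens suggests a reasonable cluster count.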

Player Tree

Finally, I clustered the top field players (overall rating at least 85) hierarchically. What developed was an insightful way to visualize how different players are stylistically related to each other.
[Figure: PlayerTree.png]
Football / Soccer’s very own family tree. The forward Gareth Bale is mixed in between the midfielders and defenders. The forward Lionel Messi is mixed in with the midfielders. These are two of the most talked about players today. Maybe being mixed in with different position players in the tree is predictive of being an important, interesting player. If so, keep your eyes on Thomas Müller.
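
A tree like this can be built with `hclust()`. The following is a rough sketch on made-up attribute values for eight placeholder players; standardizing the attributes, the Euclidean distance, and the Ward linkage are my assumptions here, not necessarily the settings behind the chart above.

```r
set.seed(5)
# Toy attribute table for eight hypothetical players (values are made up).
attrs <- c("pace", "shooting", "passing", "dribbling", "defending", "heading")
x <- matrix(sample(60:95, 8 * 6, replace = TRUE), nrow = 8,
            dimnames = list(paste("Player", LETTERS[1:8]), attrs))

# Euclidean distance between standardized attribute profiles, then agglomerative
# clustering; the linkage method is an assumption, not the article's choice.
d    <- dist(scale(x))
tree <- hclust(d, method = "ward.D2")

# Cut the tree into three groups to see which players land together.
groups <- cutree(tree, k = 3)
print(groups)
```

`plot(tree)` draws the dendrogram, with stylistically similar players joined lower in the tree.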
