In this post, I introduce three methods for dimensionality reduction of large datasets: principal component analysis (PCA), t-SNE, and self-organizing maps (SOM).
# packages used
library(tidyverse) # for data wrangling
library(stringr) # for string manipulations
library(ggbiplot) # pca biplot with ggplot
library(Rtsne) # implements the t-SNE algorithm
library(kohonen) # implements self organizing maps
library(hrbrthemes) # nice themes for ggplot
library(GGally) # to produce scatterplot matrices
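To give a rough idea of what the three dimensionality-reduction packages above do, here is a minimal sketch on R's built-in `iris` data rather than the FIFA data. Using `prcomp()` for the PCA step is an assumption on my part (it is the kind of object `ggbiplot` plots), and the perplexity and grid size are arbitrary illustrative values, not tuned settings.

```r
library(Rtsne)    # t-SNE
library(kohonen)  # self-organizing maps

# Stand-in data (assumption): the four numeric iris columns, de-duplicated
# because Rtsne() rejects duplicate rows by default, then standardized.
x <- scale(unique(as.matrix(iris[, 1:4])))

set.seed(42)  # t-SNE and SOM both start from random initial values

# 1. PCA: linear projection onto the directions of largest variance.
pca <- prcomp(x)
pca_2d <- pca$x[, 1:2]

# 2. t-SNE: non-linear embedding that tries to keep neighbours close together.
tsne <- Rtsne(x, dims = 2, perplexity = 30)
tsne_2d <- tsne$Y

# 3. SOM: assigns every observation to one unit of a 5 x 5 hexagonal grid.
som_fit <- som(x, grid = somgrid(xdim = 5, ydim = 5, topo = "hexagonal"))
som_unit <- som_fit$unit.classif
```

PCA and t-SNE each return one pair of coordinates per row, while the SOM returns a grid-unit index per row, so the three results are plotted in different ways.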
Data
The data comes from Kaggle and covers around 18,000 players from the game FIFA 18, with 75 features per player.
glimpse(fifa_tbl)
## Observations: 17,981
## Variables: 75
## $ X1 <int> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12...
## $ Name <chr> "Cristiano Ronaldo", "L. Messi", "Neymar...
## $ Age <int> 32, 30, 25, 30, 31, 28, 26, 26, 27, 29, ...
## $ Photo <chr> "https://cdn.sofifa.org/48/18/players/20...
## $ Nationality <chr> "Portugal", "Argentina", "Brazil", "Urug...
## $ Flag <chr> "https://cdn.sofifa.org/flags/38.png", "...
## $ Overall <int> 94, 93, 92, 92, 92, 91, 90, 90, 90, 90, ...
## $ Potential <int> 94, 93, 94, 92, 92, 91, 92, 91, 90, 90, ...
## $ Club <chr> "Real Madrid CF", "FC Barcelona", "Paris...
## $ `Club Logo` <chr> "https://cdn.sofifa.org/24/18/teams/243....
## $ Value <chr> "€95.5M", "€105M", "€123M", "€97M", "€61...
## $ Wage <chr> "€565K", "€565K", "€280K", "€510K", "€23...
## $ Special <int> 2228, 2154, 2100, 2291, 1493, 2143, 1458...
## $ Acceleration <int> 89, 92, 94, 88, 58, 79, 57, 93, 60, 78, ...
## $ Aggression <int> 63, 48, 56, 78, 29, 80, 38, 54, 60, 50, ...
## $ Agility <int> 89, 90, 96, 86, 52, 78, 60, 93, 71, 75, ...
## $ Balance <int> 63, 95, 82, 60, 35, 80, 43, 91, 69, 69, ...
## $ `Ball control` <int> 93, 95, 95, 91, 48, 89, 42, 92, 89, 85, ...
## $ Composure <int> 95, 96, 92, 83, 70, 87, 64, 87, 85, 86, ...
## $ Crossing <int> 85, 77, 75, 77, 15, 62, 17, 80, 85, 68, ...
## $ Curve <int> 81, 89, 81, 86, 14, 77, 21, 82, 85, 74, ...
## $ Dribbling <int> 91, 97, 96, 86, 30, 85, 18, 93, 79, 84, ...
## $ Finishing <int> 94, 95, 89, 94, 13, 91, 13, 83, 76, 91, ...
## $ `Free kick accuracy` <int> 76, 90, 84, 84, 11, 84, 19, 79, 84, 62, ...
## $ `GK diving` <int> 7, 6, 9, 27, 91, 15, 90, 11, 10, 5, 11, ...
## $ `GK handling` <int> 11, 11, 9, 25, 90, 6, 85, 12, 11, 12, 8,...
## $ `GK kicking` <int> 15, 15, 15, 31, 95, 12, 87, 6, 13, 7, 9,...
## $ `GK positioning` <int> 14, 14, 15, 33, 91, 8, 86, 8, 7, 5, 7, 1...
## $ `GK reflexes` <int> 11, 8, 11, 37, 89, 10, 90, 8, 10, 10, 11...
## $ `Heading accuracy` <int> 88, 71, 62, 77, 25, 85, 21, 57, 54, 86, ...
## $ Interceptions <int> 29, 22, 36, 41, 30, 39, 30, 41, 85, 20, ...
## $ Jumping <int> 95, 68, 61, 69, 78, 84, 67, 59, 32, 79, ...
## $ `Long passing` <int> 77, 87, 75, 64, 59, 65, 51, 81, 93, 59, ...
## $ `Long shots` <int> 92, 88, 77, 86, 16, 83, 12, 82, 90, 82, ...
## $ Marking <int> 22, 13, 21, 30, 10, 25, 13, 25, 63, 12, ...
## $ Penalties <int> 85, 74, 81, 85, 47, 81, 40, 86, 73, 70, ...
## $ Positioning <int> 95, 93, 90, 92, 12, 91, 12, 85, 79, 92, ...
## $ Reactions <int> 96, 95, 88, 93, 85, 91, 88, 85, 86, 88, ...
## $ `Short passing` <int> 83, 88, 81, 83, 55, 83, 50, 86, 90, 75, ...
## $ `Shot power` <int> 94, 85, 80, 87, 25, 88, 31, 79, 87, 88, ...
## $ `Sliding tackle` <int> 23, 26, 33, 38, 11, 19, 13, 22, 69, 18, ...
## $ `Sprint speed` <int> 91, 87, 90, 77, 61, 83, 58, 87, 52, 80, ...
## $ Stamina <int> 92, 73, 78, 89, 44, 79, 40, 79, 77, 72, ...
## $ `Standing tackle` <int> 31, 28, 24, 45, 10, 42, 21, 27, 82, 22, ...
## $ Strength <int> 80, 59, 53, 80, 83, 84, 64, 65, 74, 85, ...
## $ Vision <int> 85, 90, 80, 84, 70, 78, 68, 86, 88, 70, ...
## $ Volleys <int> 88, 85, 83, 88, 11, 87, 13, 79, 82, 88, ...
## $ CAM <dbl> 89, 92, 88, 87, NA, 84, NA, 88, 83, 81, ...
## $ CB <dbl> 53, 45, 46, 58, NA, 57, NA, 47, 72, 46, ...
## $ CDM <dbl> 62, 59, 59, 65, NA, 62, NA, 61, 82, 52, ...
## $ CF <dbl> 91, 92, 88, 88, NA, 87, NA, 87, 81, 84, ...
## $ CM <dbl> 82, 84, 79, 80, NA, 78, NA, 81, 87, 71, ...
## $ ID <int> 20801, 158023, 190871, 176580, 167495, 1...
## $ LAM <dbl> 89, 92, 88, 87, NA, 84, NA, 88, 83, 81, ...
## $ LB <dbl> 61, 57, 59, 64, NA, 58, NA, 59, 76, 51, ...
## $ LCB <dbl> 53, 45, 46, 58, NA, 57, NA, 47, 72, 46, ...
## $ LCM <dbl> 82, 84, 79, 80, NA, 78, NA, 81, 87, 71, ...
## $ LDM <dbl> 62, 59, 59, 65, NA, 62, NA, 61, 82, 52, ...
## $ LF <dbl> 91, 92, 88, 88, NA, 87, NA, 87, 81, 84, ...
## $ LM <dbl> 89, 90, 87, 85, NA, 82, NA, 87, 81, 79, ...
## $ LS <dbl> 92, 88, 84, 88, NA, 88, NA, 82, 77, 87, ...
## $ LW <dbl> 91, 91, 89, 87, NA, 84, NA, 88, 80, 82, ...
## $ LWB <dbl> 66, 62, 64, 68, NA, 61, NA, 64, 78, 55, ...
## $ `Preferred Positions` <chr> "ST LW", "RW", "LW", "ST", "GK", "ST", "...
## $ RAM <dbl> 89, 92, 88, 87, NA, 84, NA, 88, 83, 81, ...
## $ RB <dbl> 61, 57, 59, 64, NA, 58, NA, 59, 76, 51, ...
## $ RCB <dbl> 53, 45, 46, 58, NA, 57, NA, 47, 72, 46, ...
## $ RCM <dbl> 82, 84, 79, 80, NA, 78, NA, 81, 87, 71, ...
## $ RDM <dbl> 62, 59, 59, 65, NA, 62, NA, 61, 82, 52, ...
## $ RF <dbl> 91, 92, 88, 88, NA, 87, NA, 87, 81, 84, ...
## $ RM <dbl> 89, 90, 87, 85, NA, 82, NA, 87, 81, 79, ...
## $ RS <dbl> 92, 88, 84, 88, NA, 88, NA, 82, 77, 87, ...
## $ RW <dbl> 91, 91, 89, 87, NA, 84, NA, 88, 80, 82, ...
## $ RWB <dbl> 66, 62, 64, 68, NA, 61, NA, 64, 78, 55, ...
## $ ST <dbl> 92, 88, 84, 88, NA, 88, NA, 82, 77, 87, ...
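Two things stand out in the output above: the skill attributes form one contiguous block of columns (from `Acceleration` to `Volleys`), and the position ratings such as `CAM` are `NA` for some rows. The sketch below illustrates both on a toy data frame built from the first six values of a few columns shown above. The helper `select_range()` is hypothetical and only there to keep the example in base R.

```r
# Toy stand-in for fifa_tbl: six rows and a handful of columns copied from
# the glimpse() output above (not the full dataset).
toy <- data.frame(
  Age = c(32, 30, 25, 30, 31, 28),
  Acceleration = c(89, 92, 94, 88, 58, 79),
  Aggression = c(63, 48, 56, 78, 29, 80),
  Agility = c(89, 90, 96, 86, 52, 78),
  `GK diving` = c(7, 6, 9, 27, 91, 15),
  CAM = c(89, 92, 88, 87, NA, 84),
  `Preferred Positions` = c("ST LW", "RW", "LW", "ST", "GK", "ST"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep every column between `first` and `last`, inclusive.
select_range <- function(df, first, last) {
  idx <- match(c(first, last), names(df))
  stopifnot(!anyNA(idx), idx[1] <= idx[2])
  df[, idx[1]:idx[2], drop = FALSE]
}

# In the toy table the attribute block ends at `GK diving`.
attrs <- select_range(toy, "Acceleration", "GK diving")

# Rows without position ratings, and the positions those players prefer.
missing_rating <- is.na(toy$CAM)
positions_without_rating <- toy$`Preferred Positions`[missing_rating]
```

On the full table the same range would run from `Acceleration` to `Volleys`; with dplyr loaded, `select(fifa_tbl, Acceleration:Volleys)` is the equivalent call. In the toy rows, the only player without a `CAM` rating is the one whose preferred position is `GK`.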
In this post, we are only interested in the attributes and the ...