
Data Viz and Manipulation P1

[This article was first published on Analysis of AFL, and kindly contributed to R-bloggers].

This is the start of a tutorial series in which we will cover visualising and manipulating data in R. Along the way there will be a series of mini checkpoints you can use to check your understanding.

To check the most basic functionality of R (you can use it as a calculator), what does 9+3 equal?

9+3
## [1] 12

Checkpoint 1: Were you able to get 12 as an answer?
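The other basic arithmetic operators work the same way. A few more calculator-style examples, with the value each line should print noted in the comment:

```r
9 - 3    # subtraction: 6
9 * 3    # multiplication: 27
9 / 3    # division: 3
9 ^ 2    # exponentiation: 81
9 %% 4   # remainder (modulo): 1
9 %/% 4  # integer division: 2
```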

Another cool thing we can do is store values in variables. For example, we can create a variable x that holds the sum 9+3 from earlier.

x = 9+3

After running x = 9+3, if we type x and hit Enter as below, the value stored in x is printed.

x
## [1] 12
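You will also often see the arrow operator <- used for assignment in R code; for a simple case like this it does the same job as =. A stored variable can also be used to build a new one. In this small sketch, y is just a name chosen for illustration, and x is left unchanged:

```r
x <- 9 + 3   # same result as x = 9 + 3
y <- x * 2   # a new variable built from x; x itself is still 12
y            # prints [1] 24
```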

Now that we have x stored, we can use it instead of 9+3. For example, instead of typing 9+3+4 we can just type x + 4:

x + 4
## [1] 16

Now that we have the very basics sorted, let’s try something a little more interesting.

Remembering Tony Lockett’s Career

Tony Lockett is the game’s all-time leading goal kicker and is easily one of the best to lace them up.

When it comes to data, we can either enter it manually or get it in a pre-processed format, whether from an R package or another source.

Let’s pretend for a second that we didn’t have such a good R package for AFL data. We would go to a site like afltables and manually enter his data into a CSV file to analyse.
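For reference, a hand-entered CSV of Lockett’s first three seasons might look like the text below. As a sketch, base R’s read.csv() can read that text directly through its text argument; in practice you would pass a file path instead. The column names and the variable names lockett_csv and lockett are our own choices:

```r
# First three rows of a hand-entered CSV (same values as the vectors used in this post)
lockett_csv = "Year,GL,GM
1983,19,12
1984,77,20
1985,79,21"

# read.csv() accepts raw text via 'text' instead of a file name
lockett = read.csv(text = lockett_csv)
lockett
```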

We can also do this in R. Instead of entering the data into cells in Excel, we enter each column as a vector:

# Seasons Tony Lockett played (note the gap before his 2002 comeback)
Year = c(1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993,
         1994, 1995, 1996, 1997, 1998, 1999, 2002)
# Goals kicked in each season
GL = c(19, 77, 79, 60, 117, 35, 78, 65, 127, 132, 53, 56, 110,
       121, 37, 109, 82, 3)
# Games played in each season
GM = c(12, 20, 21, 18, 22, 8, 11, 12, 17, 22, 10, 10, 19, 22, 12, 23, 19, 3)

This gives us 3 variables:
* Year – season that Tony Lockett played
* GL – goals kicked by Tony Lockett in that season
* GM – total games played by Tony Lockett in that season
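Because the three vectors are the same length and line up season by season, they can be combined into one table, much like the spreadsheet we would otherwise build in Excel. A minimal sketch using base R’s data.frame() (the name lockett is our own):

```r
# Data from earlier in this post (repeated so this block runs on its own)
Year = c(1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993,
         1994, 1995, 1996, 1997, 1998, 1999, 2002)
GL = c(19, 77, 79, 60, 117, 35, 78, 65, 127, 132, 53, 56, 110,
       121, 37, 109, 82, 3)
GM = c(12, 20, 21, 18, 22, 8, 11, 12, 17, 22, 10, 10, 19, 22, 12, 23, 19, 3)

# Each vector becomes one column; row i holds season i
lockett = data.frame(Year, GL, GM)
head(lockett, 3)  # first three seasons
dim(lockett)      # 18 rows, 3 columns
```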

We can view these just by typing in the variables once we have created them.

Checkpoint 2: Are you able to print the vectors you have created?

Year
##  [1] 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996
## [15] 1997 1998 1999 2002
GL
##  [1]  19  77  79  60 117  35  78  65 127 132  53  56 110 121  37 109  82
## [18]   3
GM
##  [1] 12 20 21 18 22  8 11 12 17 22 10 10 19 22 12 23 19  3
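Since each position in these vectors refers to the same season, it is worth checking that they all have the same length before doing arithmetic with them. The length() function returns the number of elements in a vector:

```r
# Data from earlier in this post (repeated so this block runs on its own)
Year = c(1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993,
         1994, 1995, 1996, 1997, 1998, 1999, 2002)
GL = c(19, 77, 79, 60, 117, 35, 78, 65, 127, 132, 53, 56, 110,
       121, 37, 109, 82, 3)
GM = c(12, 20, 21, 18, 22, 8, 11, 12, 17, 22, 10, 10, 19, 22, 12, 23, 19, 3)

length(Year)  # 18
length(GL)    # 18
length(GM)    # 18

# TRUE only if all three vectors have the same number of elements
length(Year) == length(GL) && length(GL) == length(GM)
```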

Basic arithmetic is done element-wise in R. For example, let’s say we wanted Tony Lockett’s average goals per game in each season, which we’ll store as GL_GM:

GL_GM = GL /GM
GL_GM
##  [1] 1.583333 3.850000 3.761905 3.333333 5.318182 4.375000 7.090909
##  [8] 5.416667 7.470588 6.000000 5.300000 5.600000 5.789474 5.500000
## [15] 3.083333 4.739130 4.315789 1.000000
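Element-wise division gives one ratio per season. To get a single career figure instead, one approach is to add up each vector with sum() and then divide. This is not the same as mean(GL_GM), because a simple mean gives a short season (like the 3-game comeback) the same weight as a full one:

```r
# Data from earlier in this post (repeated so this block runs on its own)
GL = c(19, 77, 79, 60, 117, 35, 78, 65, 127, 132, 53, 56, 110,
       121, 37, 109, 82, 3)
GM = c(12, 20, 21, 18, 22, 8, 11, 12, 17, 22, 10, 10, 19, 22, 12, 23, 19, 3)
GL_GM = GL / GM      # one value per season

sum(GL)              # career goals: 1360
sum(GM)              # career games: 281
sum(GL) / sum(GM)    # career goals per game, about 4.84
mean(GL_GM)          # average of the 18 season ratios (every season weighted equally)
```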

Arithmetic operations involving a scalar (a single number applied to every value) and a vector (like Year) also act element-wise. For example, the command below subtracts 1966 from each element of our Year vector. Because Lockett was born in 1966, this gives us his age in each season of his career.

age = Year - 1966 
age
##  [1] 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 36
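The same element-wise rule applies to any scalar and any operator. For example, since a goal is worth 6 points in AFL scoring, multiplying GL by 6 gives the points Lockett scored from goals in each season (behinds are not in our data, so they are left out). The name goal_points is our own:

```r
# Goals per season from earlier in this post (repeated so this block runs on its own)
GL = c(19, 77, 79, 60, 117, 35, 78, 65, 127, 132, 53, 56, 110,
       121, 37, 109, 82, 3)

goal_points = GL * 6  # the 6 is applied to every season's goal tally
goal_points[1:3]      # 114 462 474
sum(goal_points)      # career points from goals: 8160
```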

Checkpoint 3: Are you able to produce the graph from the code below?

Next, let’s plot Lockett’s goals per game by age:

plot(age, GL_GM, type="l", col="red", xlab="Age", ylab="Goals per game",
     main="Tony Lockett's Average Goals Per Game by Age")

The plot function has a lot of options. To get a feel for them all, simply put a question mark before the function name and R will open its help page!

?plot
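As one example of those options, the sketch below draws points joined by lines (type = "b"), uses solid dots (pch = 19), labels both axes, and then adds a dashed horizontal reference line at Lockett’s career goals per game with abline(). The styling choices are our own; ?plot and ?abline list the alternatives:

```r
# Data from earlier in this post (repeated so this block runs on its own)
Year = c(1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993,
         1994, 1995, 1996, 1997, 1998, 1999, 2002)
GL = c(19, 77, 79, 60, 117, 35, 78, 65, 127, 132, 53, 56, 110,
       121, 37, 109, 82, 3)
GM = c(12, 20, 21, 18, 22, 8, 11, 12, 17, 22, 10, 10, 19, 22, 12, 23, 19, 3)
GL_GM = GL / GM
age = Year - 1966

# Points joined by lines, solid dots, labelled axes
plot(age, GL_GM, type = "b", pch = 19, col = "red",
     xlab = "Age", ylab = "Goals per game",
     main = "Tony Lockett's Average Goals Per Game by Age")

# Dashed horizontal line at career goals per game (total goals / total games)
abline(h = sum(GL) / sum(GM), lty = 2, col = "grey40")
```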

Indexing in R

Let’s say we wanted Tony Lockett’s goals per game for his first 3 seasons. We would get these using square brackets (R counts positions from 1):

GL_GM[1:3]
## [1] 1.583333 3.850000 3.761905
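Square brackets accept any vector of positions, not just a range. Because Year and GL_GM line up element by element, the same position picks out matching values from both. For instance, which.max() returns the position of the largest value, which we can then use to look up the season (the name best is our own):

```r
# Data from earlier in this post (repeated so this block runs on its own)
Year = c(1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993,
         1994, 1995, 1996, 1997, 1998, 1999, 2002)
GL = c(19, 77, 79, 60, 117, 35, 78, 65, 127, 132, 53, 56, 110,
       121, 37, 109, 82, 3)
GM = c(12, 20, 21, 18, 22, 8, 11, 12, 17, 22, 10, 10, 19, 22, 12, 23, 19, 3)
GL_GM = GL / GM

GL_GM[c(1, 5, 9)]        # only the 1st, 5th and 9th seasons
best = which.max(GL_GM)  # position of the highest goals-per-game season
best                     # 9
Year[best]               # 1991
GL_GM[best]              # 7.470588
```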

We can also remove data we don’t want. For example, Tony Lockett retired and then came back, so maybe we don’t want his comeback year (2002, the 18th element) as part of our analysis. We would remove it using a negative index:

GL_GM[-c(18)]
##  [1] 1.583333 3.850000 3.761905 3.333333 5.318182 4.375000 7.090909
##  [8] 5.416667 7.470588 6.000000 5.300000 5.600000 5.789474 5.500000
## [15] 3.083333 4.739130 4.315789

We should compare this to the original GL_GM, which still includes the comeback year:

GL_GM
##  [1] 1.583333 3.850000 3.761905 3.333333 5.318182 4.375000 7.090909
##  [8] 5.416667 7.470588 6.000000 5.300000 5.600000 5.789474 5.500000
## [15] 3.083333 4.739130 4.315789 1.000000
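A negative index can drop several positions at once, and length() saves hard-coding 18 when we want to drop the last element. If we exclude the comeback season, it is safest to drop it from every vector so they stay aligned. A short sketch (the _pre names are our own):

```r
# Data from earlier in this post (repeated so this block runs on its own)
Year = c(1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993,
         1994, 1995, 1996, 1997, 1998, 1999, 2002)
GL = c(19, 77, 79, 60, 117, 35, 78, 65, 127, 132, 53, 56, 110,
       121, 37, 109, 82, 3)
GM = c(12, 20, 21, 18, 22, 8, 11, 12, 17, 22, 10, 10, 19, 22, 12, 23, 19, 3)
GL_GM = GL / GM

# Drop the last element without typing its position
GL_GM[-length(GL_GM)]

# Drop several positions at once, e.g. the first and last seasons
GL_GM[-c(1, 18)]

# Keep the vectors aligned by dropping the same position from each
Year_pre = Year[-18]
GL_pre = GL[-18]
GM_pre = GM[-18]
sum(GL_pre) / sum(GM_pre)  # career goals per game, excluding 2002
```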

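What if we wanted to find the seasons in which Tony Lockett played more than 10 games? We could just type GM>10, which gives a TRUE/FALSE value for each season, and then put that inside square brackets to keep only the matching values: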

GM>10
##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE
## [12] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE
GM[GM>10]
##  [1] 12 20 21 18 22 11 12 17 22 19 22 12 23 19
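A logical vector like GM>10 can index any vector of the same length, so we can pull out the matching seasons too. Conditions can also be combined with & (and) or | (or). As an example, this finds the seasons in which Lockett kicked 100 or more goals:

```r
# Data from earlier in this post (repeated so this block runs on its own)
Year = c(1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993,
         1994, 1995, 1996, 1997, 1998, 1999, 2002)
GL = c(19, 77, 79, 60, 117, 35, 78, 65, 127, 132, 53, 56, 110,
       121, 37, 109, 82, 3)
GM = c(12, 20, 21, 18, 22, 8, 11, 12, 17, 22, 10, 10, 19, 22, 12, 23, 19, 3)

Year[GM > 10]              # seasons with more than 10 games
Year[GL >= 100]            # 1987 1991 1992 1995 1996 1998
sum(GL >= 100)             # TRUE counts as 1, so this counts those seasons: 6
Year[GL >= 100 & GM < 20]  # 100+ goals from fewer than 20 games: 1991 1995
```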

So I gather at this point you are probably thinking, “Hey mate, this isn’t the cool tidyverse stuff I see online.”

Well, that is true, so let’s change tack and move on to using the tidyverse and fitzRoy for cool AFL things.
