Getting started with R
[This article was first published on Fear and Loathing in Data Science, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I wanted to avoid advanced topics in this post and focus on some “blocking and tackling” with R in an effort to get novices started. This is some of the basic code I found useful when I began using R just over 6 weeks ago.
Reading in data from a .csv file is a breeze with this command.
> data = read.csv(file.choose())
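file.choose() opens an interactive file dialog, which is handy at the console but awkward in a script. As a rough sketch, the same function can be given a path directly. The temporary file and its two rows below are made up purely for illustration, and the name mydata is my own choice.

```r
# Hypothetical example file: write a tiny CSV to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c("speed,dist", "4,2", "7,22"), tmp)

# Read it back by path instead of picking it in a dialog
mydata <- read.csv(tmp, header = TRUE)

# Check what was read: column names and types
str(mydata)
```

In real use, the path to your own .csv file would replace tmp.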
You don't need your own data set to follow along, because R ships with built-in example datasets.
> data() #list the datasets available in R
> # load the dataset ‘cars’ and display the variables
> data(cars)
> head(cars)
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10
# head() shows the first 6 rows of data; here we have two variables, car speed and stopping distance
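Besides head(), a few other base R functions give a quick feel for a data frame. This is only a small sketch of the ones I would reach for first, not an exhaustive list.

```r
data(cars)

dim(cars)       # number of rows and columns
names(cars)     # column names
str(cars)       # compact overview: types and first few values
tail(cars, 3)   # last three rows, the counterpart of head()
```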
# attach() adds the data frame to R's search path, so its columns can be referred to by name, which avoids having to use what I feel is the pesky $
> attach(cars)
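One caveat with attach() is that attached column names can collide with other objects of the same name. As a hedged alternative, with() evaluates a single expression inside the data frame without attaching anything. The variable name mean_speed is just my own for illustration.

```r
# Refer to a column by name without attaching the data frame
mean_speed <- with(cars, mean(speed))
mean_speed

# The same value using the $ operator
mean(cars$speed)

# If attach(cars) was used earlier, detach(cars) removes it
# from the search path again once you are done.
```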
# descriptive statistics of our two variables
> summary(cars)
     speed           dist
 Min.   : 4.0   Min.   :  2.00
 1st Qu.:12.0   1st Qu.: 26.00
 Median :15.0   Median : 36.00
 Mean   :15.4   Mean   : 42.98
 3rd Qu.:19.0   3rd Qu.: 56.00
 Max.   :25.0   Max.   :120.00
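summary() bundles several statistics together. If you only want one of them, or something it leaves out such as the standard deviation, the individual functions can be called directly. A minimal sketch, with variable names of my own choosing:

```r
# Individual descriptive statistics for the cars data
speed_sd       <- sd(cars$speed)       # standard deviation, not shown by summary()
dist_median    <- median(cars$dist)
dist_quartiles <- quantile(cars$dist, probs = c(0.25, 0.75))

speed_sd
dist_median
dist_quartiles
```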
> # scatterplot of speed and dist
> plot(speed, dist)
> # univariate plots: notched boxplots of speed and dist
> boxplot(speed, dist, notch=TRUE)
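For a univariate look at speed on its own, a histogram is one more option. This is only a sketch: the title and axis label are my own additions, and the mph unit comes from R's help page for the cars dataset.

```r
# Compute the histogram without drawing it, to inspect the bins
h <- hist(cars$speed, plot = FALSE)
h$breaks   # bin edges chosen by R
h$counts   # number of observations in each bin

# Draw it with a title and an axis label
hist(cars$speed, main = "Car speed", xlab = "Speed (mph)")
```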
# you can use [] to create a subset. Here is how to get rows 1 thru 10 of both variables
> subsetcars = cars[1:10, ]
> subsetcars
   speed dist
1      4    2
2      4   10
3      7    4
4      7   22
5      8   16
6      9   10
7     10   18
8     10   26
9     10   34
10    11   17
#rows 1 thru 5 of just speed
> subspeed = cars[1:5, 1]
> subspeed
[1] 4 4 7 7 8
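# Observations where stopping distance is greater than 50
# (named longstop rather than stop, so the object does not share a name with R's built-in stop() function)
> longstop = cars[dist > 50, ]
> longstop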
   speed dist
22    14   60
23    14   80
26    15   54
33    18   56
34    18   76
35    18   84
38    19   68
41    20   52
42    20   56
43    20   64
44    22   66
45    23   54
46    24   70
47    24   92
48    24   93
49    24  120
50    25   85
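The same kind of filter can also be written with subset(), which looks up column names inside the data frame, so it works whether or not cars is attached. A small sketch, with object names that are mine:

```r
# Rows where stopping distance is greater than 50
long_stops <- subset(cars, dist > 50)
nrow(long_stops)

# Combine conditions with & and keep only the dist column
fast_long <- subset(cars, dist > 50 & speed >= 20, select = dist)
nrow(fast_long)
```

For quick interactive work this reads nicely; the [ ] form shown earlier is the more general tool.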
# and finally the correlation
> cor(speed, dist)
[1] 0.8068949
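By default cor() returns the Pearson correlation, which is the covariance of the two variables divided by the product of their standard deviations. As a quick sanity check, here is a sketch that computes it both ways. The variable names are my own.

```r
# Pearson correlation from the built-in function
r_builtin <- cor(cars$speed, cars$dist)

# The same quantity from its definition: cov(x, y) / (sd(x) * sd(y))
r_manual <- cov(cars$speed, cars$dist) / (sd(cars$speed) * sd(cars$dist))

all.equal(r_builtin, r_manual)

# cor.test() adds a significance test and a confidence interval
cor.test(cars$speed, cars$dist)
```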
To leave a comment for the author, please follow the link and comment on their blog: Fear and Loathing in Data Science.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.