Getting started with R

Posted on June 25, 2013 by Cory Lesmeister in R bloggers

[This article was first published on Fear and Loathing in Data Science, and kindly contributed to R-bloggers.]

I wanted to avoid advanced topics in this post and focus on some “blocking and tackling” with R in an effort to get novices started. This is some of the basic code I found useful when I began using R just over 6 weeks ago.

Reading in data from a .csv file is a breeze with this command.

> data = read.csv(file.choose())
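
Note that file.choose() opens a file picker, so the line above only works in an interactive session. As a minimal sketch for scripts, read.csv() also accepts a file path directly. The temporary file and the name mydata below are placeholders made up for illustration, not part of the original post.

```r
# Write a small example file so the sketch is self-contained
# (the path and the name 'mydata' are placeholders, not from the post)
path <- tempfile(fileext = ".csv")
write.csv(data.frame(speed = c(4, 7, 8), dist = c(2, 4, 16)),
          path, row.names = FALSE)

# Read it back by giving read.csv() the path instead of file.choose()
mydata <- read.csv(path)
str(mydata)
```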

You don't need your own data set to get started, since R ships with a number of built-in ones in the datasets package.

> data() #list the datasets available in R

> # load the dataset ‘cars’ and display the variables

> data(cars)

> head(cars)

  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10

#head() shows that we have two variables, car speed and stopping distance, along with the first 6 rows of data

#attach() puts the data frame on the search path, so its columns can be used by name and I can avoid what I feel is the pesky $

> attach(cars)
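
attach() is convenient at the console, but the attached columns are copies: later changes to cars are not reflected in them, and they can mask other objects with the same name. A minimal sketch of alternatives that leave the search path alone, assuming only the built-in cars data:

```r
data(cars)

# Same column, three ways, none of which need attach()
mean(cars$speed)            # the $ operator
mean(cars[["speed"]])       # double brackets with the column name
with(cars, mean(speed))     # with() looks up names inside cars for one call
```

All three calls return the same value, so which one to use is mostly a matter of taste and how many columns a single expression touches.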

# descriptive statistics of our two variables

> summary(cars)

     speed           dist
 Min.   : 4.0   Min.   :  2.00
 1st Qu.:12.0   1st Qu.: 26.00
 Median :15.0   Median : 36.00
 Mean   :15.4   Mean   : 42.98
 3rd Qu.:19.0   3rd Qu.: 56.00
 Max.   :25.0   Max.   :120.00
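
Each number in the summary() table can also be computed on its own, which helps when only one of them is needed. A short sketch for the dist column; the expected values in the comments are taken from the summary() output above.

```r
data(cars)

mean(cars$dist)                              # 42.98
median(cars$dist)                            # 36
range(cars$dist)                             # 2 120
quantile(cars$dist, probs = c(0.25, 0.75))   # 26 and 56
sd(cars$dist)                                # standard deviation, which summary() does not report
```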

> # univariate plots for speed

> plot(speed)

> hist(speed)

> #scatterplot for speed and dist

> plot(speed,dist)

> #side-by-side notched boxplots of speed and dist

> boxplot(speed, dist, notch=TRUE)
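
The default plots are fine for a quick look, but axis labels and a title make them easier to read. A sketch using the standard xlab, ylab and main arguments; the label text is my own wording, with units taken from the cars help page (speed in mph, distance in ft).

```r
data(cars)

# Same scatterplot with readable axis labels and a title
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     main = "cars: speed vs. stopping distance")

# Histogram of speed; 'breaks' is only a suggestion for the number of bins
h <- hist(cars$speed, breaks = 5, xlab = "Speed (mph)", main = "Speed")
```

hist() also returns the bin counts invisibly, so h$counts can be inspected without looking at the picture.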

# you can use [] to create a subset. Here is how to get rows 1 thru 10 of both variables

> subsetcars = cars[1:10, ]

> subsetcars

   speed dist
1      4    2
2      4   10
3      7    4
4      7   22
5      8   16
6      9   10
7     10   18
8     10   26
9     10   34
10    11   17

#rows 1 thru 5 of just speed

> subspeed = cars[1:5, 1]

> subspeed

[1] 4 4 7 7 8

# Observations where stopping distance is greater than 50

> stop = cars[dist > 50, ]

> stop

   speed dist
22    14   60
23    14   80
26    15   54
33    18   56
34    18   76
35    18   84
38    19   68
41    20   52
42    20   56
43    20   64
44    22   66
45    23   54
46    24   70
47    24   92
48    24   93
49    24  120
50    25   85
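
The subsets above pick columns by position and rely on attach() to find dist. As a hedged sketch, the same selections can be written with column names and with subset(); the names long_stops and long_stops2 are made up for illustration.

```r
data(cars)

# Rows 1 to 5 of speed, selecting the column by name instead of position
cars[1:5, "speed"]

# Rows where stopping distance exceeds 50, without relying on attach()
long_stops <- cars[cars$dist > 50, ]

# subset() expresses the same filter and evaluates 'dist' inside cars
long_stops2 <- subset(cars, dist > 50)

nrow(long_stops)   # 17 rows, matching the listing above
```

Selecting by name keeps working if the column order of the data frame ever changes, which positional indexing does not.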

# and finally the correlation

> cor(speed, dist)

[1] 0.8068949
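
By default cor() returns the Pearson correlation, which is the covariance of the two variables divided by the product of their standard deviations. A small sketch that rebuilds the number by hand; x, y and r_manual are illustrative names.

```r
data(cars)
x <- cars$speed
y <- cars$dist

# Pearson correlation: covariance divided by the product of the standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))

r_manual                          # 0.8068949
all.equal(r_manual, cor(x, y))    # TRUE
```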
