Common Data Creation Commands
[This article was first published on Coffee and Econometrics in the Morning, and kindly contributed to R-bloggers].
Here is a video tutorial where I go through some of the most commonly used commands for creating and manipulating data. Whenever I want to do more than run a single regression, I use these commands more than any other set of commands (you may have seen them in some of the other videos).
Here is the code I use in the video if you would like to try out these commands for yourself.
## ------------------------ ##
## Data Creation in R       ##
## Tutorial by Tony Cookson ##
## ------------------------ ##

## Simplest way to create a vector is to use the colon.
a = 1:4
b = 4:1
a
b

## But, they're just sequences.
## c() can be useful for creating more interesting vectors.
y = c(1,2,3,6)   ## basic usage
y2 = c(a,b)      ## binds vectors together into one vector

## More interesting... can build a vector up one element at a time in a loop.
weight = NULL    ## It can be useful to initialize a NULL vector and then append to it
for(i in 1:14){
  temp = rnorm(10, mean = i*5, sd = 2)
  weight = c(weight, mean(temp))
}
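
## A possible alternative (a sketch, not from the video): growing a vector with
## c() inside a loop copies it on every pass. vapply() builds the same kind of
## vector in one call. The name weight2 is made up for this illustration.
weight2 = vapply(1:14, function(i) mean(rnorm(10, mean = i*5, sd = 2)), numeric(1))
length(weight2)  ## 14, one mean per value of i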

## seq() is a more flexible way to create a sequence
seq(1, 4)            ## Same as 1:4
seq(0.5, 4)          ## Default step is 1: 0.5, 1.5, 2.5, 3.5
seq(1, 4, by = 0.5)  ## Can choose the step size
seq(1, 4, by = 0.4)  ## Step doesn't have to divide the interval evenly; stops at 3.8

## It can be useful to repeat a short sequence many times.
## The rep() command is useful for this.
x1 = c(1,2,3,1,2,3,1,2,3,1,2,3)    ## 1:3 repeated 4 times
rep(1:3, times = 4)                ## More efficient to type
x2 = c(1,1,1,1, 2,2,2,2, 3,3,3,3)  ## each element of 1:3 repeated 4 times
c(rep(1,4), rep(2,4), rep(3,4))    ## Can do it this way
rep(1:3, each = 4)                 ## Or use the "each" option (most concise)
rep(1:3, each = 4, times = 2)      ## each and times together: the pattern above, twice

## Can be useful to organize into a matrix
y = rnorm(12, 8, 2)  ## 12 draws from a normal distribution with mean 8 and sd 2
## I could also use c() then give the matrix dimensions
x.vec = c(y, x1, x2)
Xalt = matrix(x.vec, ncol = 3)    ## reshapes the vector into a 12 x 3 matrix, filled column by column
Xalt2 = matrix(x.vec, nrow = 12)  ## Equivalent
Xmess = matrix(x.vec, ncol = 3, byrow = TRUE)   ## byrow = TRUE fills row by row, so y, x1, x2 get scrambled across columns
Xmess = matrix(x.vec, ncol = 12, byrow = TRUE)  ## overwrites Xmess: 3 x 12, with y, x1, x2 as rows
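
## A small check of the fill order (illustrative sketch; the names m.col and
## m.row are made up here). matrix() fills column by column unless byrow = TRUE.
m.col = matrix(1:6, ncol = 3)                ## columns are (1,2), (3,4), (5,6)
m.row = matrix(1:6, ncol = 3, byrow = TRUE)  ## rows are (1,2,3), (4,5,6)
m.col[1, ]  ## 1 3 5
m.row[1, ]  ## 1 2 3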

## If I want x1, x2, and y in a matrix,
## I can use cbind() and rbind()
X = cbind(y, x1, x2)     ## vectors become columns (12 x 3)
Xrow = rbind(y, x1, x2)  ## vectors become rows (3 x 12)

## Maybe I want to use the names, too.
## Then the data.frame() command is useful
Xdf = data.frame(y, x1, x2)
Xdf    ## The data frame
Xdf$y  ## the vector named y...
       ## nice to have variable names
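
To see what cbind(), rbind(), and data.frame() each produce, it can help to check dimensions and names directly. The following is a minimal sketch, not part of the video: it rebuilds the inputs so that it runs on its own, and the set.seed() call and the names X.demo, Xrow.demo, and Xdf.demo are assumptions added for illustration.

```r
set.seed(1)                   ## assumption: fixed seed so reruns give the same draws
y  = rnorm(12, 8, 2)
x1 = rep(1:3, times = 4)
x2 = rep(1:3, each = 4)

X.demo    = cbind(y, x1, x2)  ## vectors become columns
Xrow.demo = rbind(y, x1, x2)  ## vectors become rows
Xdf.demo  = data.frame(y, x1, x2)

dim(X.demo)                   ## 12 3
dim(Xrow.demo)                ## 3 12
colnames(X.demo)              ## "y" "x1" "x2"
names(Xdf.demo)               ## "y" "x1" "x2"
str(Xdf.demo)                 ## one line per column with its type
```

The matrix from cbind() stores everything as one numeric type, while the data frame keeps each column's own type, which str() makes visible.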