Common Data Creation Commands

[This article was first published on Coffee and Econometrics in the Morning, and kindly contributed to R-bloggers].

Here is a video tutorial where I go through some of the most commonly used commands for creating and manipulating data. Whenever I want to do more than run a single regression, I use these commands more than any other set (you may have seen some of them in my other videos).



Here is the code I use in the video if you would like to try out these commands for yourself.

## ------------------------ ##
## Data Creation in R ##
## Tutorial by Tony Cookson ##
## ------------------------ ##
## The simplest way to create a vector is with the colon operator.
a = 1:4
b = 4:1
a
b
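## A short sketch of two colon-operator details: it returns an integer
## vector, and it binds more tightly than arithmetic, so 1:n-1 means
## (1:n) - 1, not 1:(n-1). The value n = 3 is arbitrary.
n = 3
class(1:n)  ## "integer"
1:n - 1     ## 0 1 2, i.e. (1:n) - 1
1:(n - 1)   ## 1 2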
## But these are just sequences.
## c() is useful for creating more interesting
## vectors
y = c(1,2,3,6) ## basic usage
y2 = c(a,b) ## binds vectors together into one vector
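## A sketch of two more things c() does: it coerces everything to one
## common type, and it can attach names as it builds the vector.
## The values below are arbitrary examples.
mixed = c(1, TRUE, FALSE)   ## logicals become numbers: 1 1 0
mixed.chr = c(1, "a")       ## one string makes it all character: "1" "a"
named = c(low = 1, high = 9)
named["high"]               ## look up an element by its name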
## More interesting: you can build a vector
## iteratively, one element at a time
weight = NULL ## Initializing an empty (NULL) vector lets you grow it inside a loop
for(i in 1:14){
  temp = rnorm(10, mean = i*5, sd = 2)  ## 10 draws centered at 5*i
  weight = c(weight, mean(temp))        ## append this sample's mean
}
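## A sketch of an alternative to growing a vector with c(), which copies
## it on every pass: preallocate with numeric() and fill by index, or let
## sapply() build it in one call. set.seed() is added here only so the
## two versions draw the same random numbers.
set.seed(1)
weight.pre = numeric(14)             ## 14 zeros, filled in below
for(i in 1:14){
  weight.pre[i] = mean(rnorm(10, mean = i*5, sd = 2))
}
set.seed(1)
weight.sap = sapply(1:14, function(i) mean(rnorm(10, mean = i*5, sd = 2)))
all.equal(weight.pre, weight.sap)    ## TRUE: same draws, same means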
## seq() is a more flexible way to create a sequence
seq(1, 4) ## Same as 1:4
seq(0.5, 4) ## Default step is 1: 0.5 1.5 2.5 3.5
seq(1, 4, by = 0.5) ## Can use a different step size
seq(1, 4, by = 0.4) ## The step doesn't have to divide the interval evenly; stops at 3.8
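## A sketch for when you know how many points you want rather than the
## step size: use length.out. seq_len(n) is also a safer loop index than
## 1:n, because 1:0 counts down to c(1, 0) instead of being empty.
seq(1, 4, length.out = 7)  ## 1.0 1.5 2.0 2.5 3.0 3.5 4.0
seq_len(0)                 ## integer(0): a loop over it runs zero times
1:0                        ## 1 0: a loop over it runs twice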
## It can be useful to repeat a short sequence many times
## The rep() command is useful for this.
x1 = c(1,2,3,1,2,3,1,2,3,1,2,3) ## 1:3 repeated 4 times
rep(1:3, times = 4) ## Same pattern, much less typing
x2 = c(1,1,1,1, 2,2,2,2, 3,3,3,3) ## each element of 1:3 repeated 4 times
c(rep(1,4), rep(2,4), rep(3,4)) ## Can do it this way
rep(1:3, each = 4) ## Or use the "each" option (most concise)
rep(1:3, each = 4, times = 2) ## Combine them: the "each" pattern, repeated twice
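## A sketch of two more rep() options: times can be a vector giving a
## separate count for each element, and length.out recycles the pattern
## until the requested length is reached. The inputs are arbitrary.
reps.ab = rep(c("a", "b"), times = c(2, 3))  ## "a" "a" "b" "b" "b"
reps.len = rep(1:3, length.out = 7)          ## 1 2 3 1 2 3 1
reps.ab
reps.len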
## It can be useful to organize vectors into a matrix
y = rnorm(12, 8, 2) ## 12 random draws from a normal distribution with mean 8 and sd 2
## One way: stack the vectors with c(), then give matrix() the dimensions
x.vec = c(y, x1, x2)
Xalt = matrix(x.vec, ncol = 3) ## fills the matrix column by column (12 x 3)
Xalt2 = matrix(x.vec, nrow = 12) ## Equivalent
Xmess = matrix(x.vec, ncol = 3, byrow = TRUE) ## byrow = TRUE fills row by row, which scrambles the variables here
Xmess = matrix(x.vec, ncol = 12, byrow = TRUE) ## With 12 columns, byrow = TRUE puts one variable in each row (3 x 12)
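## A sketch showing that filling 3 columns column by column and filling
## 12 columns row by row give transposes of each other. The vector v is
## an arbitrary stand-in for x.vec (any 36 values would do).
v = 1:36
by.col = matrix(v, ncol = 3)                 ## 12 x 3
by.row = matrix(v, ncol = 12, byrow = TRUE)  ## 3 x 12
dim(by.col)                                  ## 12 3
identical(t(by.col), by.row)                 ## TRUE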
## If I want x1, x2, and y in a matrix,
## I can use cbind() and rbind()
X = cbind(y, x1, x2)    ## 12 x 3, one column per variable
Xrow = rbind(y, x1, x2) ## 3 x 12, one row per variable
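## A sketch showing that cbind() keeps the argument names as column
## names, rbind() keeps them as row names, and the two results are
## transposes of each other. The short vectors price and qty are made up.
price = c(2.5, 3.1, 4.0)
qty = 1:3
PQ = cbind(price, qty)      ## 3 x 2, columns named "price" and "qty"
PQrow = rbind(price, qty)   ## 2 x 3, rows named "price" and "qty"
colnames(PQ)
identical(t(PQ), PQrow)     ## TRUE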
## Maybe I want to use the names, too.
## Then the data.frame() command is useful
Xdf = data.frame(y, x1, x2)
Xdf ## Prints the data frame
Xdf$y ## the vector named y
## nice to have variable names
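## A sketch of a few common ways to work with a data frame once its
## columns have names. The small data set (id, wage) is made up.
wages = data.frame(id = 1:4, wage = c(12, 15, 9, 20))
wages[["wage"]]                  ## same as wages$wage
high = wages[wages$wage > 10, ]  ## rows where wage exceeds 10
wages$log.wage = log(wages$wage) ## add a new column
nrow(wages); ncol(wages)         ## 4 rows, 3 columns
str(wages)                       ## compact summary of each column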

