Stata's egen command computes a summary statistic by group and stores it in a new variable. For example, suppose the data has three variables, id, time and y, and we want to compute the mean of y for each id and store it as a new variable mean_y.
In Stata, the command would be
egen mean_y = mean(y), by(id)
In R, this task can be done with ave().
Generate the dataset:
id <- rep(1:3, each = 3)
t <- rep(1:3, 3)
y <- sample(1:5, 9, replace = TRUE)
my_data <- data.frame(id = id, time = t, y = y)
Original data:
> my_data
  id time y
1  1    1 4
2  1    2 1
3  1    3 4
4  2    1 2
5  2    2 3
6  2    3 3
7  3    1 4
8  3    2 4
9  3    3 3
> within(my_data, {mean_y = ave(y,id)} )
  id time y   mean_y
1  1    1 4 3.000000
2  1    2 1 3.000000
3  1    3 4 3.000000
4  2    1 2 2.666667
5  2    2 3 2.666667
6  2    3 3 2.666667
7  3    1 4 3.666667
8  3    2 4 3.666667
9  3    3 3 3.666667
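As a quick sanity check, the per-group means can also be computed with tapply() and compared against what ave() returns. This is only a minimal sketch: the data frame is rebuilt with the y values printed above (fixed rather than sampled) so the result is reproducible, and the name group_means is introduced here for illustration.

```r
# Rebuild the example data with the y values printed above (fixed, not sampled)
my_data <- data.frame(id   = rep(1:3, each = 3),
                      time = rep(1:3, times = 3),
                      y    = c(4, 1, 4, 2, 3, 3, 4, 4, 3))

# ave() returns a vector as long as y, with each group's mean repeated per row
my_data$mean_y <- ave(my_data$y, my_data$id)

# tapply() returns one mean per id instead of one value per row
group_means <- tapply(my_data$y, my_data$id, mean)

# Look up each row's group mean by id and compare with the ave() result
expanded <- as.numeric(group_means[as.character(my_data$id)])
print(all.equal(my_data$mean_y, expanded))
```

The difference is the shape of the result: ave() keeps one value per row, which is what makes it a natural counterpart of egen, while tapply() collapses to one value per group.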
The default summary statistic is the mean. However, we can supply a different function through the FUN argument. FUN must be passed by name, because unnamed arguments after y are treated as grouping variables. For example, to compute the standard deviation of y by id, we can use
> within(my_data, {sd_y = ave(y,id,FUN=sd)} )
  id time y      sd_y
1  1    1 4 1.7320508
2  1    2 1 1.7320508
3  1    3 4 1.7320508
4  2    1 2 0.5773503
5  2    2 3 0.5773503
6  2    3 3 0.5773503
7  3    1 4 0.5773503
8  3    2 4 0.5773503
9  3    3 3 0.5773503
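FUN does not have to be a built-in function. The sketch below passes an anonymous function so that missing values are skipped within each group. The small vectors y_na and id_na are hypothetical data made up for this illustration; they are not part of the example above.

```r
# Hypothetical data with one missing value in the first group
id_na <- rep(1:2, each = 3)
y_na  <- c(4, NA, 4, 2, 3, 3)

# With the default FUN = mean, a single NA makes the whole group's result NA
plain <- ave(y_na, id_na)

# An anonymous function lets us pass extra arguments such as na.rm = TRUE
robust <- ave(y_na, id_na, FUN = function(v) mean(v, na.rm = TRUE))

print(data.frame(id = id_na, y = y_na, plain = plain, robust = robust))
```

Any function that returns either a single value or a vector as long as its input should work here, since ave() writes the result back into each group's positions.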
Remark: within() evaluates an expression in an environment created from the data.frame and returns a modified copy of it (in our case, with the new variable mean_y or sd_y). The original my_data is left unchanged unless the result is assigned back to it.
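Note that within() does not change my_data in place, which is why the sd_y output above has no mean_y column. To keep the new variables, the result has to be assigned. A minimal sketch, again using the fixed y values printed earlier; the names original_names and result are introduced here for illustration.

```r
# Rebuild the example data with the y values printed above (fixed, not sampled)
my_data <- data.frame(id   = rep(1:3, each = 3),
                      time = rep(1:3, times = 3),
                      y    = c(4, 1, 4, 2, 3, 3, 4, 4, 3))

# Calling within() and storing the result elsewhere leaves my_data untouched
result <- within(my_data, { mean_y <- ave(y, id) })
original_names <- names(my_data)
print(original_names)
print(names(result))

# Assign back to my_data to keep the new variables; several can be created at once
my_data <- within(my_data, {
  mean_y <- ave(y, id)
  sd_y   <- ave(y, id, FUN = sd)
})
print(names(my_data))
```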
