Stata or R

[This article was first published on Enterprise Software Doesn't Have to Suck, and kindly contributed to R-bloggers.]

Recently I came across a complex model built in Access, with complicated SQL queries scattered throughout. The engineer who maintains it and I analyzed it and agreed that the model was using SQL for things SQL isn't good at: complex logic, formatting, and so on.

We agreed to rebuild the model using SQL together with a more powerful programming language. The engineer is familiar with Stata, so he quickly wrote the logic in Stata. When I looked at his code, it seemed fairly easy to reproduce in R, so I've listed R equivalents for the Stata commands I found in it.

What are the advantages of using Stata? Why shouldn’t I use R for this?

# 1) Stata command:
use growth_and_uptake_assumptions.dta
# Corresponding R command:
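# load() only reads .RData files; a .dta file needs a Stata reader such as haven::read_dta()
library(haven)
my_data <- read_dta("growth_and_uptake_assumptions.dta")
attach(my_data) # optional: lets you refer to columns by bare name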
# ------------------------------------------------
# 2) Stata command:
sort year
# Corresponding R command:
my_order <- order(my_data$year) # row indices that put my_data in ascending order of year
my_data <- my_data[my_order, ]  # reorder the rows, as Stata's sort does in place
# ------------------------------------------------
# 3) Stata command:
keep year *_trend
# Corresponding R command:
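trend_cols <- grep("_trend$", names(my_data), value = TRUE) # columns matching *_trend
my_new_data <- my_data[, c("year", trend_cols)] # keep year plus the trend columns, by name rather than position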
# ------------------------------------------------
# 4) Stata command:
gen id = 1
# Corresponding R command:
my_data$id <- 1 # new column in the data frame; the 1 is recycled to every row
# ------------------------------------------------
# 5) Stata command:
reshape wide *_trend, i(id) j(year)
# Corresponding R command:
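# base R reshape(); assumes my_data has the id column from step 4
# sep = "" gives Stata-style wide names such as x_trend2001
trend_cols <- grep("_trend$", names(my_data), value = TRUE)
my_wide <- reshape(my_data[, c("id", "year", trend_cols)], direction = "wide",
                   idvar = "id", timevar = "year", v.names = trend_cols, sep = "")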
# ------------------------------------------------
# 6) Stata command:
merge category year using dumping_rate_append.dta
# Corresponding R command:
# second data is loaded as my_data1; all = TRUE keeps unmatched rows from both sides, as this Stata merge does
my_data2 <- merge(my_data, my_data1, by = c("category", "year"), all = TRUE)
# ------------------------------------------------
# 7) Stata command:
replace rate = 0 if category== "D" | category == "E"
# Corresponding R command:
my_data$rate[my_data$category %in% c("D", "E")] <- 0 # assign into the data frame itself, not an attached copy
# ------------------------------------------------
# 8) Stata command:
insheet using $dirprm/prmBasetrend_ByCvT_ByYear.txt, tab
# Corresponding R command:
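# dirprm is assumed to be an R character variable holding the same directory as the Stata global $dirprm
my_data <- read.delim(file.path(dirprm, "prmBasetrend_ByCvT_ByYear.txt"), header = TRUE)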
# ------------------------------------------------
# 9) Stata command:
clear
# Corresponding R command:
rm(list = ls())
# ------------------------------------------------
# 10) Stata command:
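set memory 3000m
# (probably the closest match; it applies to Stata 11 and earlier, since later versions manage memory automatically)
# Corresponding R command:
memory.limit(size=3000) # set the memory limit in MB; Windows only, and no longer supported as of R 4.2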
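
To see how the data-manipulation steps fit together, here is a minimal sketch that runs the sort, keep, gen, reshape, merge and replace steps on small made-up data frames. The column names growth_trend, uptake_trend, notes and volume, and all the values, are hypothetical stand-ins for the real files, which I can't share. It uses base R only, so the reshape step relies on stats::reshape() rather than the reshape package.

```r
# Toy stand-in for growth_and_uptake_assumptions.dta (hypothetical values).
my_data <- data.frame(
  year         = c(2003, 2001, 2002),
  growth_trend = c(0.30, 0.10, 0.20),
  uptake_trend = c(0.70, 0.50, 0.60),
  notes        = c("c", "a", "b"),
  stringsAsFactors = FALSE
)

# sort year
my_data <- my_data[order(my_data$year), ]

# keep year *_trend
trend_cols <- grep("_trend$", names(my_data), value = TRUE)
my_data <- my_data[, c("year", trend_cols)]

# gen id = 1
my_data$id <- 1

# reshape wide *_trend, i(id) j(year)
# sep = "" yields Stata-style names such as growth_trend2001.
my_wide <- reshape(my_data, direction = "wide",
                   idvar = "id", timevar = "year",
                   v.names = trend_cols, sep = "")
print(names(my_wide))

# Toy stand-ins for the master and using data of the merge step (hypothetical).
rates <- data.frame(
  category = c("A", "D", "E"),
  year     = c(2001, 2001, 2002),
  volume   = c(10, 20, 30),
  stringsAsFactors = FALSE
)
dumping <- data.frame(
  category = c("A", "D", "F"),
  year     = c(2001, 2001, 2002),
  rate     = c(0.2, 0.4, 0.6),
  stringsAsFactors = FALSE
)

# merge category year using ...
# all = TRUE keeps rows that appear in only one of the two data frames.
merged <- merge(rates, dumping, by = c("category", "year"), all = TRUE)

# replace rate = 0 if category == "D" | category == "E"
merged$rate[merged$category %in% c("D", "E")] <- 0
print(merged)
```

Every step assigns its result back into a data frame, which is the main habit to pick up when coming from Stata: Stata commands modify the one dataset in memory, while R expressions return new objects that are lost unless you store them.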