Six lines to install and start SparkR on Mac OS X Yosemite

Yuki Katoh

6 years ago

[This article was first published on Opiate for the masses, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I know there are many R users who like to test out SparkR without all the configuration hassle. Just these six lines and you can start SparkR from both RStudio and command line.

One line for Spark and SparkR

Apache Spark is a fast and general-purpose cluster computing system

SparkR is an R package that provides a light-weight frontend to use Apache Spark from R

Six lines to start SparkR

The first three lines should be called in your command line.

brew update # If you don't have homebrew, get it from here (http://brew.sh/)
brew install hadoop # Install Hadoop
brew install apache-spark # Install Spark

You can already start SparkR shell by typing this in your command line;

SparkR

If you like to call it from RStudio, execute the rest in R

spark_path <- strsplit(system("brew info apache-spark",intern=T)[4],' ')[[1]][1] # Get your spark path
.libPaths(c(file.path(spark_path,"libexec", "R", "lib"), .libPaths())) # Navigate to SparkR folder
library(SparkR) # Load the library

That’s all.
Now this should run in your RStudio

sc <- sparkR.init()
sqlContext <- sparkRSQL.init(sc)
df <- createDataFrame(sqlContext, iris) 
head(df)
# Sepal_Length Sepal_Width Petal_Length Petal_Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

Enjoy!

Codes

The full codes are available from here.

Six lines to install and start SparkR on Mac OS X Yosemite was originally published by Kirill Pomogajko at Opiate for the masses on September 21, 2015.

To leave a comment for the author, please follow the link and comment on their blog: Opiate for the masses.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.