Releasing and open-sourcing the Using Spark from R for performance with arbitrary code series

[This article was first published on Jozef's Rblog, and kindly contributed to R-bloggers.]

Introduction

Over the past months, we published and refined a series of posts on Using Spark from R for performance with arbitrary code. As the posts grew in size and scope, blog posts were no longer the best medium for sharing the content in the way most useful to readers, so we decided to compile a publication instead and open-source it for all readers to use freely.

In this post, we present Using Spark from R for performance, an open-source online publication that will serve as the home for the current and future installments of the series. It comes with instructions on how to use it and a Docker image with all the prerequisites needed to run the code examples.

Contents

  1. Who is this book for?
  2. What are the main topics currently covered?
  3. Are the sources also available?
  4. Where can issues be raised?
  5. Acknowledgments and thank yous

Who is this book for?

The book is published at sparkfromr.com. It is aimed at users who want practical insights into using the sparklyr interface to gain the benefits of Apache Spark while retaining the ability to use R code organized in custom-built functions and packages. It explores the different interfaces available for communication between R and Spark using the sparklyr package.

We have also created a Docker image that lets you run the code in the book without having to set up the necessary software requirements, such as Java, Spark, and the required R packages. A guide to using the book with that image is included as a separate chapter.

What are the main topics currently covered?

The main topics are summarized in the following chapters:

Are the sources also available?

Yes. The content is rendered and published automatically from publicly accessible git repositories. You can find the

All contributions to the above are of course most welcome.

Where can issues be raised?

If you find any errors or other issues with the book, or simply have requests for improvements or more content or features, the ideal place to raise them is directly in the GitHub repositories:

Acknowledgments and thank yous

Creating this book would not have been possible without many openly available resources such as the

My thanks go to the creators and maintainers of all these amazing open-source tools.

Logos of bookdown, Apache Spark and R

Happy reading!
