RvsPython #3: Setting up Selenium (Limitations with the RSelenium package; getting past them)

[This article was first published on r – bensstats, and kindly contributed to R-bloggers.]

Selenium is a powerful browser-automation library available for both Python and R (the R version is called RSelenium). It can automate tasks such as form filling, job applications, CRM system administration and much more. That said, Selenium can also be used to do a lot of harm, such as filling forms with fake answers or building bots that generate fake views on YouTube.

With this in mind, I can only think of what Peter Parker was told by Uncle Ben:

Make the right choice, choose good over evil.

This blog post is about how setting up Selenium on R and Python went for me. If you can relate to this or have any insight, please leave a comment below!

Setting up Selenium on Python

Learning how to use Selenium in Python took me about 10 minutes. All I needed to do was download chromedriver and install the selenium package (pip install selenium), and I was ready to start working with it.

I was even able to do some form automation with it:
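As a rough illustration (this is not the script from my original run), a form-filling helper might look like the sketch below. It assumes the form's inputs can be located by their name attribute and that the submit button is named "submit"; the function names, URL and field names are hypothetical.

```python
# Minimal form-filling sketch. The URL, field names, and values used with it
# are hypothetical placeholders, not details from the original post.

BY_NAME = "name"  # same string as selenium.webdriver.common.by.By.NAME


def fill_form(driver, url, values, submit_name="submit"):
    """Open `url`, type each value into the input whose `name` attribute
    matches its key, then click the element named `submit_name`."""
    driver.get(url)
    for field_name, text in values.items():
        box = driver.find_element(BY_NAME, field_name)
        box.clear()           # remove any pre-filled text
        box.send_keys(text)   # type the new value
    driver.find_element(BY_NAME, submit_name).click()


def fill_form_in_chrome(url, values):
    """Start Chrome, fill the form, and always close the browser afterwards."""
    # Imported here so fill_form() can be loaded and tried with a stand-in
    # driver; needs `pip install selenium` and a chromedriver Selenium can find.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        fill_form(driver, url, values)
    finally:
        driver.quit()
```

Calling fill_form_in_chrome("https://example.com/contact", {"email": "me@example.com"}) would open that (made-up) page in Chrome, fill the email box and submit. Keeping the browser start-up separate from fill_form makes the form logic easy to try against a stand-in driver first.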

My experience with RSelenium

The official documentation recommends running RSelenium with Docker.

Coming from Python and wanting to do this in R presented an inconvenience for me, as my main machine does not support virtualization, which means I cannot even install Docker on the machine I have been working on.

This left me with no other choice but to use Selenium strictly in Python.

While ChromeDriver recommends running it on a VM, that is not a requirement, and I was able to use it in Python. In my experience, RSelenium was impossible to use without Docker or something similar, which is disappointing, as I wanted to see how RSelenium matched up.

Getting past the limitations

If you are really set on using Selenium in an R workflow (maybe because you need to do some data wrangling or want to use tidyverse as part of your project, etc.), I would recommend writing the script in Python and executing it from R with the reticulate package, with something like:

reticulate::py_run_file("path_to_python_file")

...

...


(Rest of your R Code)
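For the Python side, here is a sketch of what the file passed to py_run_file might contain. It is only an illustration under assumptions: the function names and the choice to collect links are hypothetical, and the result is returned as a plain dict of equal-length lists, which I would expect reticulate to hand to R as a named list.

```python
# Sketch of a Python file meant to be passed to reticulate::py_run_file().
# The function names and the link-collecting task are hypothetical.

BY_CSS = "css selector"  # same string as selenium's By.CSS_SELECTOR


def collect_link_table(driver, url, selector="a"):
    """Visit `url` and return the matching links as a dict of equal-length
    columns, a shape that is easy to turn into a data frame later."""
    driver.get(url)
    table = {"text": [], "href": []}
    for link in driver.find_elements(BY_CSS, selector):
        table["text"].append(link.text)
        table["href"].append(link.get_attribute("href"))
    return table


def run(url):
    """Start Chrome, collect the links on `url`, and always close the browser."""
    # Imported here so running the file only defines functions and does not
    # open a browser until run() is called.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        return collect_link_table(driver, url)
    finally:
        driver.quit()
```

Because the file only defines functions, running it from R does not launch anything by itself. After py_run_file, the functions should be reachable from R through reticulate's py object (for example py$run), and the returned columns can go straight into the rest of your R code.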


Let me reiterate: you can learn how to use Selenium in Python in around 10 minutes, so learning it takes no more effort than hunting for an RSelenium workaround, and the result integrates with your R code thanks to the reticulate package.

So, as things look now, and unless something changes, my Selenium work will have to be written in Python.

Conclusion

This post was originally going to compare the use and speed of Selenium in R and Python, but the inability to install Docker on my computer left me unable to use the RSelenium package.

I’m sure I am not the only one who faced this challenge, so I thought I would share my thoughts about how to get around it.

If you have a better solution, please feel free to share it with me, as I would like to do a comparison between Python and R using Selenium!
