Automating and downloading Google Chrome images with Selenium

[This article was first published on R Blogs – Hutsons-hacks, and kindly contributed to R-bloggers.]

I love Nottingham Forest and have been looking for a way to include them in one of my tutorials, as they are in the play-offs for promotion to the top flight. This tutorial shows you how to automate Google Chrome with Selenium and download images from a Google image search.

Can I get a live tutorial?

The live tutorial is here:

Creating the bones of the project

The first stage would be to define the dependencies:

Downloading the web driver

The next step is to download the ChromeDriver for your Google Chrome. Each ChromeDriver release targets a specific Chrome version, so pick the one that matches the version of Chrome you have installed. To download the driver see here: https://chromedriver.chromium.org/downloads.

The next step is to install the relevant packages. I have made this simple for you with a requirements.txt in the associated repository of this project, which you can install with pip install -r requirements.txt.

Set the web driver path

Then we need to set the web driver path to wherever you have stored ChromeDriver on your local machine or server:
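As a minimal sketch of this step, assuming Selenium 4 is installed, the snippet below checks that the driver file exists before handing it to Chrome. The names resolve_driver_path and make_driver, and the example path in the usage note, are my own illustrations rather than the code from the repository.

```python
from pathlib import Path


def resolve_driver_path(path):
    """Return the ChromeDriver path as a Path, failing early if it is missing."""
    driver_path = Path(path).expanduser()
    if not driver_path.is_file():
        raise FileNotFoundError(f"ChromeDriver not found at {driver_path}")
    return driver_path


def make_driver(path):
    """Start Chrome through Selenium using the driver stored at `path`."""
    # Imported here so the path check above works even without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    service = Service(executable_path=str(resolve_driver_path(path)))
    return webdriver.Chrome(service=service)
```

You would then call something like make_driver("~/drivers/chromedriver"), swapping in the location on your own machine, and call quit() on the returned driver once you are finished.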

Create the get images function to list URLs

The function uses JavaScript to scroll down the results page and then clicks on each thumbnail. Once it has clicked a thumbnail, it reads the URL of the full-size image and adds it to a set, which starts out empty:
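Here is a rough sketch of how such a function could look, not necessarily the version in the repository. The two CSS selectors are placeholders I have made up: Google changes its class names regularly, so inspect the page in Chrome DevTools and substitute the current ones. The helper is_image_url and the delay parameter are also my own assumptions.

```python
import time

# Hypothetical selectors: replace with the class names you find in DevTools.
THUMBNAIL_SELECTOR = "img.thumbnail-class"    # placeholder, not a real class
FULL_IMAGE_SELECTOR = "img.full-image-class"  # placeholder, not a real class


def is_image_url(src):
    """Keep only http(s) links; skip missing values and inline base64 data."""
    return bool(src) and src.startswith(("http://", "https://"))


def get_image_urls(driver, search_url, max_images, delay=1.0):
    """Scroll the results page, click thumbnails and collect image URLs."""
    from selenium.webdriver.common.by import By  # needs Selenium installed

    driver.get(search_url)
    image_urls = set()
    clicked = 0  # how many thumbnails we have already tried
    while len(image_urls) < max_images:
        # Scroll to the bottom with JavaScript so more thumbnails load.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(delay)
        thumbnails = driver.find_elements(By.CSS_SELECTOR, THUMBNAIL_SELECTOR)
        if len(thumbnails) <= clicked:
            break  # nothing new appeared, so stop rather than loop forever
        for thumbnail in thumbnails[clicked:]:
            clicked += 1
            try:
                thumbnail.click()
                time.sleep(delay)
            except Exception:
                continue  # some thumbnails cannot be clicked; skip them
            for image in driver.find_elements(By.CSS_SELECTOR, FULL_IMAGE_SELECTOR):
                src = image.get_attribute("src")
                if is_image_url(src):
                    image_urls.add(src)
            if len(image_urls) >= max_images:
                break
    return image_urls
```

Using a set means the same URL is never stored twice. The result can hold slightly more than max_images entries if one click reveals several images at once.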

This function:

Create the download image routine

Once we have a set of all the image URLs, we will want to store the images in a directory on our machine. That is exactly what the next routine does:
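A minimal sketch of a download routine is below. To keep it self-contained I have used urllib from the standard library, which is my assumption and may differ from the packages listed in requirements.txt. The function name, the User-Agent header and the timeout are also illustrative choices.

```python
from pathlib import Path
from urllib.request import Request, urlopen


def download_image(url, download_dir, file_name, timeout=10):
    """Save the image at `url` as download_dir/file_name.

    Returns the saved path, or None if the download failed.
    """
    target_dir = Path(download_dir)
    target_dir.mkdir(parents=True, exist_ok=True)  # create the label folder
    target = target_dir / file_name
    try:
        # Some servers reject requests that have no User-Agent header.
        request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urlopen(request, timeout=timeout) as response:
            target.write_bytes(response.read())
    except Exception as error:
        # One bad link should not stop the whole run.
        print(f"Failed to download {url}: {error}")
        return None
    return target
```

Catching the exception and returning None lets the caller carry on with the remaining URLs, which matters when you are scraping a long list of links of unknown quality.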

The download image routine does the following:

Working with our functions

Next, we need a Google image search link with a query string. It should look like this: https://www.google.com/search?q=brennan+johnson&tbm=isch&ved=2ahUKEwi-i4WJ9Ov3AhVE0RoKHc47BZoQ2-cCegQIABAA&oq=brennan+johnson&gs_lcp=CgNpbWcQAzIKCAAQsQMQgwEQQzILCAAQgAQQsQMQgwEyCwgAEIAEELEDEIMBMgUIABCABDIFCAAQgAQyBQgAEIAEMgUIABCABDIFCAAQgAQyBQgAEIAEMgUIABCABFDMBljMBmDnCGgAcAB4AIAB_gGIAc4DkgEDMi0ymAEAoAEBqgELZ3dzLXdpei1pbWfAAQE&sclient=img&ei=82WGYv7FCMSia873lNAJ&bih=858&biw=1745&rlz=1C5CHFA_enGB980GB980. We then associate each link with a label, which becomes the name of the directory to create:
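If you would rather not paste long links by hand, one possible shortcut is to build them from the search term. This sketch assumes that only the q and tbm=isch parameters from the link above are needed and that the remaining parameters are optional tracking values. The helper build_search_url and the cats entry are my own additions for illustration.

```python
from urllib.parse import urlencode


def build_search_url(query):
    """Build a Google image search link for `query` (assumed minimal form)."""
    return "https://www.google.com/search?" + urlencode({"q": query, "tbm": "isch"})


# Two parallel lists: links[i] is scraped into the folder named labels[i].
queries = ["brennan johnson", "cats"]
links = [build_search_url(query) for query in queries]
labels = [query.replace(" ", "_") for query in queries]

for link, label in zip(links, labels):
    print(label, "->", link)
```

Keeping the links and labels in two lists of the same length makes it easy to pair them up later with zip.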

Looping through each URL and label

The final step handles the lists of links and labels defined above: it loops over each link and label pair and downloads the images for each label in turn:
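The loop can be sketched roughly as follows. To keep the example self-contained, the two worker functions are passed in as arguments: get_urls and download are stand-ins for whatever functions you wrote to list the image URLs and save a single image. The name download_all, the images folder and the numbered .jpg file names are assumptions of mine.

```python
def download_all(links, labels, get_urls, download,
                 download_dir="images", max_images=5):
    """Scrape each link into its own label folder; return saved counts."""
    if len(links) != len(labels):
        raise ValueError("links and labels must have the same length")
    saved = {}
    for link, label in zip(links, labels):
        # get_urls(link, max_images) should return a collection of image URLs.
        urls = get_urls(link, max_images)
        label_dir = f"{download_dir}/{label}"
        saved[label] = 0
        # Sorting gives a stable order, so file numbering is repeatable.
        for index, url in enumerate(sorted(urls)):
            # download(...) should return None when an image could not be saved.
            if download(url, label_dir, f"{index}.jpg") is not None:
                saved[label] += 1
    return saved
```

If your URL function also needs the Selenium driver, you can wrap it in a small lambda that fixes the driver argument before passing it in. The returned dictionary gives a quick count of how many images landed in each folder.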

Step by step:

Now you should have some lovely images, in my case of Nottingham Forest players, but you could substitute this for images of cats.

Final note

I do hope you have enjoyed this whistle-stop tour into the power of Selenium for web automation.

These are the examples of the images I scraped for my Deep Learning project to predict players based off their images.

The repository that supports this tutorial contains the full working code.
