intGus

Question about Selenium in multiprocess

Sun Apr 26, 2020 12:56 pm

Hello everyone,

I have this code that takes search strings from a CSV file, runs a Google image search for each one using Selenium, and returns the first 10 image links per search string. I went this route because the Google API has more limitations now, and this way I can even choose the size of the images I want. The script uses multiprocessing and writes the results to a CSV file.
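For anyone who wants to picture the overall flow, here is a minimal sketch of it, not the actual script. The names fetch_image_links, read_queries, collect_links and write_links are made up for illustration, and the fetch step is a placeholder that returns fake URLs instead of driving headless Chromium through Selenium. It uses a thread pool only to keep the example short; the pool type is interchangeable.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def fetch_image_links(query, limit=10):
    """Hypothetical placeholder for the Selenium step.

    A real version would open headless Chromium, run the image search
    and scrape the result URLs. Here we only fabricate dummy links.
    """
    slug = query.replace(" ", "_")
    return [f"https://example.invalid/{slug}/{i}.jpg" for i in range(limit)]


def read_queries(csv_file):
    """Read one search string per row (first column), skipping blank rows."""
    return [row[0].strip() for row in csv.reader(csv_file) if row and row[0].strip()]


def collect_links(queries, workers=4):
    """Run the fetch step for every query in a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input queries.
        results = pool.map(fetch_image_links, queries)
        return list(zip(queries, results))


def write_links(rows, csv_file):
    """Write one output row per query: the query followed by its links."""
    writer = csv.writer(csv_file)
    for query, links in rows:
        writer.writerow([query, *links])


if __name__ == "__main__":
    # In-memory files stand in for the real input and output CSV files.
    source = io.StringIO("red panda\nraspberry pi 4\n")
    output = io.StringIO()
    write_links(collect_links(read_queries(source)), output)
    print(output.getvalue())
```

In a real version each worker would need its own browser instance, since a single WebDriver object should not be shared between threads.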

The script works as intended. I'm using all the available cores on my Raspberry Pi 4, and it takes about 21 seconds to get 10 links. Considering it has to open a headless Chromium and render all the JavaScript, I think that is pretty fast. However, I'm curious about threading in Python for this case. I have been reading some articles, and some people find threading faster than multiprocessing for this type of I/O-bound problem. Before writing all the code to test a multithreaded version, I wanted to ask whether anyone here has faced a similar scenario on a Raspberry Pi and has a real answer about this.

Thanks :D

EDIT: In case anyone is interested in something like this, my GitHub has all the code: single-threaded, multiprocess, and multithreaded versions. As suspected, the multithreaded version is about 5 seconds faster than the multiprocess version, measured with the time command. The point of all this is for a different script to check the email, download the attachment, process the file, get the links, and then reply to the email with a file containing the links.
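If you want to try the thread versus process comparison yourself without Selenium, a rough sketch along these lines should do. This is not the code from my repository: fake_search and time_pool are hypothetical names, and the search is simulated with time.sleep, which only models waiting on the browser. It does not reproduce the CPU cost of rendering pages, so the numbers will not match a real run.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def fake_search(query, delay=0.2):
    """Hypothetical stand-in for one Selenium search: just waits, then returns."""
    time.sleep(delay)  # simulates time spent waiting on the browser
    return f"{query}: done"


def time_pool(executor_cls, queries, workers=4):
    """Run fake_search over all queries with the given pool type and time it."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(fake_search, queries))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    queries = [f"query {i}" for i in range(8)]
    for executor_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        _, elapsed = time_pool(executor_cls, queries)
        print(f"{executor_cls.__name__}: {elapsed:.2f} s")
```

Both pool types share the same interface, so switching between them is a one-line change. Threads avoid the cost of starting extra Python processes, which is one plausible reason they can come out ahead when most of the time is spent waiting.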
