Why is multithreading Selenium lousy on macOS?
This blog post might be the start of a series, depending on how much bandwidth I have to investigate this further...
I've been working on a new data problem that requires using Selenium to extract information quickly. To speed things up further, because I'm impatient as hell, I decided to use the ThreadPoolExecutor from concurrent.futures in my Python script to spin up a bunch of Chrome instances like this:
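import traceback
from concurrent.futures import ThreadPoolExecutor

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def setup_driver():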
    chrome_options = Options()
    chrome_options.add_argument("--headless=new")
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--window-size=1920,1080")
    return webdriver.Chrome(options=chrome_options)
    
def search_range(range_tuple):
    start_num, end_num, thread_num = range_tuple
    driver = setup_driver()
    
    # ...rest of selenium searching / parsing / processing logic
    
def main():
    start_entry = 0
    end_entry = 5000
    max_threads = 10
    chunk_size = (end_entry - start_entry) // max_threads
    ranges = []
    for i in range(max_threads):
        range_start = start_entry + (i * chunk_size)
        # The last chunk runs through end_entry; the rest are chunk_size wide.
        range_end = range_start + chunk_size - 1 if i < max_threads - 1 else end_entry
        ranges.append((range_start, range_end, i))
    with ThreadPoolExecutor(max_workers=max_threads) as executor:
        futures = [executor.submit(search_range, range_) for range_ in ranges]
        for future in futures:
            try:
                future.result()
            except Exception as e:
                print(f"Thread crashed with error: {e}")
                traceback.print_exc()

if __name__ == "__main__":
    main()
The chrome_options specified are mainly to optimize performance since I am running it headless.
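The range-splitting step in main() can be checked on its own, without launching Chrome. Below is a minimal sketch, assuming both bounds are meant to be inclusive and that the final chunk should absorb any remainder, which is what the conditional in main() appears to intend. The helper name `split_ranges` is hypothetical and not part of the original script.

```python
def split_ranges(start, end, workers):
    """Split the inclusive range [start, end] into `workers` contiguous chunks.

    Returns a list of (range_start, range_end, worker_index) tuples.
    """
    chunk_size = (end - start) // workers
    ranges = []
    for i in range(workers):
        range_start = start + i * chunk_size
        if i < workers - 1:
            # Every chunk except the last covers exactly chunk_size entries.
            range_end = range_start + chunk_size - 1
        else:
            # The last chunk runs through `end` so nothing is dropped.
            range_end = end
        ranges.append((range_start, range_end, i))
    return ranges
```

Calling `split_ranges(0, 5000, 10)` gives ten tuples that could be passed to `executor.submit(search_range, ...)` in place of the inline loop.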
I have two machines with similar-ish specs (though now I'm doubting this), both bought around the same time in 2022:
- Lenovo ThinkPad X1 Carbon 10th Gen - 32 GB RAM (running Ubuntu)
- MacBook Pro - Apple M1 Pro - 32 GB RAM
 
The M1 performance with the above code was terrible (I think it's the first time I've really heard my fans spin up). Inspecting the performance in htop was practically bewildering, especially when I then looked at the ThinkPad running the exact same script.
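macOS
At startup
[htop screenshot: macOS at startup]
Running script
[htop screenshot: macOS while running the script]
Linux
At startup
[htop screenshot: Linux at startup]
Running script
[htop screenshot: Linux while running the script]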
Interesting Observations
- From the start, the number of tasks on Linux is ~1/5th of macOS.
- On macOS the CPU usage on all my cores shot up to 100% almost immediately after the script started running.
- Linux seems to never show a count for running processes (though the script is obviously running, and I could see many Chrome processes listed in htop). On macOS this consistently showed up at 10 while I was running the script.
- The Load average was also substantially higher on macOS vs Linux.
- The memory usage on macOS was also more than 2x that of Linux.
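Eyeballing htop on two machines is hard to compare afterwards, so the load average could also be logged from Python while the script runs. This is a minimal sketch using only the standard library. The function name `sample_load` and its parameters are hypothetical, and dividing by the core count is my own assumption about how to make the two machines comparable. `os.getloadavg()` is available on Unix-like systems only.

```python
import os
import time


def sample_load(samples=5, interval=1.0):
    """Collect (timestamp, 1-minute load, load per core) readings."""
    cores = os.cpu_count() or 1
    readings = []
    for _ in range(samples):
        # getloadavg() returns the 1-, 5- and 15-minute load averages.
        one_min, _five_min, _fifteen_min = os.getloadavg()
        readings.append((time.time(), one_min, one_min / cores))
        time.sleep(interval)
    return readings
```

Running this in a separate terminal on each machine while the scraper is active would give numbers that can be compared side by side rather than from memory.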
 
Next Steps?
I don't have time to dig into this right now, but if I manage to revisit it, I think the first step would be to try replicating the results in containers. It looks like there's actually a macOS VM via Docker-OSX, so that might be a good place to start. A bit of googling also revealed this issue, but seeing as it was resolved over 2 years ago, I doubt this is still the problem.
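A cheaper experiment than setting up containers might be timing the same workload at several thread counts on each machine, to see where the Mac starts falling behind. This is a rough sketch, not something I've run against Selenium. The helper `time_workers` is hypothetical, and `task` stands in for something like search_range.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def time_workers(task, items, worker_counts):
    """Return {worker_count: elapsed_seconds} for running task over items."""
    timings = {}
    for workers in worker_counts:
        started = time.perf_counter()
        with ThreadPoolExecutor(max_workers=workers) as executor:
            # list() waits for every result and re-raises worker exceptions.
            list(executor.map(task, items))
        timings[workers] = time.perf_counter() - started
    return timings
```

Something like `time_workers(search_range, ranges, [1, 2, 5, 10])` would then produce one timing per thread count on each machine.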
For now I'd say, proceed with caution if you're going to try multithreading with Selenium on a Mac M1 (or use the opportunity to warm your lap in the dead of winter).