Description
What I am trying to do:
This code uses multithreading to automate web interactions with a search engine (via searxng_url) using SeleniumBase in headless Chrome browsers. For each question number in a list, it launches a browser, loads cookies, and performs actions defined in get_all_dumbs(). The browsers are run concurrently using ThreadPoolExecutor to speed up processing.
The issue:
As far as I know, the code works fine on a Windows VM, but on a Linux VM it does not work as intended.
On the Linux VM, everything works when the number of workers is set to 1. With more than one worker, UC mode, as far as I understand, momentarily disconnects the driver and then reconnects it; however, for the life of me, I cannot load the cookies because the driver is absent at that moment. (I assume this is what causes the issue.)
Thank you for your time, and I would love to know if there is a best practice for my case.
The code:
main.py
from concurrent.futures import ThreadPoolExecutor
from utilities.get_all_dumbs import get_all_dumbs
def get_all_dumbs_wrapper(question_num):
    """Wrapper function to pass parameters to get_all_dumbs."""
    get_all_dumbs(
        question_num,
        end,
        searxng_url,
        cookies_file,
        cert,
        images_folder,
        DISC_HEADER,
        REVEAL_BTN,
        QA_SECTION
    )
# Generate filtered list of questions to process
questions = []
.
.
.
.
questions.append(question_num)
# Process remaining questions in parallel
with ThreadPoolExecutor(max_workers=10) as executor:
    executor.map(get_all_dumbs_wrapper, questions)
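A side note on the pool above: `executor.map` only re-raises a worker's exception when its results are iterated, so an error raised inside a thread can go unnoticed. The following is a minimal sketch of collecting failures per question, not the original code. `process_question` is a hypothetical stand-in for `get_all_dumbs_wrapper`, and the simulated failure for question 2 is purely illustrative.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_question(question_num):
    """Hypothetical stand-in for get_all_dumbs_wrapper (assumption)."""
    if question_num == 2:
        # Simulated failure, mirroring the message from the report
        raise RuntimeError("Active window was already closed!")
    return question_num


def run_all(questions, max_workers=2):
    """Run every question; return (sorted successes, {question: error})."""
    done, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the question it belongs to
        futures = {executor.submit(process_question, q): q for q in questions}
        for future in as_completed(futures):
            q = futures[future]
            try:
                done.append(future.result())  # re-raises the worker's exception
            except Exception as exc:
                failed[q] = str(exc)
    return sorted(done), failed
```

Calling `run_all(questions)` would then give a list of questions that finished and a dictionary of the ones that need another pass.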
get_all_dumbs.py
from seleniumbase import SB
from utilities.load_cookie import load_cookie
from fake_useragent import UserAgent
# Initialize a UserAgent instance for generating random user agents
user_agent = UserAgent()
def get_all_dumbs(q, end, searxng_url, cookies_file, cert, images_folder, DISC_HEADER, REVEAL_BTN, QA_SECTION):
    """Ensure cookies are loaded for the session"""
    with SB(browser="chrome", uc=True, test=False, locale_code="en", headless=True,
            agent=user_agent.random, devtools=False, remote_debug=False) as sb:
        print(f"\n=== Processing Question {q}/{end} ===")
        # Load cookies for each new browser instance
        load_cookie(sb, searxng_url, cookies_file, images_folder)
.
.
.
load_cookie.py
def load_cookie(sb, searxng_url, cookies_file, images_folder):
    """Ensure cookies are loaded for the session"""
    try:
        sb.get(searxng_url)
        sb.load_cookies(name=cookies_file)
    except Exception as e:
        print(f"Error loading cookies: {e}")
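Because `load_cookie` prints the error and carries on, the rest of the session runs without cookies. One option, sketched below under assumptions, is a small retry helper. `with_retries` is a hypothetical name, and whether retrying actually helps with a window that has closed is untested here.

```python
import time


def with_retries(action, attempts=3, delay=1.0):
    """Call action() until it succeeds or the attempts run out.

    Hypothetical helper (assumption): `action` is any zero-argument
    callable that raises on failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:
            last_error = exc
            print(f"Attempt {attempt}/{attempts} failed: {exc}")
            if attempt < attempts:
                time.sleep(delay)  # wait before the next try
    # Every attempt failed: surface the last error to the caller
    raise last_error
```

To use it with the code above, `load_cookie` would have to re-raise instead of only printing, for example `with_retries(lambda: load_cookie(sb, searxng_url, cookies_file, images_folder))`.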
The output with the error:
=== Processing Question 2/611 ===
Error loading cookies: Message: Active window was already closed!
Note: which question triggers the error is random; some of the questions are processed without it.
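Since the failure is intermittent and only shows up with more than one worker, one experiment is to let only one thread start a browser at a time while the rest of the work still runs in parallel. This is a minimal sketch of that idea, not a confirmed fix. `run_with_serialized_launch`, `make_browser`, and `work` are hypothetical names, and `make_browser` is assumed to be any zero-argument callable that returns a context manager (for instance a lambda wrapping the `SB(...)` call).

```python
import threading
from contextlib import ExitStack

# Only one thread at a time may be inside the browser start-up step
launch_lock = threading.Lock()


def run_with_serialized_launch(make_browser, work):
    """Start the browser under a lock, then run work(browser) unlocked.

    Hypothetical helper (assumption): make_browser() returns a context
    manager; work is a callable that receives the entered object.
    """
    with ExitStack() as stack:
        with launch_lock:
            # __enter__ (the start-up) runs while the lock is held
            browser = stack.enter_context(make_browser())
        # The lock is released here; the browser is closed when the
        # ExitStack exits, even if work() raises
        return work(browser)
```

Each worker thread would call `run_with_serialized_launch` instead of opening the `with SB(...)` block directly, so start-ups are queued but page interactions still overlap.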