I am using a script to scrape news from many websites using newspaper3k.
Instead of running it sequentially, I tried to utilize all of my cores by using joblib.Parallel.
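A minimal sketch of this kind of setup, for illustration only. It is not the actual script: `scrape_site`, `scrape_all`, and the `SITES` list are hypothetical placeholders, and the real per-site function would call newspaper3k where this one just sleeps.

```python
import time

from joblib import Parallel, delayed

# Hypothetical placeholder list; the real script covers about 50 websites.
SITES = [
    "https://example.com/news-a",
    "https://example.com/news-b",
    "https://example.com/news-c",
]


def scrape_site(url):
    """Hypothetical stand-in for the per-site newspaper3k scraping step."""
    time.sleep(0.01)  # simulates time spent waiting on the network
    return {"url": url, "articles": []}


def scrape_all(urls, n_jobs=-1):
    # n_jobs=-1 asks joblib to use all available cores.
    # Results come back in the same order as the input URLs.
    return Parallel(n_jobs=n_jobs)(delayed(scrape_site)(url) for url in urls)


if __name__ == "__main__":
    results = scrape_all(SITES)
    print(len(results), "sites scraped")
```

Each website is handed to a separate worker, so the total runtime should in principle be bounded by the slowest sites rather than by the sum of all of them.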
However, it still takes a lot of time (50 websites take around 20 minutes). I profiled the script, and it turns out that the majority of the time (51%) is spent waiting on locks from Parallel.
Is there any way I can improve that? I thought of using async, but it turns out that joblib doesn't work too well with it.