I am trying to scrape a site with a table of real-estate listings, and I am accessing the link in each listing/row to get more info about it (surface, region). The table has 25 rows per page, and it takes around 10-12 s per page (so 25 links accessed per page), which I find really slow (there are around 850 pages). I tried using requests.Session() instead of requests.get(), but I can't tell if it's the same or slightly worse.
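For context, the plain requests.get() version looks roughly like this (a simplified sketch; the selector used to collect the 25 listing links is a placeholder, not my actual one):

import time
import requests
from bs4 import BeautifulSoup

start = time.time()
page = requests.get("http://www.tunisie-annonce.com/AnnoncesImmobilier.asp")
soup = BeautifulSoup(page.text, 'lxml')

# placeholder selector -- my real code extracts the 25 listing links differently
links = [a['href'] for a in soup.select('a')][:25]

for link in links:
    detail = requests.get(link)  # one request per listing
    # ... parse surface and region here ...

print(f"one page of 25 listings took {time.time() - start:.1f}s")  # around 10-12s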
My question is this: does using .Session() for accessing each link only once, but all on the same site, actually slow the script down compared to just using requests.get(), or is requests.Session() smart enough to keep the cookies/connection for every link that stays inside the site? I.e., if I use:
import requests

s = requests.Session()
response = s.get("http://www.tunisie-annonce.com/AnnoncesImmobilier.asp")
and then I collect the links from the response above and access them through the session I opened from the main URL, like so:
import time
from bs4 import BeautifulSoup

def get_surfaces_and_region(session, links):
    for link in links:
        start = time.time()
        html = session.get(link)  # instead of requests.get(link)
        new_page = BeautifulSoup(html.text, 'lxml')
        ### do stuff here ##
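For completeness, I call it roughly like this (again, the selector used to collect the links is a placeholder, not my actual one):

# collect the listing links from the main page (placeholder selector)
soup = BeautifulSoup(response.text, 'lxml')
links = [a['href'] for a in soup.select('a')]

get_surfaces_and_region(s, links)  # same session object as the s.get() above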
What happens when I access another link through the previously opened session? Is it being added to some list in case I access it again? Would this theoretically slow down the GET requests, since every link I hit is unique? If so, is a session only good for accessing the same URL multiple times (a situation I'm struggling to imagine)?
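If it helps, one way I thought of checking whether the connection is actually being reused is to turn on urllib3's debug logging and watch for "Starting new HTTP connection" lines (sketch below; the second URL is just a hypothetical detail link, not a real one from my list):

import logging
import requests

# urllib3 logs "Starting new HTTP connection (1): host" each time a fresh
# TCP connection is opened; with keep-alive it should appear only once.
logging.basicConfig(level=logging.DEBUG)

s = requests.Session()
s.get("http://www.tunisie-annonce.com/AnnoncesImmobilier.asp")
s.get("http://www.tunisie-annonce.com/DetailsAnnonceImmobilier.asp?cod_ann=1")  # hypothetical detail link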
Thank you