4
from selenium import webdriver

# Spoof a desktop Chrome user-agent so the site serves the normal page.
chrome_opts = webdriver.ChromeOptions()
chrome_opts.add_argument("--user-agent='Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36'")
# Uncomment to run the browser without a visible window.
#chrome_opts.add_argument("headless")
driver = webdriver.Chrome(executable_path="/home/timmy/Python/chromedriver",
                          chrome_options=chrome_opts)

# Turo search results for Colorado, 03/15-03/20, up to 200 items per page.
search_url = "https://turo.com/search?country=US&defaultZoomLevel=7&endDate=03%2F20%2F2019&endTime=10%3A00&international=true&isMapSearch=false&itemsPerPage=200&location=Colorado%2C%20USA&locationType=City&maximumDistanceInMiles=30&northEastLatitude=41.0034439&northEastLongitude=-102.040878&region=CO&sortType=RELEVANCE&southWestLatitude=36.992424&southWestLongitude=-109.060256&startDate=03%2F15%2F2019&startTime=10%3A00"
driver.get(search_url)


# Collect every unique /rentals car link, scrolling until the page stops
# growing (scrolldown raises KeyError as its end-of-page signal).
list_of_all_car_links = []
seen_hrefs = set()  # FIX: each pass re-parses the whole page, so without
                    # dedup every earlier link was appended again each loop.
x = 0  # last known page height, fed back into scrolldown
while True:
    soup = BeautifulSoup(driver.page_source, "html.parser")
    for anchor in soup.find_all("a", href=True):
        href = anchor['href']
        # len > 31 filters out the bare /rentals landing link; only real
        # car-detail paths are longer than that.
        if href.startswith("/rentals") and len(href) > 31 and href not in seen_hrefs:
            seen_hrefs.add(href)
            list_of_all_car_links.append("https://turo.com" + href)
    try:
        x = scrolldown(last_height=x)
    except KeyError:
        # scrolldown raises KeyError when the height stops changing.
        #driver.close()
        break

I tried scrolling down and then finding the links, but I only got part of them. Here is my scroll-down function:

def scrolldown(last_height=0, SCROLL_PAUSE_TIME=3, num_tries=2):
    """Scroll to the bottom of the page and return the new page height.

    If the height did not change, retry up to ``num_tries`` times (content
    may still be lazy-loading), pausing ``SCROLL_PAUSE_TIME`` seconds after
    each scroll. Raises KeyError once the height truly stops changing —
    the caller treats that as "reached end of page".
    """
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")

    # Wait for lazy-loaded content to appear
    time.sleep(SCROLL_PAUSE_TIME)

    new_height = driver.execute_script("return document.body.scrollHeight")

    # break condition: page height unchanged since last call
    if last_height == new_height:
        num_tries -= 1
        if num_tries == 0:
            print("Reached End of page")
            raise KeyError
        # BUG FIX: the original discarded the recursive call's result and
        # fell through to return the stale pre-retry height; propagate the
        # height observed by the successful retry instead.
        return scrolldown(last_height=last_height, SCROLL_PAUSE_TIME=2,
                          num_tries=num_tries)

    return new_height

I also tried converting the HTML to BeautifulSoup after each scroll and then finding the links, but I still didn't get all of them.

What I want is to get every car link on that page.

Dominique
  • 16,450
  • 15
  • 56
  • 112
ziji zijia
  • 43
  • 3
  • Did you check out request url starts with `https://turo.com/api/search`. It contains everything you want (at least 200 items). And I have found vehicle url in this. – MoreFreeze Mar 14 '19 at 07:21

1 Answer

1

I would use requests and the API shown in the xhr list in dev tools. Note the items per page parameter in the query string itemsPerPage=200. You can try altering this for larger result sets.

import requests

# Hit Turo's search API directly — no browser or scrolling needed.
api_url = 'https://turo.com/api/search?country=US&defaultZoomLevel=7&endDate=03%2F20%2F2019&endTime=10%3A00&international=true&isMapSearch=false&itemsPerPage=200&location=Colorado%2C%20USA&locationType=City&maximumDistanceInMiles=30&northEastLatitude=41.0034439&northEastLongitude=-102.040878&region=CO&sortType=RELEVANCE&southWestLatitude=36.992424&southWestLongitude=-109.060256&startDate=03%2F15%2F2019&startTime=10%3A00'
site_root = 'https://turo.com'

# The API expects a browser-like Referer and User-Agent.
request_headers = {
    'Referer' : 'https://turo.com/search?country=US&defaultZoomLevel=7&endDate=03%2F20%2F2019&endTime=10%3A00&international=true&isMapSearch=false&itemsPerPage=200&location=Colorado%2C%20USA&locationType=City&maximumDistanceInMiles=30&northEastLatitude=41.0034439&northEastLongitude=-102.040878&region=CO&sortType=RELEVANCE&southWestLatitude=36.992424&southWestLongitude=-109.060256&startDate=03%2F15%2F2019&startTime=10%3A00',
    'User-Agent' : 'Mozilla/5.0',
}

payload = requests.get(api_url, headers=request_headers).json()

# Each entry in 'list' carries a relative vehicle URL; make them absolute.
car_links = [site_root + entry['vehicle']['url'] for entry in payload['list']]
print(car_links)
QHarr
  • 83,427
  • 12
  • 54
  • 101
  • thanks, this gives exactly 200 links, how can I be sure this is the entire list? – ziji zijia Mar 14 '19 at 08:17
  • there is an items per page argument you can alter in the url itemsPerPage=200 – QHarr Mar 14 '19 at 08:30
  • even if i change the length is still 200 – ziji zijia Mar 14 '19 at 08:44
  • I suspect that is the max per page. Are there more than 200 available on the page when you inspect? Is it possible to pass a pagination parameter in the url ? e.g. page = 1 – QHarr Mar 14 '19 at 08:45
  • You currently only get results on the page itself for what is in the view port (using css selectors for example) You could try scrolling until all visible and doing a manual count to confirm. – QHarr Mar 14 '19 at 08:46
  • I don't think it's possible to use this method — if you check my code, I do exactly that. I think the problem is with the scroll-down and the HTML being parsed at that time. – ziji zijia Mar 14 '19 at 09:05