
I want to retrieve the flight price from this webpage using Python 3: https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/04jpl.2018-12-17;c:EUR;e:1;a:FR;sd:1;t:f;tt:o

At first I got an error which, after many hours, I realized was because I wasn't giving the webdriver enough time to load all the elements. To make sure it had enough time, I added a time.sleep like so:

time.sleep(1)

This made it work! However, I've read and been advised not to use this solution and to use WebDriverWait instead. After many hours and several tutorials, I'm stuck trying to pinpoint the exact CSS class that WebDriverWait should wait for.

The closest I think I've got is:

WebDriverWait(d, 1).until(EC.presence_of_element_located((By.CSS_SELECTOR, ".flt-subhead1.gws-flights-results__price.gws-flights-results__cheapest-price")))

Any ideas on what I'm missing?

QHarr

2 Answers


You could target the element with a CSS attribute = value selector or, if that value is dynamic, with a combination of CSS selectors that matches the element by its position.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/04jpl.2018-12-17;c:EUR;e:1;a:FR;sd:1;t:f;tt:o")

#element = WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.CSS_SELECTOR , '[jstcache="9322"]')))
element = WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.CSS_SELECTOR , '.flt-subhead1.gws-flights-results__price.gws-flights-results__cheapest-price span + jsl')))
print(element.text)
#driver.quit()
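
As background on why WebDriverWait is preferred over a fixed time.sleep: an explicit wait re-checks a condition at short intervals and returns as soon as the condition holds, raising a timeout error only if the deadline passes. The following is a simplified pure-Python illustration of that idea, not Selenium's actual implementation. The names wait_until, WaitTimeout and fake_find_price are hypothetical, and the page is simulated so the sketch runs without a browser.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for selenium's TimeoutException."""


def wait_until(condition, timeout=5.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll)


# Simulated page: the "element" only becomes available on the third check.
calls = {"n": 0}


def fake_find_price():
    calls["n"] += 1
    return "€18" if calls["n"] >= 3 else None


price = wait_until(fake_find_price, timeout=2.0, poll=0.01)
print(price)
```

Unlike time.sleep(1), this returns as soon as the element is there and fails loudly if it never appears, which is the behaviour the WebDriverWait calls in this answer rely on.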

No results case:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

driver = webdriver.Chrome()
url = "https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/04jpl.2018-12-17;c:EUR;e:1;a:FR;sd:1;t:f;tt:o"  # URL with no flights that day: "https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/04jpl.2018-11-28;c:EUR;e:1;a:FR;sd:1;t:f;tt:o"
driver.get(url)

try:
    # A status message is expected in the no-results case; print it if it appears
    status = WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.CSS_SELECTOR , 'p[role=status]')))
    print(status.text)
except TimeoutException:
    # No status message within 5 seconds, so wait for the cheapest price instead
    element = WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.CSS_SELECTOR , '.flt-subhead1.gws-flights-results__price.gws-flights-results__cheapest-price span + jsl')))
    print(element.text)
#driver.quit()
QHarr
  • Hi QHarr, isn't the jstcache value generated dynamically, and as such would change over time? – David García Ballester Nov 28 '18 at 20:46
  • element = WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.CSS_SELECTOR , '.flt-subhead1.gws-flights-results__price.gws-flights-results__cheapest-price span + jsl'))) – QHarr Nov 28 '18 at 20:48
  • I'm getting this error: "selenium.common.exceptions.TimeoutException: Message:", so it's probably not locating the element. – David García Ballester Nov 28 '18 at 20:55
  • Increase the wait to 10? element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR , '.flt-subhead1.gws-flights-results__price.gws-flights-results__cheapest-price span + jsl'))) – QHarr Nov 28 '18 at 20:56
  • I'm getting so frustrated because I created a new Python file with your code and it works, but somehow it doesn't work in my code, and it's literally the same thing. I'm going to keep trying to figure out what's going on. Thanks for your help! – David García Ballester Nov 28 '18 at 21:17
  • I will help if I can – QHarr Nov 28 '18 at 21:19
  • OK, I realised what the error is. Your code is using a different link than mine. My code is using this link: https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/04jpl.2018-11-28;c:EUR;e:1;a:FR;sd:1;t:f;tt:o, which coincidentally doesn't have a flight that day, so the element couldn't be found. – David García Ballester Nov 28 '18 at 21:58
  • Yeah, my bad. The thing is, I'm running the link through a while loop to get the results for the whole year, and I accidentally posted the link for a day next month instead of the link I'm using, which is for today. Thanks for your help, I accepted your other answer as well. – David García Ballester Nov 28 '18 at 22:12
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/184436/discussion-between-qharr-and-david-garcia-ballester). – QHarr Nov 28 '18 at 22:18
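
The comments mention running the link through a loop to cover a whole year of dates. A minimal sketch of generating those per-day links might look like the following. It assumes that only the date segment of the URL from the question changes from day to day; URL_TEMPLATE, flight_url and daily_urls are hypothetical names introduced for illustration.

```python
from datetime import date, timedelta

# URL pattern copied from the question; the assumption is that only the
# ISO date segment varies between days.
URL_TEMPLATE = (
    "https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/04jpl.{day}"
    ";c:EUR;e:1;a:FR;sd:1;t:f;tt:o"
)


def flight_url(day):
    """Build the search URL for a single departure date."""
    return URL_TEMPLATE.format(day=day.isoformat())


def daily_urls(start, days):
    """Yield one URL per day, starting at `start`, for `days` consecutive days."""
    for offset in range(days):
        yield flight_url(start + timedelta(days=offset))


urls = list(daily_urls(date(2018, 11, 28), 3))
for url in urls:
    print(url)
```

Each generated URL could then be passed through the try/except from the no-results case, so that a day without flights prints its status message instead of ending the loop with a TimeoutException.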

I may be wrong, but I think you are trying to get the price of the flight.

If my assumption is correct, take a look at my approach. I find the search results list, then every itinerary inside it, and loop over them to get the price of each one. This is the best approach I could come up with, and it avoids all the dynamic attributes.

from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC

wait = 20

driver = Chrome()
driver.get("https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/04jpl.2018-12-17;c:EUR;e:1;a:FR;sd:1;t:f;tt:o")

# Get the search results list
search_results = WebDriverWait(driver, wait).until(EC.presence_of_element_located((By.CSS_SELECTOR , 'ol[class="gws-flights-results__result-list"]')))

# Loop through all the itineraries and print each price
# (find_elements(By...) replaces the find_elements_by_* helpers, which were removed in recent Selenium 4 releases)
for result in search_results.find_elements(By.CSS_SELECTOR, 'div[class*="gws-flights-results__collapsed-itinerary"]'):
    price = result.find_element(By.CSS_SELECTOR, 'div[class="gws-flights-results__itinerary-price"]')
    print(price.text)

Output: €18
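
If the printed text needs to be compared across days, it could be converted to a number first. The sketch below assumes whole-euro prices in the "€18" form shown in the output; any non-digit characters (currency symbol, thousands separators) are simply dropped, so it would not be correct for prices with decimals. parse_price is a hypothetical helper, not part of Selenium.

```python
import re


def parse_price(text):
    """Return the integer amount in a price string like '€18', or None.

    Assumption: prices are whole amounts, so every non-digit character
    (currency symbol, spaces, thousands separators) can be discarded.
    """
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else None


print(parse_price("€18"))
```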

Satish