
I am trying to scrape prices from booking.com, but my code keeps returning an empty list.

I would be really thankful if anyone could explain what is happening.

Here is the website from which I am trying to scrape data:

https://www.booking.com/searchresults.html?label=gen173nr-1DCAEoggI46AdIM1gEaCeIAQGYATG4AQfIAQzYAQPoAQH4AQKIAgGoAgO4AuGQ8JAGwAIB0gIkYjFlZDljM2MtOGJiMy00MGZiLWIyMjMtMWIwYjNhYzU5OGQx2AIE4AIB&sid=2dad976fd78f6001d59007a49cb13017&sb=1&sb_lp=1&src=index&src_elem=sb&error_url=https%3A%2F%2Fwww.booking.com%2Findex.html%3Flabel%3Dgen173nr-1DCAEoggI46AdIM1gEaCeIAQGYATG4AQfIAQzYAQPoAQH4AQKIAgGoAgO4AuGQ8JAGwAIB0gIkYjFlZDljM2MtOGJiMy00MGZiLWIyMjMtMWIwYjNhYzU5OGQx2AIE4AIB%3Bsid%3D2dad976fd78f6001d59007a49cb13017%3Bsb_price_type%3Dtotal%26%3B&ss=Golden&is_ski_area=0&ssne=Golden&ssne_untouched=Golden&dest_id=-565331&dest_type=city&checkin_year=2022&checkin_month=3&checkin_monthday=15&checkout_year=2022&checkout_month=3&checkout_monthday=16&group_adults=2&group_children=0&no_rooms=1&b_h4u_keep_filters=&from_sf=1

Here is my code:

from bs4 import BeautifulSoup
import requests

html_text = requests.get("https://www.booking.com/searchresults.html?label=gen173nr-1DCAEoggI46AdIM1gEaCeIAQGYATG4AQfIAQzYAQPoAQH4AQKIAgGoAgO4AuGQ8JAGwAIB0gIkYjFlZDljM2MtOGJiMy00MGZiLWIyMjMtMWIwYjNhYzU5OGQx2AIE4AIB&sid=2dad976fd78f6001d59007a49cb13017&sb=1&sb_lp=1&src=index&src_elem=sb&error_url=https%3A%2F%2Fwww.booking.com%2Findex.html%3Flabel%3Dgen173nr-1DCAEoggI46AdIM1gEaCeIAQGYATG4AQfIAQzYAQPoAQH4AQKIAgGoAgO4AuGQ8JAGwAIB0gIkYjFlZDljM2MtOGJiMy00MGZiLWIyMjMtMWIwYjNhYzU5OGQx2AIE4AIB%3Bsid%3D2dad976fd78f6001d59007a49cb13017%3Bsb_price_type%3Dtotal%26%3B&ss=Golden&is_ski_area=0&ssne=Golden&ssne_untouched=Golden&dest_id=-565331&dest_type=city&checkin_year=2022&checkin_month=3&checkin_monthday=15&checkout_year=2022&checkout_month=3&checkout_monthday=16&group_adults=2&group_children=0&no_rooms=1&b_h4u_keep_filters=&from_sf=1").text

soup = BeautifulSoup(html_text, 'lxml')

prices = soup.find_all('div', class_='fde444d7ef _e885fdc12')

print(prices)
Robert
  • You may have the most common problem: the page may use `JavaScript` to add or update elements, but `BeautifulSoup`/`lxml` and `requests`/`urllib` can't run `JS`. You may need [Selenium](https://selenium-python.readthedocs.io/) to control a real web browser, which can run `JS`. Or use `DevTools` in `Firefox`/`Chrome` (tab `Network`) manually to see whether `JavaScript` reads the data from some URL; you can then try that URL with `requests`. `JS` usually gets `JSON`, which is easily converted to a Python dictionary (without `BS`). You can also check whether the page has a (free) `API` for programmers. – furas Feb 28 '22 at 01:44
  • I'm not sure, but `find_all` may have a problem with multiple classes: the string `'fde444d7ef _e885fdc12'` contains a `space`, so it is NOT a single class but two classes. You may need to use a CSS selector, `soup.select("div.fde444d7ef._e885fdc12")`, with a `dot` before every class. – furas Feb 28 '22 at 01:46
  • But first you should check what you get from the server in `html_text`. If I check the position with `html_text.find('fde444d7ef _e885fdc12')`, I get `-1`, which means the string was not found. BTW: when I check the HTML in a browser, I see the price in `<span>`, not `<div>`. – furas Feb 28 '22 at 01:54
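
The checks suggested in these comments can be tried offline on a small hand-written HTML fragment. This is only a sketch: the `sample_html` markup is invented for the example (it reuses the class names from the question but is not Booking.com's real markup), and it uses the built-in `html.parser` so that `lxml` is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, NOT copied from booking.com: the price sits in a <span>,
# and the second element lists the same two classes in the opposite order.
sample_html = """
<div class="card"><span class="fde444d7ef _e885fdc12">CAD 120</span></div>
<div class="card"><span class="_e885fdc12 fde444d7ef">CAD 95</span></div>
"""

# Step 1: check whether the class string appears in the raw HTML at all.
# str.find() returns -1 when the substring is missing.
position = sample_html.find('fde444d7ef _e885fdc12')

soup = BeautifulSoup(sample_html, 'html.parser')

# Step 2: searching the wrong tag finds nothing, even though the class exists.
in_div = soup.find_all('div', class_='fde444d7ef _e885fdc12')

# Step 3: a string with a space matches only the exact class attribute value,
# so the element with the classes in reverse order is skipped.
exact = soup.find_all('span', class_='fde444d7ef _e885fdc12')

# Step 4: a CSS selector with a dot before each class ignores class order.
by_css = soup.select('span.fde444d7ef._e885fdc12')

print(position != -1)             # True
print(len(in_div))                # 0
print([s.text for s in exact])    # ['CAD 120']
print([s.text for s in by_css])   # ['CAD 120', 'CAD 95']
```

On the real page the same checks would be run against `html_text`; if step 1 already returns `-1`, no selector can succeed and the problem lies in what the server sent.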

1 Answer


After checking several possible causes, I found two problems.

  1. The price is in a <span>, but you search in <div>.

  2. The server sends different HTML for different browsers or devices, so the code needs a full User-Agent header from a real browser. A short Mozilla/5.0 is not enough, and by default requests sends something like python-requests/2.27.1.

from bs4 import BeautifulSoup
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:97.0) Gecko/20100101 Firefox/97.0'
}

url = "https://www.booking.com/searchresults.html?label=gen173nr-1DCAEoggI46AdIM1gEaCeIAQGYATG4AQfIAQzYAQPoAQH4AQKIAgGoAgO4AuGQ8JAGwAIB0gIkYjFlZDljM2MtOGJiMy00MGZiLWIyMjMtMWIwYjNhYzU5OGQx2AIE4AIB&sid=2dad976fd78f6001d59007a49cb13017&sb=1&sb_lp=1&src=index&src_elem=sb&error_url=https%3A%2F%2Fwww.booking.com%2Findex.html%3Flabel%3Dgen173nr-1DCAEoggI46AdIM1gEaCeIAQGYATG4AQfIAQzYAQPoAQH4AQKIAgGoAgO4AuGQ8JAGwAIB0gIkYjFlZDljM2MtOGJiMy00MGZiLWIyMjMtMWIwYjNhYzU5OGQx2AIE4AIB%3Bsid%3D2dad976fd78f6001d59007a49cb13017%3Bsb_price_type%3Dtotal%26%3B&ss=Golden&is_ski_area=0&ssne=Golden&ssne_untouched=Golden&dest_id=-565331&dest_type=city&checkin_year=2022&checkin_month=3&checkin_monthday=15&checkout_year=2022&checkout_month=3&checkout_monthday=16&group_adults=2&group_children=0&no_rooms=1&b_h4u_keep_filters=&from_sf=1"

response = requests.get(url, headers=headers)
#print(response.status_code)

html_text = response.text

soup = BeautifulSoup(html_text, 'lxml')

prices = soup.find_all('span', class_='fde444d7ef _e885fdc12')

for item in prices:
    print(item.text)
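
To see which User-Agent actually goes out, the request can be prepared without being sent. The following is a minimal sketch, not part of the original answer: it assumes only that `requests` is installed, uses a shortened URL purely as a placeholder, and makes no network call.

```python
import requests

# The User-Agent that requests sends when no header is supplied.
default_ua = requests.utils.default_user_agent()

# Browser-like header, same value as in the code above.
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:97.0) Gecko/20100101 Firefox/97.0'
}

# Placeholder URL for illustration; the request is only prepared, never sent.
url = 'https://www.booking.com/searchresults.html'

session = requests.Session()
prepared = session.prepare_request(requests.Request('GET', url, headers=headers))

print(default_ua)                      # starts with "python-requests/"
print(prepared.headers['User-Agent'])  # the Firefox string from `headers`
```

If the second line still showed the `python-requests/...` value, the header would not be reaching the request, and the server could again return the markup without the expected classes.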
furas