
I am trying to scrape the food names from this menu, but I only get 9 items instead of the full list. In Chrome DevTools I can clearly see that far more than 9 elements have the class name I am searching for.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

path="/Users/ruhanamirza/Downloads/chromedriver"
driver=webdriver.Chrome(path)
driver.get('https://wolt.com/az/aze/baku/venue/gurmania-winter-park')
try:
    modules=WebDriverWait(driver,120).until(
        EC.presence_of_all_elements_located((By.CLASS_NAME,"MenuItem-module_content__mNrbB"))
    )
    for module in modules:
        name=module.find_element(By.CLASS_NAME,"MenuItem-module_name__iqvnU")
        print(name.text)
finally:
    driver.quit()

2 Answers


With Selenium, you can see that the page creates elements dynamically as you scroll, with images being pulled from URLs like https://imageproxy.wolt.com/menu/menu-images/5e75dca9494db98d926e52e3/a96d46e6-feb0-11ec-9cdb-d605cab88f2d_img_20220708_132819.jpeg?w=200. One way to get the names is to scroll the page in steps (say, 75% of the visible viewport at a time), collect the items currently in view into a list, and repeat until you reach the bottom of the page. Another way, shown below, is to skip the browser and use requests & BeautifulSoup.

Note that these are wine names, not food names. Is this what you're after?

import requests
from bs4 import BeautifulSoup

r = requests.get('https://wolt.com/az/aze/baku/venue/gurmania-winter-park')
soup = BeautifulSoup(r.text, 'html.parser')
titles = soup.find_all('p', {'data-test-id': 'menu-item.name'})
for t in titles:
    title = t.text
    print(title)

Result:

Hillside Blush Rose, 750 ml
Hillside Classico, 750 ml
Hillside Saperavi, 750 ml
Hillside Reserve, 750 ml
Hillside Pinot Grigio, 750 ml
Hillside Image, 750 ml
Hillside Prestige, 750 ml
Hillside Caucasus, 750 ml
Hillside Rose, 750 ml
Gurmania Rkatsiteli
Madrasa by Gurmania 750 ml
Yarimada Shiraz Rose 2015 , 750 ml
Yarimada Chardonnay 2014, 750 ml
Merlot by Gurmania 750 ml
Yarimada Madrasa Rose 2012 , 750 ml
Meyseri Mercan 2018 Orqanik 750 ml
Gurmania® Rose, 750 ml
Chabiant Vino Raro, 750 ml
Gurmania Saperavi Turş
Meyseri Innabi 2018 750 ml
Yarimada Cabernet Sauvignon 2016, 750 ml
Syrah by Gurmania 750 ml
Yarimada Muscat 2014, 750 ml
Meyseri Sedef Orqanik 750 ml
Meyseri Bulluri 2018 750 ml
Pomegranat by Gurmania 750 ml
Hillside Cuvee Qırmızı Turş, 750 ml
Hillside Nectar Red Desert , 750 ml
Hillside Chardonnay Ağ Kəmturş, 750 ml
Hillside Pomegranate, 750 ml
Hillside Sauvignon Blanc Ağ Turş, 750 ml
Spiritus Vini, Matrasa 2018 750 ml
Traminer by Gurmania 750 ml
Muscat by Gurmania 750 ml
Alazani Valley Qırmızı by Gurmania 750 ml
Alazani Valley Ağ by Gurmania 750 ml
Chardonnay by Gurmania 750 ml
Chabian Bayan Shira
Vine Ponto® Mtsvane, 750 ml
Vine Ponto® Кhikhvi, 750 ml
Vine Ponto® Rkatsiteli, 750 ml
Zigu
Shavnabada Monastery Wine 750ml
8000 Millennium, 750 ml
Okro Gold Kəmşirin, 750 ml
Marani Milorauli, Trio Turş Kəhrəba Şərabı, 750ml
Gogushika Kvevri Wine Kisi, 750 ml
Umano Tsinandali, 750 ml
Mtsvane White Kəmşirin, 750 ml
Usakhelauri®, 750 ml
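
The scrolling alternative mentioned at the top of this answer could look roughly like the sketch below. It has not been run against the live site and rests on a few assumptions: the page scrolls via `window` (not an inner container), the `p[data-test-id='menu-item.name']` selector still matches the item names, and items with identical names may be collapsed into one. The function name and its `step`, `pause`, and `max_steps` parameters are made up for illustration.

```python
import time

# Selector used elsewhere in this thread; assumed to still match the item names.
NAME_SELECTOR = "p[data-test-id='menu-item.name']"


def collect_names_while_scrolling(driver, selector=NAME_SELECTOR,
                                  step=0.75, pause=0.5, max_steps=2000):
    """Scroll down in steps of `step` viewport heights and collect item texts.

    `driver` is a Selenium WebDriver that has already loaded the page.
    Names are returned in first-seen order, without duplicates.
    """
    names = []
    seen = set()
    last_y = -1
    for _ in range(max_steps):
        # "css selector" is the string value behind selenium's By.CSS_SELECTOR.
        for element in driver.find_elements("css selector", selector):
            text = element.text.strip()
            if text and text not in seen:
                seen.add(text)
                names.append(text)

        # Scroll by a fraction of the viewport, then give the page time to render.
        driver.execute_script(
            "window.scrollBy(0, window.innerHeight * arguments[0]);", step
        )
        time.sleep(pause)

        # If the scroll position did not change, assume the bottom was reached.
        current_y = driver.execute_script("return window.scrollY;")
        if current_y == last_y:
            break
        last_y = current_y
    return names
```

Call it with the `driver` from the question after `driver.get(...)` (and after dismissing any consent dialog). If the page re-renders items while they are being read, an element can go stale between `find_elements` and `.text`; in that case, catching selenium's `StaleElementReferenceException` around the inner loop body would be a reasonable addition.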

##################

Another solution, skipping HTML parsing and calling the website's API directly with requests (pandas is used only to tabulate the result):

import requests
import pandas as pd

r = requests.get('https://restaurant-api.wolt.com/v4/venues/slug/gurmania-winter-park/menu?unit_prices=true&show_weighted_items=true')
obj = r.json()['items']
df = pd.DataFrame(obj)
df

This returns a dataframe with 1609 rows × 31 columns:

    alcohol_percentage  allowed_delivery_methods    baseprice   category    checksum    description dietary_preferences disabled_info   enabled exclude_from_discounts  has_extra_info  id  image   image_blurhash  mandatory_warnings  max_quantity_per_purchase   name    no_contact_delivery_allowed options original_price  quantity_left   quantity_left_visible   restrictions    sell_by_weight_config   tags    times   type    unit_info   unit_price  validity    vat_percentage
0   120 [takeaway, homedelivery]    1360    000000000000000000000002    52ad71bc4371a195c9a6e91f26ab977f        []  None    True    False   False   62d145d3a67304b52f01e9c8    https://wolt-menu-images-cdn.wolt.com/menu-ima...   j1slBb4I008y2;cRXr8QYM4y;LTb    []  None    Hillside Blush Rose, 750 ml False   []  1700.0  None    False   [{'age_limit': None, 'type': 'alcohol'}]    None    []  [{'available_days_of_week': [1, 2, 3, 4, 5, 6,...   deal    None    None    {'end_date': 1658691900000, 'start_date': 1657...   18
1   120 [takeaway, homedelivery]    2240    000000000000000000000002    53718ff72c4d46f753b736c638b7a697        []  None    True    False   False   62d145d3a67304b52f01e9cb    https://wolt-menu-images-cdn.wolt.com/menu-ima...   j1nQQ;Lu8y0h0z8Q;:pl804yTtXK    []  None    Hillside Classico, 750 ml   False   []  2800.0  None    False   [{'age_limit': None, 'type': 'alcohol'}]    None    []  [{'available_days_of_week': [1, 2, 3, 4, 5, 6,...   deal    None    None    {'end_date': 1658691900000, 'start_date': 1657...   18
2   120 [takeaway, homedelivery]    1360    000000000000000000000002    9830017ad47cb0de5eaf58ceb9130c7f        []  None    True    False   False   62d145d3a67304b52f01e9cc    https://wolt-menu-images-cdn.wolt.com/menu-ima...   j1mA8PLu;K4i0Jh4XscQYw4igQXs    []  None    Hillside Saperavi, 750 ml   False   []  1700.0  None    False   [{'age_limit': None, 'type': 'alcohol'}]    None    []  [{'available_days_of_week': [1, 2, 3, 4, 5, 6,...   deal    None    None    {'end_date': 1658691900000, 'start_date': 1657...   18
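
Since the question only asks for the names, a small helper over the `items` list may be enough. This is a minimal sketch, assuming each item is a dict with the `name` and `enabled` keys seen in the columns above; `item_names` and `only_enabled` are illustrative names, and the sample data is a trimmed-down copy of the three rows shown.

```python
def item_names(items, only_enabled=False):
    """Return the `name` of every menu item, optionally skipping disabled ones."""
    names = []
    for item in items:
        if only_enabled and not item.get("enabled", True):
            continue
        name = item.get("name")
        if name:
            names.append(name)
    return names


# Trimmed-down sample with the same shape as the rows shown above.
sample_items = [
    {"name": "Hillside Blush Rose, 750 ml", "enabled": True},
    {"name": "Hillside Classico, 750 ml", "enabled": True},
    {"name": "Hillside Saperavi, 750 ml", "enabled": True},
]
print(item_names(sample_items))
```

With the response from the snippet above, this would be called as `item_names(r.json()['items'])`; with the dataframe, `df['name'].tolist()` gives the same list.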
Barry the Platipus
  • Thank you very much for your reply! The problem with BeautifulSoup is that it returns only the first 50 tags, not the full list. I have posted about this here: https://stackoverflow.com/questions/73010716/beautifulsoup-find-all-returns-only-first-50-tags/73010757?noredirect=1#comment128955860_73010757 – Masail Guliyev Jul 24 '22 at 08:39
  • How about solution #3, where you call the API and get back a dataframe with 1609 rows × 31 columns? – Barry the Platipus Jul 24 '22 at 08:41
  • With Selenium, I thought WebDriverWait would take care of items still being loaded, but the result does not seem to depend on the number of seconds I pass in. How can I tell Selenium to open the page, scroll to the bottom, and then locate all the elements? – Masail Guliyev Jul 24 '22 at 08:42
  • For solution #3, how do I get the link to the JSON without going through Chrome DevTools? Ultimately, I want to provide the URL of the store/restaurant and get the JSON that you have posted. – Masail Guliyev Jul 24 '22 at 08:46
  • For solution 3, you just concatenate `https://restaurant-api.wolt.com/v3/venues/slug/`, element [1] of the original URL split by '/venue/', and `/menu`. It should make sense to you. You get a JSON response. Btw, I also responded to your other bountied question. – Barry the Platipus Jul 24 '22 at 12:13
  • Thanks, this is helpful. The only caveat is that this JSON is only for the English version of the menu. – Masail Guliyev Jul 26 '22 at 08:14
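
The URL construction described in the comments above can be sketched as follows. `menu_api_url` and `API_BASE` are illustrative names; the base URL is the `v3` one quoted in the comment, while the answer body uses a `v4` path with extra query parameters, so the base is left as a parameter rather than assumed.

```python
API_BASE = "https://restaurant-api.wolt.com/v3/venues/slug/"  # as quoted in the comment


def menu_api_url(venue_url, api_base=API_BASE):
    """Build the menu API URL from a venue page URL by extracting its slug."""
    parts = venue_url.split("/venue/", 1)
    if len(parts) != 2:
        raise ValueError(f"no '/venue/' segment in {venue_url!r}")
    # Drop any query string, fragment, or extra path after the slug.
    slug = parts[1].split("?")[0].split("#")[0].strip("/").split("/")[0]
    if not slug:
        raise ValueError(f"no venue slug found in {venue_url!r}")
    return f"{api_base}{slug}/menu"


print(menu_api_url("https://wolt.com/az/aze/baku/venue/gurmania-winter-park"))
```

The returned URL can then be passed to `requests.get(...)` as in the answer, with `.json()['items']` giving the list of menu items.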

To print the item texts, wait with WebDriverWait for visibility_of_all_elements_located() instead of presence_of_all_elements_located(), and build the list with a list comprehension. You can use either of the following locator strategies:

  • Using CSS_SELECTOR:

    driver.get('https://wolt.com/az/aze/baku/venue/gurmania-winter-park')
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[contains(@data-localization-key, 'accept')]//div[starts-with(@class, 'Button__Content')]"))).click()
    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "p[data-test-id='menu-item.name']")))])
    
  • Using XPATH:

    driver.get('https://wolt.com/az/aze/baku/venue/gurmania-winter-park')
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[contains(@data-localization-key, 'accept')]//div[starts-with(@class, 'Button__Content')]"))).click()
    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//p[@data-test-id='menu-item.name']")))])
    
  • Note : You have to add the following imports :

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
  • Console Output:

    ['Hillside Blush Rose, 750 ml', 'Hillside Classico, 750 ml', 'Hillside Saperavi, 750 ml', 'Hillside Reserve, 750 ml', 'Hillside Pinot Grigio, 750 ml', 'Hillside Image, 750 ml', 'Hillside Prestige, 750 ml', 'Hillside Caucasus, 750 ml', 'Hillside Rose, 750 ml']
    
undetected Selenium