1

I want to scrape the text inside a <small> element (in this case "Was: £27.00"). The HTML is:

<small class="product-views-price-old">Was: £27.00</small>

My code is:

from bs4 import BeautifulSoup
import requests
url = "https://www.petshop.co.uk/Dog"
r = requests.get(url)
soup = BeautifulSoup(r.content)
for old_price in soup.find_all("small", class_ = "product-views-price-old"):
    print(old_price)

The above code prints nothing and raises no error. How can I scrape the text between the <small> tags?

HedgeHog

2 Answers

3

The content is served dynamically, so you won't get it this way with requests. Take a look at this Selenium code.

To strip the "Was: " label and the surrounding whitespace, you can use:

.get_text(strip=True).replace('Was: ','')

Example

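from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
import time

url = "https://www.petshop.co.uk/Dog"
# Raw string keeps the backslashes in the Windows path literal.
# Selenium 4 takes the driver path via a Service object; Selenium 3 accepted it positionally.
driver = webdriver.Chrome(service=Service(r'C:\Program Files\ChromeDriver\chromedriver.exe'))
driver.get(url)
time.sleep(3)  # give the page's JavaScript time to render the prices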

html = driver.page_source
soup = BeautifulSoup(html,'html.parser')
for old_price in soup.find_all("small", class_ = "product-views-price-old"):
    print(old_price.get_text(strip=True).replace('Was: ',''))

driver.quit()

Output

£2.20
£18.61
£27.00
£38.39
£38.39
£20.65
£1.30
£67.99
£20.65
£1.30
£54.95
£30.99
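
If you need the old prices as numbers rather than strings (for sorting or working out discounts, say), the same cleanup idea can be applied to the text that get_text() returns. A minimal sketch, assuming every value looks like "Was: £27.00"; parse_old_price is a hypothetical helper name, not something BeautifulSoup provides:

```python
from decimal import Decimal


def parse_old_price(text, label="Was: ", currency="£"):
    """Turn a string such as ' Was: £27.00 ' into Decimal('27.00').

    Hypothetical helper: assumes the label and currency symbol shown above.
    """
    cleaned = (
        text.strip()
        .replace(label, "")     # drop the "Was: " label
        .replace(currency, "")  # drop the pound sign
        .replace(",", "")       # drop thousands separators, if any
    )
    return Decimal(cleaned)


# Sample strings as they would come out of old_price.get_text()
prices = [parse_old_price(t) for t in ["Was: £27.00", " Was: £2.20 "]]
print(max(prices))
```

Decimal is used instead of float so the amounts are not affected by binary rounding.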
HedgeHog
  • Thank you for the answer. You used Selenium, which means this was not possible with BeautifulSoup alone. Am I right? – Muhammad Rehan Jan 25 '21 at 17:25
  • 1
    It is not `BeautifulSoup`, it is `requests` that could not fetch that dynamic content -> https://stackoverflow.com/a/55709584/14460824 – HedgeHog Jan 25 '21 at 17:41
1

You don't need Selenium or BeautifulSoup for this. If you open the Network tab in your browser's developer tools, you can see that the page loads its data from an API. Once you have the JSON response, you need to identify the key that holds the value you want.

https://www.petshop.co.uk/api/items?c=3934951&commercecategoryurl=%2FDog&country=GB&currency=GBP&fieldset=search&include=facets&language=en&limit=100&n=2&offset=0&pricelevel=5&sort=custitem_bb1_qtysold%3Adesc

import requests
url = "https://www.petshop.co.uk/api/items?c=3934951&commercecategoryurl=%2FDog&country=GB&currency=GBP&fieldset=search&include=facets&language=en&limit=100&n=2&offset=0&pricelevel=5&sort=custitem_bb1_qtysold%3Adesc"
r = requests.get(url).json()
for item in r['items']:
    print(item['pricelevel2_formatted'])

Output:

£2.20
£18.61
£27.00
£5.92
£38.39
£38.39
£20.65
£1.30
£67.99
£20.65
£1.30
£54.95
£30.99
£57.95
£22.00
£46.55
£9.60
£1.99
£32.99
£30.99
£54.95
£8.21
£38.39
£57.95
£32.99
£2.65
£20.65
£10.50
£18.48
£10.50
£3.75
£2.99
£33.99
£25.00
£23.99
£1.39
£54.95
£36.99
£27.00
£49.50
£38.39
£39.59
£67.99
£32.99
£40.70
£29.69
£39.94
£31.49
£59.99
£38.39
£25.99
£67.99
£38.39
£25.99
£49.50
£39.59
£1.30
£12.90
£1.00
£44.99
£22.99
£69.99
£15.50
£2.99
£20.99
£32.99
£38.39
£15.99
£42.99
£27.12
£46.55
£52.49
£2.99
£1.99
£51.59
£2.99
£25.99
£2.99
£49.50
£18.84
£40.74
£44.99
£20.99
£39.56
£2.99
£7.09
£26.99
£18.61
£19.99
£43.99
£16.50
£12.00
£36.29
£40.40
£2.99
£35.99
£59.99
£5.50
£8.99
£57.95
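
To illustrate the "identify the key" step without hitting the network, here is a rough sketch against a made-up payload. Only items and pricelevel2_formatted come from the real response above; the other fields and the helper names price_keys and old_prices are assumptions for illustration.

```python
import json

# Hypothetical, trimmed-down payload; the real response has many more fields.
sample_response = json.loads("""
{
  "items": [
    {"itemid": "A1", "pricelevel2_formatted": "£27.00", "onlinecustomerprice_formatted": "£21.60"},
    {"itemid": "B2", "onlinecustomerprice_formatted": "£5.92"}
  ]
}
""")


def price_keys(item):
    """Return the keys of one item whose name mentions 'price'."""
    return sorted(k for k in item if "price" in k.lower())


def old_prices(response, key="pricelevel2_formatted"):
    """Collect the value under `key` for each item, skipping items without it."""
    return [item[key] for item in response.get("items", []) if key in item]


# Inspect one item first to see which keys look relevant...
print(price_keys(sample_response["items"][0]))
# ...then pull the chosen key from every item that has it.
print(old_prices(sample_response))
```

Checking for the key before reading it avoids a KeyError on items that have no old price.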
KunduK