
https://i.stack.imgur.com/ZEx9b.jpg

How do I retrieve the highlighted text with BeautifulSoup? An example would be the best answer, thank you ;)

Edit: here's the code

import requests
from bs4 import BeautifulSoup

url = 'https://sales.elhst.co/'

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36"}

# Fetch the page and check the HTTP status code
page = requests.get(url, headers=headers)
if page.status_code == 200:
    print("Site is up..")

soup = BeautifulSoup(page.content, 'html.parser')

# Look for the element that should hold the number of copies
target = soup.find("pp", id="copies")

print(target)

And the output is:

Site is up..
<pp id="copies"></pp>

I want to get the text shown in this screenshot: https://i.stack.imgur.com/ZEx9b.jpg. Is there any way to do it?

  • Please attach html code itself instead of link to screenshot – Pavel Aug 16 '20 at 09:23
  • Does this answer your question? [How to extract the text inside a tag with BeautifulSoup in Python?](https://stackoverflow.com/questions/44858226/how-to-extract-the-text-inside-a-tag-with-beautifulsoup-in-python) – Dharman Aug 31 '20 at 11:48

2 Answers


The data you see on the page is loaded from an external URL. You can try this script to print the number of copies:

import re
import json
import requests


# socket.io polling endpoints the page uses to load its data
url = 'https://sales.elhst.co/socket.io/?EIO=3&transport=polling'
copies_url = 'https://sales.elhst.co/socket.io/?EIO=3&transport=polling&sid={sid}'

# First request: the response text contains a JSON object holding the session id (sid)
r = requests.get(url).text
sid = json.loads(re.search(r'(\{".*)', r).group(1))['sid']

# Second request: with the sid, the response text contains a JSON array whose
# last element is the number of copies
r = requests.get(copies_url.format(sid=sid)).text
copies = json.loads(re.search(r'(\[".*)', r).group(1))[-1]

print(copies)

Prints:

0
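If you want to check the count more than once, a minimal sketch that wraps the same two polling calls in a function could look like the following (the function name and the use of requests.Session are just illustrative, not part of the answer above):

import re
import json
import requests

BASE = 'https://sales.elhst.co/socket.io/?EIO=3&transport=polling'

def get_copies(session):
    """Fetch the socket.io sid, then read the copies value from a second poll."""
    # First poll: the response text contains a JSON object holding the session id
    first = session.get(BASE).text
    sid = json.loads(re.search(r'(\{".*)', first).group(1))['sid']

    # Second poll: the response text contains a JSON array; the last element is the count
    second = session.get(BASE + '&sid=' + sid).text
    return json.loads(re.search(r'(\[".*)', second).group(1))[-1]

with requests.Session() as s:
    print(get_copies(s))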
Andrej Kesely
from lxml import html
import requests

page = requests.get('https://sales.elhst.co/')
tree = html.fromstring(page.content)

# This will extract the text you need
buyers = tree.xpath('//pp[@id="copies"]/text()')
print(buyers)

It should work. But I don't know the pp tag; I think it's a mistake and the tag should be <p>.
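If you'd rather stay with BeautifulSoup as in the question, an equivalent minimal sketch would be the one below; note it only returns text if the element is actually filled in the static HTML, which the question's output suggests it is not:

import requests
from bs4 import BeautifulSoup

page = requests.get('https://sales.elhst.co/')
soup = BeautifulSoup(page.content, 'html.parser')

# .get_text() returns the element's text, or an empty string if the tag is empty
target = soup.find("pp", id="copies")
print(target.get_text() if target else None)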

More info about lxml is in its documentation at https://lxml.de/.