How to get text from a label using BeautifulSoup? Everything I tried returns None

Question

I want to get the "rejected" text from a label. I have tried many things, but nothing works for me.

import bs4
import requests

url = "example"

agent = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}
data = requests.get(url, headers=agent)
soup = bs4.BeautifulSoup(data.text, 'html.parser')

# Earlier attempts. Note that find() expects a tag name, not a CSS selector,
# so the two find() calls below could never match.
# rejects = soup.select("label._1TSOc")
# rejects = soup.find("label._1TSOc")
# rejects = soup.find("label._1TSOc._3Gol_")
rejects = soup.find("label", {"class": "_1TSOc"})
print(rejects)  # checking whether any data came back, but OUTPUT: None

# With rejects being None, this loop raises a TypeError.
for i in rejects:
    print(i.text)  # not working

Actually, you would have to sign up on that website, because this is the code of an admin page. – Cathrine, Mar 03 '19 at 10:28
In the picture, it looks as if there is a space before and after the class name. Could this cause a problem? – Dragonthoughts, Mar 03 '19 at 10:29
Can you confirm that `data.text` contains the expected HTML? Just in case this is dynamically generated by JS or something... – match, Mar 03 '19 at 10:29
If `data.text` is empty then your whole request isn't working. – match, Mar 03 '19 at 10:31
Upload the content of data.text to https://pastebin.com/ and share it. – balderman, Mar 03 '19 at 10:31
@match If this CSS is generated through JS, can we see it on the view-source page? Because this class is visible there. – Cathrine, Mar 03 '19 at 10:33
@Cathrine You mentioned that this is the admin page. But there is no code for logging in to the site. They might simply be redirecting you to the login page. That is why you have to check the response to see what is happening. – Bitto, Mar 03 '19 at 10:38
@BittoBennichan I am already logged in and then working with BS4. Actually I am mainly working with Selenium; I have to use BS4 for 2 points. The first is already done, the 2nd is what I shared here. – Cathrine, Mar 03 '19 at 10:42
The text you have posted in pastebin does not include the word '_1TSOc'. This explains why you get None. – balderman, Mar 03 '19 at 10:50
@balderman Guide me, I already shared a screenshot. How can I get that "rejected" text? – Cathrine, Mar 03 '19 at 10:53
The point here is why view-source shows you the elements you are looking for while requests.get(url) does not. Did you upload the view-source text or 'data.txt'? I think you are doing view-source on URL A while your code fetches the data of URL B. – balderman, Mar 03 '19 at 10:57
I have shared the data.txt content on pastebin, and this class ._1TSOc is visible in inspect element as well as in view source. – Cathrine, Mar 03 '19 at 11:01
@balderman I have an alternative to this, but there is a bug. I think I will post another question about that bug. – Cathrine, Mar 03 '19 at 11:03
@balderman I have posted another question, please check, maybe you can answer it: https://stackoverflow.com/questions/54968525/how-to-put-xpath-into-optional-mode – Cathrine, Mar 03 '19 at 11:58
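
The checks suggested in the comments above (is the response a redirect to a login page, and does the body contain the class at all) can be sketched roughly as follows. This is only an illustration: `diagnose_response` is a hypothetical helper, and the fake response and its URL are made up so the snippet runs without network access. It assumes the object passed in exposes `status_code`, `url`, `history` and `text`, as a `requests.Response` does.

```python
from types import SimpleNamespace


def diagnose_response(response, marker):
    """Summarise why `marker` might be missing from a fetched page.

    `response` is anything with .status_code, .url, .history and .text,
    for example the object returned by requests.get().
    """
    return {
        "status_code": response.status_code,
        "final_url": response.url,                  # differs from the requested URL after a redirect
        "was_redirected": bool(response.history),   # non-empty history means at least one redirect
        "marker_in_body": marker in response.text,  # is the class name in the raw HTML at all?
    }


# Made-up stand-in for a request that was redirected to a login page.
fake_login_redirect = SimpleNamespace(
    status_code=200,
    url="https://example.invalid/login",
    history=["302 redirect"],
    text="<html><body><form id='login'></form></body></html>",
)

report = diagnose_response(fake_login_redirect, "_1TSOc")
print(report)
```

With the code from the question, the call would be `diagnose_response(data, "_1TSOc")`. If `marker_in_body` is false, no BeautifulSoup selector can find the label in that response.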

score 0 · Answer 1 · answered Mar 04 '19 at 10:09

Have you tried either of these?

rejects  = soup.find("label",{"class":" _1TSOc _3Gol_ "})
print(rejects.text)

or

rejects  = soup.find("label",{"data-aut-id":"statusLabel"})
print(rejects.text)
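
For reference, here is a small sketch of how these lookups behave once the label really is in the parsed HTML. The markup in `SAMPLE_HTML` is an assumption reconstructed from the class names and the `data-aut-id` attribute mentioned in this thread, not the real page. Matching on a single class name, or on the attribute, avoids depending on the exact spacing and order of the `class` attribute, and checking for `None` avoids an exception when nothing matches.

```python
import bs4

# Hypothetical markup, reconstructed from the selectors discussed in this thread.
SAMPLE_HTML = (
    '<div><label class=" _1TSOc _3Gol_ " data-aut-id="statusLabel">'
    "rejected</label></div>"
)

soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")

# class_ matches any element whose class list contains this one name.
by_class = soup.find("label", class_="_1TSOc")

# CSS selectors belong in select() / select_one(), not in find().
by_css = soup.select_one("label._1TSOc._3Gol_")

# Matching on the data attribute does not depend on the class names.
by_attr = soup.find("label", attrs={"data-aut-id": "statusLabel"})

# find() treats its first argument as a tag name, so this matches nothing.
wrong = soup.find("label._1TSOc")

for name, tag in [("class_", by_class), ("css", by_css), ("attr", by_attr), ("wrong", wrong)]:
    # Guard against None before touching .text / get_text()
    text = tag.get_text(strip=True) if tag is not None else None
    print(name, text)
```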

chitown88

first one i already tried, but not sure about the 2nd one, i will let you know – Cathrine Mar 04 '19 at 10:15

B. Go · Answer 2 · 2019-03-05T11:40:33.687

"rejected" is not in your data.text, even though it is present when you inspect the code.
That means that it is added later by some (Java)scripts, and it won't ever be accessible to BeautifulSoup, as BeautifulSoup does not execute the scripts.
You'll have to use a headless browser to access the fully loaded page state, after all the scripts have been loaded and executed. There are tons of answers about that on this website!
See for example Headless Browser for Python (Javascript support REQUIRED!)
You can also look at activesoup https://pypi.org/project/activesoup/ or at how to drive chrome from python.
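
Since the asker mentions already working with Selenium, one possible shape of this approach is to hand the browser's rendered `page_source` to BeautifulSoup. The sketch below is an illustration under assumptions, not a tested solution for the site in question: the `data-aut-id="statusLabel"` selector is taken from the other answer, the helper names are hypothetical, and the fake driver only stands in for a real, already logged-in WebDriver so the snippet runs without a browser.

```python
from types import SimpleNamespace

import bs4


def status_text_from_html(html):
    """Return the status label text, or None if the label is absent."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    label = soup.select_one('label[data-aut-id="statusLabel"]')
    return label.get_text(strip=True) if label is not None else None


def status_text_from_driver(driver):
    """`driver` is assumed to be a Selenium WebDriver that is already
    logged in and has loaded the page; page_source is the rendered DOM."""
    return status_text_from_html(driver.page_source)


def make_headless_chrome():
    """Create a Chrome driver without a visible window (requires Selenium
    and a Chrome installation; not called in this sketch)."""
    from selenium import webdriver  # imported lazily so the rest runs without Selenium

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


# Stand-in for a real driver, with made-up rendered HTML.
fake_driver = SimpleNamespace(
    page_source='<label class="_1TSOc _3Gol_" data-aut-id="statusLabel">rejected</label>'
)
status = status_text_from_driver(fake_driver)
print(status)
```

On a real page the label may appear only some time after the page load, so an explicit wait (for example Selenium's `WebDriverWait`) before reading `page_source` is probably needed.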

edited Mar 05 '19 at 11:40

answered Mar 04 '19 at 12:15

headless browser? can you explain please – Cathrine Mar 05 '19 at 08:42
The pure HTML page you get with BeautifulSoup is the page state before any script has run and filled in its content (= a template page). To get the result after the scripts are run, you have to download and run them, which BeautifulSoup doesn't do. A headless browser is a full browser, running the scripts and all, just without a GUI (no windows), to be driven by programs such as a Python script. You can then go through the fully loaded page, not the "template" one. Google / Stack Overflow are your friends! – B. Go Mar 05 '19 at 11:44