I want to get the "rejected" text from this label. I have tried many things, but nothing is working for me.

import bs4
import requests
url="example"

agent = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}
data = requests.get(url, headers=agent)
soup = bs4.BeautifulSoup(data.text, 'html.parser')

# Earlier attempts:
# rejects = soup.select("label._1TSOc")
# rejects = soup.find("label._1TSOc")
# rejects = soup.find("label._1TSOc._3Gol_")
rejects = soup.find("label", {"class": "_1TSOc"})
print(rejects)  # checking whether any data comes back, but OUTPUT: None

for i in rejects:  # fails with a TypeError, because rejects is None
    print(i.text)  # not working

[Two screenshots of the label's markup, not reproduced here]

Cathrine
  • Can you give the url? – Bitto Mar 03 '19 at 10:27
  • Actually, you would have to sign up on that website, because this is admin page code – Cathrine Mar 03 '19 at 10:28
  • In the picture, it looks as if there is a space before and after the class name. Could this cause a problem? – Dragonthoughts Mar 03 '19 at 10:29
  • Can you confirm that `data.text` contains the expected html? Just in case this is dynamically generated JS or something... – match Mar 03 '19 at 10:29
  • If `data.text` is empty then your whole request isn't working. – match Mar 03 '19 at 10:31
  • Upload the content of data.text to https://pastebin.com/ and share it – balderman Mar 03 '19 at 10:31
  • @match If this CSS is generated through JS, can we see it on the view-source page? I ask because this class is visible there – Cathrine Mar 03 '19 at 10:33
  • @Cathrine You mentioned that this is the admin page. But there is no code for logging in to the site. They might simply be redirecting you to the login page. That is why you have to check the response to see what is happening. – Bitto Mar 03 '19 at 10:38
  • @balderman Done, please check https://pastebin.com/tZXVqxcV – Cathrine Mar 03 '19 at 10:39
  • @BittoBennichan I am already logged in before working with BS4. I am mainly working with Selenium, and there are two points where I have to use BS4. The first is already done; the second is what I shared here – Cathrine Mar 03 '19 at 10:42
  • The text you have posted in pastebin does not include the word '_1TSOc'. This explains why you get None – balderman Mar 03 '19 at 10:50
  • But it is visible in the "view-source" page of the browser – Cathrine Mar 03 '19 at 10:52
  • @balderman Please guide me. I already shared a screenshot; how can I get that "rejected" text? – Cathrine Mar 03 '19 at 10:53
  • The point here is why view-source shows you the elements you are looking for while requests.get(url) does not. Did you upload the view-source text or 'data.txt'? I think you are viewing the source of URL A while your code fetches the data of URL B – balderman Mar 03 '19 at 10:57
  • I have shared the data.txt code on pastebin, and this class ._1TSOc is visible in inspect element as well as in the view-source page – Cathrine Mar 03 '19 at 11:01
  • @balderman I have an alternative to this, but there is a bug. I think I will post another question about that bug – Cathrine Mar 03 '19 at 11:03
  • @balderman I have posted another question, please check; maybe you can answer it https://stackoverflow.com/questions/54968525/how-to-put-xpath-into-optional-mode – Cathrine Mar 03 '19 at 11:58

2 Answers

Have you tried either of these?

rejects = soup.find("label", {"class": "_1TSOc _3Gol_"})
print(rejects.text)

or

rejects = soup.find("label", {"data-aut-id": "statusLabel"})
print(rejects.text)
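
To check the selector separately from the network request, it can help to run it against a small piece of static markup first. The sketch below assumes the label looks roughly like the one described in the comments (a class attribute padded with spaces); `SAMPLE_HTML` and `label_text` are hypothetical names used only for illustration. It also guards against `None`, so a missing label does not raise an AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, modeled on the class names and attribute mentioned
# in this thread; the real page may differ.
SAMPLE_HTML = '<label class=" _1TSOc _3Gol_ " data-aut-id="statusLabel">rejected</label>'


def label_text(html):
    """Return the text of the first <label> carrying class _1TSOc, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches a single class token, so the extra class and the
    # surrounding spaces in the attribute do not matter here.
    label = soup.find("label", class_="_1TSOc")
    if label is None:
        return None
    return label.get_text(strip=True)


print(label_text(SAMPLE_HTML))          # label present in the markup
print(label_text("<p>login page</p>"))  # label absent from the markup
```

If this returns the text for the sample but `None` for `data.text`, the selector is probably fine and the label is simply not in the HTML that `requests` received.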
chitown88

"rejected" is not in your data.text, even though it is present when you inspect the page.
That means it is added later by JavaScript, and it will never be accessible to BeautifulSoup alone, because BeautifulSoup does not execute scripts.
You'll have to use a headless browser to access the page state after all the scripts have been loaded and executed. There are plenty of answers about that on this site.
See, for example, "Headless Browser for Python (Javascript support REQUIRED!)".
You can also look at activesoup (https://pypi.org/project/activesoup/) or at how to drive Chrome from Python.
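
As a rough illustration of the headless-browser route, here is a minimal sketch using Selenium with Chrome, since Selenium is already in use according to the comments. The function names `fetch_rendered_html` and `status_from_html` are made up for this example, and the `data-aut-id="statusLabel"` selector is taken from the other answer, so treat it as an assumption about the page. A fresh headless session is not logged in, so for an admin page the login step (or reuse of an existing logged-in driver) would still have to be added.

```python
from bs4 import BeautifulSoup

# Assumed selector, based on the attribute suggested in the other answer.
STATUS_SELECTOR = "label[data-aut-id='statusLabel']"


def status_from_html(html):
    """Parse already-rendered HTML and return the status label text, or None."""
    soup = BeautifulSoup(html, "html.parser")
    label = soup.select_one(STATUS_SELECTOR)
    return label.get_text(strip=True) if label else None


def fetch_rendered_html(url, timeout=10):
    """Load url in headless Chrome and return the HTML after scripts have run."""
    # Selenium is imported here so the parsing helper above can be used
    # (and tested) without a browser installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the label exists in the DOM; raises TimeoutException otherwise.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, STATUS_SELECTOR))
        )
        return driver.page_source
    finally:
        driver.quit()
```

With a driver that is already logged in, the same idea reduces to passing `driver.page_source` to `status_from_html` instead of fetching the page again with `requests`.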

B. Go
  • Headless browser? Can you explain, please? – Cathrine Mar 05 '19 at 08:42
  • The pure HTML page you get with BeautifulSoup is the page state before any script has run and filled in its content (= a template page). To get the result after the scripts have run, you have to download and run them, which BeautifulSoup doesn't do. A headless browser is a full browser that runs the scripts and everything else, just without a GUI (windows), so it can be driven by programs such as Python scripts. You can then go through the fully loaded page, not the "template" one. Google / Stack Overflow are your friends! – B. Go Mar 05 '19 at 11:44