Full text returned in requests-html not just first

Question

I am trying to scrape specific elements of the internship page below using requests-html. I specify `first=True`, but when I print the text, it prints everything on the page starting from the element I selected instead of returning just that element.

`from requests_html import HTMLSession


url = "https://jobs.disneycareers.com/job/orlando/wdi-estimating-internship-orlando-fall-2022/391/30101898544"
s = HTMLSession()
r = s.get(url)
# Render the page's JavaScript before querying the DOM
r.html.render(sleep=1)


# first=True should return a single Element rather than a list
internship_title = r.html.find("h1#job-title-scrape", first=True)
print(internship_title.text)`

have you tried `r.html.find("h1#job-title-scrape")[0]` to get the same first element? — Andrew Ryan, Jun 05 '22 at 22:42
Yes, same result as if I tried first=True. If I try print(internship_title.text[0]) then I just get the first letter, not all of the contents of the h1 tag. — Kyle Roark, Jun 05 '22 at 23:06

score 0 · Answer 1 · answered Sep 02 '22 at 11:36

I ran into a similar problem. After a few hours of investigating, I found a couple of things you can try.

Solution 1: Some people suggest downgrading Python to 3.6. However, that did not work in my case.

Solution 2: If you get the entire page regardless of the selector, first fetch all matches without `first=True`, then pick the specific element out of that result. A list comprehension or a simple for loop is enough to extract the information.

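`# Whole page: all matches (a list of Elements), without first=True
internship_titles = r.html.find("h1#job-title-scrape")

# Specific: search within each match again and keep only its text (the result is a list)
title = [internship_title.find("h1#job-title-scrape", first=True).text for internship_title in internship_titles]`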
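
If the element still comes back with the rest of the page attached, one possible fallback is to trim the text yourself. This is only a sketch: it assumes the heading's own text is the first non-empty line of `.text`, which matches what the question describes, but I have not verified it against that page. `first_line` and `sample_text` are made-up names for illustration, not part of requests-html, and the sample string is a stand-in, not real page content.

```python
def first_line(text: str) -> str:
    """Return the first non-empty line of text, with surrounding whitespace removed."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            return stripped
    return ""


# Hypothetical stand-in for what internship_title.text returned:
# the heading first, followed by the rest of the page.
sample_text = "Example Internship Title\nLocation: Example City\nApply Now"

title_only = first_line(sample_text)
print(title_only)
```

With the code from the question, that would be `first_line(internship_title.text)`. It works around the symptom rather than fixing whatever causes the extra text, so it breaks if the heading itself spans several lines.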
