
I am trying to scrape specific elements of the internship page below using requests-html. I specify `first=True`, but when I print the text it prints everything on the page starting from the element I selected, instead of returning just that element.

```python
from requests_html import HTMLSession

url = "https://jobs.disneycareers.com/job/orlando/wdi-estimating-internship-orlando-fall-2022/391/30101898544"
s = HTMLSession()
r = s.get(url)
r.html.render(sleep=1)

internship_title = r.html.find("h1#job-title-scrape", first=True)
print(internship_title.text)
```
Kyle Roark
  • Have you tried `r.html.find("h1#job-title-scrape")[0]` to get the same first element? – Andrew Ryan Jun 05 '22 at 22:42
  • Yes, same result as with `first=True`. If I try `print(internship_title.text[0])`, I just get the first letter, not the full contents of the h1 tag. – Kyle Roark Jun 05 '22 at 23:06

1 Answer


I had a similar problem. After investigating for a few hours, I found a couple of things you can try.

Solution 1: Some people suggest downgrading Python to 3.6. However, that did not work in my case.

Solution 2: If you get the entire page no matter which element you select, extract all the matching elements first instead of asking for a specific one, then use a list comprehension or a simple for loop to pull out the information you need.

```python
# Whole page
internship_titles = r.html.find("h1#job-title-scrape")

# Specific
title = [internship_title.find("h1#job-title-scrape", first=True).text for internship_title in internship_titles]
```
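If `.text` still returns too much, one way to check what text actually belongs to the `<h1>` is to parse the rendered HTML yourself. This is a separate approach, not part of the solutions above, and it does not use requests-html's element API at all: it is a minimal sketch with the standard-library `html.parser` that collects only the text inside the first element with a given `id`. The names `IdTextExtractor` and `text_of_id` are hypothetical, and the sketch assumes the markup inside the target element is well-formed (every non-void tag is closed).

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class IdTextExtractor(HTMLParser):
    """Hypothetical helper: collect the text inside the first element with id == target_id.

    Assumes the markup inside that element is well-formed (non-void tags are closed).
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # nesting depth inside the target element; 0 means outside it
        self.done = False   # becomes True once the target element has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1                      # a child tag inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1                       # entering the target element

    def handle_endtag(self, tag):
        if self.done or tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True                     # target closed; ignore the rest of the page

    def handle_data(self, data):
        if self.depth and not self.done:
            self.parts.append(data)


def text_of_id(html, target_id):
    """Return the whitespace-normalized text of the element with the given id ('' if absent)."""
    parser = IdTextExtractor(target_id)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())
```

With the page from the question, you could call it as `print(text_of_id(r.html.html, "job-title-scrape"))`, since after `render()` the `r.html.html` attribute should hold the rendered page as a string. If this prints only the title while `.text` prints the whole page, the problem is in how the element's text is being extracted rather than in the selector.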


code_conundrum