
I am facing a misunderstanding (more than a problem). For a program (a bot) that needs to parse the HTML source code of a simple YouTube page (a normal video page), I use the "urllib3" and "requests" libraries. The request works fine and I get a file containing HTML.
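For context, the fetch itself is roughly this minimal sketch (the URL is just a placeholder):

import requests

url = 'https://www.youtube.com/watch?v=...'  # placeholder: any normal video page
response = requests.get(url)
html = response.text  # the raw HTML exactly as the server sends it, before any JavaScript runs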

That is when I realized that if I go to a random YouTube video page and view the full HTML source (without using the developer tools included in Chrome and Firefox), the source code does not reflect what the page actually displays.

Yet I assumed that the HTML source code of a web page we have access to should contain every visible element (graphically speaking).

However, I cannot find the links of the recommended videos anywhere in the HTML source code, nor most of the rest of the page, for that matter.

Can someone explain this to me and recommend a way to get the complete HTML source code of the page as I actually see it?

PS: I understand that Selenium could be a solution.
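For reference, a minimal Selenium sketch of that approach might look like this (assuming Selenium 4 with a Chrome installation available; the URL is a placeholder):

from selenium import webdriver

url = 'https://www.youtube.com/watch?v=...'  # placeholder: the video page to render

driver = webdriver.Chrome()    # Selenium 4 can locate a matching driver on its own
driver.get(url)
html = driver.page_source      # HTML after the page's JavaScript has run
driver.quit()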

Cordially, Kyu

  • Does this answer your question? [How to scrape dynamic webpages by Python](https://stackoverflow.com/questions/33795799/how-to-scrape-dynamic-webpages-by-python) – trincot Mar 03 '21 at 19:06

1 Answer


Try the Python library requests_html:

import requests_html

my_url = 'https://www.youtube.com/watch?v=...'  # placeholder: the page you want to scrape

sess = requests_html.HTMLSession()
r = sess.get(my_url)
abs_links = r.html.absolute_links  # set of absolute URLs found in the page
# keep only the links whose URL contains the substring you are interested in
interesting_urls = [e for e in abs_links if 'Whatever' in e]

This should give you all relevant links on a page.
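Note that on a YouTube watch page the recommendation links are injected by JavaScript, so they may not be present in the plain response; requests_html can render the page first with its render() method (a sketch, assuming the headless Chromium it relies on can be downloaded on first run):

r = sess.get(my_url)
r.html.render()                    # runs the page's JavaScript in a headless Chromium
abs_links = r.html.absolute_links  # now also contains links added by scripts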

CodeMantle