-1

In Python, I want to follow a link that is handled by JavaScript, like the one below:

<a href="#" class="example"> Hello World </a>

I want to find the link in an HTML string based on the class attribute or the id attribute of the <a> element.

Is it possible to do this in Python?

tshepang
ant0nisk

1 Answer

1

You can't, because that is a self-referencing link. You have already opened the document.

A # in a URL signifies a location within a document. When the URL starts with a #, it is a location within the current document; the browser will scroll to whatever ID is named after the #. In the following example, clicking on the <a href="#footer"> link instructs the browser to scroll the document so that the <div id="footer"> element is at the top of the browser window:

<a href="#footer">to the end of the document</a>

<!-- long document follows -->

<div id="footer">Something at the bottom of the document</div>
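The same split can be seen from Python. This is a minimal sketch using the standard library's urllib.parse; the helper name fragment_target is made up for illustration and is not part of any library. It returns the ID a same-document link points at, or None when the link leaves the document or names no ID.

```python
from urllib.parse import urlsplit


def fragment_target(href):
    """Return the element ID a same-document link points at, or None.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(href)
    # A same-document reference has no scheme, host, path, or query;
    # anything else refers to some other document or resource.
    if parts.scheme or parts.netloc or parts.path or parts.query:
        return None
    # An empty fragment (a bare "#") names no element at all.
    return parts.fragment or None


print(fragment_target("#footer"))           # footer
print(fragment_target("#"))                 # None
print(fragment_target("page.html#footer"))  # None
```
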

When the URL consists only of a #, the URL is a no-op. It is a placeholder, usually there so that JavaScript can intercept the click on the link. You can ignore it altogether when processing this document with Python. Your Python HTML parser is not a browser; no JavaScript is being run to handle the mouse click on that link element. There is not even a mouse click.
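Finding the element itself by class or id is possible with any HTML parser; it just won't tell you where the JavaScript would have taken you. Here is a rough sketch with the standard library's html.parser. The LinkFinder class and its class_name / element_id parameters are names invented for this example, not an existing API.

```python
from html.parser import HTMLParser


class LinkFinder(HTMLParser):
    """Collect the attributes of <a> elements matching a class name or id."""

    def __init__(self, class_name=None, element_id=None):
        super().__init__()
        self.class_name = class_name
        self.element_id = element_id
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # The class attribute may hold several space-separated names.
        classes = (attributes.get("class") or "").split()
        by_class = self.class_name is not None and self.class_name in classes
        by_id = (self.element_id is not None
                 and attributes.get("id") == self.element_id)
        if by_class or by_id:
            self.matches.append(attributes)


html = '<a href="#" class="example"> Hello World </a>'
finder = LinkFinder(class_name="example")
finder.feed(html)
print(finder.matches)  # [{'href': '#', 'class': 'example'}]
```

All the parser can report is href="#"; whatever the click handler does lives in the page's scripts, not in the markup.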

If you are trying to process a JavaScript-driven page, you could either use a JavaScript debugger (comes with most browsers) to figure out what it is doing, or run a headless browser controlled by Python. You could use Ghost.py to do the latter:

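from ghost import Ghost

# Start a headless WebKit browser and load the page
ghost = Ghost()
page, extra_resources = ghost.open("http://jeanphi.fr")
# Check that the page loaded and contains the expected text
assert page.http_status == 200 and 'jeanphix' in ghost.content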

This runs a headless WebKit browser.

Martijn Pieters