Open specific "#" link that is processed by JavaScript?

Question

In Python, I want to visit a link that is handled by JavaScript, like the one below:

<a href="#" class="example"> Hello World </a>

I want to find the link in an HTML string based on the class attribute or the id attribute of the <a> element.

Is it possible to do this in Python?

http://stackoverflow.com/questions/3075550/how-can-i-get-href-links-from-html-code — user2485710, Jun 30 '13 at 21:49
It's not clear what you're starting with, or exactly what you're asking. Do you have an HTML file and you want to parse out the `href` attribute for various `<a>` tags in it? By "open a link", do you mean in a web browser, or in the background using something like `urllib2`? — Ben Hoyt, Jun 30 '13 at 21:51
I want to open a link with href="#" and class="example".... If the python script finds that the class is equal to "example", open it! However, how can I open it when href is "#"? — ant0nisk, Jun 30 '13 at 21:57
A link with a hash wouldn't go anywhere. If you need to trigger it with Python, you could use a headless browser. — dm03514, Jun 30 '13 at 22:04

score 1 · Answer 1 · answered Jun 30 '13 at 22:02

You can't, because that is a self-referencing link. You have already opened the document.

A # in a URL signifies a location within a document. When the URL starts with a #, it is a location within the current document; the browser will scroll to whatever ID is named after the #. In the following example, clicking on the <a href="#footer"> link instructs the browser to scroll the document to position the <div id="footer"> element at the top of the browser window:

<a href="#footer">to the end of the document</a>

<!-- long document follows -->

<div id="footer">Something at the bottom of the document</div>
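The same rule can be checked from Python with the standard library. This is only a small sketch: the base URL is a made-up placeholder, and `urljoin` / `urldefrag` from `urllib.parse` are used to show that a fragment-only href resolves to the document you already have.

```python
from urllib.parse import urljoin, urldefrag

# Hypothetical address of the page that contains the links.
base = "http://example.com/page.html"

# A fragment-only href resolves against the current document.
resolved = urljoin(base, "#footer")
document_url, fragment = urldefrag(resolved)
print(resolved)        # the base URL with "#footer" appended
print(document_url)    # the same document as base
print(fragment)        # the ID the browser would scroll to

# A bare "#" names no ID at all, so nothing new is addressed.
bare_document, bare_fragment = urldefrag(urljoin(base, "#"))
print(bare_document == base, repr(bare_fragment))
```

In both cases the document part of the URL is unchanged, which is why there is nothing further to fetch.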

When the URL consists only of a #, the URL is a no-op. It is usually a placeholder, so that JavaScript can intercept the click on the link. You can ignore it altogether when processing this document with Python. Your Python HTML parser is not a browser; no JavaScript is run to handle a mouse click on that link element. There is not even a mouse click.
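Finding the element itself by class is still straightforward. The sketch below uses only the standard library's `html.parser`; the `AnchorFinder` class and the sample markup are illustrative names invented here, not part of any library. It shows that all a parser can recover from such a link is the literal `#`.

```python
from html.parser import HTMLParser


class AnchorFinder(HTMLParser):
    """Collect the attributes of <a> tags that carry a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # The class attribute may hold several space-separated names.
        classes = (attributes.get("class") or "").split()
        if self.wanted_class in classes:
            self.matches.append(attributes)


# Sample markup; the second link is added only for contrast.
html = (
    '<p><a href="#" class="example"> Hello World </a> '
    '<a href="/about" class="nav">About</a></p>'
)

finder = AnchorFinder("example")
finder.feed(html)
for attributes in finder.matches:
    print(attributes["href"])
```

Whatever the click is supposed to do lives in the page's JavaScript, which this parser never runs.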

If you are trying to process a JavaScript-driven page, you could either use a JavaScript debugger (comes with most browsers) to figure out what it is doing, or run a headless browser controlled by Python. You could use Ghost.py to do the latter:

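from ghost import Ghost

# Load the page in Ghost's browser
ghost = Ghost()
page, extra_resources = ghost.open("http://jeanphi.fr")
# Check that the request succeeded and the page content is as expected
assert page.http_status == 200 and 'jeanphix' in ghost.content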

This runs a headless WebKit browser.

Isn't there any library to do this? Mechanize or any other library? — ant0nisk, Jun 30 '13 at 22:04
See [Simulate user browsing by code](http://stackoverflow.com/a/15177624) — Martijn Pieters, Jun 30 '13 at 22:05
