I'm writing a tool that returns every link on a web page. With help from another Stack Overflow question, I ended up with this code:
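import httplib2
from bs4 import BeautifulSoup, SoupStrainer

url = 'http://example.com'  # placeholder; replace with the page to scan
http = httplib2.Http()
status, response = http.request(url)

# Parse only the <a> tags, then print the href of each one that has it.
soup = BeautifulSoup(response, 'html.parser', parse_only=SoupStrainer('a'))
for link in soup.find_all('a', href=True):
    print(link['href'])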
I thought it worked, but then I noticed that a lot of links are missing. I believe this is because some of the links are generated dynamically (by scripts modifying the DOM after the page loads), and BeautifulSoup does not take that into account.
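To illustrate what I suspect is happening, here is a minimal sketch. The page markup and the `extract_links` helper are made up for this example and are not my real code. One link is present in the raw HTML, and the other would only exist after a browser runs the script:

```python
from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical page: one link is in the markup, the other is only
# created when a browser executes the script.
html = """
<html><body>
  <a href="/static">static link</a>
  <script>
    var a = document.createElement('a');
    a.href = '/dynamic';
    document.body.appendChild(a);
  </script>
</body></html>
"""

def extract_links(markup):
    """Return the href of every <a> tag present in the raw markup."""
    soup = BeautifulSoup(markup, 'html.parser', parse_only=SoupStrainer('a'))
    return [a['href'] for a in soup.find_all('a', href=True)]

links = extract_links(html)
print(links)
```

If my assumption is right, only `/static` shows up here, because the parser reads the markup as text and never runs the script that would add `/dynamic`.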