
I'm trying to verify that all of my page links are valid, and also, similarly, whether all the pages contain a specified link such as "Contact". I use Python unit testing and the Selenium IDE to record the actions that need to be tested. So my question is: can I verify the links in a loop, or do I need to try every link on my own? I tried to do this with __iter__ but it didn't get anywhere close; that may be because I'm poor at OOP, but I still think there must be another way of testing links than clicking them and recording them one by one.

Decebal
  • I need to do more than just verify the link; rather, I thought of putting all the links on a page in a list and then using that list to verify all the elements of the page – Decebal Aug 04 '10 at 08:32

4 Answers


Though the tool is written in Perl, have you checked out linklint? It should fit your needs exactly: it parses the links in an HTML document and tells you which ones are broken.

If you're trying to automate this from a Python script, you'd need to run it as a subprocess and get the results, but I think it would get you what you're looking for.
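If you go that route, a rough sketch of the subprocess call might look like the following. It assumes linklint is installed and on your PATH; the flags shown (-root for the local site copy, -doc for the report directory, /@ for "all files") are my best reading of linklint's options and may need adjusting for your version:

    # Hedged sketch: drive linklint from Python and inspect its output.
    import subprocess

    def run_linklint(site_root, report_dir="linklint-report"):
        result = subprocess.run(
            ["linklint", "-root", site_root, "-doc", report_dir, "/@"],
            capture_output=True,
            text=True,
        )
        # linklint prints a summary on stdout; a non-zero exit code or the
        # word "ERROR" in the output is a reasonable signal of broken links.
        print(result.stdout)
        return result.returncode == 0 and "ERROR" not in result.stdout

    if __name__ == "__main__":
        ok = run_linklint("/var/www/mysite")
        print("links OK" if ok else "broken links found - see the report directory")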

bedwyr

I would just use standard shell commands for this:

  • You can use wget to detect broken links
  • If you use wget to download the pages, you can then scan the resulting files with grep --files-without-match to find those that don't have a contact link (a Python equivalent is sketched below).

If you're on Windows, you can install Cygwin or the Win32 ports of these tools.
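If you'd rather keep that last check in Python, here is a rough equivalent of the grep --files-without-match step. It assumes the site has already been mirrored to a local directory (e.g. with wget -r) and that the contact link can be recognised by a case-insensitive "contact" in an href; adjust the pattern to your markup:

    # Hedged sketch: Python stand-in for `grep --files-without-match`.
    import re
    from pathlib import Path

    CONTACT_PATTERN = re.compile(r'href="[^"]*contact[^"]*"', re.IGNORECASE)

    def pages_without_contact_link(mirror_dir):
        missing = []
        for page in Path(mirror_dir).rglob("*.html"):
            html = page.read_text(errors="ignore")
            if not CONTACT_PATTERN.search(html):
                missing.append(page)
        return missing

    if __name__ == "__main__":
        for page in pages_without_contact_link("yourdomain.com"):
            print("no contact link:", page)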

EDIT: Here is the relevant info from the "use wget to detect broken links" link above:

Whenever we release a public site, it's always a good idea to run a spider on it; this way we can check for broken pages and bad URLs. wget has a recursive download option, and combined with the --spider option it will just crawl the site.

1) Download WGET

    Mac:
    http://www.statusq.org/archives/2008/07/30/1954/
    Or use MacPorts to install wget.

    Windows:
    http://gnuwin32.sourceforge.net/packages/wget.htm

    Linux:
    Comes built in
    ----------------------------------------

2) In your console / terminal, run (without the $):

    $ wget --spider -r -o log.txt http://yourdomain.com

3) After that, just locate your "log.txt" file; at the very bottom of the file will be a list of broken links, how many links there are, etc.
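If you want to fold that result back into a Python test, a rough sketch for pulling the broken-link section out of log.txt could look like this. The "Found N broken links" marker is an assumption about how wget formats the end of its spider log and may differ between wget versions:

    # Hedged sketch: extract the broken links listed at the end of wget's log.
    import re

    def broken_links_from_log(log_path="log.txt"):
        with open(log_path, errors="ignore") as f:
            lines = f.read().splitlines()
        broken = []
        in_broken_section = False
        for line in lines:
            if re.match(r"Found \d+ broken link", line):
                in_broken_section = True
                continue
            if in_broken_section and line.startswith("http"):
                broken.append(line.strip())
        return broken

    if __name__ == "__main__":
        for url in broken_links_from_log():
            print("broken:", url)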
Wim Coenen

What exactly is "Testing links"?

If it means checking that they lead to non-4xx URIs, I'm afraid you must visit them.

As for the existence of a given link (like "Contact"), you may look for it using XPath.
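A minimal sketch of both ideas, using Selenium WebDriver (rather than IDE recordings) plus urllib for the status codes, since Selenium itself doesn't expose them; the XPath for the contact link is an assumption about your markup:

    # Hedged sketch: check each page for a "Contact" link and visit every
    # anchor's href, flagging requests that fail or return 4xx/5xx.
    import urllib.error
    import urllib.request

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    def check_page(driver, page_url):
        driver.get(page_url)

        # Existence check: is there a link whose text contains "Contact"?
        if not driver.find_elements(By.XPATH, "//a[contains(text(), 'Contact')]"):
            print("missing contact link on", page_url)

        # Status check: fetch each href separately; urlopen raises HTTPError
        # for 4xx/5xx responses and URLError for unreachable hosts.
        for anchor in driver.find_elements(By.XPATH, "//a[@href]"):
            href = anchor.get_attribute("href")
            try:
                urllib.request.urlopen(href, timeout=10)
            except (urllib.error.HTTPError, urllib.error.URLError) as exc:
                print("broken link:", href, "->", exc)

    if __name__ == "__main__":
        driver = webdriver.Firefox()
        try:
            check_page(driver, "http://yourdomain.com")
        finally:
            driver.quit()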

Almad

You could, as yet another alternative, use BeautifulSoup to parse the links on your page and try to retrieve them via urllib2.
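A minimal sketch of that approach, using urllib.request (the Python 3 counterpart of urllib2) and assuming beautifulsoup4 is installed:

    # Hedged sketch: parse the page's anchors with BeautifulSoup and try to
    # fetch each one, reporting those that fail.
    import urllib.error
    import urllib.parse
    import urllib.request

    from bs4 import BeautifulSoup

    def check_links(page_url):
        html = urllib.request.urlopen(page_url).read()
        soup = BeautifulSoup(html, "html.parser")
        broken = []
        for anchor in soup.find_all("a", href=True):
            # Resolve relative hrefs against the page we started from.
            url = urllib.parse.urljoin(page_url, anchor["href"])
            try:
                urllib.request.urlopen(url, timeout=10)
            except (urllib.error.HTTPError, urllib.error.URLError) as exc:
                broken.append((url, exc))
        return broken

    if __name__ == "__main__":
        for url, exc in check_links("http://yourdomain.com"):
            print("broken:", url, "->", exc)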

Wayne Werner