I am using HTML::TreeBuilder to extract data from an HTML file. What I need to do is something like this:

$div->look_down(_tag => 'a', 'href' !=> 'index.html')

So I am searching for an href that is not equal to 'index.html', along with one other tag, but obviously !=> is not valid syntax for HTML::TreeBuilder. How can I achieve something like that? Can I use a regular expression?

BR

Miller
Lenny

1 Answer

There is no "not equal" criterion, but you can use a regex that matches anything except that exact string, like this:

$div->look_down( _tag => 'a', href => qr/\A(?!index\.html\z)/i )

Or you could write a subroutine that makes the check:

$div->look_down( _tag => 'a', sub { lc( $_[0]->attr('href') // '' ) ne 'index.html' } )
Borodin
  • Great! Works like a charm. Can I use something like: `$div->look_down(_tag => 'a', href => qr/\A(?!index\.html && ?!somethingelse\.html\z)/)` for multiple conditions? – Lenny Sep 13 '14 at 15:41
  • @Lenny: You can add as many conditions as you like, so for instance `$div->look_down(_tag => 'a', class => 'myclass', href => qr/\A(?!index\.html\z)/i)` but if you have any subroutine checks then put them last as they are the slowest. – Borodin Sep 13 '14 at 15:57
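
On the follow-up about excluding more than one file name: `&&` has no special meaning inside a regex, so the pattern in that comment would not do what is intended. One way to express it, as a minimal sketch, is to put an alternation inside a single negative lookahead. The `@excluded` list and the sample hrefs below are hypothetical, chosen only to illustrate the pattern:

```perl
use strict;
use warnings;

# Hypothetical list of file names to exclude; adjust to your own pages.
my @excluded = ( 'index.html', 'somethingelse.html' );

# Join the names into one alternation, escaping regex
# metacharacters such as "." with quotemeta.
my $alternation = join '|', map { quotemeta($_) } @excluded;

# Negative lookahead anchored at both ends: a value is rejected only when
# the whole string equals one of the excluded names (case-insensitively).
my $not_excluded = qr/\A(?!(?:$alternation)\z)/i;

# Sample hrefs (assumed values) to show which ones pass the filter.
for my $href ( 'index.html', 'SomethingElse.html', 'about.html', 'index.html?x=1' ) {
    printf "%-20s %s\n", $href, ( $href =~ $not_excluded ? 'kept' : 'skipped' );
}
```

The resulting pattern can then be passed to `look_down` in the same way as in the answer, e.g. `href => $not_excluded`.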