
I've got a list of URLs in an array:

http://www.site.sx/doc1.html
http://www.site.sx/doc2.html
http://www.site.sx/doc3.html
.
.
.

Let's view the contents of the first page, namely doc1.html:

<?xml version = "1.0" encoding = "utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
     "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
   <head>
      <title>Birds</title>
   </head>

   <body>
      <p>Some birds' feathers aren't actually blue; they're clear.</p>
      <!--LOOK HERE--><p id = "abc123FACT1xyz789">There exists an insect that makes 100-decibel sounds.</p> 
   </body>
</html>

Now, let's view the contents of the second page, namely doc2.html:

<?xml version = "1.0" encoding = "utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
     "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
   <head>
      <title>Cats</title>
   </head>

   <body>
      <p>Moota goes from house to house.</p>
      <!--LOOK HERE--><p id = "abc123FACT2xyz789">Falling from a higher altitude might be better than a lower one.</p> 
   </body>
</html>

doc3.html will have the same abc123...xyz789 pattern for its id value, and so will the rest of the pages in my array. I want to capture the text content of each matching element. Only one id value in each document follows this particular pattern. In reality there are many other id values throughout each document, but for the sake of simplicity we can disregard them.


BIG PICTURE: I want to collect the match from each page into an array, starting from something like this:

$tree->look_down( _tag => 'p' , id => "abc123.*xyz789")->as_text; # NOT SURE HOW TO MAKE AN ARRAY OF MATCHES...
user3404787

1 Answer

my $match = $tree->look_down( _tag => 'p' , id => qr{abc123.*xyz789} )->as_text;

This will get what I'm after.
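
To collect one match per page into an array, the same look_down call can be run inside a loop. The following is only a minimal sketch: the @pages array of inline HTML strings is a hypothetical stand-in for content fetched from the URLs in the question, and the anchored pattern (^...$) is an assumption about how strict the id match should be.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical stand-ins for the fetched pages; in practice each string
# would be the body downloaded from one URL in the array.
my @pages = (
    '<html><body><p>Intro</p><p id="abc123FACT1xyz789">There exists an insect that makes 100-decibel sounds.</p></body></html>',
    '<html><body><p>Intro</p><p id="abc123FACT2xyz789">Falling from a higher altitude might be better than a lower one.</p></body></html>',
    '<html><body><p>No matching id here.</p></body></html>',
);

my @matches;
for my $html (@pages) {
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # In scalar context look_down returns the first matching element,
    # or undef when nothing matches, so guard before calling as_text.
    my $node = $tree->look_down( _tag => 'p', id => qr{^abc123.*xyz789$} );
    push @matches, $node->as_text if $node;

    $tree->delete;    # release the parse tree once we have the text
}

print "$_\n" for @matches;
```

Checking $node before calling as_text matters because a page without a matching id would otherwise cause a "Can't call method on an undefined value" error and stop the loop.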

Miller