
Hi, this website has already helped me fix my Perl problems a few times. This is the first time I have had to ask a question, because I can't find an answer on Google or on Stack Overflow.

What I am trying to do is get the content between two words, but the patterns they have to match keep changing. I am trying to get the product details: brand, description, name and so on. I tried running one regex match after another, but unfortunately this doesn't work because $1 stays defined. Trying to undef the $1 variable gives me a "read only" error message, which is logical. I will post my code below; maybe someone has an idea how to make it work.

#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use IO::File;
use utf8;
my $nfh = IO::File->new('test.html','w');
my $site = 'http://www.test.de/dp/';
my $sku = '1550043196';
my $url = join('',$site,$sku);
my $content = get $url;
my $name = $1 if ($content =~ m{<span id="productTitle" class="a-size-large">(.*?)</span>}gism);
print "$name\n";

# My attempt at undefining $1:
#undef $1;

my $marke = $1 if ($content =~ m{data-brand="(.*?)"}gism);
print "$marke\n";

Any suggestions?

Lehmann
    Could you provide a sample input along with the expected output? – Avinash Raj Feb 19 '15 at 07:54
    Side note: `my ($marke) = $content =~ m{data-brand="(.*?)"}gism;` is better than `my $marke = $1 if ($content =~ m{data-brand="(.*?)"}gism);` – mpapec Feb 19 '15 at 08:05

1 Answer


First, never use a construction like:

my $var = $val if( $some );

According to the documentation (perlsyn):

NOTE: The behaviour of a my, state, or our modified with a statement modifier conditional or loop construct (for example, my $x if ... ) is undefined. The value of the my variable may be undef, any previously assigned value, or possibly anything else. Don't rely on it. Future versions of perl might do something different from the version of Perl you try it out on. Here be dragons.
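A minimal sketch of one safe replacement, assuming you still want the statement-modifier style: declare the variable first, then let the `if` control only the assignment. The sample string and the `data-color` attribute are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input, standing in for the downloaded page.
my $content = q{<div data-brand="Acme">...</div>};

# Declare first; the statement modifier now only controls the assignment,
# so $1 is read only right after a successful match.
my $marke;
$marke = $1 if $content =~ m{data-brand="(.*?)"}s;

# No data-color attribute in the sample: the match fails, $missing stays undef.
my $missing;
$missing = $1 if $content =~ m{data-color="(.*?)"}s;

print 'marke=', $marke // 'undef', "\n";       # marke=Acme
print 'missing=', $missing // 'undef', "\n";   # missing=undef
```

After the failed match `$1` still holds `Acme`, but it is never read, so the stale value does no harm.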

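When the /g modifier is used in list context, the m// operator returns a list of the substrings matched by any capturing parentheses in the regular expression. (Without /g, a successful match in list context still returns ($1, $2, ...), and a failed match returns an empty list, which leaves the assigned variable undef.) So, as @Сухой27 says in the comment above, you should use: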

my ($some) = $str =~ m/...(...).../g;
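To make the list-context return values concrete, here is a small sketch reusing the sample string from the example that follows; the variable names are only illustrative.

```perl
use strict;
use warnings;

my $str = q{some="string" another="one"};

# Without /g: a successful match in list context returns ($1, $2, ...).
my ($first) = $str =~ m/some="(.*?)"/;

# With /g: every capture from every successive match, in order.
my @all = $str =~ m/(\w+)="(.*?)"/g;

# A failed match returns an empty list, so $none is undef
# no matter what $1 still holds from the earlier matches.
my ($none) = $str =~ m/nothere="(.*?)"/;

print "first=$first\n";                  # first=string
print "all=@all\n";                      # all=some string another one
print 'none=', $none // 'undef', "\n";   # none=undef
```

For a single value the /g is not required; it only matters when you want every occurrence.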

A simple example:

use strict;
use warnings;

my $u="undefined";
my $str = q{some="string" another="one"};

#will match
my ($m1) = $str =~ m/some="(.*?)"/g;
print 'm1=', $m1 // $u, '= $1=', $1 // $u, "=\n";

#will NOT match
my ($m2) = $str =~ m/nothere="(.*?)"/g;
print 'm2=', $m2 // $u, '= $1=', $1 // $u, "=\n";

#will match another
my ($m3) = $str =~ m/another="(.*?)"/g;
print 'm3=', $m3 // $u, '= $1=', $1 // $u, "=\n";

prints:

m1=string= $1=string=
m2=undefined= $1=string=   # $1 still holds the previously matched value
m3=one= $1=one=

As you can see, $1 keeps its previous value when a match is NOT successful. The documentation says:

These special variables, like the %+ hash and the numbered match variables ($1, $2, $3, etc.) are dynamically scoped until the end of the enclosing block or until the next successful match, whichever comes first. (See "Compound Statements" in perlsyn.)

NOTE: Failed matches in Perl do not reset the match variables, which makes it easier to write code that tests for a series of more specific cases and remembers the best match.
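Because a failed match leaves $1 untouched, a common habit is to read $1 only inside the branch guarded by the match result. A minimal sketch, assuming the same sample string; the `%found` hash and the key list are hypothetical:

```perl
use strict;
use warnings;

my $str = q{some="string" another="one"};
my %found;

for my $key (qw(some nothere another)) {
    # Only read $1 in the branch where this match succeeded;
    # otherwise it would still hold the capture from an earlier match.
    if ($str =~ m/\b\Q$key\E="(.*?)"/) {
        $found{$key} = $1;
    }
    else {
        warn "no value for '$key'\n";
    }
}

print "$_=$found{$_}\n" for sort keys %found;
```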

If you don't want $1 to stay defined at all, you can enclose each match in its own block, like this:

use strict;
use warnings;

my $u="undefined";
my $str = q{some="string" another="one"};
my($m1,$m2,$m3);

{($m1) = $str =~ m/some="(.*?)"/g;}
print 'm1=', $m1 // $u, '= $1=', $1 // $u, "=\n";

{($m2) = $str =~ m/nothere="(.*?)"/g;}
print 'm2=', $m2 // $u, '= $1=', $1 // $u, "=\n";

{($m3) = $str =~ m/another="(.*?)"/g;}
print 'm3=', $m3 // $u, '= $1=', $1 // $u, "=\n";

which prints:

m1=string= $1=undefined=
m2=undefined= $1=undefined=
m3=one= $1=undefined=
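Applied to the original question (several product fields whose patterns differ), one possible sketch keeps one pattern per field in a hash and uses list-context assignment, so a field whose pattern fails ends up undef instead of inheriting a stale $1. The HTML fragment, the `description` pattern, and the `%patterns`/`%product` names are hypothetical; the real page markup may differ.

```perl
use strict;
use warnings;

# Hypothetical page fragment; the real page markup may differ.
my $content = <<'HTML';
<span id="productTitle" class="a-size-large">Example Product</span>
<div id="brand" data-brand="Acme"></div>
HTML

# One pattern per field (hypothetical; adjust to the real markup).
my %patterns = (
    name        => qr{<span id="productTitle" class="a-size-large">(.*?)</span>}s,
    marke       => qr{data-brand="(.*?)"}s,
    description => qr{<div id="productDescription">(.*?)</div>}s,
);

my %product;
for my $field (sort keys %patterns) {
    # List-context assignment: a failed match yields an empty list,
    # so the field becomes undef rather than reusing an old $1.
    ($product{$field}) = $content =~ $patterns{$field};
}

for my $field (sort keys %product) {
    printf "%-12s %s\n", $field, $product{$field} // '(not found)';
}
```

Adding a new field then only means adding another entry to `%patterns`.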

PS: I'm not a Perl guru; maybe others will extend or correct this answer.

Chankey Pathak
clt60