
I parse an HTML string and extract tables from it.

The tables have two columns: the first holds a single value (the key), the second holds one or more comma-separated values.

I want to store the values in a hash of arrays.

use strict;
use warnings;

use Data::Dumper qw(Dumper);

my $html='
<p class="auto-cursor-target"><br /></p>
<table class="wrapped">
<colgroup><col style="width: 50.0px;" /><col style="width: 29.0px;" />
</colgroup>
<tbody>
<tr><th><p>Wikispace</p></th><th><p>right</p></th></tr>
<tr><td>mimi</td><td>right1</td></tr>
<tr><td colspan="1">mama</td><td colspan="1">right3,right2</td></tr>
</tbody>
</table>
<p class="auto-cursor-target"><br /></p>
';

use HTML::TableExtract;
my $te = HTML::TableExtract->new( headers => [qw(Wikispace right)] );
$te->parse($html);

my %known;
foreach my $ts ($te->tables) {
   foreach my $row ($ts->rows) {
     print @$row[0], ":::", @$row[1], ":  ";
     foreach my $val (split(/,/,@$row[1])) {
             print $val, ";";
             if (! $known{@$row[0]}) {
               my @arr = ($val);
               @known{@$row[0]}=\@arr;
             } else {
                     # my @arr = \@known{@$row[0]};
                     #              push (@arr, $val);
                     #         print Dumper @arr;
                     push (@$known{@$row[0]}, $val);
             };
     }
     print "\n";
   }
 }

print Dumper \%known;

What am I doing wrong? What's wrong with the last push, and how would you do it differently?

Also is there no way to assign an array directly to a hash (dictionary) instead of first having to generate an array and later linking its address?

ikegami
mcaustria
  • "_is there no way to assign an array directly to a hash (dictionary) instead of first having to generate an array and later linking its address?_" -- no, there isn't. What would "directly" even mean? -- if one could "_assign the array_" as a value for a key, what would be that value? (All elements, joined perhaps? The first one? The last one?) A dictionary "value" has to be a single-valued thing (not a collection of values like an array), in any language. In Perl it's a reference (to array or hash). That's how you build [complex data structures](https://perldoc.perl.org/perldsc) – zdim Jun 15 '23 at 17:06
  • (in languages in which one seemingly assigns an array (or equivalent) some kind of a pointer is taken and assigned, otherwise how would one retrieve that array later) – zdim Jun 15 '23 at 17:07

3 Answers


The overall approach is fine, but there are many basic errors throughout. I'd suggest first working through some solid introductory material rather than struggling with the basic notions and syntax of the language.

Basic errors: $row is an array reference (often called an "arrayref" for short), so to extract an element you need $row->[0]. The elements themselves are not arrayrefs, so you can't dereference them (@{ $row->[0] } is wrong). Also, the headers you specify are wrong -- your document doesn't have such headers.
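
As a minimal sketch of that distinction, here is element access through an array reference. The row below is made up for illustration (it is not produced by HTML::TableExtract); it only mimics the shape of the rows in the question.

```perl
use strict;
use warnings;

# Hypothetical row, shaped like the rows in the question (an array reference).
my $row = [ 'mama', 'right3,right2' ];

my $key    = $row->[0];               # element access through the reference
my @values = split /,/, $row->[1];    # the element is a plain string, so split it

print "$key: @values\n";              # prints "mama: right3 right2"
```

Both elements are plain strings, which is why dereferencing them as arrays cannot work.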

I don't fully understand the whole purpose, but here is your program cleaned up so that it works:

use strict;
use warnings;
use feature 'say';

use Data::Dumper qw(Dumper);

my $html='<p class="auto-cursor-target"><br /></p><table class="wrapped"><colgroup><col style="width: 50.0px;" /><col style="width: 29.0px;" /></colgroup><tbody><tr><th><p>Wiki    space</p></th><th><p>right</p></th></tr><tr><td>mimi</td><td>right1</td></tr><tr><td colspan="1">mama</td><td colspan="1">right3,right2</td></tr></tbody></table><p class="auto-cursor-target"><br /></p>';

use HTML::TableExtract;

my $te = HTML::TableExtract->new( headers => ['Wiki    space', 'right'] );
$te->parse($html);

my %known;
foreach my $ts ($te->tables) {
    #say "ts: $ts";
    foreach my $row ($ts->rows) {
        #say "row: @{$row}";
        foreach my $val ( split /,/, $row->[1] ) {
            print $val, ";";
            if (not $known{$row->[0]}) {
                $known{$row->[0]} = [ $val ];
            }
            else {
                push @{$known{$row->[0]}}, $val;
            };
        }
        say '';
    }
}

print Dumper \%known;

This prints

right1;
right3;right2;
$VAR1 = {
          'mimi' => [
                      'right1'
                    ],
          'mama' => [
                      'right3',
                      'right2'
                    ]
        };
zdim
  • Basic docs for references: tutorial [perlreftut](https://perldoc.perl.org/perlreftut), reference [perlref](https://perldoc.perl.org/perlref), complex data structures [perldsc](https://perldoc.perl.org/perldsc) – zdim Jun 15 '23 at 16:52
  • In newer Perls instead of `@{ $h{$key} }` we can do `$h{$key}->@*`. See [postfix dereference](https://perldoc.perl.org/perlref#Postfix-Dereference-Syntax) – zdim Jun 15 '23 at 16:58
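
For comparison, a short sketch of the two dereference styles mentioned in the comment above, side by side. The hash %h and its contents are made up for illustration; the sketch assumes Perl 5.24 or later, where postfix dereference is a stable feature.

```perl
use strict;
use warnings;
use v5.24;    # postfix dereference is stable from Perl 5.24 on

my %h;    # hypothetical hash of arrays

push @{ $h{mama} }, 'right3';    # circumfix dereference
push $h{mama}->@*, 'right2';     # postfix dereference, same effect

say join ',', $h{mama}->@*;      # prints "right3,right2"
```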

See Perl Dereferencing Syntax. We see that

@$known{ ... }

is short for

@{ $known }{ ... }

But you don't have a scalar $known. You want

@{ $known{ ... } }

or

$known{ ... }->@*

This gives us

for my $val ( split /,/, $row->[1] ) {
   if ( !$known{ @$row[0] } ) {
      my @arr = $val;                       # Useless parens removed.
      @known{ @$row[0] } =  \@arr;
   } else {
      push @{ $known{ @$row[0] } }, $val;
   }                                        # Useless `;` removed.
}
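
The parsing difference described above can be seen in isolation with a small sketch. The variable $known_ref and the sample data are hypothetical, introduced only to give the hash-slice form something valid to operate on.

```perl
use strict;
use warnings;

my %known     = ( mama => [ 'right3' ] );   # hash whose values are array references
my $known_ref = { a => 1, b => 2 };         # hypothetical scalar holding a hash reference

# @$ref{...} is a hash slice through a scalar reference:
my @slice = @$known_ref{ 'a', 'b' };        # same as @{ $known_ref }{ 'a', 'b' }

# @{ $hash{...} } dereferences the array stored in one hash element:
push @{ $known{mama} }, 'right2';

print "@slice\n";               # prints "1 2"
print "@{ $known{mama} }\n";    # prints "right3 right2"
```

The first form needs a scalar holding a hash reference, which the question's code never declares; the second form is the one that reaches the array stored under a key of %known.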

But let's clean up your code.

  1. Using an array slice is discouraged when the slice is just one element.

    @{ $row }[0]    # Array slice (via reference), using infix syntax
    

    should be

    ${ $row }[0]    # Array index (via reference), using infix syntax
    

    Cleaner:

    $row->[0]       # Array index (via reference), using the postfix/arrow syntax
    

    This gives us

    for my $val ( split /,/, $row->[1] ) {
       if ( !$known{ $row->[0] } ) {
          my @arr = $val;
          $known{ $row->[0] } = \@arr;
       } else {
          push @{ $known{ $row->[0] } }, $val;
       }
    }
    
  2. my @a = ...; \@a can be shortened to [ ... ].

    This gives us

    for my $val ( split /,/, $row->[1] ) {
       if ( !$known{ $row->[0] } ) {
          $known{ $row->[0] } = [ $val ];
       } else {
          push @{ $known{ $row->[0] } }, $val;
       }
    }
    
  3. We don't need that if statement.

    for my $val ( split /,/, $row->[1] ) {
       $known{ $row->[0] } //= [];
       push @{ $known{ $row->[0] } }, $val;
    }
    

    We can even combine those two inner statements.

    for my $val ( split /,/, $row->[1] ) {
       push @{ $known{ $row->[0] } //= [] }, $val;
    }
    

    In fact, thanks to autovivification, @{ EXPR //= [] } can be written as @{ EXPR }. Perl will automatically create the array if needed.

    for my $val ( split /,/, $row->[1] ) {
       push @{ $known{ $row->[0] } }, $val;
    }
    
  4. You can push multiple values at once.

    That means your entire inner loop can be reduced to the following:

    push @{ $known{ $row->[0] } }, split /,/, $row->[1];
    
  5. Finally, if the first column is a key (i.e. unique values), then we don't need push at all.

    $known{ $row->[0] } = [ split /,/, $row->[1] ];
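
The difference between steps 4 and 5 only shows up when a key repeats. Here is a small sketch with made-up rows (not parsed from HTML), where the key 'mama' deliberately appears twice; the hash names %pushed and %assigned are illustrative.

```perl
use strict;
use warnings;

# Hypothetical rows; the key 'mama' appears twice on purpose.
my @rows = ( [ 'mimi', 'right1' ], [ 'mama', 'right3,right2' ], [ 'mama', 'right4' ] );

my ( %pushed, %assigned );
for my $row (@rows) {
    # Step 4: autovivification creates the array on first use, later rows append.
    push @{ $pushed{ $row->[0] } }, split /,/, $row->[1];

    # Step 5: plain assignment, so a repeated key overwrites the earlier values.
    $assigned{ $row->[0] } = [ split /,/, $row->[1] ];
}

print "pushed:   @{ $pushed{mama} }\n";      # prints "pushed:   right3 right2 right4"
print "assigned: @{ $assigned{mama} }\n";    # prints "assigned: right4"
```

If the first column is guaranteed unique, both forms build the same structure and the assignment is the simpler choice.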
    
ikegami

You get a compile-time error on the line:

                 push (@$known{@$row[0]}, $val);

because you declared the variable as a hash (%known), but you are trying to access it as a scalar ($known).
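
One way to confirm this without running the whole script is to compile the offending statement on its own with a string eval. This is only a sketch: the surrounding declarations are reduced to the minimum, and the exact error text depends on the Perl version, so it is printed rather than assumed.

```perl
use strict;
use warnings;

# The offending statement, compiled in isolation via string eval.
my $code = <<'END';
use strict;
my %known;
my $row = [ 'mama', 'right3' ];
push( @$known{ @$row[0] }, 'right2' );
END

# eval returns undef if the code fails to compile, and sets $@ to the error.
my $ok = eval "$code; 1";
print $ok ? "compiled\n" : "failed to compile: $@";
```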

Here is a simpler version of your code that runs without errors:

use strict;
use warnings;

use Data::Dumper qw(Dumper);

my $html='<p class="auto-cursor-target"><br /></p><table class="wrapped"><colgroup><col style="width: 50.0px;" /><col style="width: 29.0px;" /></colgroup><tbody><tr><th><p>Wikispace</p></th><th><p>right</p></th></tr><tr><td>mimi</td><td>right1</td></tr><tr><td colspan="1">mama</td><td colspan="1">right3,right2</td></tr></tbody></table><p class="auto-cursor-target"><br /></p>';

use HTML::TableExtract;
my $te = HTML::TableExtract->new( headers => [qw(Wikispace right)] );
$te->parse($html);

my %known;
foreach my $ts ($te->tables) {
    foreach my $row ($ts->rows) {
        my @vals = split(/,/, $row->[1]);
        $known{ $row->[0] } = [@vals];
    }
}
print Dumper(\%known);

Output:

$VAR1 = {
          'mama' => [
                      'right3',
                      'right2'
                    ],
          'mimi' => [
                      'right1'
                    ]
        };
toolic