I wrote a script that extracts the row data from all the HTML <TR> tags on a page. My HTML page has 30 <TR> tags, and the code fetches a particular row based on a counter. For example, if I need the data in the 5th <tr>...</tr>, the condition is if ($cnt == 5) { (go inside and get that data) }.

My problem is that I need the data from several selected rows, processed one at a time. Let's say I need the data for rows 5, 6, and 14.

Could you please help me sort this out?

$te = new HTML::TableExtract(count => 0 );
$te->parse($content);
# Examine all matching tables
foreach $ts ($te->table_states) {
    #print "Table (", join(',', $ts->coords), "):\n";
    $cnt = 1;
    foreach $row($ts->rows) {
        # print " ---- Printing Row $cnt ----\n";
        $PrintLine= join("\t", @$row);
        @RowData=split(/\t/,$PrintLine);
        $PrintLine =~ s/\r//ig;
        $PrintLine =~ s/\t//ig;
        $cnt = $cnt + 1;
        #   if ($PrintLine =~ /Site ID/ig || $PrintLine =~ /Site name/ig){print " Intrest $PrintLine $cnt =====================\n"};
        if ( $cnt == 14) { 
            $arraycnt = 1;
            my $SiteID="";
            my $SiteName="";
            foreach (@RowData) {
                # print " Array element $arraycnt\n";
                chomp;
                $_ =~ s/\r//ig;
                $_ =~ s/[\xC3\xA1\xC3\xA0\xC3\xA2\xC3\xA3]//ig;
                if ($arraycnt== 17 ) { $SiteID= $_;}
                if ($arraycnt== 39 ) { $SiteName= $_;}
                    $arraycnt = $arraycnt + 1;
            } 
            #$PrintLineFinal = $BridgeCase."\t".$PrintLine;
            $PrintLineFinal = $BridgeCase."\t".$SiteID."\t".$SiteName;
            #print "$PrintLineFinal\n";
            print MYFILE2 "$PrintLineFinal\n";          
            last;
        }       
    }
}

2 Answers

A few suggestions:

Always:

 use strict;
 use warnings;

strict will force you to declare your variables with my, e.g.:

foreach my $ts ($te->table_states) {
   my $cnt = 1;

(warnings alerts you to most silly mistakes; strict prevents others by requiring safer practices in certain cases.)

In several places, you are using your own counter variables as you go through the array. You don't need to do this. Instead, just get the array element you want directly. Indexes start at zero, so $array[3] gets the fourth element.
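
As a minimal sketch of direct indexing (the @cells array is made up and only stands in for one table row), element N of a row lives at index N-1:

```perl
use strict;
use warnings;

# Hypothetical data standing in for one extracted table row.
my @cells = ('alpha', 'beta', 'gamma', 'delta');

my $first  = $cells[0];     # first element
my $fourth = $cells[3];     # fourth element
my $last   = $cells[-1];    # negative indexes count from the end

print "$first $fourth $last\n";
```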

Perl also allows array slices to get just certain elements you want. @array[4,5,13] gets the fifth, sixth, and fourteenth elements of the array. You can use this to process only the rows you want, instead of looping through all of them:

my @rows = $ts->rows;
foreach my $row (@rows[4,5,13]) #process only the 5th, 6th, and 14th rows.
{
    ...
}

Here is a shortcut version of the same thing, using an anonymous array:

foreach my $row (@{[$ts->rows]}[4,5,13])

Also, perhaps you want to define the rows you want elsewhere in your code:

my @wanted_rows = (4,5,13);
...
foreach my $row (@{[$ts->rows]}[@wanted_rows])
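
One thing to watch with slices: an index past the end of the array yields undef rather than an error. The sketch below shows one way to guard against that. The @rows array is a hypothetical stand-in for the list returned by $ts->rows, and the grep is my own addition, not something HTML::TableExtract requires:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the list returned by $ts->rows:
# 20 rows, each an array reference holding two cells.
my @rows = map { [ "id$_", "name$_" ] } 1 .. 20;

# Zero-based indexes of the 5th, 6th, and 14th rows.
my @wanted_rows = (4, 5, 13);

# A slice returns undef for any index past the end of the array,
# so grep drops those entries before the rows are used.
my @selected = grep { defined } @rows[@wanted_rows];

# Print each selected row with its cells separated by tabs.
print join("\t", @$_), "\n" for @selected;
```

If the table turns out to have fewer rows than expected, @selected simply ends up shorter instead of containing undef entries.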

This code is quite confused:

$PrintLine= join("\t", @$row);
@RowData=split(/\t/,$PrintLine);
$PrintLine =~ s/\r//ig;
$PrintLine =~ s/\t//ig;

First you join the array with tab characters, then you split the string you just built to get the same array back again. Then you remove all tab characters from the line anyway.

I suggest you get rid of all that code. Just use @$row whenever you need the array, instead of making a copy of it. If you need to print the array for debugging (which is all you seem to be doing with $PrintLine), you can print an array directly:

print @$row;    #print an array, nothing between each element.
print "@$row";  #print an array with spaces between each element.
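
If you want a separator other than a space, join is the usual tool. A small sketch with a made-up row (the values are purely illustrative):

```perl
use strict;
use warnings;

# Hypothetical row; in the real script this would be one element of $ts->rows.
my $row = [ 'A100', 'Main Street', 'open' ];

# Interpolating an array in a string joins its elements with a space by default.
my $spaced = "@$row";

# join lets you choose the separator explicitly.
my $tabbed = join("\t", @$row);

print "$spaced\n";
print "$tabbed\n";
```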

With all of these changes, your code would be something like this:

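use strict;
use warnings;
use HTML::TableExtract;

# $content, $BridgeCase, and the MYFILE2 filehandle come from the rest of your script.

my @wanted_rows = (4,5,13);   # zero-based indexes of the 5th, 6th, and 14th rows

my $te = HTML::TableExtract->new(count => 0);

$te->parse($content);
# Examine all matching tables
foreach my $ts ($te->table_states) {
    foreach my $row (@{[$ts->rows]}[@wanted_rows]) {
        next unless $row;   # skip rows the table does not have

        # Cells may be undef, so skip those when cleaning up.
        s/[\xC3\xA1\xC3\xA0\xC3\xA2\xC3\xA3\r\n]//ig for grep { defined } @$row;

        my $SiteID   = $$row[16] // '';  #set to empty strings if not defined.
        my $SiteName = $$row[38] // '';
        print MYFILE2 $BridgeCase."\t".$SiteID."\t".$SiteName."\n";
    }
}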

You could access the results like this:

foreach my $ts ($te->table_states) {
    # Indexes start at zero, so the 14th row is at index 13:
    #   my $row14 = $ts->rows->[13];
    # The 17th column of the 14th row:
    #   my $cell = $ts->rows->[13]->[16];
    my $SiteName = $ts->rows->[13]->[38];
    my $SiteID   = $ts->rows->[13]->[16];
    my $PrintLineFinal = $BridgeCase."\t".$SiteID."\t".$SiteName;
    print MYFILE2 "$PrintLineFinal\n";
}
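
The same nested lookup can be tried without the module. This is only a sketch: $rows is a hypothetical array reference of row references, mimicking the shape this answer assumes $ts->rows returns in scalar context, and the // '' fallback (Perl 5.10 or later) is an addition to avoid warnings for missing cells:

```perl
use strict;
use warnings;

# Hypothetical table: a reference to an array of row references,
# 15 rows with three cells each.
my $rows = [ map { [ "r${_}c0", "r${_}c1", "r${_}c2" ] } 0 .. 14 ];

my $row_idx = 13;   # 14th row (zero-based index)
my $col_idx = 2;    # 3rd column (zero-based index)

# The arrow between adjacent subscripts is optional,
# so this is the same as $rows->[$row_idx]->[$col_idx].
my $cell = $rows->[$row_idx][$col_idx] // '';   # '' if the cell is missing

# A column that does not exist in the row falls back to ''.
my $missing = $rows->[$row_idx][99] // '';

print "cell: '$cell', missing: '$missing'\n";
```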