I have the following HTML. I would like to extract the information from it and reshape it into a table-like database:

<tr some parameters here>
<td more parameters here></div>
<div even more para>Var1</td>
<td params>observation 1</td>
<td params></td> 
</tr> 
<tr some parameters here>
<td more parameters here></div>
<div even more para>Var2</td>
<td params>observation 2</td>
<td params></td> 
</tr> 

And so on for Var3 and observation 3, Var4 and observation 4, etc.

I was advised to use Mojo::DOM and was given the following code:

#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
use lib './lib/lib/perl64';
use Mojo::DOM;
my $html = q(
<html>
<head><title>Some list</title>
</head>
<body>
<div>
<table>
<tr>
<td>Var1</td>
<td>Obs1</td>
</tr>
<tr>
<td>var2</td>
<td>obs2</td>
</table>
</div>
</body>
</html>
);
my $dom = Mojo::DOM->new($html);
my $table = $dom->at('table');
for my $record ($table->children('tr')->each) {
    my %record = map { $_->text } $record->children('td')->each;
    print Dumper(\%record), "\n";
}

Please assume that I do not know any programming. How would I adjust the code so that I can use it in my case? It seems to me that it is still quite far from being a usable script. I would really appreciate your help. Thanks in advance.

Regards, sh

PerC
  • What problems are you having? Is it not compiling/running? Is it not returning the results you want? – eandersson Mar 22 '13 at 15:44
  • Hi Fuji, the code I was given runs well on the simple HTML that is defined inside the code. On my real HTML, however, it returns `odd number of elements in hash assignment at xxx.plx line 42`, where the children are defined. The differences between the sample HTML and the real HTML, from my point of view, are: 1) for each variable, it has a
    tag instead of tag (sorry for being so naive); 2) there are some other s which I do not want to parse out, but that does not seem to be a big concern.
    – user2198367 Mar 22 '13 at 18:33

1 Answer

A closing </tr> tag is missing just before </table> in the sample HTML. The second row should read:

  <tr>
    <td>var2</td>
    <td>obs2</td>
  </tr>
</table>
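
Beyond the missing tag, the rows shown in the question have three <td> cells each (name, observation, and an empty cell), so the `map` in the original loop returns three values per row. That is one likely cause of the "odd number of elements in hash assignment" warning. The following is a minimal sketch of one way to adapt the loop, not a definitive solution. It assumes the variable name is always in the first cell and the observation in the second; the sample HTML and its `class` attributes are made up for illustration. It uses `all_text` instead of `text` because `text` only returns text directly inside the <td>, not text nested in a <div>.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical sample shaped like the rows in the question: three cells
# per row, with the variable name wrapped in a <div>.
my $html = q(
<table>
<tr class="r"><td class="a"><div class="b">Var1</div></td><td>observation 1</td><td></td></tr>
<tr class="r"><td class="a"><div class="b">Var2</div></td><td>observation 2</td><td></td></tr>
</table>
);

my $dom = Mojo::DOM->new($html);
my %table;

for my $row ($dom->find('tr')->each) {
    # all_text also collects text from nested elements such as the <div>
    my @cells = map { $_->all_text } $row->find('td')->each;

    # Skip rows that do not contain at least a name and a value
    next if @cells < 2;

    # Keep only the first two cells; the trailing empty cell is ignored
    my ($name, $value) = @cells[0, 1];
    $table{$name} = $value;
}

# Print one tab-separated "name<TAB>value" line per variable
print "$_\t$table{$_}\n" for sort keys %table;
```

To run this on a real page, the `$html` string would be replaced by the contents of the actual file, and the `'tr'` selector may need narrowing if the page contains other tables.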
Toto