
I am using LWP::UserAgent to send a request and receive the response. The response comes back as HTML with a file linked in it.

For example:

     `<html>
      <head>
      <title>Download Files</title>
      <meta http-equiv=\'Content-Type\' content=\'text/html; charset=utf-8\'>
      <link rel=\'stylesheet\' href=\'http://res.mytoday.com/css/main.css\' type=\'text/css\'>
      <link rel=\'stylesheet\' href=\'http://res.mytoday.com/css/Menu.css\' type=\'text/css\'>
      <link rel=\'stylesheet\' href=\'/statsdoc/freeze.css\' type=\'text/css\'>
      </head>
      <body>
      <table border=1>
      <tr class=\'rightTableData\'>
      <th>No.</th>
      <th>File Name</th> 
      <th>File Size</th>
      </tr><tr class=\'rightTableData\'>
      <td>1</td><td>
      <a href=\'/dlr_download?file=/mnt/dell6/SRM_DATA/data/API_FILE     /20160329/LSUZisbZahtHNeImZJm_1-1.csv.zip\'>1-1.csv.zip</a>
     </td><td>487 bytes</td>  </tr>
     </table>
     </br></br>  
     <center><a href=\'/dlr_download?file=/mnt/dell6/SRM_DATA/data/API_FILE/20160329/LSUZisbZahtHNeImZJm-csv.zip\'>Download all</a></center>                                                         
    </body></html>`

I need to get the file from this response. Can anyone help me retrieve it?

sharon

1 Answer


Use a parser to extract the information. I used XML::LibXML, but I had to remove the invalid `</br>` closing tags, which made the parser fail.

#!/usr/bin/perl
use warnings;
use strict;

my $html = '<html>
      <head>
      <title>Download Files</title>
      <meta http-equiv=\'Content-Type\' content=\'text/html; charset=utf-8\'>
      <link rel=\'stylesheet\' href=\'http://res.mytoday.com/css/main.css\' type=\'text/css\'>
      <link rel=\'stylesheet\' href=\'http://res.mytoday.com/css/Menu.css\' type=\'text/css\'>
      <link rel=\'stylesheet\' href=\'/statsdoc/freeze.css\' type=\'text/css\'>
      </head>
      <body>
      <table border=1>
      <tr class=\'rightTableData\'>
      <th>No.</th>
      <th>File Name</th> 
      <th>File Size</th>
      </tr><tr class=\'rightTableData\'>
      <td>1</td><td>
      <a href=\'/dlr_download?file=/mnt/dell6/SRM_DATA/data/API_FILE     /20160329/LSUZisbZahtHNeImZJm_1-1.csv.zip\'>1-1.csv.zip</a>
     </td><td>487 bytes</td>  </tr>
     </table>
     <!-- </br></br> I had to comment this out! -->
     <center><a href=\'/dlr_download?file=/mnt/dell6/SRM_DATA/data/API_FILE/20160329/LSUZisbZahtHNeImZJm-csv.zip\'>Download all</a></center>                                                         
    </body></html>';

use XML::LibXML;
# Parse the HTML string into a DOM.
my $dom = 'XML::LibXML'->load_html( string => $html );
# Print the href of the link in the second table row.
print $dom->findvalue('/html/body/table/tr[2]/td[2]/a/@href'), "\n";

You could also use the recover flag to parse invalid HTML:

my $dom = 'XML::LibXML'->load_html( string => $html, recover => 1 );
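As a rough sketch of the recover approach, the following collects every link on the page instead of a single fixed position. The HTML is a shortened stand-in for the real response, the `/tmp/...` paths are made up for illustration, and `all_links` is a hypothetical helper name. With `recover => 1` the parser may still print messages about the invalid tags to STDERR.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML;

# Shortened stand-in for the response body; the invalid </br> tags are kept on purpose.
# The file paths are hypothetical.
my $html = <<'HTML';
<html><body>
<table border=1>
<tr><th>No.</th><th>File Name</th></tr>
<tr><td>1</td><td><a href='/dlr_download?file=/tmp/example_1-1.csv.zip'>1-1.csv.zip</a></td></tr>
</table>
</br></br>
<center><a href='/dlr_download?file=/tmp/example-csv.zip'>Download all</a></center>
</body></html>
HTML

# Return the href of every <a> element, in document order.
sub all_links {
    my ($html) = @_;
    my $dom = XML::LibXML->load_html( string => $html, recover => 1 );
    return map { $_->value } $dom->findnodes('//a/@href');
}

print "$_\n" for all_links($html);
```

Selecting with `//a/@href` avoids depending on the exact table layout, so the expression keeps working if rows are added.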
choroba
  • When I store the file URL in a variable, it throws this error: `empty XPath found at /usr/lib/perl5/XML/LibXML.pm line 1317`. Why does this error occur? – sharon Mar 31 '16 at 07:10
  • @sharon: This seems unrelated. Ask a new question, show the code. – choroba Mar 31 '16 at 09:45
  • I fixed the bug. To read the file, I used the LWP::UserAgent package, and it worked. ;-) Thanks for your help. – sharon Mar 31 '16 at 10:32
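For later readers, the download step mentioned in the last comment might look roughly like this. It is only a sketch, not the code sharon actually used: `$page_url` and `$href` are placeholder values, and `absolute_url` and `save_file` are hypothetical helper names. The href in the page is relative, so it is first resolved against the URL the page was requested from.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use LWP::UserAgent;
use URI;

# Placeholder values: use the URL the page was requested from
# and the href extracted with findvalue().
my $page_url = 'http://example.com/statsdoc/list';
my $href     = '/dlr_download?file=/tmp/example-csv.zip';

# Resolve a relative href against the URL of the page it came from.
sub absolute_url {
    my ($href, $base) = @_;
    return URI->new_abs($href, $base)->as_string;
}

# Fetch $url and write the response body straight to $path.
sub save_file {
    my ($url, $path) = @_;
    my $ua       = LWP::UserAgent->new;
    my $response = $ua->get($url, ':content_file' => $path);
    die 'Download failed: ', $response->status_line, "\n"
        unless $response->is_success;
    return $path;
}

# Download only when an output file name is given on the command line.
save_file(absolute_url($href, $page_url), $ARGV[0]) if @ARGV;
```

Passing `':content_file'` makes LWP write the body to disk rather than keeping it in memory, which suits binary files such as zip archives.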