
I am trying to fetch text from an HTTPS website. Fetching works with LWP, but the content is XML and I need to parse it. I think XML::LibXML can do what I want, but I cannot get it to accept the data I retrieved with LWP::UserAgent.

This is my code:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use open qw(:std :utf8);
use 5.014;
use IO::Socket::SSL qw();
use XML::LibXML;

BEGIN {
    $ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;
    $ENV{HTTPS_DEBUG} = 1;
}

my $ua = LWP::UserAgent->new(ssl_opts => {
    SSL_verify_mode => IO::Socket::SSL::SSL_VERIFY_NONE,
    verify_hostname => 0, 
});

my $response = $ua->get('https:<mywebsite>');

my $t = '';

if ( $response->is_success ) {
    $t = $response->decoded_content;
}
else {
    die $response->status_line;
}


my $parser = XML::LibXML->new();
my $xmldoc = $parser->parse_file($t);

print $xmldoc;

I am getting the error "No such file or directory". Every parser method I try fails, and the XML::LibXML string-parsing methods don't seem to work because my data spans many lines. I need a way to either make XML::LibXML treat $t as a file or filehandle, or find another way to parse my data. I would prefer not to create an actual file if that can be avoided.

For reference this is the XML data I get from the HTTPS website with the above code that is stored in $t:

<?xml version="1.0" ?>
<resultset>
<table name="PROFILE">
 <column name="ID" type="String"/>
 <column name="VERSION" type="String"/>
 <column name="NAME" type="String"/>
 <column name="DESCRIPTION" type="String"/>
<data>
<r><c>0</c><c>1.0</c><c>Default profile</c><c>Default profile</c></r>
<r><c>2</c><c>1.2</c><c>Custom 2</c><c></c></r>
<r><c>3</c><c>6.0</c><c>Custom 3</c><c></c></r>
<r><c>1</c><c>1.15</c><c> For Compare</c><c>The built in profile for compare.</c></r>
<r><c>4</c><c>1.3</c><c>Custom 4</c><c> </c></r>
<r><c>6</c><c>11.0</c><c>Custom 6</c><c>Please only make approved changes.</c></r>
</data>
</table>
</resultset>

Any help is appreciated, thanks.

CircuitB0T

2 Answers

3

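The XML::LibXML documentation lists the alternative ways of calling it. parse_file expects a file name, which is why passing it the XML text gives "No such file or directory". To parse XML that is already in a string, use load_xml with the string option: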

use XML::LibXML;
my $dom = XML::LibXML->load_xml(string => $t);
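
To show what can be done with the resulting document, here is a minimal sketch that parses a trimmed copy of the question's XML from a string and turns each `<r>` row into a hash keyed by column name. The variable names (`@columns`, `@rows`) and the XPath expressions are my own choices for illustration, not something the question prescribes. No separate parser object is needed, because `load_xml` can be called directly on the class.

```perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed copy of the question's data; in the real script this is $t.
my $t = <<'XML';
<?xml version="1.0" ?>
<resultset>
<table name="PROFILE">
 <column name="ID" type="String"/>
 <column name="VERSION" type="String"/>
 <column name="NAME" type="String"/>
 <column name="DESCRIPTION" type="String"/>
<data>
<r><c>0</c><c>1.0</c><c>Default profile</c><c>Default profile</c></r>
<r><c>2</c><c>1.2</c><c>Custom 2</c><c></c></r>
</data>
</table>
</resultset>
XML

# load_xml is called as a class method; multi-line strings are fine.
my $dom = XML::LibXML->load_xml(string => $t);

# Column names come from the name attribute of each <column> element.
my @columns = map { $_->getAttribute('name') } $dom->findnodes('//table/column');

# Turn every <r> row into a hash keyed by column name.
my @rows;
for my $r ($dom->findnodes('//data/r')) {
    my @cells = map { $_->textContent } $r->findnodes('./c');
    my %row;
    @row{@columns} = @cells;
    push @rows, \%row;
}

# Print a few fields from each row.
printf "%s: %s (version %s)\n", @{$_}{qw(ID NAME VERSION)} for @rows;
```

The cells are matched to columns purely by position, so this assumes every row has one `<c>` per `<column>`, as in the sample data.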
Miller
  • Then what do I use for the parser? – CircuitB0T Mar 21 '14 at 21:32
  • I've never used `XML::LibXML`. My preferred xml parsing modules are [`XML::Simple`](https://metacpan.org/pod/XML::Simple) for "simple" xml, and [`XML::Twig`](https://metacpan.org/pod/XML::Twig) for everything else. – Miller Mar 21 '14 at 21:38
  • *Please* don't encourage the use of `XML::Simple`. In the module's documentation the author says, *"The use of this module in new code is discouraged. Other modules are available which provide more straightforward and consistent interfaces. In particular, XML::LibXML is highly recommended"*. `XML::Twig` is a good alternative, mainly for its convenient API. – Borodin Mar 22 '14 at 02:47
  • @Borodin I gave learning `XML::LibXML` another attempt per your advice, and overall I'm still disappointed with its style of documentation. I got more info from a [`perlmonks post from 2005`](http://www.perlmonks.org/?node_id=490846). As for `XML::Simple` being discouraged, I agree with the maintainer's opinion that the module's options give it many flaws. I'd never use it for writing or editing XML. However, it's still a useful tool, as the only barrier to entry is understanding Perl data structures. So I'd still use it for "simple" XML. – Miller Mar 22 '14 at 08:21
2

If you look at the documentation for XML::LibXML::Parser you will see that the location option for load_xml can be either a path to a file or a URL. So there is no need to directly involve LWP at all; you can write just

my $xmldoc = XML::LibXML->load_xml(location => 'https:<mywebsite>');
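
Whether the `location` form can fetch an `https` URL depends on the underlying libxml2 library, and I would not assume that it can. If it fails, one fallback is to keep LWP for the transfer and hand the response body to `load_xml` as a string. The sketch below assumes `LWP::Protocol::https` is installed for HTTPS support; `fetch_xml` is a hypothetical helper name, and the URL in the usage comment is the question's placeholder.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use XML::LibXML;

# Hypothetical helper: fetch a URL with LWP and parse the body as XML.
sub fetch_xml {
    my ($url) = @_;

    my $ua       = LWP::UserAgent->new;
    my $response = $ua->get($url);
    die $response->status_line unless $response->is_success;

    # charset => 'none' undoes any Content-Encoding (such as gzip) but
    # leaves the bytes undecoded, so the parser can apply the encoding
    # named in the XML declaration itself.
    my $xml = $response->decoded_content(charset => 'none');

    return XML::LibXML->load_xml(string => $xml);
}

# Usage (placeholder URL from the question):
# my $xmldoc = fetch_xml('https:<mywebsite>');
# print $xmldoc->toString;
```

This keeps certificate verification at LWP's defaults rather than disabling it as the question's code does.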
Borodin
  • @CircuitB0T: That is probably true, although it's not mentioned in the documentation for the module. – Borodin Mar 23 '14 at 19:00