
I am trying to read an XML form using Perl, but I cannot use any XML modules such as XML::Simple or XML::Parser.

It is a simple XML form that contains some basic information and an MS Word document attachment. I want to read this XML, download the attached document, and then print the XML information on the screen.

But I don't know how to do this without an XML module. I heard that an XML file can be parsed using Data::Dumper, but I am not familiar with that module, so I don't see how to do it.

Could you please help me with this, if there is any way to do it without an XML module?

Sample XML:

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   </book>
</catalog>
Sobrique
r-developer
  • 1
    Parsing XML is far from simple. That is why there are modules and libraries to do it. I can't imagine what you are thinking of, but `Data::Dumper` can't be used in this way; in any case it is a module, and you're not allowed to use those. Why are you unable to use modules? Can you show a sample of your data? – Borodin Jul 27 '15 at 10:59
  • I am working on a project where very few modules are available: the environment is heavily restricted, and most of the modules have been removed from the Perl lib :( That's why I am looking for an alternative way. – r-developer Jul 27 '15 at 11:02
  • 6
    I wish I knew why people did this. It is an unnecessary and useless restriction. I guess you could take a look at `XML::Parser::Lite` and copy the ideas. Can you show your data please? – Borodin Jul 27 '15 at 11:08
  • There is no document reference to a Microsoft Word file in your example XML. Please show some _real_ data. You can substitute the actual data, leave out some columns, or rename some of them if you are worried about copyright or sharing sensitive information. But we cannot help you if you don't tell us what you are working with. – simbabque Jul 27 '15 at 11:21
  • @simbabque Right now I am not concerned about the attached Doc file. First I want to read the XML and print the data; then I will think about the doc file. Sorry, I cannot provide the actual data, as I don't have access to this site from my office. So I have just used a sample form, and I am writing the sample code at home. – r-developer Jul 27 '15 at 11:25
  • 4
    Look, seriously. XML is complicated. Parsing it isn't trivial. That's why parsers exist - because they ensure things happen in a valid, clean and smooth way. Parsing XML without an XML parser is a bit like cleaning the toilet block with your toothbrush. You can do it, but it's way harder than it needs to be, and is just a bit dirty. But also, as it stands, this question is 'how do I write an XML parser' and so I'd suggest - too broad to meaningfully answer. – Sobrique Jul 27 '15 at 11:30
  • Is there a way to convert this XML file to some other format? – r-developer Jul 27 '15 at 11:30
  • 1
    Yes, you can use a parser... – Sobrique Jul 27 '15 at 11:31
  • If you are asking me to use XML::Parser or other Perl modules, then I am sorry, I don't have that option; that's why I am not finding any solution for this. I know that it's easy and efficient to parse an XML file using XML modules. – r-developer Jul 27 '15 at 11:37
  • Extracting some specific data from "well formatted" xml file may be simple. Parsing any xml file is not simple without libraries/modules. – AnFi Jul 27 '15 at 11:41
  • @Andrzej A. Filip, can you tell me how I can print the author, title and price for each book id for the above XML file with an XML module? – r-developer Jul 27 '15 at 11:44
  • 1
    Please don't change your question. Especially if you already have answers. If you want to ask something else, please ask it as a NEW question. – Sobrique Jul 28 '15 at 09:56

3 Answers

5

I'd like to reiterate that this is a BAD IDEA. Because whilst XML looks like plain text - it isn't plain text. And if you treat it as such, you are creating brittle, unmaintainable and unsupportable code, which may well break one day, because someone changes the XML format in a valid way.

I would strongly suggest that your first port of call is to go back to your project, and point out how parsing XML without an XML parser is rather like trying to use a hammer to put screws into a piece of wood. In that it sort of works, but the results are rather shoddy, and frankly it's completely unnecessary because screwdrivers exist and they do the job properly, easily and are widely available.

E.g.

Can you tell me how I can print the author, title and price for each book id for the above XML file with an XML module?

#!/usr/bin/env perl
use strict;
use warnings;

use XML::Twig;
my $twig = XML::Twig -> new -> parsefile ( 'your_file.xml' );
foreach my $book ( $twig -> get_xpath ( '//book' ) ) {
    print join ("\n", 
         $book -> att('id'),
         $book -> field('author'),
         $book -> field('title'),
         $book -> field('price'), ),"\n----\n";
}

However:

Given your very specific sample, you may be able to get away with treating it as 'plain text'. Before you do this, you should point out to your project lead that this is a risky approach - you're putting in screws with a hammer - and therefore creating ongoing risk of support problems, which is trivially resolved by just installing a bit of freely available, open source code.

I am only suggesting this AT ALL because I've had to deal with ludicrously unreasonable similar project demands.

Like this:

#!/usr/bin/env perl
use strict;
use warnings;

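# Read the input line by line and print each book id and author.
while ( <> ) {
    # Assumes the id attribute is on the same line as the opening <book tag.
    if ( m/<book/ ) {
        my ( $id ) = ( m/id="(\w+)"/ );
        print $id,"\n";
    }
    # Assumes the whole <author> element sits on a line of its own.
    if ( m/<author/ ) {
        my ( $author ) = ( m/>(.*)</ );
        print $author,"\n";
    }
}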

Now, the reason this doesn't work in general is that your sample above can be perfectly validly formatted as:

<?xml version="1.0"?>
<catalog><book id="bk101"><author>Gambardella, Matthew</author><title>XML Developer's Guide</title><genre>Computer</genre><price>44.95</price><publish_date>2000-10-01</publish_date><description>An in-depth look at creating applications 
      with XML.</description></book><book id="bk102"><author>Ralls, Kim</author><title>Midnight Rain</title><genre>Fantasy</genre><price>5.95</price><publish_date>2000-12-16</publish_date><description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description></book></catalog>

Or

<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
    <genre>Computer</genre>
    <price>44.95</price>
    <publish_date>2000-10-01</publish_date>
    <description>An in-depth look at creating applications 
      with XML.</description>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    <publish_date>2000-12-16</publish_date>
    <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
  </book>
</catalog>

Or:

<?xml version="1.0"?>
<catalog
><book
id="bk101"
><author
>Gambardella, Matthew</author><title
>XML Developer's Guide</title><genre
>Computer</genre><price
>44.95</price><publish_date
>2000-10-01</publish_date><description
>An in-depth look at creating applications 
      with XML.</description></book><book
id="bk102"
><author
>Ralls, Kim</author><title
>Midnight Rain</title><genre
>Fantasy</genre><price
>5.95</price><publish_date
>2000-12-16</publish_date><description
>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description></book></catalog>

Or:

<?xml version="1.0"?>

<catalog>
  <book id="bk101"><author>Gambardella, Matthew</author><title>XML Developer's Guide</title><genre>Computer</genre><price>44.95</price><publish_date>2000-10-01</publish_date><description>An in-depth look at creating applications 
      with XML.</description></book>
  <book id="bk102"><author>Ralls, Kim</author><title>Midnight Rain</title><genre>Fantasy</genre><price>5.95</price><publish_date>2000-12-16</publish_date><description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description></book>
</catalog>

This is why you have so many comments that say 'use a parser': of the snippets above, the simplistic example I gave you will only work on one and break messily on the others.

But the XML::Twig solution handles them all correctly. XML::Twig is freely available on CPAN. (There are other libraries that do the job just as well.) And it's also pre-packaged in a lot of operating systems' 'default' repositories.
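
If a parser really cannot be installed, a somewhat less fragile stopgap than the line-by-line loop is to match against the whole document at once instead of one line at a time. The following is only a sketch in core Perl, not a recommendation: `extract_books` is a hypothetical helper name, and it assumes double-quoted attributes, at least one attribute on every `<book>` tag, no nested `<book>` elements, and no comments, CDATA sections or entities that need decoding. It copes with the whitespace variations shown above, but it is still not an XML parser and will break on other valid XML.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from any module): pull id, author, title and
# price out of each <book> element by matching the whole document at once.
sub extract_books {
    my ($xml) = @_;
    my @books;

    # Find each <book ...>...</book> pair. /s lets '.' cross newlines, and
    # \s+ tolerates a line break between the tag name and its attributes.
    while ( $xml =~ m{<book\s+([^>]*)>(.*?)</book\s*>}gs ) {
        my ( $attrs, $body ) = ( $1, $2 );
        my %book;

        # Assumption: the id attribute is always double-quoted.
        ( $book{id} ) = $attrs =~ m{\bid\s*=\s*"([^"]*)"};

        for my $field (qw(author title price)) {
            # \s* before '>' copes with a tag whose '>' is on the next line.
            ( $book{$field} ) = $body =~ m{<$field\s*>(.*?)</$field\s*>}s;
        }
        push @books, \%book;
    }
    return @books;
}

# Deliberately awkward but valid layout, cut down from the samples above.
my $sample = <<'XML';
<?xml version="1.0"?>
<catalog
><book
id="bk101"
><author
>Gambardella, Matthew</author><title
>XML Developer's Guide</title><price
>44.95</price></book><book id="bk102"><author>Ralls, Kim</author><title>Midnight Rain</title><price>5.95</price></book></catalog>
XML

for my $book ( extract_books($sample) ) {
    # Missing fields come back undefined; print them as empty strings.
    print join( "\n", map { $book->{$_} // '' } qw(id author title price) ),
        "\n----\n";
}
```

To run it against a real file, the `$sample` heredoc could be replaced with a slurp such as `my $xml = do { local $/; <> };`. The `//` operator needs Perl 5.10 or later.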

Sobrique
  • Thanks for your reply. Somehow I managed to install the XML::Simple module, and now I am able to print the details on the screen. But now I have a second issue with the attached doc: when I check the XML file, it shows some encrypted value for the doc path. I have updated my question; can you please shed some light on this issue? – r-developer Jul 28 '15 at 09:40
  • If you have a different question, then I would suggest asking a different question. I would also suggest `XML::Simple` was exactly the worst choice, but if that was the only option might not be a complete disaster. – Sobrique Jul 28 '15 at 09:55
2

Well, an XML parser is just code. And CPAN modules are all open source, so I suppose that you could copy the code from an XML parsing module from CPAN into your program.

But really, that's an incredibly stupid idea. Why wouldn't you just use the module? You would be far better off spending your time getting your bar on using modules removed. A lot of modern Perl programming consists of installing the right modules from CPAN and plumbing them together. If you're not using CPAN modules then you're cutting yourself off from a large proportion of Perl's power.

If you really can't get that restriction lifted then (seriously) get better employers.

Dave Cross
0

If you cannot use any modules, then you should check out the source code of modules like XML::LibXML, understand how they deal with XML, and then implement it your own way, although this is not recommended.

See: Perl for XML Processing

Chankey Pathak
  • 2
    I really wouldn't suggest going anywhere near `XML::Simple` if you don't have to. About the only time it's a good choice is if it's the only choice. – Sobrique Jul 27 '15 at 11:32
  • @Sobrique: Ah I see. I've edited the answer to include XML::LibXML. Is that good enough? – Chankey Pathak Jul 28 '15 at 11:10