
I have an XML file, and I want to compare the ids of the entry nodes with the id of the reaction node. If they are the same, as in the example below, I want to access all the information of the reaction (the substrate id and the product ids). There are two product ids, but this code only gives me the first one. Here is the XML file:

<?xml version="1.0"?>
<!DOCTYPE pathway SYSTEM "http://www.kegg.jp/kegg/xml/KGML_v0.7.1_.dtd">
<!-- Creation date: May 31, 2012 14:53:24 +0900 (GMT+09:00) -->
<pathway name="path:ko00010" org="ko" number="00010" >
    <entry id="13">
    </entry>
    <entry id="37" >
    </entry>
    <reaction id="13" name="rn:R01070" type="reversible">
      <substrate id="105" name="cpd:C05378"/>
      <product id="132" name="cpd:C00118"/>
      <product id="89" name="cpd:C00111"/>
    </reaction>
</pathway>

Here is my code:

use strict;
use warnings;
use XML::Simple;

my $xml = XML::Simple->new;
my $data = $xml->XMLin("file.xml");
foreach my $entry (keys %{$data->{entry}}) {
    foreach my $reaction (keys %{$data->{reaction}}) {
        if ($data->{reaction}->{id} eq $data->{entry}->{$entry}->{id}) {
            print "substrate:::$data->{reaction}->{substrate}->{id}\n";
            print "product:::$data->{reaction}->{product}->{id}\n";
        }
    }
}
amon
shaq

1 Answer


XML::Simple is anything but simple. Its own documentation strongly discourages using the module in new code.

On my system, the data structure you might be getting (who knows?) looks like this:

{
  entry    => { 13 => {}, 37 => {} },
  name     => "path:ko00010",
  number   => "00010",
  org      => "ko",
  reaction => {
                id => 13,
                name => "rn:R01070",
                product => { "cpd:C00111" => { id => 89 }, "cpd:C00118" => { id => 132 } },
                substrate => { id => 105, name => "cpd:C05378" },
                type => "reversible",
              },
}

It is always good to inspect a data structure when you are not sure whether you are accessing it correctly. One way to do so is to add use Data::Dumper; print Dumper $data; to your script.
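As a minimal sketch of that inspection step, assuming XML::Simple is installed from CPAN, the snippet below parses a trimmed-down inline copy of the sample (DOCTYPE, substrate and most attributes dropped, purely for illustration) and dumps it. The $Data::Dumper::Sortkeys and $Data::Dumper::Indent settings are optional; they only make the dump sorted and more compact.

```perl
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;

# Trimmed, inline copy of the sample file (illustrative only).
my $xml = <<'XML';
<pathway name="path:ko00010">
  <entry id="13"/>
  <entry id="37"/>
  <reaction id="13" name="rn:R01070">
    <product id="132" name="cpd:C00118"/>
    <product id="89" name="cpd:C00111"/>
  </reaction>
</pathway>
XML

# XMLin treats its argument as XML text (not a file name) because it contains '<'.
my $data = XMLin($xml);

# Sort hash keys and use a compact indentation style for readability.
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent   = 1;
my $dump = Dumper($data);
print $dump;
```

With the default options, the dump should show the entries keyed by their id and the products keyed by their name, which is what trips up the loop in the question.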

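You might notice that there is no id field in the entries: XML::Simple has turned each entry's id into a hash key instead. Likewise, there is no id field at the level where your code looks for the product, because the two product elements have been folded into a hash keyed by their name attribute. This folding comes from XML::Simple's default KeyAttr setting (name, key, id). *Sigh*. This kind of “cleverness” is why you shouldn't be using XML::Simple.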
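If you are stuck with XML::Simple anyway, the folding can be switched off. Here is a minimal sketch, assuming XML::Simple is installed and the XML is inlined (without the DOCTYPE line) to keep the example self-contained. KeyAttr => [] disables folding into hashes, and ForceArray makes the listed elements array references even when only one of them occurs, so every reaction, substrate and product can be looped over the same way.

```perl
use strict;
use warnings;
use XML::Simple;

# Inline copy of the sample pathway (DOCTYPE omitted for self-containment).
my $xml = <<'XML';
<pathway name="path:ko00010" org="ko" number="00010">
  <entry id="13"></entry>
  <entry id="37"></entry>
  <reaction id="13" name="rn:R01070" type="reversible">
    <substrate id="105" name="cpd:C05378"/>
    <product id="132" name="cpd:C00118"/>
    <product id="89" name="cpd:C00111"/>
  </reaction>
</pathway>
XML

# KeyAttr => [] keeps repeated elements as lists instead of hashes keyed by name/key/id;
# ForceArray turns these elements into array refs even when only one is present.
my $data = XMLin(
    $xml,
    KeyAttr    => [],
    ForceArray => [qw(entry reaction substrate product)],
);

# Build a lookup of all entry ids.
my %is_entry_id = map { $_->{id} => 1 } @{ $data->{entry} };

# For every reaction whose id is also an entry id, collect substrate and product ids.
my @lines;
for my $reaction (@{ $data->{reaction} }) {
    next unless $is_entry_id{ $reaction->{id} };
    push @lines, "substrate:::$_->{id}" for @{ $reaction->{substrate} };
    push @lines, "product:::$_->{id}"   for @{ $reaction->{product} };
}
print "$_\n" for @lines;
```

The products keep their document order here, so both product ids are printed rather than just one.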


It is far easier to use a proper parser like XML::LibXML. We can then use XPath to select the nodes we want:

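use strict;
use warnings;
use feature 'say';
use XML::LibXML;

# Parse the KGML file into a DOM tree.
my $data = XML::LibXML->load_xml(location => "file.xml");

# Select reactions whose id equals the id of at least one entry.
my $query = '/pathway/reaction[/pathway/entry/@id=@id]';

# Only the first matching reaction is taken here.
if (my ($reaction) = $data->findnodes($query)) {
  say "substrate:::", $reaction->findvalue('substrate/@id');
  # findnodes returns one attribute node per product, so every product id is printed.
  say "product:::", $_->textContent for $reaction->findnodes('product/@id');
}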

Output:

substrate:::105
product:::132
product:::89
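The if in the code above only takes the first matching reaction. A sketch that handles every matching reaction might look like the following, assuming XML::LibXML is installed and the XML is parsed from an inline string rather than a file (DOCTYPE line omitted). The second reaction, with id 99, is a hypothetical addition whose id matches no entry, so it should be skipped.

```perl
use strict;
use warnings;
use XML::LibXML;

# Inline sample; the reaction with id 99 is hypothetical and matches no entry.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<pathway name="path:ko00010" org="ko" number="00010">
  <entry id="13"/>
  <entry id="37"/>
  <reaction id="13" name="rn:R01070" type="reversible">
    <substrate id="105" name="cpd:C05378"/>
    <product id="132" name="cpd:C00118"/>
    <product id="89" name="cpd:C00111"/>
  </reaction>
  <reaction id="99" type="irreversible">
    <substrate id="1"/>
  </reaction>
</pathway>
XML

# The predicate is true when the reaction's id equals any entry id (XPath node-set comparison).
my @lines;
for my $reaction ($doc->findnodes('/pathway/reaction[/pathway/entry/@id=@id]')) {
    push @lines, "reaction:::" . $reaction->getAttribute('id');
    # Each findnodes call returns attribute nodes; value gives the attribute's text.
    push @lines, "substrate:::" . $_->value for $reaction->findnodes('substrate/@id');
    push @lines, "product:::" . $_->value for $reaction->findnodes('product/@id');
}
print "$_\n" for @lines;
```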
amon