
I am trying to look at the attributes of a `<dbReference>` tag in XML. My script already gets down to the dbReference elements (there are several per entry), but I only want to take the 'id' if the 'type' is "EC".

I am thinking of doing some type of if statement, where it will look at the 'type' of the dbReference before taking the id:

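# Assumes XMLin was called with KeyAttr => [] and ForceArray => ['entry', 'dbReference'],
# so each dbReference list is an array of attribute hashes
foreach my $entry (@{ $data->{entry} }) {
    foreach my $ref (@{ $entry->{dbReference} }) {
        if ($ref->{type} eq "EC") {
            print "$ref->{id}\n";    # then print the id
        }
    }
}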

Edit: The entry XML is formatted like this, with many dbReference elements in a row that need to be checked:

<dbReference type="NCBI Taxonomy" id="9606"/>
<dbReference type="PubMed" id="8274401"/>
<dbReference type="EC" id="1.1.5.54"/>

Any ideas?

bforcer
  • You should write `if ($ref eq "EC")`. – Jens May 24 '14 at 07:52
  • dbReference has a bunch of attributes: id, length, type, property, etc. How can I specify that I am testing type in the if statement? – bforcer May 24 '14 at 16:08
  • Can you please post a sample input and what you expected? – Jens May 24 '14 at 16:15
  • I updated the main post with an edit showing what the XML looks like. The file is already parsed with XML::Simple. I'm not sure how to access the type and id attributes of the dbReference, as they're not in the body of a tag. – bforcer May 24 '14 at 16:31
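Since the data is already parsed with XML::Simple, attributes such as `type` and `id` simply become hash keys. Below is a minimal sketch, assuming a hypothetical `<root>`/`<entry>` wrapper around the sample elements. The XMLin options matter: by default `KeyAttr` folds elements that carry an `id` attribute into a hash keyed by that id, so `$entry->{dbReference}` may not be an array at all unless folding is switched off.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical input: one <entry> wrapping the dbReference elements from the question
my $xml = <<'XML';
<root>
  <entry>
    <dbReference type="NCBI Taxonomy" id="9606"/>
    <dbReference type="PubMed" id="8274401"/>
    <dbReference type="EC" id="1.1.5.54"/>
  </entry>
</root>
XML

# KeyAttr => [] turns off folding on 'id'; ForceArray keeps lists even with one element
my $data = XMLin($xml, KeyAttr => [], ForceArray => ['entry', 'dbReference']);

my @ec_ids;
foreach my $entry (@{ $data->{entry} }) {
    foreach my $ref (@{ $entry->{dbReference} }) {
        # Attributes are plain hash keys: $ref->{type}, $ref->{id}
        push @ec_ids, $ref->{id} if $ref->{type} eq 'EC';
    }
}
print "$_\n" for @ec_ids;
```

The `@ec_ids` array keeps each id as a separate string, which makes further processing straightforward.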

2 Answers


You could use XPath for that. This expression returns the id of every dbReference element (at any nesting level) whose type attribute equals EC:

//dbReference[@type="EC"]/@id

Code snippet:

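use strict;
use warnings;
use XML::LibXML;

my $dom = XML::LibXML->new->parse_file('file.xml');
# In list context findnodes returns one attribute node per matching id
for my $id ($dom->findnodes('//dbReference[@type="EC"]/@id')) {
    print 'Result: ', $id->value, "\n";
}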

You could adjust the expression with extra restrictions (for example, an absolute path to the node, other attributes, or node position) in case this doesn't return a unique value.

helderdarocha
  • Can I incorporate the findnodes method into the for loop? That way I could keep the values for each entry together. – bforcer May 24 '14 at 16:44
  • Yes, but if you choose this solution you might want to use XPath for that as well, since it would be more efficient. You can return a set of nodes and loop over them. See the answer to [this question](http://stackoverflow.com/questions/2039143/how-can-i-access-attributes-and-elements-from-xmllibxml-in-perl?rq=1) for an example. – helderdarocha May 24 '14 at 16:49
  • Can you explain how I would alter the registerNs line for my example? I don't see the point of that line. – bforcer May 24 '14 at 17:33
  • That's because his example uses a namespace, so his element selectors need to be prefixed (`ns:tagname`) and the prefix needs to be mapped to a namespace. If your XML file does not declare an `xmlns`, you don't need that. – helderdarocha May 24 '14 at 17:39
  • So the code you posted above should be storing the id of all dbReference tags that have type="EC", correct? I added print $dom to the end of the code and it prints all my XML. – bforcer May 24 '14 at 17:42
  • Sorry. I didn't test it and forgot to assign the variable. I'll fix it. It's simpler now. It should print the ID. – helderdarocha May 24 '14 at 17:54
  • Is there some way to store the id numbers in an array or something? I need to manipulate the ids further, so they need to be separate strings. – bforcer May 24 '14 at 18:04
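Collecting the ids into arrays, one per entry, might look like the sketch below. It assumes a hypothetical `<root>`/`<entry>` layout with no default `xmlns` declaration (with one, you would need `XML::LibXML::XPathContext` and `registerNs`, as discussed in the comments above), and it runs a relative XPath against each entry node so the values for each entry stay together.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample with two entries; the second has no EC reference
my $xml = <<'XML';
<root>
  <entry>
    <dbReference type="PubMed" id="8274401"/>
    <dbReference type="EC" id="1.1.5.54"/>
  </entry>
  <entry>
    <dbReference type="NCBI Taxonomy" id="9606"/>
  </entry>
</root>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

my @ids_per_entry;    # one array ref of EC ids per <entry>
for my $entry ($dom->findnodes('//entry')) {
    # Relative path: only the dbReference children of this entry
    my @ids = map { $_->value } $entry->findnodes('./dbReference[@type="EC"]/@id');
    push @ids_per_entry, \@ids;
}

for my $i (0 .. $#ids_per_entry) {
    print "entry $i: @{ $ids_per_entry[$i] }\n";
}
```

Each inner array holds the ids as separate strings, so an entry without an EC reference simply yields an empty array.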

I guess your input is something like:

<xml>
<dbReference type="NCBI Taxonomy" id="9606"/>
<dbReference type="PubMed" id="8274401"/>
<dbReference type="EC" id="1.1.5.54"/>
<dbReference type="NCBI Taxonomy" id="9606"/>
<dbReference type="PubMed" id="8274401"/>
<dbReference type="EC" id="1.1.5.54"/>
</xml>

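So try:

use XML::Simple;

my $ref = XMLin($String);    # $String holds the XML text
# warn Dumper($ref);         # needs use Data::Dumper; shows the parsed structure

# By default XMLin folds elements with an 'id' attribute into a hash keyed by
# that id (KeyAttr), so each key is an id; duplicate ids collapse into one key
foreach my $id (keys %{ $ref->{dbReference} }) {
    if ($ref->{dbReference}{$id}{type} eq "EC") {
        print "$id\n";    # then print the id
    }
}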
Jens