I'm using XML::Twig::XPath to work with ITS data, and am trying to figure out how to resolve XPath expressions with variables in them. Here's an example of what I need to work with from the ITS spec:

<its:rules version="2.0">
  <its:param name="LCID">0x0409</its:param>
  <its:translateRule selector="//msg[@lcid=$LCID]" translate="yes"/>
</its:rules>

I need to be able to evaluate the XPath expression contained in selector, with the value of the variable being the contents of the its:param element. I am at a loss as to how to do this. The documentation of XML::XPath mentions variables (which I assume should be part of the context), and it even has a class to represent them, but the documentation doesn't say how to specify variables in a context. I would be even more unsure of how to access such functionality from XML::Twig, if at all possible.

Does anyone know how to do this? Or alternatively, can you give an example of how to use such functionality with another module such as XML::LibXML (which mentions variables extensively, but leaves me a little unsure as to how to do this with variables that are strings)?

Nate Glenn
  • XPath variables are unfamiliar territory to me, but one naive way I see is to substitute the value from `its:param` for `$LCID`, like this: `$selector =~ s/[$]LCID/$lcid/`. – doubleDown Jun 25 '13 at 01:46
  • I was thinking of that, but am unsure if that is a robust solution. The documentation on variables [here](http://saxon.sourceforge.net/saxon6.5.3/expressions.html) seems to suggest that variables could be more than just strings to interpolate, but ITS parameters may be only strings (in which case, I think this would also need quotes: `$selector =~ s/[$]LCID/'$lcid'/`). Maybe it's my prerogative, since the ITS doc says "[the implementor should] define how variables are bind for evaluation of selector expression". – Nate Glenn Jun 25 '13 at 05:24
  • What if the value contains a quote? You need more than quotes. – ikegami Jun 25 '13 at 05:40
  • Looking at the its:doc some more, I am certain that it doesn't allow the full range of XPath variables (which can hold any object), and that I only have to worry about strings. I'll leave this question here, though, since I'd like to know how to add variables to context in XML::Twig or XML::XPath. – Nate Glenn Jun 25 '13 at 06:00

4 Answers


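libxml2, and therefore XML::LibXML, supports XPath variables: register a lookup callback on the XPath context, and it is called for each variable that the expression references.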

use strict;
use warnings;
use feature qw( say );

use XML::LibXML               qw( );
use XML::LibXML::XPathContext qw( );

sub dict_lookup {
   my ($dict, $var_name, $ns) = @_;
   $var_name = "{$ns}$var_name" if defined($ns);
   my $val = $dict->{$var_name};
   if (!defined($val)) {
      warn("Unknown variable \"$var_name\"\n");
      $val = '';
   }

   return $val;
}

my $xml = <<'__EOI__';
<r>
<e x="a">A</e>
<e x="b">B</e>
</r>
__EOI__

my %dict = ( x => 'b' );

my $parser = XML::LibXML->new();
my $doc = $parser->parse_string($xml);

my $xpc = XML::LibXML::XPathContext->new();
$xpc->registerVarLookupFunc(\&dict_lookup, \%dict);

say $_->textContent() for $xpc->findnodes('//e[@x=$x]', $doc);
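
Applied to the ITS example from the question, the same approach might look like the sketch below. The rules are taken from the question; the `<doc>` wrapper and the `<msg>` elements are made up for illustration, and the sketch assumes that parameters are plain strings with no namespace and that unknown variables should resolve to the empty string.

```perl
use strict;
use warnings;
use feature qw( say );

use XML::LibXML               qw( );
use XML::LibXML::XPathContext qw( );

# Sample document: the <doc> wrapper and <msg> elements are assumptions.
my $xml = <<'__EOI__';
<doc xmlns:its="http://www.w3.org/2005/11/its">
  <its:rules version="2.0">
    <its:param name="LCID">0x0409</its:param>
    <its:translateRule selector="//msg[@lcid=$LCID]" translate="yes"/>
  </its:rules>
  <msg lcid="0x0409">Hello</msg>
  <msg lcid="0x040C">Bonjour</msg>
</doc>
__EOI__

my $doc = XML::LibXML->new()->parse_string($xml);

my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(its => 'http://www.w3.org/2005/11/its');

# Collect the its:param values, keyed by parameter name.
my %param;
for my $node ($xpc->findnodes('//its:param')) {
   $param{ $node->getAttribute('name') } = $node->textContent();
}

# The callback receives the data argument, the variable name and its
# namespace URI; the namespace is ignored in this sketch.
$xpc->registerVarLookupFunc(
   sub {
      my ($dict, $var_name, $ns) = @_;
      return defined($dict->{$var_name}) ? $dict->{$var_name} : '';
   },
   \%param,
);

# Evaluate each rule's selector with the parameters bound as variables.
my @matched;
for my $rule ($xpc->findnodes('//its:translateRule')) {
   my $selector = $rule->getAttribute('selector');
   push @matched, map { $_->textContent() } $xpc->findnodes($selector);
}

say for @matched;
```

With this sample input, only the `<msg>` whose `lcid` equals the parameter value is selected.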
ikegami

If your XPath engine gives you no way to bind variables, you could instead treat the selector as a template whose grammar is:

start : parts EOI
parts : part parts |
part  : string_literal | variable | other

The following produces the XPath from the XPath template.

sub text_to_xpath_lit {
   my ($s) = @_;
   return qq{"$s"} if $s !~ /"/;
   return qq{'$s'} if $s !~ /'/;

   $s =~ s/"/", '"', "/g;
   return qq{concat("$s")};
}

my $NCNameStartChar_class = '_A-Za-z\xC0-\xD6\xD8-\xF6\xF8-\x{2FF}\x{370}-\x{37D}\x{37F}-\x{1FFF}\x{200C}-\x{200D}\x{2070}-\x{218F}\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}\x{10000}-\x{EFFFF}';
my $NCNameChar_class = $NCNameStartChar_class . '\-.0-9\xB7\x{300}-\x{36F}\x{203F}-\x{2040}';
my $NCName_pat = "[$NCNameStartChar_class][$NCNameChar_class]*+";

my $xpath = '';
for ($xpath_template) {
   while (1) {
      if (/\G ( [^'"\$]++ ) /xgc) {
         $xpath .= $1;
      }
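      elsif (/\G (?=['"]) /xgc) {
         # String literal: copy it verbatim so a "$" inside it is left alone.
         # XPath literals have no escape sequences, so a backslash is ordinary.
         /\G ( ' [^']*+ ' | " [^"]*+ " ) /sxgc
            or die("Unmatched quote\n");

         $xpath .= $1;
      }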
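      elsif (/\G \$ /xgc) {
         # Variable reference: an optional prefix, then the local name.
         /\G (?: ( $NCName_pat ) : )?+ ( $NCName_pat ) /xgc
            or die("Unexpected '\$'\n");

         my ($prefix, $var_name) = ($1, $2);
         my $ns;   # stays undef for an unprefixed variable
         if (defined($prefix)) {
            $ns = $ns_map{$prefix}
               or die("Undefined prefix '$prefix'\n");
         }

         $xpath .= text_to_xpath_lit(var_lookup($ns, $var_name));
      }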
      elsif (/\G \z /xgc) {
         last;
      }
   }    
}

Sample var_lookup:

sub var_lookup {
   my ($ns, $var_name) = @_;
   $var_name = "{$ns}$var_name" if defined($ns);
   my $val = $params{$var_name};
   if (!defined($val)) {
      warn("Unknown variable \"$var_name\"\n");
      $val = '';
   }

   return $val;
}

Untested.
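
As a quick check of the quoting helper, here is a standalone sketch that repeats `text_to_xpath_lit` so it runs on its own. The sample values are made up; they cover the three cases (no double quote, no single quote, both kinds of quote).

```perl
use strict;
use warnings;

# Same quoting rules as text_to_xpath_lit above, repeated here so the
# snippet is self-contained.
sub text_to_xpath_lit {
   my ($s) = @_;
   return qq{"$s"} if $s !~ /"/;   # no double quote: wrap in "..."
   return qq{'$s'} if $s !~ /'/;   # no single quote: wrap in '...'

   # Both kinds of quote: split around each " and glue with concat().
   $s =~ s/"/", '"', "/g;
   return qq{concat("$s")};
}

# Made-up sample values.
my @samples = ('0x0409', q{say "hi"}, q{it's "both"});
print text_to_xpath_lit($_), "\n" for @samples;

# Prints:
# "0x0409"
# 'say "hi"'
# concat("it's ", '"', "both", '"', "")
```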

ikegami
  • Neat! Would you mind just putting in a few comments? Also, I think the $var match should be `( (?:[a-zA-Z][_a-zA-Z0-9]:)?[a-zA-Z][_a-zA-Z0-9]*+ )`, since variable names are [QNames](http://www.w3.org/TR/2009/REC-xml-names-20091208/#NT-QName). – Nate Glenn Jun 25 '13 at 05:45
  • `/\G.../g` in scalar context matches where the last one left off. The `/c` makes it so a failed match "leaves off" where it started (instead of resetting to "never matched"). The rest is pretty self explanatory. Matches `$var` outside of string literals by processing the string in chunks of "variable", "string literal" and "other". Can't stick around. – ikegami Jun 25 '13 at 05:54
  • Now properly matches QNames and handles prefixes correctly (i.e. uses the associated namespace rather than the prefix itself). – ikegami Jun 25 '13 at 18:50
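
The `\G` / `/gc` technique described in the comments above can be seen in isolation in this minimal sketch. It is a simplified tokenizer, not the answer's code: variable names are matched with `\w+` rather than as QNames, only double-quoted literals are handled, and the input string is made up.

```perl
use strict;
use warnings;

my $input = 'a=$x and b="$y"';   # made-up input
my @tokens;

for ($input) {
   while (1) {
      # Each match starts where the previous one ended (\G). Thanks to /c,
      # a failed match leaves that position alone so the next branch can
      # try at the same spot.
      if    (/\G ( \$ \w+ )     /xgc) { push @tokens, "variable: $1" }
      elsif (/\G ( " [^"]* " )  /xgc) { push @tokens, "literal: $1"  }
      elsif (/\G ( [^\$"]+ )    /xgc) { push @tokens, "other: $1"    }
      else                            { last }   # end of input (or an unterminated quote)
   }
}

print "$_\n" for @tokens;

# Prints:
# other: a=
# variable: $x
# other:  and b=
# literal: "$y"
```

The `$y` inside the quoted string ends up in a "literal" token, which is why it would not be substituted.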

Here is a complete solution.

I sidestepped the "what's a QName" part by building a regexp from the parameter names already found. This might be slow if there are many parameters, but it works fine on the W3C's example. Building the regexp means escaping each name between \Q and \E so that meta-characters in the names are ignored, sorting the names by decreasing length so that a shorter name doesn't match instead of a longer one, then joining them with '|'.

Limitations:

  • there is no error handling if you use a parameter that's not defined previously,
  • namespaces in selectors are not handled, which is easy to add if you have real data, just add the appropriate map_xmlns declarations,
  • the whole document is loaded in memory, which is hard to avoid if you want to use generic XPath selectors

Here it is:

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig::XPath;

my %param;
my $match_param;
my @selectors;

# the ITS document is read from the DATA handle, so it has to follow
# a __DATA__ marker at the end of the script (not shown here)
my $t= XML::Twig::XPath->new(
  map_xmlns     => { 'http://www.w3.org/2005/11/its' => 'its' },
  twig_handlers => { 'its:param' => sub { $param{$_->att( 'name')}= $_->text; 
                                          # rebuild the alternation, longest names first
                                          $match_param= join '|', 
                                                         map { "\Q$_\E" }
                                                         sort { length($b) <=> length($a) } keys %param;
                                        },
                     'its:translateRule[@translate="yes"]' =>
                                   sub { my $selector= $_->att( 'selector');
                                         # replace each $name by its quoted value
                                         $selector=~ s{\$($match_param)}{quote($param{$1})}eg;
                                         push @selectors, $selector;
                                       },
                   },
                            )
                       ->parse( \*DATA);

foreach my $selector (@selectors)
  { my @matches= $t->findnodes( $selector);
    print "$selector: ";
    foreach my $match (@matches) { $match->print; print "\n"; }
  }

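sub quote
  { my( $param)= @_;
    return qq{"$param"} if $param !~ m{"};
    return qq{'$param'} if $param !~ m{'};
    # both kinds of quotes in the value: build the string with concat()
    $param=~ s{"}{", '"', "}g;
    return qq{concat("$param")};
  }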
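
The regexp-building step can be tried on its own with the sketch below. The two parameter names are hypothetical, chosen so that one is a prefix of the other, and the values are wrapped in plain double quotes on the assumption that they contain none.

```perl
use strict;
use warnings;

# Hypothetical parameters: "LCID" is a prefix of "LCID2".
my %param = ( LCID => '0x0409', LCID2 => '0x040C' );

# Escape each name, then put the longest names first so that "$LCID2"
# is not matched as "$LCID" followed by a stray "2".
my $match_param = join '|',
                  map  { quotemeta($_) }
                  sort { length($b) <=> length($a) } keys %param;

my $selector = '//msg[@lcid=$LCID2 or @lcid=$LCID]';

# Substitute each variable with its value (assumed to contain no quotes).
(my $resolved = $selector)
   =~ s{\$($match_param)}{ '"' . $param{$1} . '"' }ge;

print "$match_param\n";   # LCID2|LCID
print "$resolved\n";      # //msg[@lcid="0x040C" or @lcid="0x0409"]
```

Without the sort, the alternation could come out as `LCID|LCID2`, and `$LCID2` would be rewritten to `"0x0409"2`.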
mirod
  • @ikegami: in which case? When $x has been previously defined or not? Oh, I see, when the "thing that looks like a parameter" is not one. – mirod Jun 25 '13 at 08:29
  • You're right: the "mini parser" that replaces variables in the XPath expression is not complete. I'd like to know what the real data looks like before improving it though. – mirod Jun 25 '13 at 08:34
  • I know, although it may be that XML::XPathEngine supports XPath variables, I have never tried. – mirod Jun 25 '13 at 15:49
  • I actually only need XPath 1.0, and that's what XML::XPath supports. @mirod I think I figured out how to do variables in XML::XPath. You have to create your own XML::XPath::Parser and pass that into the XML::XPath constructor. XML::XPath::Parser has methods for setting and getting variables: `set_var` and `get_var`. – Nate Glenn Jun 25 '13 at 19:17

In XML::XPath, you can set variables on the XML::XPath::Parser object. It doesn't seem to be directly accessible via the XML::XPath object; you have to use $xp->{path_parser}, which is undocumented, to get to it. Here's an example with a string variable and also a nodeset variable:

use XML::XPath;
use XML::XPath::Parser;
use XML::XPath::Literal;

my $xp = XML::XPath->new(xml => <<'ENDXML');
<?xml version="1.0"?>
<xml>
    <a>
        <stuff foo="bar">
            junk
        </stuff>
    </a>
</xml>
ENDXML

#set the variable to the literal string 'bar'
$xp->{path_parser}->set_var('foo_att', XML::XPath::Literal->new('bar'));
my $nodeset = $xp->find('//*[@foo=$foo_att]');

foreach my $node ($nodeset->get_nodelist) {
    print "1. FOUND\n\n",
        XML::XPath::XMLParser::as_string($node),
        "\n\n";
}

#set the variable to the nodeset found from the previous query
$xp->{path_parser}->set_var('stuff_el', $nodeset);
$nodeset = $xp->find('/*[$stuff_el]');

foreach my $node ($nodeset->get_nodelist) {
    print "2. FOUND\n\n",
        XML::XPath::XMLParser::as_string($node),
        "\n\n";
}
Nate Glenn