When I run:

#!/usr/bin/env ruby
require 'nokogiri'

xml = <<-EOXML
<pajamas>
  <bananas>
    <foo>bar</foo>
    <bar>bar</bar>
    <1>bar</1>
  </bananas>
</pajamas>
EOXML

doc = Nokogiri::XML(xml)
puts doc.at('/pajamas/bananas/foo')
puts doc.at('/pajamas/bananas/bar')
puts doc.at('/pajamas/bananas/1')

I get an ERROR: Invalid expression: /pajamas/bananas/1 (Nokogiri::XML::XPath::SyntaxError)

Is this a case of Nokogiri not accepting integers as node names, and is there a workaround?

Looking at the documentation, I did not see a workaround to this. Removing the last line eliminates the error and prints the first two nodes as expected.

Michael Kay
engartst
  • That isn't an XML document, so you can't parse it using an XML parser. Element names can't start with a digit. – Michael Kay Jan 25 '23 at 18:05

1 Answer

An XML element whose name starts with a digit is not well-formed XML.

XML element names must follow these rules:

  • Names can contain letters, digits, hyphens, underscores, periods, and colons
  • Names must start with a letter, an underscore, or a colon; they cannot start with a digit, a hyphen, or a period
  • Names beginning with xml (or XML, or Xml, etc.) are reserved for standardization and should be avoided
  • Names cannot contain spaces

Any other name can be used; no other words are reserved.
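
As a rough illustration of the naming rules, here is a minimal sketch that checks a candidate name against an ASCII-only approximation of them. The constant `XML_NAME_ASCII` and the method `ascii_xml_name?` are hypothetical names made up for this example, and the pattern deliberately leaves out the many non-ASCII characters the XML spec also allows.

```ruby
# Simplified, ASCII-only approximation of an XML element name.
# Assumption: non-ASCII name characters permitted by the spec are ignored here.
XML_NAME_ASCII = /\A[A-Za-z_:][A-Za-z0-9_:.\-]*\z/

# Hypothetical helper: true when the name fits the simplified pattern.
def ascii_xml_name?(name)
  XML_NAME_ASCII.match?(name)
end

# Try the tag names from the question plus a couple of extras.
%w[foo bar 1 _1 a-b].each do |name|
  puts "#{name}: #{ascii_xml_name?(name)}"
end
```

Of the names from the question, only `1` fails the check, which matches the error the parser reports.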

You're trying to parse malformed XML with an XML parser, and that's just not going to work. If you're really getting <1> as a tag and can't control that at the source, I'd suggest renaming those tags with a regex before handing the document to Nokogiri.
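
One way the regex approach might look is sketched below. The helper name `prefix_numeric_tags` and the `n` prefix are arbitrary choices for illustration, not anything Nokogiri provides. The pattern assumes a literal `<` only ever starts a tag, so it could misfire on CDATA sections or comments that contain something like `<1`.

```ruby
# Hypothetical helper: prepend a prefix to element names that start with a
# digit, so that <1>...</1> becomes <n1>...</n1>. The "n" prefix is arbitrary.
def prefix_numeric_tags(xml, prefix = 'n')
  # Match "<" or "</" followed by a name whose first character is a digit.
  xml.gsub(%r{<(/?)(\d[\w.-]*)}) { "<#{$1}#{prefix}#{$2}" }
end

xml = <<~EOXML
  <pajamas>
    <bananas>
      <foo>bar</foo>
      <bar>bar</bar>
      <1>bar</1>
    </bananas>
  </pajamas>
EOXML

# Print the cleaned-up document; the numeric tag is now <n1>.
puts prefix_numeric_tags(xml)
```

The cleaned string can then be passed to `Nokogiri::XML`, and the renamed node should be reachable with a path such as `/pajamas/bananas/n1`.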

Mike K.
  • [Spec](https://www.w3.org/TR/xml/#NT-Name) *NameStartChar must meet this definition:* `":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]` – engineersmnky Jan 25 '23 at 15:09
  • Thanks! Makes sense, I will hit it with a regex first. Cheers! – engartst Jan 25 '23 at 16:11