I have the XML file below and am having trouble parsing it. I just want to parse each tag separately.

<pkg:xmlData>
....
</pkg:xmlData>
</pkg:part>
<pkg:part pkg:name="/word/document.xml" pkg:contentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml">
<pkg:xmlData>
<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 wp14">
<w:body>
<w:p w:rsidR="00D506C1" w:rsidRDefault="00D506C1">
<w:bookmarkStart w:id="0" w:name="_GoBack"/>
<w:bookmarkEnd w:id="0"/>
<w:r>
<w:t>Max Mara</w:t>
</w:r>
<w:r w:rsidR="00625187">
<w:t>s</w:t>
</w:r>
<w:r>
<w:t xml:space="preserve">Frühjahr/Sommer</w:t>
</w:r>
....
</w:p>
...
</w:body>
...
</pkg:part>

This is what I tried:

doc = Nokogiri::XML(File.open(@file), nil, "UTF-8")
root = doc.root
title = doc.xpath("//pkg:xmlData//w:body")

This is what I get:

Nokogiri::XML::XPath::SyntaxError: Undefined namespace prefix: //pkg:xmlData//w:body

Any help?

tokhi
  • This is a Microsoft Word document that was converted to XML; parsing it is challenging. Please have a look at the file. – tokhi Dec 23 '13 at 14:53
  • Don't put a link to your source data. *WHEN* that document changes or the link breaks, your question will make no sense and will be worthless for people searching for the same answers. Instead, reduce and summarize the XML until it's the bare minimum needed to show the problem. "Questions concerning problems with code you've written must describe the specific problem — and include valid code to reproduce it — in the question itself. See http://SSCCE.org for guidance." As is, you're expecting people to chase down the necessary information to understand your question. – the Tin Man Dec 23 '13 at 14:59

1 Answer

When you're dealing with a namespaced XML document, you also need to pass the namespace mappings to your xpath call, as follows:

title = doc.xpath("//pkg:xmlData//w:body", 
                  "pkg" => "http://example.com/package", 
                  "w" => "http://example.com/w")

In the code above, replace http://example.com/package with the URI declared for the pkg prefix in the file @file, and likewise replace http://example.com/w with the URI declared for the w prefix.

the Tin Man
vee