2

I've got a bunch of XML messages in a PostgreSQL 9.1.3 table, with a column content of type XML). They're not all the same "type", so I'm trying to extract the root type using a query like this:

SELECT xpath('name(/*)', content) FROM message;

as recommended by this answer to a similar SO question.

A sample message is:

<?xml version="1.0" encoding="UTF-8"?>
<WML version="6" xmlns="http://example.com/schemas/WML">...</WML>

For which case I'd hope to get the result '{WML}'. Unfortunately it just returns an empty array. Adding the namespaces parameter to xpath, or removing the namespace from the message, does not help.

A discussion on the PostgreSQL mailing lists seems to explain it as a bug in XPath handling in PostgreSQL. However that was in 2008, and a look at the PostgreSQL source shows that piece of code was changed in 2009. I'm not a PostgreSQL developer so I'm not confident that the bug is or is not a factor in my case.

But I'm wondering if there's a workaround, such as an alternative XPath expression that might work? I'd prefer not to have to resort to regular expressions to parse XML, though if you can suggest a short, punchy, robust RE then it would be better than nothing.

Community
  • 1
  • 1
Edmund
  • 10,533
  • 3
  • 39
  • 57

2 Answers2

0

Clearly, this has not yet been solved as of June 2011.
I found this thread on pgsql-hackers that describes your problem exactly.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
0

I don't know of a workaround for older versions, but this is fixed in PostgreSQL 9.2, so that's great.

(The likeliest workaround would likely to have been to write a function to parse the XML manually, but I'm glad I don't have to resort to that now!)

Edmund
  • 10,533
  • 3
  • 39
  • 57