
I'm writing code that reorganizes namespaces in arbitrary XML documents, potentially changing their prefixes. That was pretty straightforward until I ran into the xsi:type attribute:

<foo xsi:type="xs:string">...</foo>

If I change the xs prefix of the XSD namespace, I have to make the same change in this xsi:type value, e.g. turning it into

<foo i:type="x:string">...</foo>

This attribute is well known. In general, however, if I find markup like this:

<foo xmlns:aaa="http://bbb">
   <bar name="aaa:123">...</bar>
</foo>

Is there a way to tell that in the "aaa:123" value the "aaa" part refers to the "http://bbb" namespace?

I.e. it could be that the name is simply "aaa:123", with no intended reference to the namespace bound to the "aaa" prefix, and the match is accidental.

If it helps, the implementation language is Java.

Update/Solution:

Thanks to the helpful explanations and pointers provided in the answers below, I have modified my code to work by the following rules when it encounters an attribute that has a prefixed value:

  • For the xsi:type attribute, update the attribute value's prefix to match the new prefix for http://www.w3.org/2001/XMLSchema.
  • If there IS NO namespace with a matching prefix in the current context, the value is considered a literal (not a QName) and is left as is.
  • If there IS a namespace with a matching prefix in the current context, we cannot tell whether the attribute value is a literal or a QName, so the code cancels processing and the document is not modified at all.
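
The three rules above can be sketched roughly as follows. This is not the linked code, only a minimal illustration that assumes a namespace-aware DOM; the class name `PrefixedValueRules`, the `Verdict` enum and the helper methods are hypothetical names introduced here. Prefix lookup relies on the standard DOM Level 3 method `Node.lookupNamespaceURI`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper illustrating the rules; not the author's actual code.
final class PrefixedValueRules {
    static final String XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

    enum Verdict { NOT_PREFIXED, REWRITE_XSI_TYPE, LITERAL, AMBIGUOUS }

    /** Classifies one attribute of a namespace-aware DOM tree. */
    static Verdict classify(Attr attr) {
        String value = attr.getValue().trim();
        int colon = value.indexOf(':');
        if (colon <= 0) {
            return Verdict.NOT_PREFIXED; // no "prefix:" part, nothing to decide
        }
        // Rule 1: xsi:type is known to carry a QName.
        if (XSI_NS.equals(attr.getNamespaceURI()) && "type".equals(attr.getLocalName())) {
            return Verdict.REWRITE_XSI_TYPE;
        }
        // Rules 2 and 3: is the would-be prefix bound in the current context?
        String prefix = value.substring(0, colon);
        String uri = attr.getOwnerElement().lookupNamespaceURI(prefix);
        return uri == null ? Verdict.LITERAL : Verdict.AMBIGUOUS;
    }

    /** Parses the XML string with namespace awareness and returns the root element. */
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for getNamespaceURI/getLocalName
            return dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    /** Convenience for the demo: classifies a named attribute of the root element. */
    static Verdict verdictFor(String xml, String attrName) {
        Attr attr = parseRoot(xml).getAttributeNode(attrName);
        if (attr == null) {
            throw new IllegalArgumentException("No such attribute: " + attrName);
        }
        return classify(attr);
    }

    public static void main(String[] args) {
        String xml = "<foo xmlns:aaa='http://bbb'"
                + " xmlns:xs='http://www.w3.org/2001/XMLSchema'"
                + " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
                + " xsi:type='xs:string' name='aaa:123' other='zzz:123'/>";
        for (String name : new String[] {"xsi:type", "name", "other"}) {
            System.out.println(name + " -> " + verdictFor(xml, name));
        }
    }
}
```

A real traversal would skip the xmlns declarations themselves (their values are URIs that also contain a colon) and would abort the whole rewrite as soon as any attribute comes back as `AMBIGUOUS`.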

For anyone interested, the code is here.

I know the logic can be improved by leaving untouched only the namespaces affected by the ambiguous attributes, but it is Good Enough(tm) for me.

Vladimir Dyuzhev
  • The **content** of an attribute (as well as the `text()` within an element) does not live in any namespace. The namespace relates to the containing attribute (or element). In most cases attributes live in the same namespace as their elements, and it is rather strange to define different namespaces for attributes, but it's not unusual. I see no other option than to check this at the string level (look for an `xyz:` at the beginning of the content). But this may fail accidentally... – Shnugo May 08 '18 at 13:37

2 Answers


This isn't possible in a generic way without knowing how the application interprets the XML. There is, however, a weak convention that if the attribute or element in question has an XML Schema data type of xsd:QName (which requires the XML in question to be described by an XML Schema in the first place), then the attribute's or element's value is subject to namespace normalization.
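
As a rough illustration of that convention in Java: when a document is parsed with a schema attached, the DOM exposes each attribute's declared type through `Attr.getSchemaTypeInfo()`. The sketch below assumes the schema is available as a string and that the JAXP parser populates type information when `DocumentBuilderFactory.setSchema` is used; the class `QNameTypeCheck`, its methods and the sample schema are hypothetical. It only recognizes attributes typed directly as xs:QName, not types derived from it.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.TypeInfo;
import org.xml.sax.InputSource;

// Hypothetical helper: asks the schema whether an attribute is declared as xs:QName.
final class QNameTypeCheck {

    /** True if the attribute's schema type is exactly xs:QName. */
    static boolean isQNameTyped(Attr attr) {
        TypeInfo type = attr.getSchemaTypeInfo();
        return type != null
                && XMLConstants.W3C_XML_SCHEMA_NS_URI.equals(type.getTypeNamespace())
                && "QName".equals(type.getTypeName());
    }

    /** Parses the XML with the given XSD attached so that type info is populated. */
    static Element parseValidated(String xsd, String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            dbf.setSchema(schema); // validate while building the DOM
            return dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse or validate XML", e);
        }
    }

    /** Convenience for the demo: checks a named attribute of the root element. */
    static boolean isQNameTyped(String xsd, String xml, String attrName) {
        Attr attr = parseValidated(xsd, xml).getAttributeNode(attrName);
        return attr != null && isQNameTyped(attr);
    }

    public static void main(String[] args) {
        // Sample schema (an assumption for this demo): 'ref' is a QName, 'label' a string.
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                + "<xs:element name='bar'><xs:complexType>"
                + "<xs:attribute name='ref' type='xs:QName'/>"
                + "<xs:attribute name='label' type='xs:string'/>"
                + "</xs:complexType></xs:element></xs:schema>";
        String xml = "<bar xmlns:aaa='http://bbb' ref='aaa:item' label='aaa:123'/>";
        System.out.println("ref is QName-typed:   " + isQNameTyped(xsd, xml, "ref"));
        System.out.println("label is QName-typed: " + isQNameTyped(xsd, xml, "label"));
    }
}
```

Note that a value such as "aaa:123" would not validate as xs:QName anyway, because the local part of a QName cannot start with a digit; the sample therefore uses "aaa:item" for the QName-typed attribute.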

See also Using Qualified Names (QNames) as Identifiers in XML Content.

imhotap

A schema will tell you if an attribute is typed as xs:QName, but it won't tell you that it's a namespace-sensitive XPath expression (such as xsl:value-of/@select in XSLT or xs:selector/@xpath in XSD). And even if you knew these attributes were namespace sensitive, you would have a lot of detailed parsing to do to extract and replace the namespace prefixes.

So even with a schema, the task is not possible in the general case.

Unfortunately you're not the first person to run across this problem. Defining the data model used by XPath was always plagued by the problem of QNames-in-content (or more generally, prefixes-in-content).

Michael Kay
  • Thank you. Unfortunately, access to the schema from the layer my code is in is costly. And since, as you said, a generic solution is not possible even with the schema, I should opt to cancel the processing if QName-looking attributes are found. – Vladimir Dyuzhev May 08 '18 at 18:51