
In SQL Server 2014, I am trying to add an XML element with an attribute whose value contains a carriage return, using the 'modify' method on the XML datatype. The carriage return gets removed. Why is that?

Example:

declare @xmldata xml

select 
    @xmldata = '<root><child myattr="carriage returns &#xD;&#xA; are not a problem"></child></root>'

set 
   @xmldata.modify('insert <child>modifying text with carriage returns works&#xD;&#xA;ok</child> after (//child)[1]')

set 
   @xmldata.modify('insert <child myattr="but not&#xD;&#xA;attribute values... why is that?"></child> after (//child)[2]')

select @xmldata

Result:

<root>
  <child myattr="carriage returns &#xD;&#xA; are not a problem" />
  <child>modifying text with carriage returns works
ok</child>
  <child myattr="but not attribute values... why is that?" />
</root>
James A Mohler

2 Answers


White space characters can be normalized by parsers.

cf http://www.w3.org/TR/1998/REC-xml-19980210#AVNormalize

While your XML is valid, exactly how white space is rendered is implementation-dependent. As you can see, the CRLF was replaced with a single space.
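The same kind of normalization can be observed outside SQL Server. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` parser, not SQL Server's XML engine; the document string and variable names are made up for illustration. It parses one attribute with a literal CR/LF, one attribute with `&#xD;&#xA;` character references, and one element whose text contains a line break.

```python
import xml.etree.ElementTree as ET

# Hypothetical test document: the first attribute holds a literal CR/LF,
# the second holds the same characters written as character references,
# and the third child carries a line break in its text content.
doc = (
    "<root>"
    '<child myattr="literal\r\nbreak"/>'
    '<child myattr="referenced&#xD;&#xA;break"/>'
    "<child>text\nbreak</child>"
    "</root>"
)

children = ET.fromstring(doc).findall("child")

literal_attr = children[0].get("myattr")     # literal white space in an attribute
referenced_attr = children[1].get("myattr")  # character references in an attribute
element_text = children[2].text              # line break in element content

# repr() makes the white space characters visible.
print(repr(literal_attr))
print(repr(referenced_attr))
print(repr(element_text))
```

With this parser, the literal line break in the attribute comes back as a single space, while the character references and the line break in element content survive. Whether SQL Server's `modify` treats the references inside its XQuery expression the same way is not something this sketch can show.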

Please note

In general, XML treats content differently from structural data and metadata.

Attribute values are considered structure, and data between tags is considered content.

XML was never designed with the expectation that attributes would be displayed on end-user devices. I would suggest you create a separate element for this end-user content.

Hogan
  • It's not a rendering issue as the CRLF is stored and rendered correctly in the first example. – Thomas Boel Sigurdsson Nov 23 '15 at 11:57
  • @ThomasBoelSigurdsson --- you are wrong... AS THE SPECIFICATION SAYS the requirements for rendering one are different than the requirements for rendering the other. I strongly recommend you read the XML specification; it is not long, and you can see how the rules for attributes of tags are very different than the rules for document content. But it should be clear... one is "content" the other is structural or meta data. – Hogan Nov 23 '15 at 15:27
  • I hate to disagree with you - but I believe you're wrong. Please have a look at my example again. I have two elements that both have an attribute called "myattr". Both attributes should have a carriage return+newline - but only one of them does. The one that is created using the "modify" method doesn't as they are somehow stripped. – Thomas Boel Sigurdsson Nov 24 '15 at 16:50
  • @ThomasBoelSigurdsson - How am I wrong -- this functionality is platform dependent -- unless your platform software documents a different expected functionality it is working to specification and as designed. However, my answer "note" was wrong so I edited it. – Hogan Nov 24 '15 at 20:01

Section 3.3.3, Attribute-Value Normalization

Before the value of an attribute is passed to the application or checked for validity, the XML processor MUST normalize the attribute value by applying the algorithm below, or by using some other method such that the value passed to the application is the same as that produced by the algorithm.

  1. All line breaks MUST have been normalized on input to #xA as described in 2.11 End-of-Line Handling, so the rest of this algorithm operates on text normalized in this way.
  2. Begin with a normalized value consisting of the empty string.
  3. For each character, entity reference, or character reference in the unnormalized attribute value, beginning with the first and continuing to the last, do the following:

    • For a character reference, append the referenced character to the normalized value.
    • For an entity reference, recursively apply step 3 of this algorithm to the replacement text of the entity.
    • For a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value.
    • For another character, append the character to the normalized value.

If the attribute type is not CDATA, then the XML processor MUST further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters by a single space (#x20) character.

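The XML specification demands that a CR/LF line break in an attribute value be converted to a single space: step 1 normalizes the pair to #xA, and step 3 replaces that white space character with #x20.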
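To make the quoted algorithm concrete, here is a rough Python sketch of it for a CDATA-type attribute. The function name `normalize_attribute_value` is made up for illustration, and the sketch simplifies in one place: it expands numeric character references only and leaves named entity references untouched. It is not how SQL Server or any particular parser implements the rule.

```python
import re

# Matches a hexadecimal (&#xA;) or decimal (&#10;) character reference.
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def normalize_attribute_value(raw: str) -> str:
    """Sketch of XML attribute-value normalization for a CDATA attribute.

    Simplification: named entity references are not expanded.
    """
    # Step 1: end-of-line handling, CR LF and lone CR both become LF.
    raw = raw.replace("\r\n", "\n").replace("\r", "\n")

    # Step 2: start with an empty normalized value.
    out = []

    # Step 3: walk the unnormalized value from first to last.
    pos = 0
    while pos < len(raw):
        match = _CHAR_REF.match(raw, pos)
        if match:
            # Character reference: append the referenced character as-is.
            hex_digits, dec_digits = match.groups()
            code = int(hex_digits, 16) if hex_digits else int(dec_digits)
            out.append(chr(code))
            pos = match.end()
        elif raw[pos] in " \t\n\r":
            # Literal white space character: append a single space.
            out.append(" ")
            pos += 1
        else:
            # Any other character is copied through unchanged.
            out.append(raw[pos])
            pos += 1
    return "".join(out)


literal = normalize_attribute_value("but not\r\nattribute values")
referenced = normalize_attribute_value("carriage returns &#xD;&#xA; are not a problem")
print(repr(literal))
print(repr(referenced))
```

Under these rules a literal CR/LF collapses to one space, whereas CR and LF written as character references are appended unchanged, which matches the first attribute in the question's result.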

Remus Rusanu