This question involves the interplay between the XML 1.0 Recommendation and the HTTP/1.1 specification.
I have a web service that accepts a well-formed XML 1.0 document, parses it, and re-serializes it back to the client. The service supports both Content-Type text/xml and application/xml.
Suppose a client submits a request with these headers:
Content-Type: text/xml; charset=us-ascii
Accept: text/xml
Accept-Charset: us-ascii
and this body:
<?xml version="1.0" encoding="UTF-8" ?>
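<x>Inhoffenstra&#223;e</x>
The above document is well-formed and satisfies the encoding requirements: every byte is ASCII (the ß is written as the character reference &#223;), so the body is valid both under charset=us-ascii and under the declared encoding UTF-8, of which ASCII is a subset.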
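Once parsed, the DOM holds the text as Unicode characters, independent of any byte encoding. Since the document's declared encoding is UTF-8, the serializer writes ß as a literal UTF-8 character, and the document is re-serialized as: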
<?xml version="1.0" encoding="UTF-8" ?>
<x>Inhoffenstraße</x>
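The round trip can be reproduced with Python's standard xml.etree.ElementTree. This is only an assumed stand-in for whatever XML library the service actually uses, and the request body below assumes ß arrives as the reference &#223;. The sketch shows that the request bytes are pure ASCII while the UTF-8 re-serialization is not:

```python
import xml.etree.ElementTree as ET

# Assumption: ElementTree stands in for the service's real XML library.
# The request body is pure ASCII; ß arrives as the character reference &#223;.
REQUEST_BODY = (
    b'<?xml version="1.0" encoding="UTF-8" ?>\n'
    b"<x>Inhoffenstra&#223;e</x>"
)

root = ET.fromstring(REQUEST_BODY)
# The parsed tree stores characters, not bytes: the reference is now U+00DF.
print(root.text == "Inhoffenstra\u00dfe")  # True

# Re-serialize in the document's declared encoding, UTF-8.
response = ET.tostring(root, encoding="UTF-8", xml_declaration=True)

print(REQUEST_BODY.isascii())   # True: the request fits charset=us-ascii
print(response.isascii())       # False: the response no longer does
print(b"\xc3\x9f" in response)  # True: ß is now the UTF-8 bytes C3 9F
```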
The above document is not compatible with the Accept-Charset header, because ß is now encoded as the two non-ASCII bytes 0xC3 0x9F. However, there are at least three ways the service could handle this request:
Serialize the DOM using the US-ASCII encoding. This seems wrong and unnecessary because I am changing a fundamental property of the document, which may mislead the client (for instance, could this break something at the application layer, e.g., in an ESB or SOAP stack?):
<?xml version="1.0" encoding="US-ASCII" ?>
<x>Inhoffenstra&#223;e</x>
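For comparison, here is what serializing in US-ASCII looks like with Python's xml.etree.ElementTree, again assumed only as a stand-in for the service's XML library. CPython's ElementTree serializer encodes with the xmlcharrefreplace error handler, so any character the target encoding cannot represent is written as a numeric reference; the declaration changes, but the parsed content does not:

```python
import xml.etree.ElementTree as ET

# Assumption: ElementTree stands in for the service's real XML library.
REQUEST_BODY = (
    b'<?xml version="1.0" encoding="UTF-8" ?>\n'
    b"<x>Inhoffenstra&#223;e</x>"
)

root = ET.fromstring(REQUEST_BODY)

# Characters outside US-ASCII are emitted as references such as &#223;.
# xml_declaration=True is needed: for us-ascii no declaration is written by default.
ascii_response = ET.tostring(root, encoding="us-ascii", xml_declaration=True)
print(ascii_response.decode("ascii"))

# Reparsing recovers exactly the same character data as the original request.
print(ET.fromstring(ascii_response).text == "Inhoffenstra\u00dfe")  # True
```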
Post-process the serialized UTF-8 in the service layer by replacing non-ASCII characters with numeric character references. This feels like a hack because XML-specific escaping is applied to the entire document by a string transformation that is not XML-aware:
<?xml version="1.0" encoding="UTF-8" ?>
<x>Inhoffenstra&#223;e</x>
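A minimal sketch of the string-level post-processing, using a hypothetical helper escape_non_ascii and assuming the serialized output is UTF-8. It illustrates why the approach is fragile: XML recognizes character references only in element content and attribute values, so inside a CDATA section (or a comment, processing instruction, or name) the same substitution silently changes the data or breaks well-formedness:

```python
import xml.etree.ElementTree as ET


def escape_non_ascii(serialized: bytes) -> bytes:
    """Hypothetical post-processor: decode UTF-8 output and replace every
    non-ASCII character with a numeric character reference, ignoring XML syntax."""
    return serialized.decode("utf-8").encode("ascii", "xmlcharrefreplace")


# Element content: references are recognized here, so the meaning is preserved.
ok = escape_non_ascii("<x>Inhoffenstra\u00dfe</x>".encode("utf-8"))
print(ok)  # b'<x>Inhoffenstra&#223;e</x>'
print(ET.fromstring(ok).text == "Inhoffenstra\u00dfe")  # True

# CDATA section: references are NOT recognized, so the character data changes.
cdata = escape_non_ascii("<x><![CDATA[Inhoffenstra\u00dfe]]></x>".encode("utf-8"))
print(ET.fromstring(cdata).text)  # Inhoffenstra&#223;e
```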
Reject the request in the service layer with 406 Not Acceptable. This would assume that encoding="UTF-8" conflicts with Accept-Charset: us-ascii. But I don't think this is the case, since the actual content of the request is composed entirely of ASCII characters.
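A minimal sketch of the decision behind a 406, with hypothetical names fits_charset and respond and a deliberately simplified Accept-Charset that names a single charset (real headers may list several with q-values). It judges the bytes actually being sent, so an ASCII-only body passes even though its XML declaration says UTF-8; only the UTF-8 re-serialization containing ß fails:

```python
from typing import Tuple


def fits_charset(body: bytes, charset: str) -> bool:
    """True if the body is valid byte data in the given charset."""
    try:
        body.decode(charset)
    except (UnicodeDecodeError, LookupError):
        return False
    return True


def respond(serialized: bytes, accept_charset: str) -> Tuple[int, bytes]:
    """Hypothetical handler: 406 when the body cannot be sent in the single
    charset named by Accept-Charset (q-values and lists are ignored here)."""
    if fits_charset(serialized, accept_charset):
        return 200, serialized
    return 406, b""


request_body = (
    b'<?xml version="1.0" encoding="UTF-8" ?>\n'
    b"<x>Inhoffenstra&#223;e</x>"
)
utf8_response = (
    '<?xml version="1.0" encoding="UTF-8" ?>\n<x>Inhoffenstra\u00dfe</x>'
).encode("utf-8")

print(fits_charset(request_body, "us-ascii"))  # True: no conflict with us-ascii
print(respond(request_body, "us-ascii")[0])    # 200
print(respond(utf8_response, "us-ascii")[0])   # 406
```

Under this check, whether a 406 is needed depends on how the service chooses to re-serialize, not on the encoding declaration in the request.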
What is the expected, standards-compliant behavior for the response? From my understanding of the referenced standards, any of the above might be acceptable.
The following answers to a different question provide some helpful information but do not specifically address the text/xml case:
application/* Content-Type and charset attributes
I'm linking the following question because I believe it stems from a related problem:
Escaping Unicode string in XmlElement despite writing XML in UTF-8