
I'm working in Delphi XE5 and fetching XML from a server using TIdHTTP. Fetching the XML works fine, but one character comes through broken: '•' (bullet point). All other characters are fine; only the bullet point is broken.

I created the TIdHTTP instance like this:

idhttps := TIdHTTP.Create();
idhttps.IOHandler := TIdSSLIOHandlerSocketOpenSSL.Create(nil);
idhttps.IOHandler.DefStringEncoding := IndyTextEncoding(TEncoding.UTF8);
idhttps.HandleRedirects := True;
idhttps.ConnectTimeout := 5000;
idhttps.Request.USERNAME := 'USERNAME';
idhttps.Request.PASSWORD := 'PASSWORD';
idhttps.Request.BasicAuthentication := True;
idhttps.Request.Accept := 'text/xml';

Then I fetch the XML like this:

SS := TStringStream.Create('', TEncoding.UTF8);

try
  self.GetIdHTTPForLexicomp.Get(URL, SS);
  XMLDoc := TXMLDocument.Create(nil);
  XMLDoc.LoadFromStream(SS, TXMLEncodingType.xetUTF_8Like);
finally
  SS.Free;
end;

In the XML, the bullet point is displayed like this:

? Anaphylaxis/hypersensitivity: May cause hypersensitivity reactions,

The XML header is:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

What should I check?

Update: I added an XML snippet. It needs an XSL file for styling, but I assume that is not the problem here. The '?' is the broken character.

<?xml version="1.0" standalone="yes"?>
<ns2:monogragh>
  <monograghFields>
    <field fieldId="234837" fieldTypeCode="war" created="2005-04-07T17:28:33Z" modified="2014-10-02T11:32:57Z" sectionId="0">
      <fieldName>Warnings/Precautions</fieldName>
      <content>
        <div id="war" class="block">
          <p style="text-indent:-2em;margin-left:2em;text-align:justify;">
            <b>
              <i>Concerns related to adverse effects:</i>
            </b>
          </p>
          <p style="text-indent:-2em;margin-left:4em;text-align:justify;">
            ? Anaphylaxis/hypersensitivity: May cause hypersensitivity reactions, including anaphylaxis; use with caution in patients with anaphylactic disorders.
          </p>
        </div>
      </content>
    </field>
  </monograghFields>
</ns2:monogragh>

It looks like I gave misleading information, so I attached captured XML snippets. The first is the result fetched in a browser using a REST client tool, and the second is the result of fetching the XML through TIdHTTP.

[Screenshot: XML fetched in a browser using a REST client tool]

[Screenshot: XML fetched through TIdHTTP]

Kevin Son
  • How are you "displaying" the XML? What is the actual XML content you're receiving? Your question does not include that information, so it's difficult to say whether it's a problem with the XML itself or how you're displaying it. – Ken White Feb 06 '15 at 04:01

1 Answer

  1. Do not set the IOHandler.DefStringEncoding property when using TIdHTTP. Let TIdHTTP handle encodings on its own.

  2. Using a TStream to receive the XML is the correct choice. However, using a TStringStream in particular is not a good choice, because it is bound to the TEncoding you specify in the constructor. If the XML is not encoded in the same charset that the TEncoding implements, the XML would not be decoded properly. Use a TMemoryStream or TBytesStream instead, to preserve the original XML bytes as-is.

  3. XML is self-describing when it comes to its encoding. Do not tell TXMLDocument which encoding to use; let the XML itself tell TXMLDocument, through its prolog or BOM.

Try this:

idhttps := TIdHTTP.Create();
idhttps.IOHandler := TIdSSLIOHandlerSocketOpenSSL.Create(idhttps);
idhttps.HandleRedirects := True;
idhttps.ConnectTimeout := 5000;
idhttps.Request.USERNAME := 'USERNAME';
idhttps.Request.PASSWORD := 'PASSWORD';
idhttps.Request.BasicAuthentication := True;
idhttps.Request.Accept := 'text/xml';

MS := TMemoryStream.Create;
try
  idhttps.Get(URL, MS);
  MS.Position := 0;
  XMLDoc := TXMLDocument.Create(nil); // XMLDoc must be IXMLDocument, or a memory leak occurs
  XMLDoc.LoadFromStream(MS);
finally
  MS.Free;
end;

Now, TXMLDocument should be parsing the raw bytes that the server actually sends, without any interpretation by TIdHTTP or the RTL beforehand.

If you are still having the same problem, then either the XML itself is not properly encoded to begin with, or you are not processing/displaying the XML correctly after it has been loaded into TXMLDocument. Neither of which you have shown yet, so we can only guess where your actual problem lies, outside of what I mentioned above.
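
One way to narrow down which of those two cases applies is to inspect the raw bytes before any parsing or display step. The sketch below is only an illustration: the helper name `ContainsUtf8Bullet` and the file name `raw_response.xml` are hypothetical, and it assumes the server really sends UTF-8, in which case U+2022 (the bullet) appears as the bytes E2 80 A2.

```delphi
uses
  System.Classes, System.SysUtils;

// Hypothetical helper: returns True if the UTF-8 byte sequence for
// U+2022 (E2 80 A2) occurs anywhere in the downloaded stream.
function ContainsUtf8Bullet(MS: TMemoryStream): Boolean;
var
  P: PByte;
  I: Integer;
begin
  Result := False;
  P := MS.Memory;
  // The loop body never runs when the stream holds fewer than 3 bytes.
  for I := 0 to Integer(MS.Size) - 3 do
    if (P[I] = $E2) and (P[I + 1] = $80) and (P[I + 2] = $A2) then
      Exit(True);
end;

// Possible usage, right after idhttps.Get(URL, MS):
//   MS.SaveToFile('raw_response.xml'); // hypothetical name; open in a hex viewer
//   if ContainsUtf8Bullet(MS) then
//     ; // the download is intact, so look at the processing/display code
```

If the bytes are present in the stream, the download is not the culprit and the damage most likely happens in a later conversion step.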

Remy Lebeau
  • Thanks, Remy. I tried your code, but the bullet point is still broken. In browsers it works well, but with TIdHTTP only the bullet point is broken. I'm working in Delphi XE5 with the default TIdHTTP component. Do I need to update the Indy version or something else? – Kevin Son Feb 09 '15 at 01:12
  • As I said, the code I gave you downloads the XML **as-is**, exactly as the server sends it. So either the XML itself is broken, or you are not processing/displaying it correctly. But since you have not shown the actual XML (or the URL where the XML is coming from), or how you are processing/displaying the XML, there is no way to tell you what is going wrong. This is not an Indy issue, though. If you want further help, you need to provide those details. – Remy Lebeau Feb 09 '15 at 01:15
  • Thanks, Remy. It's a live product, so regrettably I can't give you the URL or its login information. But you gave me a clue. – Kevin Son Feb 09 '15 at 03:43
  • What clue is that? You showed an XML that has a `?` character in it, and say that is broken. Is that `?` present in the raw XML that `TIdHTTP` is receiving from the server? – Remy Lebeau Feb 09 '15 at 04:04
  • As I already told you, viewing it in browsers is fine and the bullet point displays well, but when fetching the XML through TIdHTTP only the bullet point is broken. The attached XML is the result of fetching it through TIdHTTP. – Kevin Son Feb 10 '15 at 03:52
  • I solved it. As you said, it was not an Indy issue. It was a memory stream encoding issue. I inspected the XMLDocument created from the MemoryStream by using XML.SaveToFile(), and as you said, it displays the bullet point well. So I traced my code and found what was wrong. To write HTML code to TWebBrowser, I put the HTML into a StringList and then saved it to a MemoryStream. At this step I didn't set the encoding for the MemoryStream. It gave me a CLUE. Thank you! – Kevin Son Feb 10 '15 at 09:08
  • Thanks @RemyLebeau, this helped me a lot – Wellington Silva Ribeiro Mar 03 '17 at 15:31
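
The fix described in the comments (giving the string list an explicit encoding when saving it to a stream) might look roughly like the sketch below. The asker's actual code was never shown, so the helper name `SaveHtmlAsUtf8` and its parameters are assumptions made for illustration only.

```delphi
uses
  System.Classes, System.SysUtils;

// Hypothetical helper: writes HTML text into a stream as UTF-8 bytes.
procedure SaveHtmlAsUtf8(const HTML: string; Dest: TStream);
var
  SL: TStringList;
begin
  SL := TStringList.Create;
  try
    SL.Text := HTML;
    // Passing the encoding explicitly avoids the default ANSI conversion,
    // which can replace characters such as U+2022 with '?'.
    SL.SaveToStream(Dest, TEncoding.UTF8);
  finally
    SL.Free;
  end;
end;
```

Before handing the stream to a consumer such as a browser control, `Dest.Position` would normally be reset to 0, and the HTML should declare a charset that matches the bytes written.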