
I have been using SuperObject for all my JSON parsing needs, and today I ran into a problem that I cannot seem to fix. I downloaded a JSON file containing an entry like this: "place" : "café". When I parsed the file and showed the value in a message box, the word café came out as cafÃ©. That tells me some kind of conversion failure is happening when SuperObject parses the file. Before I invest any more time in this library, I would like to know whether it supports UTF-8 and, if so, how to enable it.

BTW, the pseudocode I am using to parse the file looks something like this:

uses
  Dialogs, SuperObject;

const
  jsonstr = '{ "Place" : "café" }';

procedure ShowPlace;
var
  SupOB: ISuperObject;
begin
  // Parse the JSON text and display the value of the "Place" member.
  SupOB := SO(jsonstr);
  ShowMessage(SupOB['Place'].AsString);
end;

Is the conversion failing because I am reading the value with AsString? I also tried AsJson to see if that would have any effect, but it did not, so I am not sure what is needed to make values like these display as intended. I would appreciate some help. Finally, I have checked and verified that the original file being parsed is indeed encoded as UTF-8.



You say you are parsing a file, but your example is parsing a string. That makes a big difference: if you are reading the file data into a string first, you are likely not reading it correctly. Remember that Delphi strings use UTF-16 in Delphi 2009 and later, but ANSI in earlier versions. Either way, not UTF-8. So if your input file is UTF-8 encoded, you must decode its data to the proper string encoding before you can parse it. cafÃ© is what you get when the UTF-8 encoded bytes of café are misinterpreted as ANSI.

Remy Lebeau
  • You need to know the encoding of the file (which you do: UTF-8, which is JSON's default), and you need to know the target encoding you wish to parse (which you do: ANSI or UTF-16, depending on Delphi version). Then you can convert the data from one encoding to the other when you are reading the file. If you are using Delphi 2009+, you can use `TStreamReader` or `TEncoding` directly to help you with that conversion. If you are using an earlier version, use the Win32 API `MultiByteToWideChar()` and `WideCharToMultiByte()` functions, or a third-party library such as ICU or iconv. – Remy Lebeau Aug 26 '14 at 01:27
  • Or, since you know the file is UTF-8 encoded, you can read the file data directly into a `UTF8String`. In D2009+ you can pass that as-is to SuperObject and let the RTL convert it to UTF-16 automatically; in earlier versions you can pass it to `Utf8ToAnsi()` first and then pass the output to SuperObject, again letting the RTL handle the conversion for you. Hard to advise you one way or the other without knowing which version of Delphi you are actually using. – Remy Lebeau Aug 26 '14 at 01:31
  • Thanks again. I guess the part I don't understand is how to use TEncoding.UTF8 with SuperObject's method for parsing files, which is TSuperObject.ParseFile('somefile.ext', False);. I tried inserting TEncoding.UTF8 after False, and of course the application would not compile. Would loading the file into a string list and then parsing it with SuperObject work? Sorry for so many questions, btw. – kathrine jensen Aug 26 '14 at 01:33
  • Never mind, I tested loading the file into a StringList as I asked, which allowed me to use TEncoding.UTF8, and then used the SuperObject helper class to parse the contents of the string list, and it worked. As you can probably tell, I am new to programming, so I apologize for such silly questions and thank you again for your time. – kathrine jensen Aug 26 '14 at 01:39
  • I was looking at SuperObject's code earlier and I missed the `TSuperObject.ParseFile()` method. Looking at it now, I see that if the file has a UTF-16 BOM in front, `ParseFile()` (and `ParseStream()`) will handle the file data as UTF-16; otherwise it assumes ANSI, not even UTF-8 (which is odd, since UTF-8 is part of the JSON specification). Your file is not UTF-16 encoded with a BOM, which explains why SuperObject parses `café` as `cafÃ©`: it is not handling UTF-8 at all. – Remy Lebeau Aug 26 '14 at 01:54
  • So yes, you would have to decode the file data yourself from UTF-8 to UTF-16, such as with `TStringList` (though `TStringList.LoadFrom...()` in D2009+ is very inefficient at how it loads large files/streams, so you might consider using `TStreamReader` for the loading portion). – Remy Lebeau Aug 26 '14 at 01:56
  • I think most people use SuperObject to parse JSON strings in memory, where the strings natively match the same encoding of the parser, so there probably has not been much need for it to support UTF-8 encoded files/streams properly. – Remy Lebeau Aug 26 '14 at 02:02
  • Thank you for all your assistance. I just thought it was my lack of experience that was keeping me from parsing the information correctly. As for TStreamReader vs. TStringList, I will look into that posthaste, because while my files are rather small now, they may not always be, and it's better to be ahead of the game than behind. Thanks again. – kathrine jensen Aug 26 '14 at 02:17
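Putting the suggestions from this thread together, a file loader that decodes UTF-8 before SuperObject ever sees the text might look like the sketch below. It is untested, and several parts are assumptions: `LoadJsonFileUtf8` is a hypothetical helper name, the `{$IFDEF UNICODE}` split is just one way to target both Delphi 2009+ and older versions, and the pre-2009 branch swaps in the RTL's `UTF8Decode` (which returns a `WideString`) instead of `Utf8ToAnsi()`, so characters outside the ANSI code page are not lost. It also assumes SuperObject's `SOString` type is `WideString` on non-Unicode compilers.

```pascal
uses
  SysUtils, Classes, SuperObject;

// Hypothetical helper: load a UTF-8 encoded JSON file into SuperObject.
function LoadJsonFileUtf8(const FileName: string): ISuperObject;
{$IFDEF UNICODE}
// Delphi 2009 and later: string is UTF-16, so let TStreamReader decode
// the UTF-8 bytes and pass the resulting string straight to SO().
var
  Reader: TStreamReader;
begin
  // DetectBOM = True: a leading byte-order mark, if present, is consumed
  // by the reader instead of ending up in the text.
  Reader := TStreamReader.Create(FileName, TEncoding.UTF8, True);
  try
    Result := SO(Reader.ReadToEnd);
  finally
    Reader.Free;
  end;
end;
{$ELSE}
// Delphi 2007 and earlier: string is ANSI, so read the raw bytes into a
// UTF8String and decode them to WideString (assumed to be SOString here).
var
  Stream: TFileStream;
  Raw: UTF8String;
  Size: Integer;
begin
  // Share mode is OR'ed into the Mode argument.
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Size := Stream.Size;
    SetLength(Raw, Size);
    if Size > 0 then
      Stream.ReadBuffer(Raw[1], Size);
  finally
    Stream.Free;
  end;
  // Drop a UTF-8 BOM (EF BB BF) so it is not parsed as part of the JSON.
  if Copy(Raw, 1, 3) = #$EF#$BB#$BF then
    Raw := Copy(Raw, 4, MaxInt);
  Result := SO(UTF8Decode(Raw));
end;
{$ENDIF}
```

For the larger files mentioned above, the Delphi 2009+ branch avoids `TStringList` entirely and reads the file once through `TStreamReader`.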

Reading and writing UTF-8 encoded JSON files. Tested on Delphi 2007.

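function ReadSO(const aFileName: string): ISuperObject;
var
  input: TFileStream;
  output: TStringStream;
begin
  // The share mode belongs in Mode; the third parameter (Rights) is not a share mode.
  input := TFileStream.Create(aFileName, fmOpenRead or fmShareDenyWrite);
  try
    output := TStringStream.Create('');
    try
      // Copy the raw UTF-8 bytes into an AnsiString buffer (Delphi 2007).
      output.CopyFrom(input, input.Size);
      // Decode UTF-8 to UTF-16, then parse the wide string.
      Result := TSuperObject.ParseString(PWideChar(UTF8ToUTF16(output.DataString)), true, true);
    finally
      output.Free;
    end;
  finally
    input.Free;
  end;
end;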

procedure WriteSO(const aFileName: string; o: ISuperObject);
var
  output: TFileStream;
  input: TStringStream;
begin
  // Encode the JSON text (UTF-16) as UTF-8 bytes held in an AnsiString.
  input := TStringStream.Create(UTF16ToUTF8(o.AsJSon(true)));
  try
    // fmCreate creates the file, or truncates it if it already exists.
    output := TFileStream.Create(aFileName, fmCreate);
    try
      output.CopyFrom(input, input.Size);
    finally
      output.Free;
    end;
  finally
    input.Free;
  end;
end;

The functions UTF8ToUTF16 and UTF16ToUTF8 come from the JCL unit JclConversions (http://sourceforge.net/projects/jcl/).
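If you would rather not depend on JCL, Delphi 2007's own System unit provides `UTF8Encode` and `UTF8Decode`, which convert between `WideString` and `UTF8String`. The following is an untested sketch of the write side under that assumption; `WriteSOUtf8` is a hypothetical name, and it assumes `AsJSon` returns a `WideString` on non-Unicode compilers.

```pascal
uses
  Classes, SuperObject;

// Hypothetical JCL-free variant of WriteSO for Delphi 2007.
procedure WriteSOUtf8(const FileName: string; const Obj: ISuperObject);
var
  Stream: TFileStream;
  Data: UTF8String;
begin
  // RTL conversion: WideString JSON text -> UTF-8 bytes.
  Data := UTF8Encode(Obj.AsJSon(True));
  // fmCreate creates the file, or truncates it if it already exists.
  Stream := TFileStream.Create(FileName, fmCreate);
  try
    if Length(Data) > 0 then
      Stream.WriteBuffer(Data[1], Length(Data));
  finally
    Stream.Free;
  end;
end;
```

Writing the `UTF8String` buffer directly skips the intermediate `TStringStream` copy.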

Plasticut