I'm trying to use a TStringList to load a CSV file generated by Google Contacts. When I open this file in a text editor like Sublime Text, I can see the contents properly, with 75 lines. This is a sample from the Google Contacts file:

Name,Given Name,Additional Name,Family Name,Yomi Name,Given Name Yomi,Additional Name Yomi,Family Name Yomi,Name Prefix,Name Suffix,Initials,Nickname,Short Name,Maiden Name,Birthday,Gender,Location,Billing Information,Directory Server,Mileage,Occupation,Hobby,Sensitivity,Priority,Subject,Notes,Group Membership,Phone 1 - Type,Phone 1 - Value,Phone 2 - Type,Phone 2 - Value,Phone 3 - Type,Phone 3 - Value
H,H,,,,,,,,,,,,,   1-01-01,,,,,,,,,,,,* My Contacts ::: Importado 01/02/16,,,,,,
H - ?,H,-,?,,,,,,,,,,,   1-01-01,,,,,,,,,,,,* My Contacts ::: Importado 01/02/16,Mobile,031-863-64393,,,,
H - ?,H,-,?,,,,,,,,,,,,,,,,,,,,,,,* My Contacts ::: Importado 01/02/16,Mobile,031-986-364393,,,,

But when I try to load this same file using TStringList, this is what I see in its Text property:

'ÿþN'#$D#$A

Here is my code:

procedure TForm1.LoadFile;
var
  sl: TStringList;
begin
  sl := TStringList.Create;
  try
    sl.LoadFromFile('c:\google.csv');
    ShowMessage('lines : ' + IntToStr(sl.Count) + ' / text : ' + sl.Text);
  finally
    sl.Free; // release the list even if loading fails
  end;
end;

This is the result I get:

'1 / 'ÿþN'#$D#$A'

What is happening here?

Thanks

delphirules
  • Can you show a hex dump of the first few bytes of the file – David Heffernan Feb 03 '16 at 11:37
  • The two bytes ÿþ look like a BOM for a Unicode encoding – Blurry Sterk Feb 03 '16 at 11:39
  • @BlurrySterk I'm using Delphi 2007, so I can't load this file? – delphirules Feb 03 '16 at 11:43
  • Can you please show us the binary content of the file, and then we can tell you what to do next. And then you can take the opportunity to learn all about text encodings. – David Heffernan Feb 03 '16 at 11:44
  • @DavidHeffernan Do you mean I should upload the file? – delphirules Feb 03 '16 at 11:48
  • That would do it. But if it were me I would load the file in a hex editor and look at the hex dump. Do you not have a hex editor to hand? Do you understand text encodings? Does Sublime tell you what the encoding is? – David Heffernan Feb 03 '16 at 11:49
  • @DavidHeffernan Yes, it shows the file is UTF-8. I don't know much about text encodings. – delphirules Feb 03 '16 at 11:53
  • Let's not waste any more time. Please get a viewer that can display files in hex format, take a screen capture of that display, and upload only the first few lines of that image here. Or copy the first few lines of the displayed text and put them in your question above. – Blurry Sterk Feb 03 '16 at 11:57
  • The first three bytes of a UTF-8 encoded file with a BOM will be EF BB BF. In your first paste of the stringlist.text, 'ÿþN'#$D#$A, there is an apostrophe at the beginning. Is that part of the file as well? – Blurry Sterk Feb 03 '16 at 12:00
  • @BlurrySterk This is what Delphi's watch window shows me, exactly as I posted. – delphirules Feb 03 '16 at 12:04
  • Then the apostrophe is not part of the content. We will not be able to help until you do as David and I requested. – Blurry Sterk Feb 03 '16 at 12:06
  • I used an online editor, here is the result: http://s30.postimg.org/iyb2ww68h/screen.jpg – delphirules Feb 03 '16 at 12:09
  • This information is key to the question. Please include it in the question. With an edit. As someone with 50+ questions here to date I feel you should know this by now. – David Heffernan Feb 03 '16 at 12:15
  • Please read through: https://www.embarcadero.com/images/dm/technical-papers/delphi-and-unicode-marco-cantu.pdf. It will explain a lot about encoding. – Blurry Sterk Feb 03 '16 at 12:21

2 Answers

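According to the hex dump you provided, the BOM indicates that your file is encoded using UTF-16LE. TStringList in Delphi 2007 treats the bytes as ANSI text, so the BOM bytes FF FE show up as ÿþ and the text ends at the first zero byte, which is the high byte of the first character, N. You have a few options in front of you, as I see it: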

  1. Switch to Unicode and use the TnT Unicode controls to work with this file.
  2. Read the file as an array of bytes. Convert to ANSI and then continue using ANSI encoded text. Obviously you'll lose information for any characters that cannot be encoded by your ANSI code page. A cheap way to do this would be to read the file as a byte array, copy the content after the first two bytes (the BOM) into a WideString, and then assign that WideString to an ANSI string.
  3. Port your program to a Unicode version of Delphi (anything later than Delphi 2007) and work natively with Unicode.
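
As a rough illustration of option 2, here is a minimal sketch for Delphi 2007. It assumes the file really is UTF-16LE with an FF FE BOM, as the hex dump suggests. The name LoadUtf16LeAsAnsi is made up for this example, and the path is the one from the question. It is a sketch under those assumptions, not a general-purpose loader.

```pascal
program Utf16ToAnsiDemo;

{$APPTYPE CONSOLE}

uses
  Classes, SysUtils;

// Hypothetical helper (not part of the RTL): reads a UTF-16LE file and
// stores it in Strings as ANSI text.
procedure LoadUtf16LeAsAnsi(const FileName: string; Strings: TStrings);
var
  ms: TMemoryStream;
  p: PAnsiChar;
  byteCount: Integer;
  ws: WideString;
begin
  ms := TMemoryStream.Create;
  try
    ms.LoadFromFile(FileName);
    p := ms.Memory;
    byteCount := Integer(ms.Size);

    // Skip the UTF-16LE byte order mark (bytes FF FE) if it is present.
    if (byteCount >= 2) and (p[0] = #$FF) and (p[1] = #$FE) then
    begin
      Inc(p, 2);
      Dec(byteCount, 2);
    end;

    // Copy the remaining bytes into a WideString, two bytes per character.
    SetLength(ws, byteCount div 2);
    if Length(ws) > 0 then
      Move(p^, PWideChar(ws)^, Length(ws) * SizeOf(WideChar));

    // Assigning a WideString to an ANSI string property converts it using
    // the system ANSI code page.
    Strings.Text := ws;
  finally
    ms.Free;
  end;
end;

var
  sl: TStringList;
begin
  sl := TStringList.Create;
  try
    LoadUtf16LeAsAnsi('c:\google.csv', sl);
    Writeln('lines: ', sl.Count);
  finally
    sl.Free;
  end;
end.
```

Any character that has no equivalent in the ANSI code page is replaced during the final assignment, so this only suits data that fits that code page.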

I rather suspect that you are not very familiar with text encodings. If you were then I think you would have been able to answer the question yourself. That's just fine but I urge you to take the time to learn about this issue properly. If you rush into coding now, before having a sound grounding, you are sure to make a mess of it. And we've seen so many people make that same mistake. Please don't add to the list of text encoding casualties.

David Heffernan

Thanks to David's information, I was able to do this with the function below. Because Delphi 2007 does not have Unicode support, it needs a third-party function to do it.

procedure LoadUnicodeFile(const FileName: string; Strings: TStringList);

  // Swaps the two bytes of every WideChar up to the terminating #0.
  procedure SwapWideChars(p: PWideChar);
  begin
    while p^ <> #0000 do
    begin
      // p^ := Swap( p^ ); //<<< D3
      p^ := WideChar(Swap(Word(p^)));
      Inc(p);
    end; { while }
  end; { SwapWideChars }

var
  ms: TMemoryStream;
  wc: WideChar;
  pWc: PWideChar;
begin
  ms := TMemoryStream.Create;
  try
    ms.LoadFromFile(FileName);
    // Append a terminating #0 so the buffer can be read as a PWideChar.
    ms.Seek(0, soFromEnd);
    wc := #0000;
    ms.Write(wc, SizeOf(wc));

    pWc := ms.Memory;
    if pWc^ = #$FEFF then // normal byte order mark
      Inc(pWc)
    else if pWc^ = #$FFFE then
    begin // byte order is big-endian
      SwapWideChars(pWc);
      Inc(pWc);
    end; { if }
    // otherwise: no byte order mark
    Strings.Text := WideCharToString(pWc);
  finally
    ms.Free;
  end;
end;
delphirules