12

I finally upgraded to Delphi XE. I have a library of units where I use strings to store plain ANSI characters (chars between A and U). I am 101% sure that I will never ever use UNICODE characters in those places.

I want to convert all my other libraries to Unicode, but for this specific library I think it will be better to stick with ANSI. The advantage is the memory requirement, as in some cases I load very large TXT files (containing ONLY ANSI characters). The disadvantage might be that I have to do lots and lots of typecasts when those libraries interact with normal (Unicode) libraries.

Are there any general guidelines for when it is good to convert to Unicode and when to stick with ANSI?

Dalija Prasnikar
Gabriel

6 Answers

12

The problem with general guidelines is that something like this can be very specific to a person's situation. Your example here is one of those.

However, for people Googling and arriving here, some general guidelines are:

  • Yes, convert to Unicode. Don't try to keep an old app fully using AnsiStrings. The reason is that the whole VCL is Unicode, and you shouldn't try to mix the two, because you will convert every time you assign a Unicode string to an ANSI string, and that is a lossy conversion. Trying to keep the old way because it's less work (or some similar reason) will cause you pain; just embrace the new string type, convert, and go with it.

  • Instead of randomly mixing the two, explicitly perform any conversions you need to, once - for example, if you're loading data from an old version of your program you know it will be ANSI, so read it into a Unicode string there, and that's it. Ever after, it will be Unicode.

  • You should not need to change the type of your string variables - string pre-D2009 is ANSI, and in D2009 and later it is Unicode. Instead, follow the compiler warnings and watch which string methods you use - some still take an AnsiString parameter, which I find confusing. The compiler will tell you.

  • If you use strings to hold bytes (in other words, using them as an array of bytes because a character was a byte) switch to TBytes.

  • You may encounter specific problems for things like encryption (strings are no longer byte/characters, so 'character' for 'character' you may get different output); reading text files (use the stream classes and TEncoding); and, frankly, miscellaneous stuff. Search here on SO, most things have been asked before.
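As a rough illustration of the "convert once, at the boundary" and TBytes points above, here is a minimal sketch. The function names LoadLegacyText and LoadRawBytes and the file name legacy.txt are hypothetical; it also assumes the old files were written in the system's active ANSI code page, which is what TEncoding.ANSI decodes.

```delphi
program LoadAnsiOnce;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Reads a legacy ANSI text file and returns it as a Unicode string.
// The ANSI -> Unicode conversion happens exactly once, here.
function LoadLegacyText(const FileName: string): string;
var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    // Assumption: the file uses the system's active ANSI code page.
    Lines.LoadFromFile(FileName, TEncoding.ANSI);
    Result := Lines.Text;
  finally
    Lines.Free;
  end;
end;

// Reads a file as raw bytes, for data that was never really text.
function LoadRawBytes(const FileName: string): TBytes;
var
  Stream: TFileStream;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Result, Integer(Stream.Size));
    if Length(Result) > 0 then
      Stream.ReadBuffer(Result[0], Length(Result));
  finally
    Stream.Free;
  end;
end;

begin
  // 'legacy.txt' is a hypothetical file name used for illustration.
  Writeln(Length(LoadLegacyText('legacy.txt')), ' characters');
  Writeln(Length(LoadRawBytes('legacy.txt')), ' bytes');
end.
```

After LoadLegacyText returns, the rest of the program only ever sees a normal (Unicode) string, so no further conversions are needed.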

Commenters, please add more suggestions... I mostly use C++Builder, not Delphi, and there are probably quite a few specific things for Delphi I don't know about.

Now for your specific question: should you convert this library?

If:

  • The values between A and U are truly only ever in this range, and
  • These values represent characters (A really is A, not byte value 65 - if so, use TBytes), and
  • You load large text files and memory is a problem

then not converting to Unicode, and instead switching your strings to AnsiStrings, makes sense.

Be aware that:

  • There is an overhead every time you convert from ANSI to Unicode
  • You could use UTF8String, which is a specific type of AnsiString that will not be lossy when converted, and will still store ASCII characters (including A to U) in a single byte each; other characters take two or more bytes
  • Changing all the instances of string to AnsiString could be a bit of work, and you will need to check all the methods called with them to see if too many implicit conversions are being performed (for performance), etc
  • You may need to change the outer layer of your library to use Unicode so that conversion code or ANSI/Unicode compiler warnings are not visible to users of your library
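  • If you convert to Unicode, sets of characters (the if C in ['A'..'U'] syntax) need attention: set elements are limited to single-byte values, so the compiler warns when a Unicode Char is tested with in and suggests CharInSet instead. From your description of characters A to U, I would guess you would like to use this syntax.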
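The points above can be sketched in a few lines. This is only an illustration, assuming Delphi 2009 or later, where string is UnicodeString; the variable names are made up.

```delphi
program AnsiVsUnicode;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  A: AnsiString;
  U: string;       // UnicodeString in Delphi 2009 and later
  W: UTF8String;
begin
  A := 'ABCDU';
  U := string(A);       // explicit cast: ANSI -> Unicode
  W := UTF8String(U);   // explicit cast: Unicode -> UTF-8, not lossy

  // Storage per character element for each string type.
  Writeln('AnsiChar: ', SizeOf(AnsiChar), ' byte');
  Writeln('Char:     ', SizeOf(Char), ' bytes');

  // A..U are ASCII, so the UTF-8 copy uses one byte per character.
  Writeln('UTF-8 length in bytes: ', Length(W));

  // CharInSet is the warning-free replacement for "U[1] in [...]"
  // when the character being tested is a Unicode Char.
  if CharInSet(U[1], ['A'..'U']) then
    Writeln('first character is in A..U');
end.
```

Inside an AnsiString-only library, the plain in test keeps working without warnings, because the elements are AnsiChar values.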

My recommendation? Personally, the only reason I would do this from the information you've given is the memory use, and possibly performance depending on what you're doing with this huge amount of A..Us. If that truly is significant, it's both the driver and the constraint, and you should convert to ANSI.

David
  • Thanks David. I started to convert this library to ANSI and it makes sense. I also see that the interaction between this library and the other classic (Unicode) libraries is not as big as I had feared. Mostly I have to 'print' those A-U strings on a canvas. Too bad I didn't get the UTF8String idea earlier. I have already started my conversion, BUT if I see problems, I will definitely consider it. Thanks again. – Gabriel May 19 '11 at 09:44
4

You should be able to wrap up the conversion at the interface between this unit and its clients. Use AnsiString internally and string everywhere else and you should be fine.
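A minimal sketch of what such a boundary could look like follows. The unit name ReducedAlphabet and the class TSequence are hypothetical, and the Unicode-to-ANSI cast is only safe under the question's assumption that the data never leaves the A..U range.

```delphi
unit ReducedAlphabet;

interface

type
  // Hypothetical container: A..U data stored one byte per character.
  TSequence = class
  private
    FData: AnsiString;   // internal storage stays ANSI
  public
    // The public interface speaks the normal (Unicode) string type.
    procedure LoadFromText(const Value: string);
    function AsText: string;
    function Count: Integer;
  end;

implementation

procedure TSequence.LoadFromText(const Value: string);
begin
  // Explicit cast: Unicode -> ANSI. Lossless only because the input
  // is assumed to contain nothing outside 'A'..'U'.
  FData := AnsiString(Value);
end;

function TSequence.AsText: string;
begin
  // Explicit cast: ANSI -> Unicode, done once at the boundary.
  Result := string(FData);
end;

function TSequence.Count: Integer;
begin
  Result := Length(FData);
end;

end.
```

Client code never sees AnsiString: drawing the data, for example, would be a call such as Canvas.TextOut(X, Y, Seq.AsText), with all casts hidden inside the unit.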

David Heffernan
  • @Altar - I think David has summed it up very neatly (he usually does). And I'm not sure why you would need 'lots and lots of typecasts' if you have discrete, well-designed interfaces for accessing the ANSI data - just use System.StringToWideChar in functions on the threshold between ANSI and Unicode - see ms-help://embarcadero.rs_xe/vcl/System.StringToWideChar.html in XE help "Returns a UNICODE string from an AnsiString." HTH MN – Vector May 20 '11 at 05:16
  • I know - that's what I myself do and I've never had a problem - but since there's a documented VCL function, I figured that carries more weight... :-) – Vector May 20 '11 at 06:21
3

In general, only use AnsiString if it is important that the Chars are single bytes. Otherwise, the use of string ensures future compatibility with Unicode.

Mike Taylor
0

You need to check all libraries anyway, because all Windows API functions in Delphi XE are replaced by their Unicode analogues, etc. If you will never use Unicode, you need to use Delphi 7.

Dow Harris
  • Delphi 2007 will work fine if you don't need Unicode and is a lot more up-to-date. – Johan May 18 '11 at 19:03
  • I didn't say that I don't use Unicode (even though it's true, I don't care that much about it). I use ANSI strings ONLY for this specific library because I use a reduced alphabet and I need small strings to decrease memory requirements. – Gabriel May 19 '11 at 09:46
0

Use AnsiString explicitly everywhere in this unit and then you'll get compiler warnings (which you should never ignore) for string-to-AnsiString conversions if you happen to call the routines incorrectly.

Alternately, perhaps preferably depending on your situation, simply convert everything to UTF8.
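To make the warnings concrete, here is a small sketch of which assignments the Unicode-era compiler flags and how an explicit cast documents the intent. The warning numbers in the comments are the ones recent Delphi versions report for implicit string casts; treat the exact numbering as an assumption for your compiler version.

```delphi
program CastWarnings;

{$APPTYPE CONSOLE}

var
  A: AnsiString;
  S: string;
begin
  S := 'ABC';

  A := S;   // warning W1058: implicit string cast with potential data loss
  S := A;   // warning W1057: implicit string cast from AnsiString to string

  A := AnsiString(S);   // explicit cast: no warning, intent is visible
  S := string(A);       // explicit cast: no warning

  Writeln(S);
end.
```

Keeping the explicit casts in a handful of boundary routines means any new warning points at an accidental conversion elsewhere.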

Darian Miller
0

Stick with ANSI strings ONLY if you do not have the time to convert the code properly. The use of ANSI strings is really only for backward compatibility - to my knowledge C# does not have an equivalent to ANSI strings. Otherwise use the standard Unicode strings. If you have a look on my web-site I have a whole string routines unit (about 5,000 LOC) that works with both Delphi 2007 (non-Unicode) and XE (Unicode) with only "string" interfaces and covers almost all of the conversion issues you might face.

Misha
  • "The use of Ansi strings is really only for backward compatibility" - - - Actually, in my case it is for memory requirements. Unicode will make my program require 2x more RAM! This will push it over the limit of normal (2-4GB RAM) computers today. All my other libraries have already been converted to Unicode. – Gabriel May 19 '11 at 09:49