I have some legacy code (1 million lines) written in Delphi 7 Pascal which for various reasons can't be upgraded to a more recent version of Delphi. The program outputs documents in about 30 languages and does a very good job of producing the various characters in all languages apart from Turkish. The code sets the charset to TURKISH_CHARSET (162). When it tries to print char #351 (ş, hex 15F), char #287 (ğ, hex 11F) or char #305 (ı, hex 131), it prints only "s", "g" or "i". It uses a simple

Printer.Canvas.TextOut(x, y, sText)  

to output the text.

I tried compiling the code on different machines and running it on different versions of Windows but always with the same result.

RRUZ
Chris Johnson
  • Insisting on using ANSI encoded text in 2016 is silly. – David Heffernan Aug 23 '16 at 09:36
  • When I google TURKISH_CHARSET I find that it corresponds to code page 1254. When I google code page 1254, I see a table of characters. It appears that the codes for these characters are $FE, $F0 and $FD. – David Dubois Aug 23 '16 at 19:26
  • TextOut takes a string parameter, which in Delphi 7 means ansistring. Each character in an ansistring is one byte. – David Dubois Aug 23 '16 at 19:27
  • @DavidDubois: Each `AnsiChar` element in an `AnsiString` is 1 byte, but an `AnsiString` can hold MBCS strings, where Unicode characters may be encoded using multiple bytes in some encodings. – Remy Lebeau Aug 23 '16 at 20:18
  • 2
    @DavidHeffernan: There are billions of lines of legacy code that can't readily be ported, and thousands of LOB applications for which source isn't available in order to port them. Making a blanket statement like that is silly - because your particular company doesn't use those sorts of application or have those legacy apps or codebases doesn't make everyone else wrong. Where you live is not the same as every other location on the planet, and where you work isn't the same as every other business in the world. – Ken White Aug 24 '16 at 00:40

1 Answer

In Delphi 7, string is an alias for AnsiString, which stores text as 8-bit bytes encoded in a Windows codepage. In MBCS codepages, a single Unicode character may require multiple bytes (the Turkish codepages are not among them, though).

Microsoft has several codepages for Turkish:

  • 857 (MS-DOS)
  • 1254 (Windows)
  • 10081 (Macintosh)
  • 28599 (ISO-8859-9)

In both codepages 1254 and 28599 (where 1254 is the most likely one you will run into), the Unicode characters in question are encoded in 8-bit as hex $FE (ş), $F0 (ğ), and $FD (ı).
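One way to confirm those byte values on your own machine is to convert the three Unicode code points to codepage 1254 and print the resulting bytes. The following is a minimal sketch, assuming a small console test program; the program and variable names are illustrative and not part of the original code.

```pascal
program CheckTurkishBytes;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

var
  w: WideString;
  s: AnsiString;
  n, i: Integer;
begin
  // U+015F (ş), U+011F (ğ), U+0131 (ı)
  SetLength(w, 3);
  w[1] := WideChar($015F);
  w[2] := WideChar($011F);
  w[3] := WideChar($0131);

  // First call asks for the required buffer size, second call converts.
  n := WideCharToMultiByte(1254, 0, PWideChar(w), Length(w), nil, 0, nil, nil);
  SetLength(s, n);
  WideCharToMultiByte(1254, 0, PWideChar(w), Length(w), PAnsiChar(s), n, nil, nil);

  // Print each resulting byte in Delphi hex notation.
  for i := 1 to Length(s) do
    Write('$', IntToHex(Ord(s[i]), 2), ' ');
  Writeln;
end.
```

If codepage 1254 is installed, this should print $FE $F0 $FD. Passing 28599 instead of 1254 should give the same three bytes.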

Make sure your sText string variable actually contains those byte values to begin with, and not ASCII bytes $73 (s), $67 (g), and $69 (i) instead. If it contains the latter, you are losing the Turkish data before it even reaches Canvas.TextOut(). That would be an issue earlier in your code.
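A quick way to check this is to dump the raw bytes of the string just before it is printed. Below is a minimal sketch, assuming a console test program; BytesToHex is a hypothetical helper name, and the sample string stands in for whatever your real code puts into sText.

```pascal
program DumpTextBytes;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: returns the bytes of s as space-separated hex pairs.
function BytesToHex(const s: AnsiString): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(s) do
  begin
    if i > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Ord(s[i]), 2);
  end;
end;

var
  sText: AnsiString;
begin
  // Sample only: "baş" as it should look in codepage 1254.
  sText := 'ba'#$FE;
  Writeln(BytesToHex(sText)); // prints: 62 61 FE
end.
```

In the real application you could send the result of BytesToHex(sText) to a log file instead. If the last byte shows up as 73 rather than FE, the Turkish character was already replaced before printing.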

However, if sText contains the correct bytes, then the problem has to be on the OS side, as TCanvas.TextOut() is just a thin wrapper for the Win32 API ExtTextOutA() function, where sText gets passed as-is to the API. Maybe the particular font you are using doesn't support Turkish, or at least those particular characters. Or maybe there is a problem with the printer driver. Either way, you might have to resort to converting your sText value to a WideString using MultiByteToWideChar() and then calling ExtTextOutW() (not ExtTextOutA()) directly, e.g.:

var
  wText: WideString;
  size: TSize;
begin
  //Printer.Canvas.TextOut(x, y, sText);
  // Convert the ANSI bytes to UTF-16 using the Turkish codepage (1254, or 28599 for ISO-8859-9).
  SetLength(wText, MultiByteToWideChar(1254{28599}, 0, PAnsiChar(sText), Length(sText), nil, 0));
  MultiByteToWideChar(1254{28599}, 0, PAnsiChar(sText), Length(sText), PWideChar(wText), Length(wText));
  // Draw with the Unicode API instead of ExtTextOutA().
  Windows.ExtTextOutW(Printer.Canvas.Handle, x, y, Printer.Canvas.TextFlags, nil, PWideChar(wText), Length(wText), nil);
  // Advance the pen position by the text width, as TCanvas.TextOut() does.
  size.cX := 0;
  size.cY := 0;
  Windows.GetTextExtentPoint32W(Printer.Canvas.Handle, PWideChar(wText), Length(wText), size);
  Printer.Canvas.MoveTo(x + size.cX, y);
end;
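To test the font theory mentioned above, you could ask GDI which charset the font actually selected into the printer canvas reports. This is only a sketch: PrinterFontIsTurkish is a hypothetical helper name, it is meant to sit in the implementation section of a unit, and it assumes GetTextCharset() is declared in your Windows unit as it is in the Win32 API.

```pascal
uses
  Windows, Graphics, Printers;

// Hypothetical helper: True if the font that GDI selected into the printer
// canvas reports the Turkish charset.
function PrinterFontIsTurkish: Boolean;
begin
  Printer.Canvas.Font.Charset := TURKISH_CHARSET;
  // Reading Canvas.Handle makes the VCL select the current font into the DC.
  Result := GetTextCharset(Printer.Canvas.Handle) = TURKISH_CHARSET;
end;
```

If this returns False, the font mapper has probably substituted a font without Turkish coverage, and trying a different font name would be a reasonable next step.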
Remy Lebeau
  • Thanks, Remy, that's really helpful. I've written a small function that diverts any attempts to print Turkish and puts the text into WideString format with WideString print routines (as in your code above). All coming out fine :-) – Chris Johnson Aug 24 '16 at 07:09