Lazarus: StringReplace ineffective when working with files (unicode issue)

Question

I'm using Lazarus to build a simple app that builds Outlook signatures based on a template. The idea is to extract the template (a ZIP file), and replace variables within the files it contains.

For example, I may want to replace {fullname} with the name provided by the user.

I am currently using the implementation below, but it seems to be ineffective: the file is read and written, but the replacements are not made. I checked whether my use of TFileStream was at fault, but appending dummy text to the end of the output file with WriteAnsiString works.

Please would you kindly have a look at my code below and let me know what I may have done wrong, or if there are any better alternatives to StringReplace? I am aware that one can use TStringList - however, doing so breaks line endings. As memos and rich edits use TStringList, using those won't help either.

Update:

I have seen this, but using AnsiString makes no difference. If I'm not mistaken, FPC uses it by default anyway, instead of UnicodeString.

Update 2:

Indeed, AnsiString is the default. Using a unicode string (which makes the replacements work) adds ? to the beginning and end of the file. Why would it do that?

function multiStringReplace(const s: string; search, replace : array of string; flags : tReplaceFlags): string;
var i : integer; // signed: high() of an empty open array is -1
begin
    assert(length(search) = length(replace), 'Array lengths differ.');
    result := s;
    for i := low(search) to high(search) do
        result := stringReplace(result, search[i], replace[i], flags);
end;

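procedure fileReplaceString(const fileName: string; search, replace: array of string);
var
    fs: tFileStream;
    s: string;
begin
    // Read the whole file into s as raw bytes (no decoding happens here).
    fs := tFileStream.create(fileName, fmOpenRead or fmShareDenyNone);
    try
        setLength(s, fs.size);
        if fs.size > 0 then
            fs.readBuffer(s[1], fs.size);
    finally
        fs.free();
    end;
    s := multiStringReplace(s, search, replace, [rfReplaceAll, rfIgnoreCase]);
    // fmCreate truncates the file; fmOpenWrite would leave stale trailing
    // bytes behind whenever the new content is shorter than the old.
    fs := tFileStream.create(fileName, fmCreate);
    try
        if length(s) > 0 then
            fs.writeBuffer(s[1], length(s));
    finally
        fs.free();
    end;
end;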

Usage:

fileReplaceString(currentFile, ['{fullname}'], ['Full Name']);

Does your template file contain UTF-16 text? If so, try recoding it to UTF-8 (the native encoding for Lazarus). — Abelisto, Jul 27 '15 at 16:40
It appears that the `.txt` file Outlook generates is indeed UTF-16LE (whilst the `.htm` file is Windows-1252). How can I check the encoding of the file in question and convert accordingly? — Mike Rockétt, Jul 27 '15 at 16:48
For text transcoding, you can read [here](http://wiki.lazarus.freepascal.org/Multiplatform_Programming_Guide#Text_encoding), for example. However, for my little UTF-16LE test file, `GuessEncoding` returns `utf8` for some unknown reason. Hardcoding the source encoding works fine: `ShowMessage(ConvertEncoding(s, EncodingUCS2LE, EncodingUTF8));` Do not forget to add the `LConvEncoding` unit to your uses clause. — Abelisto, Jul 27 '15 at 17:13
Thanks! That indeed looks very useful - shall give it a try tomorrow and report back. — Mike Rockétt, Jul 27 '15 at 17:23
@Abelisto - that indeed does the trick, thank you. Just used `s := convertEncoding(s, guessEncoding(s), encodingUTF8);` before replacement, and now all works as expected. And Outlook doesn't seem to disagree. — Mike Rockétt, Jul 28 '15 at 14:33
You can post your corrected code as an answer and accept it to mark your question as answered. — Abelisto, Jul 28 '15 at 15:48

score 1 · Accepted Answer · answered Jul 28 '15 at 16:04

Thanks to Abelisto's comment above, it appears the issue is due to the fact that Outlook saves the three files it creates with different encodings. To get around it, I simply used convertEncoding and guessEncoding from lconvencoding, as below:

uses
    lconvencoding;

// Read string
s := convertEncoding(
    multiStringReplace(s, search, replace, [rfReplaceAll, rfIgnoreCase]),
    guessEncoding(s), encodingAnsi
);
// Write modified and converted string back to file

encodingAnsi appears to be the best conversion, at least in my case. Converting to UTF8 (with or without BOM) caused a bit of a headache with certain characters, specifically EmDash or EnDash.
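For anyone wondering why the plain StringReplace found nothing in the UTF-16LE file: in that encoding every ASCII character is followed by a #0 byte, so the contiguous bytes of '{fullname}' never occur in the raw buffer. The following is only a minimal sketch of that effect, not the code from my project. It uses nothing but the FPC RTL, the helper names AsciiToUtf16LEBytes and Utf16LEBytesToUnicode are hypothetical, and the sample text is assumed to be pure ASCII so that code-page conversions cannot alter it.

```pascal
program Utf16ReplaceSketch;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: build the UTF-16LE byte image of a pure-ASCII string,
// i.e. the FF FE byte order mark, then each character followed by a #0 byte.
function AsciiToUtf16LEBytes(const Ascii: AnsiString): AnsiString;
var
  i: Integer;
begin
  Result := Chr($FF) + Chr($FE);
  for i := 1 to Length(Ascii) do
    Result := Result + Ascii[i] + Chr(0);
end;

// Hypothetical helper: decode raw UTF-16LE bytes (BOM optional) into a
// UnicodeString, combining each low/high byte pair into one code unit.
function Utf16LEBytesToUnicode(const Bytes: AnsiString): UnicodeString;
var
  start, n, i, p: Integer;
begin
  start := 1;
  if (Length(Bytes) >= 2) and (Ord(Bytes[1]) = $FF) and (Ord(Bytes[2]) = $FE) then
    start := 3; // skip the byte order mark
  n := (Length(Bytes) - start + 1) div 2;
  SetLength(Result, n);
  for i := 0 to n - 1 do
  begin
    p := start + 2 * i;
    Result[i + 1] := WideChar(Word(Ord(Bytes[p])) or (Word(Ord(Bytes[p + 1])) shl 8));
  end;
end;

var
  raw, utf8: AnsiString;
begin
  raw := AsciiToUtf16LEBytes('Hello {fullname}');

  // Byte-wise search on the raw buffer misses the placeholder, because its
  // characters are interleaved with #0 bytes.
  if Pos('{fullname}', raw) = 0 then
    WriteLn('Placeholder not found in the raw UTF-16LE bytes');

  // Decode first, then replace on the UTF-8 text.
  utf8 := UTF8Encode(Utf16LEBytesToUnicode(raw));
  utf8 := StringReplace(utf8, '{fullname}', 'Full Name', [rfReplaceAll, rfIgnoreCase]);
  WriteLn(utf8);
end.
```

In a real Lazarus project, convertEncoding from lconvencoding does the decoding step for you; the hand-written decoder here only keeps the sketch free of LCL dependencies. The byte order mark skipped by the decoder is also the likely source of stray characters at the start of a file when the bytes are reinterpreted under a different encoding.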
