How to convert a WideString (or other long string) to byte array in UTF-8?
6 Answers
A function like this will do what you need:
function UTF8Bytes(const s: UTF8String): TBytes;
begin
  Assert(StringElementSize(s)=1);
  SetLength(Result, Length(s));
  if Length(Result)>0 then
    Move(s[1], Result[0], Length(s));
end;
You can call it with any type of string and the RTL will convert from the encoding of the string that is passed to UTF-8. So don't be tricked into thinking you must convert to UTF-8 before calling, just pass in any string and let the RTL do the work.
After that it's a fairly standard array copy. Note the assertion that explicitly calls out the assumption on string element size for a UTF-8 encoded string.
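As a quick illustration of the implicit conversion, here is a minimal sketch that passes a WideString straight to the function. It assumes Delphi 2009 or later; the program name and the sample text are made up for the example, and the compiler may report an implicit string cast warning at the call site.

```pascal
program Utf8BytesDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils; // declares TBytes (System.SysUtils in newer Delphi versions)

function UTF8Bytes(const s: UTF8String): TBytes;
begin
  Assert(StringElementSize(s) = 1);
  SetLength(Result, Length(s));
  if Length(Result) > 0 then
    Move(s[1], Result[0], Length(s));
end;

var
  ws: WideString;
  bytes: TBytes;
begin
  // "naive" with U+00EF in the middle; that character takes two bytes in UTF-8
  ws := 'na' + WideChar($00EF) + 've';
  // No explicit conversion: the RTL re-encodes the WideString as UTF-8
  // while binding it to the UTF8String parameter.
  bytes := UTF8Bytes(ws);
  Writeln(Length(ws), ' characters -> ', Length(bytes), ' bytes');
end.
```

The byte count is larger than the character count whenever the text contains non-ASCII characters, so always size buffers from the converted string, not from the original.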
If you want to get the zero-terminator you would write it so:
function UTF8Bytes(const s: UTF8String): TBytes;
begin
  Assert(StringElementSize(s)=1);
  SetLength(Result, Length(s)+1);
  if Length(s)>0 then // Result is never empty here, so test the source string
    Move(s[1], Result[0], Length(s));
  Result[high(Result)] := 0; // zero-terminator
end;

- @Cosmin No it will not. That's the thing about assertions! – David Heffernan Mar 08 '11 at 14:36
- One question: what unit do I have to add to use StringElementSize()? (Lazarus.) Sorry for such questions, I'm a newbie – Mariusz Mar 08 '11 at 14:53
- @Mariusz What does your "lazarus" statement mean? You tagged the question Delphi. In Delphi it's in system.pas and so automatically used by all units. – David Heffernan Mar 08 '11 at 14:54
- @Mariusz: You can remove the entire `Assert...` line. But since you tagged your question `Delphi`, and *not* `free-pascal`, @David's answer applies to Delphi, and not Free Pascal. But the code above *might* work in Free Pascal, too. I don't know. Try it. – Andreas Rejbrand Mar 08 '11 at 14:57
- It is D2009+ specific code, and thus will not work on FPC which follows pre D2009 semantics. Passing a widestring (see original question) to a "UTF8string" will change it to the local encoding (NOT UTF-8 like in D2009+), and thus garble the string. FPC has special documented functions for this, see separate answer – Marco van de Voort Mar 09 '11 at 12:46
You can use TEncoding.UTF8.GetBytes in SysUtils.pas.
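A minimal sketch of that call, assuming Delphi 2009 or later (where TEncoding exists); the program name and sample text are only illustrative. The WideString is first assigned to a native string so that the `string` overload of GetBytes is selected unambiguously.

```pascal
program GetBytesDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils; // TEncoding and TBytes

var
  ws: WideString;
  us: string;   // UnicodeString in Delphi 2009+
  bytes: TBytes;
begin
  ws := 'na' + WideChar($00EF) + 've';
  us := ws;                              // UTF-16 to UTF-16, no data loss
  bytes := TEncoding.UTF8.GetBytes(us);  // encoded text only, no BOM
  Writeln(Length(bytes), ' bytes');
end.
```

If a byte order mark is needed, it has to be added separately, for example from `TEncoding.UTF8.GetPreamble`.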

- Note that if the input string is *already* encoded as UTF-8, `GetBytes` will be very wasteful. The compiler will convert the input string to UnicodeString since that's the only string argument `GetBytes` allows, and the `GetBytes` will convert the characters back to UTF-8 to generate its result. – Rob Kennedy Mar 08 '11 at 15:04
If you're using Delphi 2009 or later (the Unicode versions), converting a WideString to a UTF8String is a simple assignment statement:
var
  ws: WideString;
  u8s: UTF8String;
begin
  u8s := ws;
end;
The compiler will call the right library function to do the conversion because it knows that values of type UTF8String have a "code page" of CP_UTF8.
In Delphi 7 and later, you can use the provided library function Utf8Encode. For even earlier versions, you can get that function from other libraries, such as the JCL.
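For the pre-Unicode case, a rough sketch of going from WideString to a byte array through Utf8Encode might look like this. `TUtf8ByteArray` and `WideToUtf8Bytes` are hypothetical names introduced for the example (older SysUtils versions may not declare TBytes, hence the local type).

```pascal
program Utf8EncodeDemo;
{$APPTYPE CONSOLE}

type
  TUtf8ByteArray = array of Byte; // hypothetical local type, not an RTL name

function WideToUtf8Bytes(const ws: WideString): TUtf8ByteArray;
var
  u8: UTF8String;
begin
  u8 := Utf8Encode(ws); // declared in the System unit, no uses clause needed
  SetLength(Result, Length(u8));
  if Length(u8) > 0 then // an empty string has no u8[1] to copy from
    Move(u8[1], Result[0], Length(u8));
end;

var
  ws: WideString;
begin
  ws := 'na';
  ws := ws + WideChar($00EF) + 've';
  Writeln(Length(WideToUtf8Bytes(ws)), ' bytes');
end.
```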
You can also write your own conversion function using the Windows API:
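function CustomUtf8Encode(const ws: WideString): UTF8String;
var
  n: Integer;
begin
  Result := '';
  if ws = '' then
    Exit; // WideCharToMultiByte fails when the input length is 0
  // First call: query the required buffer size in bytes.
  n := WideCharToMultiByte(cp_UTF8, 0, PWideChar(ws), Length(ws), nil, 0, nil, nil);
  Win32Check(n <> 0);
  SetLength(Result, n);
  // Second call: perform the conversion into the result buffer.
  n := WideCharToMultiByte(cp_UTF8, 0, PWideChar(ws), Length(ws), PAnsiChar(Result), n, nil, nil);
  Win32Check(n = Length(Result));
end;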
A lot of the time, you can simply use a UTF8String as an array, but if you really need a byte array, you can use David's and Cosmin's functions. If you're writing your own character-conversion function, you can skip the UTF8String and go directly to a byte array; just change the return type to TBytes or array of Byte. (You may also wish to increase the length by one, if you want the array to be null-terminated. SetLength will do that to the string implicitly, but not to an array.)
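A sketch of that direct-to-array variant, under the assumption that your SysUtils declares TBytes. `WideToUtf8Bytes` and its `AddTerminator` parameter are hypothetical names chosen for the example, not part of any library.

```pascal
program WideToBytesDemo;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

// Hypothetical helper: converts UTF-16 text to UTF-8 bytes without an
// intermediate UTF8String. AddTerminator appends a trailing zero byte.
function WideToUtf8Bytes(const ws: WideString; AddTerminator: Boolean): TBytes;
var
  n: Integer;
begin
  n := 0;
  if ws <> '' then
  begin
    // First call: ask how many bytes the UTF-8 form needs.
    n := WideCharToMultiByte(CP_UTF8, 0, PWideChar(ws), Length(ws), nil, 0, nil, nil);
    Win32Check(n <> 0);
  end;
  if AddTerminator then
    SetLength(Result, n + 1)
  else
    SetLength(Result, n);
  if n > 0 then
    // Second call: convert straight into the array.
    Win32Check(WideCharToMultiByte(CP_UTF8, 0, PWideChar(ws), Length(ws),
      PAnsiChar(@Result[0]), n, nil, nil) = n);
  if AddTerminator then
    Result[n] := 0; // arrays get no implicit terminator, so write it explicitly
end;

var
  ws: WideString;
begin
  ws := 'abc';
  Writeln(Length(WideToUtf8Bytes(ws, False)));
  Writeln(Length(WideToUtf8Bytes(ws, True)));
end.
```

The empty-string check is there because WideCharToMultiByte reports failure for a zero-length input, which would otherwise trip Win32Check.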
If you have some other string type that's neither WideString, UnicodeString, nor UTF8String, then the way to convert it to UTF-8 is to first convert it to WideString or UnicodeString, and then convert that to UTF-8.

var
  S: UTF8String;
  B: TBytes;
begin
  S := 'Șase sași în șase saci';
  SetLength(B, Length(S)); // Length(S) = 26 for this 22 char string.
  CopyMemory(@B[0], @S[1], Length(S));
end.
Depending on what you need the bytes for, you might want to include a NULL terminator.
For production code make sure you test for empty string. Adding the 3-4 LOC required would just make the sample harder to read.
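For reference, one way the empty-string test could look, wrapped in a function. This is only a sketch; `Utf8ToBytes` is a hypothetical name, and it assumes Delphi 2009 or later on Windows (CopyMemory comes from the Windows unit).

```pascal
program EmptySafeCopy;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils; // CopyMemory from Windows, TBytes from SysUtils

function Utf8ToBytes(const S: UTF8String): TBytes;
begin
  SetLength(Result, Length(S));
  // For an empty string there is nothing to copy, and indexing S[1] or
  // Result[0] would be out of range, so skip the copy entirely.
  if Length(S) > 0 then
    CopyMemory(@Result[0], @S[1], Length(S));
end;

begin
  Writeln(Length(Utf8ToBytes('')));
  Writeln(Length(Utf8ToBytes('abc')));
end.
```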

- The string is not empty. It contains the value `'Șase sași în șase saci'` – Cosmin Prund Mar 08 '11 at 14:13
- +1. Not everyone (to say the least!) knows how the `Length` function really works! – Andreas Rejbrand Mar 08 '11 at 14:14
- @Cosmin I can see that the string is not empty. I just have a feeling that the OP may be interested in text other than `'Șase sași în șase saci'`. – David Heffernan Mar 08 '11 at 14:16
- @Cosmin, @David: Surely @Cosmin was joking! (Indeed, David's point is very good.) – Andreas Rejbrand Mar 08 '11 at 14:16
I have the following two routines (source code can be downloaded here - http://www.csinnovations.com/framework_utilities.htm):
function CsiBytesToStr(const pInData: TByteDynArray; pStringEncoding: TECsiStringEncoding; pIncludesBom: Boolean): string;
function CsiStrToBytes(const pInStr: string; pStringEncoding: TECsiStringEncoding; pIncludeBom: Boolean): TByteDynArray;

widestring -> UTF8:
http://www.freepascal.org/docs-html/rtl/system/utf8encode.html
the opposite:
http://www.freepascal.org/docs-html/rtl/system/utf8decode.html
Note that assigning a widestring to an ansistring in a pre D2009 system (including current Free Pascal) will convert to the local ansi encoding, garbling characters.
For the TBytes part, see the remark of Rob Kennedy above.
