faster alternative to InttoStr/StrToInt?

Question

I wonder if there are faster alternative than System.IntToStr / System.StrToInt. There is a fast version but only UTF8. Which is Int32ToUTF8 from SynCommons.pas and due to slow string conversions it is bound to be slow. The purepascal RTL versions are really slow for 64 bit.

The slowest part of IntToStr is the heap allocation. I would use sprintf from msvc. — David Heffernan, Feb 22 '14 at 17:24
If you really need a very fast routine. What should it output (Wide of ANSI chars) and should it output into a buffer or into a dynamically allocated string. Also is x86/x64 asm implementation OK next to a pure pascal version for other platforms or is asm out of the question? — Ritsaert Hornstra, Feb 22 '14 at 18:04
@RitsaertHornstra Widechars ofcourse. It should output in to result string. Doesn't matter aslong as its fast. I think this will help a lot of developers. — user3323367, Feb 22 '14 at 18:08
"...and due to slow string conversions it is bound to be slow. " It sounds like you're falling prey to premature optimization! Is it likely converting between ints and strings is going to be the bottleneck for your application? — alcalde, Feb 22 '14 at 18:33
@alcalde Converting between `UTF8` and `String` is painfully slow. And therefore incompatible with VCL. And other D2010+ components. — user3323367, Feb 22 '14 at 18:40
@user3323367: UTF8 conversion of known "integer" content is exactly the same as just widening the bytes to words and is relatively cheap; you won't need an UTF8 decode at all... Where do you need the converted numbers. In an export or somewhere in a concatenated string or in a separate string. For example I rarely need strings at all because the strings are almost always concatenated afterwards into something larger. If the end is a bytewise buffer you should use AnsiStrings and not strings just because it's "VCL". VCL is GUI and rarely needs high speed (limited # of chars are shown anyway). — Ritsaert Hornstra, Feb 22 '14 at 19:03
@RitsaertHornstra Sorry but VCL in Delphi XE5 uses `String`. Not to mention other functions. `StringReplace` and others. — user3323367, Feb 22 '14 at 19:09
@user If you really care about perf, stop using string and heap allocation — David Heffernan, Feb 22 '14 at 21:09
@DavidHeffernan Might aswell use Pointers and C++ at that point. — user3323367, Feb 24 '14 at 11:36
@user3323367 Not really. Using pointers is no guarantee of performance. Indeed indexing in a loop is generally faster than incrementing pointers in a loop. Here are my functions for this (http://pastebin.com/aKqYCJUv). The significant benefit is in avoiding the heap allocation. — David Heffernan, Feb 24 '14 at 11:42
You can for sure optimise the actual conversion to decimal. But the big gain is in avoiding the heap allocs. Especially if you have multiple threads. Here's a more complete pastebin with some code I missed in the first version: http://pastebin.com/MYc985DT — David Heffernan, Feb 24 '14 at 11:48
OK, I added an answer with the code I use mostly to make sure that the code can be seen by future visitors, in case it proves useful. — David Heffernan, Feb 24 '14 at 11:53
A good alternative to strtoint or strtointdef is what i'd be interested in reading here,. all i see is unfortunately inttostr, which isn't really that "hard" to come by or optimize. — T.S, Oct 04 '19 at 13:37

Ritsaert Hornstra · Accepted Answer · 2014-02-23T11:31:45.120

This routine is approximately 40% faster than the routine in the RTL. It could be much faster if you worked with WideChar[] buffers because the string allocation is taking up 75% of the time used by the conversion routine:

IntS32ToWide: 5,50 ns/item (PWideChar)
IntToStr: 34,51 ns/item (RTL)
IntS32ToStr: 24,77 ns/item (RTL replacement)

Please note that the routine below uses SSE2 and only x86 and x64 versions are fully implemented and tested.

In the initialization:

function IntU32ToWide( X: Longword; P: PWideChar ): PWideChar; register;
function IntS32ToWide( X: Integer;  P: PWideChar ): PWideChar; register;
function IntS32ToStr ( X: Longword ): UnicodeString; register; inline;

In the implementation:

{$CODEALIGN 16}
{$ALIGN 16}

const
  DigitsClippedW: array [ 0..99 ] of LongWord = (
    $000030, $000031, $000032, $000033, $000034, $000035, $000036, $000037, $000038, $000039,
    $300031, $310031, $320031, $330031, $340031, $350031, $360031, $370031, $380031, $390031,
    $300032, $310032, $320032, $330032, $340032, $350032, $360032, $370032, $380032, $390032,
    $300033, $310033, $320033, $330033, $340033, $350033, $360033, $370033, $380033, $390033,
    $300034, $310034, $320034, $330034, $340034, $350034, $360034, $370034, $380034, $390034,
    $300035, $310035, $320035, $330035, $340035, $350035, $360035, $370035, $380035, $390035,
    $300036, $310036, $320036, $330036, $340036, $350036, $360036, $370036, $380036, $390036,
    $300037, $310037, $320037, $330037, $340037, $350037, $360037, $370037, $380037, $390037,
    $300038, $310038, $320038, $330038, $340038, $350038, $360038, $370038, $380038, $390038,
    $300039, $310039, $320039, $330039, $340039, $350039, $360039, $370039, $380039, $390039 );

// Delphi XE3 has no working alignment for 16 bytes for data but it has alignment for 16 bytes for code!
// So we encode our constants as a procedure and use constant offsets to the data.
const
  Div10000_Shl45d        = $00;
  Shl16_minus_10000d     = $10;
  Div_1000_100_10_1w     = $20;
  Shl_1000_100_10_1w     = $30;
  Mul_10w                = $40;
  To_Asciiw              = $50;
  Mul_10000d             = $60;
  Div100_Shl19w          = $70;
  Mul100w                = $80;
  Div10_shl16w           = $90;
  To_Asciib              = $A0;

procedure IntUToStrConsts();
asm
  {$if defined( CPUX64 )}.NOFRAME{$ifend}
  dd $d1b71759,         $d1b71759,          $d1b71759,         $d1b71759;          // RoundUp( 2^45 / 10000 )
  dd $10000 - 10000,    $10000 - 10000,     $10000 - 10000,    $10000 - 10000;     // 1 shl 16 - 1e4
  dw 8389,    5243,     13108,    $8000,    8389,    5243,     13108,    $8000;    // 1000 100 10 1 div
  dw 1 shl 7, 1 shl 11, 1 shl 13, 1 shl 15, 1 shl 7, 1 shl 11, 1 shl 13, 1 shl 15; // 1000 100 10 1 shr
  dw 10,      10,       10,       10,       10,      10,       10,       10;       // 10
  dw $30,     $30,      $30,      $30,      $30,     $30,      $30,      $30;      // To Unicode / ASCII
  dd 10000,             10000,              10000,             10000;              // 10000
  dw $147b,   $147b,    $147b,    $147b,    $147b,   $147b,    $147b,    $147b     // RoundUp( 2^19 / 100 )
  dw 100,     100,      100,      100,      100,     100,      100,      100       // 100
  dw $199a,   $199a,    $199a,    $199a,    $199a,   $199a,    $199a,    $199a     // RoundUp( 2^16 / 10 )
  dd $30303030,         $30303030,          $30303030,         $30303030           // To bytewise / ASCII
end;


function IntS32ToStr( X: Longword ): UnicodeString; register;
var
  P, Q: PWideChar;
begin
  SetLength( Result, 11 );
  P := PWideChar( Pointer( Result ) );
  // Full string buffer and set the length of the string with no resizing!
  PLongword( ( NativeInt( Result ) - sizeof( Longword ) ) )^ := IntS32ToWide( X, P ) - P;
end;

function IntS32ToWide( X: Integer; P: PWideChar ): PWideChar;
{$if defined( CPUX86 )}
asm // eax = X, edx = P
  cmp  eax, 0
  jge  IntU32ToWide
  mov  word ptr [ edx ], Word( '-' )
  neg  eax
  lea  edx, [ edx + 2 ]
  jmp  IntU32ToWide
end;
{$else if defined( CPUX64 )}
asm // ecx = X, rdx = P
  .NOFRAME
  cmp  ecx, 0
  jge  IntU32ToWide
  mov  word ptr [ rdx ], Word( '-' )
  neg  ecx
  lea  rdx, [ rdx + 2 ]
  jmp  IntU32ToWide
end;
{$else}
begin
  if X >= 0 then begin
    Result := IntU32ToWide( Longword( X ), P );
  end else begin
    P^ := '-';
    Result := IntU32ToWide( Longword( -X ), P + 1 );
  end;
end;
{$ifend}

function IntU32ToWide( X: Longword; P: PWideChar ): PWideChar; register;
{$if defined( CPUX86 )}
asm
  cmp       eax, 100000000
  jb        @Medium
@Large:
  push      edx
  xor       edx, edx
  mov       ecx, 100000000
  div       ecx
  pop       ecx

  //        eax = high one or two digit value, edx = 8 digit value, ecx = pointer
  // Emit the first 2 digits
  mov       eax, dword ptr [ DigitsClippedW + eax * 4 ]
  mov       [ ecx ], eax
  cmp       eax, $10000
  setae     al
  movzx     eax, al
  lea       eax, [ eax * 2 + ecx + 18 ]

  //        edx = 8 digit value, ecx = pointer
  // Emit 8 follow digits
  movd      xmm1, edx          // xmm1 = Value
  movdqa    xmm0, dqword ptr [ IntUToStrConsts + Div10000_Shl45d ]
  pmuludq   xmm0, xmm1
  psrlq     xmm0, 45           // xmm0 = xmm1 div 10000
  pmuludq   xmm0, dqword ptr [ IntUToStrConsts + Shl16_minus_10000d ]
  paddd     xmm0, xmm1         // xmm0 = word( lo digits ), word( hi digit ), 0 (6x)
  psllq     xmm0, 2
  punpcklwd xmm0, xmm0
  punpckldq xmm0, xmm0         // xmm0 *= 4 (lo, lo, lo, lo, hi, hi, hi, hi)W (LSW, MSW)
  pmulhuw   xmm0, dqword ptr [ IntUToStrConsts + Div_1000_100_10_1w ]
  pmulhuw   xmm0, dqword ptr [ IntUToStrConsts + Shl_1000_100_10_1w ] // xmm0 = ( lo, lo div 10, lo div 100, lo div 100, (same with hi) )W
  movdqa    xmm2, dqword ptr [ IntUToStrConsts + Mul_10w ]            // xmm2 := xmm0 * 10; shift to left one word.
  pmullw    xmm2, xmm0
  psllq     xmm2, 16
  psubw     xmm0, xmm2         // Extract digits
  por       xmm0, dqword ptr [ IntUToStrConsts + To_ASCIIw ]          // Digits to ASCII
  shufps    xmm0, xmm0, $4E
  movdqu    [ eax - 16 ], xmm0    // And save 8 digits at once
  ret
@Medium:
  cmp       eax, 100
  jb        @Small
  //        eax 2..8 digits, edx = pointer
  // Emit 2..8 digits
  movd      xmm1, eax          // xmm1 = Value
  movdqa    xmm0, dqword ptr [ IntUToStrConsts + Div10000_Shl45d ]
  pmuludq   xmm0, xmm1
  psrlq     xmm0, 45           // xmm0 = xmm1 div 10000
  pmuludq   xmm0, dqword ptr [ IntUToStrConsts + Shl16_minus_10000d ]
  paddd     xmm0, xmm1         // xmm0 = word( lo digits ), word( hi digit ), 0 (6x)
  psllq     xmm0, 2
  punpcklwd xmm0, xmm0
  punpckldq xmm0, xmm0         // xmm0 *= 4 (lo, lo, lo, lo, hi, hi, hi, hi)W (LSW, MSW)
  pmulhuw   xmm0, dqword ptr [ IntUToStrConsts + Div_1000_100_10_1w ]
  pmulhuw   xmm0, dqword ptr [ IntUToStrConsts + Shl_1000_100_10_1w ] // xmm0 = ( lo, lo div 10, lo div 100, lo div 100, (same with hi) )W
  movdqa    xmm2, dqword ptr [ IntUToStrConsts + Mul_10w ]            // xmm2 := xmm0 * 10; shift to left one word.
  pmullw    xmm2, xmm0
  psllq     xmm2, 16
  psubw     xmm0, xmm2         // Extract digits

  movdqa    xmm1, dqword ptr [ IntUToStrConsts + To_ASCIIw ]          // Digits to ASCII
  por       xmm0, xmm1
  shufps    xmm0, xmm0, $4E
  // Now we have 8 Unicode characters in the xmm0 register in the correct order.
  pcmpeqw   xmm1, xmm0         // scan for zeroes.
  pmovmskb  eax, xmm1
  packuswb  xmm0, xmm0         // convert to bytes
  xor       eax, $FFFF         // change polarity
  bsf       eax, eax           // amount to shift in bytes.
  lea       ecx, [ eax * 4 ]
  movd      xmm1, ecx
  psrlq     xmm0, xmm1         // bytes shifted.
  pxor      xmm2, xmm2
  punpcklbw xmm0, xmm2
  neg       eax
  movdqu    dqword ptr [ edx ], xmm0
  lea       eax, [ edx + 16 + eax ]
  ret
@Small:
  //        eax 1..2 digits, edx = pointer
  // Emit one or two digits
  mov       eax, dword ptr [ DigitsClippedW + eax * 4 ]
  mov       [ edx ], eax
  cmp       eax, $10000
  setae     al
  movzx     eax, al
  lea       eax, [ edx + eax * 2 + 2 ]
end;
{$else if defined( CPUX64 )}
asm
  cmp       ecx, 100000000
  jb        @Medium
@Large:
  mov       r8,  rdx                  // r8 = pointer

  // Split up low 8 digits from high 1 or 2 digits..
  mov       eax, ecx
  mov       r9,  12379400392853802749 // RoundUp( 2^64+26 / 1e8 )
  mul       rax, r9
  shr       rdx, 26
  mov       r10, rdx                  // r10 = eax div 1e8
  mov       rax, rdx
  mov       r9,  100000000
  mul       rax, r9
  sub       ecx, eax                  // ecx = eax mod 1e8

  // Emit the first 2 digits
  lea       r9, [ DigitsClippedW ]
  mov       eax, dword ptr [ r9 + r10 * 4 ]
  mov       dword ptr [ r8 ], eax
  // advance pointer ( also for the next 8 bytes)
  cmp       eax, $10000
  setae     al
  movzx     rax, al
  lea       rax, [ rax * 2 + r8 + 2 + 16 ]

  // ecx = 8 digit value, r8 = pointer + 8
  movd      xmm1, ecx          // xmm1 = Value
  movdqa    xmm0, dqword ptr [ IntUToStrConsts + Div10000_Shl45d ]
  pmuludq   xmm0, xmm1
  psrlq     xmm0, 45           // xmm0 = xmm1 div 10000
  pmuludq   xmm0, dqword ptr [ IntUToStrConsts + Shl16_minus_10000d ]
  paddd     xmm0, xmm1         // xmm0 = word( lo digits ), word( hi digit ), 0 (6x)
  psllq     xmm0, 2
  punpcklwd xmm0, xmm0
  punpckldq xmm0, xmm0         // xmm0 *= 4 (lo, lo, lo, lo, hi, hi, hi, hi)W (LSW, MSW)
  pmulhuw   xmm0, dqword ptr [ IntUToStrConsts + Div_1000_100_10_1w ]
  pmulhuw   xmm0, dqword ptr [ IntUToStrConsts + Shl_1000_100_10_1w ] // xmm0 = ( lo, lo div 10, lo div 100, lo div 100, (same with hi) )W
  movdqa    xmm2, dqword ptr [ IntUToStrConsts + Mul_10w ]            // xmm2 := xmm0 * 10; shift to left one word.
  pmullw    xmm2, xmm0
  psllq     xmm2, 16
  psubw     xmm0, xmm2         // Extract digits
  por       xmm0, dqword ptr [ IntUToStrConsts + To_ASCIIw ]          // Digits to ASCII
  shufps    xmm0, xmm0, $4E
  movdqu    [ rax - 16 ], xmm0    // And save 8 digits at once
  ret
@Medium:
  cmp       ecx, 100
  jb        @Small
  //        eax 2..8 digits, rdx = pointer
  //        Emit 2..8 digits
  movd      xmm1, ecx          // xmm1 = Value
  movdqa    xmm0, dqword ptr [ IntUToStrConsts + Div10000_Shl45d ]
  pmuludq   xmm0, xmm1
  psrlq     xmm0, 45           // xmm0 = xmm1 div 10000
  pmuludq   xmm0, dqword ptr [ IntUToStrConsts + Shl16_minus_10000d ]
  paddd     xmm0, xmm1         // xmm0 = word( lo digits ), word( hi digit ), 0 (6x)
  psllq     xmm0, 2
  punpcklwd xmm0, xmm0
  punpckldq xmm0, xmm0         // xmm0 *= 4 (lo, lo, lo, lo, hi, hi, hi, hi)W (LSW, MSW)
  pmulhuw   xmm0, dqword ptr [ IntUToStrConsts + Div_1000_100_10_1w ]
  pmulhuw   xmm0, dqword ptr [ IntUToStrConsts + Shl_1000_100_10_1w ] // xmm0 = ( lo, lo div 10, lo div 100, lo div 100, (same with hi) )W
  movdqa    xmm2, dqword ptr [ IntUToStrConsts + Mul_10w ]            // xmm2 := xmm0 * 10; shift to left one word.
  pmullw    xmm2, xmm0
  psllq     xmm2, 16
  psubw     xmm0, xmm2         // Extract digits
  movdqa    xmm1, dqword ptr [ IntUToStrConsts + To_ASCIIw ]          // Digits to ASCII
  por       xmm0, xmm1
  shufps    xmm0, xmm0, $4E
  // Now we have 8 Unicode characters in the xmm0 register in the correct order.
  pcmpeqw   xmm1, xmm0         // scan for zeroes.
  pmovmskb  eax, xmm1
  packuswb  xmm0, xmm0         // convert to bytes
  xor       eax, $FFFF         // change polarity
  bsf       eax, eax           // amount to shift in bytes.
  lea       ecx, [ eax * 4 ]
  movd      xmm1, ecx
  psrlq     xmm0, xmm1         // bytes shifted.
  pxor      xmm2, xmm2
  punpcklbw xmm0, xmm2
  neg       rax
  movdqu    dqword ptr [ rdx ], xmm0
  lea       rax, [ rdx + 16 + rax ]
  ret
@Small:
  //        ecx 1..2 digits, rdx = pointer
  // Emit one or two digits
  lea       r9, [ DigitsClippedW ]
  mov       eax, dword ptr [ r9 + rcx * 4 ]
  mov       [ rdx ], eax
  cmp       eax, $10000
  setae     al
  movzx     rax, al
  lea       rax, [ rdx + rax * 2 + 2 ]
end;
{$else}
begin
  Assert( False, 'Not implemented.' );
end;
{$ifend}

Nice asm tricks. Putty we have to circumvent alignment by using a code block. — Arnaud Bouchez, Feb 22 '14 at 21:07
You really should point out that the code relies on SSE which may not be present under CPUX86. — David Heffernan, Feb 23 '14 at 07:04
Is it really worth all of this effort for *nano*seconds? How long did it take to write, test and debug this by comparison? — alcalde, Feb 23 '14 at 07:31
@DavidHeffernan Most CPUs these days have SSE. It would be great if 32 bit compiler generated SSE but sadly it does not. — user3323367, Feb 23 '14 at 09:47
@DavidHefferman: User said it is slow for 64bit hence I concluded he is using x64 and hence has SSE2 which this routine uses. For x86 your remark is valid. Both x64 and SSE2 originated in 2003; 11 years ago. But your right: I will alter the text to show it uses SSE2. — Ritsaert Hornstra, Feb 23 '14 at 11:30
@alcalde: The library with int, int64, float conversion routines hand coded and all the unit tests took me a few days. This routine is only one of them.. The library is part of an (not yet fully finished) JSON serializer/deserializer. It does so by avoiding memory reallocations where possible (David said it more bluntly but it's true, heap allocations kill these kind of conversions). And now for the JSON objects I have the speed of conversion is in the 250-300MB/sec on my laptop. It will be used by a lot of people here in time critical software so it is still worth it to optimize this far. — Ritsaert Hornstra, Feb 23 '14 at 11:41
@alcalde optimising this code is critical for large scale JSON, XML, YAML etc. production. I recently did something similar for my YAML emitter. I did not use x64 asm or SSE. All I did was convert int to static char array (local variable for instance) and got very significant gains. — David Heffernan, Feb 23 '14 at 11:54
@user: yes, It will be MPL-ed as soon as I am confident it is fully tested for all routines and when my new site will be up and running — Ritsaert Hornstra, Feb 23 '14 at 17:56
@DavidHeffernan FPC has `-CfSSE42` support for `SSE4.2` yet Delphi is still in the `x87` days. Plus 3 levels of optimizations. I am not sure how Embarcadero can still sell for such high price yet the compiler is utter rubbish. — user3323367, Feb 23 '14 at 21:56
@RitsaertHornstra Your JSON serializer/deserializer sounds just like the one used by mORMot. No heap allocation during the process: even the JSON text is decoded in-place, with an ending #0, and a pointer to the UTF-8 content is returned (this is one of the reasons why our framework uses UTF-8 everywhere, and not a `string` type). See http://blog.synopse.info/post/2011/06/02/Fast-JSON-parsing - we have a SAX approach, but also a DOM approach, the later using either classes, or records, or a custom kind of variant (`TDocVariant`) which will use much less memory allocations than alternatives. — Arnaud Bouchez, Feb 24 '14 at 13:19
@Arnaud: We first used Superobject. We use it for small documents of data (up to 10s kb of data). Given the memory usage of that solution and the performance let me to pursue a different JSON (de)ser solution. I tried JSON -> BSON and then use BSON as the internal format but it has the problematic trait that it uses #0 terminated names and hence you need scanning of the string names. Thus own own parser was born. I will look at your syn... code in more detail the coming days, see if I can get some ideas and perhaps use the syn... units in my own projects. — Ritsaert Hornstra, Feb 24 '14 at 16:46
@RitsaertHornstra So take a look at our `TDocVariant` and `TBsonVariant` types. I'm finishing the [SynMongoDB.pas unit](http://synopse.info/fossil/finfo?name=SynMongoDB.pas) which is able to create or consumme BSON content from JSON, `TDocVariant` values or directly. It supports BSON/MongoDB types (e.g. ObjectID, RegEx, Date..) directly, also with [MongoDB extended JSON syntax](http://docs.mongodb.org/manual/reference/mongodb-extended-json). Of course, there is a *MongoDB* client, more complete than *TMongoWire*. Speed and memory use have been optimized. I'll finish it in the coming days. — Arnaud Bouchez, Feb 27 '14 at 21:31
@RitsaertHornstra Possible to write a SSE4.1 or SSE3 compatible version? Not a ASM expert myself. :) — user3060326, Apr 28 '14 at 11:15
@user3060326: it is possible but you will not win that much: biggest win is using memory buffers instead of strings and replacing the divides with equivalent multiplications+shifts (the reason why the 32 bit variant is substantianlly slower), you *might* shave of two clockcyles on the 8 digit variant when using SSE4.1 but usually you see less than that. SSE2 is always compatible with SSE3 and SSE4.1 so this version is compatible too. — Ritsaert Hornstra, May 06 '14 at 14:52
@RitsaertHornstra If you patch the RTL version with this one there will be a BIG speed up. For the entire application. So on newer CPU's etc it will run this much faster. :) — user3060326, May 06 '14 at 15:07
@RitsaertHornstra Make a demo how to use `IntS32ToWide` with `PWideChar` — John Lewis, Jan 07 '15 at 17:42

Arnaud Bouchez · Answer 2 · 2014-02-22T21:05:27.660

5

In SynCommons.pas, you have also the following function:

function IntToString(Value: integer): string;
var tmp: array[0..15] of AnsiChar;
    P: PAnsiChar;
begin
  P := StrInt32(@tmp[15],Value);
  Ansi7ToString(PWinAnsiChar(P),@tmp[15]-P,result);
end;

I suspect it will be also fast, even on Win64 platform. Slower than asm, but fast enough for small numbers (which tends to be most of the integer in the wild).

There will be only one memory allocation in this function, which is pretty fast even on Win64, thanks to the updated version of FastMM4, which has its own optimized x64 asm.

edited Feb 22 '14 at 21:05

answered Feb 22 '14 at 20:57

Arnaud Bouchez

42,305
3
71
159

Does SynCommons.pas have String -> AnsiString optimized function also? – user3323367 Feb 22 '14 at 21:20
1

Yes of course. It is faster than the Windows Api, which is used by the Delphi RTL. – Arnaud Bouchez Feb 23 '14 at 08:18

score 2 · Answer 3 · answered Feb 24 '14 at 11:53

In my opinion, the key way to improve performance is to avoid heap allocations. The time spent by IntToStr doing the allocations is greater than the time spent doing the decimal conversion. And if you are wanting to use multiple threads then this is even more important because the default Delphi memory manager does not scale well under thread contention.

It's true that the decimal conversion can also be optimised, but I always try to optimise by picking off the low-hanging fruit first.

So, for the sake of completeness, in case these functions prove useful to others, here are my routines for heap allocation free integer to string conversion:

procedure DivMod(Dividend, Divisor: Cardinal; out Quotient, Remainder: Cardinal);
{$IFDEF CPUX86}
asm
        PUSH  EBX
        MOV   EBX,EDX
        XOR   EDX,EDX
        DIV   EBX
        MOV   [ECX],EAX
        MOV   EBX,Remainder
        MOV   [EBX],EDX
        POP   EBX
end;
{$ELSE IF Defined(CPUX64)}
asm
        .NOFRAME
        MOV   EAX,ECX
        MOV   ECX,EDX
        XOR   EDX,EDX
        DIV   ECX
        MOV   [R8],EAX
        MOV   [R9],EDX
end;
{$ELSE}
  {$Message Error 'Unrecognised platform.'}
{$ENDIF}

{$IFOPT R+}
  {$DEFINE RANGECHECKSON}
  {$R-}
{$ENDIF}
{$IFOPT Q+}
  {$DEFINE OVERFLOWCHECKSON}
  {$Q-}
{$ENDIF}

// disable range checks and overflow checks so that abs() functions in case Value = low(Value)

function CopyIntegerToAnsiBuffer(const Value: Integer; var Buffer: array of AnsiChar): Integer;
var
  i, j: Integer;
  val, remainder: Cardinal;
  negative: Boolean;
  tmp: array [0..15] of AnsiChar;
begin
  negative := Value<0;
  val := abs(Value);
  Result := 0;
  repeat
    DivMod(val, 10, val, remainder);
    tmp[Result] := AnsiChar(remainder + ord('0'));
    inc(Result);
  until val=0;
  if negative then begin
    tmp[Result] := '-';
    inc(Result);
  end;
  Assert(Result<=Length(Buffer));

  i := 0;
  j := Result-1;
  while i<Result do begin
    Buffer[i] := tmp[j];
    inc(i);
    dec(j);
  end;
end;

function CopyInt64ToAnsiBuffer(const Value: Int64; var Buffer: array of AnsiChar): Integer;
var
  i, j: Integer;
  val, remainder: UInt64;
  negative: Boolean;
  tmp: array [0..23] of AnsiChar;
begin
  negative := Value<0;
  val := abs(Value);
  Result := 0;
  repeat
    DivMod(val, 10, val, remainder);
    tmp[Result] := AnsiChar(remainder + ord('0'));
    inc(Result);
  until val=0;
  if negative then begin
    tmp[Result] := '-';
    inc(Result);
  end;
  Assert(Result<=Length(Buffer));

  i := 0;
  j := Result-1;
  while i<Result do begin
    Buffer[i] := tmp[j];
    inc(i);
    dec(j);
  end;
end;

{$IFDEF RANGECHECKSON}
  {$R+}
  {$UNDEF RANGECHECKSON}
{$ENDIF}
{$IFDEF OVERFLOWCHECKSON}
  {$Q+}
  {$UNDEF OVERFLOWCHECKSON}
{$ENDIF}

My use case requires an array of AnsiChar, but it is of course simple to amend these functions to populate WideChar arrays.

Can this be used with `PWideChar/PAnsiChar` instead of `Array of AnsiChar`? — John Lewis, Jan 08 '15 at 20:31
Sure it could. You'd pass the address of the first character in the buffer, and the buffer length to ensure that there was no overrun. Obviously you'd need to change the signature of the functions! FWIW, the other answers miss the point completely in my view. The real bottleneck is the heap allocation. — David Heffernan, Jan 08 '15 at 20:34
While a great and interesting answer, it's a shame every answer in this thread is focused on intToStr, while StrToInt, and StrToIntDef are also part of the question, and noticeably slow when used in a loop. — T.S, Oct 04 '19 at 13:32
@T.S. Is there much scope for improvement in performance for string to int? That seems less likely to me. — David Heffernan, Oct 04 '19 at 14:17
@DavidHeffernan difficult to say. I'd be all for seeing a deeper analysis on it, even the exact code. I Imagine it runs on a character by character basis in a loop,. hence why it's slow. — T.S, Oct 08 '19 at 09:52
@T.S I was not aware that `StrToInt` is slow. Can you share your benchmark? — David Heffernan, Oct 08 '19 at 10:18

faster alternative to InttoStr/StrToInt?

3 Answers3

Linked