
I noticed that decimal.Parse(number, NumberStyles.AllowDecimalPoint, CultureInfo.InvariantCulture) is about 100% slower than a custom decimal parsing method based on Jeffrey Sax's code from "Faster alternative to Convert.ToDouble":

public static decimal ParseDecimal(string input) {
    bool negative = false;
    long n = 0; // all digits accumulated as one integer, ignoring the decimal point

    int len = input.Length;
    int decimalPosition = len; // index just past the '.', or len if there is none

    if (len != 0) {
        int start = 0;
        if (input[0] == '-') {
            negative = true;
            start = 1;
        }

        for (int k = start; k < len; k++) {
            char c = input[k];

            if (c == '.') {
                decimalPosition = k + 1;
            } else {
                n = (n * 10) + (int)(c - '0');
            }
        }
    }

    // lo = low 32 bits of n, mid = high 32 bits of n, hi = 0;
    // scale = number of digits after the '.'
    return new decimal((int)n, (int)(n >> 32), 0, negative, (byte)(len - decimalPosition));
}

I assume that is because the native decimal.Parse has to deal with number styles and culture info.

However, the method above doesn't use the third parameter (hi) of the decimal constructor, so it won't work with larger numbers.

Is there a faster alternative to decimal.Parse for converting a string that consists only of digits and a decimal point to decimal, and that would also work with large numbers?
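To make the role of the hi parameter concrete, here is a minimal sketch using decimal.GetBits, which returns the lo, mid, hi and flags words of a decimal. The class and method names (DecimalBitsDemo, HiWord, Rebuild) are hypothetical and only for illustration.

```csharp
using System;

public static class DecimalBitsDemo
{
    // Hypothetical helper: returns the "hi" 32-bit word of the 96-bit integer
    // that backs a decimal. decimal.GetBits returns { lo, mid, hi, flags }.
    public static int HiWord(decimal value)
    {
        return decimal.GetBits(value)[2];
    }

    // Hypothetical helper: rebuilds a non-negative, scale-0 decimal from its three words.
    public static decimal Rebuild(decimal value)
    {
        int[] bits = decimal.GetBits(value);
        return new decimal(bits[0], bits[1], bits[2], false, 0);
    }
}
```

For example, HiWord(18446744073709551615m) (2^64 - 1) is 0, while HiWord(18446744073709551616m) (2^64) is 1: anything beyond 64 bits has to go into hi, which a single long cannot supply.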

EDIT: Benchmark:

var style = System.Globalization.NumberStyles.AllowDecimalPoint;
var culture = System.Globalization.CultureInfo.InvariantCulture;
System.Diagnostics.Stopwatch s = new System.Diagnostics.Stopwatch();
s.Reset();
s.Start();
for (int i=0; i<10000000; i++)
{
    decimal.Parse("20000.0011223344556", style, culture);
}
s.Stop();
Console.WriteLine(s.Elapsed.ToString());

s.Reset();
s.Start();
for (int i=0; i<10000000; i++)
{
    ParseDecimal("20000.0011223344556");
}
s.Stop();
Console.WriteLine(s.Elapsed.ToString());

output:

00:00:04.2313728
00:00:01.4464048

In this case the custom ParseDecimal is roughly three times faster than decimal.Parse (about 1.45 s versus 4.23 s for 10 million iterations).

LukAss741
  • Should negative numbers still be considered? – Willem Van Onsem Jun 05 '16 at 20:24
  • @WillemVanOnsem, in my current use case I don't need negative numbers. However, this is a general question that may be useful to anyone, so it would be better if it supported negative numbers. – LukAss741 Jun 05 '16 at 20:36
  • Decimal.Parse() is as fast as it needs to be, written in C++ and built into the OS for the past 20 years. You can only speed it up by cutting corners. You are not explicit enough what kind of bugs you do find acceptable. – Hans Passant Jun 05 '16 at 22:59
  • I can't imagine that computing a large binary value equivalent to the input value, and then converting that to decimal as the last step, is faster than constructing the decimal number directly. I'd guess that multiplying n by *16* would make more sense; then n contains the BCD equivalent of the number. Converting that to an actual decimal value should be pretty easy in machine code; dunno about in C#. If a long isn't long enough, then use it for the first 16 decimal digits and do something more expensive when there are more than 16; that will be pretty rare in practice anyway. – Ira Baxter Jun 06 '16 at 03:03
  • Statistical distribution of input values is your friend if you want to optimize. For example, if most input values came in as 1 character strings, you can write a much simpler/faster bit of conversion code (you don't even have to deal with the sign character) for that case. If negative values are uncommon, code that converts positive-only values will be faster; you might have to test for the sign once, but if not present, you don't have to test its value again. I'd look at your input value distribution and see if you can't take advantage of it. – Ira Baxter Jun 06 '16 at 03:06
  • You can also consider unrolling the loop and starting at the right point for a N-digit number to get rid of the loop overhead. This will work especially well if you know the number doesn't contain a decimal point, or if you know where the decimal point is by prescanning the string (string searches are really fast on x86 boxes). – Ira Baxter Jun 06 '16 at 03:14
  • Is this run in `release` or `debug` mode ? – Fabjan Jun 06 '16 at 11:08
  • I added a benchmark which shows the custom method is significantly faster. – LukAss741 Jun 06 '16 at 11:09
  • IMHO, this belongs to [codereview.se] – Thomas Ayoub Jun 06 '16 at 11:37
  • Also, I wouldn't use your parser with `"-0.SOME_CHARS"` as input – Thomas Ayoub Jun 06 '16 at 11:40
  • Thomas Ayoub, any characters other than a single "-" at the beginning, a single "." and digits are not supported. Only JSON-like decimal numbers such as 123.456789. – LukAss741 Jun 06 '16 at 11:44
  • @IraBaxter keep in mind, that decimal in .net is not stored as BCD, but as 96bit int plus decadic exponent. – Antonín Lejsek Jun 08 '16 at 00:15
  • @AntonínLejsek: Ah. No, didn't realize that. OK, multiply by 10 is the right thing to do. My error. – Ira Baxter Jun 08 '16 at 01:42
  • On my PC, it's only 2x faster. Plus, if you use all that in real code (if you do other things around it, not only benchmarking), the difference tends to disappear almost completely. Jeffrey Sax's code is more or less "parse 64 bits, cast to decimal and move the decimal point"; you'll have a hard time really doing better than the CLR (which internally uses unsafe code/pointers) with a full 128-bit decimal. – Simon Mourier Jun 08 '16 at 13:48
  • @Simon Mourier my program keeps downloading loads of JSON with numbers all the time, so this would apparently be a good optimization. Also, I could add a simple check: if the input string is shorter than something like 20 characters, use this custom parse method, otherwise use the normal parse. That would still be a significant performance improvement, as almost all input numbers are less than 20 characters long. – LukAss741 Jun 08 '16 at 14:15
  • yep, that's basically what I meant. decimal parsing is probably nothing compared to JSON parsing in general. If I was looking at that kind of optimization, I would also reconsider JSON itself (20000.0011223344556 takes for example 19 bytes on the wire, more than the raw decimal, 16 bytes) – Simon Mourier Jun 08 '16 at 14:34

2 Answers


Thanks for all your comments, which gave me a little more insight. In the end I did it as follows: if the input is too long, the method splits the input string, parses the first part into a long and the rest into an int, which is still faster than decimal.Parse.

This is my final production code:

public static int[] powof10 = new int[10]
{
    1,
    10,
    100,
    1000,
    10000,
    100000,
    1000000,
    10000000,
    100000000,
    1000000000
};
public static decimal ParseDecimal(string input)
{
    int len = input.Length;
    if (len != 0)
    {
        bool negative = false;
        long n = 0;
        int start = 0;
        if (input[0] == '-')
        {
            negative = true;
            start = 1;
        }
        if (len <= 19)
        {
            int decpos = len;
            for (int k = start; k < len; k++)
            {
                char c = input[k];
                if (c == '.')
                {
                    decpos = k +1;
                }else{
                    n = (n *10) +(int)(c -'0');
                }
            }
            return new decimal((int)n, (int)(n >> 32), 0, negative, (byte)(len -decpos));
        }else{
            if (len > 28)
            {
                len = 28;
            }
            int decpos = len;
            for (int k = start; k < 19; k++)
            {
                char c = input[k];
                if (c == '.')
                {
                    decpos = k +1;
                }else{
                    n = (n *10) +(int)(c -'0');
                }
            }
            int n2 = 0;
            bool secondhalfdec = false; 
            for (int k = 19; k < len; k++)
            {
                char c = input[k];
                if (c == '.')
                {
                    decpos = k +1;
                    secondhalfdec = true;
                }else{
                    n2 = (n2 *10) +(int)(c -'0');
                }
            }
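            byte decimalPosition = (byte)(len -decpos);
            // Shift the first part left by the number of digits in the second part, then add the second part.
            // If the '.' fell in the second half it took up one of those characters, hence 20 instead of 19.
            decimal first = new decimal((int)n, (int)(n >> 32), 0, negative, decimalPosition);
            decimal second = new decimal(n2, 0, 0, negative, decimalPosition);
            return first *powof10[len -(!secondhalfdec ? 19 : 20)] +second;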
        }
    }
    return 0;
}

benchmark code:

const string input = "[inputs are below]";
var style = System.Globalization.NumberStyles.AllowDecimalPoint | System.Globalization.NumberStyles.AllowLeadingSign;
var culture = System.Globalization.CultureInfo.InvariantCulture;
System.Diagnostics.Stopwatch s = new System.Diagnostics.Stopwatch();
s.Reset();
s.Start();
for (int i=0; i<10000000; i++)
{
    decimal.Parse(input, style, culture);
}
s.Stop();
Console.WriteLine(s.Elapsed.ToString());

s.Reset();
s.Start();
for (int i=0; i<10000000; i++)
{
    ParseDecimal(input);
}
s.Stop();
Console.WriteLine(s.Elapsed.ToString());

results on my i7 920:

input: 123.456789

00:00:02.7292447
00:00:00.6043730

input: 999999999999999123.456789

00:00:05.3094786
00:00:01.9702198

input: 1.0

00:00:01.4212123
00:00:00.2378833

input: 0

00:00:01.1083770
00:00:00.1899732

input: -3.3333333333333333333333333333333

00:00:06.2043707
00:00:02.0373628

If the input consists only of 0-9, . and an optional - at the beginning, then this custom function is significantly faster at parsing a string to decimal.
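The timings above come from a single Stopwatch pass per input. As a hedged sketch, a harness along the following lines adds a warm-up phase so that JIT compilation is not counted in the measurement. Measure and BenchSketch are hypothetical names, and the warm-up count of 1000 is an arbitrary assumption; as one of the comments on the question asks, it should be run from a Release build.

```csharp
using System;
using System.Diagnostics;

public static class BenchSketch
{
    // Hypothetical helper: calls the action 1000 times to warm up (assumed count),
    // then times `iterations` further calls and returns the elapsed time.
    public static TimeSpan Measure(Action action, int iterations)
    {
        for (int i = 0; i < 1000; i++)
        {
            action();
        }

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

It could be called as Measure(() => sink += ParseDecimal(input), 10000000), where sink is a local decimal that is printed afterwards so the parsed result is actually used.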

LukAss741

Sax's method is fast for two reasons. The first you already know. The second is that it can take advantage of the very efficient 8-byte long data type for n. Understanding how the method uses the long also explains why (unfortunately) it is not currently possible to use a similar method for very large numbers.

The first two parameters of the decimal constructor, lo and mid, use 4 bytes each. Together this is the same amount of memory as the long, so there is no space left to keep going once you hit the maximum value of a long.

To utilize a similar method you would need a 12 byte data type in place of the long. This would provide you with the extra four bytes needed to utilize the hi parameter.

Sax's method is very clever, but until someone writes a 12 byte data type, you are just going to have to rely on decimal.Parse.
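One way to approximate the 12-byte accumulator described above is to carry three 32-bit words by hand. The following is only a sketch under stated assumptions: ParseDecimal96 is a hypothetical name, the input is assumed to contain only digits, an optional leading '-' and at most one '.', with no more than 28 digits after the '.' and a value that fits in 96 bits. Nothing is validated, and whether this is actually faster than decimal.Parse has not been measured here.

```csharp
using System;

public static class WideParse
{
    // Hypothetical sketch: accumulates the digits into a 96-bit value held in
    // three 32-bit words (lo, mid, hi) so that the "hi" constructor argument is used.
    // Assumes well-formed input; performs no validation or overflow checks.
    public static decimal ParseDecimal96(string input)
    {
        uint lo = 0, mid = 0, hi = 0;
        bool negative = false;
        bool seenPoint = false;
        int scale = 0; // number of digits after the '.'
        int k = 0;

        if (input.Length > 0 && input[0] == '-')
        {
            negative = true;
            k = 1;
        }

        for (; k < input.Length; k++)
        {
            char c = input[k];
            if (c == '.')
            {
                seenPoint = true;
                continue;
            }
            if (seenPoint)
            {
                scale++;
            }

            // value = value * 10 + digit, propagating the carry word by word.
            ulong t = (ulong)lo * 10 + (uint)(c - '0');
            lo = (uint)t;
            t = (ulong)mid * 10 + (t >> 32);
            mid = (uint)t;
            t = (ulong)hi * 10 + (t >> 32);
            hi = (uint)t;
            // A non-zero (t >> 32) here would mean the value no longer fits in 96 bits.
        }

        // The constructor takes signed ints, so reinterpret the bit patterns unchecked.
        return new decimal(unchecked((int)lo), unchecked((int)mid), unchecked((int)hi),
                           negative, (byte)scale);
    }
}
```

Each digit costs three 64-bit multiplications instead of one, so any speed advantage over the single-long version is likely to shrink; the sketch is meant to show that the hi word can be filled, not to claim a performance result.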

Brandon Griffin