12

I have a string containing a number written with non-ASCII digits, e.g. the Unicode character BENGALI DIGIT ONE (U+09E7): "১"

How do I parse this as an integer in .NET?

Note: I've tried using int.Parse() with a Bengali culture ("bn-BD") as the IFormatProvider. It doesn't work.

James McCormack

3 Answers

5

You could create a new string that is the same as the old string except the native digits are replaced with Latin decimal digits. This could be done reliably by looping through the characters and checking the value of char.IsDigit(char). If this function returns true, then convert it with char.GetNumericValue(char).ToString().

Like this:

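using System.Text;

static class DigitHelper
{
    // Replaces every Unicode decimal digit with its ASCII equivalent
    // and leaves all other characters untouched.
    public static string ConvertNativeDigits(this string text)
    {
        if (text == null)
            return null;
        if (text.Length == 0)
            return string.Empty;
        StringBuilder sb = new StringBuilder(text.Length);
        foreach (char character in text)
        {
            if (char.IsDigit(character))
                // GetNumericValue returns a double; cast so a plain 0-9 digit is appended.
                sb.Append((int)char.GetNumericValue(character));
            else
                sb.Append(character);
        }
        return sb.ToString();
    }
}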


int value = int.Parse(bengaliNumber.ConvertNativeDigits());
Jeffrey L Whitledge
  • Don't forget that in many of these cultures they will be formatted Right-to-Left ;) – James McCormack May 26 '11 at 16:07
  • @ʞɔɐɯɹoↃɔW sǝɯɐſ - RTL should not be an issue, because Unicode defines the proper (logical) digit sequence to be most-significant to least-significant for all scripts. This is described in the Unicode Standard 6.0 in §2.5 (p.15). – Jeffrey L Whitledge May 26 '11 at 16:09
  • Note that you will still want to use the correct culture to do the numeric conversion, since decimal, grouping, currency, and sign characters may differ. – Jeffrey L Whitledge May 26 '11 at 16:26
  • Thanks for that, very interesting. I had no idea that in Arabic script, they read words RtL but numbers LtR. Makes my brain ache! This article was also good: http://www.i18nguy.com/MiddleEastUI.html#answer – James McCormack May 27 '11 at 10:26
3

It looks like this is not possible using built-in functionality:

The only Unicode digits that the .NET Framework parses as decimals are the ASCII digits 0 through 9, specified by the code values U+0030 through U+0039.

...

The attempts to parse the Unicode code values for Fullwidth digits, Arabic-Indic digits, and Bengali digits fail and throw an exception.

(emphasis mine)

Very strange, as CultureInfo("bn-BD").NumberFormat.NativeDigits does contain them.
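
The NativeDigits array mentioned above could back a small helper. The following is only a minimal sketch, assuming the array holds ten strings ordered from zero to nine; `NativeDigitSketch` and `ReplaceNativeDigits` are hypothetical names, not part of .NET:

```csharp
using System;
using System.Globalization;

static class NativeDigitSketch
{
    // Hypothetical helper: replaces each native digit string with its ASCII digit.
    // Assumes nativeDigits[i] is the native form of the digit i (0 through 9).
    public static string ReplaceNativeDigits(string text, string[] nativeDigits)
    {
        if (text == null)
            throw new ArgumentNullException(nameof(text));
        if (nativeDigits == null || nativeDigits.Length != 10)
            throw new ArgumentException("Expected ten digit strings.", nameof(nativeDigits));

        for (int i = 0; i < nativeDigits.Length; i++)
        {
            string ascii = i.ToString(CultureInfo.InvariantCulture);
            // Skip empty entries and entries that are already ASCII digits.
            if (!string.IsNullOrEmpty(nativeDigits[i]) && nativeDigits[i] != ascii)
                text = text.Replace(nativeDigits[i], ascii);
        }
        return text;
    }
}
```

Passing `new CultureInfo("bn-BD").NumberFormat.NativeDigits` as the second argument should turn "১২৩" into "123", which int.Parse accepts, provided the runtime's culture data actually contains Bengali digits for that culture.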

Oded
  • Yeah, I saw that: one of the worst titled and most pointless articles on MSDN, right? Looks like a golden opportunity for someone to write a helper method and charge a meeelion dollars for it :D – James McCormack May 26 '11 at 15:57
  • @ʞɔɐɯɹoↃɔW sǝɯɐſ - Yeah. Though seeing as the array returned from `NativeDigits` is in the correct order (I think... don't read Bengali), a quick helper shouldn't be too difficult :) – Oded May 26 '11 at 16:01
0

I found this question while looking for a similar answer, but none of the answers quite matched what I needed, so I wrote the following. It handles signs and fails faster when given a very long string. It does not, however, ignore grouping characters such as a comma, an apostrophe, or a space, though that could easily be added if someone wanted it (I didn't):

public static int ParseIntInternational(this string str)
{
  int result = 0;
  bool neg = false;
  bool seekingSign = true; // Accept sign at beginning only.
  bool done = false; // Accept whitespace at the beginning, at the end, or between sign and number.
                     // If we see whitespace once we've seen a digit, we're "done" and
                     // further digits should fail.
  for(int i = 0; i != str.Length; ++i)
  {
    if(char.IsWhiteSpace(str, i))
    {
      if(!seekingSign)
        done = true;
    }
    else if(char.IsDigit(str, i))
    {
      if(done)
        throw new FormatException();
      seekingSign = false;
      result = checked(result * 10 + (int)char.GetNumericValue(str, i));
    }
    else if(seekingSign)
      switch(str[i])
      {
        case '﬩': case '+':
          //do nothing: Sign unchanged.
          break;
        case '-': case '−':
          neg = !neg;
          break;
        default:
          throw new FormatException();
      }
    else throw new FormatException();
  }
  if(seekingSign)
    throw new FormatException();
  return neg ? -result : result;
}
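
The grouping-character handling mentioned above could be added as a pre-processing step. This is only a rough sketch under assumptions of my own: the `GroupingSketch` class, the `StripGrouping` name, and the default set of characters (comma, apostrophe, no-break space) are all hypothetical choices, and a separator is dropped only when it sits directly between two digits:

```csharp
using System;
using System.Text;

static class GroupingSketch
{
    // Hypothetical helper: removes a grouping character only when it has
    // a digit immediately before and immediately after it.
    public static string StripGrouping(string text, string groupingChars = ",'\u00A0")
    {
        if (text == null)
            throw new ArgumentNullException(nameof(text));

        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            bool betweenDigits = i > 0 && i < text.Length - 1
                && char.IsDigit(text[i - 1]) && char.IsDigit(text[i + 1]);
            if (betweenDigits && groupingChars.IndexOf(text[i]) >= 0)
                continue; // drop the separator
            sb.Append(text[i]);
        }
        return sb.ToString();
    }
}
```

The result could then be handed to ParseIntInternational. Leading, trailing, or doubled separators are left in place, so they still cause a FormatException there.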
Jon Hanna