StreamTokenizer to read very large numbers?

Question

I have to read input containing large numbers, on the order of 10^9, in Java. How do I handle the input fast? Also, since StreamTokenizer.nval gives a double, how can I read larger values?

That won't help in taking the input, nor in converting from double to BigInteger. — Renegade403, Oct 06 '13 at 22:30
1. You can take the input as a `String` and pass it to the `BigDecimal` constructor; 2. Converting from `double` to `BigInteger` doesn't make sense, because _it's for integers_ and because values that large are not exactly representable as a `double`, so you would convert a wrong value. — BackSlash, Oct 06 '13 at 22:34
@BackSlash BigInteger is not necessary – even a 32-bit int can handle numbers of order 10^9. — ntoskrnl, Nov 20 '13 at 21:00
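
To illustrate the points made in the comments, here is a minimal sketch (the class and method names are made up for this example). It reads a number through the default `nval` field and compares it with parsing the same text as a `BigInteger`. It assumes the default syntax table, where number parsing is enabled.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.math.BigInteger;

// Hypothetical demo class, not part of the original question or answer.
final class NvalPrecisionDemo {
    /** Reads one number with the default syntax table, i.e. via the double field nval. */
    static double readAsDouble(String input) {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(input));
        try {
            if (tokenizer.nextToken() != StreamTokenizer.TT_NUMBER) {
                throw new IllegalArgumentException("Not a number: " + input);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokenizer.nval;
    }

    public static void main(String[] args) {
        // 10^9 fits in an int and is represented exactly by a double.
        System.out.println((int) readAsDouble("1000000000"));
        // 2^53 + 1 has no exact double representation, so the value read differs.
        System.out.println((long) readAsDouble("9007199254740993"));
        // Parsing the text directly keeps every digit.
        System.out.println(new BigInteger("9007199254740993"));
    }
}
```

For values of order 10^9 the cast from `nval` is safe; the loss only starts once the integers exceed 2^53.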

Stefan Haustein · Answer 1 · 2015-09-06T13:49:02.543

Before parsing, reset the tokenizer syntax table and initialize it to recognize numbers as words:

StreamTokenizer tokenizer = new StreamTokenizer(r); // r: the input Reader
tokenizer.resetSyntax();

tokenizer.whitespaceChars(0, 32);

tokenizer.wordChars('0', '9');
tokenizer.wordChars('-', '.');
tokenizer.wordChars('+', '+');
tokenizer.wordChars('a', 'z');
tokenizer.wordChars('A', 'Z');
tokenizer.wordChars(0xa0, 0xff); // not really needed here
tokenizer.slashSlashComments(true);
tokenizer.slashStarComments(true);

tokenizer.quoteChar('"');
tokenizer.quoteChar('\'');
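
As a self-contained sketch of the effect of this setup (using a reduced version of the syntax table above, with a hypothetical class and method name), the following collects every word token from a reader. After `resetSyntax()`, digits are no longer parsed into `nval`; they arrive as `TT_WORD` tokens with the original text in `sval`.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; only the syntax-table calls come from the answer.
final class WordTokenDemo {
    /** Returns the sval of every TT_WORD token found in the reader. */
    static List<String> words(Reader reader) {
        StreamTokenizer tokenizer = new StreamTokenizer(reader);
        tokenizer.resetSyntax();            // all characters become ordinary; number parsing is off
        tokenizer.whitespaceChars(0, 32);
        tokenizer.wordChars('0', '9');
        tokenizer.wordChars('-', '.');      // covers '-' and '.'
        tokenizer.wordChars('+', '+');

        List<String> result = new ArrayList<>();
        try {
            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                if (tokenizer.ttype == StreamTokenizer.TT_WORD) {
                    result.add(tokenizer.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words(new StringReader("1000000000 -42 9007199254740993")));
    }
}
```

Each string can then be handed to `Long.parseLong`, or to `new BigInteger(...)` when the value may not fit in a `long`.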

Then, when encountering a word, you check whether it is parseable as a number (a bit crude here, but it shows the general idea):

...
case StreamTokenizer.TT_WORD:
  if ("true".equals(tokenizer.sval)) {
    result = Boolean.TRUE;
  } else if ("false".equals(tokenizer.sval)) {
    result = Boolean.FALSE;
  } else if ("null".equals(tokenizer.sval)) {
    result = null;
  } else {
    try {
      result = Long.parseLong(tokenizer.sval);
    } catch (NumberFormatException e) {
      try {
        result = Double.parseDouble(tokenizer.sval);
      } catch (NumberFormatException e2) {
        throw new IllegalStateException(
            "Unexpected token: " + tokenizer.toString());
      }
    }
  }
  tokenizer.nextToken();
  break;

Whether this works depends on the use case. If you want to parse expressions (and not just JSON, as in my case), you probably don't want to set + or - as word characters. The general idea should still work if you treat them as unary operators and detect constants at a later stage.
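
A rough sketch of that variant, under simplifying assumptions: the input is a whitespace-separated list of integers, only digits are word characters, and every `+` or `-` is treated as a unary sign for the next number. The class and method names are hypothetical, and a real expression parser would also have to distinguish binary operators, which this sketch does not attempt.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class illustrating signs as ordinary tokens.
final class SignedIntegerReader {
    /** Reads integers of any size; '-' and '+' are ordinary tokens applied as unary signs. */
    static List<BigInteger> readAll(String input) {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(input));
        tokenizer.resetSyntax();
        tokenizer.whitespaceChars(0, 32);
        tokenizer.wordChars('0', '9');      // only digits form words; signs stay ordinary

        List<BigInteger> values = new ArrayList<>();
        boolean negate = false;
        try {
            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                if (tokenizer.ttype == '-') {
                    negate = !negate;       // unary minus flips the pending sign
                } else if (tokenizer.ttype == '+') {
                    // unary plus: nothing to do
                } else if (tokenizer.ttype == StreamTokenizer.TT_WORD) {
                    BigInteger value = new BigInteger(tokenizer.sval);
                    values.add(negate ? value.negate() : value);
                    negate = false;
                } else {
                    throw new IllegalStateException("Unexpected token: " + tokenizer);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }
}
```

Using `BigInteger` here is a choice made for the sketch so that values beyond the `long` range are also accepted; for numbers of order 10^9, `Integer.parseInt` or `Long.parseLong` on `sval` would be enough.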
