
This is a tricky problem, and I haven't been able to find a good solution. Suppose we have a String in Java: "He ate 3 apples today." The digit 3 is easy to identify in Java with an isNumeric check or a regular expression. But what if the String is "He ate three apples today."? How can I tell that "three" is actually a number? I tried OpenNLP's POS tagger, but it takes far too long. Can anyone suggest a better solution? Also, among OpenNLP's ".bin" model files there is one called "num.bin", but I don't know how to use it, and the OpenNLP documentation says nothing about it. Is it exactly what I'm looking for, and if so, how do I use it?

EDIT: I'm short on time, so I've settled on a temporary solution: make a dictionary file and load all of its entries into a hash table, then tokenize the sentence and check it word by word for numbers, similar to what you suggested. I'll keep updating the file as needed. Thanks for your suggestions; if you have something better, I'd be glad to hear it. OpenNLP handles this very well; the only problem is its running time, and I want this to be as fast as possible.
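A minimal sketch of that dictionary approach, assuming the dictionary holds English cardinal words and is small enough to keep in memory. The class name `NumberWordFinder` and the hard-coded word list are hypothetical; in the setup described above the words would be read from the dictionary file instead. A `HashSet` plays the role of the hash table, so each token lookup is constant time on average.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumberWordFinder {
    // Hypothetical dictionary contents; the real list would be loaded from the dictionary file
    private static final Set<String> NUMBER_WORDS = new HashSet<>(Arrays.asList(
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety", "hundred", "thousand", "million", "billion"));

    // A token is a run of letters, optionally joined by hyphens ("forty-three")
    private static final Pattern TOKEN = Pattern.compile("[a-z]+(?:-[a-z]+)*");

    /** Returns the tokens of the sentence that are number words, in order. */
    static List<String> findNumberWords(String sentence) {
        List<String> hits = new ArrayList<>();
        Matcher m = TOKEN.matcher(sentence.toLowerCase(Locale.ROOT));
        while (m.find()) {
            String token = m.group();
            // A hyphenated token counts only if every part is in the dictionary
            boolean allParts = true;
            for (String part : token.split("-")) {
                if (!NUMBER_WORDS.contains(part)) {
                    allParts = false;
                    break;
                }
            }
            if (allParts) hits.add(token);
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findNumberWords("He ate three apples today."));              // [three]
        System.out.println(findNumberWords("Forty-three people, one hundred seats.")); // [forty-three, one, hundred]
    }
}
```

Splitting hyphenated compounds into parts keeps the dictionary small: "forty-three" needs only "forty" and "three", which partly answers the concern about dictionary size. Ordinals such as "forty-third" would still need their own entries.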

Manan Pancholi
  • Do you need to detect only numbers, and only in English? If so, I don't think you need a library for that; it would be much easier to write it by hand. – bezmax Jan 23 '12 at 12:12
  • Yes, numbers only, but any kind of number: decimals, fractions, percentages, and soccer scores like 3-1 should also be detected. English only. – Manan Pancholi Jan 23 '12 at 12:14
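For the numeric forms listed in that comment (decimals, fractions, percentages, scores like 3-1), a single regular expression is usually enough, no NLP needed. The sketch below is one possible pattern, not a complete grammar; the class name `NumeralFinder` is hypothetical. It assumes English-style separators (comma for thousands, period for decimals) and does not handle negative numbers, scientific notation, or spelled-out words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumeralFinder {
    // Alternatives are tried left to right, so the more specific forms come first
    private static final Pattern NUMERAL = Pattern.compile(
        "\\d{1,3}(?:,\\d{3})+(?:\\.\\d+)?%?"   // thousands separators: 1,000 or 1,234.5
        + "|\\d+\\s*[-/]\\s*\\d+"              // scores (3-1) and fractions (3/4)
        + "|\\d*\\.?\\d+%?");                  // integers, decimals, percentages: 3, 3.14, 45%

    /** Returns every numeral-like substring of the text, in order. */
    static List<String> findNumerals(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = NUMERAL.matcher(text);
        while (m.find()) found.add(m.group());
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findNumerals(
            "Final score 3-1, about 45% of 1,000 fans left; 3.14 goals on average, 3/4 of them late."));
        // [3-1, 45%, 1,000, 3.14, 3/4]
    }
}
```

The score/fraction alternative is ambiguous by design: it will also match ranges ("5-10") and parts of dates ("2012-01"), so context would be needed to tell a soccer score from those.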

2 Answers


Create a dictionary of numbers. Search for elements from that dictionary in the text.

Check the asymptotic complexity; it may be cheaper to sort the text first.

A T
  • Yes, that's possible, but how big will the dictionary become? Words like hundred, forty-third, fifty, and so on... I'd have to add every possible word of that kind to the dictionary. – Manan Pancholi Jan 23 '12 at 12:17
  • Not a dictionary then; use a weighted directed graph. – A T Jan 23 '12 at 12:21

You have to keep all those words in arrays and use them. Here is an example of how to convert a number to a string; it may help you. I think you have to split your text into words and check whether each word is a number (e.g. "three"). If it is, check the next word, because it could be "million", then check the word after that, and so on. It's not easy, and it amounts to a small library; I think writing it will take you a lot of time. Alternatively, search Google for such a library. Maybe someone has already had this problem, written a library, and shared it for free. Good luck.
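A minimal sketch of that word-by-word approach, assuming English cardinal numbers up to the billions. The class and method names are hypothetical, and ordinals ("forty-third"), fractions ("half"), and "a hundred" with an article are not handled. Consecutive number words are collected into a run, and each run is converted with the usual accumulator scheme: small words add to the current group, "hundred" multiplies it, and a scale word ("thousand", "million", "billion") moves the group into the total.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumberWordParser {
    private static final Map<String, Long> SMALL = new HashMap<>();   // zero..nineteen, tens
    private static final Map<String, Long> SCALES = new HashMap<>();  // thousand, million, billion
    private static final Pattern WORD = Pattern.compile("[a-z]+");    // hyphens act as separators

    static {
        String[] units = {"zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
            "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
            "seventeen", "eighteen", "nineteen"};
        for (int i = 0; i < units.length; i++) SMALL.put(units[i], (long) i);
        String[] tens = {"twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"};
        for (int i = 0; i < tens.length; i++) SMALL.put(tens[i], 20L + 10L * i);
        SCALES.put("thousand", 1_000L);
        SCALES.put("million", 1_000_000L);
        SCALES.put("billion", 1_000_000_000L);
    }

    private static boolean isNumberWord(String w) {
        return SMALL.containsKey(w) || SCALES.containsKey(w) || w.equals("hundred");
    }

    /** Converts a run of number words such as [three, hundred, forty, two] to 342. */
    static long toValue(List<String> words) {
        long total = 0, current = 0;
        for (String w : words) {
            if (SMALL.containsKey(w)) {
                current += SMALL.get(w);
            } else if (w.equals("hundred")) {
                current = Math.max(current, 1) * 100;             // a bare "hundred" means 100
            } else if (SCALES.containsKey(w)) {
                total += Math.max(current, 1) * SCALES.get(w);    // close the current group
                current = 0;
            }
            // any other word in the run (only "and") is ignored
        }
        return total + current;
    }

    /** Finds every run of consecutive number words in the text and returns its value. */
    static List<Long> findNumbers(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(text.toLowerCase(Locale.ROOT));
        while (m.find()) tokens.add(m.group());

        List<Long> values = new ArrayList<>();
        List<String> run = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i);
            // "and" continues a run only between number words ("one hundred and five")
            boolean joinsRun = isNumberWord(t)
                || (t.equals("and") && !run.isEmpty()
                    && i + 1 < tokens.size() && isNumberWord(tokens.get(i + 1)));
            if (joinsRun) {
                run.add(t);
            } else if (!run.isEmpty()) {
                values.add(toValue(run));
                run.clear();
            }
        }
        if (!run.isEmpty()) values.add(toValue(run));
        return values;
    }

    public static void main(String[] args) {
        System.out.println(findNumbers(
            "He ate three apples, one hundred and five grapes and forty-two million seeds."));
        // [3, 105, 42000000]
    }
}
```

Because each token is looked up in a hash map once, the scan is linear in the number of words, which avoids the cost of running a full POS tagger. The trade-off is that adjacent but separate numbers ("one two three") are merged into a single run.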

shift66