
What's the easiest way in Java to parse a string and extract numbers written in natural language? For example, I'd like to extract the number in "I have thirty three apples". The numbers will be small (less than fifty) and written in French (for example, "dix sept").

Is there already an enum in the JDK or in another library, like the Month one, or something similar that makes this easy?

Fla

1 Answer

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FrenchTranslator {
    private static final String EMPTY_SPACE = " ";

    // French number words mapped to their values (only 1 to 5 here).
    private static final Map<String, Integer> frenchNumbers = new HashMap<>();
    static {
        frenchNumbers.put("un", 1);
        frenchNumbers.put("deux", 2);
        frenchNumbers.put("trois", 3);
        frenchNumbers.put("quatre", 4);
        frenchNumbers.put("cinq", 5);
    }

    public static void main(String[] args) {
        String frenchSentence = "J'ai cinq tomates.";
        System.out.println("INPUT: " + frenchSentence);
        List<String> words = extractWordsFromFrenchSentence(frenchSentence);
        String translatedSentence = createTranslatedSentence(words);
        System.out.println("OUTPUT: " + translatedSentence);
    }

    // Splits the sentence into words on single spaces.
    private static List<String> extractWordsFromFrenchSentence(String frenchSentence) {
        return Arrays.asList(frenchSentence.split(EMPTY_SPACE));
    }

    // Rebuilds the sentence, replacing each known number word with its digits.
    private static String createTranslatedSentence(List<String> words) {
        StringBuilder translatedSentence = new StringBuilder();
        for (String word : words) {
            if (translatedSentence.length() > 0) {
                translatedSentence.append(EMPTY_SPACE);
            }
            Integer number = frenchNumbers.get(word);
            translatedSentence.append(number != null ? number.toString() : word);
        }
        return translatedSentence.toString();
    }
}

How the algorithm works:

INPUT: J'ai cinq tomates.

OUTPUT: J'ai 5 tomates.

From what I understood, you want to detect a written number in a French sentence.

I don't know your precise requirements, but to help you I have written an algorithm that translates written French numbers (from 1 to 5) into digits and then rebuilds the original sentence.
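The same map-lookup idea can be stretched to the range the question asks about (below fifty), including two-word numbers such as "dix sept" or "trente-trois". The following is only a sketch under stated assumptions: the class name `FrenchNumberExtractor` and the method `extract` are hypothetical, it returns the numbers found instead of rebuilding the sentence, and it treats every "un"/"une" as the number 1 even where French uses it as an article.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not a JDK or library class.
final class FrenchNumberExtractor {
    // Single-word numbers from 1 to 16.
    private static final Map<String, Integer> UNITS = new HashMap<>();
    // Words that can start a two-word number below fifty.
    private static final Map<String, Integer> TENS = new HashMap<>();

    static {
        String[] units = {"un", "deux", "trois", "quatre", "cinq", "six", "sept", "huit",
                "neuf", "dix", "onze", "douze", "treize", "quatorze", "quinze", "seize"};
        for (int i = 0; i < units.length; i++) {
            UNITS.put(units[i], i + 1);
        }
        UNITS.put("une", 1); // assumption: feminine form always counts as 1
        TENS.put("dix", 10);
        TENS.put("vingt", 20);
        TENS.put("trente", 30);
        TENS.put("quarante", 40);
    }

    // "dix" only combines with 7..9 (dix-sept, dix-huit, dix-neuf);
    // vingt/trente/quarante combine with 2..9 ("et un" is handled separately).
    private static boolean combines(int tens, int unit) {
        return tens == 10 ? unit >= 7 && unit <= 9 : unit >= 2 && unit <= 9;
    }

    static List<Integer> extract(String sentence) {
        // Split on anything that is not a letter, so "dix-sept", "dix sept"
        // and "tomates." are all tokenized the same way.
        String[] tokens = sentence.toLowerCase(Locale.FRENCH).split("[^\\p{L}]+");
        List<Integer> numbers = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            Integer tens = TENS.get(tokens[i]);
            if (tens != null) {
                int value = tens;
                int used = 1;
                Integer next = i + 1 < tokens.length ? UNITS.get(tokens[i + 1]) : null;
                if (tens >= 20 && i + 2 < tokens.length && tokens[i + 1].equals("et")
                        && Integer.valueOf(1).equals(UNITS.get(tokens[i + 2]))) {
                    value = tens + 1; // e.g. "vingt et un"
                    used = 3;
                } else if (next != null && combines(tens, next)) {
                    value = tens + next; // e.g. "trente trois"
                    used = 2;
                }
                numbers.add(value);
                i += used;
            } else {
                Integer unit = UNITS.get(tokens[i]);
                if (unit != null) {
                    numbers.add(unit);
                }
                i++;
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        // prints [33, 17]
        System.out.println(extract("J'ai trente trois pommes et dix-sept poires."));
    }
}
```

Looking one or two tokens ahead after a tens word is what lets "dix sept" come out as 17 instead of 10 followed by 7. Numbers from fifty upwards and the word "zéro" are deliberately left out.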

SebastianJ
  • I was wondering whether a class exists in the JDK or elsewhere (a library?) that could do this and spare me from initializing a Map with many strings (for example, [Month](https://docs.oracle.com/javase/8/docs/api/java/time/Month.html) is an existing enum of the months, though in English). Moreover, your example splits on spaces, so a number like "dix sept" wouldn't work (that's why I used thirty three as the example in the English sentence). – Fla Jan 08 '19 at 16:16
  • For months it's possible if you use the Locale classes: https://memorynotfound.com/java-get-list-month-names-locale/ but I'm not aware of any library that provides French numbers. – SebastianJ Jan 08 '19 at 16:24
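
For reference, the Locale-based month lookup mentioned in the last comment can be sketched with the standard `java.time` API. The class name `FrenchMonths` and the method `nameOf` are hypothetical and exist only for illustration; this may differ from the approach in the linked article.

```java
import java.time.Month;
import java.time.format.TextStyle;
import java.util.Locale;

// Hypothetical helper, not a JDK class.
final class FrenchMonths {
    // Returns the full French name of the given month.
    static String nameOf(Month month) {
        return month.getDisplayName(TextStyle.FULL, Locale.FRENCH);
    }

    public static void main(String[] args) {
        // Prints the twelve month names in French, one per line.
        for (Month month : Month.values()) {
            System.out.println(nameOf(month));
        }
    }
}
```

This only goes from the enum to a localized name; the JDK has no comparable enum for number words.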