
I used a word-counting algorithm and, on closer inspection, I noticed that it reported fewer words than the text actually contains, because it counts, for example, "it's" as one word. I tried to find a solution without any success, so I'm asking whether there exists anything that transforms a contraction like "it's" into its base words, i.e. "it is".
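To make the discrepancy concrete, here is a minimal sketch that assumes the counter simply splits on whitespace. The question does not say which algorithm was used, and the class name `WordCountDemo` is hypothetical. Under that assumption, "it's" is one token, while its expansion "it is" is two.

```java
public class WordCountDemo {
    // Counts whitespace-separated tokens, the way many simple word counters do.
    // Assumption: this is roughly what the counter in the question does.
    static int countWords(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        System.out.println(countWords("it's such a lovely day!"));  // 5
        System.out.println(countWords("it is such a lovely day!")); // 6
    }
}
```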

Cœur
Flu

2 Answers


Well, basically you need to provide a data structure that maps abbreviated terms to their corresponding long versions. However, this is not as simple as it sounds: for example, you would not want to transform "The client's car." into "The client is car."
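A minimal sketch of such a lookup table, under the assumption that it only contains contractions with a single unambiguous expansion. The class name `ContractionMap` and the table entries are hypothetical examples, not a complete list. "'s" (is / has / possessive) and "'d" (had / would) are deliberately left out, so the possessive case above is not touched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContractionMap {
    // Hypothetical table of unambiguous contractions (keys are lower case).
    // "'s" and "'d" are omitted on purpose because they have several meanings.
    private static final Map<String, String> CONTRACTIONS = new HashMap<>();
    static {
        CONTRACTIONS.put("can't", "cannot");
        CONTRACTIONS.put("won't", "will not");
        CONTRACTIONS.put("don't", "do not");
        CONTRACTIONS.put("isn't", "is not");
        CONTRACTIONS.put("i'm", "I am");
        CONTRACTIONS.put("you're", "you are");
        CONTRACTIONS.put("they're", "they are");
        CONTRACTIONS.put("we've", "we have");
        CONTRACTIONS.put("i'll", "I will");
    }

    // A word with at most one straight apostrophe inside, e.g. "don't".
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+(?:'[A-Za-z]+)?");

    static String expand(String text) {
        Matcher m = WORD.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());          // copy spaces/punctuation as-is
            String word = m.group();
            String expansion = CONTRACTIONS.get(word.toLowerCase());
            // Note: a capitalised "Don't" comes back as "do not".
            out.append(expansion != null ? expansion : word);
            last = m.end();
        }
        out.append(text.substring(last));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("I'm sure they're late, but don't worry."));
        // I am sure they are late, but do not worry.
        System.out.println(expand("The client's car."));
        // The client's car.
    }
}
```

The second call shows the limitation: "The client's car." comes back unchanged because "'s" is not in the table, so its contraction uses ("it's") are not expanded either.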

To handle cases like that one, you will probably need a heuristic with a deeper understanding of the language you are processing and of its grammar rules.
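As a very rough illustration of such a heuristic, the sketch below expands "'s" to "is" only after words that do not normally take a possessive (pronouns and words like "there" or "what") and treats every other "'s" as possessive. The word list and the class name `ApostropheSHeuristic` are hypothetical assumptions. It also ignores that "'s" can mean "has" ("he's gone") and only handles the straight apostrophe.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ApostropheSHeuristic {
    // Hypothetical list of words after which "'s" is assumed to mean "is".
    // Any other "'s" (nouns, names) is assumed to be possessive and kept.
    private static final Set<String> IS_HOSTS = new HashSet<>(Arrays.asList(
            "it", "he", "she", "that", "what", "there", "here", "who", "where", "how"));

    static String expandApostropheS(String text) {
        String[] words = text.split(" ");
        for (int i = 0; i < words.length; i++) {
            String w = words[i];
            int idx = w.indexOf("'s");                  // straight apostrophe only
            if (idx > 0 && IS_HOSTS.contains(w.substring(0, idx).toLowerCase())) {
                // Keep the host word's case and any trailing punctuation: "It's," -> "It is,"
                words[i] = w.substring(0, idx) + " is" + w.substring(idx + 2);
            }
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        System.out.println(expandApostropheS("It's the client's car, and he's happy."));
        // It is the client's car, and he is happy.
    }
}
```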

user1438038
  • The example you mentioned is exactly why I am asking whether such a tool already exists ;) – Flu Nov 27 '14 at 14:19
  • Spell-checking APIs may provide such features, but I suppose they only use them internally. Did you check how open-source tools such as OpenOffice implement word counting? – user1438038 Nov 27 '14 at 14:28
  • Thanks, I will check the API. – Flu Nov 28 '14 at 10:15
  • I tried the spell-checker APIs found here: http://stackoverflow.com/questions/14268998/google-spell-api-and-tinymce, but they accept both "it is" and "it's" as correct, which of course they are ;) So, unfortunately, spell checkers cannot be used for that purpose. – Flu Nov 28 '14 at 10:29

I just built this from scratch for the challenge. It seems to be working on my end. Let me know how it works for you.

public class ContractionConverter {

    public static void main(String[] args) {

        String s = "it's such a lovely day! it's really amazing!";

        System.out.println(convertText(s));
        //output: it is such a lovely day! it is really amazing!

    }

    // Expands every "'s" to " is". Note: this also turns possessives
    // such as "The client's car." into "The client is car."
    public static String convertText(String text) {
        // split() takes a String (a regex), not a char
        String[] words = text.split(" ");

        for (int i = 0; i < words.length; i++) {
            if (words[i].contains("'s")) {
                // Replace only the "'s" part so trailing punctuation ("it's,") survives
                words[i] = words[i].replace("'s", " is");
            }
        }
        // Rebuild the whole text, so every contraction is kept (not just the last one)
        // and the input comes back unchanged when nothing matched
        return String.join(" ", words);
    }
}

I did this in C# and tried to convert it into Java. If you see any syntax errors, please point them out.

Drew Kennedy
  • Thanks, but this algorithm has the same problem: it converts, for example, "The client is car." to "The client is car." – Flu Nov 28 '14 at 10:15
  • I think you meant to say "The client's car" to "The client is car." Handling something like that, as mentioned by user1438038, would need an algorithm built to know the difference between possessives and contractions. I'm sure that can be done, but it would be one crazy algorithm. – Drew Kennedy Nov 28 '14 at 14:34