
I am working on an algorithm for finding keywords in strings. I have a HashSet containing almost 1 million keywords, and I want to replace every occurrence of these keywords in my sentences with a blank. My concern is scale: with 1,000 sentences of 10 words each, there are 10,000 words in total to check.
What is the best approach for searching for keywords in sentences here?

    Set<String> keywords; // ~1,000,000 entries
    for (int i = 0; i < textModel.length; i++) { // ~1,000 entries
        String[] splitted = textModel[i].getText().split(" ");
        for (int j = 0; j < splitted.length; j++) {
            if (keywords.contains(splitted[j])) { // whole-word lookup; is this OK?
                splitted[j] = ""; // blank out the keyword
            }
        }
    }

Is this approach OK, or should I use a text search algorithm for it?

Abdullah Tellioglu
  • Can you guarantee that the keywords only need to match whole words? Your algorithm will not find subwords or overlapping matches. If that is acceptable, your algorithm is quite fast. – CoronA Apr 21 '17 at 14:29

1 Answer


I think if you're really worried about search time, you should use the Boyer-Moore algorithm. When a mismatch occurs, the algorithm skips ahead by several characters (or words, in your case) instead of advancing one position at a time.

It also uses the good suffix rule: when a suffix of the pattern has already matched before the mismatch, the pattern is shifted so that another occurrence of that suffix within the pattern lines up with the text that was just matched.
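To make the skipping idea concrete, here is a minimal sketch of the Boyer-Moore-Horspool variant, which is a simplification of Boyer-Moore: it uses only a bad-character shift table and leaves out the good suffix rule. The class and method names (`HorspoolSearch`, `indexOf`) are made up for illustration, and it searches for a single pattern only.

```java
import java.util.Arrays;

public class HorspoolSearch {
    // Returns the index of the first occurrence of pattern in text, or -1.
    // Horspool simplification: bad-character shifts only, no good suffix rule.
    static int indexOf(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();
        if (m == 0) {
            return 0;
        }
        // shift[c] = how far the window may move when c is the text character
        // aligned with the last pattern position. Default: the whole pattern length.
        int[] shift = new int[Character.MAX_VALUE + 1];
        Arrays.fill(shift, m);
        for (int i = 0; i < m - 1; i++) {
            shift[pattern.charAt(i)] = m - 1 - i;
        }
        int pos = 0;
        while (pos <= n - m) {
            // Compare right to left inside the current window.
            int j = m - 1;
            while (j >= 0 && text.charAt(pos + j) == pattern.charAt(j)) {
                j--;
            }
            if (j < 0) {
                return pos; // full match
            }
            // Skip ahead based on the text character under the window's last position.
            pos += shift[text.charAt(pos + m - 1)];
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("here is a simple example", "example"));
    }
}
```

Because the shift table is built per pattern, this would have to be run once for each keyword, which becomes expensive when the keyword set is very large.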

But if time isn't an issue, your algorithm should be good.
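For reference, the whole-word approach from the question can be written as a small self-contained sketch. The names `KeywordFilter` and `removeKeywords` are hypothetical. Unlike the original loop, this version drops matched words instead of replacing them with empty strings, and it splits on any run of whitespace.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class KeywordFilter {
    // Removes every whitespace-separated word that is contained in keywords.
    // Each lookup is an expected O(1) hash lookup, so the cost grows with the
    // number of words in the text, not with the number of keywords.
    static String removeKeywords(String sentence, Set<String> keywords) {
        return Arrays.stream(sentence.trim().split("\\s+"))
                .filter(word -> !keywords.contains(word))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        Set<String> keywords = new HashSet<>(Arrays.asList("quick", "lazy"));
        System.out.println(
                removeKeywords("the quick brown fox jumps over the lazy dog", keywords));
    }
}
```

With 10,000 words this amounts to 10,000 hash lookups, regardless of whether the set holds one thousand or one million keywords. It only matches whole words exactly, so punctuation attached to a word or differences in case will prevent a match.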

no name
  • Boyer-Moore is not a good fit for searching for a set of one million keywords. It would be better to consider an indexed search or the Aho-Corasick algorithm. – CoronA Apr 21 '17 at 14:25
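
If subword and overlapping matches do matter, a rough sketch of Aho-Corasick could look like the following. It builds a trie of all keywords with failure links and then scans the text once, reporting every keyword occurrence. The class name `AhoCorasick`, the method `findAll`, and the `"start:keyword"` result format are choices made for this illustration, not part of any library.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

public class AhoCorasick {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        Node fail; // node for the longest proper suffix that is also a trie prefix
        final List<String> outputs = new ArrayList<>(); // keywords ending here
    }

    private final Node root = new Node();

    public AhoCorasick(Iterable<String> keywords) {
        // Step 1: insert every keyword into the trie.
        for (String keyword : keywords) {
            if (keyword.isEmpty()) {
                continue;
            }
            Node node = root;
            for (char c : keyword.toCharArray()) {
                node = node.next.computeIfAbsent(c, k -> new Node());
            }
            node.outputs.add(keyword);
        }
        buildFailureLinks();
    }

    // Step 2: breadth-first traversal that sets the failure link of each node.
    private void buildFailureLinks() {
        Queue<Node> queue = new ArrayDeque<>();
        root.fail = root;
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node current = queue.remove();
            for (Map.Entry<Character, Node> entry : current.next.entrySet()) {
                char c = entry.getKey();
                Node child = entry.getValue();
                Node f = current.fail;
                while (f != root && !f.next.containsKey(c)) {
                    f = f.fail;
                }
                Node target = f.next.get(c);
                child.fail = (target != null) ? target : root;
                // Inherit keywords that end at the failure target (proper suffixes).
                child.outputs.addAll(child.fail.outputs);
                queue.add(child);
            }
        }
    }

    // Step 3: single pass over the text; returns matches as "start:keyword".
    public List<String> findAll(String text) {
        List<String> matches = new ArrayList<>();
        Node node = root;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            while (node != root && !node.next.containsKey(c)) {
                node = node.fail;
            }
            Node nextNode = node.next.get(c);
            node = (nextNode != null) ? nextNode : root;
            for (String keyword : node.outputs) {
                matches.add((i - keyword.length() + 1) + ":" + keyword);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        AhoCorasick ac = new AhoCorasick(Arrays.asList("he", "she", "his", "hers"));
        System.out.println(ac.findAll("ushers")); // [1:she, 2:he, 2:hers]
    }
}
```

The scan visits each text character once plus the reported matches, independent of how many keywords were inserted. The trade-off is memory: a trie over one million keywords with a `HashMap` per node can be large, so a more compact node representation may be needed in practice.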