
I have thousands of large strings that I need to match against a set of hundreds of smaller phrases and words, to see which of the phrases are contained in each large string.

What is the quickest way of doing this? Should I just use String.indexOf(...) or String.matches(regularExpression), or should I go down to the byte level?

(All matches must be case-insensitive: both "HI" and "hi" must be found in the string "Hi there".)

Any tips?

Edit: by "quickest", I mean in terms of performance.

user85116

3 Answers


A trie (prefix tree) or a radix tree is most likely what you are looking for.
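A minimal sketch of the trie idea in Java, assuming the goal is to report which phrases occur anywhere in a large string. The class `TrieMatcher` and its `findPhrases` method are hypothetical names, not from the answer. Phrases and text are both lowercased with `Locale.ROOT` for case-insensitive matching, and the trie is walked from every start position, so the cost per string is roughly its length times the longest phrase length, instead of one full scan per phrase.

```java
import java.util.*;

public class TrieMatcher {
    // One node per character; 'phrase' is non-null if a phrase ends at this node.
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        String phrase;
    }

    private final Node root = new Node();

    public TrieMatcher(Collection<String> phrases) {
        for (String p : phrases) {
            String key = p.toLowerCase(Locale.ROOT); // case-insensitive keys
            Node node = root;
            for (int i = 0; i < key.length(); i++) {
                node = node.children.computeIfAbsent(key.charAt(i), c -> new Node());
            }
            node.phrase = key;
        }
    }

    /** Returns the (lowercased) phrases that occur anywhere in text. */
    public Set<String> findPhrases(String text) {
        String t = text.toLowerCase(Locale.ROOT);
        Set<String> found = new LinkedHashSet<>();
        for (int start = 0; start < t.length(); start++) {
            Node node = root;
            // Follow the trie as far as the text allows from this start position.
            for (int i = start; i < t.length(); i++) {
                node = node.children.get(t.charAt(i));
                if (node == null) break;
                if (node.phrase != null) found.add(node.phrase);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        TrieMatcher m = new TrieMatcher(List.of("HI", "there", "the", "bye"));
        System.out.println(m.findPhrases("Hi there")); // prints [hi, the, there]
    }
}
```

Build the trie once for the hundreds of phrases and reuse it for all of the large strings; only `findPhrases` runs per string.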

Asgeir

I would consider using Aho-Corasick or a prefix tree for such a task.
This question has already been asked in this post: Java: Matching Phrases in a String
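A compact Aho-Corasick sketch in Java, for illustration only (the class `AhoCorasick` and method `findPhrases` are hypothetical names, not a library API). It builds a trie of the lowercased phrases, then adds breadth-first "failure links" so that each large string is scanned once, left to right, without restarting at every position. Matching is case-insensitive via `Locale.ROOT` lowercasing, which is an assumption about what the question needs.

```java
import java.util.*;

public class AhoCorasick {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        Node fail;
        // Phrases ending here, including those inherited through the failure link.
        final List<String> outputs = new ArrayList<>();
    }

    private final Node root = new Node();

    public AhoCorasick(Collection<String> phrases) {
        // 1. Build a trie of the lowercased phrases.
        for (String p : phrases) {
            String key = p.toLowerCase(Locale.ROOT);
            Node node = root;
            for (char c : key.toCharArray()) {
                node = node.next.computeIfAbsent(c, k -> new Node());
            }
            node.outputs.add(key);
        }
        // 2. Compute failure links breadth-first (shallower nodes are finished first).
        Deque<Node> queue = new ArrayDeque<>();
        root.fail = root;
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node node = queue.poll();
            for (Map.Entry<Character, Node> e : node.next.entrySet()) {
                char c = e.getKey();
                Node child = e.getValue();
                Node f = node.fail;
                while (f != root && !f.next.containsKey(c)) f = f.fail;
                Node target = f.next.get(c);
                child.fail = (target != null) ? target : root;
                child.outputs.addAll(child.fail.outputs);
                queue.add(child);
            }
        }
    }

    /** Scans text once and returns every (lowercased) phrase found in it. */
    public Set<String> findPhrases(String text) {
        Set<String> found = new LinkedHashSet<>();
        Node node = root;
        for (char c : text.toLowerCase(Locale.ROOT).toCharArray()) {
            // On a mismatch, fall back along failure links instead of rescanning.
            while (node != root && !node.next.containsKey(c)) node = node.fail;
            node = node.next.getOrDefault(c, root);
            found.addAll(node.outputs);
        }
        return found;
    }

    public static void main(String[] args) {
        AhoCorasick ac = new AhoCorasick(List.of("he", "she", "his", "hers"));
        System.out.println(ac.findPhrases("ushers")); // prints [she, he, hers]
    }
}
```

Compared with the plain trie walk, the scan cost no longer depends on the longest phrase; the trade-off is the extra build step and a `HashMap` per node, which should be acceptable for a few hundred phrases.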

Community
VirtualTroll

What is the quickest way of doing this? Should I just use String.indexOf(...) or String.matches(regularExpression), or should I go down to the byte level?

Definitely not regex if you want performance. Nor byte level: Java strings are Unicode (UTF-16), so byte processing could be very awkward. String.indexOf() seems reasonable.

(All matches must be case-insensitive: both "HI" and "hi" must be found in the string "Hi there".)

I'd implement that by lowercasing both the text and the search string. (Once you have the offsets, you can get the original matched text from the original string.)
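A minimal sketch of this approach, assuming `Locale.ROOT` lowercasing (the class `IndexOfMatcher` and method `findMatches` are hypothetical names). The text is lowercased once, each lowercased phrase is located with `String.indexOf(String, int)`, and the offsets are used to cut the originally-cased text out of the input. One caveat: `toLowerCase` can change a string's length for a few characters (for example U+0130, LATIN CAPITAL LETTER I WITH DOT ABOVE, becomes two chars), and then offsets in the lowercased copy no longer line up with the original; the sketch only guards against that with a simple total-length check.

```java
import java.util.*;

public class IndexOfMatcher {
    /** Maps each phrase found in text to its occurrences, in their original casing. */
    public static Map<String, List<String>> findMatches(String text, List<String> phrases) {
        String lowerText = text.toLowerCase(Locale.ROOT);
        // Offsets only map back if lowercasing kept the length (simple check only).
        if (lowerText.length() != text.length()) {
            throw new IllegalArgumentException("lowercasing changed the text length");
        }
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String phrase : phrases) {
            String lowerPhrase = phrase.toLowerCase(Locale.ROOT);
            if (lowerPhrase.isEmpty()) continue; // an empty phrase would loop forever
            List<String> hits = new ArrayList<>();
            int idx = lowerText.indexOf(lowerPhrase, 0);
            while (idx >= 0) {
                // Same offsets in the original string give the original casing.
                hits.add(text.substring(idx, idx + lowerPhrase.length()));
                idx = lowerText.indexOf(lowerPhrase, idx + 1);
            }
            if (!hits.isEmpty()) result.put(phrase, hits);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findMatches("Hi there, hi!", List.of("HI", "there", "bye")));
        // prints {HI=[Hi, hi], there=[there]}
    }
}
```

This does one `indexOf` scan of the text per phrase, so with hundreds of phrases per string the trie or Aho-Corasick suggestions in the other answers may scale better; it is, however, the simplest to write and verify.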

Sean Patrick Floyd