
Suppose now we have:

String[] records = {
   "Name:John, State:MA, City:Boston, Degree:Master",
   "Name:Alex, State:CA, City:San Diego, Degree:PhD",
   "Name:Aaron, State:NY, City:NYC, Degree:Master",
   "Name:Lily, State:MA, City:Worcester, Degree:Master",
};

I'd like to find ALL strings that contain both "State:MA" and "Degree:Master"; here that would be lines 1 and 4.

So it looks like a SQL database query, but I need to implement it in Java or Python.
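As a baseline, the query can be written as a plain linear scan. The Python sketch below assumes fields are separated by commas and that a pattern must equal a whole "Key:Value" field; the helper names `parse_record` and `find_matching` are made up for illustration.

```python
def parse_record(line):
    """Split 'Name:John, State:MA' into a set of 'Key:Value' fields."""
    return {field.strip() for field in line.split(",") if field.strip()}


def find_matching(records, required):
    """Return 1-based positions of records that contain every required field."""
    required = set(required)
    return [i for i, line in enumerate(records, start=1)
            if required <= parse_record(line)]


records = [
    "Name:John, State:MA, City:Boston, Degree:Master",
    "Name:Alex, State:CA, City:San Diego, Degree:PhD",
    "Name:Aaron, State:NY, City:NYC, Degree:Master",
    "Name:Lily, State:MA, City:Worcester, Degree:Master",
]

matches = find_matching(records, ["State:MA", "Degree:Master"])
print(matches)  # [1, 4]
```

This visits every string for every query, which is the cost a better data structure would need to beat.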

Also, the input data is expected to be very big, so I'm considering more efficient approaches, such as storing the information in a Trie.

But a Trie is usually meant for prefix questions; say, given a list of strings, we'd like to find all strings that start with the pattern he, so the final list could be like:

he, hell, help, hello....
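To make the prefix case concrete, here is a minimal Trie sketch in Python. The class and method names (`TrieNode`, `Trie`, `with_prefix`) are assumptions chosen for illustration, not from any particular library.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # char -> TrieNode
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def with_prefix(self, prefix):
        """Return all stored words that start with prefix, sorted."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        found = []
        stack = [(node, prefix)]
        # Walk the subtree under the prefix node and collect complete words.
        while stack:
            current, text = stack.pop()
            if current.is_word:
                found.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(found)


trie = Trie(["he", "hell", "help", "hello", "cat", "ham"])
print(trie.with_prefix("he"))  # ['he', 'hell', 'hello', 'help']
```

The lookup only follows characters from the start of each word, which is why it does not directly handle patterns that sit in the middle of a string.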

In my question, however, the two patterns are not contiguous within the string. Still, a Trie can save a lot of space for big input.

So, any ideas on how to solve this kind of multiple-pattern matching using a Trie? Or are there other data structures I don't know about?

Thanks

LookIntoEast
  • A suffix-tree might be more appropriate. Tries can be very memory-inefficient in non-native environments like Java and Python where you can't dictate the precise memory layout of the tree. – Dai Jan 04 '18 at 23:44
  • Looks like a duplicate question: https://stackoverflow.com/questions/954752/search-for-strings-matching-the-pattern-abcxyz-in-less-than-on – Puterdo Borato Jan 09 '18 at 10:56

1 Answer


For inspiration you may look at these classes. It is best to start with the samples. The approach is a kind of hybrid of a trie and an FSA. You'll have to implement the logic for preparing patterns on your own. You'll also have to take care of the order of results when multiple patterns match your string.
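The classes themselves are not reproduced here. As a rough idea of what a trie combined with a finite-state automaton can look like, below is a minimal Aho-Corasick-style sketch in Python. It is a well-known technique swapped in for illustration, not the code referred to above, and the names `build_automaton` and `matches_all` are hypothetical. It reports a string only if every pattern occurs in it as a substring.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links, and per-state outputs."""
    goto = [{}]    # goto[state][char] -> next state
    fail = [0]     # fail[state] -> state to fall back to on a mismatch
    out = [set()]  # out[state] -> indices of patterns ending at this state

    for index, pattern in enumerate(patterns):
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(index)

    # Breadth-first pass: depth-1 states keep fail = 0, deeper ones are derived.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            fallback = fail[state]
            while fallback and ch not in goto[fallback]:
                fallback = fail[fallback]
            fail[nxt] = goto[fallback].get(ch, 0)
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out


def matches_all(text, automaton, pattern_count):
    """Return True if every pattern occurs somewhere in text."""
    goto, fail, out = automaton
    state = 0
    seen = set()
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        seen |= out[state]
        if len(seen) == pattern_count:
            return True  # stop early once all patterns were found
    return len(seen) == pattern_count


records = [
    "Name:John, State:MA, City:Boston, Degree:Master",
    "Name:Alex, State:CA, City:San Diego, Degree:PhD",
    "Name:Aaron, State:NY, City:NYC, Degree:Master",
    "Name:Lily, State:MA, City:Worcester, Degree:Master",
]
patterns = ["State:MA", "Degree:Master"]
automaton = build_automaton(patterns)
hits = [i for i, line in enumerate(records, start=1)
        if matches_all(line, automaton, len(patterns))]
print(hits)  # [1, 4]
```

Each string is scanned once regardless of how many patterns there are. Note that this matches substrings, so "State:MA" would also match a field such as "State:MAX"; delimiters would need to be part of the pattern if that matters.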

Puterdo Borato