
I need to match ordered (not sorted) sets of strings separated by slashes (/). I guess we can call them sentences; each sentence is stored as one big string. I have a set of sentences to match against, and given an input sentence I need to determine whether it matches one of the sentences in my set. The running time should depend on the length of the input sentence rather than on the number of sentences I could match against. It also needs to support wildcards: a wildcard is represented as X in my matching set, and in the input it can be any word, but only one word.

An example:

Set of sentences:

Hello/my/name/is/paul
Hello/I/am/sam
What/time/is/it/X
Sometimes/the/tree/X/is/green/other/times/it/is/yellow
Garfield/is/my/favorite/cat

Input:

Hello/my/name/is/paul

Would return true since it matches.

Input:

What/time/is/it/now

Sometimes/the/tree/definitely/is/green/other/times/it/is/yellow

Both would return true because of the wildcard.

Input:

What/time/is/it/right/now

What/is/your/name

Both would return false. The first comes close but would need two wildcards; the second doesn't match anything.
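A minimal sketch of this kind of matching with a trie whose nodes are whole words rather than letters, assuming Python and that `X` in a stored sentence matches exactly one input word. The names `TrieNode`, `build_trie`, `matches` and `WILDCARD` are illustrative, not from the original post. Each node keeps its children in a dict keyed by word, so following an edge is one average-case hash lookup no matter how many distinct words the set contains.

```python
# Word-level trie: each node maps a whole word to a child node.
# Assumption: "X" in a stored sentence is a wildcard matching exactly one input word.
WILDCARD = "X"


class TrieNode:
    def __init__(self):
        self.children = {}      # word -> TrieNode
        self.terminal = False   # True if a stored sentence ends at this node


def build_trie(sentences):
    root = TrieNode()
    for sentence in sentences:
        node = root
        for word in sentence.split("/"):
            node = node.children.setdefault(word, TrieNode())
        node.terminal = True
    return root


def matches(root, sentence):
    words = sentence.split("/")

    def walk(node, i):
        if i == len(words):
            return node.terminal  # all input words consumed
        # Try the exact word first; if that branch fails, fall back to the wildcard.
        for key in (words[i], WILDCARD):
            child = node.children.get(key)
            if child is not None and walk(child, i + 1):
                return True
        return False

    return walk(root, 0)


patterns = [
    "Hello/my/name/is/paul",
    "Hello/I/am/sam",
    "What/time/is/it/X",
    "Sometimes/the/tree/X/is/green/other/times/it/is/yellow",
    "Garfield/is/my/favorite/cat",
]
trie = build_trie(patterns)
print(matches(trie, "Hello/my/name/is/paul"))      # True
print(matches(trie, "What/time/is/it/now"))        # True
print(matches(trie, "Sometimes/the/tree/definitely/is/green/other/times/it/is/yellow"))  # True
print(matches(trie, "What/time/is/it/right/now"))  # False
print(matches(trie, "What/is/your/name"))          # False
```

At each input word the search follows at most two edges (the exact word and `X`), so the work is bounded by the number of input words and does not grow with the number of stored sentences. With many wildcards the backtracking can still explore up to 2^n paths for n input words in the worst case.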

Thanks for any help!

zulusam
  • It's not so much an algorithm as a data structure that will allow you to achieve the desired time complexity. Put the sentences in a trie: https://en.wikipedia.org/wiki/Trie – m69's been on strike for years Jul 23 '17 at 03:10
  • Ok, this was my original thought. But wouldn't the runtime then depend on the matching set, since that affects how large the trie is? Also, the nodes would not be individual letters like in a normal trie; they would be full words. I thought tries were able to achieve this runtime because each node was limited to a set of 26. – zulusam Jul 23 '17 at 03:13
  • @m69 [this](https://stackoverflow.com/questions/45258067/what-is-the-runtime-of-my-algorithm) would be my solution using a trie. But it seems like the runtime of a search would depend on the size of the trie itself, not necessarily the input. – zulusam Jul 23 '17 at 03:22
  • If you have a relatively small number of "sentences", you might use a [regex with alternations](https://regex101.com/r/AmrdNw/1). However, this won't be an efficient approach if you do not optimize the alternative branches (the first two both start with `Hello`, which is bad; each alternative should match at different positions in a string). – Wiktor Stribiżew Jul 23 '17 at 08:44
  • @m69 So even if we don't count building up the trie, the lookup will still be dependent on the size of the trie itself, and not the input? I know previously you were thinking it would depend on the input. – zulusam Jul 23 '17 at 13:40
  • @WiktorStribiżew I don't want to use regular expressions for this, because then my lookup runtime becomes dependent on the number of sentences in my matching set. It needs to depend on the input sentence size. Any idea how to do this with an n-gram? – zulusam Jul 23 '17 at 13:41
  • Indeed. However, there's a difference between theoretical worst case and actual average case. Is this an exercise or a real-world problem? You could code the trie and see how it performs with real input. – m69's been on strike for years Jul 23 '17 at 17:32
  • @m69 It's an exercise. I need to find a lookup algorithm whose running time depends on the number of words in the input sentence rather than the number of sentences in the matching set, while also handling the wildcards. I've been banging my head against it for quite some time now. Someone suggested I use n-grams, but it's not entirely clear how I'd do that. – zulusam Jul 23 '17 at 18:08
  • Maybe the trie is good enough? Whether a trie with 26 letters is more efficient than a trie with a (limited?) set of words is a somewhat academic discussion anyway. – m69's been on strike for years Jul 23 '17 at 18:19
  • @m69 I'm thinking you're right. After the trie is built I think each lookup's runtime should depend on the depth it has to search. Though I guess this is more of a directory tree, not exactly a trie. Appreciate the help! – zulusam Jul 23 '17 at 19:23
  • @m69 I got so close with the trie implementation!! I'm having one small issue; maybe you'd be able to take a look? https://stackoverflow.com/questions/45271346/trie-implementation-with-wildcard-values – zulusam Jul 24 '17 at 01:13
  • Once you find that "member/friends" doesn't lead to a solution, you'll have to backtrack to "member" and then try "member/X". Try searching for "parsing" and "backtracking". (Or you could use recursion, but that's probably going to be more complicated.) – m69's been on strike for years Jul 24 '17 at 01:47
  • @m69 Yes! Once again, thanks so much. I got it working by keeping track of the last point where it saw two possible paths; if it hit a dead end on one, it went back to that stored point and tried the other path. The new algorithm was added to my question if you want to see it. Thank you!! – zulusam Jul 24 '17 at 03:01
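The backtracking described in these comments can be written with an explicit stack of pending branch points instead of recursion. This is an illustrative sketch under assumptions, not the code from the linked question: Python, nested dicts as trie nodes, `None` as an end-of-sentence marker, and hypothetical sentences modeled on the "member/friends" vs "member/X" case. Because the stack holds every untried branch, a dead end several levels below the most recent fork still unwinds to the next alternative.

```python
# Iterative backtracking over a word-level trie using an explicit stack.
# Assumptions: nodes are nested dicts (word -> child); the key None marks
# the end of a stored sentence; "X" matches exactly one input word.
WILDCARD = "X"


def build_trie(sentences):
    root = {}
    for sentence in sentences:
        node = root
        for word in sentence.split("/"):
            node = node.setdefault(word, {})
        node[None] = True  # a stored sentence ends here
    return root


def matches(root, sentence):
    words = sentence.split("/")
    stack = [(root, 0)]  # pending (node, next word index) branch points
    while stack:
        node, i = stack.pop()
        if i == len(words):
            if None in node:
                return True
            continue  # input used up but no sentence ends here: backtrack
        # Push the wildcard first so the exact word is popped (tried) first.
        for key in (WILDCARD, words[i]):
            child = node.get(key)
            if child is not None:
                stack.append((child, i + 1))
    return False


# Hypothetical sentences modeled on the "member/friends" vs "member/X" case.
trie = build_trie(["team/member/friends/list", "team/member/X/profile"])
print(matches(trie, "team/member/friends/profile"))  # True (via member/X after backtracking)
print(matches(trie, "team/member/friends/list"))     # True
print(matches(trie, "team/member/bob/list"))         # False
```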

0 Answers