0

If "Who acted as (?P<role>.*) in (?P<movie>.*)" is the template I want to match for queries like "Who acted as tony montana in Scarface".

If the role name has a "in" here or If the movie name has an "in", the regex match will go wrong.

Eg: "Who acted as k in men in black" will give "k in men" as role.

May be a non greedy approach will work for this query but it will go for a toss if the movie contains the word "in". How do I get all possible interpretations here?

Avinash Raj
  • 172,303
  • 28
  • 230
  • 274
nizam.sp
  • 4,002
  • 5
  • 39
  • 63
  • 1
    `Who acted as (?P.*?) in (?P.*)` works for your input. – Avinash Raj Nov 25 '14 at 11:48
  • 1
    This isn't a regex-writing service; have a go at modifying it yourself: http://regex101.com/r/uD0yR9/1 – jonrsharpe Nov 25 '14 at 11:48
  • 4
    Well, formally, `Who acted as man in elevator in women in red` has three possible interpretations. I don't think this is solvable using only regexes. – georg Nov 25 '14 at 11:52
  • @georg: Thank you for getting my problem statement correctly. Is there any clever implementation in python that will backpropagate and find all interpretations? – nizam.sp Nov 25 '14 at 12:13

1 Answers1

0

Given a phrase like 'a in b in c in d' this will generate all possible partitions by the word in:

words = phrase.split()

for n, w in enumerate(words):
    if w == 'in':
        print '(%s) in (%s) ' % (
            ' '.join(words[:n]),
            ' '.join(words[n+1:]))

For your specific problem, if there are three ins in the phrase, the "middle" interpretation ((a in b) in (c in d)) would be most probably correct, but with two ins there's no way to solve this by the means of text manipulations, because "left" and "right" partitions are equally probable, consider:

Who acted as jeebs in men in black
Who acted as woman in red in matrix

You'll have to use NLP or database-driven methods to parse this correctly.

georg
  • 211,518
  • 52
  • 313
  • 390