I am trying to match multiple phrases between known words.
Essentially, I want to parse out what the user filled in for the bracketed placeholders in: Get information for [name] for [duration] for [location]. I want to obtain what the user entered for name, duration and location. The trailing parts are optional: the user may enter only the name, or the name and duration without the location. I just want to parse out whatever they entered, if anything.
So, suppose the user enters one of these statements:
- Get information for John -> I want to parse out John.
- Get information for John Doe for last 6 months -> I want to parse out John Doe, last 6 months.
- Get information for John Doe for last 6 months for Earth -> I want to parse out John Doe, last 6 months, Earth.
My best attempt so far is:
Get information for (.+?(?=for|$))(?:for)?(.+?(?=for|$))?(?:for)?(.*)
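For comparison, here is a variant I have been sketching with nested optional groups instead of lookaheads. It assumes Python's `re` flavor and exactly one space around each "for"; the names `PATTERN` and `parse` are just placeholders for illustration, and this is not meant as a definitive answer.

```python
import re

# Each later " for <phrase>" part is optional and nested, so the location
# can only appear if the duration did. Lazy groups stop at the first " for ".
PATTERN = re.compile(r"^Get information for (.+?)(?: for (.+?)(?: for (.+?))?)?$")

def parse(statement):
    """Return the phrases the user filled in, in order, skipping missing ones."""
    match = PATTERN.match(statement)
    if match is None:
        return []
    return [group for group in match.groups() if group is not None]

print(parse("Get information for John"))
# ['John']
print(parse("Get information for John Doe for last 6 months"))
# ['John Doe', 'last 6 months']
print(parse("Get information for John Doe for last 6 months for Earth"))
# ['John Doe', 'last 6 months', 'Earth']
```

Requiring spaces around "for" keeps the "for" inside "information" from being treated as a separator, and the captured phrases come back without leading or trailing whitespace.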
EDIT: The separating word is not necessarily "for".
For example, consider the phrase:
Get information about [name] for [location] in [duration].
and suppose the user enters:
Get information about John Doe for Earth in last month.
Now if I still split on "for", the code won't work.
So I need a generic solution.
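To illustrate the problem with splitting on "for" for this second phrase, a small Python sketch (Python's `re` module is my assumption here; any regex flavor should behave similarly):

```python
import re

statement = "Get information about John Doe for Earth in last month."

# Splitting only on "for" (surrounded by whitespace) yields two pieces.
parts = re.split(r"\s+for\s+", statement)
print(parts)
# ['Get information about John Doe', 'Earth in last month.']
```

The name stays glued to "Get information about", and the location and duration end up in one piece because "in" is never treated as a separator.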
EDIT 2:
In a generic sense, the question is: if unknown and known phrases are interleaved, how do I parse out the unknown phrases? For example:
known phrase 1 unknown phrase 1 known phrase 2 unknown phrase 2 and so on.
To make matters worse, sometimes the known phrases are the same (in my example, the known phrase is "for"). Hence I can't simply capture the unknown string between two known strings with something like knownPhrase1(.*)knownPhrase2.
How do I determine the unknown phrases?
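One direction I am considering, as a rough Python sketch rather than a definitive solution: build the regex from the ordered list of known phrases, with every (known phrase, unknown phrase) pair after the first made optional and nested. The helper names `build_pattern` and `extract` are hypothetical, and I am assuming that known phrases are separated from unknown ones by whitespace and that a single trailing period may be ignored.

```python
import re

def build_pattern(known_phrases):
    """Build a regex for known phrases interleaved with unknown phrases.

    The first known phrase and the unknown phrase after it are required.
    Each later pair is optional and nested inside the previous one, so a
    pair can only match if the pair before it matched.
    """
    first, *rest = known_phrases
    tail = ""
    # Build from the innermost (last) optional pair outwards.
    for phrase in reversed(rest):
        tail = rf"(?:\s+{re.escape(phrase)}\s+(.+?){tail})?"
    # Assumption: one optional trailing period is not part of the last phrase.
    return re.compile(rf"^\s*{re.escape(first)}\s+(.+?){tail}\s*\.?\s*$")

def extract(known_phrases, statement):
    """Return the unknown phrases found in the statement, in order."""
    match = build_pattern(known_phrases).match(statement)
    if match is None:
        return []
    return [group for group in match.groups() if group is not None]

print(extract(["Get information for", "for", "for"],
              "Get information for John Doe for last 6 months for Earth"))
# ['John Doe', 'last 6 months', 'Earth']
print(extract(["Get information about", "for", "in"],
              "Get information about John Doe for Earth in last month."))
# ['John Doe', 'Earth', 'last month']
```

Because the groups are lazy, each unknown phrase ends at the first occurrence of the next known phrase. That means an unknown phrase that itself contains the next known word surrounded by spaces would be cut short, and I don't see how any regex could resolve that ambiguity without extra information.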