
I am trying to match multiple phrases between known words.

Essentially, I want to parse out what the user filled in for the bracketed placeholders in: Get information for [name] for [duration] for [location]. I want to obtain what the user entered for name, duration and location. It is okay if they enter only the name, or the name and duration but not the location. I just want to parse out whatever they entered, if anything.

So, suppose the user enters one of the following statements:

  1. Get information for John -> I want to parse out John.
  2. Get information for John Doe for last 6 months -> I want to parse out John Doe, last 6 months.
  3. Get information for John Doe for last 6 months for Earth -> I want to parse out John Doe, last 6 months, Earth.

My best attempt so far is:

Get information for (.+?(?=for|$))(?:for)?(.+?(?=for|$))?(?:for)?(.*)

EDIT: it is not necessary that "for" is the differentiating word.

For example, consider the phrase:

Get information about [name] for [location] in [duration].

and suppose the user enters:

Get information about John Doe for Earth in last month.

Now if I still split on "for", the code won't work.

So I need a generic solution.

EDIT 2:

In a generic sense, the question is: if unknown and known phrases are interleaved, how do I parse out the unknown phrases? For example:

known phrase 1 unknown phrase 1 known phrase 2 unknown phrase 2 and so on.

To make matters worse, sometimes the known phrases are the same (in my example, the known phrase is "for"). Hence I can't simply get the unknown string between two known strings using something like knownPhrase1(.*)knownPhrase2.

How do I determine the unknown phrases?

  • With your asterisks and commas it's hard to tell what you literally want here. It would help if you put the input string and output strings inside backticks. Did you mean "Given `John Doe for last 6 months`, I want to parse out `John Doe`, and `last 6 months`." – Faust Sep 06 '22 at 17:29
  • @Faust I updated my question to better express what I want. You are right. If the user enters "Get Information on John Doe for last 6 months", I want to parse out "John Doe" and "last 6 months". – everCurious1 Sep 06 '22 at 17:40
  • @anubhava because in this example the user entered phrase is between "for". However, it doesn't have to be. It is just an example. – everCurious1 Sep 06 '22 at 17:41
  • *"is just an example"*: but you have it hard-coded in your own regex? If it is not that, please update your question and explain what the possible patterns of input are and how they should be split. – trincot Sep 06 '22 at 17:46
  • @trincot updated the question. – everCurious1 Sep 06 '22 at 18:15

1 Answer


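You can split on the separator words with split(/\s(?:for|about|in)\s/) and then use slice(1) to keep every piece except the first one, "Get information".

const parsedData = (str) => {
  // Group the alternatives so \s applies to both sides of every keyword;
  // /\sfor|about|in\s/ would match a bare "about" or "in " anywhere.
  return str.split(/\s(?:for|about|in)\s/).slice(1).join(', ');
};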

console.log(parsedData('Get information for John'))
console.log(parsedData('Get information for John Doe for last 6 months'))
console.log(parsedData('Get information for John Doe for last 6 months for Earth'))
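
As an alternative sketch, not part of the original answer: when the known phrases always appear in a fixed order, as in the question's templates, a pattern can be built from that ordered list, with each later phrase nested in an optional group so that trailing parts may be left out. The helper names `escapeRegExp` and `buildParser` are hypothetical, and the sketch assumes known phrases and values are separated by whitespace.

```javascript
// Escape regex metacharacters so a known phrase is matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical helper: build a parser from an ordered list of known phrases.
// Each unknown phrase is captured lazily; every phrase after the first sits
// in a nested optional group, so later parts can be omitted.
const buildParser = (knownPhrases) => {
  const [first, ...rest] = knownPhrases.map(escapeRegExp);
  // Build the optional tail from the last phrase outwards.
  const tail = rest.reduceRight(
    (inner, phrase) => `(?:\\s+${phrase}\\s+(.+?)${inner})?`,
    ''
  );
  const pattern = new RegExp(`^${first}\\s+(.+?)${tail}$`);
  return (str) => {
    const match = str.match(pattern);
    // Drop the full match and any groups that did not participate.
    return match ? match.slice(1).filter((g) => g !== undefined) : [];
  };
};

const parseFor = buildParser(['Get information for', 'for', 'for']);
console.log(parseFor('Get information for John'));
console.log(parseFor('Get information for John Doe for last 6 months'));

const parseAbout = buildParser(['Get information about', 'for', 'in']);
console.log(parseAbout('Get information about John Doe for Earth in last month'));
```

Because the phrases are matched in order, a later keyword that happens to occur inside an earlier value (for example "in" inside a name that precedes "for") is not treated as a separator. A value that contains the very next known phrase is still ambiguous, and the lazy groups then split at its earliest occurrence.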
Mina
  • Hi @Mina, thank you for your answer. Please see my update on the question. It is not necessary that "for" is the keyword being split on – everCurious1 Sep 06 '22 at 18:19
  • You need to know the separator keywords. I assumed that the separator keywords are `for`, `about`, or `in`. I updated the answer; please check. – Mina Sep 06 '22 at 18:27
  • I am not a big fan of this solution. It doesn't account for the separator itself being present in the input etc. – everCurious1 Sep 06 '22 at 18:41
  • So, why are you not a big fan of this solution? – Mina Sep 06 '22 at 18:44
  • it will break in case the separating word is present in the user input, for one. – everCurious1 Sep 06 '22 at 18:54
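
The failure mode raised in the last comment can be reproduced directly. This is a minimal sketch, assuming the separator words `for`, `about`, and `in` from the answer; the function name `splitOnKeywords` and the sample input are illustrative only.

```javascript
// Hypothetical helper mirroring the split-based approach: break the
// sentence on the separator words and drop the leading "Get information".
const splitOnKeywords = (str) =>
  str.split(/\s(?:for|about|in)\s/).slice(1);

// The intended name is "Waiting for Godot", but the "for" inside it is
// treated as a separator, so the name comes back in two pieces.
console.log(splitOnKeywords('Get information for Waiting for Godot for last 6 months'));
// [ 'Waiting', 'Godot', 'last 6 months' ]
```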