I would like to split this text. I am trying to do it with a JavaScript regular expression.

(1) Really not. (2) Uh huh. (3) Behold Prince (4) are key in his natural element, cowering at the mercy of the women in his life. (5) See me perhaps you'd like to spout with my daughters and teach them some combination. (6) No doubt you are the best teacher, your majesty. (7) It is my daughter's that teach me in the languages of the modern world, for instance.

I would like to parse it into groups of fragments. I am looking for either of these results:

[
  [1, "Really not."],
  [2, "Uh huh."],
  [3, "Behold Prince"],
]


[
  {id: 1, text: "Really not."},
  {id: 2, text: "Uh huh."},
  {id: 3, text: "Behold Prince"},
]

I am using this pattern:

/\(([0-9])\){1,3}(.+?)\(/g

Could you help me, please? What pattern should I use to split the text properly?

Thank you in advance!

  • You can use `array.map` – Vivek Bani Jun 02 '21 at 12:09
  • Why is the result supposed to contain only the first three of those “elements”, what about the rest of your input string? – CBroe Jun 02 '21 at 12:22
  • Just as a side note: all purely *regex* and `matchAll` based approaches that rely on an opening paren to detect the end of a text fragment will fail as soon as an opening paren occurs as part of the text content (it is an allowed character). A simpler `split` / `reduce` based approach is more reliable for covering such an edge case. – Peter Seliger Jun 02 '21 at 13:46
  • You should select an answer if it solved your problem, by the way. – Jun 15 '21 at 15:40
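A minimal sketch of the edge case raised in the comment above, using a shortened input in which fragment 2 contains a parenthesized word (the "Uh (huh)." wording that also appears later in this thread). The regex is the lookahead pattern used in the answers below; the shortened input is only for illustration.

```javascript
// Shortened sample in which fragment 2 contains an opening paren.
const input = "(1) Really not. (2) Uh (huh). (3) Behold Prince";

// Lookahead-based pattern: a fragment ends before the next "(" or at the end of the string.
const lookaheadRegex = /\(([0-9]+)\)\s*(.*?)\s*(?=$|\()/g;

const fragments = [...input.matchAll(lookaheadRegex)].map(m => [Number(m[1]), m[2]]);

// Fragment 2 is cut off at "(huh)", so its text comes back as just "Uh".
console.log(fragments);
```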

3 Answers

You can use a regex together with the String.prototype.matchAll function in JavaScript to do what you want:

const str = `(1) Really not. (2) Uh huh. (3) Behold Prince (4) are key in his natural element, cowering at the mercy of the women in his life. (5) See me perhaps you'd like to spout with my daughters and teach them some combination. (6) No doubt you are the best teacher, your majesty. (7) It is my daughter's that teach me in the languages of the modern world, for instance.`;

let array = [...str.matchAll(/\(([0-9]+)\)\s*(.*?)\s*(?=$|\()/g)].map(a=>[+a[1],a[2]])

console.log(array)

I updated my answer to use The fourth bird's regex because it is a lot cleaner than the regex I wrote.
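The question also lists an object-shaped result. As a small sketch (run on a shortened copy of the sample string, for illustration only), the [id, text] pairs produced above can be reshaped with one more map call and array destructuring:

```javascript
// Shortened copy of the sample text, for illustration only.
const str = "(1) Really not. (2) Uh huh. (3) Behold Prince";

// Same extraction as above: [id, text] pairs with numeric ids.
const array = [...str.matchAll(/\(([0-9]+)\)\s*(.*?)\s*(?=$|\()/g)].map(a => [+a[1], a[2]]);

// Destructure each pair into the { id, text } shape from the question.
const objects = array.map(([id, text]) => ({ id, text }));

console.log(objects);
```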

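Instead of matching the (, you can assert that either a ( or the end of the string follows, using a positive lookahead. Matching the ( consumes it, so the next (n) marker can no longer start a match.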
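To see what consuming the ( costs, here is a minimal sketch that runs the question's original pattern over a shortened copy of the sample text (the shortening is only for illustration). Every match swallows the ( of the following marker, so every second fragment is skipped:

```javascript
// The pattern from the question, unchanged.
const questionRegex = /\(([0-9])\){1,3}(.+?)\(/g;
const sample = "(1) Really not. (2) Uh huh. (3) Behold Prince (4) are key.";

// Each match ends by consuming the "(" of the next marker, so that marker
// can no longer start a match of its own.
const ids = [...sample.matchAll(questionRegex)].map(m => m[1]);

console.log(ids); // [ '1', '3' ]
```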

In your pattern, the part \){1,3} repeats the closing parenthesis 1-3 times; it does not repeat the digit, so ([0-9]) still matches only a single digit.
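A quick sketch comparing where the {1,3} quantifier sits. The test strings are made up for illustration:

```javascript
// Pattern from the question: the quantifier applies to "\)", not to the digit.
const quantifiedParen = /\(([0-9])\){1,3}/;
// Quantifier moved onto the digit class instead.
const quantifiedDigits = /\(([0-9]{1,3})\)/;

console.log(quantifiedParen.test("(12)"));           // false: only one digit is allowed
console.log(quantifiedParen.exec("(7)))")[0]);       // (7))) : up to three ")" are accepted
console.log(quantifiedDigits.exec("(123) text")[1]); // 123
```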

If you want to match one or more digits (use [0-9]{1,3} instead of [0-9]+ to allow only 1-3 digits):

\(([0-9]+)\)\s*(.*?)\s*(?=$|\()
  • \( Match (
  • ([0-9]+) Capture 1+ digits in group 1 (denoted by m[1] in the code)
  • \) Match )
  • \s* Match optional whitespace chars
  • (.*?) Capture as few chars as possible in group 2 (denoted by m[2] in the code)
  • \s* Match optional whitespace chars
  • (?=$|\() Assert either the end of string or ( to the right


const regex = /\(([0-9]+)\)\s*(.*?)\s*(?=$|\()/g;
const str = `(1) Really not. (2) Uh huh. (3) Behold Prince (4) are key in his natural element, cowering at the mercy of the women in his life. (5) See me perhaps you'd like to spout with my daughters and teach them some combination. (6) No doubt you are the best teacher, your majesty. (7) It is my daughter's that teach me in the languages of the modern world, for instance.`;
// Number() turns the captured id string into a number, as in the desired output.
console.log(Array.from(str.matchAll(regex), m => [Number(m[1]), m[2]]));
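String.prototype.matchAll was added in ES2020. Where it is not available, a sketch of the same extraction with a classic RegExp.prototype.exec loop could look like this (shortened sample text, for illustration only):

```javascript
const regex = /\(([0-9]+)\)\s*(.*?)\s*(?=$|\()/g;
const str = "(1) Really not. (2) Uh huh. (3) Behold Prince";

const pairs = [];
let m;
// With the g flag, exec resumes from regex.lastIndex after each call
// and returns null once there are no more matches. The pattern always
// consumes at least "(n)", so the loop cannot get stuck on an empty match.
while ((m = regex.exec(str)) !== null) {
  pairs.push([Number(m[1]), m[2]]);
}

console.log(pairs);
```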
The fourth bird

An approach based on matchAll and a RegExp that uses named capture groups and a positive lookahead: /\((?<id>\d+)\)\s*(?<text>.*?)\s*(?=$|\()/g

// see ... [https://regex101.com/r/r39BoJ/1]
const regX = (/\((?<id>\d+)\)\s*(?<text>.*?)\s*(?=$|\()/g);

const text = "(1) Really not. (2) Uh huh. (3) Behold Prince (4) are key in his natural element, cowering at the mercy of the women in his life. (5) See me perhaps you'd like to spout with my daughters and teach them some combination. (6) No doubt you are the best teacher, your majesty. (7) It is my daughter's that teach me in the languages of the modern world, for instance."

console.log([
  ...text.matchAll(regX)
  ].map(
    ({groups: { id, text }}) => ({ id: Number(id), text })
  )
);

Note

The above approach does not cover the occurrence (allowed existence) of an opening paren, (, within a text fragment. Thus, in order to always be on the safe side, the OP should consider a split / reduce based approach:

const text = "  (1) Really not. (2) Uh (huh). (3) Behold Prince (4) are key in his natural element, cowering at the mercy of the women in his life. (5) See me perhaps you'd like to spout with my daughters and teach them some combination. (6) No doubt you are the best teacher, your majesty. (7) It is my daughter's that teach me in the languages of the modern world, (for instance).  "

console.log(
  text
    .split(/\s*\((\d+)\)\s*/)
    .slice(1)
    .reduce((list, item, idx) => {
      if (idx % 2 === 0) {
        list.push({ id: Number(item) });
      } else {
        // list.at(-1).text = item;
        list[list.length - 1].text = item.trim();
      }
      return list;
    }, [])
);

// test / check ...
console.log(
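  // String.raw keeps the backslashes in the label; a plain string literal would drop them.
  String.raw`text.split(/\s*\((\d+)\)\s*/) ...`,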
  text.split(/\s*\((\d+)\)\s*/)
);
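If the split based approach is needed in more than one place, it could be packaged as a small function. The name parseFragments is hypothetical, and this sketch pairs ids and texts with a plain loop instead of reduce; the behavior is meant to match the code above:

```javascript
// Hypothetical helper: the same split-on-"(n)" idea as above, packaged as a function.
function parseFragments(input) {
  // The capture group makes split keep each id, so the result alternates:
  // [textBeforeFirstMarker, id, text, id, text, ...]
  const parts = input.split(/\s*\((\d+)\)\s*/);
  const fragments = [];
  // Start at index 1 to skip whatever precedes the first "(n)" marker.
  for (let i = 1; i < parts.length; i += 2) {
    fragments.push({ id: Number(parts[i]), text: parts[i + 1].trim() });
  }
  return fragments;
}

console.log(parseFragments("(1) Really not. (2) Uh (huh). (3) Behold Prince"));
```

With one capture group in the separator, split always returns an odd number of elements, so parts[i + 1] exists for every id index visited by the loop.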
Peter Seliger