I have a set of InDesign documents with records in the following format:

{item_id}. {item_text} [{tags}] (options)  
{item_id}. {item_text} [{tags}] (options)  
{item_id}. {item_text} [{tags}] (options)

where item_id is an integer id, item_text is a multi-line text block, and tags is a single-line text block. Tags are optional, i.e. a record may or may not have them.

To select one record (id, text, tags and options) and then its parts, I am trying the following regexes:

item = '(([0-9])+\\.\\s+)(\\s|.|\\r)*?(?=[0-9]+\\.\\s)'  
item_text = '[0-9]+\\.\\s+((.|\\r|\\s)*)*?(?=\\[(.)*\\])'  
tags = '\\[((.)*)\\]' 

Here, group 1 of the item_text and tags regexes holds the required data.

With this I get the first n-1 records correctly, but the last record is not selected: no id block follows it, so this part of the item regex can never match there: (?=[0-9]+\.\s)

Can someone suggest a better regex that captures all such records, including the last one? [We are using these regexes in ExtendScript for InDesign scripting, so positive and negative lookbehinds and lookaheads are available in the application.]

Harshit Laddha
  • Can you use a regex like `'([0-9]+\\.)\\s+((?:.|\\r)*?)\\s+(\\[.*?\\])\\s+(.*)'` to match item_id, item_text, tags and options as four groups and then assign the groups to variables? – Johannes Riecken Jul 15 '17 at 07:16
  • Just replace `(?=[0-9]+\\.\\s)` in the first pattern with `(?=[0-9]+\\.\\s|$)`. To make the patterns efficient, replace `(\\s|.|\\r)*?` with `[\\s\\S]*?` and replace `(.)*` with `(.*)` – Wiktor Stribiżew Jul 15 '17 at 15:38
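The two comments above can be combined into a rough sketch: split the text into records with a lookahead that also accepts the end of the string, then pull the fields out of each record with the tags group made optional. This is plain ES3-style JavaScript and has not been checked inside InDesign. The sample text, the function names `splitRecords`, `parseRecord` and `parseAll`, and the assumption that options appear as a single-line parenthesised block at the end of a record are all hypothetical and not taken from the question.

```javascript
// Hypothetical sample text in the question's record format.
var sample =
    "1. First item\nspans two lines [tagA, tagB] (opt1)\n" +
    "2. Second item without tags (opt2)\n" +
    "3. Last item [tagC] (opt3)";

// Split the text into records. The lookahead accepts either the next
// "id. " marker or the end of the string, so the last record matches too.
function splitRecords(text) {
    var itemRe = /[0-9]+\.\s+[\s\S]*?(?=[0-9]+\.\s|$)/g;
    return text.match(itemRe) || [];
}

// Break one record into fields. Assumes (not confirmed by the question)
// that options are a single-line "(...)" block at the end of the record.
// Both the tags group and the options group are optional.
function parseRecord(record) {
    var fieldRe = /^([0-9]+)\.\s+([\s\S]*?)\s*(?:\[(.*?)\])?\s*(\(.*\))?\s*$/;
    var m = fieldRe.exec(record);
    if (!m) {
        return null;
    }
    return {
        id: parseInt(m[1], 10),
        text: m[2],
        tags: m[3] || "",      // empty string when the record has no tags
        options: m[4] || ""    // empty string when the record has no options
    };
}

// Parse every record found in the text, skipping any that do not match.
function parseAll(text) {
    var records = splitRecords(text);
    var result = [];
    for (var i = 0; i < records.length; i++) {
        var parsed = parseRecord(records[i]);
        if (parsed) {
            result.push(parsed);
        }
    }
    return result;
}
```

One caveat with this approach: any digits followed by a period and whitespace inside item_text (for example "see step 2. Then") would be treated as the start of a new record, so real documents may need a stricter marker, such as requiring the id to start a line.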
