
As part of a larger task, I have been looking for a usable InDesign GREP expression to help automate the conversion of serial commas from the non-Oxford style to the Oxford style (and vice versa).

To simplify this question for the SE community, I will limit the challenge to using GREP to FIND non-Oxford style commas.

I have used two expressions, both of which draw in too much content:

  1. from my own development:
    ,(.*?),(.*?) and (.*?)
    

Result:

, and the possibility of detecting signs of life, biomarkers, occupies the thoughts of many researchers. Such measurements may be possible with the next generation of large aperture optical telescopes. Looking back to the Big Bang, we are now on the verge of measuring the impact of gravitational waves generated by quantum effects during the Inflationary epoch, an era when the Universe expanded at unprecedented rates. Aside from generating important new knowledge, modern astronomy both utilizes and

  2. from an InDesign user forum:

    (?<=\w,)(.+)(.*?)(?= and)

Result:

and the possibility of detecting signs of life, biomarkers, occupies the thoughts of many researchers. Such measurements may be possible with the next generation of large aperture optical telescopes. Looking back to the Big Bang, we are now on the verge of measuring the impact of gravitational waves generated by quantum effects during the Inflationary epoch, an era when the Universe expanded at unprecedented rates. Aside from generating important new knowledge, modern astronomy both utilizes

As you can see, neither result picks up a serial comma, and both are too broad to be effective.

I've struggled with this, but cannot find a reasonable solution on the Web. I thought I would ask the great minds of the SE community, including coders of regex and users of other GREP tools.

I thank you in advance for your time.

Parapluie
  • What about `,\s*\w+(?:\s+\w+)*,\s*\w+(?:\s+\w+)* and\b`? Have you got any specifications? Test cases? – Wiktor Stribiżew Oct 10 '16 at 19:42
  • I don't spot the Oxford comma(s) in your sample phrases. Can you point them out? – Jongware Oct 10 '16 at 20:45
  • @WiktorStribiżew Wiktor, I think your code narrows the scope down the best, so far; but it's still picking up phrases and other such. I suspect that English is just too complex to allow automation to fix everything. Still, your having narrowed the text down will help with my problem. Thanking you. – Parapluie Oct 11 '16 at 13:55
  • @RadLexus Rad, the sample is a serial list with no Oxford comma, to which the comma _would_ be added before the final "and". – Parapluie Oct 11 '16 at 13:55

1 Answer

I think the problem is too broad, and finding a solution that works 100% of the time is next to impossible. To get rid of the most evident false positives, you may use the following pattern:

,\s*\w+(?:\s+\w+)*,\s*\w+(?:\s+\w+)* and\b

Or, replace \w with \p{L} to only match letters:

,\s*\p{L}+(?:\s+\p{L}+)*,\s*\p{L}+(?:\s+\p{L}+)* and\b

See the regex demo.

Details:

  • , - a comma
  • \s* - 0+ whitespaces
  • \p{L}+ - 1+ letters
  • (?:\s+\p{L}+)* - zero or more sequences of 1+ whitespaces and 1+ letters
  • ,\s*\p{L}+ - the same as above
  • (?:\s+\p{L}+)* - the same as above
  • (space)and\b - a single space followed by and as a whole word (\b is a word boundary).
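
To try the pattern outside InDesign, here is a minimal Python sketch. It uses the \w variant, since Python's built-in re module does not support \p{L}. The function name `find_candidates` and the sample sentences are my own illustrations, not part of the original question.

```python
import re

# The \w-based pattern from the answer, unchanged.
SERIAL_NO_OXFORD = re.compile(r",\s*\w+(?:\s+\w+)*,\s*\w+(?:\s+\w+)* and\b")

def find_candidates(text):
    """Return every span that looks like a serial list lacking an Oxford comma."""
    return [m.group(0) for m in SERIAL_NO_OXFORD.finditer(text)]

# Hypothetical sample sentences, for illustration only.
print(find_candidates("We bought apples, pears, plums and cherries."))
# → [', pears, plums and']
print(find_candidates("We bought apples, pears, plums, and cherries."))
# → []
print(find_candidates("In 2016, after the launch, the team rested and recovered."))
# → [', after the launch, the team rested and']
```

The last call shows the kind of false positive that remains: two commas followed by "and" match whether or not the text is really a list, so each hit still needs a human look.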

This can be further enhanced to fit more specific contexts.
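
For the conversion itself, one possible approach is to wrap the two halves of the pattern in capture groups and put a comma between them in the replacement. The sketch below does this in Python; the name `add_oxford_comma` and the sample sentence are hypothetical. I believe InDesign's "Change to" field would write the same replacement as `$1,$2`, but treat that as an assumption to verify in your version.

```python
import re

# Same \w-based pattern, split into two groups:
#   group 1 = everything up to the last list item, group 2 = " and"
ADD_OXFORD = re.compile(r"(,\s*\w+(?:\s+\w+)*,\s*\w+(?:\s+\w+)*)( and\b)")

def add_oxford_comma(text):
    """Insert a comma before the final ' and' of each matched serial list."""
    return ADD_OXFORD.sub(r"\1,\2", text)

# Hypothetical sample sentence, for illustration only.
print(add_oxford_comma("We bought apples, pears, plums and cherries."))
# → We bought apples, pears, plums, and cherries.
```

Because the pattern still matches some non-list phrases, a blind "Change All" is risky; stepping through the matches one at a time is the safer route.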

Wiktor Stribiżew