
Test link: regexr.com/42d9c

This has been driving me crazy.

I want to extract barcodes in the lines below:

Ceres Juice Apricot 12 x 1lt unit: 6001240102022
Ceres Juice Cranberry& Kiwi Juice 12 x 1lt... unit:
Ceres Juice Guava 12 x 1lt.. unit:6001240222829
Ceres Juice Orange 12x1lt... unit:
Ceres Juice Medley of Fruits 1L x 12 unit: 6001240100660

It should return:

6001240102022

6001240222829

6001240100660

I currently use .*(\d{13}).*

I then use $1 as the replacement so that only the first capture group is returned.

But my results look like this:

6001240102022
Ceres Juice Cranberry& Kiwi Juice 12 x 1lt... unit:
6001240222829
Ceres Juice Orange 12x1lt... unit:
6001240100660

Cause:

The cause of this problem is that 'Replace' returns the original string if there is nothing in the match group ($1).
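The behaviour can be reproduced outside a regex website. This is a minimal sketch in Python, chosen only for illustration since the question is language-agnostic. Python writes the replacement as \1 rather than $1, and the variable names are hypothetical. A line on which the pattern finds no match is returned unchanged:

```python
import re

# Two of the sample lines: one with a barcode, one without.
lines = [
    "Ceres Juice Apricot 12 x 1lt unit: 6001240102022",
    "Ceres Juice Orange 12x1lt... unit:",
]

# The original pattern: it only matches lines that contain 13 digits.
pattern = re.compile(r".*(\d{13}).*")

# Lines without a match are not touched by the substitution at all.
results = [pattern.sub(r"\1", line) for line in lines]

for result in results:
    print(result)
```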

Workaround:

Ensure that there is a 'match' on every single line, and put this into Match Group 1 ($1). Then put your actual match into Match Group 2 ($2). How to do this?

Language/Platform:

Any. I have tried all online Regex websites and also Notepad++

Richard Woolf

1 Answer


You may add a second alternative that matches any line:

.*(\d{13}).*|.*

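The first alternative is tried first: if a line contains 13 consecutive digits, that alternative "wins", the bare .* never runs, and $1 holds the 13 digits. On any other line the second alternative, .*, matches the whole line and leaves Group 1 empty, so the line is replaced with an empty string. See the regex demo.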
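As a rough illustration of the alternation, here is a Python sketch run over the five sample lines from the question. The helper keep_barcode and the variable names are assumptions made for this example. A function is used as the replacement so that an unmatched Group 1 becomes an empty string explicitly:

```python
import re

text = "\n".join([
    "Ceres Juice Apricot 12 x 1lt unit: 6001240102022",
    "Ceres Juice Cranberry& Kiwi Juice 12 x 1lt... unit:",
    "Ceres Juice Guava 12 x 1lt.. unit:6001240222829",
    "Ceres Juice Orange 12x1lt... unit:",
    "Ceres Juice Medley of Fruits 1L x 12 unit: 6001240100660",
])

# First alternative captures 13 digits; second matches any other line.
pattern = re.compile(r".*(\d{13}).*|.*")

def keep_barcode(match):
    # Group 1 is None when the second alternative matched.
    return match.group(1) or ""

cleaned = pattern.sub(keep_barcode, text)

# Lines without a barcode are now empty and can be dropped if desired.
barcodes = [line for line in cleaned.splitlines() if line]
print(barcodes)
```

Every line now produces a match, so lines without a barcode collapse to empty lines instead of being passed through untouched.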

Alternatively, an optional non-capturing group with the obligatory digit capturing group:

(?:.*(\d{13}))?.*

See the regex demo

Here, (?:.*(\d{13}))? is tried first (as ? is a greedy quantifier matching 1 or 0 times): after any 0+ chars other than line break chars, it finds 13 digits and places them into Group 1. If the line contains no run of 13 digits, the optional group matches zero times and Group 1 stays empty. Either way, the .* at the end of the pattern matches the rest of the line.
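The two cases can be checked with a short Python sketch. The variable names are hypothetical, and re.fullmatch is used here only to make it visible that the pattern matches the entire line whether or not the line holds a barcode:

```python
import re

# Optional non-capturing group wrapping the capturing group.
pattern = re.compile(r"(?:.*(\d{13}))?.*")

with_code = pattern.fullmatch("Ceres Juice Guava 12 x 1lt.. unit:6001240222829")
without_code = pattern.fullmatch("Ceres Juice Orange 12x1lt... unit:")

# The line with a barcode fills Group 1.
print(with_code.group(1))

# The line without one still matches as a whole, but Group 1 is unset.
print(without_code.group(0))
print(without_code.group(1))
```

This also shows that a capturing group nested inside a non-capturing group still captures normally. The outer (?:...) only groups the pattern for the ? quantifier.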

Wiktor Stribiżew
  • Many thanks Wiktor. You have put me out of my misery :) And I have learnt some useful stuff here. However, I'm wondering if your terminology is 100% because I think technically your solution is causing **matches** (not just groups!) for every line, whereas my solution was not: my solution failed to even 'match' on certain lines. You also speak of a 'Group 2' when there is not any. Nonetheless I am very happy and grateful for your help. Thank you! – Richard Woolf Nov 02 '18 at 13:42
  • @RichardWoolf Sure, there is only Group 1, I fixed that part in my explanation. Matches, yes, but there is one capturing group here and I use the verb to "capture" when a pattern wrapped in a pair of unescaped parentheses matches some part of a string. – Wiktor Stribiżew Nov 02 '18 at 13:46
  • Also, thank you for demonstrating a capturing-group inside a non-capturing group; when I tried this before your help, I thought it was not possible – I thought the non-capturing part was superior to / had authority over the inner-group. – Richard Woolf Nov 02 '18 at 13:48