I have an input string that looks something like this:

HLI6Ch60000Ch500C0Ch46400Ch30000Ch21888Ch10E79CS07LCU3Ch37880Ch27800Ch16480CS8CA00000000000000000000

I don't care about the part that follows the last letter A; it will always be A followed by exactly 20 digits that are of no use to me. I do, however, need the part before the last letter A, ideally separated into two captures, like this:

1: HLI6Ch60000Ch500C0Ch46400Ch30000Ch21888Ch10E79CS07
2: LCU3Ch37880Ch27800Ch16480CS8C

The only way to identify these matches is that they end with the characters CS followed by two hexadecimal characters. I thought that a regular expression like (.+?CS.{2})+ (or (.+?CS[[:xdigit:]]{2})+) would do the job, but when I tried it on www.regex101.com, it only captured the last group and gave the following warning:

Note: A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data

I took that to mean I should use a regular expression like ((.+?CS.{2})+) instead, and sure, now I get two captures, but they look like this:

1: HLI6Ch60000Ch500C0Ch46400Ch30000Ch21888Ch10E79CS07LCU3Ch37880Ch27800Ch16480CS8C
2: LCU3Ch37880Ch27800Ch16480CS8C

Meaning the first one is… slightly longer than I'd like it to be. If it helps, the final regular expression will be part of an iOS application, so an instance of the NSRegularExpression class will be used. I'm not sure whether that's helpful information; I only mention it because I know NSRegularExpression doesn't support every regular expression feature.
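
For anyone who wants to reproduce the behaviour described above, here is a minimal sketch. It uses Python's `re` module purely for illustration (an assumption, since the target is NSRegularExpression) and rebuilds the sample input from its two segments plus the `A` and 20 zeros. The variable names are made up for the example.

```python
import re

# The two segments from the question, followed by "A" and 20 digits.
first = "HLI6Ch60000Ch500C0Ch46400Ch30000Ch21888Ch10E79CS07"
second = "LCU3Ch37880Ch27800Ch16480CS8C"
sample = first + second + "A" + "0" * 20

# A repeated capturing group keeps only its last iteration.
repeated = re.match(r"(.+?CS.{2})+", sample)
last_iteration = repeated.group(1)   # only the second segment

# Wrapping the repetition captures everything it matched, not each piece.
wrapped = re.match(r"((.+?CS.{2})+)", sample)
outer = wrapped.group(1)             # both segments glued together
inner = wrapped.group(2)             # again only the last iteration

print(last_iteration)
print(outer)
print(inner)
```

Neither form yields one capture per segment, which is the problem the question describes.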

Alan Moore
Cellane

3 Answers

(.+?CS.{2})

You can use this directly, without the outer repetition. See the demo. Search for all matches and grab group 1 (or the whole match) from each one.

https://regex101.com/r/vD5iH9/68
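
As a rough sketch of what "grab the group" means in practice, the snippet below applies the pattern repeatedly and collects group 1 from every match. Python's `re` module is used only for illustration; on iOS the analogous step would presumably be enumerating the results of NSRegularExpression's `matches(in:options:range:)`. The variable names are assumptions made for the example.

```python
import re

first = "HLI6Ch60000Ch500C0Ch46400Ch30000Ch21888Ch10E79CS07"
second = "LCU3Ch37880Ch27800Ch16480CS8C"
sample = first + second + "A" + "0" * 20

# No outer "+": each successive match is exactly one segment.
segments = [m.group(1) for m in re.finditer(r"(.+?CS.{2})", sample)]

for index, segment in enumerate(segments, start=1):
    print(f"{index}: {segment}")
# 1: HLI6Ch60000Ch500C0Ch46400Ch30000Ch21888Ch10E79CS07
# 2: LCU3Ch37880Ch27800Ch16480CS8C
```

The trailing `A` and its 20 digits contain no `CS`, so nothing after the last segment is matched.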

vks

It doesn't seem like you need a capturing group at all:

(?:(?!CS[0-9A-F]{2}).)+CS[0-9A-F]{2}

will match all strings that end in CS + 2 hex digits.

Test it live on regex101.com.

Explanation:

(?:                # Start a group.
 (?!CS[0-9A-F]{2}) # Make sure we can't match CSff here,
 .                 # if so, match any character.
)+                 # Do this at least once.
CS[0-9A-F]{2}      # Then match CSff.
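
To make the difference from a plain `.+?CS.{2}` concrete, here is a small sketch in Python (used only for illustration; the short second input is a made-up string, not data from the question). Because the lookahead insists on two uppercase hex digits after `CS`, a `CS` followed by anything else does not end a segment.

```python
import re

TEMPERED = re.compile(r"(?:(?!CS[0-9A-F]{2}).)+CS[0-9A-F]{2}")
LAZY = re.compile(r".+?CS.{2}")

first = "HLI6Ch60000Ch500C0Ch46400Ch30000Ch21888Ch10E79CS07"
second = "LCU3Ch37880Ch27800Ch16480CS8C"
sample = first + second + "A" + "0" * 20

# On the question's input both patterns find the same two segments.
tempered_segments = TEMPERED.findall(sample)
lazy_segments = LAZY.findall(sample)

# Hypothetical input where "CS" is followed by non-hex characters.
tricky = "abCSzzcdCS1F"
tempered_tricky = TEMPERED.findall(tricky)  # ['abCSzzcdCS1F']
lazy_tricky = LAZY.findall(tricky)          # ['abCSzz', 'cdCS1F']

print(tempered_segments)
print(tempered_tricky, lazy_tricky)
```
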
Tim Pietzcker
  • That feels almost like magic, amazing – thank you, that works as well. I might have to resort to using vks' solution though (although yours is probably faster to process, if I'm to guess?) due to its readability in case we need to make adjustments later. Still, thank you so much for your time! – Cellane Feb 04 '15 at 10:04

Change your regex to,

(.+?CS[[:xdigit:]]{2})

DEMO

You don't need to wrap the regex in another capturing group and repeat it one or more times. Just read group 1 of each match to get your desired output.
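
A short sketch of the same idea with more than two segments, since the number of parts before the tail can vary. It is written in Python only for illustration; Python's `re` module has no POSIX classes, so `[0-9A-Fa-f]` stands in for `[[:xdigit:]]`, and the three-segment input is a made-up example rather than real data.

```python
import re

# [0-9A-Fa-f] is used here in place of [[:xdigit:]].
SEGMENT = re.compile(r"(.+?CS[0-9A-Fa-f]{2})")

# Hypothetical input: three segments followed by "A" and 20 digits.
sample = "HLI6Ch6CS07" + "LCU3Ch3CS8C" + "HLI6Ch1CSFF" + "A" + "0" * 20

# Group 1 of each successive match is one segment.
segments = [m.group(1) for m in SEGMENT.finditer(sample)]
print(segments)
```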

Avinash Raj
  • Ah, damn, I'm so sorry, I forgot to mention that there might be more HLI6/LCU3 parts before the final `A.{20}` part, not always just two! – Cellane Feb 04 '15 at 09:58
  • @Cellane you only said that it ends with `[[:xdigit:]]{2}`, two xdigit characters. – Avinash Raj Feb 04 '15 at 10:05
  • Oh right, I failed at reading, again ._. So yours is the same as vks' in that case, upvoted and thank you so much for your time! I can't believe I haven't tried this. – Cellane Feb 04 '15 at 10:08
  • @Cellane note that I was the first to post this regex. You wrongly accepted an answer that was posted **2 mins** after mine. So vks's answer is the same as mine, not the other way around. – Avinash Raj Feb 04 '15 at 10:14
  • Oh! You're absolutely right, I failed not only at reading but also at noticing that. Fixed. – Cellane Feb 04 '15 at 10:21