
I have a regex designed to extract the git URL from a CocoaPods definition.

The input text is as follows:

pod 'Alamofire', :git => 'https://github.com/Alamofire/Alamofire.git', :branch => 'dev'

The regex is as follows:

(?<=('Alamofire'.*:git => '))[A-Za-z:/\.]+(?=('{1}))

This regex works correctly on RegExr; however, when I try to initialise NSRegularExpression with it, an error is thrown with code 2048 saying the pattern is invalid. Usually this is caused by missing escapes, but none are needed here. I can't work out what the problem is, even after trawling the docs for ICU regex, which is the engine iOS uses.

Any ideas would be well received, TIA.

rmaddy
Jacob King

2 Answers


The look-behind assertion in NSRegularExpression is limited: it does not support the * or + operators inside it. The culprit here is the .* part of (?<=('Alamofire'.*:git => ')). The documentation describes the construct as follows:

(?<= ... )

Look-behind assertion. True if the parenthesized pattern matches text preceding the current input position, with the last character of the match being the input character just before the current position. Does not alter the input position. The length of possible strings matched by the look-behind pattern must not be unbounded (no * or + operators.)

Ref: https://developer.apple.com/documentation/foundation/nsregularexpression
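To see the failure in isolation, here is a minimal sketch that tries to compile the pattern from the question and reports the error instead of crashing. The variable names are made up for illustration, and the exact error text may differ between platforms.

```swift
import Foundation

// The pattern from the question: the look-behind contains `.*`, which is unbounded.
let unboundedPattern = "(?<=('Alamofire'.*:git => '))[A-Za-z:/\\.]+(?=('{1}))"

// Record whether the pattern compiled, rather than crashing via `try!`.
var compiled = false
do {
    _ = try NSRegularExpression(pattern: unboundedPattern)
    compiled = true
} catch {
    print("Pattern rejected: \(error.localizedDescription)")
}
```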

You only want the URL, so match just that part; there is no need for the look-behind assertion in the first place.
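One way to read that suggestion is sketched below. It assumes the URL starts with http:// or https:// and runs up to the closing quote, which holds for the sample line but is not guaranteed for every Podfile. Note that it no longer checks which pod the URL belongs to.

```swift
import Foundation

let line = "pod 'Alamofire', :git => 'https://github.com/Alamofire/Alamofire.git', :branch => 'dev'"

// Assumption: the URL begins with http:// or https:// and contains no quote or whitespace.
let urlRegex = try! NSRegularExpression(pattern: "https?://[^'\\s]+")

var url: String? = nil
if let match = urlRegex.firstMatch(in: line, options: [], range: NSRange(line.startIndex..., in: line)),
   let range = Range(match.range, in: line) {
    // The whole match is the URL, so no capture group is needed.
    url = String(line[range])
}
print(url ?? "no match")
```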

staticVoidMan

You can't use patterns of unknown length in lookbehinds with ICU regular expressions. Your pattern contains .* in the lookbehind, so it is an invalid ICU regex: the ICU lookbehind documentation states that "the length of possible strings matched by the look-behind pattern must not be unbounded (no * or + operators)".

There are two ways to fix this:

  • Replace .* with .{0,x}, where x is the maximum number of chars you expect between the left-hand pattern and the right-hand pattern. ICU regex lookbehinds allow the limiting (interval, range) quantifier, which is why they are also called "constrained-width".
  • Rework your pattern to use a consuming pattern instead of lookarounds: wrap the part you need to extract in capturing parentheses and modify your code to grab the Group 1 value.
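Approach 1 could look roughly like this. The upper bound of 50 is an arbitrary assumption chosen for the sample line; pick a value that covers the longest gap you expect between the pod name and :git.

```swift
import Foundation

let podLine = "pod 'Alamofire', :git => 'https://github.com/Alamofire/Alamofire.git', :branch => 'dev'"

// `.{0,50}` keeps the look-behind bounded; 50 is an assumed maximum gap, not a documented limit.
let boundedPattern = "(?<='Alamofire'.{0,50}:git => ')[^']+(?=')"

var boundedURL: String? = nil
if let regex = try? NSRegularExpression(pattern: boundedPattern),
   let match = regex.firstMatch(in: podLine, options: [], range: NSRange(podLine.startIndex..., in: podLine)),
   let range = Range(match.range, in: podLine) {
    // The lookarounds consume nothing, so the whole match is the URL.
    boundedURL = String(podLine[range])
}
print(boundedURL ?? "no match")
```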

Here is Approach 2, which is recommended:

import Foundation

let str = "pod 'Alamofire', :git => 'https://github.com/Alamofire/Alamofire.git', :branch => 'dev'"
let rng = NSRange(location: 0, length: str.utf16.count)
// Consume the text before the URL and capture only the URL in group 1.
let regex = try! NSRegularExpression(pattern: "'Alamofire'.*:git\\s*=>\\s*'([^']+)'")
// firstMatch returns nil when nothing matches, so there is no out-of-bounds access.
if let match = regex.firstMatch(in: str, options: [], range: rng),
   let urlRange = Range(match.range(at: 1), in: str) {
    print(String(str[urlRange])) // => https://github.com/Alamofire/Alamofire.git
}

See the regex demo; the green-highlighted substring is the value you get in Group 1.

Pattern details:

  • 'Alamofire' - a literal string
  • .* - any 0+ chars other than line break chars, as many as possible (replace with .*? to match as few as possible)
  • :git - a literal substring
  • \s*=>\s* - a => substring wrapped with 0+ whitespaces
  • '([^']+)' - ', then a capturing group #1 matching 1+ chars other than ' and then a ' char.
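The pattern can also be wrapped in a small helper that works for any pod name. This is only a sketch: gitURL(forPod:in:) is a hypothetical function, not part of any library. It uses the lazy .*? variant mentioned above and escapes the pod name so that characters such as + or . are matched literally.

```swift
import Foundation

// Hypothetical helper: returns the :git URL declared for `pod` in a Podfile line, or nil.
func gitURL(forPod pod: String, in line: String) -> String? {
    // Escape the pod name so regex metacharacters in it are treated as literals.
    let name = NSRegularExpression.escapedPattern(for: pod)
    let pattern = "'\(name)'.*?:git\\s*=>\\s*'([^']+)'"
    guard let regex = try? NSRegularExpression(pattern: pattern),
          let match = regex.firstMatch(in: line, options: [], range: NSRange(line.startIndex..., in: line)),
          let range = Range(match.range(at: 1), in: line) else {
        return nil
    }
    // Group 1 holds the text between the quotes that follow :git =>.
    return String(line[range])
}

let sample = "pod 'Alamofire', :git => 'https://github.com/Alamofire/Alamofire.git', :branch => 'dev'"
print(gitURL(forPod: "Alamofire", in: sample) ?? "no match")
```

Returning an optional lets the caller decide what to do when a line has no :git entry, instead of crashing on a force unwrap.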
Wiktor Stribiżew