I'm trying to extract a UTM parameter from a Google link, but my regex doesn't work properly.

Here is an example of a Google link:

xxx/yyy?utm_medium=display&utm_source=ogury&utm_campaign=TOTO&zzz=coco

I tried the following regex to extract TOTO:

.+&utm_campaign=([[a-z]]+)&.+

with no success.

If someone can help, thanks!

Jared Smith
Alex Paris

4 Answers

In your pattern, [[a-z]]+ does not do what you intended: it is parsed as the bracket expression [[a-z] (any lowercase ASCII letter or [) followed by one or more ] chars. You meant to use a single [ and ] here. Note also that [a-z] alone would still not match the uppercase TOTO.
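The parsing described above can be checked directly in R. This is a minimal sketch using R's default regex engine; the variable names (`s`, `original`, `bracket_hits`, `unchanged`) are made up for illustration.

```r
s <- "xxx/yyy?utm_medium=display&utm_source=ogury&utm_campaign=TOTO&zzz=coco"

# The pattern from the question.
original <- ".+&utm_campaign=([[a-z]]+)&.+"

# [[a-z]]+ is "[ or a lowercase letter", followed by one or more "]".
# So it accepts "a]]" and "[]", but rejects a plain word such as "abc".
bracket_hits <- grepl("^[[a-z]]+$", c("a]]", "[]", "abc"))

# sub() returns its input untouched when the pattern does not match,
# which is why the original attempt appears to do nothing.
unchanged <- sub(original, "\\1", s)
identical(unchanged, s)
```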

You may use sub with the following regex:

sub(".*[&?]utm_campaign=([^&]+).*", "\\1", s)

See the regex demo.

Details

  • .* - any 0+ chars, as many as possible
  • [&?] - a ? or &
  • utm_campaign= - a literal substring
  • ([^&]+) - Capturing group 1: one or more chars other than & chars
  • .* - any 0+ chars, as many as possible

The \1 in the replacement (written "\\1" inside an R string literal) is a backreference that puts the contents of Group 1 into the result.

See the R demo:

s <- "xxx/yyy?utm_medium=display&utm_source=ogury&utm_campaign=TOTO&zzz=coco"
sub(".*[&?]utm_campaign=([^&]+).*", "\\1", s)
## => [1] "TOTO"
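One caveat with the sub() approach is that sub() returns the whole input when nothing matches, so a URL without utm_campaign comes back unchanged. The sketch below is one possible way to get NA instead; `extract_utm_campaign` is a hypothetical helper name, not part of the answer above, and it relies only on base R's regexec() and regmatches().

```r
# Hypothetical helper: returns the utm_campaign value, or NA if it is absent.
extract_utm_campaign <- function(url) {
  # regexec() reports the full match plus each capturing group.
  m <- regmatches(url, regexec("[&?]utm_campaign=([^&]+)", url))[[1]]
  # No match yields character(0); otherwise m[1] is the full match
  # and m[2] is Group 1.
  if (length(m) == 0) NA_character_ else m[2]
}

extract_utm_campaign("xxx/yyy?utm_medium=display&utm_campaign=TOTO&zzz=coco")
extract_utm_campaign("xxx/yyy?utm_campaign=TOTO")  # first and only parameter
extract_utm_campaign("xxx/yyy?zzz=coco")           # parameter missing
```

The first two calls should give "TOTO" and the last one NA, whereas the sub() version would return the full URL in the last case.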
Wiktor Stribiżew

You could use:

(?:&utm_campaign=)(\w+)

and take the contents of the first capturing group.

Matheus Cuba
  • Note that in case `utm_campaign` is the first query string param, it will have `?` in front, so `(?:&utm_campaign=)(\w+)` might not work in all cases. Besides, note that `(?:&utm_campaign=)` = `&utm_campaign=`. – Wiktor Stribiżew Jun 27 '18 at 20:46
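The point raised in the comment can be illustrated with a short R sketch. `first_group` is a hypothetical helper introduced only for this example; perl = TRUE is passed so that the (?:...) syntax is handled by PCRE.

```r
# Hypothetical helper: first capturing group of `pattern` in `x`, or NA.
first_group <- function(pattern, x) {
  m <- regmatches(x, regexec(pattern, x, perl = TRUE))[[1]]
  if (length(m) < 2) NA_character_ else m[2]
}

first_param <- "xxx/yyy?utm_campaign=TOTO&zzz=coco"

# Requires a literal "&" before the name, so it misses a first parameter.
with_amp <- first_group("(?:&utm_campaign=)(\\w+)", first_param)

# Accepting either "?" or "&" covers both positions.
with_either <- first_group("[?&]utm_campaign=(\\w+)", first_param)
```

Here `with_amp` should be NA and `with_either` should be "TOTO".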

You are searching for [[a-z]]+, but TOTO is uppercase, so it is not between 'a' and 'z'. Changing the range to A-Za-z lets the class match letters of either case.

EDIT: The doubled brackets are also a problem. [[A-Za-z]]+ is read as the class [[A-Za-z] (one letter or '[') followed by one or more ']' characters, so it still will not match TOTO. Use [A-Za-z]+ to match only letters of either case.
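A minimal sketch of that fix applied to the pattern from the question, using base R's sub(). The variable names are made up for illustration. Note the assumption baked into the original pattern: utm_campaign must be preceded by & and followed by another parameter.

```r
s <- "xxx/yyy?utm_medium=display&utm_source=ogury&utm_campaign=TOTO&zzz=coco"

# Question's pattern with the character class corrected to [A-Za-z]+.
campaign <- sub(".+&utm_campaign=([A-Za-z]+)&.+", "\\1", s)

# When utm_campaign is the last parameter, the trailing "&.+" cannot match,
# so sub() hands back the input unchanged.
s_last <- "xxx/yyy?utm_medium=display&utm_campaign=TOTO"
campaign_last <- sub(".+&utm_campaign=([A-Za-z]+)&.+", "\\1", s_last)
```

`campaign` should be "TOTO", while `campaign_last` should equal `s_last`.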

Jacob Boertjes

Here's a regex string that'll match the value of a utm_campaign parameter, regardless of its position in the query string.

(?<TOTO>(?<=utm_campaign=).*?(?=&|$))

Explanation:

  • (?<TOTO>...) is a named capturing group: the matched text is stored under the name TOTO

  • (?<=utm_campaign=) is a look-behind that ensures the value is preceded by utm_campaign=

  • .*? matches the parameter value (i.e. TOTO). The ? makes the quantifier lazy: it consumes as few characters as possible, stopping as soon as the next rule can match (see the point below)

  • (?=&|$) is a look-ahead that matches either an & or the end of the string (in case utm_campaign is the last parameter)
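Look-behind and named groups are PCRE features, so in R this pattern needs perl = TRUE. The following is a minimal sketch with base R's regexpr() and regmatches(); because the look-arounds consume no characters, the whole match is already the value, and the variable names are made up for illustration.

```r
pattern <- "(?<TOTO>(?<=utm_campaign=).*?(?=&|$))"

s_middle <- "xxx/yyy?utm_medium=display&utm_source=ogury&utm_campaign=TOTO&zzz=coco"
s_last   <- "xxx/yyy?utm_medium=display&utm_campaign=TOTO"

# regexpr() locates the first match; regmatches() extracts its text.
value_middle <- regmatches(s_middle, regexpr(pattern, s_middle, perl = TRUE))
value_last   <- regmatches(s_last, regexpr(pattern, s_last, perl = TRUE))
```

Both `value_middle` and `value_last` should be "TOTO". If the parameter is absent, regmatches() returns an empty character vector.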

Joobsies