I have a character vector in which each entry looks like this:

"ABC1:123_CDE/CDE"

I would like to write a regular expression that matches ALL and ONLY the characters from the "_" onward, so that removing the match would leave:

ABC1:123

I tried "^_$|[CDE/]" but that seems to select the initial C as well.

I read somewhere that lookbehind can be used in R if you set perl = TRUE, but I'm not super familiar with Perl regular expression matching either.

Many thanks, and apologies if there is something obvious I'm missing.

amon
user3245575

3 Answers

You can use a split method without regex since you are looking for a literal character:

(Perl)

my @res = split('_', $str, 2);
print $res[0];

(R language)

strsplit("ABC1:123_CDE/CDE", "_", fixed = TRUE)[[1]][1]
Casimir et Hippolyte
  • OP is using R, not Perl. (It's just Perl-compatible regexes) – amon Jan 28 '14 at 17:35
  • The strsplit command worked in R. Does [[1]][1] tell R to consider only the first half of the string? If I use [[1]][2] it seems to take only the second part of the string – user3245575 Jan 30 '14 at 02:05
  • @user3245575: When you split a string you obtain an array of strings. `[1]` is for the first item of the array, `[2]` for the second, ... With the string `A_B_C_D` `[3]` will return `C` and `[4]` `D`. – Casimir et Hippolyte Jan 30 '14 at 03:12
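To make the indexing in the comments above concrete, here is a minimal sketch in R. The variable names (`x`, `parts`, `first_piece`) are made up for illustration. In R, `strsplit` returns a list with one character vector per input string, so `[[1]]` picks the pieces of the first string and `[1]` then picks the first piece.

```r
# Two example strings: the one from the question and the one from the comment.
x <- c("ABC1:123_CDE/CDE", "A_B_C_D")

# fixed = TRUE treats "_" as a literal character rather than a regex.
parts <- strsplit(x, "_", fixed = TRUE)

# parts is a list: parts[[1]] holds the pieces of the first string,
# parts[[2]] the pieces of the second string.
third_of_second <- parts[[2]][3]

# For a whole character vector, take the first piece of every entry.
first_piece <- vapply(parts, function(p) p[1], character(1))
print(first_piece)
```

The `vapply` call is just one way to apply the `[1]` step to every entry at once; for a single string, `[[1]][1]` as in the answer is enough.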
sub("_.*", "", "ABC1:123_CDE/CDE")
#[1] "ABC1:123"
eddi
  • Thanks, this worked perfectly. Could you explain to me step by step what each character is doing? – user3245575 Jan 30 '14 at 01:58
  • @user3245575 not much going on - it just matches the underscore and then anything after. I suggest reading up on [regex in R](http://stat.ethz.ch/R-manual/R-devel/library/base/html/regex.html) if that's still unclear. – eddi Jan 30 '14 at 16:04
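As a rough step-by-step reading of the pattern in this answer: `_` matches a literal underscore, `.` matches any single character, and `*` repeats the preceding `.` zero or more times, as many times as possible. Since `sub` replaces only the first match and the match starts at the leftmost underscore, everything from the first `_` to the end of the string is replaced by the empty string. The sketch below uses made-up example strings to show that behaviour; the last line is an assumed variant for cutting at the last underscore instead.

```r
x <- c("ABC1:123_CDE/CDE", "ABC_DEF_GHI", "no-underscore")

# "_"  : a literal underscore
# ".*" : any characters after it, up to the end of the string
# Strings without an underscore are returned unchanged.
trimmed <- sub("_.*", "", x)
print(trimmed)

# Variant (not from the answer): drop only the part after the LAST underscore.
# "[^_]*$" matches a run of non-underscore characters up to the end.
trimmed_last <- sub("_[^_]*$", "", "ABC_DEF_GHI")
print(trimmed_last)
```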
Match anything before a _

.*(?=_)
Srb1313711
  • Won't work with a string of the form `ABC_DEF_GHI`. You'd rather want `.*?(?=_)` if you use that regex format. – Jerry Jan 28 '14 at 17:33
  • Doesn't that depend on whether you want all characters before the first or last occurrence of '_'? – Srb1313711 Jan 29 '14 at 09:38
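The difference discussed in these comments can be checked directly in R. This is a small sketch, assuming `perl = TRUE` (needed for the lookahead `(?=_)`) and using `regexpr` with `regmatches` to extract the match; the variable names are illustrative only. The greedy `.*` stops before the last underscore, while the lazy `.*?` stops before the first one.

```r
x <- "ABC_DEF_GHI"

# Greedy: take as much as possible while still being followed by "_".
greedy <- regmatches(x, regexpr(".*(?=_)", x, perl = TRUE))

# Lazy: take as little as possible while still being followed by "_".
lazy <- regmatches(x, regexpr(".*?(?=_)", x, perl = TRUE))

print(greedy)
print(lazy)

# With a single underscore, as in the question, both give the same result.
y <- "ABC1:123_CDE/CDE"
single <- regmatches(y, regexpr(".*?(?=_)", y, perl = TRUE))
print(single)
```

Which one is right depends, as the last comment says, on whether the cut should happen at the first or the last underscore.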