
I need to align the formatting of some clinical trial IDs to merge two databases. For example, in database A patient 123, visit 1 is stored as '123v01', while in database B it is just '123v1'.

I can match A to B by using grep to find IDs containing 'v0' and stripping the zero after the 'v'. For academic interest, and to expand my R/regex skills, I want to do the reverse: match B to A by selecting only IDs where 'v' is followed by exactly one digit, so I can then separately pad that digit with a leading zero.
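A minimal sketch of both conversions with base R's `sub()`, assuming every ID ends in its visit number (so the patterns can anchor on `$`) and that visit numbers have at most two digits. The vectors `ids_a` and `ids_b` are hypothetical example data.

```r
# Hypothetical example IDs in the two formats described above
ids_a <- c("123v01", "456v02", "789v10")  # database A: zero-padded visit
ids_b <- c("123v1",  "456v2",  "789v10")  # database B: unpadded visit

# A -> B: drop one zero between 'v' and a final single digit
a_to_b <- sub("v0(\\d)$", "v\\1", ids_a)

# B -> A: pad a lone final visit digit with a leading zero
b_to_a <- sub("v(\\d)$", "v0\\1", ids_b)

print(a_to_b)  # -> "123v1" "456v2" "789v10"
print(b_to_a)  # -> "123v01" "456v02" "789v10"
```

Two-digit visits such as `"789v10"` match neither pattern, so they pass through unchanged in both directions.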

For a reprex:

string <- c("123v1", "123v01", "123v001")

I can match IDs with two or more digits following 'v' and then take the inverse subset:

> idx <- grepl("v(\\d{2})", string)
> string[!idx]
[1] "123v1"
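One caveat with the inverse subset: `!idx` keeps everything that lacks two digits after 'v', including IDs with no visit suffix at all. A small sketch, where `"123"` is a hypothetical ID added only for illustration:

```r
string <- c("123v1", "123v01", "123v001", "123")  # "123" is hypothetical: no visit suffix

idx <- grepl("v\\d{2}", string)  # TRUE for 'v' followed by 2+ digits
string[!idx]                     # -> "123v1" "123": the bare ID slips through

# Also requiring 'v' plus at least one digit excludes the bare ID
keep <- grepl("v\\d", string) & !idx
string[keep]                     # -> "123v1"
```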

But there must be a way to match 'v' followed by only a single digit? I have tried these lookarounds:

# Negative look ahead "v not followed by 2+ digits"
grepl("v(?!\\d{2})", string)

# Positive look behind "single digit following v"
grepl("(?<=v)\\d{1}", string)

But both return an 'invalid regex' error
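The error comes from R's default regex engine (TRE, POSIX extended syntax), which does not support lookarounds. A short sketch that captures the error with `tryCatch()` and then reruns the same pattern with `perl = TRUE`:

```r
string  <- c("123v1", "123v01", "123v001")
pattern <- "v(?!\\d{2})"

# Default engine: the lookahead is not valid syntax, so grepl() errors
default_result <- tryCatch(
  suppressWarnings(grepl(pattern, string)),  # silence the accompanying TRE warning
  error = function(e) conditionMessage(e)    # return the error message instead
)
print(default_result)  # an 'invalid regular expression' message

# PCRE engine: lookarounds are supported
pcre_result <- grepl(pattern, string, perl = TRUE)
print(pcre_result)     # -> TRUE FALSE FALSE
```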

Any suggestions?

dhanlin
Brent
  • I'm not much good at regex, but I suggest `[vV][0-9]{1}[!0-9]` – Agi Hammerthief Aug 19 '19 at 15:08
  • what about `grepl("v\\d{1}$", string)`? – emilliman5 Aug 19 '19 at 15:28
  • Or even shorter, as `\\d` is one digit: `grepl("v\\d$", string)`, where `$` indicates the end of the string. But maybe it's better to remove all leading zeros, e.g. with `sub("v0*", "v", string)`, and then make the match. – GKi Aug 19 '19 at 16:19
  • Mind that `v(?!\d{2})` matches `vWORD_HERE` - i.e. even when no digit is there after `v`. See [my answer](https://stackoverflow.com/a/57559945/3832970) with the proper solution. – Wiktor Stribiżew Aug 19 '19 at 19:35
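GKi's normalization idea can be sketched as follows: strip leading zeros after 'v' on both sides, then compare the normalized keys. `ids_a`, `ids_b` and `norm()` are hypothetical. Note that `sub("v0*", "v", x)` would turn a visit numbered 0 (e.g. `"123v0"`) into `"123v"`, so this assumes visits start at 1.

```r
# Hypothetical IDs from each database
ids_a <- c("123v01", "123v02", "456v01")
ids_b <- c("123v1",  "456v1",  "789v3")

# Normalize: remove any zeros directly after 'v' (assumes no visit 0)
norm <- function(x) sub("v0*", "v", x)

# Keep the A-side IDs whose normalized form also occurs in B
matched <- ids_a[norm(ids_a) %in% norm(ids_b)]
print(matched)  # -> "123v01" "456v01"
```

The same normalized keys could be stored as an extra column in each table and used as the join key, which avoids converting either database's original IDs.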

2 Answers

You need to set perl=TRUE in your grepl call, because R's default regex engine does not support lookarounds.

For example:

grepl("v(?!\\d{2})", string, perl=TRUE)
[1]  TRUE FALSE FALSE

See this question for more info.
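Wiktor Stribiżew's comment about this pattern can be checked directly: the lookahead only asserts what does not follow 'v', so a 'v' followed by a non-digit also passes. A small sketch, where `"123vX"` is a hypothetical malformed ID:

```r
string <- c("123v1", "123v01", "123vX")  # "123vX" is hypothetical

# Lookahead alone: 'v' not followed by two digits (a non-digit also qualifies)
lookahead_only <- grepl("v(?!\\d{2})", string, perl = TRUE)
print(lookahead_only)        # -> TRUE FALSE TRUE

# Require one digit first, then assert that no second digit follows
digit_then_lookahead <- grepl("v\\d(?!\\d)", string, perl = TRUE)
print(digit_then_lookahead)  # -> TRUE FALSE FALSE
```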

meenaparam

You may use

grepl("v\\d(?!\\d)", string, perl=TRUE)

The v\d(?!\d) pattern matches v, then one digit, and then makes sure there is no digit immediately to the right of the current location (i.e. after the v + 1 digit).

See the regex demo.

Note that you need to enable the PCRE regex flavor with the perl=TRUE argument.
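Following the plan in the question (match the single-digit IDs first, then pad them separately), a sketch might look like this. `ids_b` is hypothetical example data, and the fixed-string `sub()` assumes the only 'v' in each ID is the visit marker.

```r
ids_b <- c("123v1", "123v01", "123v12")  # hypothetical IDs

# Step 1: flag IDs where 'v' is followed by exactly one digit
single <- grepl("v\\d(?!\\d)", ids_b, perl = TRUE)

# Step 2: insert a zero after 'v' in the flagged IDs only
ids_b[single] <- sub("v", "v0", ids_b[single], fixed = TRUE)

print(ids_b)  # -> "123v01" "123v01" "123v12"
```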

Wiktor Stribiżew