4

I have already tried to find a solution on the internet for my problem, and I have the feeling I know all the small pieces but I am unable to put them together. I'm quite new at programming, so please be patient :D...

I have a (in reality much larger) text string which looks like this:

string <- "Test test [438] test. Test 299, test [82]."

Now I want to replace the numbers in square brackets using a lookup table and get a new string back. There are other numbers in the text, but I only want to change those in brackets, and the replacements need to stay in brackets.

lookup <- read.table(text = "
Number   orderedNbr
1 270 1
2 299 2
3 82  3
4 314 4
5 438 5", header = TRUE)

I have written a regular expression to find the numbers in square brackets:

pattern <- "\\[(\\d+)\\]"
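
To check what the pattern picks out before attempting any replacement, here is a quick base-R sketch. The variable name `hits` is only illustrative and not part of the original attempt.

```r
string  <- "Test test [438] test. Test 299, test [82]."
pattern <- "\\[(\\d+)\\]"

# gregexpr() locates every match; regmatches() extracts the matched text
hits <- regmatches(string, gregexpr(pattern, string))[[1]]
hits
# [1] "[438]" "[82]"

# Strip the brackets to get the bare numbers needed for the lookup
as.integer(gsub("\\[|\\]", "", hits))
# [1] 438  82
```

The unbracketed 299 is not among the matches, so the pattern itself already separates the markers from the other numbers.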

I have looked all around and tried sub/gsub, lapply, merge, and str_replace, but I am unable to make it work... I don't know how to tell R to look at what's inside the brackets, find that value in the lookup table, and return what's in the next column.

I hope you can help me, and that it's not a really stupid question. Thx

Solana

3 Answers

2

We can use regex lookarounds to match only numbers that are inside square brackets:

library(gsubfn)
gsubfn("(?<=\\[)(\\d+)(?=\\])", setNames(as.list(lookup$orderedNbr), 
             lookup$Number), string, perl = TRUE)
#[1] "Test test [5] test. Test [3]."
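
As far as I understand gsubfn's documented behaviour, when the replacement is a list, each match is looked up by name in that list. A small base-R sketch of the object passed above, assuming a lookup table that mirrors the one in the question (the name `repl` is illustrative):

```r
# Lookup table mirroring the one in the question
lookup <- data.frame(Number     = c(270L, 299L, 82L, 314L, 438L),
                     orderedNbr = 1:5)

# Names are the old numbers (coerced to strings), values are the new numbers
repl <- setNames(as.list(lookup$orderedNbr), lookup$Number)

names(repl)
# [1] "270" "299" "82"  "314" "438"
repl[["438"]]
# [1] 5
repl[["82"]]
# [1] 3
```

Because the lookarounds restrict the match to digits between brackets, only those digits are looked up; the brackets themselves are never part of the match.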

Or, without regex lookarounds, by pasting the square brackets onto each column of 'lookup':

gsubfn("(\\[\\d+\\])", setNames(as.list(paste0("[", lookup$orderedNbr, 
          "]")), paste0("[", lookup$Number, "]")), string)
akrun
  • Thanks! This works great, but :D... there will also be other numbers in my strings, and I want to prevent those from being changed. That's the reason why I searched for the numbers in brackets. But I still need to have the brackets in the replacement. So this still has to work: `string <- "Test test test [438]. Test 299, test [82]"` `gsubfn(pattern, c("[", setNames(as.list(lookup$orderedNbr), lookup$Number),"]"), string)`, but that does not work... – Solana Apr 27 '18 at 15:11
2

Read your table of keys and values (a two-column table) into a data frame. If your source information is a flat text file, you can easily use read.csv to obtain a data frame. In the example below, I hard-code a data frame with just two entries. Then I iterate over it and make the replacements in the input string.

df <- data.frame(keys=c(438, 82), values=c(5, 3))
string <- "Test test [438] test. Test [82]."
# Replace each key in turn; the lookarounds match only the digits,
# so the surrounding brackets stay in place
for (i in seq_len(nrow(df))) {
    string <- gsub(paste0("(?<=\\[)", df$keys[i], "(?=\\])"), df$values[i], string, perl=TRUE)
}

string

[1] "Test test [5] test. Test [3]."


Note: As @Frank wisely pointed out, my solution would fail if the replacements for your number markers (e.g. [438]) are numbers that also appear as other markers. That is, if replacing a key with a value results in yet another key, a later iteration could replace it again. If this is a possibility, I would suggest using markers for which this cannot happen. For example, you could remove the brackets after each replacement.
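
One crude way around the overlap, following the idea @Frank suggests in the comments, is to write each new value with a temporary prefix and strip the prefix in a final sweep. This is only a sketch: the "." prefix and the three-row data frame are assumptions for illustration, and it presumes the original text never contains "[." itself.

```r
# Keys and values overlap on purpose: 438 -> 5, and 5 -> 1
df <- data.frame(keys = c(438, 82, 5), values = c(5, 3, 1))
string <- "Test [438] test [82] test [5]."

for (i in seq_len(nrow(df))) {
  # Insert the new value with a temporary "." prefix so that it cannot
  # be matched as a key on a later iteration
  string <- gsub(paste0("(?<=\\[)", df$keys[i], "(?=\\])"),
                 paste0(".", df$values[i]), string, perl = TRUE)
}

# Final sweep: drop the temporary prefix right after each opening bracket
string <- gsub("(?<=\\[)\\.", "", string, perl = TRUE)
string
# [1] "Test [5] test [3] test [1]."
```

Without the prefix, [438] would first become [5] and then be rewritten to [1] on the third iteration.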

Tim Biegeleisen
  • Thanks for the answer. The brackets are to be retained, and I have about 1000 numbers which have to be replaced by about 40 new numbers in 100 cases; for this reason I wanted to use a lookup table. Isn't it a bit complicated to always make this list? – Solana Apr 27 '18 at 15:08
  • @Solana I updated my answer to use a data frame. This can easily handle the amount of replacements you described. – Tim Biegeleisen Apr 27 '18 at 15:19
  • Hm, one risk here -- if there is overlap between the keys and the values, you might overwrite a key with another key that is reached later in the loop. – Frank Apr 27 '18 at 15:41
  • @Frank How can this happen, assuming keys do not repeat? Also, where in the OP do you see any evidence of this edge case? The data structure we want to use here is a hash map, but R isn't big on those so I used a data frame in its place. – Tim Biegeleisen Apr 27 '18 at 15:45
  • It can happen with `df <- data.frame(keys=c(438, 82, 5), values=c(5, 3, 1))`, right? 438 becomes 5 and then 5 becomes 1. Do I see proof that the OP has this case? No. Is it possible that they or someone who comes to this question later might? I think so, which is why I brought it up, figuring you might have a remedy. You are free to ignore it, of course. I like the approach and can think of one (crude) way to address the edge case: make the replacement something like `paste0(".", df$values[i])` and then do a final gsub sweep at the end to get rid of the prefix. – Frank Apr 27 '18 at 15:54
  • @Frank Good catch. Maybe this should be deleted. – Tim Biegeleisen Apr 27 '18 at 15:57
  • I like the approach since it doesn't require exotic functions like `regmatches<-` or packages and so is easy to follow. Arguably worth keeping with a caveat about no overlap between keys and values and/or a hack to address it. – Frank Apr 27 '18 at 15:59
  • @Frank I gave a caveat above. In practice, in most languages a prepared statement would use markers for which this can't happen. I think that `[438]` should be completely replaced with the value, i.e. square brackets denote the marker (in which case we would actually use `[[438]]` since the OP wants literal brackets as well) – Tim Biegeleisen Apr 27 '18 at 16:13
1

You can use regmatches<- with a pattern containing lookahead/lookbehind:

patt = "(?<=\\[)\\d+(?=\\])"
m = gregexpr(patt, string, perl=TRUE)
v = as.integer(unlist(regmatches(string, m)))

`regmatches<-`(string, m, value = list(lookup$orderedNbr[match(v, lookup$Number)]))
# [1] "Test test [5] test. Test 299, test [3]."

Or to modify the string directly, change the last line to the more readable...

regmatches(string, m) <- list(lookup$orderedNbr[match(v, lookup$Number)])
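
If there are many strings rather than one, the same idea can be wrapped in a function that works on a character vector. This is a minimal sketch, not a tested utility: `replace_bracketed` is a hypothetical name, the lookup table mirrors the one in the question, and leaving numbers without a lookup entry unchanged is an assumption about the desired behaviour.

```r
# Lookup table mirroring the one in the question
lookup <- data.frame(Number     = c(270L, 299L, 82L, 314L, 438L),
                     orderedNbr = 1:5)

# Hypothetical helper: replace bracketed numbers in every element of x
replace_bracketed <- function(x, lookup) {
  patt <- "(?<=\\[)\\d+(?=\\])"
  m <- gregexpr(patt, x, perl = TRUE)
  found <- regmatches(x, m)          # one character vector per element of x
  regmatches(x, m) <- lapply(found, function(v) {
    new <- as.character(lookup$orderedNbr[match(as.integer(v), lookup$Number)])
    # Keep the original number when it has no entry in the lookup table
    new[is.na(new)] <- v[is.na(new)]
    new
  })
  x
}

out <- replace_bracketed(c("Test test [438] test. Test 299, test [82].",
                           "Unknown marker [7] stays.",
                           "No brackets here, just 438."),
                         lookup)
out
# [1] "Test test [5] test. Test 299, test [3]."
# [2] "Unknown marker [7] stays."
# [3] "No brackets here, just 438."
```

Since all matches are located before anything is replaced, an overlap between old and new numbers cannot cause a value to be replaced twice.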
Frank