
I need to extract a nonnegative number from a string, or return a negative number if no number can be extracted.

To extract the number, I tried `grep`. Note that with `value = TRUE` it returns the whole matching element, not only the matched part:

> grep("^[0-9.]+","1234.1234alsk",value=TRUE)
[1] "1234.1234alsk"

If the given string is not a number, an empty result is returned (`character(0)`, not `""`):

> grep("^[0-9.]+","",value=TRUE)
character(0)
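As a minimal base-R sketch (not part of the original question), `regexpr()` plus `regmatches()` extracts only the matched text instead of the whole element. It still drops non-matching elements, though, so it has the same length problem as `grep`:

```r
# regexpr() finds the first match in each element (-1 if none);
# regmatches() then pulls out just the matched substrings.
s <- c("1234.1234alsk", "")
m <- regmatches(s, regexpr("^[0-9.]+", s))
print(m)          # only the matched part of the first element

# The non-matching "" is silently dropped, so the output is shorter than s.
print(length(s))  # 2
print(length(m))  # 1
```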

Now I would like to replace empty strings with a proxy number, such as 0 or -1, using a function like this:

> sub("^$","-1","")
[1] "-1"

However, if I apply that function to the `character(0)` returned by `grep`, I do not get the desired result:

> sub("^$","-1",grep("^[0-9.]+","",value=TRUE))
character(0)

The problem is that `grep` returns `character(0)`, not `""`. Because `sub` returns a result of the same length as its input, applying it to `character(0)` gives back an unchanged `character(0)` instead of the desired `"-1"`. More generally, `grep` returns only the matching elements, so non-matching values are silently dropped in a query like this:

> v <- c("0","","1","2")
> as.numeric(sub("^$","-1",grep("^[0-9.]+",v,value=TRUE)))
[1] 0 1 2
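The root cause is that `grep()` filters elements out rather than returning one value per element. A small sketch contrasting it with `grepl()`, which returns one `TRUE`/`FALSE` per input element and therefore stays aligned with `v`:

```r
v <- c("0", "", "1", "2")

# grep() keeps only the matching elements, so "" disappears
# and the result no longer lines up with v.
kept <- grep("^[0-9.]+", v, value = TRUE)
print(kept)  # "0" "1" "2"

# grepl() returns a logical vector of the same length as v.
ok <- grepl("^[0-9.]+", v)
print(ok)    # TRUE FALSE TRUE TRUE
```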

How can I do this kind of conversion as a one-liner, so that each input value gives exactly one output value?

Heikki

1 Answer

After loading stringr, I found the following one-liner, which extracts a number at the start of each string and returns "-1" otherwise:

> library(stringr)
> x <- c("","1","1.23","1.23a","-123")
> as.vector(ifelse(is.na(str_match(x,"^[0-9.]+")),"-1",str_match(x,"^[0-9.]+")))
[1] "-1"   "1"    "1.23" "1.23" "-1"  

As a result, I can do the following conversions as one-liners:

> as.numeric(as.vector(ifelse(is.na(str_match(x,"^[0-9.]+")),"-1",str_match(x,"^[0-9.]+"))))
[1] -1.00  1.00  1.23  1.23 -1.00
> library(Hmisc)  # all.is.numeric() comes from the Hmisc package, not base R
> all.is.numeric(as.numeric(as.vector(ifelse(is.na(str_match(x,"^[0-9.]+")),"-1",str_match(x,"^[0-9.]+")))))
[1] TRUE
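The `str_match()` call above runs the regex twice and returns a one-column matrix, which is why `as.vector()` is needed. A possibly simpler sketch, assuming stringr's `str_extract()` (a plain character vector with `NA` for non-matches) and `str_replace_na()` (which replaces each `NA` with a given string):

```r
library(stringr)

x <- c("", "1", "1.23", "1.23a", "-123")

# str_extract() runs the pattern once and returns a character vector,
# with NA wherever the string does not start with a digit or dot.
# str_replace_na() turns each NA into the proxy value "-1".
res <- as.numeric(str_replace_na(str_extract(x, "^[0-9.]+"), "-1"))
print(res)
```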

The conversion is stored in a column field, so it has to be a one-liner.
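For comparison, a base-R one-liner without stringr is also possible. This is a sketch, not the answer's method: `grepl()` decides which elements start with a number, and `sub()` with a capture group keeps only that leading number:

```r
v <- c("0", "", "1", "2", "1.23a", "-123")

# One output per input: the leading number if there is one, otherwise "-1".
# "^([0-9.]+).*$" captures the leading digits/dots and discards the rest.
res <- as.numeric(ifelse(grepl("^[0-9.]+", v), sub("^([0-9.]+).*$", "\\1", v), "-1"))
print(res)
```

Strings such as `"."` or `"1.2.3"` still pass this pattern and become `NA` (with a warning) in `as.numeric()`, so a stricter pattern may be needed in practice.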

Heikki
  • Any chance you'll have a number after some letters? `"da23"` will return `"-1"` and not `"23"`. Is that OK? Because `"^[0-9.]+"` needs the string to start with a number. It's not the "first occurrence of a number". Maybe you can try `"[0-9.]+"` instead? – AntoniosK Oct 01 '18 at 15:01
  • Good point. I need to interpret an amount as a number, or an invalid amount as a proxy value. So in practice I should perhaps use an even stricter condition, like `"^[0-9.]+$"`. I was considering accepting different decimal separators or trailing spaces, but that would need a different regular expression. The actual problem, however, was `grep` returning `character(0)` for empty strings. – Heikki Oct 01 '18 at 17:07
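To see how the anchoring options discussed in these comments behave, a small sketch comparing the original prefix pattern, the fully anchored `"^[0-9.]+$"`, and a stricter pattern. The stricter one is an illustrative assumption, not from the thread: digits, at most one decimal point followed by digits, and optional trailing whitespace:

```r
v <- c("12", "12.5", "da23", "12abc", "1.2.3", "12 ", "")

# Original pattern: only requires the string to START with digits/dots.
prefix <- grepl("^[0-9.]+", v)
# Fully anchored: the whole string must consist of digits/dots.
whole <- grepl("^[0-9.]+$", v)
# Hypothetical stricter variant: one optional decimal part, trailing spaces allowed.
strict <- grepl("^[0-9]+(\\.[0-9]+)?[[:space:]]*$", v)

print(data.frame(v, prefix, whole, strict))
```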