I am trying to separate numbers and characters in a column of strings. So far I have been using tidyr::separate
for doing this, but am encountering errors for "unusual" cases.
Suppose I have the following data
df <- data.frame(c1 = c("5.5K", "2M", "3.1", "M"))
And I want to obtain a data frame with columns
data.frame(c2 = c("5.5", "2", "3.1", NA),
           c3 = c("K", "M", NA, "M"))
My current attempt with tidyr::separate is
library(tidyr)
df %>%
  separate(c1, into = c("c2", "c3"), sep = "(?<=[0-9])(?=[A-Za-z])")
But this only works for the first three cases. I realize this is because the lookbehind (?<=[0-9]) and the lookahead (?=[A-Za-z]) must both match at the split point, so a string with no digit before the letters, such as "M", is never split. How would one modify this code to handle the cases where the numbers are missing before the letters? I have also been trying to use the extract function, but without success.
Edit: I suppose one solution is to break this up into
library(stringr)
df$c2 <- as.numeric(str_extract(df$c1, "[0-9.]+"))  # digits and decimal point, so "5.5" is kept whole
df$c3 <- str_extract(df$c1, "[A-Za-z]+")            # letters only; NA when there are none
But I was curious whether there are other ways to handle it.
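For reference, here is a minimal sketch of what an extract-based version might look like. It assumes every string is an optional number followed by optional letters. Both capture groups are allowed to be empty, and empty pieces are turned into NA afterwards. The pattern and the name `out` are my own guesses for illustration, not something taken from the tidyr documentation.

```r
library(tidyr)

df <- data.frame(c1 = c("5.5K", "2M", "3.1", "M"))

# Assumed pattern: optional numeric part, then optional letters.
# Using * instead of + lets either group match nothing.
out <- extract(df, c1, into = c("c2", "c3"),
               regex = "^([0-9.]*)([A-Za-z]*)$")

# A group that matched nothing is treated as missing.
out$c2[out$c2 %in% ""] <- NA
out$c3[out$c3 %in% ""] <- NA

out
```

The columns stay as character here; wrapping `out$c2` in `as.numeric()` would be one way to get numbers if that is what is needed.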