I have a data frame:
x <- data.frame(a = letters[1:7], b = letters[2:8],
                c = c("bla bla [ text1 ]", "bla bla [text2]", "how how [text3 ]",
                      "wow wow [ text4a ] [ text4b ]", "ba ba [ text5a ][ text5b]",
                      "my text A", "my text B"), stringsAsFactors = FALSE)
x
I want to split column c based on what's between the square brackets [...] in it. If a value in column c contains only one set of square brackets, I want the bracketed string to go to a new column. If it contains two sets of strings surrounded by [ and ], I want only the string between the last [ ] pair to go to the new column.
Here is how I've done it. It seems complicated and I am using a loop. Is it possible to do it in a more parsimonious way?
library(stringr)
# Counting number of square brackets "[" in column c:
sqrbrack_count <- str_count(x$c, pattern = '\\[')
# Creating a new column:
x$newcolumn <- NA
for(i in seq_len(nrow(x))){ # looping through rows of x
  if(sqrbrack_count[i] == 0) next # do nothing if there are 0 square brackets
  minilist <- str_split_fixed(x[i, "c"], pattern = '\\[', n = Inf) # split string at each "["
  if(sqrbrack_count[i] == 1) { # if there is only one square bracket "["
    x[i, "c"] <- minilist[1]
    x[i, "newcolumn"] <- minilist[2]
  } else { # if there is more than one square bracket "["
    x[i, "c"] <- paste(minilist[1:2], collapse = "+")
    x[i, "newcolumn"] <- minilist[3]
  }
}
# Removing the remaining square brackets we don't need anymore:
x$c <- str_replace(x$c, pattern = " \\]", replacement = "")
x$c <- str_replace(x$c, pattern = "\\]", replacement = "")
x$newcolumn <- str_replace(x$newcolumn, pattern = " \\]", replacement = "")
x$newcolumn <- str_replace(x$newcolumn, pattern = "\\]", replacement = "")
x
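For comparison, a loop-free variant might look something like the sketch below. It uses only base R (regmatches, gregexpr, vapply, sub, gsub, trimws) and works on a fresh copy of the example data under the hypothetical name y. It is not an exact replacement for the loop above: it trims the surrounding whitespace, it assumes the last [...] group sits at the end of the string, and it leaves any earlier bracketed text in c as it is instead of joining it with "+".

```r
# Fresh copy of the example data (hypothetical name `y`, so `x` is left alone)
y <- data.frame(a = letters[1:7], b = letters[2:8],
                c = c("bla bla [ text1 ]", "bla bla [text2]", "how how [text3 ]",
                      "wow wow [ text4a ] [ text4b ]", "ba ba [ text5a ][ text5b]",
                      "my text A", "my text B"), stringsAsFactors = FALSE)

# All "[...]" groups per row; rows without brackets give character(0)
groups <- regmatches(y$c, gregexpr("\\[[^]]*\\]", y$c))

# Keep only the last group of each row, NA where there is none
last_group <- vapply(groups,
                     function(g) if (length(g) > 0) g[length(g)] else NA_character_,
                     character(1))

# Drop the enclosing brackets, then the spaces around the text
y$newcolumn <- trimws(gsub("^\\[|\\]$", "", last_group))

# Remove the last group from the end of c (assumption: it ends the string)
y$c <- trimws(sub("\\[[^]]*\\][[:space:]]*$", "", y$c))
y
```

Rows without any brackets keep their original c and get NA in newcolumn, the same as in the loop version.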