Is there a simple way to subtract strings across columns in a tibble or data.frame?

For example, in the tibble below, is there an easy way to create column b from columns a and c, similar to how I create c from a and b (i.e. c = a + b, so b = c - a)?

library(tidyverse)  # provides tibble(), str_c(), and the dplyr/tidyr verbs used below

ex1 <- tibble(a = rep(c("orange", "green", "grey"), 2),
   b = rep(c("ball", "hockey puck"), each = 3),
   c = str_c(a, " ", b))

I would want the solution to work for any number of words in each string in columns a and b.

For example I was thinking something along the lines of the below code (breaking into words and doing a pair-wise comparison) but it doesn't quite work.

ex1 %>% 
  separate_rows(c) %>% 
  filter(b != c) %>% 
  group_by(a, b) %>% 
  summarize(a2 = str_c(c, collapse = " "))

Any ideas?

ColinTea

2 Answers

Either of these should work:

ex1 %>% 
  rowwise() %>% 
  mutate( b = sub(a, "", c) %>% str_trim() )

# # A tibble: 6 x 3
#        a            b                  c
#    <chr>        <chr>              <chr>
# 1 orange         ball        orange ball
# 2  green         ball         green ball
# 3   grey         ball          grey ball
# 4 orange  hockey puck orange hockey puck
# 5  green  hockey puck  green hockey puck
# 6   grey  hockey puck   grey hockey puck

ex1 %>% mutate( b = str_replace(c, a, "") %>% str_trim() )

# # A tibble: 6 x 3
#        a           b                  c
#    <chr>       <chr>              <chr>
# 1 orange        ball        orange ball
# 2  green        ball         green ball
# 3   grey        ball          grey ball
# 4 orange hockey puck orange hockey puck
# 5  green hockey puck  green hockey puck
# 6   grey hockey puck   grey hockey puck
cmaher
  • This seems good but what happens if the values in column "a" don't work well as regular expressions? For example imagine instead of "green" the value was "grey". I guess you could avoid this with rowwise(), but then what happens if the values in a have special characters? – ColinTea Jan 25 '18 at 19:54
  • As they're written in my answer, both `sub` and `str_replace` try to use the value from `a` to match an exact substring to replace in `c`. If you want to remove the first word on a non-exact match (for, say, any color out of the set of "orange|green|grey"), then you'd need to use a regex pattern instead. – cmaher Jan 25 '18 at 20:00
  • If you have special characters, you should wrap the pattern in `fixed()`. – Gregor Thomas Jan 25 '18 at 21:11
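
To illustrate the `fixed()` suggestion from the comment above, here is a minimal sketch, assuming stringr, dplyr, and tibble are installed. The column name `b2` and the `special` tibble are made up for this example. Wrapping the pattern in `fixed()` makes `str_replace()` treat each value of `a` as a literal string, and because `str_replace()` is vectorised over both string and pattern, no `rowwise()` is needed.

```r
library(tibble)
library(dplyr)
library(stringr)

ex1 <- tibble(
  a = rep(c("orange", "green", "grey"), 2),
  b = rep(c("ball", "hockey puck"), each = 3),
  c = str_c(a, " ", b)
)

# fixed(a) is matched literally and pairwise against c (row i of a with row i of c).
res <- ex1 %>%
  mutate(b2 = str_trim(str_replace(c, fixed(a), "")))

# Hypothetical data with regex metacharacters in `a`: as plain regexes,
# "(" and "." would not be read literally; with fixed() they are.
special <- tibble(
  a = c("grey (dark)", "a.b"),
  c = c("grey (dark) ball", "a.b puck")
)

special_res <- special %>%
  mutate(b = str_trim(str_replace(c, fixed(a), "")))
```

Here `res$b2` should reproduce the original `b` column, and `special_res$b` should contain only the text left after removing the literal prefix.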
You can write a function that does this:

`%-%` <- function(x, y) sub(paste0("\\s*", y, "\\s*", collapse = "|"), "", x)
ex1$c %-% ex1$a # To obtain b, i.e. c - a
[1] "ball"        "ball"        "ball"        "hockey puck" "hockey puck" "hockey puck"
ex1$c %-% ex1$b # To obtain a, i.e. c - b
[1] "orange" "green"  "grey"   "orange" "green"  "grey"  
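
One thing to be aware of: because of `collapse = "|"`, the operator above builds a single alternation from every value in `y`, so each element of `x` loses the first match of any of those values rather than the value from its own row. That gives the expected result for `ex1`, but it can differ when one row's value also appears earlier in another row's string. The following is a sketch of a pairwise alternative in base R; the name `%minus%` is made up for this example, and it matches literally (`fixed = TRUE`) rather than as a regex.

```r
# Hypothetical pairwise operator: removes y[i] from x[i], matching y[i] literally.
`%minus%` <- function(x, y) {
  out <- mapply(
    function(xi, yi) sub(yi, "", xi, fixed = TRUE),  # first literal occurrence only
    x, y,
    USE.NAMES = FALSE
  )
  trimws(out)  # drop the leading/trailing space left behind by the removal
}

a <- rep(c("orange", "green", "grey"), 2)
b <- rep(c("ball", "hockey puck"), each = 3)
cc <- paste(a, b)

b_from_c <- cc %minus% a  # recovers b
a_from_c <- cc %minus% b  # recovers a

# A case where row order matters: each row removes only its own value.
swapped <- c("ball orange", "orange ball") %minus% c("orange", "ball")
```

Only leading and trailing whitespace is trimmed, so removing a word from the middle of a string would leave a double space behind; that case would need extra handling.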
Onyambu