1

I have a very similar task to the original poster in this thread: Create new column in dataframe based on partial string matching other column

however have 10 different conditions under TEST. There was a suggestion in the original thread how to code for >3 conditions however I wasn’t able to understand how to apply it to my data.

I want to create a column called DISTANCE that extracts the distance from the test. So for any test that includes "0.10m" in the name, I want to be able to have "0-10m" in the distance column. If it is "0.20m" in the name, I want it to be "0-20m" in the DISTANCE column and so on.

PLAYER      SEX     TEST        VALUE             
Player 1    Female    ICE_0.10m    2.100000
Player 1    Female    ICE_0.20m    3.475000
Player 1    Female    ICE_10.20m    1.375000
Player 1    Female    ICE_20.30m    1.246000
Player 1    Female    ICE_0.30m    4.721000
Player 1    Female    ICE_Vel_0.10m    4.761905
Player 1    Female    ICE_Vel_0.20m    5.755396
Player 1    Female    ICE_Vel_10.20m    7.272727
Player 1    Female    ICE_Vel_20.30m    8.025682
Player 1    Female    ICE_Vel_0.30m    6.354586
Player 1    Female    OFF_0.10m    1.983000
Player 1    Female    OFF_0.20m    3.380000
Player 1    Female    OFF_10.20m    1.397000
Player 1    Female    OFF_20.30m    1.380000
Player 1    Female    OFF_0.30m    4.760000
Player 1    Female    OFF_Vel_0.10m    5.042864
Player 1    Female    OFF_Vel_0.20m    5.917160
Player 1    Female    OFF_Vel_10.20m    7.158196
Player 1    Female    OFF_Vel_20.30m    7.246377
Player 1    Female    OFF_Vel_0.30m    6.302521

I tried this but it didn't work:

SpeedLong$Distance <- ifelse(grepl("0.10m", SpeedLong$Tag, ignore.case = T), "0-10m",
ifelse(grepl("0.20m", SpeedLong$Tag, ignore.case = T), "0-20m",
ifelse(grepl("0.30m", SpeedLong$Tag, ignore.case = T), "0-30m",
ifelse(grepl("0.10m", SpeedLong$Tag, ignore.case = T), "0-10m", "20-30m"))

With that code I don't get an error message but it shows the code in the console finishing with a + sign I guess meaning the code is incomplete? I don't know if if else and grepl are the best ways to go about this so alternative suggestions are welcome!

vb66
  • 353
  • 3
  • 14
  • The console `+` is because parentheses are incorrectly nested or not properly closed. I think your example should end with 4: `))))`. Consider `dplyr::case_when()` as an alternative to multiple `ifelse`. – neilfws Jul 26 '19 at 01:23
  • That also worked adding the four )))). Thanks! I'll try to find a way to do it with case_when() as well. The solution below worked but it's always good to have alternative ways to do something. – vb66 Jul 26 '19 at 02:34

1 Answers1

1

Instead of nested ifelse, a better option would be to extract the matching substring and change the . to - with a regex pattern. Here, we match characters (.*) until the _, capture the first set of numbers ([0-9]+) as a group ((...)) followed by the dot (\\. - dot is a metacharacter that matches any character, so we escape (\\) it to get the literal value), followed by second set of numbers in another capture group, and in the replacement use the backreference (\\1, \\2) of the captured group

library(dplyr)
library(stringr)
df1 %>% 
    mutate(DISTANCE = str_replace(TEST, ".*_([0-9]+)\\.([0-9]+)", "\\1-\\2"))
#     PLAYER    SEX           TEST    VALUE DISTANCE
#1  Player 1 Female      ICE_0.10m 2.100000    0-10m
#2  Player 1 Female      ICE_0.20m 3.475000    0-20m
#3  Player 1 Female     ICE_10.20m 1.375000   10-20m
#4  Player 1 Female     ICE_20.30m 1.246000   20-30m
#5  Player 1 Female      ICE_0.30m 4.721000    0-30m
#6  Player 1 Female  ICE_Vel_0.10m 4.761905    0-10m
#7  Player 1 Female  ICE_Vel_0.20m 5.755396    0-20m
#8  Player 1 Female ICE_Vel_10.20m 7.272727   10-20m
#9  Player 1 Female ICE_Vel_20.30m 8.025682   20-30m
#10 Player 1 Female  ICE_Vel_0.30m 6.354586    0-30m
#11 Player 1 Female      OFF_0.10m 1.983000    0-10m
#12 Player 1 Female      OFF_0.20m 3.380000    0-20m
#13 Player 1 Female     OFF_10.20m 1.397000   10-20m
#14 Player 1 Female     OFF_20.30m 1.380000   20-30m
#15 Player 1 Female      OFF_0.30m 4.760000    0-30m
#16 Player 1 Female  OFF_Vel_0.10m 5.042864    0-10m
#17 Player 1 Female  OFF_Vel_0.20m 5.917160    0-20m
#18 Player 1 Female OFF_Vel_10.20m 7.158196   10-20m
#19 Player 1 Female OFF_Vel_20.30m 7.246377   20-30m
#20 Player 1 Female  OFF_Vel_0.30m 6.302521    0-30m

Or using base R

df1$DISTANCE <- sub(".*_([0-9]+)\\.([0-9]+)", "\\1-\\2", df1$TEST)

data

df1 <- structure(list(PLAYER = c("Player 1", "Player 1", "Player 1", 
"Player 1", "Player 1", "Player 1", "Player 1", "Player 1", "Player 1", 
"Player 1", "Player 1", "Player 1", "Player 1", "Player 1", "Player 1", 
"Player 1", "Player 1", "Player 1", "Player 1", "Player 1"), 
    SEX = c("Female", "Female", "Female", "Female", "Female", 
    "Female", "Female", "Female", "Female", "Female", "Female", 
    "Female", "Female", "Female", "Female", "Female", "Female", 
    "Female", "Female", "Female"), TEST = c("ICE_0.10m", "ICE_0.20m", 
    "ICE_10.20m", "ICE_20.30m", "ICE_0.30m", "ICE_Vel_0.10m", 
    "ICE_Vel_0.20m", "ICE_Vel_10.20m", "ICE_Vel_20.30m", "ICE_Vel_0.30m", 
    "OFF_0.10m", "OFF_0.20m", "OFF_10.20m", "OFF_20.30m", "OFF_0.30m", 
    "OFF_Vel_0.10m", "OFF_Vel_0.20m", "OFF_Vel_10.20m", "OFF_Vel_20.30m", 
    "OFF_Vel_0.30m"), VALUE = c(2.1, 3.475, 1.375, 1.246, 4.721, 
    4.761905, 5.755396, 7.272727, 8.025682, 6.354586, 1.983, 
    3.38, 1.397, 1.38, 4.76, 5.042864, 5.91716, 7.158196, 7.246377, 
    6.302521)), class = "data.frame", row.names = c(NA, -20L))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    The second set of code worked! The first set of code I entered like this: SpeedLong %>% mutate(SpeedLong$Distance = str_replace(SpeedLong$Tag, ".*_([0-9]+)\\.([0-9]+)", "\\1-\\2")) I got an error message of: Error: Column `==...` must be length 2180 (the number of rows) or one, not 0 – vb66 Jul 26 '19 at 02:01
  • @VickiB. In the first set, it was not updated unless you assign i.e. `df1 <- df1 %>% mutate(..` – akrun Jul 26 '19 at 02:02
  • Thanks so much for your help! I'm new to R and have NO idea what any of the working syntax means as I've just used basic functions. Any resources you might suggest to understand what that code means? – vb66 Jul 26 '19 at 02:07
  • @VickiB If you meant the regex, then https://www.regular-expressions.info/tutorial.html would get an idea about the syntax (which is more general as it is used in mulitiple languages). Regarding the tidyverse, [vignettes](https://dplyr.tidyverse.org/) could help you in understanding the syntax – akrun Jul 26 '19 at 02:10
  • 1
    Thank you @akrun ! – vb66 Jul 26 '19 at 02:28