I am trying to filter strings in R. I have several hierarchy levels and need to group them together.

I have prepared an example:


library(stringr)
library(tidyverse)

numbers <- tibble(LEVEL = c('0.1', '0.1.1', '0.1.2', '0.11', '0.12', '0.11.1', '0.12.1', '0.12.2'))



# This returns other values as well. The first group should contain only: 0.1, 0.1.1, 0.1.2
numbers %>% 
  filter(grepl("^0.1.?", LEVEL))


# The second group should contain only: 0.11, 0.11.1
# The third group should contain only: 0.12, 0.12.1, 0.12.2

The string pattern I used in grepl is not specific enough.

Ronak Shah
Petr

2 Answers

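You are right: the regex pattern you supplied is not specific enough to extract the levels you want. An unescaped . matches any character, and ^0.1.? has no anchor after the 1, so it matches every value that starts with 0, any character, and 1, including 0.11 and 0.12.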
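A quick check in base R makes the problem visible. This is only a minimal sketch: it reuses the LEVEL values from the question as a plain character vector, and the name levels_chr is made up for illustration.

```r
levels_chr <- c("0.1", "0.1.1", "0.1.2", "0.11", "0.12", "0.11.1", "0.12.1", "0.12.2")

# Unescaped dots match any character and nothing constrains what follows "0.1",
# so every value in the vector passes the test.
loose <- grepl("^0.1.?", levels_chr)
all(loose)  # TRUE

# Escaping the dots and requiring either a literal dot or the end of the
# string right after "0.1" keeps only that branch of the hierarchy.
strict <- levels_chr[grepl("^0\\.1(\\.|$)", levels_chr)]
strict  # "0.1" "0.1.1" "0.1.2"
```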

The following code is probably what you are looking for.

numbers %>% 
  filter(grepl("^[0]{1}\\.[1]{1}$|^[0]{1}\\.[1]{1}\\.", LEVEL))
# A tibble: 3 x 1
  LEVEL
  <chr>
1 0.1  
2 0.1.1
3 0.1.2

Next we want only 0.11 and 0.11.1, i.e. the second component is 11 and may be followed by another dot. We modify the code above to accommodate that change.

numbers %>% 
  filter(grepl("^[0]{1}\\.(11){1}$|^[0]{1}\\.(11){1}\\.", LEVEL))

Here, the number 11 that we want to isolate is placed in a group that must occur exactly once ({1}). The {1} quantifier is optional, since a group matches exactly once by default. Similarly, we can write

numbers %>% 
  filter(grepl("^[0]{1}\\.(12){1}$|^[0]{1}\\.(12){1}\\.", LEVEL))
# A tibble: 3 x 1
  LEVEL 
  <chr> 
1 0.12  
2 0.12.1
3 0.12.2

to get those with pattern 12.
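Since the three filters differ only in the prefix, the same rule can also be sketched without a regex at all. The helper in_branch below is hypothetical (it is not part of dplyr or stringr); it assumes that a level belongs to a branch when it equals the prefix or starts with the prefix followed by a dot.

```r
levels_chr <- c("0.1", "0.1.1", "0.1.2", "0.11", "0.12", "0.11.1", "0.12.1", "0.12.2")

# Hypothetical helper: TRUE for the prefix itself and for anything below it.
in_branch <- function(level, prefix) {
  # An exact match covers "0.11"; the trailing dot keeps "0.11.1" but rejects
  # "0.1.1", because "0.1.1" does not start with "0.11.".
  level == prefix | startsWith(level, paste0(prefix, "."))
}

branch_1  <- levels_chr[in_branch(levels_chr, "0.1")]
branch_11 <- levels_chr[in_branch(levels_chr, "0.11")]
branch_12 <- levels_chr[in_branch(levels_chr, "0.12")]
```

Because in_branch is vectorised over level, it should also work inside a dplyr pipeline, e.g. numbers %>% filter(in_branch(LEVEL, "0.11")).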

Taufi

The regex patterns can be formulated in a more concise way:

numbers %>% 
  filter(grepl("^0\\.1$|^0\\.1\\.", LEVEL))   # 0.1, 0.1.1, 0.1.2
numbers %>% 
  filter(grepl("^0\\.11$|^0\\.11\\.", LEVEL)) # 0.11, 0.11.1
numbers %>% 
  filter(grepl("^0\\.12$|^0\\.12\\.", LEVEL)) # 0.12, 0.12.1, 0.12.2
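
If the goal is to form all groups in one pass rather than filter them one at a time, a group key can be derived from each value. The sketch below assumes that a group is defined by the first two dot-separated components, which holds for the example data but may not hold for deeper hierarchies; levels_chr and group_key are illustrative names.

```r
levels_chr <- c("0.1", "0.1.1", "0.1.2", "0.11", "0.12", "0.11.1", "0.12.1", "0.12.2")

# Keep the first two dot-separated components and drop the rest,
# e.g. "0.12.2" becomes "0.12" (assumed grouping rule).
group_key <- sub("^([^.]+\\.[^.]+).*$", "\\1", levels_chr)

# One list element per key, with the original order kept inside each group.
groups <- split(levels_chr, group_key)
groups
```

With the tibble from the question, the same key could be added as a column with mutate() and then passed to group_by().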
Chris Ruehlemann