1

I'm trying to filter rows containing the string "Data\\this\\way\\test", but I don't understand why this does not work. Ideally, I would expect to see output like:

  "D:\\Data\\this\\way\\test\\dat1",
  "D:\\Data\\this\\way\\test\\dat2",

My code:

library(dplyr)    # %>% and filter()
library(stringr)  # str_detect()

files <- c(
  "D:\\Data\\this\\way\\test\\dat1",
  "D:\\Data\\this\\way\\test\\dat2",
  "D:\\Data\\not-this\\way\\test\\dat1",
  "D:\\Data\\not-this\\way\\test\\dat2"
)

files_filt_df <- data.frame(filenames = files, 
                             stringsAsFactors = FALSE) %>%
  filter(str_detect(filenames,"Data\\this\\way\\test"))
files_filt_df
#> [1] filenames
#> <0 rows> (or 0-length row.names)
val
    "\\" is an escaped "\". To get two escaped "\", use "\\\\". Or "Data\\\\this\\\\way\\\\test" in your case. – dipetkov Nov 01 '19 at 21:01

3 Answers

2

By default, str_detect treats the pattern as a regular expression. Sequences like \w and \t have special meaning in regular expressions, so the backslashes in your pattern are not matched literally. If you just want to match a literal value, the easiest way would be

files_filt_df <- data.frame(filenames = files, 
                            stringsAsFactors = FALSE) %>%
  filter(str_detect(filenames,fixed("Data\\this\\way\\test")))

Or, if you want to use a regular expression, you need to add an additional level of escaping on the backslashes

files_filt_df <- data.frame(filenames = files, 
                            stringsAsFactors = FALSE) %>%
  filter(str_detect(filenames,"Data\\\\this\\\\way\\\\test"))
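
To see the two levels of escaping at work, here is a minimal sketch in base R. It uses grepl as a stand-in for str_detect so that nothing beyond base R is needed; the variable names are made up for illustration.

```r
# In R source, "\\" denotes a single backslash character.
path <- "D:\\Data\\this\\way\\test\\dat1"
nchar("\\")        # 1: one character, not two
writeLines(path)   # shows the string as stored: D:\Data\this\way\test\dat1

# Literal matching: the pattern is compared character by character,
# so one level of escaping (for the R parser) is enough.
literal_hit <- grepl("Data\\this\\way\\test", path, fixed = TRUE)

# Regex matching: the regex engine also treats "\" as an escape character,
# so each literal backslash needs "\\" in the regex, i.e. "\\\\" in R source.
regex_hit <- grepl("Data\\\\this\\\\way\\\\test", path)

literal_hit  # TRUE
regex_hit    # TRUE
```

Both calls find the path; the only difference is who interprets the backslashes: just the R parser in the first case, the R parser and then the regex engine in the second.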
MrFlick
2

Since these are filenames, you can also use the fs package to check whether a file has a particular parent and let fs deal with the file separators.

library("tidyverse")

files <- c(
  "D:\\Data\\this\\way\\test\\dat1",
  "D:\\Data\\this\\way\\test\\dat2",
  "D:\\Data\\not-this\\way\\test\\dat1",
  "D:\\Data\\not-this\\way\\test\\dat2"
)

tibble(
  file = files
) %>%
  filter(map_lgl(file, ~ fs::path_has_parent(., "D:/Data/this/way")))
#> # A tibble: 2 x 1
#>   file                             
#>   <chr>                            
#> 1 "D:\\Data\\this\\way\\test\\dat1"
#> 2 "D:\\Data\\this\\way\\test\\dat2"

# Explanation:

# The `map_lgl` applies `fs::path_has_parent` to each file
# and returns TRUE/FALSE (logical = lgl) values.

# Without `map`:
fs::path_has_parent(files, "D:/Data/this/way")
#> [1] FALSE

# With `map`:
map_lgl(files, ~ fs::path_has_parent(., "D:/Data/this/way"))
#> [1]  TRUE  TRUE FALSE FALSE

# The `~` operator creates a formula, which purrr converts into a function;
# inside it, `.` stands for the current element.
# Here it is shorter than defining an inline function.

# Formula:
map_lgl(files, ~ fs::path_has_parent(., "D:/Data/this/way"))
#> [1]  TRUE  TRUE FALSE FALSE

# Function:
map_lgl(files, function(x) fs::path_has_parent(x, "D:/Data/this/way"))
#> [1]  TRUE  TRUE FALSE FALSE

Created on 2019-11-02 by the reprex package (v0.3.0)
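
If fs or purrr is not available, roughly the same per-element check can be sketched in base R. This assumes the paths only need their backslashes converted to forward slashes before a prefix comparison, which is a simplification of what fs::path_has_parent does. The names `normalized` and `has_parent` are made up for this example.

```r
files <- c(
  "D:\\Data\\this\\way\\test\\dat1",
  "D:\\Data\\this\\way\\test\\dat2",
  "D:\\Data\\not-this\\way\\test\\dat1",
  "D:\\Data\\not-this\\way\\test\\dat2"
)

# Replace every literal backslash with a forward slash.
normalized <- gsub("\\", "/", files, fixed = TRUE)

# vapply plays the role of map_lgl: it calls the function once per element
# and requires each result to be a single logical value.
# (startsWith is already vectorized; vapply is used only to mirror map_lgl.)
has_parent <- vapply(
  normalized,
  function(x) startsWith(x, "D:/Data/this/way/"),
  logical(1),
  USE.NAMES = FALSE
)

has_parent         # TRUE TRUE FALSE FALSE
files[has_parent]  # the two paths under D:\Data\this\way
```

The trailing slash in the prefix keeps a sibling directory such as "D:/Data/this/way-other" from being counted as a match.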

dipetkov
  • could you explain what map_lgl is doing and how to interpret the `~` and `.`. I'm still learning the map() function and often confused by the use of it with and without `~`. – val Nov 02 '19 at 17:44
    The use of `~` in purrr is subtle. Here is a nice explanation: https://stackoverflow.com/questions/44834446/what-is-meaning-of-first-tilde-in-purrrmap – dipetkov Nov 02 '19 at 18:31
0

Base R solution (find the literal pattern in files, return all values matching pattern):

data.frame(files = grep("\\Data\\this\\way\\", files, value = TRUE, fixed = TRUE), stringsAsFactors = FALSE)
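
As a small illustration of what the `value` and `fixed` arguments change, the sketch below runs the same literal pattern through grep and grepl. The `pattern` variable is introduced only for this example.

```r
files <- c(
  "D:\\Data\\this\\way\\test\\dat1",
  "D:\\Data\\this\\way\\test\\dat2",
  "D:\\Data\\not-this\\way\\test\\dat1",
  "D:\\Data\\not-this\\way\\test\\dat2"
)

# With fixed = TRUE the pattern is a literal string: \Data\this\way\
pattern <- "\\Data\\this\\way\\"

idx     <- grep(pattern, files, fixed = TRUE)                # positions of matches
matches <- grep(pattern, files, fixed = TRUE, value = TRUE)  # the matching strings
hits    <- grepl(pattern, files, fixed = TRUE)               # one logical per element

idx    # 1 2
hits   # TRUE TRUE FALSE FALSE
```

The logical vector from grepl is the form that fits a row filter such as `df[hits, ]`, while `value = TRUE` is convenient when only the matching strings are needed.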
hello_friend