
I am developing a Shiny app, which generates folders and subfolders for each user and each of their experiments.

I wish to ensure that neither the user nor experiment names contain any illegal characters.

I define a character vector containing every illegal character I know of; however, I could easily miss some. Is there a more precise way of doing this?

dir <- "~/home/my_app_data"
usr <- "john"
exp <- "explosion`s"
path <- paste(dir, usr, exp, sep = "/")
illegal <- c(" ", ",", "`")

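# Split the path into single characters and check them against the blacklist
# (base R only, so magrittr's %>% is not needed)
if (any(illegal %in% unlist(strsplit(x = path, split = "")))) {
  stop("Illegal characters used")
} else {
  dir.create(path, recursive = TRUE)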
}
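Instead of maintaining a list of forbidden characters, the check can be inverted so that only explicitly allowed characters pass. The sketch below assumes that ASCII letters and digits are the only characters you want to allow; `is_legal_name()` and `allowed` are hypothetical names, while `letters` and `LETTERS` are base R constants. It is meant to be applied to each name component (`usr`, `exp`), not to the full `path`, which legitimately contains `/` and `~`.

```r
# Hypothetical whitelist: ASCII letters and digits only (an assumption,
# adjust to whatever your app should accept)
allowed <- c(letters, LETTERS, as.character(0:9))

# Hypothetical helper: TRUE if `x` is a non-empty single string whose
# characters all appear in `allowed`
is_legal_name <- function(x) {
  chars <- strsplit(x, split = "")[[1]]
  nzchar(x) && all(chars %in% allowed)
}

is_legal_name("john")         # TRUE
is_legal_name("explosion`s")  # FALSE
```

Because anything not listed is rejected, characters from other keyboard layouts are refused by default rather than slipping through.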
jay.sf
    It might be more robust to only allow legal characters. You are sure to miss some illegal characters from different language keyboards etc. – Daniel O Jul 03 '20 at 11:24
  • You're right! I considered using the `letters` and numbers objects for this check, in case there is no R object/function made for this purpose! – Kasper Thystrup Karstensen Jul 03 '20 at 11:31
  • Sure there is: `grepl("[^A-z0-9]", filenames)` will return `TRUE` if a string contains any non-alphanumeric character. – Daniel O Jul 03 '20 at 11:34
  • @DanielO - You can't use `[A-z]`, there are half a dozen symbols between `Z` and `a` on the ASCII table - some of which are illegal for filepaths. – Ritchie Sacramento Jul 03 '20 at 11:56
    @27ϕ9 you're right, `grepl("\\W", filenames)` could be more appropriate. – Daniel O Jul 03 '20 at 12:09
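The `[A-z]` pitfall from the comments can be checked directly. In ASCII, `A` to `Z` are codes 65 to 90 and `a` to `z` are 97 to 122, so the range `A-z` also covers `[`, `\`, `]`, `^`, `_` and the backtick. The sketch uses `perl = TRUE` on the assumption that PCRE interprets ranges by code point, since the R documentation warns that range interpretation in the default engine is locale-dependent.

```r
name <- "explosion`s"

# [A-z] includes the backtick (code 96), so it is not flagged
grepl("[^A-z0-9]", name, perl = TRUE)     # FALSE
# Spelling out both letter ranges excludes the six symbols in between
grepl("[^A-Za-z0-9]", name, perl = TRUE)  # TRUE
# \W matches any non-word character (word = letter, digit or "_")
grepl("\\W", name, perl = TRUE)           # TRUE
```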

2 Answers


Using grepl(): the pattern "\\W" matches any non-word character, i.e. anything other than a letter, a digit, or the underscore "_", so underscores are allowed.

FUN <- function(x) {
  if (grepl("\\W", x)) stop(sprintf("Illegal characters used in '%s'", x)) else x
}

FUN(usr)
# [1] "john"

FUN(exp)
# Error in FUN(exp) : Illegal characters used in 'explosion`s'

lapply(c(usr, exp), FUN)
# Error in FUN(X[[i]], ...) : Illegal characters used in 'explosion`s' 

FUN("john123")
# [1] "john123"

FUN("john_123")
# [1] "john_123"

(Of course, you will want to define your own `else` branch, for example one that calls dir.create().)
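One possible `else` branch is sketched below: each name component is validated with the same `"\\W"` test as `FUN`, and only then is the directory created. `create_experiment_dir()` is a hypothetical helper, not part of the answer; rejecting empty names is an added assumption, because `grepl("\\W", "")` is `FALSE`. Whether accented letters such as `é` count as word characters may depend on the locale, so this is not a strict ASCII-only check.

```r
# Hypothetical helper: validate usr and exp, then build and create the path
create_experiment_dir <- function(dir, usr, exp) {
  for (part in c(usr, exp)) {
    # Reject empty names and names containing any non-word character
    if (!nzchar(part) || grepl("\\W", part)) {
      stop(sprintf("Illegal characters used in '%s'", part))
    }
  }
  path <- file.path(dir, usr, exp)
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  path
}

# Usage with a temporary directory
create_experiment_dir(tempdir(), "john", "experiment_1")
```

`file.path()` joins the components with `/`, which avoids building the path by hand with `paste()`.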

jay.sf
dir <- "~/home/my_app_data"
usr <- "john"
exp <- "explosion`s"
path <- paste(dir, usr, exp, sep = "/")

This should prevent most errors:

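# iconv() returns NA for non-ASCII input; gsub() then replaces every run of
# non-alphanumeric characters with "_". If the result differs from the
# original name, the name contained a disallowed character.
if (!identical(gsub("[^[:alnum:]]+", "_",
                    iconv(exp, from = "ascii", to = "utf-8")),
               exp)) {
  stop("Illegal characters used")
} else {
  dir.create(path, recursive = TRUE)
}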
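The check above can be unpacked step by step. The sketch assumes that `iconv()` with its default `sub = NA` returns `NA` for a string containing bytes that are not valid ASCII, and that `gsub()` propagates that `NA`, so `identical()` returns `FALSE` and the name is rejected. The variable `pattern` is introduced only for readability.

```r
exp <- "explosion`s"
pattern <- "[^[:alnum:]]+"

# 1. Non-ASCII input becomes NA, which can never be identical to the original
iconv("caf\u00e9", from = "ASCII", to = "UTF-8")

# 2. Each run of non-alphanumeric characters collapses to a single "_"
gsub(pattern, "_", exp)                                  # "explosion_s"

# 3. A changed string means a disallowed character was present
identical(gsub(pattern, "_", exp), exp)                  # FALSE

# Side effect: a lone "_" is replaced by itself and passes,
# while "__" collapses to "_" and is rejected
identical(gsub(pattern, "_", "john_123"), "john_123")    # TRUE
identical(gsub(pattern, "_", "john__123"), "john__123")  # FALSE
```

So the test effectively allows ASCII letters, digits, and isolated underscores.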
hello_friend
  • Thank you so much for your help! My regex skills are not the strongest; would you mind elaborating a bit (in the answer) on why this should mostly work? – Kasper Thystrup Karstensen Jul 03 '20 at 12:18
  • The conversion from ASCII to UTF-8 encoding should handle the ASCII characters that are illegal for file paths, substituting them with UTF-8 characters. Replacing anything non-alphanumeric with "_" handles the bulk of the illegal characters. I am not 100% confident this will weed out all illegal characters, but it should catch most. – hello_friend Jul 03 '20 at 14:38