I need a way to split a string every n letters.

For example, with s = "QW%ERT%ZU%I%O%P" and n = 3, I want to obtain "QW%E" "RT%Z" "U%I%O" "%P".

As you can see, the special character "%" is not counted when deciding where to split.

I tried with

strsplit(s, "(?<=.{10})(?=.*\\%)", perl = TRUE)[[1]]

but I cannot find a way to obtain what I want.

user438383
    Another idea: [`strsplit(s, "(?:\\PL*\\pL){3}\\K", perl=T)[[1]]`](https://tio.run/##K/r/v1jBRldBKTBc1TUoRDUqVNVT1V81QImLq7ikqLggJ7NEo1hHQUnD3iomJsBHKyamwEez2rg2JsZbSUehILUoxzZEMzraMDb2/38A) – bobble bubble Jun 19 '23 at 21:00

2 Answers

What about regmatches (instead of strsplit) like below?

> n <- 3

> regmatches(s, gregexpr(sprintf("(\\W?\\w){1,%i}", n), s))
[[1]]
[1] "QW%E"  "RT%Z"  "U%I%O" "%P"
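
One caveat with the regex above: `\\w` also matches digits and underscores, and `\\W?` allows at most one non-word character before each counted character. If only letters should count and several special characters may appear in a row, a variant along these lines might work. This is only a minimal sketch: the helper name `split_every_n_letters` is made up for illustration, and it assumes the string ends with a letter (any trailing non-letters are dropped).

```r
# Hypothetical helper (not part of the original answer): count only letters,
# and let any run of non-letters ride along with the letter that follows it.
split_every_n_letters <- function(s, n) {
  stopifnot(is.character(s), length(s) == 1, n >= 1)
  # (?: non-letters* letter ){1,n}, e.g. {1,3} for n = 3
  pattern <- sprintf("(?:[^[:alpha:]]*[[:alpha:]]){1,%d}", as.integer(n))
  regmatches(s, gregexpr(pattern, s, perl = TRUE))[[1]]
}

split_every_n_letters("QW%ERT%ZU%I%O%P", 3)
# [1] "QW%E"  "RT%Z"  "U%I%O" "%P"

split_every_n_letters("Q%%WERT", 3)
# [1] "Q%%WE" "RT"
```

The quantifier is greedy, so each match takes up to n letters together with whatever non-letters precede them.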

Or tapply + strsplit

v <- unlist(strsplit(s, ""))
l <- which(grepl("\\w", v))
tapply(
    v,
    cumsum(seq_along(v) %in% (1 + by(l, ceiling(seq_along(l) / n), max))),
    paste0,
    collapse = ""
)

which gives

      0       1       2       3
 "QW%E"  "RT%Z" "U%I%O"    "%P"
ThomasIsCoding
Much less succinct than the above, but a base R solution all the same:

# Function to consider only certain characters, defined by a regex,
# and split a string scalar into separate elements of a vector
split_string_to_vec <- function(s, n, consider_elements_pattern = "[[:alpha:]]"){
  # Ensure s is a character scalar:
  stopifnot(is.character(s) && length(s) == 1)
  # Ensure n is an integer scalar: 
  stopifnot(is.numeric(n) && length(n) == 1)
  # Split the string into separate elements:
  # str_vec => character vector
  str_vec <- unlist(strsplit(s, ""))
  # Assign an index to the string vector: 
  # idx => named integer vector
  idx <- setNames(seq_len(length(str_vec)), str_vec)
  # Resolve which values are to be considered (letters only, by default):
  # considered_vals => named integer vector
  considered_vals <- idx[grepl(consider_elements_pattern, names(idx))]
  # Split the string vector into a list: 
  # grpd_strings => list of character vectors
  grpd_strings <- split(
    considered_vals,
    ceiling(seq_along(considered_vals) / n)
  )
  # For each string group, resolve the group with the 
  # appropriate characters in order: res_vec => character vector
  res_vec <- vapply(
    seq_along(grpd_strings),
    function(i){
      # Get current list element: 
      curr <- grpd_strings[[i]]
      # If it's the first element: 
      if(i == 1){
        # Ignore previous element only focus on this 
        # one: ir => named integer vector
        ir <- sort(c(curr, idx[min(curr):max(curr)]))
      # Otherwise:
      }else{
        # Resolve the previous element: 
        prev <- grpd_strings[[(i-1)]]
        # ir => named integer vector
        ir <- sort(c(curr, idx[(max(prev)+1):max(curr)]))
      }
      # Flatten result into a unique (by idx) string: 
      # character scalar => env
      paste0(
        names(
          subset(
            ir,
            !(duplicated(ir))
          )
        ),
        collapse = ""
      )
    },
    character(1)
  )
  # Explicitly define the returned object:
  # character vector => env
  return(res_vec) 
}
# Input Data:
# s => string scalar
s <- "QW%ERT%ZU%I%O%P"
# n => integer scalar
n <- 3
# Apply the function: string scalar => stdout(console)
split_string_to_vec(s, n, consider_elements_pattern = "[[:alpha:]]")
hello_friend