
I am working on some raw text and want to replace every run of multiple spaces with a single space. Normally I would use stringr's str_squish, but unfortunately it also removes line breaks (\n and \r), which I have to keep.

Any ideas? My attempts are below. Many thanks!

library(tidyverse)
x <- "hello     \n\r how are you \n\r    all good?"
str_squish(x)
#> [1] "hello how are you all good?"
str_replace_all(x, "[:space:]+", " ")
#> [1] "hello how are you all good?"
str_replace_all(x, "\\s+", " ")
#> [1] "hello how are you all good?"

Created on 2020-07-01 by the reprex package (v0.3.0)

zoowalk

2 Answers


With stringr, you can use the \h shorthand character class, which matches any horizontal whitespace.

library(stringr)
x <- "hello     \n\r how are you \n\r    all good?"
str_replace_all(x, "\\h+", " ")
## [1] "hello \n\r how are you \n\r all good?"

In base R, the same pattern works if you enable the PCRE engine with perl=TRUE:

gsub("\\h+", " ", x, perl=TRUE)


If you want to match any whitespace other than CR and LF (including form feeds and some Unicode line breaks), you can use the [^\S\r\n] pattern:

str_replace_all(x, "[^\\S\r\n]+", " ")
gsub("[^\\S\r\n]+", " ", x, perl=TRUE)
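The two patterns differ only on whitespace that is neither horizontal nor CR/LF. A minimal sketch, assuming a form feed (\f) as the test character and base R's PCRE engine; the input string and variable names are made up for illustration and are not from the question:

```r
# Illustrative input (not from the question): two form feeds between words,
# plus a run of spaces before a line break.
z <- "a\f\fb   \n c"

# \h covers horizontal whitespace only, so the form feeds are left alone.
with_h <- gsub("\\h+", " ", z, perl = TRUE)

# [^\S\r\n] covers every \s character except CR and LF, so the form feeds
# are collapsed as well, while the line break survives.
with_neg <- gsub("[^\\S\r\n]+", " ", z, perl = TRUE)

print(with_h)
print(with_neg)
```

If the text only ever contains spaces and tabs besides the line breaks, both patterns should give the same result, and \h is the shorter one to write.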
Wiktor Stribiżew

You could just use a literal space in the regex instead of \\s or [:space:]:

str_replace_all(x, " +", " ") %>%
    cat()

hello 
 how are you 
 all good?

You can also include tabs by using [ \t], [:blank:], or \\h instead of a literal space. If you only want to replace runs of two or more characters, use {2,} so you don't have to write the class twice (i.e. [:blank:][:blank:]+):

y <- "hello     \n\r\t\thow are you \n\r    all   good?"

str_replace_all(y, "[:blank:]{2,}", " ") %>%
    cat()

hello 
 how are you 
 all good?
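One caveat with {2,}, sketched below with base R's gsub so that no packages are needed: a single tab is a run of length one and is left untouched, whereas + also turns it into a space. The input string is made up for illustration.

```r
# Illustrative input (not from the question): a lone tab and a run of spaces.
w <- "a\tb    c\n\rd"

# {2,} only rewrites runs of two or more blanks; the lone tab stays a tab.
two_plus <- gsub("[[:blank:]]{2,}", " ", w)

# + rewrites every run, including the single tab; line breaks are untouched.
one_plus <- gsub("[[:blank:]]+", " ", w)

print(two_plus)
print(one_plus)
```

Which quantifier fits depends on whether lone tabs should be normalized to spaces or kept as they are.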
divibisan
    There is also `[[:blank:]]`, which matches spaces and tabs. Since OP only wants to replace runs of two or more spaces, you could state that explicitly with `[[:blank:]]{2,}`, although for this input you would get the same result. – rpolicastro Jul 01 '20 at 14:13