34

Here is a function I wrote to break a long string into lines no longer than a given length:

strBreakInLines <- function(s, breakAt=90, prepend="") {
  words <- unlist(strsplit(s, " "))
  if (length(words)<2) return(s)
  wordLen <- unlist(Map(nchar, words))
  lineLen <- wordLen[1]
  res <- words[1]
  lineBreak <- paste("\n", prepend, sep="")
  for (i in 2:length(words)) {
    lineLen <- lineLen+wordLen[i]
    if (lineLen < breakAt) 
      res <- paste(res, words[i], sep=" ")
    else {
      res <- paste(res, words[i], sep=lineBreak)
      lineLen <- 0
    }
  }
  return(res)
}

It works for the problem I had, but I wonder if I can learn something here. Is there a shorter or more efficient solution? In particular, can I get rid of the for loop?

Karsten W.

4 Answers

70

How about this:

gsub('(.{1,90})(\\s|$)', '\\1\n', s)

This breaks the string "s" into lines of at most 90 characters (not counting the line break character "\n", but counting the spaces between words). The one exception is a word that is itself longer than 90 characters: it is not split and occupies a whole line of its own.
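To see what the pattern does, here is a small sketch with the limit lowered to 10 so the breaks are easy to follow. The helper name `wrap_regex` and the sample string are made up for illustration and are not part of the answer. Note that the result ends with a trailing "\n", because the last chunk is matched through `$` and gets a line break appended like every other chunk.

```r
# Hypothetical helper: same pattern as above, with the width as a parameter.
wrap_regex <- function(s, width = 90) {
  pattern <- sprintf("(.{1,%d})(\\s|$)", as.integer(width))
  gsub(pattern, "\\1\n", s)
}

s <- "aaa bbb ccc ddd eee"
wrapped <- wrap_regex(s, width = 10)
cat(wrapped)

# Split the result back into lines and check that none exceeds the limit.
lines <- strsplit(wrapped, "\n", fixed = TRUE)[[1]]
stopifnot(all(nchar(lines) <= 10))
```

If the trailing line break is unwanted, something like `sub("\n$", "", wrapped)` should remove it.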

By the way, your function seems broken. You should replace

lineLen <- 0

with

lineLen <- wordLen[i]
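
For reference, here is a sketch of the question's function with that one-line fix applied. The name `strBreakInLinesFixed` is made up to keep it apart from the original, and `nchar(words)` stands in for the `Map` call since `nchar` is already vectorized. One caveat that the fix does not address: the counter still adds up word lengths only, so the spaces between words (and the `prepend` string) are not counted, and a line can come out somewhat longer than `breakAt`.

```r
# Sketch of the question's function with the fix applied.
strBreakInLinesFixed <- function(s, breakAt = 90, prepend = "") {
  words <- unlist(strsplit(s, " "))
  if (length(words) < 2) return(s)
  wordLen <- nchar(words)
  lineLen <- wordLen[1]
  res <- words[1]
  lineBreak <- paste("\n", prepend, sep = "")
  for (i in 2:length(words)) {
    lineLen <- lineLen + wordLen[i]
    if (lineLen < breakAt) {
      res <- paste(res, words[i], sep = " ")
    } else {
      res <- paste(res, words[i], sep = lineBreak)
      lineLen <- wordLen[i]  # the new line already holds words[i]
    }
  }
  res
}

cat(strBreakInLinesFixed("aa bb cc dd ee ff", breakAt = 5), "\n")
```

With `lineLen <- 0` the same call would put three words on the second line, because the first word after each break is never counted.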
xiechao
41

For the sake of completeness, Karsten W.'s comment points at strwrap, which is the easiest function to remember:

strwrap("Lorem ipsum... you know the routine", width=10)

and to match exactly the solution proposed in the question, the string has to be pasted afterwards:

paste(strwrap(s,90), collapse="\n")
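
A quick sketch of both forms side by side, using a made-up sample string and a small width. `strwrap` returns a character vector with one element per line, and as far as I recall from its documentation the lines are kept shorter than `width`, so `width=90` allows at most 89 characters. The `prefix` and `initial` arguments can roughly stand in for the `prepend` argument of the question's function.

```r
s <- "The quick brown fox jumps over the lazy dog and keeps running"

# One element per output line
lines <- strwrap(s, width = 20)
print(lines)

# Collapse into a single string, as in the question
wrapped <- paste(lines, collapse = "\n")
cat(wrapped, "\n")

# Roughly mimic the question's `prepend` argument: prefix every line
# except the first (initial = "" leaves the first line untouched).
quoted <- paste(strwrap(s, width = 20, prefix = "> ", initial = ""),
                collapse = "\n")
cat(quoted, "\n")
```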

This post is deliberately made community wiki since the honor of finding the function isn't mine.

teukkam
Deer Hunter
    If you need this as a function, you can also wrap `strwrap` in `sapply`, as in the following user-defined function: `trimmer <- function(x, break_limit) { sapply(strwrap(x, break_limit, simplify=FALSE), paste, collapse="\n") }` – Dave Gruenewald Aug 04 '16 at 19:50
21

For further completeness, there's:

  • stringi::stri_wrap
  • stringr::str_wrap (which ultimately just calls stringi::stri_wrap)

The stringi version will deal with character sets better (it's built on the ICU library) and it's in C/C++ so it'll ultimately be faster than base::strwrap. It's also vectorized over the str parameter.
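
A minimal sketch of both calls, assuming the stringi and stringr packages are installed (the `requireNamespace` guards skip the calls otherwise). The sample strings are made up. `stri_wrap` returns one element per line, much like `strwrap`, while `str_wrap` returns a single string with the line breaks already inserted.

```r
s <- "The quick brown fox jumps over the lazy dog and keeps running"

if (requireNamespace("stringi", quietly = TRUE)) {
  # Character vector, one element per line (like base::strwrap)
  print(stringi::stri_wrap(s, width = 20))

  # Vectorized over `str`: simplify = FALSE gives one vector per input string
  print(stringi::stri_wrap(c(s, "short text"), width = 20, simplify = FALSE))
}

if (requireNamespace("stringr", quietly = TRUE)) {
  # Single string with "\n" already inserted
  cat(stringr::str_wrap(s, width = 20), "\n")
}
```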

hrbrmstr
5

You can look at, e.g., the write.dcf() function in R itself; it also uses a loop, so there is nothing to be ashamed of here.

The first goal is to get it right; see Chambers (2008).

Dirk Eddelbuettel