
I have a few really long lines of text that I'd basically like to hard wrap (break) at word boundaries at or before the 80-character mark. However, I also need to prepend characters to each newly broken line, like so:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque viverra euismod pulvinar. Fusce quis nibh commodo, commodo massa eu, ultricies nisi. Phasellus ac nulla odio.
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque viverra
    \ euismod pulvinar. Fusce quis nibh commodo, commodo massa eu, ultricies
    \ nisi. Phasellus ac nulla odio.

I've found many methods to break long lines, but none that I've been able to modify to generate output like the above, and I can't simply tack it on after the fact because that would expand the lines beyond 80 characters again.

Can anyone recommend a Vim-compatible regex or native commands to do the above formatting? Something using sed or fmt or other external tools also welcome, but something I can use within Vim would be preferred in this case.

What I've found and tried so far:

I'm sure there's something fairly simple I'm missing, but I'm stuck on how to make the break conditional, i.e. break at the last space before the 80-character mark. Any suggestions would be very much appreciated. Thanks.
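For reference, outside Vim, Python's standard `textwrap` module does this kind of wrap: `subsequent_indent` is prepended to every line after the first and counts toward `width`, so the prefix cannot push a continuation line past 80. A minimal sketch, assuming the prefix is four spaces plus `\ ` as in the sample output (the names `PREFIX` and `hard_wrap` are hypothetical):

```python
import textwrap

# Continuation prefix taken from the sample output: four spaces, a backslash, a space.
# Assumption: spaces rather than a TAB, since textwrap measures the prefix with len().
PREFIX = "    \\ "


def hard_wrap(line: str, width: int = 80) -> str:
    """Wrap one long line at spaces so that no output line exceeds `width`,
    counting PREFIX against the width of every continuation line."""
    return textwrap.fill(
        line,
        width=width,
        subsequent_indent=PREFIX,  # prepended to every line after the first
        break_long_words=False,    # never split inside a word
        break_on_hyphens=False,    # only break at whitespace
    )
```

To use it from inside Vim, the function could sit in a small script that reads stdin and prints `hard_wrap()` of each line, and the buffer could then be filtered through it with `:%!python3 wrap.py` (the script name is hypothetical).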

Wiktor Stribiżew
Jared

2 Answers


You need to amend three things:

  • Replace the exactly-80 quantifier with a zero-to-80 quantifier, \{0,80} (or \{0,78}, as it seems you want an 80-2=78 limit: \t\\ adds two extra characters to each line you create)
  • Add a trailing word boundary at the end, \>
  • Add \t\\ to the replacement pattern to insert a TAB char and a \ at the start of each created line.

You can use

:%s/.\{0,78}\>/&\r\t\\/g
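A rough Python analogue of the same idea, mainly to show the arithmetic. It is not an exact emulation: Python's `re` has `\b` but no end-of-word-only assertion like Vim's `\>`, so here a chunk simply ends before whitespace, which also keeps trailing punctuation such as `nisi.` on its word. The name `regex_wrap` is hypothetical. Every continuation line is the prefix plus a chunk, so a chunk gets `width - len(prefix)` characters, 80-2=78 by default; the same budget applies to the first line, and `len()` counts the TAB as one character.

```python
import re


def regex_wrap(line: str, width: int = 80, prefix: str = "\t\\") -> str:
    """Rough regex-based analogue of the Vim substitution above."""
    # Characters left for text once the prefix is inserted: 80 - 2 = 78 by default.
    budget = width - len(prefix)
    # A chunk is 1..budget characters ending just before whitespace or the end
    # of the line. As with the Vim command, each chunk after the first keeps the
    # space that preceded it, which yields "\t\ euismod ...". A single word
    # longer than the budget cannot match the first branch and is taken whole
    # by the \S+ fallback instead of being dropped.
    chunk = re.compile(r".{1,%d}(?=\s|$)|\S+" % budget)
    return ("\n" + prefix).join(chunk.findall(line))
```

With the first sample, the second line comes out as `\t\ euismod ... ultricies nisi.`: 79 characters by `len()`, but wider on screen once the TAB expands.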
Wiktor Stribiżew
  • Thanks, Wiktor. That gets pretty close, but I still get text exceeding 80 columns when I add the additional characters necessary to produce the formatting above: :%s/.\{0,80}\>/&\r\t\\/g – Jared Nov 19 '21 at 14:58
  • @Jared Do you mean the `\t\\` (two chars added) should be subtracted from 80? Then use 80 minus the number of added chars. – Wiktor Stribiżew Nov 19 '21 at 15:02
  • Yes, that's correct. Total line length w/ leading '\ ' should not exceed 80 characters. Had started playing around with using 76 characters like you show in your latest update to compensate. That over-shortens the first line, but think I can live with that. – Jared Nov 19 '21 at 15:39

A verbose awk command may also do the job well:

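awk -v n=76 '
{
   len = 0                       # running width of the current output line
   for (i = 1; i <= NF; ++i) {
      len += 1 + length($i)      # the word plus one separator
      printf "%s", $i
      # tested after the word is printed, so the word that crosses n
      # stays on the current line
      if (len > n && i < NF) {
         printf "%s\t\\ ", ORS   # line break, then TAB, backslash, space
         len = 6                 # padding width, assuming a TAB displays as 4 columns
      }
      else
         printf "%s", OFS        # also runs after the last word (trailing space)
   }
   print ""
}' file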

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque viverra
    \ euismod pulvinar. Fusce quis nibh commodo, commodo massa eu, ultricies
    \ nisi. Phasellus ac nulla odio.

For the second sample, provided in the comments below, I get this output:

_Array1DToHistogram _ArrayAdd _ArrayBinarySearch _ArrayColDelete _ArrayColInsert
    \ _ArrayCombinations _ArrayConcatenate _ArrayDelete _ArrayDisplay _ArrayExtract
    \ _ArrayFindAll _ArrayInsert _ArrayMax _ArrayMaxIndex _ArrayMin _ArrayMinIndex
    \ _ArrayPermute _ArrayPop _ArrayPush _ArrayReverse _ArraySearch _ArrayShuffle

and the length of each line (counting the TAB as one character) is:

80
80
79
79
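The awk loop above tests `len > n` after printing a word, so the word that crosses the threshold stays on the current line; with long enough words a line can end up wider than 80. A sketch of the same greedy loop in Python that tests before appending instead. The names are hypothetical, and `pad_width=6` assumes a TAB displays as 4 columns, as in the sample output:

```python
def greedy_wrap(line: str, width: int = 80, pad: str = "\t\\ ", pad_width: int = 6) -> str:
    """Greedy word wrap that tests the width before a word is appended."""
    # pad_width is the *display* width of pad. The default 6 assumes the TAB
    # shows as 4 columns, plus 1 for the backslash and 1 for the space.
    out = []        # finished output lines
    cur = ""        # line being built
    cur_width = 0   # display width of cur, including any pad
    for word in line.split():
        if not cur:
            # first word of the input line
            cur, cur_width = word, len(word)
        elif cur_width + 1 + len(word) <= width:
            # the word and one separating space still fit
            cur += " " + word
            cur_width += 1 + len(word)
        else:
            # would overflow: finish this line, start a padded continuation
            out.append(cur)
            cur, cur_width = pad + word, pad_width + len(word)
    out.append(cur)
    return "\n".join(out)
```

Unlike the awk version, this leaves no trailing space. A single word longer than `width - pad_width` still overflows, since the sketch never splits words.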
anubhava
  • Thanks for the suggestion! Testing it out, I get the same behavior I described for Wiktor's solution - the additional space for the prepended characters doesn't seem to be taken into account when matching line length, so this also has some lines exceeding 80 characters. – Jared Nov 19 '21 at 15:45
  • That doesn't seem right, because I reset `len = 6`, which is the length of the padding characters after each line break. Maybe you can provide sample data that reproduces this problem? – anubhava Nov 19 '21 at 15:47
  • Of course. Here are the first few lines' worth of the data I'm actually working with, which demonstrate the issue: _Array1DToHistogram _ArrayAdd _ArrayBinarySearch _ArrayColDelete _ArrayColInsert _ArrayCombinations _ArrayConcatenate _ArrayDelete _ArrayDisplay _ArrayExtract _ArrayFindAll _ArrayInsert _ArrayMax _ArrayMaxIndex _ArrayMin _ArrayMinIndex _ArrayPermute _ArrayPop _ArrayPush _ArrayReverse _ArraySearch _ArrayShuffle – Jared Nov 19 '21 at 16:21
  • I have run this `awk` command on the second sample you've provided and also shown the length of each line. Note that it never exceeds `80`. – anubhava Nov 19 '21 at 17:01
  • Ah, I see what's happening. It's the difference between the text/byte column and display column since the tab character expands. So yes, treating \t as a single character, it doesn't exceed 80, as you've pointed out. – Jared Nov 19 '21 at 20:01
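The distinction Jared describes can be checked directly: Python's `str.expandtabs()` expands TABs to a given tabstop, so `len(line)` gives the character count (TAB = 1), while `len(line.expandtabs(ts))` gives the on-screen width. A small sketch using the second line of the awk output above, assuming it starts with a real TAB, a backslash and a space. Tabstop 4 matches how the sample is rendered; Vim's default 'tabstop' is 8, and inside Vim `strdisplaywidth()` reports the display width. The function name is hypothetical.

```python
def byte_and_display_width(line, tabstop=4):
    """Return (character count, display width with TABs expanded to `tabstop`)."""
    return len(line), len(line.expandtabs(tabstop))


# Second line of the awk output, with the padding as TAB + backslash + space.
line2 = "\t\\ _ArrayCombinations _ArrayConcatenate _ArrayDelete _ArrayDisplay _ArrayExtract"

print(byte_and_display_width(line2))     # (80, 83)
print(byte_and_display_width(line2, 8))  # (80, 87)
```

So the line is exactly 80 characters when the TAB counts as one, but 83 columns wide on screen with a 4-column tabstop.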