8

Say I have text like the following selected with the cursor:

This is a test. 
This 
is a test.

This is a test. 
This is a 
test.

I would like to transform it into:

This is a test. This is a test.

This is a test. This is a test.

In other words, I would like to replace single line breaks by spaces, leaving empty lines alone.

I thought something like the following would work:

RemoveSingleLineBreaks()
{
  ClipSaved := ClipboardAll
  Clipboard =
  send ^c
  Clipboard := RegExReplace(Clipboard, "([^(\R)])(\R)([^(\R)])", "$1$3")    
  send ^v
  Clipboard := ClipSaved
  ClipSaved = 
}

But it doesn't. If I apply it to the text above, it yields:

This is a test. This is a test.
This is a test. This is a test.

which also removes the empty line in the middle. This is not what I want.

To clarify: by an empty line I mean any line that contains only whitespace characters (e.g. tabs or spaces).

Any thoughts how to do this?

Bob
Amelio Vazquez-Reina

4 Answers

6
RegExReplace(Clipboard, "([^\r\n])\R(?=[^\r\n])", "$1")

This strips single line breaks, assuming each line break is a CR, an LF, or a CR+LF. A line containing only whitespace is not treated as empty.

Your main problem was the use of \R:

\R inside a character class is merely the letter "R" [source]

The solution is to use the CR and LF characters directly.
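
As a rough illustration of that behaviour, here is a minimal sketch in Python rather than AutoHotkey. Python's re module has no \R, so the line break is spelled out as an explicit alternation; the names NL and strip_single_breaks are made up for this example.

```python
import re

# Python's re module has no \R, so the line break is spelled out explicitly.
# This is an assumption of the sketch, not what the AutoHotkey pattern uses.
NL = r"(?:\r\n|\r|\n)"

def strip_single_breaks(text):
    # Remove a break only when a non-break character sits on both sides of it.
    # The lookahead checks the following character without consuming it.
    return re.sub(r"([^\r\n])" + NL + r"(?=[^\r\n])", r"\1", text)

# A truly empty line is left alone.
print(repr(strip_single_breaks("one\ntwo\n\nthree")))
# A whitespace-only line is NOT treated as empty, so it gets joined.
print(repr(strip_single_breaks("one\n \ntwo")))
```

The second call shows the limitation mentioned above: the line holding a single space is merged into its neighbours.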


To clarify: by an empty line I mean any line that contains only whitespace characters (e.g. tabs or spaces).

RegExReplace(Clipboard, "(\S.*?)\R(?=.*?\S)", "$1")

This is the same as the one above, but treats whitespace-only lines as empty. It works because the . does not match line breaks by default: the pattern requires a non-whitespace character somewhere on the line before the line break and, through the lookahead, somewhere on the line after it, accepting any other characters on those lines non-greedily (*?).

A lookahead is used to avoid 'eating' (matching) the next character, which can break on single-character lines. Note that since it is not matched, it is not replaced and we can leave it out of the replacement string. A lookbehind cannot be used because PCRE does not support variable-length lookbehinds, so a normal capture group and backreference are used there instead.
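
To make the single-character-line problem concrete, here is a small sketch in Python rather than AutoHotkey. It assumes LF line breaks and spells out the break explicitly, since Python's re has no \R; join_consuming and join_lookahead are hypothetical names for the two variants.

```python
import re

# Stand-in for \R, which Python's re does not support (assumption of this sketch).
NL = r"(?:\r\n|\r|\n)"

def join_consuming(text):
    # Second group consumes the first non-whitespace character after the break,
    # so that character cannot start the next match.
    return re.sub(r"(\S.*?)" + NL + r"(.*?\S)", r"\1\2", text)

def join_lookahead(text):
    # The lookahead only inspects the next line; nothing after the break is consumed.
    return re.sub(r"(\S.*?)" + NL + r"(?=.*?\S)", r"\1", text)

sample = "aaa\nb\nccc"
print(repr(join_consuming(sample)))  # 'aaab\nccc': the break after "b" is missed
print(repr(join_lookahead(sample)))  # 'aaabccc'
```

In the consuming variant the "b" is swallowed by the first match, so the regex engine never gets to use it as the left-hand side of the second line break.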


I would like to replace single line breaks by spaces, leaving empty lines alone.

If you want to replace the line break with a space, this is more appropriate:

RegExReplace(Clipboard, "(\S.*?)\R(?=.*?\S)", "$1 ")

This will replace single line breaks with a space.
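
As a quick check against the text from the question, here is a sketch of the same pattern in Python rather than AutoHotkey. The input is assumed to use LF line breaks with no trailing spaces, and join_single_breaks is a name invented for this example; the explicit alternation stands in for \R, which Python's re lacks.

```python
import re

# Stand-in for \R (not available in Python's re; an assumption of this sketch).
NL = r"(?:\r\n|\r|\n)"

def join_single_breaks(text):
    # Replace a break with one space when both neighbouring lines
    # contain at least one non-whitespace character.
    return re.sub(r"(\S.*?)" + NL + r"(?=.*?\S)", r"\1 ", text)

sample = (
    "This is a test.\nThis\nis a test.\n"
    "\n"
    "This is a test.\nThis is a\ntest."
)
print(join_single_breaks(sample))
```

Note that if a line already ends in a space, the result will contain two spaces at that join, because the original trailing space is kept inside the captured group.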


And if you wanted to use lookbehinds and lookaheads:


Strip single line breaks:

RegExReplace(Clipboard, "(?<=[^\r\n\t ][^\r\n])\R(?=[^\r\n][^\r\n\t ])", "")


Replace single line breaks with spaces:

RegExReplace(Clipboard, "(?<=[^\r\n\t ][^\r\n])\R(?=[^\r\n][^\r\n\t ])", " ")

For some reason, \S doesn't seem to work in lookbehinds and lookaheads. At least, not with my testing.

Bob
  • I'd like to both upvote and downvote: pretty helpful, but `([^\r\n])\R([^\r\n])` and `(\S.*?)\R(.*?\S)` don't work for joining lines with a single (non line break) character. E.g. this string in Java notation: `"aaa\n" + "b\n" + "ccc"` gets incorrectly converted to `"aaab\nccc"`. Additionally, I don't entirely understand the explanation for `(\S.*?)\R(.*?\S)` - would you mind expanding it? – Jan Żankowski Mar 07 '19 at 10:43
  • @JanŻankowski ...wow, this was 7 years ago. Edited & fixed for the single-character case by using a lookahead. For detailed regex explanations, please refer to the [AHK quickreference](https://www.autohotkey.com/docs/misc/RegEx-QuickRef.htm) and various PCRE tutorials/explanations available online. https://www.regular-expressions.info/ is a good one. Or use a tool that can parse/explain regex syntax, [e.g. see the right side of this regex101 page](https://regex101.com/r/MvBIKN/1). – Bob Mar 08 '19 at 03:25
  • Thanks for promptly coming back to this after such a long time! Great to see lookahead is the way to go - I thought so too. A few notes: (1) the first regex `([^\r\n])\R([^\r\n])` probably needs lookahead too, (2) after tinkering with `(\S.*?)\R(?=.*?\S)` in the regex tester you kindly suggested, I don't think the non-greedy modifiers (`?` in `*?`) are needed - the group and lookahead will have wider matches on the line before & the line after the linebreak, but will work too - and it reads simpler. – Jan Żankowski Mar 08 '19 at 10:12
  • @JanŻankowski True, I had that thought this afternoon. I think originally I wasn't sure about the `.` matching behaviour for line breaks, so there was some concern that greedy could match more than intended. That said, in theory non-greedy should be faster because it'll stop earlier, but the lack of variable-length lookbehind means non-greedy and greedy are equivalent there. Edited the first example, leaving the non-greedys alone for now. – Bob Mar 08 '19 at 12:05
2

I believe this will work:

text=
(
This is a test. 
This 
is a test.

This is a test. 
This is a 
test.
)
MsgBox % RegExReplace(text, "\S\K\v(?=\S)", A_Space)
SouthStExit
1
Clipboard := RegExReplace(Clipboard, "(\S+)\R", "$1 ")
mihai
  • When I run this, the script deletes the text (i.e. `Clipboard` is assigned an empty string) – Amelio Vazquez-Reina May 05 '12 at 22:02
  • yeah...the solution is incorrect, disregard it. It had a mismatched parenthesis, but that was not it. The problem was that you can have empty spaces before the end of line. I'm having trouble as well implementing this just with regex :) – mihai May 06 '12 at 01:03
1
#SingleInstance force

#v::
    Send ^c
    ClipWait
    ClipSaved = %clipboard%

    Loop
    {
        StringReplace, ClipSaved, ClipSaved, `r`n`r`n, `r`n, UseErrorLevel
        if ErrorLevel = 0  ; No more replacements needed.
            break
    }
    Clipboard := ClipSaved
    return
Florent
tatoosh