I have a massive text file and want to remove all lines that are less than 6 characters long.

I tried the following search strings (Regular expressions - Perl):

^.{0,5}\n\r$   -- string not found

^.{0,5}\n\r    -- string not found

^.{0,5}$       -- leaves blank lines

^.{0,5}$\n\r   -- string not found

^.{0,5}$\r     -- leaves blank lines

^.{0,5}$\r\n   -- **worked**

My question is: why does the last one work when the 4th one does not? And why does the 5th one leave blank lines?

Thanks.

mpapec
chribonn
  • UltraEdit indicates the line terminator type of the active file in the status bar at the bottom of the main application window as **DOS**, **UNIX**, or **MAC**. See the UltraEdit forum topic [DOS/UNIX/MAC line terminator indication in status bar](https://www.ultraedit.com/forums/viewtopic.php?f=7&t=15214) for more information about this indication. Also take a look at [UE symbol explanations for line teminators](https://www.ultraedit.com/forums/viewtopic.php?f=3&t=12016). – Mofi Nov 21 '14 at 06:52

2 Answers

Because ^.{0,5}$\n\r is not the same as ^.{0,5}$\r\n.

  • \n\r is a linefeed followed by carriage return.

  • \r\n is a carriage return followed by linefeed - a popular line ending combination of characters. Specifically \r\n is used by the MS-DOS and Windows family of operating systems, among others.
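The difference can be reproduced outside the editor. The following is only a rough sketch in Python, whose `re` module is not the Perl engine UltraEdit uses, so `[^\r\n]` stands in for `.` (Python's `.` would also match `\r`). The function name `remove_short_lines` and the sample text are made up for illustration.

```python
import re

# Sample text with DOS/Windows (CR LF) line endings.
text = "abc\r\nlonger line\r\nxy\r\n"


def remove_short_lines(text, terminator):
    """Delete lines of 0-5 characters that are followed by `terminator`.

    `terminator` is a regex fragment such as r"\r\n" (hypothetical helper).
    """
    pattern = re.compile(r"^[^\r\n]{0,5}" + terminator, re.MULTILINE)
    return pattern.sub("", text)


# CR LF matches the actual line endings, so short lines vanish completely.
crlf_result = remove_short_lines(text, r"\r\n")

# LF CR never occurs in this text, so nothing is found and nothing changes.
lfcr_result = remove_short_lines(text, r"\n\r")

# CR alone removes the line and the CR, but the LF stays behind,
# which shows up as a blank line.
cr_result = remove_short_lines(text, r"\r")

print(repr(crlf_result))
print(repr(lfcr_result))
print(repr(cr_result))
```

Under these assumptions, the `\r`-only pattern leaves a stray `\n` for every removed line, which is consistent with the blank lines seen in the question.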

G. Cito
  • All text files follow this sequence? – chribonn Nov 20 '14 at 20:59
  • DOS/Windows has traditionally been CR/LF (`\r\n`) while Unix systems are just a bare LF (`\n`). Macs before OS X used a bare CR (`\r`). I'm not aware of any systems that used LF/CR (`\n\r`). – tomlogic Nov 20 '14 at 21:02
  • @tomlogic According to [Wikipedia's Newline article](http://en.wikipedia.org/wiki/Newline) Acorn and RiscOS used it in certain applications. – G. Cito Nov 20 '14 at 21:09

In multiline mode, ^ is a metacharacter that matches the beginning of the string and can also match immediately after a newline.

Likewise, $ matches the end of the string and also at these positions:

          \r\n
         ^    ^
here ----+-or-+

or

            \n
         ^    ^
here ----+-or-+  

$ will try to match before the newline if it can (this depends on the other parts of the regex).

You can use that to your advantage with a regex like this:

^.{0,5}$(\r?\n)* which will match the end of the line AND any successive line breaks that follow it.
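As a quick way to experiment with this idea outside UltraEdit, here is a minimal sketch in Python. It is an adaptation rather than the same regex: in Python's `re` module `.` also matches `\r` and `$` only matches before `\n`, so the sketch uses `[^\r\n]` and a lookahead instead. The names `SHORT_LINE` and `drop_short_lines` are hypothetical.

```python
import re

# Adapted form of ^.{0,5}$(\r?\n)* for Python's re module (an assumption,
# not the exact pattern from the answer):
#   ^[^\r\n]{0,5}   up to five characters, none of them a line terminator
#   (?=\r?$)        the line must end here (optional CR, then end of line)
#   (?:\r?\n)*      consume this line break and any that directly follow
SHORT_LINE = re.compile(r"^[^\r\n]{0,5}(?=\r?$)(?:\r?\n)*", re.MULTILINE)


def drop_short_lines(text: str) -> str:
    """Remove every line shorter than six characters, including blank ones."""
    return SHORT_LINE.sub("", text)


dos_text = "abc\r\nlonger line\r\n\r\nxy\r\nsix666\r\n"
unix_text = "ab\nlonger line\n"

dos_result = drop_short_lines(dos_text)
unix_result = drop_short_lines(unix_text)

print(repr(dos_result))
print(repr(unix_result))
```

Because the line break is written as `\r?\n`, the same pattern handles both DOS and Unix files; old Mac files with bare `\r` terminators are not covered by this sketch.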