
I have a 1.5 GB Windows text file in which most lines end with CR+LF but some end with LF only.

Can you please help with a sed script that will:

  • replace all CR+LF with $|$
  • replace all LF with CR+LF
  • replace all $|$ back with CR+LF

I tried doing all the replacements in a text editor, but it was very slow (1 percent per half hour). I also tried to do the replacement with fart:

fart -c -B -b text.txt "\r\n" "$|$"

with the following result:

replacement 0 occurence(s) in 0 file(s)..
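For reference, the three steps listed above can be expressed almost literally with GNU sed's -z option, which makes NUL the record separator, so a text file containing no NUL bytes is read as a single record. This is a sketch under several assumptions: GNU sed, data that never contains the literal string $|$, and enough memory to hold the whole file at once (which may not be true for 1.5 GB). The file names demo.txt and demo.out are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: assumes GNU sed (-z and the \r, \n escapes are GNU extensions).
# Assumes the data never contains the literal placeholder $|$.
# Demo input with mixed endings: CR+LF, LF, LF, CR+LF.
printf '1\r\n2\n3\n4\r\n' > demo.txt

# -z: the whole file becomes one pattern space, so \n can be matched directly.
#   1. CR+LF -> $|$    (in a BRE, | is literal and \$ is a literal dollar sign)
#   2. LF    -> CR+LF
#   3. $|$   -> CR+LF
sed -z 's/\r\n/$|$/g; s/\n/\r\n/g; s/\$|\$/\r\n/g' demo.txt > demo.out

# Every line should now end in \r \n.
od -c demo.out
```

The net effect is simply that every line ends in CR+LF, which is why the answers below can do the same job one line at a time, without a placeholder and without loading the whole file into memory.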
agc
  • My mistake: I tried doing all the replacements in a text editor, but it was very slow (1 percent per half hour). I also tried Fart (http://fart-it.sourceforge.net/), fart -c text.txt "CRLF" "$|$", but it finds nothing to replace – Ugil Meister Feb 08 '19 at 14:15
  • Are there any CRs in the file other than those immediately before LFs? Usually, when a Windows file contains LFs that are not preceded by a CR, those LFs do NOT actually indicate the end of a line. An example would be a CSV exported from Excel, where `beg,"foo\nbar",end\r\n` represents a single line in which one cell contains a `\n` within quotes. So are you SURE you want to treat all standalone LFs as line endings? – Ed Morton Feb 09 '19 at 00:12
  • The file is simply a data extract from a database. Most of the lines have normal CR+LF endings, but some are broken by corrupted content in the original data (an additional LF), which pushes the normal CR+LF onto a new line and splits one line into two. The task is to load the extracted data back into the database, and that is where the problem appears: without additional manipulation, the original number of lines and the number of lines uploaded to the database will not match. – Ugil Meister Feb 11 '19 at 05:47
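Before converting anything, it may help to measure what the file actually contains, since that bears directly on Ed Morton's question and on the line-count mismatch. A minimal sketch, assuming bash (for the $'...' quoting) and GNU grep on a Unix-like system, where the CR stays part of the line grep matches; sample.txt is a placeholder name:

```shell
#!/usr/bin/env bash
# Assumes bash and GNU grep on a Unix-like system.
# Demo file: two CR+LF lines, one bare-LF line, and one line with a CR in the middle.
printf 'a\r\nb\nc\r\nd\re\n' > sample.txt

# grep -c prints the count; it exits 1 when the count is 0, so don't use set -e here.
crlf=$(grep -c $'\r$' sample.txt)      # lines ending in CR+LF
bare_lf=$(grep -vc $'\r$' sample.txt)  # lines ending in a bare LF (or an unterminated last line)
mid_cr=$(grep -c $'\r.' sample.txt)    # lines with a CR that is not directly before the LF

echo "CRLF=$crlf bareLF=$bare_lf strayCR=$mid_cr"
```

A non-zero strayCR count, or a bareLF count that matches the number of known-corrupt records, would suggest those LFs are embedded data rather than real line endings.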

3 Answers


One with awk:

$ awk '{sub(/(^|[^\r])$/,"&\r")}1' file

Testing it (0x0a is LF, 0x0d is CR):

$ awk 'BEGIN{print "no\nyes\r\n\n\r"}' > foo
$ hexdump -C foo
00000000  6e 6f 0a 79 65 73 0d 0a  0a 0d 0a                 |no.yes.....|
0000000b
$ awk '{sub(/(^|[^\r])$/,"&\r")}1' foo > bar
$ hexdump -C bar
00000000  6e 6f 0d 0a 79 65 73 0d  0a 0d 0a 0d 0a           |no..yes......|
0000000d
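Because this awk command only appends a CR where one is missing, its output should contain exactly as many lines as its input, and no output line should lack the CR. A quick check, sketched with the placeholder names big.txt and big_crlf.txt and assuming bash and GNU grep; writing to a new file keeps the original intact until the result is verified:

```shell
#!/usr/bin/env bash
# Assumes bash and GNU grep; big.txt and big_crlf.txt are placeholder names.
printf 'no\nyes\r\n\n\r\n' > big.txt

# Convert into a new file rather than in place.
awk '{sub(/(^|[^\r])$/,"&\r")}1' big.txt > big_crlf.txt

# wc -l counts LF bytes; the two numbers should be equal, since only CRs were added.
wc -l < big.txt
wc -l < big_crlf.txt

# Lines still missing a CR before the LF; expect 0 (grep exits 1 on a zero count).
grep -vc $'\r$' big_crlf.txt || true
```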
James Brown

I would do this: first remove any \r at the end of each line, then explicitly append a \r to every line.

sed -e 's/\r$//' -e 's/$/\r/' file

Here's a demo:

$ printf "1\r\n2\n3\n4\r\n5\n" > file
$ od -c file
0000000   1  \r  \n   2  \n   3  \n   4  \r  \n   5  \n
0000014
$ sed -i -e 's/\r$//' -e 's/$/\r/' file
$ od -c file
0000000   1  \r  \n   2  \r  \n   3  \r  \n   4  \r  \n   5  \r  \n
0000017

This is GNU sed; -i without a backup suffix and the \r escape are GNU extensions.
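On systems whose sed does not understand \r (for example, the BSD sed on macOS), the same two steps can be written in POSIX awk, whose string and regex escapes include \r. A sketch; mixed.txt and fixed.txt are placeholder names, and the output goes to a new file instead of being edited in place:

```shell
#!/bin/sh
# Assumes only a POSIX awk; mixed.txt and fixed.txt are placeholder names.
printf '1\r\n2\n3\n4\r\n5\n' > mixed.txt

# Step 1: drop a trailing CR if there is one.
# Step 2: print the line followed by an explicit CR+LF.
awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }' mixed.txt > fixed.txt

od -c fixed.txt
```

Passing $0 as an argument to printf, rather than as the format string, keeps any % characters in the data from being interpreted.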

glenn jackman

It's simpler to install a utility such as unix2dos, which does this automatically. With unix2dos, the proposed intermediate step of converting CR+LF to $|$ (and back) isn't necessary. Demo:

# first dump a file with both *DOS* and *Unix* style line endings:
hexdump -C <({ seq 2 | unix2dos ; seq 3 4; } )
# the same file, run through unix2dos
hexdump -C <({ seq 2 | unix2dos ; seq 3 4; } | unix2dos)

Output:

00000000  31 0d 0a 32 0d 0a 33 0a  34 0a                    |1..2..3.4.|
0000000a
00000000  31 0d 0a 32 0d 0a 33 0d  0a 34 0d 0a              |1..2..3..4..|
0000000c

Or, more elaborately, a before/after table (see man hexdump for details on the formatting):

hdf() { hexdump -v  -e '/1  "%_ad#  "' -e '/1 " _%_u\_\n"' "$@" ; }
# Note: the `printf` stuff keeps `paste` from misaligning the output.
paste <(hdf <({ seq 2 | unix2dos ; seq 3 4; }) ; printf '\t\n\t\n' ; ) \
      <(hdf <({ seq 2 | unix2dos ; seq 3 4; } | unix2dos ))

Output:

0#   _1_    0#   _1_
1#   _cr_   1#   _cr_
2#   _lf_   2#   _lf_
3#   _2_    3#   _2_
4#   _cr_   4#   _cr_
5#   _lf_   5#   _lf_
6#   _3_    6#   _3_
7#   _lf_   7#   _cr_
8#   _4_    8#   _lf_
9#   _lf_   9#   _4_
            10#  _cr_
            11#  _lf_
agc
  • @UgilMeister, Glad to hear it. Please read: [What should I do when someone answers my question?](https://stackoverflow.com/help/someone-answers) – agc Feb 11 '19 at 19:56