1

I was trying to remove the erroneous line breaks introduced on Windows.

$ cat -e file.xml
foo^M$
bar$
$
hello world1$
hello world2$

Here the output should read "foobar" with no line break in between, while all the other newlines should be retained. I know that within Emacs I could replace "^M^J" with 'RET', but the file is huge and I would rather convert it from the command line than open it.

I tried dos2unix, but it only removed the "^M" part, leaving the word/sentence broken across two lines. I also tried tr -d '\r', sed 's:^M$::g' and sed 's:^M$\n:\n:g'; none of them worked. Does anyone know how to do this correctly?

galactica

3 Answers

1

I have replicated your example file as:

$ cat -e so.txt
foo^M$
bar$
line2$
line3$
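
For anyone who wants to follow along, a test file like this could be recreated with printf. This is only a sketch: make_sample is a hypothetical helper name, and od -c is used here in place of cat -e to make the \r visible.

```shell
# make_sample: hypothetical helper that writes the four-line sample.
# Only the first line ends in CR LF; the others end in a plain LF.
make_sample() {
    printf 'foo\r\nbar\nline2\nline3\n' > "$1"
}

make_sample so.txt

# Dump the bytes so the single \r \n pair is visible.
od -c so.txt
```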

You can use Perl in slurp ('gulp') mode, where -0777 makes it read the whole file as a single string:

$ perl -0777 -pe 's/\r\n//g' so.txt
foobar
line2
line3

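The problem with most line-oriented tools is that they split the input on \n before applying the pattern, so the \r\n sits at a line boundary where the substitution never sees both characters.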


Because perl -p keeps the trailing \n on each line it reads, you can do it line by line:

$ perl -pe 's/\r\n//' /tmp/so.txt
foobar
line2
line3

as well...

dawg
  • I assume -0777 is what triggers the 'gulp' mode? If so, we shouldn't do that, because as I mentioned this is a huge file (>5GB). Anyway, your suggestion works, but I'm a bit surprised that there is no command-line tool that handles this neatly. – galactica Jun 14 '16 at 03:09
  • Perl and your OS will handle a huge file intelligently even if it is a lot larger than the host memory. Try it. – dawg Jun 14 '16 at 03:11
  • I did, and it gave me a segmentation fault when adding -0777 on CentOS 6; that's why I commented above. – galactica Jun 14 '16 at 15:55
1

Using awk:

$ cat -e so.txt
foo^M$
bar$
line2$
line3$

$ awk 1 RS=$'\r\n' ORS= so.txt
foobar
line2
line3

$ awk 1 RS=$'\r\n' ORS= so.txt | cat -e # Just for verification
foobar$
line2$
line3$

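This sets the record separator (RS) to \r\n and prints the records with ORS set to the empty string, so each \r\n is dropped while the plain \n characters stay inside the records. Note that $'\r\n' is bash quoting, and that a multi-character RS is handled as a regular expression by gawk and mawk but is not guaranteed by POSIX awk.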
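
The same idea can be written without bash-specific quoting by passing the separators with -v, since awk itself expands \r and \n in -v assignments. A sketch under the assumption that the installed awk accepts a multi-character RS (gawk and mawk do); join_crlf_awk is a hypothetical helper name:

```shell
# join_crlf_awk: hypothetical helper. Every CR LF ends a record; records are
# printed back with an empty ORS, so the CR LF pairs disappear while the
# plain LF characters, which are ordinary record content here, survive.
join_crlf_awk() {
    awk -v RS='\r\n' -v ORS='' '{ print }'
}

# How many records awk sees when only CR LF separates them.
printf 'foo\r\nbar\nline2\n' | awk -v RS='\r\n' 'END { print NR }'

# The joined result.
printf 'foo\r\nbar\nline2\n' | join_crlf_awk
```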

anishsane
0

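Perhaps the following will work (GNU sed; it joins any line ending in \r with the line after it):

sed -e ':a' -e '/\r$/{N;s/\r\n//;ba' -e '}' old_file.txt > new_file.txt

A plain s/[\n\r]//g only strips the \r, because sed removes the trailing \n from each line before the substitution runs.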

Ed Heal