I want to remove newlines in some special cases. I have this text:

0 
15.239 
23.917 
 Reprenem el debat que avui els oferim entorn de les perspectives d'aquest dos mil set. <ehh> Estavem parlant concretament dels temes 
30.027 
de la seguretat mundial 
 una miqueta 
de la intervencio
33.519 
que 

And I want to replace the newlines between a number and some text, like so:

0 
15.239 
23.917 Reprenem el debat que avui els oferim entorn de les perspectives d'aquest dos mil set. <ehh> Estavem parlant concretament dels temes 
30.027 de la seguretat mundial una miqueta de la intervencio
33.519 que

I only want to remove the newlines between a number and the sentence that follows it.

Can anyone help me?

alexwlchan
Sergi

2 Answers

I'll go by your example output, which also erases newlines in the middle of a sentence. You can use this:

sed ':a $!{N;ba}; s/\n\([^0-9]\)/\1/g' filename

That is

:a $!{N;ba}          # assemble the whole file in the pattern space
s/\n\([^0-9]\)/\1/g  # remove newlines that are not directly before a number
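
To see the effect on a small input, here is a minimal sketch. The file name `sample.txt` and its contents are made up for illustration (a shortened version of the text in the question, with the same trailing spaces), and the one-line syntax assumes GNU sed.

```shell
# Build a small sample file (hypothetical name and contents).
printf '%s\n' '0 ' '15.239 ' '23.917 ' ' Reprenem el debat ' \
  '30.027 ' 'de la seguretat ' ' una miqueta ' '33.519 ' 'que' > sample.txt

# Join every line that does not start with a digit onto the previous line.
result=$(sed ':a $!{N;ba}; s/\n\([^0-9]\)/\1/g' sample.txt)
printf '%s\n' "$result"
```

Lines that start with a digit keep their own line; everything else is appended to the line before it. Double spaces can appear where a line ending in a space is joined with one that starts with a space.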

To handle DOS linebreaks, you can use

#                   vvvv-- new stuff here
sed ':a $!{N;ba}; s/\r\?\n\([^0-9]\)/\1/g' filename

This optionally matches a \r before the \n, so the whole \r\n is removed when the file has DOS linebreaks.

Or you can use dos2unix.
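
If you are unsure whether a file has DOS linebreaks at all, one quick check is to count the lines that end in a carriage return. The sketch below rests on a few assumptions: `dos_sample.txt` is a made-up file, and `tr -d '\r'` stands in for `dos2unix` in case that tool is not installed (note that it removes every carriage return, not only those at line ends).

```shell
# Create a made-up sample with DOS (\r\n) linebreaks.
printf '23.917 \r\n Reprenem el debat \r\n30.027 \r\nde la seguretat\r\n' > dos_sample.txt

# Count lines ending in a carriage return; a non-zero count means DOS linebreaks.
cr=$(printf '\r')
crlf_lines=$(grep -c "${cr}\$" dos_sample.txt)
echo "lines ending in CR: $crlf_lines"

# Strip the carriage returns, then apply the same sed command as above.
cleaned=$(tr -d '\r' < dos_sample.txt | sed ':a $!{N;ba}; s/\n\([^0-9]\)/\1/g')
printf '%s\n' "$cleaned"
```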

Wintermute
  • Thank you @Wintermute, but I still can't remove the newlines – Sergi Feb 13 '15 at 11:35
  • Equivalent in awk `awk -vRS= '$0=gensub("\n([^0-9])","\\1","g")' filename` –  Feb 13 '15 at 11:37
  • @JID in GNU awk, that is. – Tom Fenech Feb 13 '15 at 11:43
  • @Sergi: Can you elaborate? It works for me with your sample. What is wrong with the output you get? – Wintermute Feb 13 '15 at 11:44
  • I get the same output as before applying your new command: everything is still separated by newlines – Sergi Feb 13 '15 at 11:48
  • I cannot reproduce that; I get the changed file printed to stdout. Do you mean that the file itself doesn't change? If you want to change it in place, use `sed -i` (or better `sed -i.bak` to have a backup in case you don't like the result). – Wintermute Feb 13 '15 at 11:52
  • On my AIX (POSIX sed) I need real newlines instead of spaces and `;` for this to work; it may be a similar problem if you are not using GNU sed: `sed ':a^J$!{N;ba^J}^Js/\n\([^0-9]\)/\1/g' YourFile` (where ^J are real newlines) – NeronLeVelu Feb 13 '15 at 11:58
  • @Wintermute note the OP mentions some "^M" problems in the other answer. That must be why your solution doesn't work for him. It does for me and I like the explanation, so +1! – fedorqui Feb 13 '15 at 12:08
  • I fixed my problem, thanks to @fedorqui. I had a format problem and "dos2unix" helped me a lot – Sergi Feb 13 '15 at 12:17
  • Oh, DOS linebreaks. I'll add something for that. – Wintermute Feb 13 '15 at 12:18

An awk:

awk '/^[0-9]+\.[0-9]+/{printf "\n"}{printf "%s", $0}' filename

For handling DOS line breaks:

awk '{sub(/\r$/,"")}/^[0-9]+\.[0-9]+/{printf "\n"}{printf "%s", $0}' filename

Demo:

$ awk '{sub(/\r$/,"")}/^[0-9]+\.[0-9]+/{printf "\n"}{printf "%s", $0}' filename

0 
15.239 
23.917  Reprenem el debat que avui els oferim entorn de les perspectives d'aquest dos mil set. <ehh> Estavem parlant concretament dels temes 
30.027 de la seguretat mundial  una miqueta de la intervencio
33.519 que que

Explained code:

  • {sub(/\r$/,"")} : Strip the trailing carriage return left by a DOS line break.

  • /^[0-9]+\.[0-9]+/{printf "\n"} : When the line begins with a number containing a decimal point, print just a newline and continue processing the record.

  • {printf "%s", $0} : For every record, whether it starts with a number or not, print $0 without a line break. The "%s" format keeps any % in the text from being read as a format specifier.

  • In the end, printing a newline just before each number and omitting all other line breaks does the trick.
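
Putting the steps above together, a commented multi-line version might look like the sketch below. Two things here are assumptions rather than part of the one-liner: the `END` block, which is added so the last output line is terminated, and the made-up input file `sample_awk.txt`, which includes a `%` to show why the record is printed through a `"%s"` format.

```shell
# Made-up sample input; one text line contains a % sign.
printf '%s\n' '0 ' '15.239 ' '23.917 ' ' 100% segur ' '30.027 ' 'que' > sample_awk.txt

joined=$(awk '
  { sub(/\r$/, "") }                   # strip a trailing carriage return, if any
  /^[0-9]+\.[0-9]+/ { printf "\n" }    # timestamp line: start a new output line
  { printf "%s", $0 }                  # print the record without a line break
  END { print "" }                     # terminate the final line
' sample_awk.txt)
printf '%s\n' "$joined"
```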

Juan Diego Godoy Robles
  • I'm sure that the OP (and any subsequent viewers of this question) would appreciate some explanation. – Tom Fenech Feb 13 '15 at 11:37
  • Thanks @klashxx but it does not work, I have the same output as before – Sergi Feb 13 '15 at 11:45
  • This is my output using vi (I don't understand why I get newlines while you don't): 0 ^M15.239 ^M23.917 ^M Reprenem el debat que avui els oferim entorn de les perspectives d'aquest dos mil set. <ehh> Estàvem parlant concretament dels temes ^M30.027 ^Mde la seguretat mundial ^M una miqueta ^Mde la intervenció ^M33.519 ^Mque ^Mque – Sergi Feb 13 '15 at 12:01
  • @Sergi you have a DOS encoding and it is breaking everything. You need to clean it by running `dos2unix`. – fedorqui Feb 13 '15 at 12:02
  • klashxx, note `printf "\n"` can also be written as `print ""`. – fedorqui Feb 13 '15 at 12:10
  • I have updated my answer for DOS linebreaks @Sergi – Juan Diego Godoy Robles Feb 13 '15 at 12:15
  • Hi again @klashxx. Regarding the question from two days ago, is it possible to solve this little issue? 9.545 hello . 10.544 bye I want the dot to move up to the previous line instead of creating a new line. I'm struggling with "awk" but it's beyond me right now – Sergi Feb 16 '15 at 11:50