2

I have a rather large text file where there is an extra space between every character:

I t   l o o k s   l i k e   t h i s .  

I'd like to remove those extra characters so

It looks like this. 

via the Linux terminal. I can't seem to find any way to do this without removing all of the whitespace. I'm willing to try any solution at this point. I'd appreciate any nudge in the right direction.

  • 1
    Are you sure they are spaces and not NUL characters? It sounds like you have something encoded using UTF-16, in which case the solution would be to change the encoding to UTF-8 or whatever. – ikegami Nov 30 '13 at 21:26

5 Answers

5
$ echo 'I t   l o o k s   l i k e   t h i s .  ' | sed 's/\(.\) /\1/g'
It looks like this. 
John Kugelman
3

Are you certain that the intermediate characters are spaces? It is most likely that this is a UTF-16 file.

I suggest you use a capable editor to open it as such and convert it to UTF-8.
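If you would rather script the check and the conversion than open an editor, here is a minimal Python sketch. The helper names `looks_like_utf16` and `utf16_to_utf8` are made up for illustration, and the NUL-byte test is only a heuristic. It also assumes the file starts with a byte-order mark (BOM); without one, Python's "utf-16" codec falls back to the machine's native byte order, which may or may not match the file.

```python
import codecs


def looks_like_utf16(raw: bytes) -> bool:
    """Heuristic: the data starts with a UTF-16 BOM or contains NUL bytes."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return True
    # Plain ASCII/UTF-8 text with real spaces normally contains no NUL bytes.
    return b"\x00" in raw


def utf16_to_utf8(raw: bytes) -> bytes:
    """Decode UTF-16 bytes (byte order taken from the BOM) and re-encode as UTF-8."""
    return raw.decode("utf-16").encode("utf-8")


# Usage with placeholder paths:
#
#     with open("/path/to/file.txt", "rb") as f:
#         raw = f.read()
#     if looks_like_utf16(raw):
#         with open("/path/to/new.txt", "wb") as g:
#             g.write(utf16_to_utf8(raw))
```

If `looks_like_utf16` returns False, the gaps are probably real spaces and one of the other answers applies.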

Borodin
  • 1
    Borodin is right. Try running 'file your_file.txt' to get the encoding. You don't want to corrupt your data, right? – Pierre Nov 30 '13 at 19:18
3

An awk solution

echo "I t   l o o k s   l i k e   t h i s ." | awk '{for (i=1;i<=NF;i+=2) printf "%s", $i;print ""}' FS=""
It looks like this.
Jotne
1

As long as it's every other character you want to get rid of, you can use Python.

>>> s = "I t   l o o k s   l i k e   t h i s ."
>>> print(s[0::2])
It looks like this.

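To do this for the whole text file:

with open("/path/to/file.txt") as f:
    lines = f.readlines()

# Open the output file in write mode; the default mode is read-only.
with open("/path/to/new.txt", "w") as g:
    for line in lines:
        # Strip the newline first so it is not caught up in the slice.
        g.write(line.rstrip("\n")[0::2] + "\n")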
TW80000
1
perl -lpe 's|(\s+)| " "x (length($1)>1) |ge' file
mpapec