How to remove leading and trailing whitespace?

Question

I'm using `awk '{gsub(/^[ \t]+|[ \t]+$/,""); print;}' in.txt > out.txt` to remove both leading and trailing whitespace.

The problem is that the output file still has trailing whitespace! All lines are the same length: they are right-padded with spaces.

What am I missing?

UPDATE 1

The problem is probably that the trailing spaces are not "normal" spaces but `\x20` characters (DC4).

UPDATE 2

I used `gsub(/[[:cntrl:]]|[[:space:]]|\x20/,"")` and it worked. Two strange things:

Why isn't `\x20` considered a control character?
Using `[[:cntrl:][:space:]\x20` does NOT work. Why?

UPDATE: perhaps these are not simple spaces, but DC4 control characters? The files originated from Windows. — user1194552, Feb 07 '12 at 12:08
`\x20` is a regular ASCII space. Control characters are `\x00` through `\x1F`. — tripleee, Feb 07 '12 at 14:30

score 26 · Answer 1 · answered Feb 07 '12 at 14:17


This command works for me:

$ awk '{$1=$1}1' file.txt

kev

+1 Yes, why not? ;-) You can even do `awk '$1=$1' file.txt`, can't you? – oHo Feb 07 '12 at 14:21
@oHessling `$1=$1` will delete empty lines – kev Feb 07 '12 at 14:27

@eddi `awk` will normalize a line by removing extra spaces. Assigning `$1=$1` triggers that rebuild; otherwise nothing happens. – kev Aug 12 '15 at 16:49

I think you should add that to the answer together with an explanation of what the 1 does. – eddi Aug 12 '15 at 17:02

@eddi The `1` is the same as `{print}`. It will print every line. – kev Aug 12 '15 at 17:09

@kev: it does not work with GNU Awk 3.1.7 on CentOS 6.5 with ksh: `echo "foo;bar ">tt && print "_$( awk -F";" -OFS";" '{$2=$2}1' tt)_"` gives `_foo;bar _`. Did I miss something? What is your setup, by the way? – Mat M Feb 17 '16 at 14:46
So `awk '{$2=$2}1'` is equivalent to `awk '{$2=$2; print $0}'`. On some awks (e.g. the Mac system awk) this doesn't strip whitespace. Also, you need to explain the quirky idiom in your answer. – smci Nov 13 '16 at 18:02
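A small sketch of what `{$1=$1}1` does, assuming the default field separator (runs of spaces and tabs) and an awk such as gawk or mawk; as the comments note, not every awk behaves the same. Assigning to a field makes awk rebuild `$0` from its fields joined by `OFS` (one space by default), so leading and trailing blanks disappear, but runs of blanks between words are also squeezed to a single space. The input lines are made up for illustration.

```shell
# Hypothetical input: leading blanks, an internal tab plus spaces, trailing blanks.
printf '   foo \t  bar    \n' | awk '{$1=$1}1'
# Output: "foo bar" (the internal gap is squeezed too)

# Pattern-only form from the comments: empty or all-blank lines are dropped,
# because the value of the assignment (an empty $1) counts as false.
printf 'a  \n\n   \nb\n' | awk '$1=$1'
# Output: "a" and "b" on two lines
```

If internal spacing must be preserved, the `gsub` approach from the question is the safer choice.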

oHo · Answer 2 · Feb 07 '12 at 13:06

Your code works for me.
You may have something other than spaces and tabs in your file.
`hexdump -C` can help you check what is wrong:

awk '{gsub(/^[ \t]+|[ \t]+$/,""); print;}' in.txt | hexdump -C | less
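If the dump is long, a rough complementary check (a sketch, not part of the original answer) is to let awk print the numbers of lines that contain any byte other than printable ASCII or a tab, such as a Windows `\r` or a DC4 `\x14`. `LC_ALL=C` limits `[:print:]` to ASCII 0x20 to 0x7E, so non-ASCII bytes are reported too. This assumes an awk that supports POSIX bracket classes (some older mawk releases do not).

```shell
# Made-up sample: line 2 ends in a Windows CR, line 3 starts with a tab.
# Report lines containing a byte that is neither printable ASCII nor a tab.
printf 'ok\nbad\r\n\tfine\n' |
  LC_ALL=C awk '/[^[:print:]\t]/ { print NR }'
# Output: 2
```

To check the real file, pass it directly: `LC_ALL=C awk '/[^[:print:]\t]/ { print NR }' in.txt`.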

UPDATE:

OK, you identified DC4 (there may be other control characters too).
Then you can improve your command:

awk '{gsub(/^[[:cntrl:][:space:]]+|[[:cntrl:][:space:]]+$/,""); print;}' in.txt > out.txt

See the awk man page:

[:alnum:] Alphanumeric characters.
[:alpha:] Alphabetic characters.
[:blank:] Space or tab characters.
[:cntrl:] Control characters.
[:digit:] Numeric characters.
[:graph:] Characters that are both printable and visible. (A space is printable, but not visible, while an a is both.)
[:lower:] Lower-case alphabetic characters.
[:print:] Printable characters (characters that are not control characters).
[:punct:] Punctuation characters (characters that are not letters, digits, control characters, or space characters).
[:space:] Space characters (such as space, tab, and formfeed, to name a few).
[:upper:] Upper-case alphabetic characters.
[:xdigit:] Characters that are hexadecimal digits.
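To see how these classes treat the two characters discussed in this thread, here is a small check, assuming a C locale and a `printf` that understands octal escapes (POSIX `printf` does). The helper name `check` is hypothetical. The space (0x20) belongs to `[:space:]` and `[:print:]` but not to `[:cntrl:]`, which answers why `\x20` is not a control character; DC4 (0x14) is only in `[:cntrl:]`.

```shell
# check NAME OCTAL: print NAME, then 1/0 membership in three POSIX classes.
check() {
  printf "\\$2\n" | LC_ALL=C awk -v name="$1" '{
    printf "%s cntrl=%d space=%d print=%d\n", name,
      ($0 ~ /^[[:cntrl:]]$/), ($0 ~ /^[[:space:]]$/), ($0 ~ /^[[:print:]]$/)
  }'
}
check space 040   # ASCII space, 0x20 -> space cntrl=0 space=1 print=1
check DC4 024     # Device Control 4, 0x14 -> DC4 cntrl=1 space=0 print=0
```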

Leading/Trailing `0x20` removal

The command works for me; I tested it like this:

$ echo -e "\x20 \tTEXT\x20 \t" | hexdump -C
00000000  20 20 09 54 45 58 54 20  20 09 0a                 |  .TEXT  ..|
0000000b
$ echo -e "\x20 \tTEXT\x20 \t" | awk '{gsub(/^[[:cntrl:][:space:]]+|[[:cntrl:][:space:]]+$/,""); print;}' | hexdump -C
00000000  54 45 58 54 0a                                    |TEXT.|
00000005

However, 0x20 characters in the middle of your text are not removed.
But that is not what you asked, is it?
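The hexdump test above only pads with spaces and tabs. A similar check with DC4 bytes (`\024` in octal, i.e. 0x14) mixed into the padding suggests that it is the `[:cntrl:]` class that removes them. The sample line is made up, and the `printf` octal escape is an assumption about your shell's `printf` (POSIX supports it).

```shell
# Hypothetical line: DC4 + spaces before the text, space + DC4 + spaces after it.
printf '\024  TEXT \024\024  \n' |
  awk '{gsub(/^[[:cntrl:][:space:]]+|[[:cntrl:][:space:]]+$/,""); print;}'
# Output: TEXT

# With [:space:] alone, the DC4 bytes block both anchored matches: only the
# final run of spaces is removed, and the DC4 bytes stay in the output.
printf '\024  TEXT \024\024  \n' |
  awk '{gsub(/^[[:space:]]+|[[:space:]]+$/,""); print;}' | od -c
```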

I really thought this would work, but it didn't; I'm still left with all these "spaces", ASCII code 0x20 (decimal 32). — user1194552, Feb 07 '12 at 12:32
Hello @user1194552. Please provide your `hexdump -C` output before and after `awk` processing so I can better understand your issue, because when I test it, it looks good to me :-) — oHo, Feb 07 '12 at 13:08
What is your `awk --version`? I can test two versions: `GNU Awk 3.1.3` and `GNU Awk 3.1.5`. And please provide your `hexdump -C` output so I can test the same thing as you. — oHo, Feb 07 '12 at 13:53

score 1 · Answer 3 · answered Feb 07 '12 at 12:14

Your files probably have Windows line endings, i.e. each line ends with `\r\n`. In that case `[ \t]+$` cannot match the trailing padding: the last character before the end of the line is `\r`, not a space or tab, so awk finds no run of blanks there to remove. Try running the file through `tr -d "\r"` before sending it to awk.
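A minimal sketch of that suggestion; the `printf` line is a made-up stand-in for one line of `in.txt` with Windows line endings. Without `tr`, only the leading blanks are removed, because the line's last character is `\r`; after `tr -d '\r'` the trailing blanks are trimmed as well.

```shell
# Without tr: the trailing blanks sit before \r, so [ \t]+$ cannot match them.
printf '  TEXT   \r\n' | awk '{gsub(/^[ \t]+|[ \t]+$/,""); print;}' | od -c

# With tr: strip every carriage return first, then trim.
printf '  TEXT   \r\n' | tr -d '\r' | awk '{gsub(/^[ \t]+|[ \t]+$/,""); print;}'
# Output: TEXT
```

On the real file this becomes `tr -d '\r' < in.txt | awk '{gsub(/^[ \t]+|[ \t]+$/,""); print;}' > out.txt`. Note that it also converts the output to Unix line endings and deletes any other carriage returns in the file.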

score 0 · Answer 4 · answered Oct 06 '15 at 23:57

Perl could be used:

perl -lpe 's/^\s*(.*\S)\s*$/$1/' in.txt > out.txt

s/foo/bar/ substitute using regular expressions
^ beginning of string
\s* zero or more whitespace characters
(.*\S) any characters ending with a non-whitespace character; captured into $1
\s* zero or more whitespace characters
$ end of string
