How to group strings with whitespace, using sed?

Question

Suppose my text file contains the following strings:

Apple foo foobar
Banana foo foobar1 abc b c
Orange barfoo
Pear foo

How do I group the strings that come after Apple, Banana, Orange, and Pear?

I could do this for Apple, but this wouldn't work for the rest of the lines in the text file.

sed 's/\([^ ]*\) \([^ ]*\) \([^ ]*\)/\2 \3/'

I want the output to look like this:

foo foobar
foo foobar1 abc b c
barfoo
foo

Is there a general case where I can print these strings after the first whitespace?

mklement0 · Accepted Answer · 2014-02-06T14:26:56.580

3

sed -r 's/^[^ ]+[ ]+//' in.txt

(GNU sed; on OSX, use -E instead of -r).

Update:

As @Jotne points out, the initial ^ is not strictly needed in this case - though it makes the intent clearer; similarly, you can drop the [] around the second space char.

The above only deals with spaces separating the columns (potentially multiple ones, thanks to the final + in the regex), whereas the OP more generally mentions whitespace.
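As a quick check of the spaces-only form, here is a minimal sketch that wraps the command in a function and feeds it a line with several consecutive spaces. The function name `strip_first_field` is made up for illustration, and `-E` is used on the assumption that your sed accepts it (recent GNU sed and BSD/OSX sed both do).

```shell
# Hypothetical helper: remove the first space-delimited field and the
# run of spaces that follows it, reading lines from stdin.
strip_first_field() {
  sed -E 's/^[^ ]+[ ]+//'
}

# The second line has three spaces after the first field; the trailing +
# in the regex consumes all of them.
printf '%s\n' 'Apple foo foobar' 'Banana   foo foobar1 abc b c' 'Pear foo' |
  strip_first_field
```

A line that contains no space at all does not match and is printed unchanged.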

Generalized whitespace version:

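Note: In the forms below, \s and [:space:] match all kinds of whitespace, including newlines. If you want to restrict matching to spaces and tabs, use [ \t] (GNU sed) or [:blank:]. Note that [:space:] and [:blank:] are character classes and must appear inside a bracket expression, e.g. [[:blank:]].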

sed -r 's/^\S+\s+//' in.txt

(GNU sed; this form will not work on OSX, even with -E.)

POSIX-compliant version (e.g., for AIX - thanks, @NeronLeVelu):

sed 's/^[^[:space:]]\{1,\}[[:space:]]\{1,\}//' in.txt
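To see the POSIX form handle mixed whitespace, the sketch below pipes in a line whose first word is followed by a tab and two spaces. The function names are hypothetical; the second function is the same expression with `[:blank:]` swapped in, for the case where only spaces and tabs should count as separators.

```shell
# Hypothetical helper: POSIX basic-regex form, any whitespace as separator.
strip_first_word_posix() {
  sed 's/^[^[:space:]]\{1,\}[[:space:]]\{1,\}//'
}

# Variant restricted to spaces and tabs.
strip_first_word_blank() {
  sed 's/^[^[:blank:]]\{1,\}[[:blank:]]\{1,\}//'
}

# printf expands \t to a real tab character in its format string.
printf 'Apple\t  foo foobar\n' | strip_first_word_posix
printf 'Apple\t  foo foobar\n' | strip_first_word_blank
```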

edited Feb 06 '14 at 14:26

answered Feb 06 '14 at 01:56

mklement0


In the OP's example this can be shortened to: `sed -r 's/[^ ]+ //' in.txt`. Since the `g` (global) flag is not specified, only the first match is replaced, i.e. the first column up to and including the first space. And there is no need for brackets around a single character, so `[ ]` is the same as a plain space. – Jotne Feb 06 '14 at 06:47

On POSIX-compliant systems (AIX, for example), use "[:blank:]" instead of " " and "\{1,\}" instead of "+". – NeronLeVelu Feb 06 '14 at 07:00
@Jotne: Good points; I wanted to be explicit about matching from the start and the space char. - brackets def. not needed, though. The extra `+` at the end was to cover the case where _multiple_ spaces separate the columns (not a requirement of the OP). – mklement0 Feb 06 '14 at 13:35
@NeronLeVelu: Good point about using `\{1,\}` instead of `+` for POSIX compatibility, but `' '` (unquoted) or `[ ]` work just fine - no need for `[:blank:]`, unless `\t` chars should be matched, too - which, now that I think about it, is probably a good idea :) – mklement0 Feb 06 '14 at 13:49

score 1 · Answer 2 · answered Feb 06 '14 at 01:55

1

Any reason it has to be sed?

$ cat <<EOF | cut -d ' ' -f 2-
Apple foo foobar
Banana foo foobar1 abc b c
Orange barfoo
Pear foo
EOF

foo foobar
foo foobar1 abc b c
barfoo
foo

answered Feb 06 '14 at 01:55

phs


+1. Probably the simplest in the case at hand. (One thing to note about `cut`: if there were _multiple_ space chars. between the fields, `cut` would treat them as separating empty fields.) – mklement0 Feb 06 '14 at 04:06
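The caveat about repeated spaces can be seen in a small sketch. The function names below are hypothetical, and the `tr -s ' '` workaround is one possible assumption, not something proposed in the answer: it squeezes every run of spaces on the line, including runs in the part you keep.

```shell
# Hypothetical helper: plain cut, as in the answer.
rest_with_cut() {
  cut -d ' ' -f 2-
}

# Hypothetical helper: squeeze runs of spaces first, then cut.
rest_with_cut_squeezed() {
  tr -s ' ' | cut -d ' ' -f 2-
}

# Two spaces after "Banana": cut sees an empty second field, so the
# result keeps one leading space.
printf 'Banana  foo  bar\n' | rest_with_cut

# With squeezing, the empty field disappears, but so does the double
# space between "foo" and "bar".
printf 'Banana  foo  bar\n' | rest_with_cut_squeezed
```

Also note that `cut` prints a line unchanged when it contains no delimiter, unless `-s` is given.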

score 1 · Answer 3 · answered Feb 06 '14 at 02:04

1

GNU grep works too

grep -oP '(?<=\s).*'

answered Feb 06 '14 at 02:04

glenn jackman


+1. Or (as I've learned about an hour ago :)): `grep -oP '\s\K.*'`. (The latter has the advantage that it's easier to generalize to deal with multiple whitespace chars. - `grep -oP '\s+\K.*'` - which apparently won't work with look-behind assertions, because they must describe a fixed-length string.) – mklement0 Feb 06 '14 at 03:56
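A minimal sketch of the `\K` variant from the comment above, assuming a GNU grep built with PCRE support (`-P`); the function name is hypothetical. One thing to keep in mind with `-o`: a line that contains no whitespace produces no match and is therefore dropped from the output rather than printed unchanged.

```shell
# Hypothetical helper: print everything after the first run of whitespace.
# Requires GNU grep with -P (PCRE) support.
rest_after_first_ws_grep() {
  grep -oP '\s+\K.*'
}

# \s+ consumes the three spaces, \K discards what was matched so far,
# and .* is what -o prints.
printf '%s\n' 'Apple foo foobar' 'Banana   foo bar' | rest_after_first_ws_grep
```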

score 0 · Answer 4 · answered Feb 06 '14 at 02:07

0

Not sure about sed.

But you can just remove the unwanted part of each line using the multiline modifier:

/^\w+\s/gm
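For readers who do want sed, the pattern above can be approximated as follows. This is a sketch under assumptions: the function name is hypothetical, `[[:alnum:]_]` stands in for `\w` (roughly equivalent, depending on locale), and no multiline modifier is needed because sed already applies the expression to each line separately.

```shell
# Hypothetical helper: remove a leading run of word characters plus the
# single whitespace character that follows it.
strip_leading_word() {
  sed 's/^[[:alnum:]_]\{1,\}[[:space:]]//'
}

printf '%s\n' 'Apple foo foobar' 'Pear foo' | strip_leading_word
```

Unlike the `[^ ]+` forms, this only matches when the first field consists entirely of word characters, so a line starting with something like `foo-bar` is left unchanged.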

answered Feb 06 '14 at 02:07

cvsguimaraes


Jotne · Answer 5 · 2014-02-06T06:48:37.413

0

This can also be solved with awk:

awk '{$1="";sub(/^ /,x)}1' file
foo foobar
foo foobar1 abc b c
barfoo
foo

or with this:

awk '{sub(/[^ ]+ /,x)}1' file
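The two awk forms behave differently when the rest of the line contains repeated spaces, which the sketch below tries to show. The function names are hypothetical, and `""` is written out where the answer uses the unset variable `x` (which awk treats as an empty string).

```shell
# Hypothetical helper: blank out field 1. Assigning to a field makes awk
# rebuild the record with single-space separators, so runs of spaces in
# the remainder are collapsed.
rest_awk_rebuild() {
  awk '{ $1 = ""; sub(/^ /, "") } 1'
}

# Hypothetical helper: delete the first field textually; the remainder
# of the line is left exactly as it was.
rest_awk_sub() {
  awk '{ sub(/[^ ]+ /, "") } 1'
}

printf 'Banana foo   bar\n' | rest_awk_rebuild
printf 'Banana foo   bar\n' | rest_awk_sub
```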

edited Feb 06 '14 at 06:48

answered Feb 06 '14 at 06:43

Jotne

