0

Suppose my text file contains the following lines:

Apple foo foobar
Banana foo foobar1 abc b c
Orange barfoo
Pear foo

How do I capture the strings that come after Apple, Banana, Orange, and Pear?

The following works for the Apple line, but not for the rest of the file, because it assumes exactly three fields:

sed 's/\([^ ]*\) \([^ ]*\) \([^ ]*\)/\2 \3/'

I want the output to look like this:

foo foobar
foo foobar1 abc b c
barfoo
foo

Is there a general way to print everything after the first whitespace on each line?

UberNate

5 Answers

3
sed -r 's/^[^ ]+[ ]+//' in.txt

(GNU sed; on OSX, use -E instead of -r).


Update:

As @Jotne points out, the initial ^ is not strictly needed in this case - though it makes the intent clearer; similarly, you can drop the [] around the second space char.

The above only deals with spaces separating the columns (potentially multiple ones, thanks to the final + in the regex), whereas the OP more generally mentions whitespace.

Generalized whitespace version:

Note: In the forms below, \s and [:space:] match all kinds of whitespace, including newlines. If you wanted to restrict matching to spaces and tabs, use [ \t] or [:blank:].

sed -r 's/^\S+\s+//' in.txt

(GNU sed; this form will not work on OSX, even with -E.)

POSIX-compliant version (e.g., for AIX - thanks, @NeronLeVelu):

sed  's/^[^[:space:]]\{1,\}[[:space:]]\{1,\}//' in.txt
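To check the POSIX form against input that mixes separators, here is a minimal sketch. The sample lines (one with repeated spaces, one with a tab) are made up for illustration and are not from the question.

```shell
# Hypothetical sample: single space, three spaces, and a tab as separators.
input=$(printf 'Apple foo foobar\nOrange   barfoo\nPear\tfoo\n')

# POSIX BRE: remove the first run of non-whitespace characters
# together with the whitespace run that follows it.
result=$(printf '%s\n' "$input" | sed 's/^[^[:space:]]\{1,\}[[:space:]]\{1,\}//')

printf '%s\n' "$result"
```

Each output line should contain only what followed the first column, with no leading whitespace left over.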
mklement0
  • In the OP's example this can be shortened to: `sed -r 's/[^ ]+ //' in.txt`. Since the `g` (global) flag is not specified, only the first match is replaced, i.e. the first column up to the first space. There is also no need for brackets around a single character: `[ ]` is the same as a plain space. – Jotne Feb 06 '14 at 06:47
  • On POSIX-compliant systems (AIX, for example), use "[:blank:]" instead of " " and "\{1,\}" instead of "+". – NeronLeVelu Feb 06 '14 at 07:00
  • @Jotne: Good points; I wanted to be explicit about matching from the start and the space char. - brackets def. not needed, though. The extra `+` at the end was to cover the case where _multiple_ spaces separate the columns (not a requirement of the OP). – mklement0 Feb 06 '14 at 13:35
  • @NeronLeVelu: Good point about using `\{1,\}` instead of `+` for POSIX compatibility, but `' '` (unquoted) or `[ ]` work just fine - no need for `[:blank:]`, unless `\t` chars should be matched, too - which, now that I think about it, is probably a good idea :) – mklement0 Feb 06 '14 at 13:49
1

Any reason it has to be sed?

$ cat <<EOF | cut -d ' ' -f 2-
Apple foo foobar
Banana foo foobar1 abc b c
Orange barfoo
Pear foo
EOF

foo foobar
foo foobar1 abc b c
barfoo
foo
phs
  • +1. Probably the simplest in the case at hand. (One thing to note about `cut`: if there were _multiple_ space chars. between the fields, `cut` would treat them as separating empty fields.) – mklement0 Feb 06 '14 at 04:06
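The caveat about repeated spaces can be seen with a small sketch. The input line with three spaces is a made-up example, and the sed command is the POSIX space-only variant of the approach from the first answer.

```shell
# Hypothetical line with three spaces between the columns.
line='Orange   barfoo'

# cut treats every single space as a delimiter, so the extra spaces
# become empty fields and survive as leading spaces in the output.
with_cut=$(printf '%s\n' "$line" | cut -d ' ' -f 2-)

# sed with a "one or more spaces" pattern consumes the whole run.
with_sed=$(printf '%s\n' "$line" | sed 's/^[^ ]\{1,\} \{1,\}//')

# Brackets make the leading spaces visible.
printf '[%s]\n' "$with_cut" "$with_sed"
```

With this input, the `cut` result keeps two leading spaces, while the `sed` result starts directly at `barfoo`.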
1

GNU grep works too

grep -oP '(?<=\s).*'
glenn jackman
  • +1. Or (as I've learned about an hour ago :)): `grep -oP '\s\K.*'`. (The latter has the advantage that it's easier to generalize to deal with multiple whitespace chars. - `grep -oP '\s+\K.*'` - which apparently won't work with look-behind assertions, because they must describe a fixed-length string.) – mklement0 Feb 06 '14 at 03:56
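The difference between the look-behind and `\K` forms shows up when several spaces separate the columns. This is a sketch that assumes a GNU grep built with PCRE support (`-P`); the sample input is made up.

```shell
# Hypothetical sample: the second line has three spaces after the first column.
input=$(printf 'Apple foo foobar\nOrange   barfoo\n')

# Look-behind checks only the single character before the match,
# so the remaining spaces stay in the output.
lookbehind=$(printf '%s\n' "$input" | grep -oP '(?<=\s).*')

# \s+ consumes the whole whitespace run; \K then drops it from the match.
keep_out=$(printf '%s\n' "$input" | grep -oP '\s+\K.*')

printf '[%s]\n' "$lookbehind" "$keep_out"
```

On the second line the look-behind form leaves two leading spaces, whereas the `\s+\K` form returns `barfoo` with none.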
0

Not sure about sed.

But you can just remove the unwanted part of each line using the multiline modifier:

/^\w+\s/gm
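Because sed already processes its input one line at a time and replaces only the first match by default, the `g` and `m` modifiers have no counterpart to worry about there. A rough sed translation of this regex, assuming `\w` can be approximated by the POSIX class `[[:alnum:]_]` and `\s` by `[[:space:]]`, might look like this:

```shell
# Two lines taken from the question's sample input.
input=$(printf 'Apple foo foobar\nPear foo\n')

# -E enables extended regexes; the pattern removes the leading "word"
# and the single whitespace character after it.
result=$(printf '%s\n' "$input" | sed -E 's/^[[:alnum:]_]+[[:space:]]//')

printf '%s\n' "$result"
```

Like the original regex, this strips exactly one whitespace character, so repeated separators would leave leading whitespace behind.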
cvsguimaraes
0

This can also be solved with awk:

awk '{$1="";sub(/^ /,x)}1' file
foo foobar
foo foobar1 abc b c
barfoo
foo

or with this:

awk '{sub(/[^ ]+ /,x)}1' file
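The two awk variants behave differently when the remaining text contains repeated spaces: assigning to `$1` makes awk rebuild the line with single-space separators, while `sub()` edits the line as-is. A small sketch, using a made-up line with three spaces between the later columns:

```shell
# Hypothetical line: three spaces between "foo" and "foobar1".
line='Banana foo   foobar1'

# Assigning to $1 rebuilds $0 using OFS (a single space),
# so the run of three spaces collapses to one.
rebuilt=$(printf '%s\n' "$line" | awk '{$1="";sub(/^ /,x)}1')

# sub() only removes the first column and one space; the rest is untouched.
preserved=$(printf '%s\n' "$line" | awk '{sub(/[^ ]+ /,x)}1')

printf '[%s]\n' "$rebuilt" "$preserved"
```

If the original spacing matters, the `sub()` form is the safer choice of the two.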
Jotne