7

I'm trying to find a succinct shell one-liner that'll give me all the lines in a file up until some pattern.

The use case is dumping all the lines in a log file until I spot some marker indicating that the server has been restarted.

Here's a stupid shell-only way to do that:

tail_file_to_pattern() {
    pattern=$1
    file=$2

    tail -n$((1 + $(wc -l $file | cut -d' ' -f1) - $(grep -E -n "$pattern" $file | tail -n 1 | cut -d ':' -f1))) $file
}

A slightly more reliable Perl way that takes the file on stdin:

perl -we '
    push @lines => $_ while <STDIN>;
    my $pattern = $ARGV[0];
    END {
        my $last_match = 0;
        for (my $i = @lines; $i--;) {
            $last_match = $i and last if $lines[$i] =~ /$pattern/;
        }
        print @lines[$last_match..$#lines];
    }
'

And of course you could do that more efficiently by opening the file, seeking to the end, and reading backwards until you find a matching line.

It's easy to print everything as of the first occurrence, e.g.:

sed -n '/PATTERN/,$p'

But I haven't come up with a way to print everything as of the last occurrence.

  • Your title says "all lines up until the last pattern" but your two example scripts print all lines from the last pattern to the end. I assume it's the title that's misleading? – John Zwinck Jan 22 '12 at 16:40
  • If the pattern will usually be present and near the end, you might want to consider [File::ReadBackwards](http://search.cpan.org/perldoc?File::ReadBackwards) (unshifting into a buffer until you reach the pattern or beginning of file). – ikegami Jan 23 '12 at 10:30

7 Answers

6

Here's a sed-only solution. To print every line in $file starting with the last line that matches $pattern:

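sed -e "H;/${pattern}/h" -e '$g;$!d' "$file"

Note that like your examples, this only works properly if the file contains the pattern. Otherwise, it outputs the entire file, preceded by an empty line (the hold space starts out empty, and H puts a newline in front of each line it appends).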

Here's a breakdown of what it does, with sed commands in brackets:

  • [H] Append every line to sed's "hold space" but do not echo it to stdout [d].
  • When we encounter the pattern, [h] throw away the hold space and start over with the matching line.
  • When we get to the end of the file, copy the hold space to the pattern space [g] so it will echo to stdout.
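
As a quick way to try the steps above, here is a minimal sketch that wraps the command in a function. The name `from_last_match_sed` is made up for illustration, and the sketch assumes the pattern is a basic regular expression that contains no `/`, since the pattern is pasted directly into the sed script.

```shell
# Hypothetical wrapper around the sed command above.
# $1: basic regular expression without "/" characters (assumption)
# $2: file to read
from_last_match_sed() {
    # H    append each line to the hold space
    # h    on a match, restart the hold space with the matching line
    # $g   on the last line, copy the hold space back for printing
    # $!d  on every other line, print nothing
    sed -e "H;/$1/h" -e '$g;$!d' "$2"
}
```

On a file containing the five lines a, X, b, X, c, calling `from_last_match_sed X file` should print X and c.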

Also note that it's likely to get slow with very large files, since any single-pass solution will need to keep a bunch of lines in memory.

Rob Davis
4

Alternatively: tac "$file" | sed -n '/PATTERN/,$p' | tac

EDIT: If you don't have tac, emulate it by defining

tac() {
    cat -n | sort -nr | cut -f2-    # -f2- keeps lines that themselves contain tabs intact
}

Ugly but POSIX.
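
For the reading of the question in which the output should start at the last match and run to the end of the file, one possible variant lets sed quit at the first match of the reversed stream. This is only a sketch: the function name `from_last_match_tac` is made up, it assumes a `tac` command (GNU coreutils) is on the PATH, and the pattern must not contain `/`.

```shell
# Hypothetical helper: print from the last line matching $1 to the end of $2.
# Assumes `tac` is available; swap in a replacement if it is not.
from_last_match_tac() {
    # In the reversed file the last match comes first, so sed prints
    # lines up to and including it and then quits; the second tac
    # restores the original order.
    tac "$2" | sed "/$1/q" | tac
}
```

If the pattern never matches, sed never quits and the whole file is printed.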

Jo So
  • I don't have a `tac` binary. Given that the OP didn't specify an operating system, it's probably best to offer solutions that will work across the board. – ghoti Jan 23 '12 at 06:25
  • You can use `tail -r` in place of `tac`. Although this solution isn't quite what (the body of) the question is asking for. For that, you'd need `sed -n "1,/${pattern}/p"`. – Rob Davis Jan 23 '12 at 06:43
  • @ghoti: Well, it seems you're not using GNU/coreutils. Apparently `tac` isn't POSIX. If you insist on POSIX, use `cat -n | sort -nr | cut -f2` instead of `tac` (Oh, we're getting ugly again!) – Jo So Jan 23 '12 at 15:16
  • @RobDavis: `tail -r` isn't POSIX either, and not available on my Debian System. For the second part: True, title does not match body question. But please give the whole line, which would be `tac | sed -n '1,/PATTERN/p' | tac` (or tac replacement) – Jo So Jan 23 '12 at 15:26
4

Load the data into an array line by line, and throw the array away when you find a pattern match. Print out whatever is left at the end.

 while (<>) {
     @x=() if /$pattern/;
     push @x, $_;
 }
 print @x;

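As a one-liner (replace PATTERN with your regex; a shell variable such as $pattern would not be expanded inside the single quotes):

 perl -ne '@x=() if /PATTERN/;push @x,$_;END{print @x}' input-file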
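
The same buffer-and-reset idea can be sketched in awk for systems without Perl. The function name `from_last_match_awk` is made up for illustration. The pattern is treated as an extended regular expression and is passed with `-v`, which means awk processes backslash escapes in it before use.

```shell
# Hypothetical helper: read stdin and print from the last line matching
# the extended regular expression $1 through the end of the input.
from_last_match_awk() {
    awk -v pat="$1" '
        $0 ~ pat { n = 0 }     # match: discard everything buffered so far
        { buf[n++] = $0 }      # buffer the current line (the match included)
        END { for (i = 0; i < n; i++) print buf[i] }
    '
}
```

Usage would look like `from_last_match_awk 'server restarted' < server.log`, where the file name is only an example. As with the Perl version, the whole input is printed when nothing matches.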
mob
3

I suggest a simplification of your shell script:

tail -n +$(grep -En "$pattern" "$file" | tail -1 | cut -d: -f1) "$file"

It's substantially more concise because it:

  • Uses tail's + option to print from the given line to the end, rather than having to calculate the distance from there to the end.
  • Uses more concise ways of expressing command line options.

And it fixes a bug by quoting $file (so it will work on files whose names contain spaces).
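
One case the one-liner does not cover is a file with no match at all: the command substitution is then empty and tail is handed a bare `+`. A hedged sketch of a wrapper that checks for this first is shown below; the function name `tail_from_last_match` and the choice to return status 1 on no match are assumptions made for illustration.

```shell
# Hypothetical wrapper around the pipeline above.
# Prints nothing and returns 1 when no line matches.
tail_from_last_match() {
    pattern=$1
    file=$2
    # Line number of the last matching line; empty if there is none.
    line=$(grep -En -- "$pattern" "$file" | tail -n 1 | cut -d: -f1)
    [ -n "$line" ] || return 1
    # Print from that line through the end of the file.
    tail -n "+$line" "$file"
}
```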

John Zwinck
3

Sed's q command will do the trick:

sed "/$pattern/q" "$file"

That will print all the lines until it gets to the line with the pattern. After that, sed will print that last line and quit.
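
If the matching line itself should be left out, the quit can be made to happen before anything is printed. Below is a small sketch of both variants. The function names are made up, and the pattern is assumed to contain no `/`.

```shell
# Hypothetical helpers for the "up to the first match" reading of the question.

# Lines of file $2 up to and including the first line matching $1.
until_first_match() {
    sed "/$1/q" "$2"
}

# Same, but stop before the matching line: with -n nothing is printed
# automatically, q leaves on the match, and p prints every earlier line.
before_first_match() {
    sed -n -e "/$1/q" -e p "$2"
}
```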

David W.
  • This does what the title and first line of the question suggest but not what the questioner actually wants, I think. He wants all the lines *after* and including the last line that matches a given pattern. – Rob Davis Jan 23 '12 at 06:23
  • @RobDavis - You're right. I read the first paragraph, and thought _"Hey, this is simple"_. I'll probably have to come up with something with Awk – David W. Jan 23 '12 at 15:51
1

This question's title and description don't match.

For the question's title, +1 for @David W.'s answer. Also:

sed -ne '1,/PATTERN/p'

For the question in the body, you've already got some solutions.

Note that tac is probably specific to Linux. It doesn't seem to exist in BSD or OSX. If you want a solution that's multi-platform, don't rely on tac.

Of course, just about any solution is going to require that your data either be spooled in memory, or submitted once for analysis and a second time for processing. For example:

#!/usr/local/bin/bash

tmpfile="/tmp/`basename $0`,$$"
trap "rm $tmpfile" 0 1 2 5
cat > $tmpfile

n=`awk '/PATTERN/{n=NR}END{print NR-n+1}' $tmpfile`

tail -$n $tmpfile

Note that my use of tail is for FreeBSD. If you use Linux, you'll probably need tail -n $n $tmpfile instead.
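
A variant of the same spool-then-read-twice idea can do both passes in a single awk call, which sidesteps the difference in tail syntax. This is only a sketch: the function name `from_last_match_spooled` is made up, and it assumes `mktemp` is available (common on Linux, the BSDs, and OS X, but not required by POSIX).

```shell
# Hypothetical helper: spool stdin to a temporary file, then read it twice.
# $1: extended regular expression (awk processes backslash escapes in it)
from_last_match_spooled() {
    spool=$(mktemp) || return 1
    cat > "$spool"
    # Pass 1 (NR == FNR): remember the line number of the last match.
    # Pass 2: print from that line on. With no match, n stays unset,
    # compares as 0, and every line is printed.
    awk -v pat="$1" '
        NR == FNR { if ($0 ~ pat) n = NR; next }
        FNR >= n
    ' "$spool" "$spool"
    rm -f "$spool"
}
```

Only line numbers are kept in memory here; the data itself stays on disk between the two passes.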

ghoti
  • You can use `tail -r` on OSX to get the functionality of `tac`. – Mark Setchell Feb 10 '15 at 09:43
  • That's true, but it's also not multiplatform as the `-r` option does not exist in Linux. If I recommend against one, it would be hypocritical of me not to recommend against the other. :) – ghoti Feb 10 '15 at 13:47
  • I understand and agree entirely - I was merely pointing out, mainly for any readers in the future, that if they want to use `tac` on OS X, they can use `tail -r` instead... rather than leaving your statement saying it doesn't seem to exist. – Mark Setchell Feb 10 '15 at 13:55
1

Rob Davis pointed out to me that what you said you wanted isn't what you really asked:

You said:

I'm trying to find a succinct shell one-liner that'll give me all the lines in a file up until some pattern.

but then at the very end of your post, you said:

But I haven't come up with a way to print everything as of the last occurrence.

I've already given you the answer to your first question. Here's a one-line answer to your second question, printing from a regular expression to the end of the file:

awk '{ if ($0 ~ /'"$pattern"'/) { flag = 1 } if (flag == 1) { print $0 } }' $file

A similar Perl one-liner:

export pattern="<regex>"
export file="<file>"
perl -ne '$flag=1 if /$ENV{pattern}/;print if $flag;' $file
David W.
  • Except he wants the lines after the last occurrence of the pattern, I believe. – Rob Davis Jan 23 '12 at 16:36
  • @RobDavis - You're right. Your [solution](http://stackoverflow.com/a/8967705/368630) is the best. It's one line and platform independent. – David W. Jan 23 '12 at 16:45