1

I have one file inside that file it is present as given below

TEST_4002_sample11_1_20110531.TXT
TEST_4002_sample11_2_20110531.TXT
TEST_4002_sample11_4_20110531.TXT
TEST_4002_sample11_5_20110531.TXT
TEST_4002_sample11_6_20110531.TXT
TEST_4002_sample10_1_20110531.TXT
TEST_4002_sample10_2_20110531.TXT
TEST_4002_sample10_4_20110531.TXT
TEST_4002_sample10_5_20110531.TXT

I want the output if the 4th filed of that file sequence is missing, then print previous file name and next file name as output.

TEST_4002_sample11_2_20110531.TXT
TEST_4002_sample11_4_20110531.TXT
TEST_4002_sample10_2_20110531.TXT
TEST_4002_sample10_4_20110531.TXT
clt60
  • 62,119
  • 17
  • 107
  • 194
gyrous
  • 189
  • 3
  • 6
  • 19
  • 1
    Hm. I'm really don't understand why you closed this question. It is real world programming question with concrete solution (as you can see in answers). Provided the example input and the wanted output. – clt60 Jun 14 '11 at 07:19

5 Answers5

1

This awk variant seems to produce the required output:

awk -F_ '$4>c+1{print p"\n"$0}{p=$0;c=$4}'
ripat
  • 3,076
  • 6
  • 26
  • 38
1

simple perl way:

perl -F_ -lane 'print "$o\n$_" if $F[3]-$n>1;$o=$_;$n=$F[3]' < file
clt60
  • 62,119
  • 17
  • 107
  • 194
0

In Perl you could do something like this:

use strict;
use warnings;

my $prev_line;
my $prev_val;

while(<>){
    # get the 4th value
    my $val = (split '_')[3];

    # skip if invalid line
    next if !defined $val;

    # print if missed sequence
    if(defined($prev_val) && $val > $prev_val + 1){
        print $prev_line . $_;
    }

    # save for next iteration
    $prev_line = $_;
    $prev_val = $val;
}

Save that in foo.pl and run it with something like:

cat file.txt | perl foo.pl

I'm sure it can be shortened quite a lot. Could use something like this if all lines are valid:

perl -n -e '$v=(/[^_]/g)[3];print"$l$_"if$l&&$v>$p+1;$p=$v;$l=$_' file.txt

or

perl -naF_ -e '$v=$F[3];print"$l$_"if$l&&$v>$p+1;$p=$v;$l=$_' file.txt
Qtax
  • 33,241
  • 9
  • 83
  • 121
0

As far as I understand what you need, here is a Perl script that do the job:

#!/usr/local/bin/perl 
use strict;
use warnings;

my $prev = '';
my %seq1;
while(<DATA>) {
    chomp;
    my ($seq1, $seq2) = $_ =~ /^.*?(\d+)_(\d+)_\d+\.TXT$/;
    $seq1{$seq1} = $seq2 - 1 unless exists $seq1{$seq1};
    if ($seq1{$seq1}+1 != $seq2) {
        print $prev,"\n",$_,"\n";
    }
    $prev = $_;
    $seq1{$seq1} = $seq2;
}


__DATA__
TEST_4002_sample11_1_20110531.TXT
TEST_4002_sample11_2_20110531.TXT
TEST_4002_sample11_4_20110531.TXT
TEST_4002_sample11_5_20110531.TXT
TEST_4002_sample11_6_20110531.TXT
TEST_4002_sample10_1_20110531.TXT
TEST_4002_sample10_2_20110531.TXT
TEST_4002_sample10_4_20110531.TXT
TEST_4002_sample10_5_20110531.TXT

output:

TEST_4002_sample11_2_20110531.TXT
TEST_4002_sample11_4_20110531.TXT
TEST_4002_sample10_2_20110531.TXT
TEST_4002_sample10_4_20110531.TXT
Toto
  • 89,455
  • 62
  • 89
  • 125
0

I used glob to get the files (it's possible that it's as simple as <TEST_*.TXT>).

use strict;
use warnings;

my %last = ( name => '', group => '', seq => 0 );

foreach my $file ( sort glob('TEST_[0-9][0-9][0-9][0-9]_sample[0-9][0-9]_[0-9]_*.TXT')
    ) {
    my ( $group, $seq ) = $file =~ m/(\d{4,}_sample\d+)_(\d+)/;
    if ( $group eq $last{group} && $seq - $last{seq} > 1 ) { 
        print join( "\n", $last{name}, $file, '' );
    }
    @last{ qw<name group seq> } = ( $file, $group, $seq );
}
Axeman
  • 29,660
  • 2
  • 47
  • 102