0

I am splitting a text file into blocks in order to extract those blocks which do not contain a certain line by using a regular expression. The text file looks like this:

[Term]  
id: id1  
name: name1  
xref: type1:aab  
xref: type2:cdc  

[Term]  
id: id2  
name: name2  
xref: type1:aba  
xref: type3:fee 

Someone helped me a few days ago by showing me how to extract those blocks which do contain a certain regular expression (for example "xref: type3"):

while (<MYFILE>) {
  BEGIN { $/ = q|| }
    my @lines = split /\n/;
    for my $line ( @lines ) {
        if ( $line =~ m/xref:\s*type3/ ) {
            printf NEWFILE qq|%s|, $_;
            last;
        }
    }
}

Now I want to write all blocks which do not contain "xref: type3" to a new file. I tried to do this by simply negating the regex:

if ( $line !~ m/xref:\s*type3/ )

or alternatively by negating the if statement by using

unless ( $line =~ m/xref:\s*type3/ )

Unfortunately it doesn't work: the output file is the same as the original one. Any ideas what I'm doing wrong?

atreju

3 Answers

3

You have:

For every line, print this block if this line doesn't match the pattern.

But you want:

For every block, print this block if none of the lines in the block match the pattern.

As such, you can't start printing the block before you have examined every line in the block (or at least all lines until you find a matching line).

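local $/ = q||;   # paragraph mode: read one blank-line-separated block at a time
while (<MYFILE>) {
    my @lines = split /\n/;

    # Scan every line first; skip the block if any line matches.
    my $skip = 0;
    for my $line ( @lines ) {
        if ( $line =~ m/^xref:\s*type3/ ) {
            $skip = 1;
            last;
        }
    }

    # Print the original block ($_): split removed the newlines from @lines.
    print NEWFILE $_ if !$skip;
}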

But there's no need to split into lines. We can check and print the whole block at once.

local $/ = q||;
while (<MYFILE>) {
    print NEWFILE $_ if !/^xref:\s*type3/m;
}

(Note the /m to make ^ match the start of any line.)
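
As a self-contained sketch of the same idea, the snippet below reads from an in-memory string instead of MYFILE and collects the kept blocks in an array rather than printing to NEWFILE. The sample data and the names `$data`, `$in`, and `@kept` are assumptions made for illustration, not part of the original script.

```perl
use strict;
use warnings;

# Sample data (assumed for illustration): two [Term] blocks separated by a blank line.
my $data = <<'END';
[Term]
id: id1
name: name1
xref: type1:aab
xref: type2:cdc

[Term]
id: id2
name: name2
xref: type1:aba
xref: type3:fee
END

# Open the string as an in-memory filehandle so the example needs no files.
open my $in, '<', \$data or die "Cannot open in-memory handle: $!";

my @kept;
{
    local $/ = q||;    # paragraph mode: each read returns one whole block
    while ( my $block = <$in> ) {
        # /m lets ^ match at the start of every line inside the block
        push @kept, $block unless $block =~ /^xref:\s*type3/m;
    }
}
close $in;

print @kept;    # only the id1 block is kept
```

Swapping `$in` for the real input handle and `push @kept, $block` for a print to the output handle should give the file-based version.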

ikegami
1

Do not process the records line by line. Use paragraph mode:

{   local $/ = q();
    while (<MYFILE>) {
        if (! /xref:\s*type3/ ) {
            printf NEWFILE qq|%s|, $_;
            last;    # stops after the first printed block; remove to print all of them
        }
    }
}
choroba
  • Thanks for your help. I tried this solution but it only returned the very first block for me. Do I need to add a loop to process all the paragraphs? – atreju Jul 24 '13 at 13:40
  • @user2241303: If you want all the blocks, remove the `last` command. – choroba Jul 24 '13 at 13:40
1

The problem is that you are using unless together with !~, which is interpreted as "if $line does not NOT match, do this" (a double negative).

When using the unless block with the normal pattern matching operator =~, your code worked perfectly; that is, I see the first block as output because it does not contain type3.

LOOP:
while (<$MYFILE>) {
  BEGIN { $/ = q|| }
    my @lines = split /\n/;
    for my $line ( @lines ) {
        unless ( $line =~ m/xref:\s*type3/ ) {
            printf qq|%s|, $_;
            last LOOP;
        }
  }
}

# prints
# [Term]
# id: id1
# name: name1
# xref: type1:aab
# xref: type2:cdc
Hunter McMillen
  • Thank you for your help, but all I get is an empty document although I copied the code and used exactly the same example. – atreju Jul 24 '13 at 13:21
  • I changed the printf statement to print to my console; add your `NEWFILE` handle back in to check, or examine your terminal. – Hunter McMillen Jul 24 '13 at 13:22
  • I added my NEWFILE handle but the output file is still empty. I'm truly at a loss what to do. `#!/usr/bin/perl open (MYFILE, 'inputfile'); open (NEWFILE, ">>", 'outputfile'); LOOP: while (<$MYFILE>) { BEGIN { $/ = q|| } my @lines = split /\n/; for my $line ( @lines ) { unless ( $line =~ m/xref:\s*type3/ ) { printf NEWFILE qq|%s|, $_; last LOOP; } } }` – atreju Jul 24 '13 at 13:34
  • Take the `$` out before `MYFILE`. If you had `strict` and `warnings` on, you would have seen the error. – Hunter McMillen Jul 24 '13 at 13:45
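
To illustrate the last comment, here is a minimal sketch of how `strict` reports the stray `$`. It compiles the loop header from the script above inside a string eval, so the compile-time failure can be captured instead of aborting the program. The names `$ok` and `$err` are assumptions for illustration, and the exact wording of the message may vary between Perl versions.

```perl
use strict;
use warnings;

# The string eval inherits "use strict" from this file. $MYFILE is never
# declared, so compilation of the string fails and none of it is executed
# (the open is never attempted).
my $ok = eval q{
    open( MYFILE, '<', 'inputfile' );
    while (<$MYFILE>) { }
    1;
};
my $err = $ok ? '' : $@;

print "strict rejected the code:\n$err" if $err;
```

Without `use strict`, the same loop compiles and `<$MYFILE>` reads from an undefined handle, which is why the output file stays empty.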