When comparing two files, how do I skip (ignore) blank lines?

Question

I'm comparing two text files line by line: ref.txt (the reference) and log.txt. There may be an arbitrary number of blank lines in either file that I'd like to ignore; how can I accomplish this?

ref.txt

one

two


three



end

log.txt

one
two
three
end

For these two files there should be no incorrect log lines in the output; in other words, log.txt matches ref.txt.

What I'd like to accomplish, in pseudocode:

while (traversing both files at same time) {
    if ($l is blank line || $r is blank line) {
        if ($l is blank line)
            skip to next non-blank line
        if ($r is blank line)
            skip to next non-blank line
    }
    #continue with line by line comparison...
}

My current code:

use strict;
use warnings;

my $logPath    = $ARGV[0];
my $refLogPath = $ARGV[1];
my $r;                                 #ref log line
my $l;                                 #log line
my $boolRef    = 1;                    #true until a mismatch is found

open INLOG, $logPath    or die $!;
open INREF, $refLogPath or die $!;

while (defined($l = <INLOG>) and defined($r = <INREF>)) {
    #code for skipping blank lines?
    if ($l ne $r) {
        print $l, "\n";                #Output incorrect line in log file
        $boolRef = 0;                  #false==0
    }
}

score 8 · Answer 1 · answered Jul 19 '12 at 16:48

If you are on a Linux platform, use:

diff -B ref.txt log.txt

The -B option causes changes that just insert or delete blank lines to be ignored.

JRFerguson


squiguy · Answer 2 · 2012-07-19T16:51:55.980

2

You can skip blank lines by matching each line against this regular expression:

next if $line =~ /^\s*$/;

This matches any line that is empty or consists only of whitespace (spaces, tabs, and the trailing newline).
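In the two-file loop from the question, a bare `next` would advance both files at once, so each file needs to skip its blank lines independently. Here is a minimal sketch of one way to do that. The helper names `next_nonblank` and `mismatched_lines` are hypothetical (they are not from the original posts), and in-memory filehandles stand in for log.txt and ref.txt:

```perl
use strict;
use warnings;

# Hypothetical helper: return the next line that contains a
# non-whitespace character, or undef at end of file.
sub next_nonblank {
    my ($fh) = @_;
    while (defined(my $line = <$fh>)) {
        return $line if $line =~ /\S/;
    }
    return;
}

# Hypothetical helper: walk both handles in step, ignoring blank lines,
# and return the log lines that differ from the reference.
sub mismatched_lines {
    my ($log_fh, $ref_fh) = @_;
    my @bad;
    while (1) {
        my $l = next_nonblank($log_fh);
        my $r = next_nonblank($ref_fh);
        last unless defined $l and defined $r;   # stop when either file runs out
        push @bad, $l if $l ne $r;
    }
    return @bad;
}

# In-memory filehandles stand in for log.txt and ref.txt.
my $log_text = "one\ntwo\nthree\nend\n";
my $ref_text = "one\n\ntwo\n\n\nthree\n\n\n\nend\n";
open my $log, '<', \$log_text or die $!;
open my $ref, '<', \$ref_text or die $!;

my @bad = mismatched_lines($log, $ref);
print @bad ? @bad : "log matches ref\n";
```

Like the loop in the question, this stops as soon as either file is exhausted, so extra non-blank lines at the end of the longer file are not reported.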

edited Jul 19 '12 at 16:51

answered Jul 19 '12 at 16:38


It seems more understandable (to me, at least) to write that as `next unless $line =~ /\S/`. – Dave Cross Jul 20 '12 at 10:34
@DaveCross I suppose that your version ensures that there is something on the line read. There is always more than one way to do it in Perl! – squiguy Jul 20 '12 at 13:05
Yeah. I switched to my approach after dealing with one too many files where the "empty" lines actually contained spaces and/or tabs. – Dave Cross Jul 20 '12 at 13:11

kevlar1818 · Answer 3 · 2012-07-20T16:19:54.460

2

This way seems the most "perl-like" to me. No fancy loops or anything, just slurp the files and grep out the blank lines.

use warnings;

$f1 = "path/file/1";
$f2 = "path/file/2";

open(IN1, "<$f1") or die "Cannot open file: $f1 ($!)\n";
open(IN2, "<$f2") or die "Cannot open file: $f2 ($!)\n";

chomp(@lines1 = <IN1>); # slurp the files
chomp(@lines2 = <IN2>);

@l1 = grep(!/^\s*$/,@lines1); # get the files without empty lines
@l2 = grep(!/^\s*$/,@lines2);

# something like this to print the non-matching lines
for $i (0 .. $#l1) {
   print "[$f1 $i]: $l1[$i]\n[$f2 $i]: $l2[$i]\n" if($l1[$i] ne $l2[$i]);
}

edited Jul 20 '12 at 16:19

answered Jul 19 '12 at 17:06


Perhaps rewrite those greps as `@l1 = grep(/\S/, @lines1)` etc. – Dave Cross Jul 20 '12 at 10:35
How do I retrieve individual lines from the @l1 and @l2? – jerryh91 Jul 20 '12 at 15:47
This isn't perfect, as one mismatched line will make all the ones below it be mismatches too. I thought I'd share this as an exploration of perl's file slurping/grepping ability. Definitely just use `diff -B` if you can. – kevlar1818 Jul 20 '12 at 16:21

score 0 · Answer 4 · answered Jul 19 '12 at 16:42

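You can loop to find the next non-blank line in each file on every iteration:

while (1) {
    # Lines read from a file keep their trailing newline, so test for
    # "no non-whitespace character" rather than comparing against "".
    while (defined($l = <INLOG>) and $l !~ /\S/) {}
    while (defined($r = <INREF>) and $r !~ /\S/) {}

    # Perl uses last, not break, to leave a loop
    last if !defined($l) or !defined($r);

    if ($l ne $r) {
        print $l, "\n";
        $boolRef = 0;
    }
}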

score 0 · Answer 5 · answered Jul 19 '12 at 16:46

man diff

diff -B ref.txt log.txt

toolic


score 0 · Answer 6 · answered Jul 19 '12 at 16:47

# line skipping code
while (defined($l=<INLOG>) && $l =~ /^$/ ) {}  # no-op loop exits with $l that has length

while (defined($r=<INREF>) && $r =~ /^$/ ) {}  # no-op loop exits with $r that has length
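One caveat, if "blank" should also cover lines that hold only spaces or tabs: `/^$/` matches only truly empty lines (optionally with the trailing newline), whereas the `/\S/` test suggested in the comments on another answer also treats whitespace-only lines as blank. A small sketch to compare the two tests follows; the helper names `is_empty` and `is_blank` are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the two tests discussed in this thread.
sub is_empty { my ($line) = @_; return $line =~ /^$/ ? 1 : 0 }   # "" or "\n" only
sub is_blank { my ($line) = @_; return $line !~ /\S/ ? 1 : 0 }   # no non-whitespace character

# An empty line, a whitespace-only line, and a line with content.
for my $line ("\n", " \t\n", "two\n") {
    (my $shown = $line) =~ s/\n/\\n/g;   # make the control characters visible
    $shown =~ s/\t/\\t/g;
    printf "%-10s empty=%d blank=%d\n", qq("$shown"), is_empty($line), is_blank($line);
}
```

Which test is right depends on whether the log files can contain stray spaces or tabs on otherwise empty lines.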

marklark

