
I have a text file that looks like the following:

Line 1
Line 2
Line 3
Line 4
Line 5
filename2.tif;Smpl/Pix & Bits/Smpl are missing.

The first 5 lines are always the same; the data I want to read starts on line 6. From line 6 onward, each line is delimited by semicolons, and I need just the first field of each line.

For example:

Line 1
Line 2
Line 3
Line 4
Line 5
filename2.tif;Smpl/Pix & Bits/Smpl are missing.
filename4.tif;Smpl/Pix & Bits/Smpl are missing.
filename6.tif;Smpl/Pix & Bits/Smpl are missing.
filename8.tif;Smpl/Pix & Bits/Smpl are missing.  

Output desired would be:

filename2.tif
filename4.tif
filename6.tif
filename8.tif

Is this possible, and if so, where do I begin?

drewrockshard
  • It's possible. Do you have any code yet? – aschepler Nov 24 '10 at 23:34
  • Yes and no. Not for this yet - but I have 300+ lines of code that I'm trying to implement this into. It's basically a new feature I'm trying to implement to process files from a text file that already exists. – drewrockshard Nov 24 '10 at 23:37
  • The answer to *all* questions beginning, *“In Perl, ¿can I do …?”* is **“¡Yes!”** However, the answer to some of these continues with **“Yes, but ….”** – tchrist Nov 24 '10 at 23:55


This uses the Perl 'autosplit' (or 'awk') mode:

perl -n -F'/;/' -a -e 'next if $. <= 5; print "$F[0]\n";' < data.file

See 'perlrun' and 'perlvar'.
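Written out as an ordinary loop, the one-liner corresponds roughly to the sketch below: perlrun describes -n as wrapping the code in a `while (<>)` loop, and -a as splitting each input line into @F (here using the -F pattern). To make it callable from a larger script, this version reads from a handle passed in rather than from <>; the sub name first_fields_after_header is hypothetical.

```perl
use strict;
use warnings;

# Roughly what `perl -n -a -F'/;/' -e '...'` builds around the -e code:
# -n supplies the read loop, -a/-F split each line into @F.
# Sketch only: it reads from a handle passed in rather than from <>.
sub first_fields_after_header {
    my ($fh, $skip) = @_;          # open handle, number of header lines
    my @out;
    LINE: while (my $line = <$fh>) {
        my @F = split /;/, $line;  # the autosplit step
        next LINE if $. <= $skip;  # $. = current line number of $fh
        push @out, $F[0];          # first ;-separated field
    }
    return @out;
}

# Demo on an in-memory copy of the sample data from the question.
my $data = join '', map { "Line $_\n" } 1 .. 5;
$data .= "filename$_.tif;Smpl/Pix & Bits/Smpl are missing.\n" for 2, 4, 6, 8;

open my $in, '<', \$data or die "Cannot open in-memory data: $!";
print "$_\n" for first_fields_after_header($in, 5);
close $in;
```

Because each handle keeps its own line counter, $. starts again from 1 for a freshly opened handle, so the same sub can be called once per file.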


If you need to do this in a function that is given a file handle and a number of lines to skip, then you won't be using Perl's 'autosplit' mode; you split each line yourself.

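sub skip_N_lines_read_column_1
{
    my($fh, $N) = @_;     # an already-open file handle and the number of lines to skip
    my $i = 0;
    my @files = ();
    while (my $line = <$fh>)
    {
        next if $i++ < $N;            # skip the first $N lines
        my($file) = split /;/, $line; # keep only the first ;-separated field
        push @files, $file;
    }
    return @files;                    # every name found, in file order
}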

This loops over the lines, skipping the first N of them, then splits each remaining line and captures only the first result. The line with my($file) = split... is subtle: the parentheses give split a list context, so it generates a list of values (rather than a count of values) and assigns the first one to the variable. If the parentheses were omitted, you would be providing a scalar context to a list operator, so you'd get the number of fields in the split output assigned to $file, which is not what you need. The file name is appended to the end of the array, and the array is returned.

Since the code did not open the file handle, it does not close it. An alternative interface would pass the file name (instead of an open file handle) into the function; you'd then open and close the file in the function, and have to deal with error handling there.
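The difference between the two contexts is easy to see on one line of the sample data. A minimal sketch, assuming Perl 5.12 or later (where split in scalar context simply returns the field count):

```perl
use strict;
use warnings;

my $line = "filename2.tif;Smpl/Pix & Bits/Smpl are missing.\n";

# List context: the parentheses make split return the fields,
# and the first one lands in $file (the rest are discarded).
my ($file) = split /;/, $line;

# Scalar context: split returns how many fields it found instead.
my $count = split /;/, $line;

print "$file\n";    # filename2.tif
print "$count\n";   # 2
```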

And if you need help with opening the file, etc., then:

use Carp;

sub open_skip_read
{
    my($name) = @_;
    open my $fh, '<', $name or croak "Failed to open file $name ($!)";
    my @list = skip_N_lines_read_column_1($fh, 5);
    close $fh or croak "Failed to close file $name ($!)";
    return @list;
}
Jonathan Leffler
  • +1. Best answer, really. I have to rescue the perl part of my brain :) – Diego Sevilla Nov 25 '10 at 00:09
  • How would I write this in a script, and not in a command line, and read in from an open file handle of a file that already exists? – drewrockshard Nov 25 '10 at 00:13
  • @Jonathan: It scares me that perl knows to gobble your slash delimiters around the separator. I didn’t know it did that! – tchrist Nov 25 '10 at 00:40
  • @tchrist: See perl run...no, I mean, see 'perlrun'... '-Fpattern specifies the pattern to split on if -a is also in effect. The pattern may be surrounded by // , "" , or '' , otherwise it will be put in single quotes. You can't use literal whitespace in the pattern.' I didn't really need the slashes in this example. – Jonathan Leffler Nov 25 '10 at 00:56
  • I'm getting closer. I changed the last part of the open_skip_read subroutine to `return $list[0];` and now it outputs the filename. The problem is that only one filename is returned, but there are more results. I need it to return every match it finds. – drewrockshard Nov 25 '10 at 02:43
  • Figured it out; ran it through a foreach loop and that did the trick. I really appreciate your help on this one - this was definitely the better answer. – drewrockshard Nov 25 '10 at 02:52

Kinda ugly, but: read out the dummy lines, then split the rest of them on ;.

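use strict;
use warnings;

my $logfile = '/path/to/logfile.txt';

open(my $fh, '<', $logfile) or die "Couldn't open $logfile: $!\n";

# Read and throw away the 5 header lines.
for (my $i = 0 ; $i < 5 ; $i++) {
   my $dummy = <$fh>;
}

# Print the first ;-separated field of each remaining line.
while (<$fh>) {
   my (@fields) = split /;/;
   print $fields[0], "\n";
}

close($fh);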
dmah
  • This can be written something like: `my @dummy; @dummy[0..4] = <FILE>;map {($a) = split /;/;print $a,"\n"} <FILE>;`. – Diego Sevilla Nov 24 '10 at 23:59
  • What if I'm trying to read from a file, not from the command line? I have a file in a relative location (so I can refer to it as logfile.txt, for example). I'm having trouble reading it in; so far, my code runs a while loop endlessly and I have to CTRL+C out of it. – drewrockshard Nov 25 '10 at 00:01
  • @Diego: *Unfortunately,* that won’t work because you just supplied list context to the `readline` operator in your slice assignment, thereby exhausting input. The remaining lines were discarded. – tchrist Nov 25 '10 at 00:36
  • @drewrockshard I've edited the answer to open your file logfile.txt. – dmah Nov 25 '10 at 02:38
  • Thanks! I'll also give this one a go - I originally liked this one the best as it seemed the simplest of them all. – drewrockshard Nov 25 '10 at 05:03
  • @tchrist, you're right! Well, I just wanted to give a more functional approach... Seems I have to study Perl's functional side more :) – Diego Sevilla Nov 25 '10 at 09:44
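tchrist's objection can be checked with an in-memory file (a sketch using a trimmed copy of the question's sample data). Assigning to an array slice gives <$fh> list context, so it reads every remaining line at once; reading the header lines one at a time in scalar context leaves the data lines for the code that follows.

```perl
use strict;
use warnings;

my $data = join '', map { "Line $_\n" } 1 .. 5;
$data .= "filename2.tif;a\nfilename4.tif;b\n";

# Slice assignment puts <$fh> in list context: it reads ALL lines,
# keeps the first five in @dummy and silently drops the rest.
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
my @dummy;
@dummy[0 .. 4] = <$fh>;
my @left = <$fh>;
print scalar(@left), " lines left\n";    # 0 lines left
close $fh;

# Reading one line at a time (scalar context) skips just the header.
open $fh, '<', \$data or die "Cannot open in-memory data: $!";
for (1 .. 5) { my $header = <$fh> }
my @names = map { my ($name) = split /;/; $name } <$fh>;
close $fh;
print "$_\n" for @names;    # prints filename2.tif and filename4.tif
```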
#!/usr/bin/env perl
#
# name_of_program - what the program does as brief one-liner
#
# Your Name <your_email@your_host.TLA>
# Date program written/released
#################################################################

use 5.10.0;

use utf8;
use strict;
use autodie;
use warnings FATAL => "all";

#  ⚠ change to agree with your input: ↓
use open ":std" => IN    => ":encoding(ISO-8859-1)",
                   OUT   => ":utf8";
#  ⚠ change for your output: ↑ — *maybe*, but leaving as UTF-8 is sometimes better

END {close STDOUT}

our $VERSION = 1.0;

$| = 1;

if (@ARGV == 0 && -t STDIN) {
   warn "reading stdin from keyboard for want of file args or pipe";
}

while (<>) {
    next if 1 .. 5;
    my $initial_field = /^([^;]+)/ ? $1 : next;
    #    ╔═══════════════════════════╗
    #   ☞ your processing goes here ☜
    #    ╚═══════════════════════════╝
} continue {
    close ARGV if eof;
}

__END__
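The `next if 1 .. 5` line relies on the scalar-context range (flip-flop) operator: per perlop, a constant operand is compared with the current input line number $., so the test is true for lines 1 through 5 only. The `close ARGV if eof` in the continue block resets $. at the end of each input file, so the first five lines of every file are skipped, not just those of the first file. A small sketch of the same loop body, run on an in-memory handle instead of <>:

```perl
use strict;
use warnings;

my $data = join '', map { "Line $_\n" } 1 .. 5;
$data .= "filename2.tif;x\nfilename4.tif;y\n";
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

my @fields;
while (<$fh>) {
    # Constant operands of a scalar-context `..` are compared with $.,
    # so this is true for lines 1..5 and false from line 6 on.
    next if 1 .. 5;
    push @fields, $1 if /^([^;]+)/;   # same capture as the program above
}
close $fh;

print "$_\n" for @fields;   # prints filename2.tif and filename4.tif
```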
tchrist
  • Guys, I'm still lost; I'm *new to Perl*. I already have a file that contains everything. I just need to open the file in my script, skip the first 5 lines, and output the first column of every line after the 5th line. – drewrockshard Nov 25 '10 at 00:41
  • @drewrockshard: That’s what my program does. Try it out. – tchrist Nov 25 '10 at 00:42
  • Can you show an example on how to run this and where to place your "input file"? – drewrockshard Nov 25 '10 at 00:53
  • @drew: You run it like any other script. And you place your input file wherever you please; I don’t know its name. `perl this_program your_input_file` or `perl this_program < your_input_file` or `cat your_input_file | perl this_program` or `gzcat your_input_file.gz | perl this_program` or `wget -O - http://remote_url | perl this_program` or any one of infinitely many alternate formulations of the same ilk and effect. – tchrist Nov 25 '10 at 02:27
  • That's my point, and what I kept getting at: I didn't need a "script"; I needed code to put into my program that would read a file that already exists. All the examples you listed seem to involve running this code and piping my text file into it in some way, or vice versa. Jonathan's example was code that I could put into my own script, and I was able to specify my existing file. I just didn't know how to do that with your script. I do appreciate your help though. – drewrockshard Nov 25 '10 at 02:55
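For the use case in this last comment (code inside an existing program that reads a file whose name it already knows), one common Perl idiom is to localize @ARGV so that a <> loop like the one in tchrist's program reads that file. A sketch under that assumption; first_fields_from_file is a hypothetical name, and the demo writes the sample data to a temporary file first:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: runs a <> loop over one named file by
# temporarily replacing @ARGV, then returns the first fields.
sub first_fields_from_file {
    my ($path) = @_;
    die "Cannot read $path\n" unless -r $path;
    local @ARGV = ($path);          # <> now reads only this file
    my @fields;
    while (my $line = <>) {
        next if $. <= 5;            # skip the 5 header lines
        push @fields, $1 if $line =~ /^([^;]+)/;
    } continue {
        close ARGV if eof;          # reset $. for the next call
    }
    return @fields;
}

# Demo: write the sample data to a temporary file, then read it back.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "Line $_\n" for 1 .. 5;
print {$tmp} "filename$_.tif;Smpl/Pix & Bits/Smpl are missing.\n" for 2, 4, 6, 8;
close $tmp;

print "$_\n" for first_fields_from_file($path);
```

Because <> opens its files with two-argument open, file names beginning with characters such as | or < are treated specially; the <<>> operator added in Perl 5.22 uses three-argument open and avoids this.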