
I have a master file that contains the contents of 3696 files. Each embedded file has the same structure: it starts with a line containing the file name in quotation marks and ends with a line containing a single dot. Nothing else is repeated in the files. Is there any way to break the master file down into these smaller files? For instance, suppose the master file contains the following two files:

    "features/mmjr0_si2166.rec"
    0 1800000 L104 -755.825928
    1800000 2600000 L25 -397.663269
    2600000 3600000 L6 -419.864960
    3600000 3800000 L98 -116.326584
    3800000 4500000 L104 -315.009827
    4500000 5500000 L93 -447.467133
    5500000 6300000 L12 -352.010101
    6300000 7600000 L45 -556.794006
    7600000 7900000 L8 -175.087677
    .
    "features/mesd0_si1002.rec"
    0 1300000 L104 -530.985107
    1300000 1700000 L13 -207.014145
    1700000 2300000 L47 -303.084534
    2300000 2900000 L104 -300.312927
    2900000 3200000 L96 -151.823212
    3200000 3700000 L46 -235.867447
    3700000 4000000 L49 -170.302170
    4000000 5200000 L97 -517.739868
    5200000 6200000 L28 -453.094452
    .

I want them separated, with both stored in the directory "features": the first in a file named mmjr0_si2166.rec and the second in a file named mesd0_si1002.rec.

Manuel Allenspach

2 Answers


There are probably more compact ways to write it in Perl, but this has the merit of working first time:

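    #!/usr/bin/env perl
    use strict;
    use warnings;

    my $fh = undef;    # handle of the output file currently being written

    while (<>)
    {
        if (/^\s*"([^"]+)"\s*$/)
        {
            # A quoted file name starts a new output file
            my $new_file = $1;
            close $fh if (defined $fh);
            open $fh, ">", $new_file or die "Failed to open $new_file: $!";
        }
        elsif (/^\s*\.\s*$/)
        {
            # Ignore lines with a dot only
            next;
        }
        elsif (defined $fh)
        {
            print $fh $_;
        }
        else
        {
            # Data seen before any file name has nowhere to go
            warn "Skipping line $.: no file name seen yet\n";
        }
    }
    close $fh if (defined $fh);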

It omits the file-name line and the dot line from the generated files; the changes needed to include them are trivial. It doesn't object if it comes across a dot line that is not followed by a file-name line. It assumes that the directory (or directories) for the files already exist; if that's a problem, you can use a module such as File::Path to create the directories before opening the files. It allows white space before and after the quotes around file names, and around the dot on the terminator lines. You can tweak the regexes if that's inappropriate.
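If the directories don't exist yet, one alternative to doing it in Perl is a small shell pre-pass that creates every directory named in a header line before the script runs. This is a minimal sketch, assuming header lines look like the ones in the question (a quoted relative path on a line of its own); make_output_dirs is a hypothetical helper name, not part of the answer above.

```shell
#!/bin/bash
# make_output_dirs (hypothetical helper): create every directory named in the
# quoted file-name lines of a master file, e.g. "features/mmjr0_si2166.rec".
make_output_dirs() {
    local master=$1
    # Print the directory part of each quoted path that contains a slash;
    # bare file names need no directory and produce no output.
    sed -n 's|^[[:space:]]*"\(.*\)/[^/"]*"[[:space:]]*$|\1|p' "$master" |
        sort -u |
        while IFS= read -r dir; do
            # An absolute path such as "/x.rec" leaves an empty directory part
            if [ -n "$dir" ]; then
                mkdir -p -- "$dir"
            fi
        done
}
```

For example, run make_output_dirs master followed by perl split.pl master, where master and split.pl stand in for your master file and the Perl script above.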

Jonathan Leffler

awk example

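    #!/bin/bash
    # Create the output directory if it does not exist yet
    mkdir -p features
    # Strip the double quotes, start a new output file at each file-name
    # line, and skip the "." lines that end each embedded file
    tr -d '"' < bigfile |
      awk '/features/          { if (file != "") close(file); file = $1; next }
           /^[ \t]*\.[ \t]*$/ { next }
           file != ""          { print > file }'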
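The awk version above only creates the features directory. If header lines may name other directories, a variant can create each one on demand. This is a sketch, assuming the file-name and dot lines look like the ones in the question; split_master is a hypothetical name, and the awk program strips the quotes itself instead of using tr.

```shell
#!/bin/bash
# split_master (hypothetical helper): split a master file into its embedded
# files, creating each output directory as it is first needed.
split_master() {
    awk '
        # A file-name line: a quoted path, optionally padded with blanks
        /^[ \t]*"[^"]+"[ \t]*$/ {
            if (file != "") close(file)
            file = $0
            gsub(/^[ \t]*"|"[ \t]*$/, "", file)
            # Directory part of the path (sub fails if there is no slash)
            dir = file
            if (sub("/[^/]*$", "", dir) && dir != "")
                system("mkdir -p \"" dir "\"")
            next
        }
        # The line holding a single dot that ends each embedded file
        /^[ \t]*\.[ \t]*$/ { next }
        # Any other line is data for the current output file
        file != "" { print > file }
    ' "$1"
}
```

Run it as split_master bigfile from the directory that should contain features. Since the question says no file name is repeated, the number of files written should equal the number of header lines, which grep -c '^"' bigfile reports.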
jim mcnamara