Splitting a file based on a criterion

Question

I have a file with the data below:

.domain bag
.set bag1
bag1
abc1
.set bag2
bag2
abc2
.domain cat
.set bag1:cat
bag1:cat
abc1:cat
.set bag2:cat
bag2:cat
abc2:cat

I want to split this file into two (bag1.txt and bag2.txt) based on the set value.

bag1.txt should look like this:

.domain bag
.set bag1
bag1
abc1
.domain cat
.set bag1:cat
bag1:cat
abc1:cat

bag2.txt should look like this:

.domain bag
.set bag2
bag2
abc2
.domain cat
.set bag2:cat
bag2:cat
abc2:cat

The .domain lines are common to both files.

I tried the command below, but it does not work.

nawk '{if($0~/.set/){split($2,a,":");filename=a[1]".text"}if(filename=".text"){print|"tee *.text"}else{print >filename}}' file.txt

Birei · Accepted Answer · 2012-08-08T13:40:28.677

One way:

awk '
    BEGIN {
        ## Split fields with spaces and colon.
        FS = "[ :]+";

        ## Extension of output files.
        ext = ".txt";
    }

    ## Write lines that begin with ".domain" to all known output files (saved
    ## in "processed_bags"). Also save them in the "domain" array to copy them
    ## later to all files not processed yet.
    $1 == ".domain" {

        for ( b in processed_bags ) {
            print $0 >> sprintf( "%s%s", b, ext );
        }

        domain[ i++ ] = $0;

        next;
    }

    ## Select output file to write. If not found previously, copy all
    ## domains saved until now.
    $1 == ".set" {
        bag = $2;
        if ( ! (bag in processed_bags) ) {
            for ( j = 0; j < i; j++ ) {
                print domain[j] >> sprintf( "%s%s", bag, ext );
            }
            processed_bags[ bag ] = 1;
        }
    }

    ## Every remaining line. ".domain" lines never get here because of the
    ## "next" above, but the ".set" line itself does. Copy the line to the
    ## file of the bag saved in the "bag" variable.
    bag {
        print $0 >> sprintf( "%s%s", bag, ext );
    }
' file.txt

Run the following command to check the output:

head bag[12].txt

Output:

==> bag1.txt <==
.domain bag
.set bag1
bag1
abc1
.domain cat
.set bag1:cat
bag1:cat
abc1:cat

==> bag2.txt <==
.domain bag
.set bag2
bag2
abc2
.domain cat
.set bag2:cat
bag2:cat
abc2:cat

This works. But can the handling of the common lines be generalised for many bags, say bag1 to bag1000? The actual file I have contains bags from bag1 to bag1000. Instead of `print >> bag1`, can we simply do `print > *.txt`? (Many empty files, bag1.txt to bag1000.txt, are already present in the directory.) - Vijay, Aug 08 '12 at 12:06
@peter: I've edited the answer to generalise it. It's fully commented, so you can check whether it fits your needs; I don't understand what you mean by `print >> *.txt`. - Birei, Aug 08 '12 at 13:41
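
With around a thousand bags, as in the comment above, the number of output files open at the same time may matter: some awk implementations cap it, while others work around the limit internally. The following is a minimal sketch, not part of the original answer, of one way to sidestep the issue: close each file right after writing to it. Because the redirection is `>>`, reopening a file appends to it instead of truncating it. The function name `emit`, the array name `seen`, the counter `n` and the scratch directory are illustrative assumptions; the sample input is the one from the question.

```shell
#!/bin/sh
# Work in a scratch directory so that no existing bag*.txt file is touched
# (assumption: mktemp -d is available, as on most Linux and BSD systems).
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample input from the question.
cat > file.txt <<'EOF'
.domain bag
.set bag1
bag1
abc1
.set bag2
bag2
abc2
.domain cat
.set bag1:cat
bag1:cat
abc1:cat
.set bag2:cat
bag2:cat
abc2:cat
EOF

awk '
    BEGIN { FS = "[ :]+"; ext = ".txt" }

    # Append one line to the file of the given bag, then close the file
    # again, so that only one output file is open at any time.
    function emit(line, name,    out) {
        out = name ext
        print line >> out
        close(out)
    }

    # A ".domain" line goes to every bag seen so far and is remembered
    # for bags that show up later.
    $1 == ".domain" {
        for (b in seen) emit($0, b)
        domain[n++] = $0
        next
    }

    # A ".set" line selects the current bag. A bag seen for the first
    # time receives all ".domain" lines stored so far.
    $1 == ".set" {
        bag = $2
        if (!(bag in seen)) {
            for (j = 0; j < n; j++) emit(domain[j], bag)
            seen[bag] = 1
        }
    }

    # Every remaining line, the ".set" line included, goes to the
    # current bag.
    bag != "" { emit($0, bag) }
' file.txt

head bag[12].txt
```

Opening and closing a file for every line is slow on large inputs, so this trades speed for a bounded number of open files. As with the original program, `>>` means that rerunning it in a directory that already holds non-empty bag files appends to them.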
