Here is my problem.
I have a huge file in GFF format that looks like this:
scaffold_31 AUGUSTUS CDS 18857 19210 0.63 + 0 transcript_id "g56.t1"; gene_id "g56";
scaffold_32 AUGUSTUS CDS 8973 9290 0.82 - 0 transcript_id "g57.t1"; gene_id "g57";
scaffold_32 AUGUSTUS CDS 11374 11507 0.96 - 2 transcript_id "g57.t1"; gene_id "g57";
scaffold_32 AUGUSTUS CDS 11586 11733 0.39 - 0 transcript_id "g57.t1"; gene_id "g57";
scaffold_33 AUGUSTUS CDS 5303 5323 0.83 - 0 transcript_id "g58.t1"; gene_id "g58";
scaffold_33 AUGUSTUS CDS 5810 6034 0.97 - 0 transcript_id "g58.t1"; gene_id "g58";
scaffold_34 AUGUSTUS CDS 1390 1805 0.87 + 1 transcript_id "g59.t1"; gene_id "g59";
scaffold_37 AUGUSTUS CDS 15299 15390 0.91 - 2 transcript_id "g60.t1"; gene_id "g60";
scaffold_37 AUGUSTUS CDS 15622 15826 0.88 - 0 transcript_id "g60.t1"; gene_id "g60";
and so on... I would like a command that separates the transcripts whose FIRST CDS has codon phase 0 (the 8th column) from those whose FIRST CDS has phase 1 or 2. I would like to end up with 3 files, which for the example above would be:
First file: transcripts whose first CDS is in phase 0:
scaffold_31 AUGUSTUS CDS 18857 19210 0.63 + 0 transcript_id "g56.t1"; gene_id "g56";
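scaffold_32 AUGUSTUS CDS 8973 9290 0.82 - 0 transcript_id "g57.t1"; gene_id "g57";
scaffold_32 AUGUSTUS CDS 11374 11507 0.96 - 2 transcript_id "g57.t1"; gene_id "g57";
scaffold_32 AUGUSTUS CDS 11586 11733 0.39 - 0 transcript_id "g57.t1"; gene_id "g57";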
scaffold_33 AUGUSTUS CDS 5303 5323 0.83 - 0 transcript_id "g58.t1"; gene_id "g58";
scaffold_33 AUGUSTUS CDS 5810 6034 0.97 - 0 transcript_id "g58.t1"; gene_id "g58";
Second file: transcripts whose first CDS is in phase 1:
scaffold_34 AUGUSTUS CDS 1390 1805 0.87 + 1 transcript_id "g59.t1"; gene_id "g59";
Third file: transcripts whose first CDS is in phase 2:
scaffold_37 AUGUSTUS CDS 15299 15390 0.91 - 2 transcript_id "g60.t1"; gene_id "g60";
scaffold_37 AUGUSTUS CDS 15622 15826 0.88 - 0 transcript_id "g60.t1"; gene_id "g60";
As you can see, because the first CDS of transcript "g60.t1" is in phase 2, all the following CDS belonging to that transcript have to go to the same file, even though the second one is in phase 0.
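To make the rule concrete, here is a minimal Python sketch (Python is used only for illustration; the question asks about awk) that finds the phase of each transcript's first CDS. It assumes the columns are whitespace-separated as in the sample, that the phase is the 8th field, that transcripts are identified by the `transcript_id "g60.t1";` style attribute written by AUGUSTUS, and that "first" means first in file order. `first_cds_phase` is a hypothetical helper name. Splitting with `maxsplit=8` keeps the attribute column, which itself contains spaces, in one piece.

```python
import re

# Matches the transcript_id "..." attribute as written in the sample lines.
TRANSCRIPT_RE = re.compile(r'transcript_id "([^"]+)"')


def first_cds_phase(lines):
    """Map each transcript_id to the phase of its first CDS line (file order)."""
    phases = {}
    for line in lines:
        # maxsplit=8 yields the 9 GFF columns, attributes kept as one string
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[2] != "CDS":
            continue  # skip blank lines, comments and non-CDS features
        match = TRANSCRIPT_RE.search(fields[8])
        if match is None:
            continue
        # setdefault only records the phase the first time a transcript is seen,
        # so later CDS lines of the same transcript cannot change its class
        phases.setdefault(match.group(1), fields[7])
    return phases


sample = [
    'scaffold_31 AUGUSTUS CDS 18857 19210 0.63 + 0 transcript_id "g56.t1"; gene_id "g56";',
    'scaffold_32 AUGUSTUS CDS 8973 9290 0.82 - 0 transcript_id "g57.t1"; gene_id "g57";',
    'scaffold_32 AUGUSTUS CDS 11374 11507 0.96 - 2 transcript_id "g57.t1"; gene_id "g57";',
    'scaffold_32 AUGUSTUS CDS 11586 11733 0.39 - 0 transcript_id "g57.t1"; gene_id "g57";',
    'scaffold_33 AUGUSTUS CDS 5303 5323 0.83 - 0 transcript_id "g58.t1"; gene_id "g58";',
    'scaffold_33 AUGUSTUS CDS 5810 6034 0.97 - 0 transcript_id "g58.t1"; gene_id "g58";',
    'scaffold_34 AUGUSTUS CDS 1390 1805 0.87 + 1 transcript_id "g59.t1"; gene_id "g59";',
    'scaffold_37 AUGUSTUS CDS 15299 15390 0.91 - 2 transcript_id "g60.t1"; gene_id "g60";',
    'scaffold_37 AUGUSTUS CDS 15622 15826 0.88 - 0 transcript_id "g60.t1"; gene_id "g60";',
]

print(first_cds_phase(sample))
# → {'g56.t1': '0', 'g57.t1': '0', 'g58.t1': '0', 'g59.t1': '1', 'g60.t1': '2'}
```

"First in file order" matches the expected output above: g60.t1 is on the minus strand and is classified by its line at 15299, which is in phase 2.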
Thanks for your help, I hope someone will find a solution :) I thought awk might help?
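Since the file is huge, a single streaming pass is probably preferable to loading it into memory. The sketch below (Python; the function name `split_by_first_phase` and the output file names are hypothetical, chosen only for illustration) remembers the phase of the first CDS seen for each transcript_id and writes every CDS line of that transcript to the matching output file, so only the transcript-to-phase table is held in memory. It assumes, as in the sample, whitespace-separated columns with the phase in the 8th column, and that the first line of each transcript in the file is the one that decides its class. This is the same one-pass idea an awk associative array keyed on transcript_id would implement.

```python
import re

# Matches the transcript_id "..." attribute as written by AUGUSTUS in the sample.
TRANSCRIPT_RE = re.compile(r'transcript_id "([^"]+)"')


def split_by_first_phase(in_path, out_paths):
    """Stream a GFF file once, sending each CDS line to the output file
    selected by the phase of its transcript's first CDS (file order).

    out_paths maps a phase string ("0", "1", "2") to an output path.
    """
    first_phase = {}  # transcript_id -> phase of its first CDS
    outs = {phase: open(path, "w") for phase, path in out_paths.items()}
    try:
        with open(in_path) as gff:
            for line in gff:
                # maxsplit=8 keeps the space-containing attribute column intact
                fields = line.split(None, 8)
                if len(fields) < 9 or fields[2] != "CDS":
                    continue  # skip blank lines, comments and non-CDS features
                match = TRANSCRIPT_RE.search(fields[8])
                if match is None:
                    continue
                # the first CDS seen for a transcript fixes its phase class
                phase = first_phase.setdefault(match.group(1), fields[7])
                if phase in outs:
                    outs[phase].write(line)
    finally:
        for handle in outs.values():
            handle.close()
```

A call such as `split_by_first_phase("input.gff", {"0": "phase0.gff", "1": "phase1.gff", "2": "phase2.gff"})` would produce the three files; transcripts whose first CDS has any other phase value (for example `.`) are not written anywhere.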