2

I have the following file

CHO 1
4096
26 20 0 0 0 0 0 0 0 0 
0 0 0 0 0 3 5 15 8 14 
9 7 13 10 12 9 5 3 3 2 
2 0 0 0 0 0 0 1 1 0 
0 0 0 0 0 0 0 0 0 0 
0 0 0 0 1 0 1 0 0 0 
0 0 0 0 0 0 1 0 0 0 
0 0 0 0 1 0 0 0 0 0
6 8 5 5 7 13 13 33 23 29 
44 51 56 42 39 31 21 24 18 18 
18 30 44 43 51 67 102 110 130 130 
100 96 87 49 25 16 4 1 1 0
0 0 0 0 0 0

What I want to do is put all entries after 4096 in one column, numbered from 1. The desired output is the following:

1 26
2 20
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
...
4096 0

I don't have a clue on how to do it using awk. I tried for instance to put them in one line using

awk -F'\n' '{if(NR == 1) {printf $0} else {printf $0}}' file

but I don't know how to get them into one column, let alone that the first entries are not what I expected:

CHO 1409626 20 0 0 0 0 0 0 0 0 0 0 0 0 0 3 5

Any idea on how to get the desired two column output? Any help is more than welcome!!!

toolic
Thanos
  • You can put them all in the same line and loop through the string. – fedorqui Feb 21 '14 at 16:09
  • @fedorqui: Thanks for your answer. I have already tried to put them in one line but I don't know how to make this line a column. – Thanos Feb 21 '14 at 16:19
  • Do you want the number of entries indicated by the second line paired to zero where no data exists? Or just to sequence the data that exists? – potong Feb 22 '14 at 11:41
  • @potong : I want after the 2nd line(i.e. starting from 3rd) to put all the data in one column while having a new column that will "count" the number of entries starting from 1. – Thanos Feb 25 '14 at 09:52
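To make the clarified requirement concrete (skip the two header lines, then print every value on its own line next to a counter that starts at 1), here is a minimal sketch in plain awk. The sample file is a shortened, made-up version of the one in the question, and `flatten` and `n` are names chosen only for illustration.

```shell
# Hypothetical sample with the same layout as the question's file, shortened.
sample=$(mktemp)
printf '%s\n' 'CHO 1' '4096' '26 20 0 0 ' '0 3 5' > "$sample"

# Skip the two header lines, then print one "index value" pair per field.
# ++n increments before printing, so the index starts at 1.
flatten() {
  awk 'NR > 2 { for (i = 1; i <= NF; i++) printf "%d %s\n", ++n, $i }' "$1"
}

flatten "$sample"
```

The original attempt glued everything together because `printf $0` never emits a newline, and `-F'\n'` turns each line into a single field, so there was nothing to loop over. With the default field separator, trailing spaces at the end of a line do not create empty fields.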

6 Answers

4

Using Perl, it could be done with an adaptation of this:

#!/usr/bin/perl

use strict;
use warnings;

my @lines = ('CHO 1', '4096', #simulate line-by-line loading of the file
'26 20 0 0 0 0 0 0 0 0',
'0 0 0 0 0 3 5 15 8 14', 
'9 7 13 10 12 9 5 3 3 2', 
'2 0 0 0 0 0 0 1 1 0', 
'0 0 0 0 0 0 0 0 0 0', 
'0 0 0 0 1 0 1 0 0 0', 
'0 0 0 0 0 0 1 0 0 0', 
'0 0 0 0 1 0 0 0 0 0',
'6 8 5 5 7 13 13 33 23 29', 
'44 51 56 42 39 31 21 24 18 18', 
'18 30 44 43 51 67 102 110 130 130', 
'100 96 87 49 25 16 4 1 1 0',
'0 0 0 0 0 0');


my $first_line = shift @lines; #removes CHO 1
my $stop = shift @lines; #removes 4096 
my $i = 0;


foreach my $line (@lines) {
  $line =~ s/^\s*//;
  $line =~ s/\s*$//;

  my @parts = split(/\s+/, $line);
  foreach my $part (@parts) {
    print "$i $part\n"; #prints to stdout, maybe you want to print into a file
    $i++;
  }

}

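and this is the output (note that the counter starts at 0; initialising `$i` to 1 instead would give the 1-based numbering asked for in the question):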

0 26
1 20
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 0
15 3
16 5
 ...
125 0
 ...
Filippo Lauria
3

This will do the trick:

$ awk 'NR>2{$1=$1;print}' OFS='\n' file 
Zombo
Chris Seymour
  • Thank you very much for your answer! How can I add a column starting from 1 and ending to 4096? – Thanos Feb 21 '14 at 16:41
  • I didn't think you actually wanted the line counts; you can use `nl` for that: `awk 'NR>2{$1=$1;print}' OFS='\n' file | nl -n ln` – Chris Seymour Feb 21 '14 at 16:45
  • +1 Not sure why would anyone un-upvote this cheeky looking one-liner. I would use this any day over the answer at the top. – jaypal singh Feb 24 '14 at 22:07
2

The OP's request is to put all entries after 4096 in one column. The other solutions just assume that 4096 is record number 2. This GNU awk command should take care of that, and of the problem with spaces at the end of the line:

awk 'f{print ++x,$1} /4096/{f=1}' RS=" | *\n" file

PS: you need GNU awk because RS contains more than one character.
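If GNU awk is not available, the same "start after the marker" idea can be sketched with the default record separator. This is only an illustration under a stated assumption: the marker is taken to be a line whose only field equals 4096, which is stricter than the `/4096/` match above. The names `after_marker`, `marker`, `found` and `n` are made up for the example.

```shell
# Hypothetical sample; the last line contains 4096 as ordinary data.
sample=$(mktemp)
printf '%s\n' 'CHO 1' '4096' '26 20 0 ' '4096 7' > "$sample"

# Once the marker line has been seen, print every field with a running index.
# The marker is only recognised on a line that holds exactly one field.
after_marker() {
  awk -v marker=4096 '
    found { for (i = 1; i <= NF; i++) print ++n, $i; next }
    NF == 1 && $1 == marker { found = 1 }
  ' "$1"
}

after_marker "$sample"
```

Because the marker rule comes second, the marker line itself is never printed, and a later data value of 4096 is treated like any other entry.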

Jotne
2

This can be done with GNU awk, which can use a regex as the record separator (RS):

gawk -v RS="[[:space:]]+" 'NR > 3 { print NR-3, $0 }' file
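Where a regex RS is not available, a roughly equivalent pipeline can be sketched with `tail`, `tr` and awk. This is a different technique from the one above, not a drop-in replacement: it assumes the values are separated by spaces only (no tabs), and `flatten_tr` is a name made up for the example.

```shell
# Hypothetical sample with a trailing space on the first data line.
sample=$(mktemp)
printf '%s\n' 'CHO 1' '4096' '26 20 0 ' '0 3' > "$sample"

# tail drops the two header lines, tr turns every space into a newline and
# squeezes runs of newlines into one, and awk prefixes the line number.
flatten_tr() {
  tail -n +3 "$1" | tr -s ' ' '\n' | awk '{ print NR, $0 }'
}

flatten_tr "$sample"
```

The `-s` option is what absorbs the trailing spaces: the space before the line break becomes a second newline, and the pair is squeezed back into one.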
1

Here is another way with awk:

awk 'NR>2{for(x=1;x<=NF;x++) print y++,$x}' file

Test:

$ cat file
CHO 1
4096
26 20 0 0 0 0 0 0 0 0
0 0 0 0 0 3 5 15 8 14
9 7 13 10 12 9 5 3 3 2
2 0 0 0 0 0 0 1 1 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 1 0 0 0
0 0 0 0 0 0 1 0 0 0
0 0 0 0 1 0 0 0 0 0
6 8 5 5 7 13 13 33 23 29
44 51 56 42 39 31 21 24 18 18
18 30 44 43 51 67 102 110 130 130
100 96 87 49 25 16 4 1 1 0
0 0 0 0 0 0

$ awk 'NR>2{for(x=1;x<=NF;x++) print y++,$x}' file
0 26
1 20
2 0
3 0
4 0
5 0
6 0
7 0
---
---
122 0
123 0
124 0
125 0
jaypal singh
1

This might work for you (GNU sed):

sed -r '1d;2{s/.*/seq -s: &/e;s/$/:/;h;d};G;:a;/:/!d;/^\s*\n/{s///;h;$!d;x;s/:/ 0\n/g;s/.$//p;d};s/^(\S+)\s*([^\n]*\n)([^:]*):/\3 \1\n\2/;P;s/[^\n]*\n//;ba' file

This removes the first line. It then stores a sequence of the numbers from 1 to the number held in the second line in the hold space, and removes the second line. For each following line, it pairs the first number on the line with the first number in the hold space, adds a newline, prints the pairing, and repeats. When the last number of the last line has been matched, any sequence numbers left are paired with zero.
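The padding behaviour described above can also be sketched in awk, which may be easier to follow than the sed program. This is an illustration, not a translation of the sed command: `pad_to_count`, `total` and `n` are made-up names, and stopping once the count from line 2 is reached is an assumption on my part, since the description only covers the case where the data runs out first.

```shell
# Hypothetical sample: line 2 announces 6 entries, but only 4 values follow.
sample=$(mktemp)
printf '%s\n' 'CHO 1' '6' '26 20 0 ' '3' > "$sample"

# Read the expected count from line 2, number the values that exist,
# then pair every remaining index with 0.
pad_to_count() {
  awk '
    NR == 2 { total = $1 }
    NR > 2  { for (i = 1; i <= NF && n < total; i++) print ++n, $i }
    END     { while (n < total) print ++n, 0 }
  ' "$1"
}

pad_to_count "$sample"
```

With the question's file, the count on line 2 is 4096, so the 126 values present would be followed by zero-filled rows up to index 4096.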

potong
  • This gives correct output for the given data, but it does not stop before the counter ends up with 4096 lines. It does not start after the record with `4096`, just after column number `2` – Jotne Feb 22 '14 at 09:52
  • @Jotne I don't understand your comment. Do you mean the process should cease once the data has been exhausted even though the second line indicates 4096 lines? I have however found a bug in the solution concerning the end-of-file condition so will delete it for the time being. – potong Feb 22 '14 at 10:23