
I have 20 files, with each file containing 19 columns and 3000 rows.

Now I want to sum over files 1~4, keeping the first column intact (the first column is the same in all files) but summing columns 2 to 19 across these four files, i.e. sum columns 2, 3, ..., 19 of files 1, 2, 3, 4 over the 3000 rows.

I have files DOS1 DOS2 ... DOS20.

How can I do this simply?

I found that a command like this works:

pr -m -t -s\  test1 test2 test3 | gawk '{print $1+$5+$9,$2+$6+$10,$3+$7+$11,$4+$8+$12}' > test4.dat

But I have 19 columns to add, and writing them all out explicitly is not neat. The test files have only 3 columns.

Thank you!

Lei Zhang
  • Wait, I realized I might be misunderstanding after adding an answer. Do you want a total sum, or each column separately? – Andras Deak -- Слава Україні Jun 28 '16 at 21:47
  • 2
    Please clarify your question. I can't make head or tail of it yet. What's the `pr` command for? It seems to be merging the 3 files, so combining line 1 from each of test1, test2, test3 into a single output line, then the same for each other line? You say 'keep first column intact' but then show code adding `$1 + $5 + $9`, which is confusing. Please show some sample input data (3 lines from each of test1, test2, test3, perhaps?) and the desired output. Please read about how to create an MCVE ([MCVE]). – Jonathan Leffler Jun 28 '16 at 22:34

1 Answer


If you "only" have 3000 rows, you can keep everything in memory:

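awk '
  # Remember the first column the first time each row number is seen.
  !((FNR,1) in d) { d[FNR,1] = $1 }
  # Add columns 2..NF of this row to the running per-row, per-column sums.
  { for (c=2;c<=NF;++c) d[FNR,c] += $c }
  END { for (r=1;(r,1) in d;++r) {
          printf "%s", d[r,1];
          for(c=2;(r,c) in d;++c)
            printf " %f", d[r,c];
          printf "\n";
        }
      }
  ' DOS{1..4}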

This awk program will aggregate all the files you list on the command-line. It assumes that the first column of each row is the same in all files, but it lets some files be longer than others (because I was too lazy to check that they are all the same length).
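
To make the behaviour concrete, here is a small self-contained sketch of the same idea run on made-up data. The two tiny input files, the temporary directory, and the `sum_columns` wrapper function are all hypothetical and only there for illustration. It is also a slight variation on the program above rather than a copy of it: it tracks the largest row and column counts explicitly and prints with `%g` instead of `%f`.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: two tiny files that share the same first column.
tmp=$(mktemp -d)
printf '0.0 1 2\n0.5 3 4\n'     > "$tmp/DOS1"
printf '0.0 10 20\n0.5 30 40\n' > "$tmp/DOS2"

# sum_columns is an illustrative helper name, not an existing command.
# It keeps column 1 and sums columns 2..NF row by row across all files given.
sum_columns() {
  awk '
    {
      key[FNR] = $1                         # first column is kept as-is
      for (c = 2; c <= NF; ++c)
        sum[FNR, c] += $c                   # per-row, per-column running sum
      if (NF > ncols)  ncols = NF           # widest row seen so far
      if (FNR > nrows) nrows = FNR          # longest file seen so far
    }
    END {
      for (r = 1; r <= nrows; ++r) {
        printf "%s", key[r]
        for (c = 2; c <= ncols; ++c)
          printf " %g", sum[r, c]
        printf "\n"
      }
    }' "$@"
}

sum_columns "$tmp/DOS1" "$tmp/DOS2"
# prints:
# 0.0 11 22
# 0.5 33 44
```

With the files from the question, the same function could presumably be called as `sum_columns DOS{1..4} > sum.dat` in bash (the output name `sum.dat` is just a placeholder).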

rici
  • 1
    +1 but could be more concise: `awk '{t[FNR]=$1; for (c=2;c<=NF;c++) d[FNR,c] += $c} END{ for (r=1;r<=FNR;r++) {printf "%s%s", t[r], OFS; for (c=2;c<=NF;c++) printf "%d%s", d[r,c], (c – Ed Morton Jul 10 '16 at 17:47