Combining summary statistics from multiple input files in Bash

Question

I want to generate some summary statistics for "Mary" based on data in multiple files.

input1.txt looks like

Jose 88518 95 75 95 62 100 78 68 
Alex 97502 84 79 80 73 88 95 79 85 93 
Mary 98765 80 75 100 51 83 75 99 50 75 89 94
...

input2.txt looks like

Jack 32954 100 98 95 100 93 100 99 98 100 100
Mary 98765 85 83 96 77 81 84 98 75 87
Lisa 83746 100 100 100 100 99 100 98 100 100 100
...

Running the following one-liner in Bash on input1.txt:

awk '/Mary/{for(n=3;n<=NF;n++) print $n}' input1.txt | Rscript -e 'summary (as.numeric (readLines ("stdin")))'

The results are:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  50.00   75.00   80.00   79.18   91.50  100.00

Running the following code for input2.txt:

awk '/Mary/{for(n=3;n<=NF;n++) print $n}' input2.txt | Rscript -e 'summary (as.numeric (readLines ("stdin")))'

The results are:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 75.00   81.00   84.00   85.11   87.00   98.00

How can I write a one-liner solution to combine "Mary"'s stats from each data file into one report that results in something similar to the following?

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.   
 50.00   75.00   80.00   79.18   91.50  100.00
 75.00   81.00   84.00   85.11   87.00   98.00

This question is being discussed on [meta](https://meta.stackoverflow.com/questions/411644). — cigien, Sep 21 '21 at 02:50

Reynaldo Aceves · Answer 1 · 2021-10-02T01:17:53.980

0

I think you need to use a bash for loop.

for file in input*.txt; do awk '/Mary/{for(n=3;n<=NF;n++) print $n}' "$file" | Rscript -e 'summary (as.numeric (readLines ("stdin")))'; done

You will probably end up with the header (Min. 1st Qu. Median Mean 3rd Qu. Max.) printed twice now, but since we cannot see how the headers are generated, it is hard to suggest a fix for that.
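One possible way to get a single header is to filter the loop's output and drop every repeat of the header line. The sketch below assumes the header is the only kind of line containing "Min." (the numeric rows never do); the function name dedupe_header is made up for illustration and is not part of Bash, awk, or R.

```shell
# Hypothetical helper: keep the first line (the header) and drop any
# later line that contains "Min.", i.e. a repeated header.
dedupe_header() {
  awk 'NR == 1 || !/Min\./'
}

# Intended use (needs Rscript and the input files from the question):
# for file in input*.txt; do
#   awk '/Mary/{for(n=3;n<=NF;n++) print $n}' "$file" |
#     Rscript -e 'summary (as.numeric (readLines ("stdin")))'
# done | dedupe_header
```

Piping the whole loop into the filter, rather than each Rscript call, is what lets it see both headers and keep only the first one.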

edited Oct 02 '21 at 01:17

answered Sep 29 '21 at 05:15


This won't work. I believe this solution produces one row of stats after combining Mary's numbers from both files. The expected result is two rows of stats, where each row gives the stats from Mary's data in one input file. – LockhartTech Sep 30 '21 at 14:53
