-2

I have a file (let say file B) like this:

File B:

A1  3  5  
A1  7  9  
A2  2  5  
A3  1  3    

The first column defines a filename and the other two define a range in that specific file. In the same directory, I have also three more files (File A1, A2 and A3). Here is 10 sample lines from each file:

File A1:

1   0.6  
2   0.04 
3   0.4  
4   0.5  
5   0.009
6   0.2  
7   0.3  
8   0.2  
9   0.15 
10  0.1  

File A2:

1  0.2  
2  0.1
3  0.2
4  0.4
5  0.2
6  0.3
7  0.8
8  0.1
9  0.9
10 0.4

File A3:

1  0.1
2  0.2
3  0.5
4  0.3
5  0.7
6  0.3
7  0.3
8  0.2
9  0.8
10 0.1

I need to add a new column to file B, which in each line gives the sum of values of column two in the defined range and file. For example, file B row 1 means that calculate the sum of values of line 3 to 5 in column two of file A1. The desired output is something like this:

File B:

A1  3  5  0.909    
A1  7  9  0.65  
A2  2  5  0.9  
A3  1  3  0.8     

All files are in tabular text format. How can I perform this task? I have access to bash (ubuntu 14.04) and R but not an expert bash or R programmer. Any help would be greatly appreciated. Thanks in advance

Tom Fenech
  • 72,334
  • 12
  • 107
  • 141

1 Answers1

1

Given the first file fileB and 3 input files A1, A2 and A3, each with two columns, this produces the output that you want:

#!/bin/bash

while read -r file start end; do
    sum=$(awk -vs="$start" -ve="$end" 'NR==s,NR==e{sum+=$2}END{print sum}' "$file")
    echo "$file $start $end $sum"
done < fileB

This uses awk to sum up the values between lines between the range specified by the variables s and e. It's not particularly efficient, as it loops through $file once per line of fileB but depending on the size of your inputs, that may not be a problem.

Output:

A1 3 5 0.909
A1 7 9 0.65
A2 2 5 0.9
A3 1 3 0.8

To redirect the output to a file, just add > output_file to the end of the loop. To overwrite the original file, you will need to first write to a temporary file, then overwrite the original file (e.g. > tmp && mv tmp fileB).

Tom Fenech
  • 72,334
  • 12
  • 107
  • 141