I have a table with five columns. I want to merge the start and end columns of rows whose ranges overlap and that share the same RNAiclone and target_mRNA. For example, if the start-end ranges of two entries are (A) 1-10 and 11-20, they count as overlapping (adjacent ranges should be merged), whereas (B) 1-10 and 12-20 do not overlap. RNAilength(nt) is the same for all rows with the same RNAiclone.
input.txt
RNAiclone RNAilength(nt) target_mRNA start end
siRNA1 10 mRNA1 1 10
siRNA1 10 mRNA1 11 20
siRNA1 10 mRNA1 17 30
siRNA1 10 mRNA2 18 19
siRNA2 20 mRNA2 1 10
siRNA2 20 mRNA2 9 100
expected output.txt
RNAiclone RNAilength(nt) target_mRNA start end
siRNA1 10 mRNA1 1 30
siRNA1 10 mRNA2 18 19
siRNA2 20 mRNA2 1 100
program.awk
BEGIN{
    i=0;
    s="";
    m="";
    OFS="\t";
}
{
    if (s!=$1 && m!=$3){
        if (s != "" && m!= ""){
            combine(chr,s,m,i);
        }
        i=0;
        s="";
    }
    s=$1;
    m=$3;
    chr[i,0]=$4;
    chr[i,1]=$5;
    i++
}
END{
    combine(chr,s,m,i);
}
function combine(arr,s,m,i) {
    j=0;
    new[j,0]=arr[0,0];
    new[j,1]=arr[0,1];
    for (k=1;k<i;k++)
    {
        if ((arr[k,0]<=new[j,1])&&(arr[k,1]>=new[j,1])){
            new[j,1]=arr[k,1];
        }
        else if (arr[k,0]>new[j,1]){
            j++;
            new[j,0]=arr[k,0];
            new[j,1]=arr[k,1];
        }
    }
    for (n=0;n<=j;n++){
        print s,m,new[n,0],new[n,1]
    }
}
I am running the script with the command "awk -f program.awk input.txt > output.txt", but I am not getting the expected result. Could you kindly help me correct the script? Thank you very much.
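For reference, here is a minimal sketch of one way the merge described above could be written. It is not a definitive fix. It assumes that the first line is a header, that the rows are already sorted by RNAiclone, target_mRNA and start, and that a range starting at most one position past the previous end should be merged (as in example A). The file name merge.awk and the variable names are hypothetical.

```shell
# Recreate the sample input from the question.
cat > input.txt <<'EOF'
RNAiclone RNAilength(nt) target_mRNA start end
siRNA1 10 mRNA1 1 10
siRNA1 10 mRNA1 11 20
siRNA1 10 mRNA1 17 30
siRNA1 10 mRNA2 18 19
siRNA2 20 mRNA2 1 10
siRNA2 20 mRNA2 9 100
EOF

cat > merge.awk <<'EOF'
# Assumption: rows are sorted by RNAiclone, target_mRNA, then start.
BEGIN { OFS = "\t" }

# Pass the header line through, re-joined with OFS.
NR == 1 { $1 = $1; print; next }

{
    # Same clone and mRNA, and the new start is at most one past the
    # current end (so 1-10 and 11-20 merge): extend the open range.
    if (have && $1 == clone && $3 == mrna && $4 + 0 <= hi + 1) {
        if ($5 + 0 > hi) hi = $5 + 0
        next
    }
    # Otherwise print the open range, if any, and start a new one.
    if (have) print clone, len, mrna, lo, hi
    clone = $1; len = $2; mrna = $3
    lo = $4 + 0; hi = $5 + 0; have = 1
}

# Print the last open range.
END { if (have) print clone, len, mrna, lo, hi }
EOF

awk -f merge.awk input.txt > output.txt
cat output.txt
```

Compared with program.awk, this sketch skips the header, starts a new group when either RNAiclone or target_mRNA changes, treats adjacent ranges as overlapping, and keeps the RNAilength(nt) column in the output. If the input is not sorted, it would need to be sorted first.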