
Given the following three-column, tab-delimited file, where the second and third columns of each row give the start and end of a range for the match named in the first column, what is the simplest shell script or command that identifies all overlapping ranges for each match and reports the minimum and maximum value of each merged overlap? Within a row, the smaller value always comes first. However, the rows for a given match are not necessarily sorted.

infile.txt:

match1 857 1107
match1 879 1128
match1 969 1126
match1 865 1115
match1 1296 1546
match1 1304 1554
match1 1318 1600
match1 1408 1562
match2 300 1100
match2 639 1225
match2 4299 6546
match2 5304 7754

outfile.txt:

match1 857 1128
match1 1296 1600
match2 300 1225
match2 4299 7754
RavinderSingh13
  • Welcome to SO. We encourage people to post the efforts they have already made to solve their own problem, so kindly add yours to the question and let us know. – RavinderSingh13 Oct 08 '19 at 14:50

1 Answer


what is the simplest shell script/command that will identify all overlapping ranges for each match and determine the min and max value for the entire overlap?

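Whether this is the simplest shell script for the job is arguable, but any solution will probably have to sort the ranges and identify the gaps between them. The script below does so by expanding each range into its individual integers, sorting them, and treating every break in the run of consecutive values as a gap: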

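# Expand each range into one "match value" line per integer, then sort and
# drop duplicates so that overlapping ranges become runs of consecutive values.
while read -r match min max
do  seq "$min" "$max" | sed "s/^/$match /"
done <infile.txt | sort -u -k1,1 -k2,2n |
{
    # A new match name or a jump of more than 1 marks the start of a new range.
    while read -r match value
    do  if [ "$match" != "$oldmatch" ] || [ "$value" != "$((oldvalue+1))" ]
        then    [ "$oldmatch" ] && echo "$oldvalue"
                printf '%s\t' "$match" "$value"
        fi
        oldmatch=$match
        oldvalue=$value
    done
    # Close the last range; the braces keep $oldvalue visible after the loop.
    echo "$oldvalue"
}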
Armali
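
For comparison, the same "sort the ranges and identify the gaps" idea can be sketched without expanding every range into its integers, which may matter for large ranges. This is not the answer's method but an alternative under stated assumptions: the input has exactly three whitespace- or tab-separated columns with integer bounds, and `merge_ranges` is a hypothetical helper name chosen here for illustration.

```shell
# merge_ranges FILE: hypothetical helper; prints "match<TAB>min<TAB>max"
# for each group of overlapping ranges in FILE.
merge_ranges() {
    # Sort by match name, then numerically by range start.
    sort -k1,1 -k2,2n "$1" |
    awk -v OFS='\t' '
        # New match name, or a start beyond the current end:
        # emit the finished range and begin a new one.
        $1 != name || $2 + 0 > hi {
            if (name != "") print name, lo, hi
            name = $1; lo = $2 + 0; hi = $3 + 0
            next
        }
        # Overlapping range: extend the current end if this one reaches further.
        $3 + 0 > hi { hi = $3 + 0 }
        # Emit the last open range.
        END { if (name != "") print name, lo, hi }
    '
}
```

It could be called as `merge_ranges infile.txt > outfile.txt`. Unlike the integer-expansion approach, this sketch merges only ranges that overlap or share an endpoint; ranges that are merely adjacent (for example 1 to 5 and 6 to 10) stay separate unless the test is changed to `$2 + 0 > hi + 1`.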