I'm writing this to avoid an O(n^2) comparison of every row against every other row, but so far I only have a rough draft because there are some things I'm unsure about implementing.
This is the format of the file that I want to pass into this script. The data is sorted by the third column -- the start position.
93 Blue19 1 82
87 Green9 1 7912
76 Blue7 2 20690
65 Red4 2 170
...
...
256 Orange50 17515 66740
166 Teal68 72503 123150
228 Green89 72510 114530
Explanation of the code:
I want to build an array of arrays to find rows whose ranges overlap.
Columns 3 and 4 of the input file are start and stop positions on a single track line. Because the file is sorted by start, if a row x has a start (column 3) that is smaller than the stop (column 4) of an earlier row y, then x starts before y ends and the two overlap.
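The test described above can be written as a small predicate. This is only a sketch: `overlaps` is a hypothetical helper name, rows are assumed to be array refs of (id, name, start, stop), and equal positions are assumed to count as an overlap, matching the `$begin > ...` test in the draft further down.

```perl
use strict;
use warnings;

# Hypothetical helper. $earlier and $later are array refs holding
# (id, name, start, stop), and $earlier is assumed to start no later
# than $later, which the sorted input guarantees.
sub overlaps {
    my ($earlier, $later) = @_;
    # The later row overlaps if it starts at or before the earlier row's stop.
    return $later->[2] <= $earlier->[3] ? 1 : 0;
}

my $row93  = [ 93,  'Blue19',   1,     82    ];
my $row65  = [ 65,  'Red4',     2,     170   ];
my $row256 = [ 256, 'Orange50', 17515, 66740 ];

print overlaps($row93, $row65)  ? "overlap\n" : "no overlap\n";   # overlap: 2 <= 82
print overlaps($row65, $row256) ? "overlap\n" : "no overlap\n";   # no overlap: 17515 > 170
```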
I want to find every row that overlaps with any other row without having to compare every row to every row. Because the rows are sorted, I simply keep one inner array per row in an outer array and add a string to that inner array for each overlap found. If the new row being looked at does not overlap with one of the rows already in the array, then (because the input is sorted by the third column) no later row can overlap with that row either, so it can be printed and removed.
This is the idea I have so far:
#!/usr/bin/perl
use strict;
use warnings;

my @array;    # rows that can still overlap with a later row
while (<>) {
    my ($id, $name, $begin, $end) = split;
    next unless defined $end;    # skip blank or incomplete lines
    my @still_open;
    for my $row (@array) {    # loop through the outer array being made to see
                              # if there are overlaps with the current item
        if ($begin > $row->[3]) {
            # no overlap: print the strings stored after the four columns
            # and drop this inner array (because the input is sorted,
            # everything that follows starts later and cannot overlap it)
            print "$_\n" for @{$row}[4 .. $#$row];
        }
        else {
            # overlap: add a string explaining it to the inner array
            push @$row, "$id overlap with $row->[0]\t"
                      . "$row->[0]: $row->[2] $row->[3]\t$id: $begin $end";
            push @still_open, $row;
        }
    }
    # make an inner array with the current line, to have strings that
    # will be printed after it
    @array = (@still_open, [$id, $name, $begin, $end]);
}
# print whatever is still waiting when the input ends
for my $row (@array) {
    print "$_\n" for @{$row}[4 .. $#$row];
}
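If the order of the output lines does not matter, the overlap strings may not need to be stored at all. The following is a minimal sketch of that variation, not a definitive implementation: `overlapping_pairs` is a hypothetical name, rows are assumed to be array refs of (id, name, start, stop) already sorted by start, and the sample rows are copied from the input above with the elided lines left out.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [earlier_id, later_id] pairs for rows
# that are already sorted by start position.
sub overlapping_pairs {
    my @rows = @_;
    my (@active, @pairs);
    for my $row (@rows) {
        my ($id, undef, $begin) = @$row;
        # Drop rows that stop before the current start. Because the input
        # is sorted by start, no later row can overlap them either.
        @active = grep { $_->[3] >= $begin } @active;
        # Every row that is still active overlaps the current row.
        push @pairs, [ $_->[0], $id ] for @active;
        push @active, $row;
    }
    return @pairs;
}

# Sample rows from the input format shown earlier.
my @rows = map { [ split ' ', $_ ] } (
    '93 Blue19 1 82',
    '87 Green9 1 7912',
    '76 Blue7 2 20690',
    '65 Red4 2 170',
    '256 Orange50 17515 66740',
    '166 Teal68 72503 123150',
    '228 Green89 72510 114530',
);

print "$_->[1] overlap with $_->[0]\n" for overlapping_pairs(@rows);
```

On these seven rows the sketch reports eight pairs, grouped by the later row rather than by the earlier one. Each row is only compared with the rows still in `@active`, so the work depends on how many ranges are open at once rather than on the total number of rows.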
The code should produce something like:
87 overlap with 93 93: 1 82 87: 1 7912
76 overlap with 93 93: 1 82 76: 2 20690
65 overlap with 93 93: 1 82 65: 2 170
76 overlap with 87 87: 1 7912 76: 2 20690
65 overlap with 87 87: 1 7912 65: 2 170
65 overlap with 76 76: 2 20690 65: 2 170
256 overlap with 76 76: 2 20690 256: 17515 66740
228 overlap with 166 166: 72503 123150 228: 72510 114530
This was tricky to explain, so ask me if you have any questions.