
I have a problem I need to solve, but I can't think of an easy and, more importantly, fast solution. It's a bit like part of a multiple traveling salesman problem.

First, I have a matrix with X rows and N columns; N is a fixed parameter of my algorithm and X can vary. Let's assume it looks like this (here N = 5):

matrix = [1 2 4 3 5; 4 3 1 2 5; 1 2 4 3 5]
matrix =
 1     2     4     3     5
 4     3     1     2     5
 1     2     4     3     5

Every row is seen as a "route" and contains all the unique numbers between 1 and N. Each route (= row) will be split into partial routes. For that, I have a breakpoint matrix with X rows and M (M < N) columns. E.g.:

breakpoints = [2 3 4; 1 2 4; 1 3 4]
breakpoints =
 2     3     4
 1     2     4
 1     3     4

The indices in each row of breakpoints give the positions in the corresponding row of matrix AFTER which the route will be split into partial routes. To make this clear, let's take the first row as an example: breakpoints(1, :) = 2 3 4, which means that the route matrix(1, :) = 1 2 4 3 5 will be split into the partial routes [1 2], [4], [3] and [5]. The second row has the breakpoints breakpoints(2, :) = 1 2 4, which split the second route matrix(2, :) = 4 3 1 2 5 into the partial routes [4], [3], [1 2] and [5].
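
Just to illustrate in code, here is a minimal MATLAB sketch of how such a split could be computed for row 1 (edges is only a helper variable for this illustration):

row   = matrix(1, :);                  % 1 2 4 3 5
edges = [0 breakpoints(1, :) N];       % boundaries of the partial routes: 0 2 3 4 5
parts = arrayfun(@(k) row(edges(k)+1:edges(k+1)), 1:numel(edges)-1, ...
    'UniformOutput', false);
% parts is now { [1 2], [4], [3], [5] }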

Now my aim is to remove every row from matrix whose partial routes are redundant duplicates of another row's, just in a different order. In this example row 2 is a duplicate of row 1. Row 3 is NO duplicate, even though it has the same route as row 1, because its different breakpoints lead to the partial routes [1], [2 4], [3] and [5].

How can I do this cleanly and, above all, fast? matrix can contain many elements, e.g. X = 5e4 rows with N = 10 and M = 6.

tim

1 Answer


For constant M and N, this can be solved in O(X log X) time by sorting composite records into order and then testing adjacent entries for equality.

By a "composite record" I mean a record that combines a function of a row and its breakpoints into a single record. The function is obtained, for a given row, by:

  1. Apply the breakpoints to the row, getting a list of partial routes.
  2. Sort the partial routes into ascending order by the first element of each route. E.g. sort {[4], [3], [1 2], [5]} as {[1 2], [3], [4], [5]}.
  3. Form the new composite record by concatenating the sorted partial routes; the effective breakpoints (i.e., the breakpoints recomputed for the sorted order); and an index to the source row. E.g. if the example row in the previous step is row 2 = (4 3 1 2 5), save (1 2 3 4 5; 2 3 4; 2), which is (sorted partial routes; effective breakpoints; index).
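
As a minimal MATLAB sketch (assuming matrix, breakpoints and N as in the question; the variable names are mine), the composite record of row i can be built like this:

edges = [0 breakpoints(i, :) N];                  % partial-route boundaries
parts = arrayfun(@(k) matrix(i, edges(k)+1:edges(k+1)), ...
    1:numel(edges)-1, 'UniformOutput', false);    % step 1: split into partial routes
[~, order] = sort(cellfun(@(p) p(1), parts));     % step 2: order by first elements
parts = parts(order);
ends = cumsum(cellfun(@numel, parts));            % where each sorted part ends
rec = [parts{:}, ends(1:end-1), i];               % step 3: routes; breakpoints; index

For i = 2 this yields rec = [1 2 3 4 5, 2 3 4, 2], matching the example above.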

After sorting the composite records, go through them looking for equality of adjacent entries, up to source index. For example, (1 2 3 4 5; 2 3 4; 2) and (1 2 3 4 5; 2 3 4; 7) indicate that partial routes from row 7 duplicate those of row 2. Each time a duplicate is found, set its corresponding original first row entry to an invalid point number, say N+1.

Thus, after the sort, which costs O(X log X), use O(X) time to detect duplicates. Then use another O(X) pass to squeeze out the duplicates by going through the original rows and dropping those with an invalid first element.
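
Putting the pieces together, a sketch of the whole procedure in MATLAB could look as follows; instead of marking duplicates with N+1 and squeezing afterwards, it deletes them directly by indexing, which has the same effect:

X = size(matrix, 1);
M = size(breakpoints, 2);
recs = zeros(X, N + M + 1);                    % one composite record per row
for i = 1:X
    edges = [0 breakpoints(i, :) N];
    parts = arrayfun(@(k) matrix(i, edges(k)+1:edges(k+1)), ...
        1:numel(edges)-1, 'UniformOutput', false);
    [~, order] = sort(cellfun(@(p) p(1), parts));
    parts = parts(order);
    ends = cumsum(cellfun(@numel, parts));
    recs(i, :) = [parts{:}, ends(1:end-1), i];
end
recs = sortrows(recs);                                     % the O(X log X) sort
dup = [false; all(diff(recs(:, 1:end-1), 1, 1) == 0, 2)];  % equals predecessor, up to index
kill = recs(dup, end);                                     % source rows of the duplicates
matrix(kill, :) = [];
breakpoints(kill, :) = [];

On the example above this removes row 2 and keeps rows 1 and 3.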

A slightly more accurate overall cost is O((M+N)*X*log X), which exceeds the theoretical minimum O((M+N)*X) by a log X factor. You can get rid of the log X factor (in expectation) if you store the composite records in a hash table instead of sorting them, and mark records for deletion when duplicate hash entries occur.
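
For instance, with MATLAB's containers.Map as the hash table (string keys are just one convenient choice; recs is the unsorted record matrix from the loop above), that variant could be sketched as:

seen = containers.Map('KeyType', 'char', 'ValueType', 'logical');
keep = true(X, 1);
for i = 1:X
    key = sprintf('%d,', recs(i, 1:end-1));    % composite record without the source index
    if isKey(seen, key)
        keep(i) = false;                       % record seen before: mark as duplicate
    else
        seen(key) = true;
    end
end
matrix = matrix(keep, :);
breakpoints = breakpoints(keep, :);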

James Waldby - jwpat7
  • Oh yeah, that sounds nice, I didn't think of this type of algorithm. Hmmm, but it's probably a bit too time-consuming. I don't need it badly; I just thought reducing the matrix size might give a little benefit for my further calculations. But since removing the duplicate entries already takes much time, it probably won't pay off in the end. Thanks a lot though!!! – tim Dec 11 '11 at 19:41