
I'm trying to write an optimization process based on Dijkstra's algorithm to find the optimal path, with one variation: the path must not choose two items from the same group/family.

Brute-force traversal of all edges to find the solution would be NP-hard, which is why I am attempting to (hopefully) use Dijkstra's algorithm, but I'm struggling to add in the no-repeat-groups logic.

Think of it like a traveling salesman problem, but I want to travel from New York to Los Angeles, have an interesting route (by never visiting two cities from the same group), and minimize my fuel costs. There are approximately 15 days and 40 cities, but for defining my program I've pared it down to 4 cities and 3 days.

Valid paths don't have to visit every group; they just can't visit two cities in the same group. {XL,L,S} is a valid solution, but {XL,L,XL} is not valid because it visits the XL group twice. All valid solutions will be the same length (15 days, or edges) but can use any combination of groups (without duplicating groups) and need not use them all (since there are 15 days but 40 different city groups).
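The validity rule above can be stated as a small check. This is only a sketch: the function name `is_valid_route` is made up for illustration, and it assumes a route is given as a list of group labels, one per day.

```python
def is_valid_route(groups):
    """Return True if no group label appears more than once in the route."""
    # A set drops duplicates, so the sizes match only when every label is unique.
    return len(set(groups)) == len(groups)


print(is_valid_route(["XL", "L", "S"]))   # True: three distinct groups
print(is_valid_route(["XL", "L", "XL"]))  # False: XL is visited twice
```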

Here's a picture I put together to illustrate a valid and an invalid route (FYI: groups are horizontal rows in the matrix): [image]

**Day 1**
G1->G2 @ $10
G3->G4 @ $30
etc...
**Day 2**
G1->G3 @ $50
G2->G4 @ $10
etc...
**Day 3**
G1->G4 @ $30
G2->G3 @ $50
etc...

The optimal path would be G1->G2->G3, however a standard Dijkstra solution returns G1-

I found & tweaked this example code online, and name my nodes with the following syntax so I can quickly check what day & group they belong to: D[day#][Group#] by slicing the 3rd character.
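For reference, the naming convention can be wrapped in a small helper. This is a sketch in Python 3, and `day_and_group` is a hypothetical name. It assumes exactly one character each for the day and the group, which holds for the pared-down graph below but would not hold for 15 days or 40 groups unless the name carried a delimiter.

```python
def day_and_group(node):
    """Split a node name like 'D23' into (day, group) == ('2', '3').

    Returns None for names outside the pattern, such as 'START' or 'END'.
    """
    if len(node) == 3 and node[0] == 'D':
        return node[1], node[2]
    return None


print(day_and_group('D23'))    # ('2', '3')
print(day_and_group('START'))  # None
```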

## Based on code found here: https://raw.githubusercontent.com/nvictus/priority-queue-dictionary/0eea25fa0b0981558aa780ec5b74649af83f441a/examples/dijkstra.py

import pqdict

def dijkstra(graph, source, target=None):
    """
    Computes the shortest paths from a source vertex to every other vertex in
    a graph

    """
    # The entire main loop is O( (m+n) log n ), where n is the number of
    # vertices and m is the number of edges. If the graph is connected
    # (i.e. the graph is in one piece), m normally dominates over n, making the
    # algorithm O(m log n) overall.

    dist = {}   
    pred = {}
    predGroups = {}

    # Store distance scores in a priority queue dictionary
    pq = pqdict.PQDict()
    for node in graph:
        if node == source:
            pq[node] = 0
        else:
            pq[node] = float('inf')

    # Remove the head node of the "frontier" edge from pqdict: O(log n).
    for node, min_dist in pq.iteritems():
        # Each node in the graph gets processed just once.
        # Overall this is O(n log n).
        dist[node] = min_dist
        if node == target:
            break

        # Updating the score of any edge's node is O(log n) using pqdict.
        # There is _at most_ one score update for each _edge_ in the graph.
        # Overall this is O(m log n).
        for neighbor in graph[node]:
            if neighbor in pq:
                new_score = dist[node] + graph[node][neighbor]

                #This is my attempt at tracking if we've already used a node in this group/family
                #The group designator is stored as the 3rd char (index 2) in the node name for quick access
                try:
                    groupToAdd = node[2]
                    alreadyVisited = predGroups.get( groupToAdd, False )
                except: 
                    alreadyVisited = False
                    groupToAdd = 'S'

                #Solves OK with this line
                if new_score < pq[neighbor]:
                #Errors out with this line version
                #if new_score < pq[neighbor] and not( alreadyVisited ):
                    pq[neighbor] = new_score
                    pred[neighbor] = node

                    #Store this node in the "visited" list to prevent future duplication
                    predGroups[groupToAdd] = groupToAdd
                    print predGroups
                    #print node[2]

    return dist, pred

def shortest_path(graph, source, target):
    dist, pred = dijkstra(graph, source, target)
    end = target
    path = [end]
    while end != source:
        end = pred[end]
        path.append(end)        
    path.reverse()
    return path

if __name__=='__main__':
    # A simple edge-labeled graph using a dict of dicts
    graph = {'START': {'D11':1,'D12':50,'D13':3,'D14':50},
             'D11': {'D21':5},
             'D12': {'D22':1},
             'D13': {'D23':50},
             'D14': {'D24':50},
             'D21': {'D31':3},
             'D22': {'D32':5},
             'D23': {'D33':50},
             'D24': {'D34':50},
             'D31': {'END':3},
             'D32': {'END':5},
             'D33': {'END':50},
             'D34': {'END':50},
             'END': {'END':0}}

    dist, path = dijkstra(graph, source='START')
    print dist
    print path
    print shortest_path(graph, 'START', 'END')
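One possible way to fold the no-repeat rule into the search is to run Dijkstra over (node, groups used so far) states instead of over bare nodes, since a single shared `predGroups` dict cannot tell which groups belong to which partial path. The following is a sketch under assumptions, not a tested solution to the full problem: it is written for Python 3 with the standard `heapq` module rather than `pqdict`, the names `group_of`, `constrained_shortest_path` and `demo_graph` are invented for illustration, and the demo graph and its weights are hypothetical. Note that the number of states can grow exponentially with the number of groups.

```python
import heapq


def group_of(node):
    """Group label of a 'D<day><group>' node, or None for START/END (assumed naming)."""
    return node[2] if len(node) == 3 and node[0] == 'D' else None


def constrained_shortest_path(graph, source, target):
    """Dijkstra over (node, groups used) states; returns (cost, path) or None."""
    first = group_of(source)
    used0 = frozenset() if first is None else frozenset([first])
    # Heap entries: (cost, tie-breaker, node, groups used, path so far).
    heap = [(0, 0, source, used0, (source,))]
    counter = 1
    settled = set()
    while heap:
        cost, _, node, used, path = heapq.heappop(heap)
        if node == target:
            return cost, list(path)
        if (node, used) in settled:
            continue
        settled.add((node, used))
        for neighbor, weight in graph.get(node, {}).items():
            g = group_of(neighbor)
            if g is not None and g in used:
                continue  # this partial path already visited the group
            new_used = used if g is None else used | {g}
            heapq.heappush(
                heap, (cost + weight, counter, neighbor, new_used, path + (neighbor,)))
            counter += 1
    return None


# Hypothetical graph: every day-N node links to more than one group on day N+1.
demo_graph = {
    'START': {'D11': 1, 'D12': 4},
    'D11': {'D21': 1, 'D22': 5},
    'D12': {'D21': 2, 'D22': 1},
    'D21': {'D31': 1, 'D33': 6},
    'D22': {'D31': 2, 'D33': 3},
    'D31': {'END': 1},
    'D33': {'END': 1},
    'END': {},
}

print(constrained_shortest_path(demo_graph, 'START', 'END'))
# (10, ['START', 'D11', 'D22', 'D33', 'END'])
```

In the demo graph the cheapest unconstrained route (START, D11, D21, D31, END at cost 4) repeats group 1, so the sketch returns the cost-10 route instead. The graph in the question only links each node to the same group on the next day, so under the no-repeat rule it contains no valid route and this function would return `None` for it.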
NumericOverflow
  • Are you trying to visit every group, or just trying to get from point A to point B? If the former, this is NP-hard (since you could take the degenerate case in which each group has exactly one city and recover TSP). – Kevin Sep 09 '15 at 14:03
  • Both - get from start to end in the shortest distance AND with a constraint on the cities I visit along the way (or on what qualifies as a valid route). For example, I don't want to get there by visiting all "large" cities; I want to visit only 1 extra-large, 1 large, 1 medium. In the original post, {XL,L,M} would all be groups. The group is an attribute of each city, and I want any route where I visit 2 XL cities to be invalid when determining the shortest route. Does that make sense? – NumericOverflow Sep 09 '15 at 15:39
  • What if I can get there by visiting one XL, one M, but no L? Is that a valid solution? If not, this is NP-complete and you should look into a dynamic programming solution instead. – Kevin Sep 09 '15 at 16:08
  • @Kevin - Your suggested path would be valid; paths don't have to visit every group. I added a link to a diagram in the original post to clarify. {XL,L,S} is a valid solution, but {XL,L,XL} is not because it visits the XL group twice. All valid solutions will be the same length (15 days, or edges) but can use any combination of groups and need not use them all (since there are 15 days but 40 different city groups). – NumericOverflow Sep 09 '15 at 18:13
