How to create a DAG from a list in python

Question

I am using networkx to manually input the (u, v, weights) of a graph. But as the input gets bigger, this manual insertion of nodes and edges becomes a really tiresome and error-prone task. I have tried, but I haven't figured out how to perform this task without manual labour.

Sample Input:

my_list = ["s1[0]", "d1[0, 2]", "s2[0]", "d2[1, 3]", "d3[0, 2]", "d4[1, 4]", "d5[2, 3]", "d6[1, 4]"]
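Each entry in `my_list` is a name followed by a Python-style list of register numbers. A minimal sketch of turning the strings into (name, registers) pairs, assuming every entry has exactly that `name[...]` shape (`parse_op` is a hypothetical helper, not part of any library), is to split at the first `[` and hand the bracketed part to `ast.literal_eval`:

```python
import ast

my_list = ["s1[0]", "d1[0, 2]", "s2[0]", "d2[1, 3]",
           "d3[0, 2]", "d4[1, 4]", "d5[2, 3]", "d6[1, 4]"]


def parse_op(text):
    """Hypothetical helper: split an entry like 'd1[0, 2]' into ('d1', [0, 2])."""
    idx = text.index("[")                     # raises ValueError if there is no '['
    name = text[:idx].strip()                 # everything before '[' is the name
    registers = ast.literal_eval(text[idx:])  # safely evaluate the list literal
    return name, registers


ops = [parse_op(item) for item in my_list]
print(ops[:3])  # [('s1', [0]), ('d1', [0, 2]), ('s2', [0])]
```

Because the name is everything before the `[`, multi-character names such as `da` would be parsed the same way.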

Manual Insertion:

Before inserting nodes into a graph I need to number them, so that the first occurrence of 's' or 'd' can be differentiated from later occurrences of the same character, e.g. s1, s2, s3, ... and d1, d2, d3, ... I am aware this is similar to SSA form (compilers), but I was not able to find anything helpful for my case.
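If the raw names are unnumbered (`s`, `d`, ...), this numbering is just a per-name occurrence counter. A minimal sketch, assuming numbering starts at 1 for each distinct name (`number_ops` is a hypothetical helper):

```python
from collections import defaultdict


def number_ops(names):
    """Hypothetical helper: ['s', 'd', 's'] -> ['s1', 'd1', 's2']."""
    seen = defaultdict(int)  # name -> how many times it has appeared so far
    numbered = []
    for name in names:
        seen[name] += 1
        numbered.append(f"{name}{seen[name]}")
    return numbered


print(number_ops(["s", "d", "s", "d", "d", "d", "d", "d"]))
# ['s1', 'd1', 's2', 'd2', 'd3', 'd4', 'd5', 'd6']
```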

Manually inserting (u, v, weights) into a DiGraph():

my_graph.add_weighted_edges_from([("s1", "d1", 0), ("d1", "s2", 0), ("d1", "d3", 2), ("s2", "d3", 0),
                                  ("d2", "d4", 1), ("d2", "d5", 3), ("d3", "d5", 2), ("d4", "d6", 1), ("d4", "d6", 4)])

Question:

How can I automatically convert the input list (my_list) into a DAG (my_graph), avoiding manual insertion?

Complete Code: This is what I have written so far.

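import networkx as nx
# nx_agraph needs the pygraphviz package (and Graphviz itself) to be installed
from networkx.drawing.nx_agraph import write_dot, graphviz_layout
from matplotlib import pyplot as plt

my_graph = nx.DiGraph()
# hand-written (u, v, weight) edges; the weight is the register shared by u and v
my_graph.add_weighted_edges_from([("s1", "d1", 0), ("d1", "s2", 0), ("d1", "d3", 2), ("s2", "d3", 0),
                                  ("d2", "d4", 1), ("d2", "d5", 3), ("d3", "d5", 2), ("d4", "d6", 1), ("d4", "d6", 4)])

# export the graph in Graphviz DOT format
write_dot(my_graph, "graph.dot")

plt.title("draw graph")
# hierarchical layout computed by Graphviz's "dot" program
pos = graphviz_layout(my_graph, prog='dot')
nx.draw(my_graph, pos, with_labels=True, arrows=True)

plt.show()
plt.clf()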
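Since the goal is a DAG, it may be worth checking the result programmatically. The sketch below (assuming networkx 2.x or later) also shows a subtlety in the manual edge list: `("d4", "d6", 1)` and `("d4", "d6", 4)` have the same endpoints, and a `DiGraph` keeps only one edge per (u, v) pair, so the later weight replaces the earlier one. If both registers need to be recorded, a `MultiDiGraph` keeps parallel edges:

```python
import networkx as nx

# The hand-written edge list from the question: (u, v, register).
edges = [("s1", "d1", 0), ("d1", "s2", 0), ("d1", "d3", 2), ("s2", "d3", 0),
         ("d2", "d4", 1), ("d2", "d5", 3), ("d3", "d5", 2),
         ("d4", "d6", 1), ("d4", "d6", 4)]

# A DiGraph stores at most one edge per (u, v) pair, so the second
# d4 -> d6 edge updates the first: only weight 4 survives.
dg = nx.DiGraph()
dg.add_weighted_edges_from(edges)
print(dg.number_of_edges(), dg["d4"]["d6"]["weight"])  # 8 4

# A MultiDiGraph keeps parallel edges, so both shared registers are kept.
mdg = nx.MultiDiGraph()
mdg.add_weighted_edges_from(edges)
print(mdg.number_of_edges("d4", "d6"))  # 2

# Check that the dependency graph is acyclic and get one valid execution order.
print(nx.is_directed_acyclic_graph(mdg))  # True
print(list(nx.topological_sort(mdg)))
```

`nx.topological_sort` yields one of possibly several valid orders, so the printed order is not guaranteed to match the input order.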

Explanation:

's' and 'd' are instructions that require 1 and 2 registers, respectively, to perform an operation.
In the above example we have 2 's' operations and 6 'd' operations, and there are five registers [0, 1, 2, 3, 4].
Each operation performs some calculation and stores the result in the relevant register(s).
From the input we can see that d1 uses registers 0 and 2, so it cannot operate until both of these registers are free. Therefore, d1 is dependent on s1, because s1 comes before d1 and uses register 0. As soon as s1 finishes, d1 can operate, since register 2 is already free.
E.g. we initialize all registers with 1. s1 doubles its input, while d1 sums its two inputs and stores the result in its second register:

so after s1[0] reg-0 * 2 -> 1 * 2 => reg-0 = 2

and after d1[0, 2] reg-0 + reg-2 -> 2 + 1 => reg-0 = 2 and reg-2 = 3
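The arithmetic in this example can be replayed with a tiny interpreter. This is a toy sketch that assumes only the two rules stated above ('s' doubles its single register, 'd' adds its first register into its second) and five registers initialised to 1; `simulate` is a hypothetical name:

```python
def simulate(ops, n_registers=5):
    """Toy interpreter for the 's'/'d' semantics described above."""
    regs = [1] * n_registers  # every register starts at 1
    for kind, args in ops:
        if kind == "s":
            (r,) = args                  # 's' takes exactly one register
            regs[r] *= 2                 # and doubles its value
        elif kind == "d":
            a, b = args                  # 'd' takes exactly two registers
            regs[b] = regs[a] + regs[b]  # the sum is stored in the second one
        else:
            raise ValueError(f"unknown operation {kind!r}")
    return regs


print(simulate([("s", [0]), ("d", [0, 2])]))  # [2, 1, 3, 1, 1]
```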

Update 1: The graph will be a dependency graph based on the resources [0...4]; each node requires 1 (for 's') or 2 (for 'd') of these resources.

Update 2: Two questions were causing confusion, so I'm separating them. For now I have changed my input list, and there is only a single task: converting that list into a DAG. I have also included an explanation section.

PS: networkx.drawing.nx_agraph depends on pygraphviz, so you might need to install Graphviz and pygraphviz (pip install pygraphviz) if you don't already have them.

How did you change your input to `[("s1", "d1", 0), ("d1", "s2", 0), ("d1", "d3", 2), ("s2", "d3", 0), ("d2", "d4", 1), ("d2", "d5", 3), ("d3", "d5", 2), ("d4", "d6", 1), ("d4", "d6", 4)]`? — Siva Shanmugam, Aug 22 '20 at 06:07
Can you explain in a little more detail what steps were involved in changing your input to SSA form? — Gambit1614, Aug 22 '20 at 09:34
@SivaShanmugam I hard-coded those nodes based on the input list. The values in square brackets are resources/registers; every trailing node depends on its previous nodes because it has to wait until they free those resources. 's' nodes use a single register and 'd' nodes require two registers. It is like a dependency graph based on the input sequence and the registers required by each node. — muruDiaz, Aug 22 '20 at 14:48
@muruDiaz where does `("d4", "d6", 4)` come from anywhere in your input? the integer `6` isn't present anywhere in your sample? — Tadhg McDonald-Jensen, Aug 22 '20 at 14:59
@TadhgMcDonald-Jensen that is my Question 1. For now I am renaming the 4th and 6th occurrences of the 'd' node as d4 and d6, respectively. Similarly, I need to rename all the nodes according to [SSA form](https://en.wikipedia.org/wiki/Static_single_assignment_form). For now I'm doing it manually. — muruDiaz, Aug 22 '20 at 16:24
I see a bunch of `s`s and `d`s, and some numbers. absolutely none of them mean anything to me. some explanation for how the mapping works in words would be helpful here. — Tadhg McDonald-Jensen, Aug 22 '20 at 20:35
@TadhgMcDonald-Jensen I have updated my question and added an explanation section, hope it's much more clear now. Comment if there's still any confusion. — muruDiaz, Aug 23 '20 at 05:49

score 1 · Answer 1 · answered Aug 23 '20 at 20:11

OK, now that I have a better idea of how the mapping works, it just comes down to describing the process in code: keep a mapping of which op is currently using each resource, and while iterating over the operations, generate an edge whenever an operation uses a resource that was used by a previous operation. I think this is along the lines of what you are looking for:

import ast


class UniqueIdGenerator:
    def __init__(self, initial=1):
        self.auto_indexing = {}
        self.initial = initial

    def get_unique_name(self, name):
        "adds number after given string to ensure uniqueness."
        if name not in self.auto_indexing:
            self.auto_indexing[name] = self.initial
        unique_idx = self.auto_indexing[name]
        self.auto_indexing[name] += 1
        return f"{name}{unique_idx}"

def generate_DAG(source):
    """
    takes iterable of tuples in format (name, list_of_resources) where
    - name doesn't have to be unique
    - list_of_resources is a list of resources in any hashable format (list of numbers or strings is typical)
    
    generates edges in the format (name1, name2, resource),
    - name1 and name2 are unique-ified versions of names in input
    - resource is the value in the list of resources
    each "edge" represents a handoff of resource, so name1 and name2 use the same resource sequentially.
    """
    # format {resource: name} for each resource in use.
    resources = {}
    g = UniqueIdGenerator()
    for (op, deps) in source:
        op = g.get_unique_name(op)
        for resource in deps:
            # for each resource this operation requires, if a previous operation used it
            if resource in resources:
                # yield the new edge
                yield (resources[resource], op, resource)
            # either first or yielded an edge, this op is now using the resource.
            resources[resource] = op

my_list = ["s[0]", "d[0, 2]", "s[0]", "d[1, 3]", "d[0, 2]", "d[1, 4]", "d[2, 3]", "d[1, 4]"]
data = generate_DAG((a[0], ast.literal_eval(a[1:])) for a in my_list)
print(*data, sep="\n")

Works perfectly for the sample input. Just a small question: what tweak do we need to make in the code if the operation name is a multi-character string instead of a single char, e.g. "da" or "dn" instead of just 'd'? — muruDiaz, Aug 28 '20 at 15:17
The part `(a[0], ast.literal_eval(a[1:]))` currently takes just the first character as the id. If you formatted your input as `[("sa", [0]), ("dn", [0,2]), ...]`, you could pass that list directly to `generate_DAG`; it just needs an iterable of 2-element tuples containing the id and the list of resources. For parsing, you could do something like `(a[:a.index("[")], ast.literal_eval(a[a.index("["):])) for a in my_list` if getting the list of strings formatted as you have it is easier than reformatting into 2-element tuples. — Tadhg McDonald-Jensen, Aug 29 '20 at 19:38
