I have a JSON Lines file in which each item has a node as its key and, as its value, a list of the other nodes it is connected to. Adding the edges to a networkx graph requires, I think, tuples of the form (u, v). I wrote a naive solution for this, but I feel it might be slow for big enough jsonl files. Does anyone have a better, more Pythonic solution to suggest?

dol = [{0: [1,2,3,4,5,6]},{1: [0,2,3,4,5,6]}]
for node in dol:
    #print(node)
    tpls = []
    key = list(node.keys())[0]
    tpls = [(key,v) for v in node[key]]
    print(tpls)

<iterate through each one in the list to add them to the graph>

[(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6)]
[(1, 0), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)]
Arty
MiltosV
  • The answer depends on what you need to do with this graph. There are three commonly used representations. https://en.wikipedia.org/wiki/Graph_(abstract_data_type)#Common_Data_Structures_for_Graph_Representation Yours is an *adjacency list*. –  Dec 24 '21 at 15:50
  • Why is your input split into multiple dictionaries with one item in each? Also, why is your output split into multiple lists? If you want to build a graph, it would make sense for your input to be either a single dictionary or a list of 2-item lists, with the first item being the dictionary key and the second being the dictionary value. Your output should be just a list of 2-tuples. – Amitai Irron Dec 24 '21 at 16:01

3 Answers

dol = [{0: [1,2,3,4,5,6]},{1: [0,2,3,4,5,6]}]

def process(item: dict):
    for key, values in item.items():
        for i in values:
            yield (key, i) 

results = map(process, dol)
print([list(r) for r in results])

You should use yield where you can.

A function that uses yield returns a generator that you can iterate over. This is more memory efficient, because the tuples are produced one at a time instead of all being held in a list.
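
Since the question is about a jsonl file, the same idea can be sketched for reading the file line by line. This is only a sketch under assumptions: the function name edges_from_jsonl and the sample data are made up, and io.StringIO stands in for an open file handle. Note that JSON object keys are always strings, so the key is converted with int() here on the assumption that the nodes are integers.

```python
import io
import json

def edges_from_jsonl(lines):
    """Lazily yield (node, neighbor) tuples from an iterable of JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for key, neighbors in record.items():
            node = int(key)  # JSON object keys are always strings
            for neighbor in neighbors:
                yield (node, neighbor)

# Stand-in for an open file handle; the contents are made-up sample data.
sample = io.StringIO('{"0": [1, 2, 3]}\n{"1": [0, 2]}\n')
edges = edges_from_jsonl(sample)
first_edge = next(edges)   # only the first line has been parsed at this point
remaining = list(edges)    # consumes the rest of the input
```

As far as I know, networkx's add_edges_from accepts any iterable of edge tuples, so a generator like this could probably be passed to it directly without building a list first.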

Nimantha
A H Bensiali
  • You don't have to create a generator with a function; just take all my comprehensions and replace the brackets with parentheses :) (But yes, you are right: a generator is more memory efficient, but more situational and less beginner friendly.) – Dorian Turba Dec 24 '21 at 16:21
  • Both work. One should write code for others. Nice, clean, and easy to read. I like your use of pandas. – A H Bensiali Dec 24 '21 at 16:24

Only one key

If each dict never has more than one item, you can do this:

dol = [{0: [1, 2, 3, 4, 5, 6]}, {1: [0, 2, 3, 4, 5, 6]}]

for node in dol:
    local_node = node.copy()  # only if dict shouldn't be modified in any way
    k, values = local_node.popitem()
    print([(k, value) for value in values])
# [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6)]
# [(1, 0), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)]
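
If the copy is only there to protect the original dict, a possible alternative is to read the single item without removing it. A minimal sketch, assuming each dict really has exactly one item (per_node is just an illustrative name):

```python
dol = [{0: [1, 2, 3, 4, 5, 6]}, {1: [0, 2, 3, 4, 5, 6]}]

per_node = []
for node in dol:
    # Take the first (and only) item; the dict itself is left untouched.
    k, values = next(iter(node.items()))
    per_node.append([(k, value) for value in values])
```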

Multiple keys

But if a dict may contain more than one key, you can use a while loop that runs until the dict is empty:

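# Assumed input: the first dict has two keys. popitem() returns the most
# recently inserted item first, which gives the order shown below.
dol = [{2: [0, 2, 3, 4, 5, 6], 0: [1, 2, 3, 4, 5, 6]}, {1: [0, 2, 3, 4, 5, 6]}]

for node in dol:
    local_node = node.copy()  # only if dict shouldn't be modified in any way
    while local_node:
        k, values = local_node.popitem()
        print([(k, value) for value in values])
# [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6)]
# [(2, 0), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6)]
# [(1, 0), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)]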

Of course, if you need to store the generated list, append it to a list instead of just printing it.
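
For example, storing instead of printing could look roughly like this. The input and the names edge_lists and flat_edges are made up for illustration; use append if you want one list per key, or extend if you want a single flat list.

```python
dol = [{2: [0, 3], 0: [1, 2]}, {1: [0, 2]}]  # made-up sample input

edge_lists = []  # one list of tuples per key
flat_edges = []  # every tuple in a single list
for node in dol:
    local_node = node.copy()  # keep the original dicts intact
    while local_node:
        k, values = local_node.popitem()  # most recently inserted key first
        tpls = [(k, value) for value in values]
        edge_lists.append(tpls)
        flat_edges.extend(tpls)
```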

Only one big dictionary

If your dol object can be a single dictionary, it's even simpler. And if, as Yves Daoust said, you need an adjacency list or matrix, here are two examples:

Adjacency list pure python

An adjacency list:

dol = {0: [1, 2, 3, 4, 5, 6],
       1: [0, 2, 3, 4, 5, 6]}

adjacency_list = [(key, value) for key, values in dol.items() for value in values]
print(adjacency_list)
# [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (1, 0), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)]
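
If the data arrives as a list of small dicts, it can first be merged into one big dictionary. A rough sketch, assuming a key may appear in more than one dict and that its neighbor lists should simply be concatenated (dol_list, merged and edge_list are illustrative names, and the input is made up):

```python
dol_list = [{0: [1, 2, 3]}, {1: [0, 2]}, {0: [4]}]  # made-up sample; key 0 repeats

merged = {}
for d in dol_list:
    for key, values in d.items():
        # Create the list on first sight of a key, then append its neighbors.
        merged.setdefault(key, []).extend(values)

edge_list = [(key, value) for key, values in merged.items() for value in values]
```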

Adjacency matrix with pandas

An adjacency matrix:

import pandas
dol = {0: [1, 2, 3, 4, 5, 6],
       1: [0, 2, 3, 4, 5, 6]}

adjacency_list = [(key, value) for key, values in dol.items() for value in values]
adjacency_df = pandas.DataFrame(adjacency_list)
adjacency_matrix = pandas.crosstab(adjacency_df[0], adjacency_df[1],
                                   rownames=['keys'], colnames=['values'])
print(adjacency_matrix)
# values  0  1  2  3  4  5  6
# keys                       
# 0       0  1  1  1  1  1  1
# 1       1  0  1  1  1  1  1
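
If pandas is not available, roughly the same matrix can be built in pure Python. This is only a sketch: it uses list.count so that repeated neighbors are counted, which is how I understand crosstab to behave, and the names rows, cols and matrix are my own.

```python
dol = {0: [1, 2, 3, 4, 5, 6],
       1: [0, 2, 3, 4, 5, 6]}

rows = sorted(dol)  # the keys
cols = sorted({value for values in dol.values() for value in values})
# One row per key; each cell counts how often the column node is a neighbor.
matrix = [[dol[r].count(c) for c in cols] for r in rows]
```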
Dorian Turba

You could use a list comprehension:

dol = [{0: [1,2,3,4,5,6]},{1: [0,2,3,4,5,6]}]

tuples = [(n1, n2) for d in dol for n1, ns in d.items() for n2 in ns]

print(tuples)

[(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (1, 0), (1, 2), 
 (1, 3), (1, 4), (1, 5), (1, 6)]
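
If memory is a concern, the same comprehension can be written with parentheses instead of brackets, which makes it a generator expression that produces the tuples on demand. A small sketch (islice is used here only to show that the tuples are produced lazily):

```python
from itertools import islice

dol = [{0: [1, 2, 3, 4, 5, 6]}, {1: [0, 2, 3, 4, 5, 6]}]

# Parentheses instead of brackets: nothing is computed until it is consumed.
edges = ((n1, n2) for d in dol for n1, ns in d.items() for n2 in ns)

first_three = list(islice(edges, 3))  # pulls only three tuples
rest = list(edges)                    # the remaining nine
```

Keep in mind that a generator can only be consumed once, so build a list if you need to iterate over the tuples more than once.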
Alain T.