Create co-occurrence matrix from dictionary key, value(s) in Python

Question

I am trying to create a co-occurrence matrix from a dictionary of unique keys with overlapping values (in Python 3). Here is my data structure:

keys = ['A','B','C','D']
vals = [[1,2],1,[1,3],2]

dict = {'A':[1,2], 'B':1, 'C':[1,3], 'D':2}

How can I create a matrix that counts the occurrences of the values for each key, in the following form?

   1.  2.  3. 
A. 1   1   0 
B. 1   0   0 
C. 1   0   1 
D. 0   1   0

I've been recommended to use defaultdict but I am not sure how to implement it. Thank you!

what would the result be if it was `vals = [[1,2],1,[1,2],4]`? Note that there is no `3` in this case — Ma0, Mar 25 '20 at 15:40
So the `3` would be included even if it is missing from the original data. Note that this is not covered by the otherwise great answer from [@Dani](https://stackoverflow.com/a/60852143/6162307). This also highlights the need for a good general example! — Ma0, Mar 25 '20 at 15:55

Dani Mesejo · Answer 1 · 2020-03-25T15:40:22.677

You could do:

d = {'A': [1, 2], 'B': [1], 'C': [1, 3], 'D': [2]}

values = sorted(set(e for v in d.values() for e in v))

result = {k : [1 if value in v else 0 for value in values] for k, v in d.items()}

print(result)

Output

{'A': [1, 1, 0], 'B': [1, 0, 0], 'C': [1, 0, 1], 'D': [0, 1, 0]}

If there are many values you could use sets for the containment test, something like this:

d = {'A': [1, 2], 'B': [1], 'C': [1, 3], 'D': [2]}
d = { k : set(v) for k, v in d.items() }
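
The snippet above stops after the conversion to sets. A minimal complete sketch of the same idea follows, assuming all values are hashable; the name `d_sets` is introduced here only for illustration:

```python
d = {'A': [1, 2], 'B': [1], 'C': [1, 3], 'D': [2]}

# Convert each list to a set so that `value in v` is an average O(1) lookup.
# `d_sets` is an illustrative name, not part of the original answer.
d_sets = {k: set(v) for k, v in d.items()}

# Column labels: every distinct value, in sorted order.
values = sorted(set().union(*d_sets.values()))

result = {k: [1 if value in v else 0 for value in values] for k, v in d_sets.items()}

print(result)  # {'A': [1, 1, 0], 'B': [1, 0, 0], 'C': [1, 0, 1], 'D': [0, 1, 0]}
```

With only a handful of values per key the difference is negligible; the sets should start to pay off when each key holds many values.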

A more concise way suggested by @Ev. Kounis is to do:

result = {k : [int(value in v) for value in values] for k, v in d.items()}

Finally, if you want a list of lists (i.e. a matrix), you can put the values of result in a list:

print(list(result.values()))

Output

[[1, 1, 0], [1, 0, 0], [1, 0, 1], [0, 1, 0]]
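
If you want something closer to the labelled layout in the question, one possible sketch is below. The helper `format_table` is a hypothetical name introduced here, and the fixed spacing assumes single-character keys and single-digit values:

```python
d = {'A': [1, 2], 'B': [1], 'C': [1, 3], 'D': [2]}
values = sorted(set(e for v in d.values() for e in v))
result = {k: [int(value in v) for value in values] for k, v in d.items()}

def format_table(result, values):
    """Return the matrix as text: a header row of values, then one row per key.

    Hypothetical helper; columns only line up for one-character labels.
    """
    header = "   " + "  ".join(str(value) for value in values)
    rows = [f"{key}  " + "  ".join(str(cell) for cell in row)
            for key, row in result.items()]
    return "\n".join([header] + rows)

table = format_table(result, values)
print(table)
#    1  2  3
# A  1  1  0
# B  1  0  0
# C  1  0  1
# D  0  1  0
```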

you could also do `[int(value in v) for value in values]` — Ma0, Mar 25 '20 at 15:39

Ma0 · Answer 2 · 2020-03-25T15:56:00.987

Assuming that:

there can be gaps in the values of the original dict which should not be skipped and
you are only interested in the range defined by min and max value

you can do:

d = {'A':[1,2], 'B':[1], 'C':[1,3], 'D':[2]}

values_flat = {v for sub in d.values() for v in sub}
max_value = max(values_flat)
min_value = min(values_flat)

result = {k: [int(i in v) for i in range(min_value, max_value + 1)] for k, v in d.items()}

print(result)  # {'A': [1, 1, 0], 'B': [1, 0, 0], 'C': [1, 0, 1], 'D': [0, 1, 0]}
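
To see where the two assumptions matter, here is a small sketch using the input from the comment on the question (`vals = [[1,2],1,[1,2],4]`, so `3` never occurs). The names `present`, `full_range`, `by_present` and `by_range` are illustrative only:

```python
d = {'A': [1, 2], 'B': [1], 'C': [1, 2], 'D': [4]}

values_flat = {v for sub in d.values() for v in sub}

# Columns taken only from values that occur: 1, 2, 4 (no column for 3).
present = sorted(values_flat)
by_present = {k: [int(i in v) for i in present] for k, v in d.items()}

# Columns taken from the full range min..max: 1, 2, 3, 4 (3 is an all-zero column).
full_range = range(min(values_flat), max(values_flat) + 1)
by_range = {k: [int(i in v) for i in full_range] for k, v in d.items()}

print(by_present)  # {'A': [1, 1, 0], 'B': [1, 0, 0], 'C': [1, 1, 0], 'D': [0, 0, 1]}
print(by_range)    # {'A': [1, 1, 0, 0], 'B': [1, 0, 0, 0], 'C': [1, 1, 0, 0], 'D': [0, 0, 0, 1]}
```

Which of the two you want depends on whether a missing value should still get a column.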

Note that I took the liberty of redefining and renaming your original dict, so that all the values are now lists. Consistent data are essential, so if you have any control over your input, make sure you sanitize it first. Also note that `dict` is a bad variable name since it shadows the Python built-in.
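
If you cannot change the input at the source, one way to sanitize it is sketched below. It assumes a value is either a single item or a list, tuple or set of items; `raw` and `as_list` are hypothetical names used only for this example:

```python
# The mixed input from the question: some values are lists, some are bare ints.
raw = {'A': [1, 2], 'B': 1, 'C': [1, 3], 'D': 2}

def as_list(value):
    """Wrap a bare value in a list; copy lists, tuples and sets into a new list."""
    if isinstance(value, (list, tuple, set)):
        return list(value)
    return [value]

d = {k: as_list(v) for k, v in raw.items()}
print(d)  # {'A': [1, 2], 'B': [1], 'C': [1, 3], 'D': [2]}
```

After this step every value is a list, so either of the comprehensions above can be applied to `d` unchanged.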
