I am trying to create a co-occurrence matrix from a dictionary of unique keys with overlapping values (in Python 3). Here is my data structure:

keys = ['A','B','C','D']
vals = [[1,2],1,[1,3],2]

dict = {'A':[1,2], 'B':1, 'C':[1,3], 'D':2}

How can I create a matrix that counts the occurrences of the values for each key, in the following form?

   1.  2.  3. 
A. 1   1   0 
B. 1   0   0 
C. 1   0   1 
D. 0   1   0 

It has been suggested that I use defaultdict, but I am not sure how to implement it. Thank you!

AHN
  • What would the result be if it was `vals = [[1,2],1,[1,2],4]`? Note that there is no `3` in this case. – Ma0 Mar 25 '20 at 15:40
  •    1  2  3  4
    A  1  1  0  0
    B  1  0  0  0
    C  1  1  0  0
    D  0  0  0  1
    – AHN Mar 25 '20 at 15:51
  • So the `3` would be included even if it is missing from the original data. Note that this is not covered by the otherwise great answer from [@Dani](https://stackoverflow.com/a/60852143/6162307). This also highlights the need for a good general example! – Ma0 Mar 25 '20 at 15:55

2 Answers

You could do:

d = {'A': [1, 2], 'B': [1], 'C': [1, 3], 'D': [2]}

values = sorted(set(e for v in d.values() for e in v))

result = {k : [1 if value in v else 0 for value in values] for k, v in d.items()}

print(result)

Output

{'A': [1, 1, 0], 'B': [1, 0, 0], 'C': [1, 0, 1], 'D': [0, 1, 0]}

If there are many values, you could convert each list to a set first so that the containment test is faster, something like this:

d = {'A': [1, 2], 'B': [1], 'C': [1, 3], 'D': [2]}
d = { k : set(v) for k, v in d.items() }

A more concise way suggested by @Ev. Kounis is to do:

result = {k : [int(value in v) for value in values] for k, v in d.items()}

Finally, if you are interested in a list-of-lists data structure (i.e. a matrix), you could put the values of result in a list:

print(list(result.values()))

Output

[[1, 1, 0], [1, 0, 0], [1, 0, 1], [0, 1, 0]]
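
If you want the output laid out like the table in the question, a minimal sketch could look like the following. The helper name `format_matrix` is hypothetical (it is not part of any library), and the fixed spacing assumes single-character keys and single-digit values.

```python
d = {'A': [1, 2], 'B': [1], 'C': [1, 3], 'D': [2]}

# Column labels: every distinct value, in sorted order.
values = sorted({e for v in d.values() for e in v})

def format_matrix(data, columns):
    """Return the 0/1 matrix as text, one row per key (hypothetical helper)."""
    # Header row, indented past the key column.
    lines = ["   " + "  ".join(str(c) for c in columns)]
    for key, members in data.items():
        present = set(members)  # set lookup for the containment test
        cells = "  ".join(str(int(c in present)) for c in columns)
        lines.append(f"{key}  {cells}")
    return "\n".join(lines)

print(format_matrix(d, values))
```

For wider labels or values, the spacing would need to be computed from the longest entry instead of being hard-coded.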
Dani Mesejo

Assuming that:

  • there can be gaps in the values of the original dict which should not be skipped and
  • you are only interested in the range defined by min and max value

you can do:

d = {'A':[1,2], 'B':[1], 'C':[1,3], 'D':[2]}

values_flat = {v for sub in d.values() for v in sub}
max_value = max(values_flat)
min_value = min(values_flat)

result = {k: [int(i in v) for i in range(min_value, max_value + 1)] for k, v in d.items()}

print(result)  # {'A': [1, 1, 0], 'B': [1, 0, 0], 'C': [1, 0, 1], 'D': [0, 1, 0]}

Note that I took the liberty to redefine and rename your original dict. Now all the values are lists. Consistent data are essential, so if you have any control over your input make sure you sanitize it first. Also note that dict is a bad variable name since it shadows the Python built-in.
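
If you cannot change the input at the source, one possible way to sanitize it is sketched below. The helper name `normalize` is hypothetical, and it assumes every value is either a list or a bare scalar (no tuples or sets).

```python
raw = {'A': [1, 2], 'B': 1, 'C': [1, 3], 'D': 2}

def normalize(data):
    """Wrap bare scalars in a list so every value is a list (hypothetical helper)."""
    return {k: v if isinstance(v, list) else [v] for k, v in data.items()}

d = normalize(raw)

# Same approach as above: one column per integer between min and max.
values_flat = {v for sub in d.values() for v in sub}
columns = range(min(values_flat), max(values_flat) + 1)

result = {k: [int(i in v) for i in columns] for k, v in d.items()}
print(result)  # {'A': [1, 1, 0], 'B': [1, 0, 0], 'C': [1, 0, 1], 'D': [0, 1, 0]}
```

With the input from the comments (`[1, 2]`, `1`, `[1, 2]`, `4`), the same code would produce four columns, with the `3` column all zeros.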

Ma0