I want to compute correlation percentages between multiple items that appear in log files. To do so, I divide the number of times an item appears by the number of times it appears while another item is present. I won't go into too much detail, but this correlation is not symmetric: the correlation between A and B is not the same as the one between B and A.
As output, I have a dictionary in a format like this:
{
    "itemA": {
        "itemB": 0.85,
        "itemC": 0.12
    },
    "itemB": {
        "itemC": 0.68,
        "itemA": 0.24
    },
    "itemC": {
        "itemA": 0.28
    }
}
I have tried DictVectorizer from sklearn, but it doesn't work for me, since it expects a list of dictionaries rather than a nested dictionary.
I would like the output to be a matrix that I can visualise with matplotlib, something like this (rows and columns in the order itemA, itemB, itemC; 1 on the diagonal, 0 for missing pairs):
[[1, 0.85, 0.12],
 [0.24, 1, 0.68],
 [0.28, 0, 1]]
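To make the target concrete, here is a minimal sketch of the kind of conversion I'm after, in plain Python. It rests on a few assumptions of mine: the keys are strings, rows and columns follow sorted item order, a missing pair counts as 0, and the diagonal is 1. The name dict_to_matrix is only illustrative.

```python
# Nested dictionary in the format shown above, with string keys (an assumption).
correlations = {
    "itemA": {"itemB": 0.85, "itemC": 0.12},
    "itemB": {"itemC": 0.68, "itemA": 0.24},
    "itemC": {"itemA": 0.28},
}


def dict_to_matrix(nested, missing=0.0, diagonal=1.0):
    """Return (labels, matrix) with matrix[i][j] == nested[labels[i]][labels[j]].

    Pairs absent from `nested` get `missing`; the diagonal gets `diagonal`.
    """
    # Collect every item seen as an outer or an inner key, in sorted order.
    labels = sorted(set(nested) | {k for inner in nested.values() for k in inner})
    index = {label: i for i, label in enumerate(labels)}
    size = len(labels)

    # Start from a matrix of `missing` values with `diagonal` on the diagonal.
    matrix = [
        [diagonal if i == j else missing for j in range(size)]
        for i in range(size)
    ]

    # Fill in the known values; the outer key is the row, the inner key the column.
    for row_label, inner in nested.items():
        for col_label, value in inner.items():
            matrix[index[row_label]][index[col_label]] = value
    return labels, matrix


labels, matrix = dict_to_matrix(correlations)
print(labels)
for row in matrix:
    print(row)
```

Returning the labels next to the matrix keeps the row and column order explicit, which matters here because the correlation is not symmetric.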
If possible, I would also like the matplotlib visualisation to show a label for each row and column, since my dict has many more than 3 items.
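For the plot, something along these lines is what I have in mind. This is only a sketch: the helper name plot_matrix, the colour map and the fixed 0 to 1 colour range are my own choices, and the labels and values are hard-coded from the small example.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line to open a window
import matplotlib.pyplot as plt

# Example labels and values, hard-coded from the small dictionary in the question.
labels = ["itemA", "itemB", "itemC"]
matrix = [
    [1.0, 0.85, 0.12],
    [0.24, 1.0, 0.68],
    [0.28, 0.0, 1.0],
]


def plot_matrix(labels, matrix):
    """Draw the matrix as a heatmap with one tick label per row and column."""
    fig, ax = plt.subplots()
    image = ax.imshow(matrix, cmap="viridis", vmin=0.0, vmax=1.0)

    # One tick per item, labelled with the item name.
    ax.set_xticks(range(len(labels)))
    ax.set_yticks(range(len(labels)))
    ax.set_xticklabels(labels, rotation=90)
    ax.set_yticklabels(labels)

    # The colour bar plays the role of the legend for the cell values.
    fig.colorbar(image, ax=ax, label="correlation")
    fig.tight_layout()
    return fig, ax


fig, ax = plot_matrix(labels, matrix)
# fig.savefig("correlations.png") would write the figure to disk.
```

With many items, rotating the column labels by 90 degrees keeps them from overlapping; the figure size would probably need to grow as well.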
I hope that everything is clear. Thank you for your help.