
I have a problem representing website user behaviour as an adjacency matrix in Python. I want to analyze user interaction across 43 different websites to see which websites are used together.

The data set has about 13,000,000 lines with the following structure:

 user website
 id1  web1
 id1  web2
 id1  web2
 id2  web1
 id2  web2
 id3  web3
 id3  web2

I would like to visualize the interactions between the websites in an adjacency matrix like this:

     web1 web2 web3
 web1  2    2    0
 web2  2    4    1
 web3  0    1    1

I'd be grateful for any advice.

Duesentrieb

1 Answer

import numpy as np
import scipy.sparse

data = """
 id1  web1
 id1  web2
 id1  web2
 id2  web1
 id2  web2
 id3  web3
 id3  web2
"""

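# Split the text into (user, website) pairs
data = np.array(data.split()).reshape(-1, 2)
# Map each user and each website to an integer index
_, i = np.unique(data[:, 0], return_inverse=True)
_, j = np.unique(data[:, 1], return_inverse=True)

# User-by-website incidence matrix; repeated (user, website) rows add up to visit counts
incidence = scipy.sparse.coo_matrix((np.ones_like(i), (i, j)))
# Website-by-website matrix product
adjacency = incidence.T @ incidence

print(adjacency.todense())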
Eelco Hoogendoorn
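
A note on the counts: repeated (user, website) rows are summed in the incidence matrix, so the product above weights each pair by visit counts. On the sample data that gives 3 for web1/web2 and 6 on the web2 diagonal. If the goal is instead the number of distinct users who visited both sites (the off-diagonal values in the question's expected matrix), one option is to cap the incidence entries at 1 before multiplying. The following is a minimal sketch under that assumption; the function name `co_usage_counts` and its return shape are made up for illustration and are not part of the answer above. The diagonal then holds the number of distinct users per site (3 for web2), not the raw row count of 4 shown in the question.

```python
import numpy as np
import scipy.sparse


def co_usage_counts(users, websites):
    """Count, for each pair of websites, the distinct users who visited both.

    Returns (labels, matrix): the sorted website names and a dense square
    array whose rows and columns follow that order.
    """
    user_labels, user_idx = np.unique(np.asarray(users), return_inverse=True)
    labels, site_idx = np.unique(np.asarray(websites), return_inverse=True)

    incidence = scipy.sparse.coo_matrix(
        (np.ones(len(user_idx), dtype=np.int64), (user_idx, site_idx)),
        shape=(len(user_labels), len(labels)),
    ).tocsr()  # converting to CSR sums duplicate (user, website) entries
    incidence.data[:] = 1  # keep only "visited at least once"

    adjacency = (incidence.T @ incidence).toarray()
    return labels, adjacency


users = ["id1", "id1", "id1", "id2", "id2", "id3", "id3"]
websites = ["web1", "web2", "web2", "web1", "web2", "web3", "web2"]
labels, adjacency = co_usage_counts(users, websites)
print(labels)
print(adjacency)
```

With only 43 websites the result is a 43 x 43 array, so converting it to a dense array at the end should be cheap even when the incidence matrix is built from millions of rows.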