I wrote a Python script that takes query-log data, transforms it into a list of nested dictionaries, and writes the result to a new text file.

This is a sample output of my script:

[{'ip_address': '10.10.80.209', 'domain_names': {'google.com': 2}},
{'ip_address': '10.10.25.188', 'domain_names': {'fbcdn-profile-a.akamaihd.net': 1}},
{'ip_address': '10.10.50.195', 'domain_names': {'googleads.g.doubleclick.net': 2, '0-edge-chat.facebook.com': 2, 'gg.google.com': 2, 'content.googleapis.com': 1, 'accounts.google.com': 1}}]

As you can see, I have a list of user transactions. Each transaction contains two entries: the key-value pair ip_address, and domain_names, which is a dictionary mapping each visited domain name to its visit count (e.g. 'google.com': 2).

I need to transform this file into a co-occurrence matrix like the one in this image, where t is the user transactions, d is the domain names, and each value is the visit count (0 if the user didn't visit that domain name).

The data I created is already close to this concept. The problem is that I have to transform it into a matrix and save it as a .mat file. My list of nested dictionaries only records the domains that were visited, so for every domain name a user transaction did not visit, the matrix value must be filled in as 0.
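To make the zero-filling step concrete, here is a minimal sketch in plain Python, assuming the list of nested dictionaries is already loaded in memory. The helper name build_count_matrix and the choice to sort the domain names alphabetically for the column order are my own assumptions, not something the clustering script is known to require.

```python
# Sample data copied from the script output shown above.
transactions = [
    {'ip_address': '10.10.80.209', 'domain_names': {'google.com': 2}},
    {'ip_address': '10.10.25.188', 'domain_names': {'fbcdn-profile-a.akamaihd.net': 1}},
    {'ip_address': '10.10.50.195', 'domain_names': {
        'googleads.g.doubleclick.net': 2, '0-edge-chat.facebook.com': 2,
        'gg.google.com': 2, 'content.googleapis.com': 1, 'accounts.google.com': 1}},
]


def build_count_matrix(transactions):
    """Hypothetical helper: return (domains, matrix).

    domains is the sorted list of every domain seen in any transaction
    (the column labels); matrix has one row per transaction, in input order.
    """
    domains = sorted({d for t in transactions for d in t['domain_names']})
    # dict.get(d, 0) supplies the 0 for domains a transaction never visited.
    matrix = [[t['domain_names'].get(d, 0) for d in domains] for t in transactions]
    return domains, matrix


domains, matrix = build_count_matrix(transactions)
```

Each row of matrix then lines up with one transaction (t) and each column with one entry of domains (d), so the first row has a single 2 in the 'google.com' column and 0 everywhere else.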

It needs to be a .mat file because the script that clusters this data requires that file type. As far as I know, .mat is a MATLAB file format, and I have no prior experience with that language.

So how do I do this?

Daniel
  • For the sample you show, please also show what you want to convert it to. For the second part, you have to investigate yourself: try a library for writing .mat files, try to code the conversion yourself, and if you hit a specific code problem in the process, come back here with a question. This also means you should remove the second part of your question and make it a new one later if you don't manage it yourself. – hoijui Aug 30 '15 at 04:47
  • Writing mat-files in Python has been addressed multiple times in other questions here. You can either use `scipy.io.savemat` to write the old format, or use any HDF5 library to write a gzip-compressed HDF5 file and call it `.mat`. – Daniel Aug 30 '15 at 09:55
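As a rough illustration of the `scipy.io.savemat` route mentioned in the comment, the sketch below writes a small count matrix to a .mat file and reads it back with `scipy.io.loadmat`. It assumes NumPy and SciPy are installed. The variable names counts and domains, the tiny example matrix, and the file name transactions.mat are all placeholders; the names the clustering script actually expects are not given in the question.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Hypothetical 2x2 count matrix: rows are transactions, columns are domains.
counts = np.array([[0, 2], [1, 0]], dtype=np.float64)
domains = ['fbcdn-profile-a.akamaihd.net', 'google.com']

# Write to a temporary directory so the sketch does not clutter the working directory.
path = os.path.join(tempfile.mkdtemp(), 'transactions.mat')

# Keys of the dict become MATLAB variable names (assumed names here).
# An object-dtype array of strings is stored as a MATLAB cell array.
savemat(path, {'counts': counts, 'domains': np.array(domains, dtype=object)})

# Read the file back to check that the matrix survived the round trip.
loaded = loadmat(path)
```

In MATLAB, `load('transactions.mat')` would then expose the same variables, with counts as an ordinary numeric matrix.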
