Sample first row of the event log file. I have successfully extracted everything apart from the last key-value pair, which is attributes:

{"event_type":"ActionClicked","event_timestamp":1451583172592,"arrival_timestamp":1451608731845,"event_version":"3.0","application":{"app_id":"7ffa58dab3c646cea642e961ff8a8070","cognito_identity_pool_id":"us-east-1:4d9cf803-0487-44ec-be27-1e160d15df74","package_name":"com.think.vito","sdk":{"name":"aws-sdk-android","version":"2.2.2"},"title":"Vito","version_name":"1.0.2.1","version_code":"3"},"client":{"client_id":"438b152e-5b7c-4e99-9216-831fc15b0c07","cognito_id":"us-east-1:448efb89-f382-4975-a1a1-dd8a79e1dd0c"},"device":{"locale":{"code":"en_GB","country":"GB","language":"en"},"make":"samsung","model":"GT-S5312","platform":{"name":"ANDROID","version":"4.1.2"}},"session":{"session_id":"c15b0c07-20151231-173052586","start_timestamp":1451583052586},"attributes":{"OfferID":"20186","Category":"40000","CustomerID":"304"},"metrics":{}}

Hello everyone, I am trying to extract content from an event log file, as shown in the attached image. I need to fetch the customer ID, offer ID, and category; these are the important variables I need to extract from this event log file. It is a CSV-formatted file. I tried regular expressions, but they do not work because the format of every column is different. As you can see, the first row has category, customer ID, and offer ID, while the second row is completely blank, so a regular expression won't work. Apart from this, we have to consider all possible conditions; there are 14,000 samples in the event log file. #JSON #Parsing #Python #Pandas

shivsn
Nabi Shaikh
  • Is this a plain text file? Does every line start and end with `{}`? If so, seems like you can read the file line by line and use `literal_eval` to turn each line to a Python `dict` object. – DeepSpace Jul 10 '16 at 08:12
  • Can you provide the actual piece of your data log instead of the image format? You don't expect us to type your data one by one, right? – MaThMaX Jul 10 '16 at 08:22
  • Yes, it was in .txt format earlier. It was a huge file. I extracted the following variables from the event log file: event_type, event_timestamp, arrival_timestamp, event_version, application {app_id, cognito_identity_pool_id}, client {}, device {}, session {}, attributes {} – Nabi Shaikh Jul 10 '16 at 08:23
  • Why do you have single quotes in the image but double quotes in the text? (The latter could be in JSON format.) – ayhan Jul 10 '16 at 08:35
  • @ayhan The image shows the CSV file, whereas the text is from the .txt file. After extracting from the .txt file, I separated every key into an individual CSV file. – Nabi Shaikh Jul 10 '16 at 09:12
  • Can I extract only the value related to one key and create a column for that key only? The problem is that some rows might have that key and some might not, which makes it difficult. @DeepSpace – Nabi Shaikh Jul 10 '16 at 09:18

2 Answers

Edit

The data, after your edit, now appears to be JSON data. You can still use literal_eval as below, or you could use the json module:

import json

with open('event.log') as events:
    for line in events:
        event = json.loads(line)
        # process event dictionary

To access the CustomerID, OfferID, Category etc. you need to access the nested dictionary associated with the key 'attributes' in the event dictionary:

print(event['attributes']['CustomerID'])
print(event['attributes']['OfferID'])
print(event['attributes']['Category'])

If some keys could be missing, use dict.get() instead:

print(event['attributes'].get('CustomerID'))
print(event['attributes'].get('OfferID'))
print(event['attributes'].get('Category'))

Now you will get None if the key is missing.

You can extend this principle to access other items with the dictionary.
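As a minimal sketch of that principle, the helper below walks a list of keys through the nested dictionaries and returns a default as soon as one is missing. `get_nested` is a hypothetical name introduced here for illustration, not a standard library function, and the sample line is a shortened version of the event shown in the question.

```python
import json


def get_nested(event, keys, default=None):
    """Follow keys through nested dictionaries; return default if any is missing."""
    current = event
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Shortened sample event based on the row in the question.
line = ('{"event_type":"ActionClicked",'
        '"device":{"make":"samsung","platform":{"name":"ANDROID","version":"4.1.2"}},'
        '"attributes":{"OfferID":"20186","Category":"40000","CustomerID":"304"}}')
event = json.loads(line)

customer_id = get_nested(event, ['attributes', 'CustomerID'])
platform = get_nested(event, ['device', 'platform', 'name'])
missing = get_nested(event, ['attributes', 'Lat'])  # key absent in this event
print(customer_id, platform, missing)
```

This keeps the lookups in one place, which may be convenient when some events carry an empty "attributes" object or lack it entirely.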

If I understand your question, you also want to create a CSV file containing the extracted fields. You can pass the extracted values to csv.DictWriter like this:

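import csv
import json

# newline='' is the Python 3 form; on Python 2 open the output with 'wb' instead
with open('event.log') as events, open('output.csv', 'w', newline='') as csv_file:
    fields = ['CustomerID', 'OfferID', 'Category']
    # extrasaction='ignore' drops keys such as 'Lat' or 'Long' that are not in fields
    writer = csv.DictWriter(csv_file, fields, extrasaction='ignore')
    writer.writeheader()
    for line in events:
        event = json.loads(line)
        # an event without an 'attributes' object is written as an empty row
        writer.writerow(event.get('attributes', {}))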

DictWriter will simply leave fields empty when the dictionary is missing keys.
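To see that behaviour without touching any files, here is a small sketch that writes to an in-memory buffer. The three dictionaries are made-up stand-ins for the "attributes" objects (the 'Lat' and 'Long' values in particular are invented). By default DictWriter raises ValueError when a dictionary contains keys that are not in the field names; passing extrasaction='ignore' makes it drop them instead.

```python
import csv
import io

fields = ['CustomerID', 'OfferID', 'Category']

# Hypothetical attribute dictionaries: complete, empty, and with extra keys.
rows = [
    {'OfferID': '20186', 'Category': '40000', 'CustomerID': '304'},
    {},
    {'CustomerID': '364', 'Lat': '1.0', 'Long': '2.0'},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fields, extrasaction='ignore', lineterminator='\n')
writer.writeheader()
writer.writerows(rows)

output = buffer.getvalue()
print(output)
```

Missing keys come out as empty cells, and the extra keys are silently skipped, so every row has the same three columns.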


Original answer

The data is not in CSV format, it appears to contain Python dictionary strings. These can be parsed into Python dictionaries using ast.literal_eval():

from ast import literal_eval

with open('event.log') as events:
    for line in events:
        event = literal_eval(line)
        # process event dictionary
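
The choice between the two parsers depends on how the lines were written. The sketch below, using a Python-style string taken from the comments and a made-up JSON string, suggests why: literal_eval accepts single-quoted Python dictionary text, which json.loads rejects, while JSON literals such as null are not valid Python literals.

```python
import json
from ast import literal_eval

# Python repr style, as printed in the comments under this answer.
python_style = "{u'MenuItem': u'Category', u'CustomerID': u'364'}"
# JSON style; this particular string is made up for illustration.
json_style = '{"MenuItem": "Category", "CustomerID": "364", "metrics": {}}'

from_repr = literal_eval(python_style)
from_json = json.loads(json_style)

# json.loads rejects single-quoted keys and the u'' prefix.
try:
    json.loads(python_style)
    json_accepts_repr = True
except ValueError:
    json_accepts_repr = False

# literal_eval rejects JSON-only literals such as null.
try:
    literal_eval('{"CustomerID": null}')
    literal_accepts_null = True
except ValueError:
    literal_accepts_null = False

print(from_repr, from_json, json_accepts_repr, literal_accepts_null)
```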
mhawke
  • We need to extract the values of customer ID, offer ID, and category; some rows also have "{ }" with no key: value pair in it. The result was >>> event {u'MenuItem': u'Category', u'CustomerID': u'364'} @mhawke – Nabi Shaikh Jul 10 '16 at 09:08
  • @NabiShaikh: Once you have the dictionary you can access the attributes in it. Looking at your updated sample of data (which now looks to be JSON data!) you actually have nested dictionaries, so you would access the customer id with `event['attributes']['CustomerID']` for example. – mhawke Jul 10 '16 at 09:31
  • The event log file is in .txt format, it is not JSON format. I am facing this error: Traceback (most recent call last): File "", line 7, in File "C:\Anaconda2\lib\csv.py", line 152, in writerow return self.writer.writerow(self._dict_to_list(rowdict)) File "C:\Anaconda2\lib\csv.py", line 148, in _dict_to_list + ", ".join([repr(x) for x in wrong_fields])) ValueError: dict contains fields not in fieldnames: u'Lat', u'Long' – Nabi Shaikh Jul 10 '16 at 11:35
  • @NabiShaikh: it is a text file, but the contents are JSON. The `json` parser successfully parses it, doesn't it? Don't pass dictionaries to `DictWriter.writerow()` that contain keys that you have not defined in the `fieldnames` argument to `DictWriter`. In this case `Lat` and `Long` are being passed to `writerow()`. Don't do that. – mhawke Jul 10 '16 at 12:11

This might not be the most efficient way to convert nested JSON records in a line-delimited text file to a DataFrame object, but it does the job.

import json
import pandas as pd

with open('path_to_your_text_file.txt') as f:
    # parse each non-empty line as one JSON record
    records = [json.loads(line) for line in f if line.strip()]

# flatten nested objects into dotted columns such as 'attributes.CustomerID'
# (pd.json_normalize needs pandas 1.0 or later)
e = pd.json_normalize(records)
print(e.head())
Mohammad Yusuf