Working with named tuples to output specific data

Question

I am having trouble initializing my data so that I can look up specific values by their keys.

This is my code so far:

from kafka import KafkaConsumer
import ast
from collections import namedtuple
import json
import csv
import sys
from datetime import datetime
import os

# connect to kafka topic
kaf = KafkaConsumer('kafka.topic',
                   auto_offset_reset='earliest', bootstrap_servers=['consumer-kafka.server'])
outputfile = 'C:\\Users\\Documents\\KafkaConsum\\file.csv'

outfile = open(outputfile, mode='w', newline='')

for row in kaf:
    a = row.value.decode("utf-8")
    if "TAG_NAME" in a:
        print(a)
        outfile.write(a + '\n')

This is how my data is formatted:

2018-12-04 13:27:12,511 [a-1 app=helloname,receiverId=abc-abc-123-123,partner=company] INFO kafka.consumer.topic TAG_NAME Type='Generic App' Class= UpdateCheck description=Version1 appName="TWITTER" appAction="start"

2018-12-04 13:27:12,511 [a-1 app=helloname,receiverId=abc-abc-123-123,partner=company] INFO kafka.consumer.topic TAG_NAME Type='Generic App' Class= UpdateCheck description=Version1 appName="TWITTER" appAction="start"

I am looking to be able to parse this data to look like this in my csv file:

app | receiverId | partner | Type | Class | description | appName | appAction |

helloname | abc-abc-123-123 | company | Generic App | UpdateCheck | Version1 | TWITTER | start |

helloname | abc-abc-123-123 | company | Generic App | UpdateCheck | Version1 | TWITTER | start |

You can use regular expression to extract the data from each line (example https://stackoverflow.com/questions/30627810/how-to-parse-this-custom-log-file-in-python) — Mohamed Ali JAMAOUI, Dec 07 '18 at 16:32

Chris Charley · Answer 1 · 2018-12-08T17:01:27.783

Here is a solution, although it doesn't use the csv module (it probably should).

The findall call grabs every header=value pair on a line. The inner loop then splits each pair at the = sign into a header and a value. The header row is written once, followed by one row of values per line.

import re

def main():
    header = True
    with open('f3.txt', 'r') as fin:
        for line in fin:
            # each match is one header=value pair, e.g. app=helloname
            data = re.findall(r'\w+=\s*[\'"]?[\w-]+', line)
            headers = []
            array = []
            for pair in data:
                m = re.search(r'(\w+)=\s*[\'"]?([\w-]+)', pair)
                headers.append(m.group(1))  # get header
                array.append(m.group(2))    # get value

            if header:
                # write the header row once, before the first data row
                print('|'.join(headers))
                header = False
            print('|'.join(array))

main()

This produced this output:

app|receiverId|partner|Type|Class|description|appName|appAction
helloname|abc-abc-123-123|company|Generic|UpdateCheck|Version1|TWITTER|start
helloname|abc-abc-123-123|company|Generic|UpdateCheck|Version1|TWITTER|start
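
Note that the Type column comes out as Generic rather than Generic App, because [\w-]+ stops at the first space. Below is a minimal sketch of one way around that, assuming every value is either wrapped in single or double quotes or is a bare token with no spaces, commas, or closing brackets. It also writes the rows through the csv module. The names PAIR_RE, parse_line, and write_rows are made up for this example, and the sample line is copied from the question.

```python
import csv
import io
import re

# key, '=', optional whitespace, then a single-quoted, double-quoted,
# or bare value (bare values end at whitespace, a comma, or ']')
PAIR_RE = re.compile(r"""(\w+)=\s*(?:'([^']*)'|"([^"]*)"|([^\s,\]]+))""")


def parse_line(line):
    """Return a dict of key -> value for every key=value pair in the line."""
    fields = {}
    for key, single, double, bare in PAIR_RE.findall(line):
        # only one of the three value groups is non-empty for a given match
        fields[key] = single or double or bare
    return fields


def write_rows(lines, out):
    """Write a pipe-delimited header (once) and one row per parsed line."""
    writer = csv.writer(out, delimiter='|')
    header_written = False
    for line in lines:
        fields = parse_line(line)
        if not fields:
            continue  # skip lines without any key=value pairs
        if not header_written:
            writer.writerow(fields.keys())
            header_written = True
        writer.writerow(fields.values())


sample = (
    "2018-12-04 13:27:12,511 [a-1 app=helloname,receiverId=abc-abc-123-123,"
    "partner=company] INFO kafka.consumer.topic TAG_NAME Type='Generic App' "
    'Class= UpdateCheck description=Version1 appName="TWITTER" appAction="start"'
)

buf = io.StringIO()  # stands in for the real output file
write_rows([sample, sample], buf)
print(buf.getvalue())
```

With the real file, open it with newline='' and pass the file object to write_rows instead of the StringIO buffer. The sketch takes the header from the first parsed line, so it assumes every line carries the same keys in the same order.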

score 0 · Answer 2 · answered Dec 07 '18 at 19:22

As Medali said, you can use a regular expression to extract the data you want and separate it properly, something along these lines:

import re

pattern = r'app=(.*?),'
app = re.search(pattern, a).group(1)

You could keep a list of the headers you want, loop over it to build a pattern for each header, save each match in a dictionary, and then write that dictionary directly to a CSV file.

You'll need a new variable, csv_outfile or similar, that wraps the output file. On Python 3, keep the file in text mode with newline='', as in your code:

headers = ['app', 'receiverId', .... , 'appAction']
outfile = open(outputfile, mode='w', newline='')
csv_outfile = csv.DictWriter(outfile, headers, delimiter='|')
csv_outfile.writeheader()

my_dict = {}
for header in headers:
    # note: this pattern only matches values that are followed by a comma
    pattern = header + r'=(.*?),'
    my_dict[header] = re.search(pattern, a).group(1)
csv_outfile.writerow(my_dict)

I think this answers your questions?

I attempted using this but I keep getting the errors **AttributeError: 'NoneType' object has no attribute 'group'** and **TypeError: unhashable type: 'list'**. I did make a few modifications such as adding `my_dict = {} ` and `my_dict[headers] = re.search(pattern, str(a)).group(1)` — j.Doe, Dec 10 '18 at 14:18
`AttributeError: 'NoneType' object has no attribute 'group'` means that you are not getting any results from the search, make sure the header is correct. Do you know where the `TypeError: unhashable type: 'list'.` is coming from in the code? — SRT HellKitty, Dec 10 '18 at 19:29
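
Both errors can be traced to the loop. re.search returns None when a pattern does not match, and header + r'=(.*?),' only matches values that are followed by a comma, which in the sample line is true for app and receiverId only. The TypeError comes from my_dict[headers]: that uses the whole list as the dictionary key instead of the single header string. Here is a rough sketch that guards against both, assuming values are quoted or are bare tokens without spaces. The names VALUE and extract are invented for the example, and the headers list is filled in from the column names in the question.

```python
import csv
import io
import re

headers = ['app', 'receiverId', 'partner', 'Type', 'Class',
           'description', 'appName', 'appAction']

# value part: single-quoted, double-quoted, or a bare token
VALUE = r"""=\s*(?:'([^']*)'|"([^"]*)"|([^\s,\]]+))"""


def extract(line, names):
    """Build one row dict; names that are not found get an empty string."""
    row = {}
    for name in names:
        # key by the single name string, never by the list itself
        match = re.search(r'\b' + re.escape(name) + VALUE, line)
        if match is None:
            row[name] = ''  # no match: avoid calling .group() on None
            continue
        # exactly one of the three value groups took part in the match
        row[name] = next(g for g in match.groups() if g is not None)
    return row


sample = (
    "2018-12-04 13:27:12,511 [a-1 app=helloname,receiverId=abc-abc-123-123,"
    "partner=company] INFO kafka.consumer.topic TAG_NAME Type='Generic App' "
    'Class= UpdateCheck description=Version1 appName="TWITTER" appAction="start"'
)

buf = io.StringIO()  # stands in for the real output file
writer = csv.DictWriter(buf, fieldnames=headers, delimiter='|')
writer.writeheader()
writer.writerow(extract(sample, headers))
print(buf.getvalue())
```

Whether a missing field should become an empty string or cause the line to be skipped depends on what you want in the CSV. The empty string is just one choice.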
