
I am trying to parse CDR records coming from an ACME Packet SBC. The data is CSV for the most part. The issue is that some records, not all of them, contain odd cases that break the parsing. I initially tried to do this with the csv reader, as shown below.

CDR with redacted information

...stop,"{phone_number} sip:{phone_number}@{ip_address}", ""{first_name},{last_name}"sip:{phone_number}@{ip_address}", "NAS-Identifier"...

There are 300 other columns in each record, but this is one example of where I am falling short.

snippet of code

import csv
import io
import gzip

col_names = ['Record-Type', 'Calling-Station-ID', 'Called-Station-ID', 'NAS-Identifier']
# filename is the path to the gzipped CDR file
with gzip.open(filename, 'r') as f:
    reader = csv.DictReader(io.TextIOWrapper(f, newline='\n'), fieldnames=col_names, skipinitialspace=True)
    try:
        for row in reader:
            print('record-type: ' + row['Record-Type'])
            print('calling-station-id: ' + row['Calling-Station-ID'])
            print('called-station-id: ' + row['Called-Station-ID'])
            print('NAS-Identifier: ' + row['NAS-Identifier'])
    except csv.Error as e:
        print(e)

With the code above, I get roughly the following:

record-type: Stop
calling-station-id: {phone_number} sip:{phone_number}@{ip_address}
called-station-id: {first_name}
NAS-Identifier: {last_name} sip:{phone_number}@{ip_address}

I also tried reading this with pandas and got roughly the same result: it splits on the commas inside the quotechar.

On top of that, I think the " characters inside the quoted value are causing problems as well.

This is also somewhat variable. Most records parse fine; only a few hundred have this issue. But since I can't tell how a record is formatted before it reaches me, I can't parse the affected records any differently.
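The splitting can be reproduced on a single line. The sketch below uses made-up values (5550100, Bob, Smith, nas-01 and the 192.0.2.x addresses are placeholders, not real CDR data) and only mirrors the quoting pattern of the redacted record above. It suggests that the csv module treats the leading "" as an empty quoted string, carries on as an unquoted field, and then splits on the comma between the two names.

```python
import csv
import io

# Made-up values standing in for the redacted ones; only the quoting mirrors the CDR.
line = 'stop,"5550100 sip:5550100@192.0.2.1", ""Bob,Smith"sip:5550101@192.0.2.2", "nas-01"\n'

# Same reader option as in the question: skip the blank after each comma.
fields = next(csv.reader(io.StringIO(line), skipinitialspace=True))

# The line is meant to hold four values, but the reader yields five,
# because the called-station value is cut at the comma after "Bob".
for index, field in enumerate(fields):
    print(index, repr(field))
```

With the default dialect (strict=False) this does not raise csv.Error; the record is simply one field longer than expected.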

All help is much appreciated.

Thanks

  • What's your question / problem? – martineau May 27 '15 at 00:43
  • 1
    Can you provide us with a sample of the data that clearly reproduces the problem? Ideally just two or three lines, demonstrating both "good" lines and lines that cause the problems you're seeing. – larsks May 27 '15 at 03:28
  • @martineau The question is that I can't seem to get Python to read the CSV correctly, because some of the values have double quotes and commas in them, which causes the parser to move on to the next field. – user2049448 May 27 '15 at 14:39
  • @larsks I can see if I can get some of the data, but I would have to scrub it pretty thoroughly. Maybe I can find an example somewhere. – user2049448 May 27 '15 at 14:40
  • For redacted information in sample data, please use something other than `{whatever}` because `{}` brackets have special meaning in Python. Just put in some nonsensical but normally formatted values. – martineau May 27 '15 at 14:45
  • Unfortunately I can't do this. I could send a generic ACME CDR, but ours are quite a bit different. I have more or less found a way of doing this: there are about 10 fields scattered through the CDR that are always the same. So I parse the CDR file using the csv module and check those known fields. Then I check the data in the fields that are usually broken and merge the cells together. This has worked for about 150,000 records so far and hasn't broken yet. – user2049448 Jun 02 '15 at 21:08
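The workaround described in the last comment could look roughly like the sketch below. It is not the asker's actual code. It assumes a hypothetical four-column layout, a single column that tends to break (SUSPECT), and one made-up anchor value (nas-01) that is always the same. The names repair, SUSPECT and ANCHORS are invented for illustration, and the sample values are placeholders.

```python
import csv
import io

# Hypothetical layout: four columns, with the third one prone to being split.
COL_NAMES = ["Record-Type", "Calling-Station-ID", "Called-Station-ID", "NAS-Identifier"]
SUSPECT = 2              # index of the column that tends to break (assumption)
ANCHORS = {3: "nas-01"}  # column index -> value that is always the same (made up)

def repair(row):
    """Merge surplus cells back into the suspect column, then check the anchors."""
    surplus = len(row) - len(COL_NAMES)
    if surplus > 0:
        # Re-join the pieces with the comma that the reader split on.
        merged = ",".join(row[SUSPECT:SUSPECT + surplus + 1])
        row = row[:SUSPECT] + [merged] + row[SUSPECT + surplus + 1:]
    # If the known fields are not where they should be, the repair did not work.
    if len(row) != len(COL_NAMES) or any(row[i] != v for i, v in ANCHORS.items()):
        raise ValueError(f"could not repair row: {row!r}")
    return dict(zip(COL_NAMES, row))

sample = 'stop,"5550100 sip:5550100@192.0.2.1", ""Bob,Smith"sip:5550101@192.0.2.2", "nas-01"\n'
for row in csv.reader(io.StringIO(sample), skipinitialspace=True):
    record = repair(row)
    print(record["Called-Station-ID"])
```

The merged value still carries the stray trailing quote and has lost its leading quotes, so some extra cleanup would be needed. With around 10 anchors and several columns that can break, the surplus would have to be attributed to the right column by comparing anchor positions, which this sketch does not attempt.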

0 Answers