I am trying to parse CDRs coming from an Acme Packet SBC. The data is CSV for the most part, but I keep running into odd cases that break the parsing. It doesn't happen for every CDR, only for some of them. I initially tried to do this with Python's csv.DictReader, like this.
CDR with redacted information:
...stop,"{phone_number} sip:{phone_number}@{ip_address}", ""{first_name},{last_name}"sip:{phone_number}@{ip_address}", "NAS-Identifier"...
There are 300 other columns in each record, but this is one example of where my parsing falls short.
Code snippet:
import csv
import io
import gzip

col_names = ['Record-Type', 'Calling-Station-ID', 'Called-Station-ID', 'NAS-Identifier']

# filename is the path to the gzipped CDR file
with gzip.open(filename, 'r') as f:
    reader = csv.DictReader(io.TextIOWrapper(f, newline='\n'), fieldnames=col_names, skipinitialspace=True)
    try:
        for row in reader:
            print('record-type: ' + row['Record-Type'])
            print('calling-station-id: ' + row['Calling-Station-ID'])
            print('called-station-id: ' + row['Called-Station-ID'])
            print('NAS-Identifier: ' + row['NAS-Identifier'])
    except csv.Error as e:
        print(e)
With the code above, I get roughly the following:
record-type: Stop
calling-station-id: {phone_number} sip:{phone_number}@{ip_address}
called-station-id: {first_name}
NAS-Identifier: {last_name} sip:{phone_number}@{ip_address}
I also tried reading the file with pandas and got roughly the same result: the field is split on the commas inside the quotechar.
I think the extra " characters inside the quoted field are causing problems as well.
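Here is a minimal reproduction of the splitting, assuming a line shaped like the redacted sample above. The line value is a hypothetical stand-in (the real record starts earlier and has many more columns), and it is read from an in-memory string rather than the gzipped file.

```python
import csv
import io

# Hypothetical stand-in for one problematic record; the placeholders mirror
# the redacted sample and the real CDR has many more columns.
line = (
    'Stop,"{phone_number} sip:{phone_number}@{ip_address}", '
    '""{first_name},{last_name}"sip:{phone_number}@{ip_address}", '
    '"NAS-Identifier"\n'
)

# Same reader options as in the snippet above, minus the column names.
rows = list(csv.reader(io.StringIO(line), skipinitialspace=True))
fields = rows[0]

# Print each field with its index to see where the record gets split.
for index, value in enumerate(fields):
    print(index, repr(value))
```

With the default dialect, the reader sees the opening "" as a doubled quote, leaves quoted mode at the next character, and then splits on the comma between the first and last name. The line comes back as five fields instead of four.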
This is also somewhat variable. On most of the records this works fine, and only a few hundred have this issue. But since I can't tell how a record is formatted before it reaches me, I can't parse the problem records any differently from the rest.
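Since the same parser has to handle both well-formed and odd records, one possible direction is a splitter that only treats a quote as closing a field when a comma or the end of the line follows it. This is only a sketch under assumptions: FIELD_RE and split_cdr_line are hypothetical names, each record is assumed to sit on a single line, and doubled quotes are not unescaped.

```python
import re

# A field is either quoted (the closing quote must be followed by a comma or
# the end of the line) or a plain run of non-comma characters.
FIELD_RE = re.compile(
    r'''
    \s*                        # skip whitespace after the previous comma
    (?:
        "(?P<quoted>.*?)"      # quoted field, as short as possible...
        (?=\s*(?:,|$))         # ...ending only where a comma or line end follows
      |
        (?P<plain>[^,]*)       # unquoted field
    )
    \s*(?P<delim>,|$)          # consume the delimiter, or stop at end of line
    ''',
    re.VERBOSE,
)


def split_cdr_line(line):
    """Split one CDR line into fields, keeping stray inner quotes as data."""
    line = line.rstrip('\r\n')
    fields = []
    pos = 0
    while True:
        match = FIELD_RE.match(line, pos)
        if match is None:  # defensive; the plain branch should always match
            raise ValueError('cannot parse line at offset %d' % pos)
        quoted = match.group('quoted')
        fields.append(quoted if quoted is not None else match.group('plain').strip())
        if match.group('delim') != ',':  # reached the end of the line
            break
        pos = match.end()
    return fields


# Hypothetical stand-in for the redacted sample record.
sample = (
    'Stop,"{phone_number} sip:{phone_number}@{ip_address}", '
    '""{first_name},{last_name}"sip:{phone_number}@{ip_address}", '
    '"NAS-Identifier"'
)
col_names = ['Record-Type', 'Calling-Station-ID', 'Called-Station-ID', 'NAS-Identifier']
record = dict(zip(col_names, split_cdr_line(sample)))
print(record['Called-Station-ID'])
```

On the sample this keeps the name and the SIP URI together in one field, inner quotes included. It would still split in the wrong place if an inner quote were ever directly followed by a comma, so it is a heuristic rather than a real CSV parser.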
All help is much appreciated.
Thanks