For my project, I need to read a file, match its contents against my constants, and store the matching values in a dictionary. Below is a sample of my data and what I have so far.
My data:
TIMESTAMP: 1579051725 20100114-202845
.1.2.3.4.5.6.7.8.9 = 234567890
ifTb: name-nam-na
.1.3.4.1.2.1.1.1.1.1.1.128 = STRING: AA1
.1.3.4.1.2.1.1.1.1.1.1.129 = STRING: Eth1
.1.3.4.1.2.1.1.1.1.1.1.130 = STRING: Eth2
This data has 5 important parts I want to gather:
Date (the number right after TIMESTAMP): 1579051725
Num (the first part of the dotted number, up to 128, 129, 130, etc.): .1.3.4.1.2.1.1.1.1.1.1
Num2 (the second part): 128, 129, 130, or others in my larger data set
Syntax: in this case it is named STRING
Counter: in this case they are strings: AA1, Eth1, or Eth2
I also have (need to have) a constant Num stored as a dictionary within the program, holding the value above, along with a constant Syntax.
I want to read through the data file and do the following:
If Num matches the constant I have within the program, grab Num2.
Check whether Syntax matches the constant syntax within the program.
If it does, grab Counter.
When I say grab, I mean put that data under the corresponding dictionary.
In short, I want to read through the data file, split out the 5 variables in it, match 2 of them against constant dictionary values, and store the other 3 (including the time) in a dictionary.
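The steps above could be sketched roughly as follows. This is only an illustration under assumptions: the names `CONST_NUM`, `CONST_SYNTAX`, `TIMESTAMP_RE`, `ENTRY_RE`, and `collect` are hypothetical, and the patterns are fitted to the sample data shown earlier, not to the larger data set.

```python
import re

# Assumed constants, copied from the sample data in the question.
CONST_NUM = ".1.3.4.1.2.1.1.1.1.1.1"
CONST_SYNTAX = "STRING"

# Hypothetical patterns: one for the timestamp line, one for data lines.
TIMESTAMP_RE = re.compile(r"^TIMESTAMP:\s+(\d+)")
ENTRY_RE = re.compile(r"^((?:\.\d+)+)\.(\d+)\s+=\s+(\w+):\s+(\S+)")


def collect(lines):
    """Return a dict of lists holding only the rows that match the constants."""
    result = {"TimeStamp": [], "IndexNum": [], "Counter": []}
    current_time = None
    for line in lines:
        ts = TIMESTAMP_RE.match(line)
        if ts:
            # Remember the most recent timestamp for the rows that follow.
            current_time = ts.group(1)
            continue
        entry = ENTRY_RE.match(line)
        if not entry:
            continue  # lines such as "ifTb: name-nam-na" are skipped
        num, num2, syntax, counter = entry.groups()
        if num == CONST_NUM and syntax == CONST_SYNTAX:
            result["TimeStamp"].append(current_time)
            result["IndexNum"].append(num2)
            result["Counter"].append(counter)
    return result


sample = """TIMESTAMP: 1579051725 20100114-202845
.1.2.3.4.5.6.7.8.9 = 234567890
ifTb: name-nam-na
.1.3.4.1.2.1.1.1.1.1.1.128 = STRING: AA1
.1.3.4.1.2.1.1.1.1.1.1.129 = STRING: Eth1
.1.3.4.1.2.1.1.1.1.1.1.130 = STRING: Eth2
""".splitlines()

stored = collect(sample)
print(stored)
```

Comparing the captured Num and Syntax with plain `==` keeps the constants out of the regular expression, so they can be changed without touching the pattern.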
I have trouble with splitting the data as of right now. I can split everything except Num and Num2. Also, I am not sure how to create the constant dictionaries or how I should store data under them.
I would love to use a regular expression instead of if statements, but I could not figure out which symbols to use, since the data includes many dots within the words.
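For reference, a dot is a metacharacter in a regular expression, so a literal dot is written as `\.`, and `re.escape` can escape a whole constant at once. The sketch below assumes the line layout of the sample data; `LINE_RE`, `const_re`, and the group names are hypothetical.

```python
import re

# (?:\.\d+)+ matches one or more ".<digits>" components. Because the pattern
# then requires one more \.(\d+), the last component ends up in its own group.
LINE_RE = re.compile(
    r"""
    (?P<num>(?:\.\d+)+)   # Num: all dotted components except the last
    \.(?P<num2>\d+)       # Num2: the last dotted component
    \s=\s
    (?P<syntax>\w+):\s    # Syntax, such as STRING
    (?P<counter>\w+)      # Counter, such as AA1
    """,
    re.VERBOSE,
)

match = LINE_RE.match(".1.3.4.1.2.1.1.1.1.1.1.128 = STRING: AA1")
parts = match.groupdict() if match else {}
print(parts)

# Alternative: build a pattern that only matches one known constant.
const_num = ".1.3.4.1.2.1.1.1.1.1.1"
const_re = re.compile(re.escape(const_num) + r"\.(\d+)\s=\sSTRING:\s(\w+)")
const_match = const_re.match(".1.3.4.1.2.1.1.1.1.1.1.129 = STRING: Eth1")
```

The first form splits every data line and leaves the comparison to ordinary Python code; the second bakes the constant into the pattern, which is shorter but needs one pattern per constant.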
I have the following so far:
constant_dic1 = {[".1.3.4.1.2.1.1.1.1.1.1"]["STRING" ]}
data_cols = {'InterfaceNum':[],"IndexNum":[],"SyntaxName":[],"Counter":[],"TimeStamp":[]}
fileN = args.File_Name
with open (fileN, 'r') as f:
    for lines in f:
        if lines.startswith('.'):
            if ': ' in lines:
                lines=lines.split("=")
                first_part = lines[0].split()
                second_part = lines[1].split()
                for i in first_part:
                    f_f = i.split("{}.{}.{}.{}.{}.{}.{}.{}.{}.{}.{}.")
                    print (f_f[0])
Once I run the program, I receive the error "TypeError: list indices must be integers or slices, not str".
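This error is consistent with the `constant_dic1` line: `[".1.3.4.1.2.1.1.1.1.1.1"]` is a list literal, and the `["STRING"]` that follows it is read as an index into that list, not as a second value. A minimal sketch of one possible layout, assuming each constant Num should map to the Syntax expected for it (`constants` and `is_wanted` are hypothetical names):

```python
# Reproduce the error in isolation: indexing a list with a string.
oid_list = [".1.3.4.1.2.1.1.1.1.1.1"]
try:
    oid_list["STRING"]
except TypeError as exc:
    message = str(exc)

# Assumed layout: constant Num as the key, expected Syntax as the value.
constants = {".1.3.4.1.2.1.1.1.1.1.1": "STRING"}


def is_wanted(num, syntax, table=constants):
    """Return True when num is a known constant and its syntax matches."""
    return table.get(num) == syntax


print(message)
print(is_wanted(".1.3.4.1.2.1.1.1.1.1.1", "STRING"))
```

With this layout, adding another Num to watch for is a matter of adding one more key to the dictionary.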
When I comment out the dictionary part, the output is Num together with Num2. The value does not get split, so the program does not print just the Num part.
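The value stays whole because `str.split` treats its argument as a literal separator and not as a template, and the text `{}.{}.` never occurs in the data. One way to separate the last component without a regular expression is `str.rpartition`, sketched below on a single sample line (`split_oid` is a hypothetical helper):

```python
def split_oid(oid):
    """Split '.1.3.4.128' into ('.1.3.4', '128') at the last dot."""
    num, _, num2 = oid.rpartition(".")
    return num, num2


line = ".1.3.4.1.2.1.1.1.1.1.1.128 = STRING: AA1"

# partition() splits once at the first occurrence of the separator.
left, _, right = line.partition(" = ")
num, num2 = split_oid(left.strip())
syntax, _, counter = right.strip().partition(": ")

print(num, num2, syntax, counter)
```

This assumes every data line has the form `<oid> = <syntax>: <value>`; a line without `": "` would leave `counter` empty.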
Any help is appreciated! If there's any other source, please let me know below. Please let me know if I need any updates on the question without down voting. Thanks!
UPDATED CODE
import pandas as pd
import io
import matplotlib
matplotlib.use('TkAgg') # backend option for matplotlib #TkAgg #Qt4Agg #Qt5Agg
import matplotlib.pyplot as plt
import re # regular expression
import argparse # for optional arguments
parser = argparse.ArgumentParser()
parser.add_argument('File_Name', help="Enter the file name | At least one file is required to graph")
args=parser.parse_args()
data_cols = {'InterfaceNum':[],"IndexNum":[],"SyntaxName":[],"Counter":[],"TimeStamp":[]}
fileN = args.File_Name
input_data = fileN
expr = r"""
    TIMESTAMP:\s(\d+)    # date    - TimeStamp
    |                    # ** OR **
    ((?:\.\d+)+)         # num     - InterfaceNum
    \.(\d+)\s=\s         # num2    - IndexNum
    (\w+):\s             # syntax  - SyntaxName
    (\w+)                # counter - Counter
"""
expr = re.compile(expr, re.VERBOSE)
data = {}
keys = ['TimeStamp', 'InterfaceNum', 'IndexNum', 'SyntaxName', 'Counter']
with io.StringIO(input_data) as data_file:
    for line in data_file:
        try:
            find_data = expr.findall(line)[0]
            vals = [date, num, num2, syntax, counter] = list(find_data)
            if date:
                cur_date = date
                data[cur_date] = {k: [] for k in keys}
            elif num:
                vals[0] = cur_date
                for k, v in zip(keys, vals):
                    data[cur_date][k].append(v)
        except IndexError:
            # expr.findall(...)[0] indexes an empty list when there's no
            # match.
            pass
data_frames = [pd.DataFrame.from_dict(v) for v in data.values()]
print(data_frames[0])
ERROR I GET
Traceback (most recent call last):
File "v1.py", line 47, in <module>
print(data_frames[0])
IndexError: list index out of range
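A likely cause is that `io.StringIO(input_data)` wraps the file name string itself, not the contents of the file, so no line ever matches, `data` stays empty, and `data_frames[0]` has nothing to index. The sketch below shows the difference on a temporary file created only for this illustration:

```python
import io
import os
import tempfile

# Hypothetical sample file, written only so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("TIMESTAMP: 1579051725 20100114-202845\n")
    tmp.write(".1.3.4.1.2.1.1.1.1.1.1.128 = STRING: AA1\n")
    path = tmp.name

# StringIO treats its argument as the text itself, so iterating yields
# the path string, not the lines stored in the file.
with io.StringIO(path) as fake_file:
    from_stringio = list(fake_file)

# open() reads the contents of the file that the path points to.
with open(path, "r") as real_file:
    from_open = list(real_file)

os.remove(path)

print(from_stringio)
print(from_open)
```

Checking that `data_frames` is non-empty before indexing it would also turn the traceback into a clearer "no matches found" situation.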
NEW DATA
TIMESTAMP: 1579051725 20100114-202845
.1.2.3.4.5.6.7.8.9 = 234567890
ifTb: name-nam-na
.1.3.4.1.2.1.1.1.1.1.1.128 = STRING: AA1
.1.3.4.1.2.1.1.1.1.1.1.129 = STRING: Eth1
.1.3.4.1.2.1.1.1.1.1.1.130 = STRING: Eth2
.1.2.3.4.5.6.7.8.9.10.11.131 = INT32: A
UPDATED CODE (v2)
import pandas as pd
import io
import matplotlib
import re # regular expression
file = r"/home/rusif.eyvazli/Python_Projects/network-switch-packet-loss/s_data.txt"
def get_dev_data(file_path, timestamp=None, iface_num=None, idx_num=None,
                 syntax=None, counter=None):
    timestamp = timestamp or r'\d+'
    iface_num = iface_num or r'(?:\.\d+)+'
    idx_num = idx_num or r'\d+'
    syntax = syntax or r'\w+'
    counter = counter or r'\w+'
    # expr = r"""
    #     TIMESTAMP:\s({timestamp})   # date    - TimeStamp
    #     |                           # ** OR **
    #     ({iface_num})               # num     - InterfaceNum
    #     \.({idx_num})\s=\s          # num2    - IndexNum
    #     ({syntax}):\s               # syntax  - SyntaxName
    #     ({counter})                 # counter - Counter
    # """
    expr = r"TIMESTAMP:\s(\d+)|((?:\.\d+)+)\.(\d+)\s=\s(\w+):\s(\w+)"
    # expr = re.compile(expr, re.VERBOSE)
    expr = re.compile(expr)
    rows = []
    keys = ['TimeStamp', 'InterfaceNum', 'IndexNum', 'SyntaxName', 'Counter']
    cols = {k: [] for k in keys}
    with open(file_path, 'r') as data_file:
        for line in data_file:
            try:
                find_data = expr.findall(line)[0]
                vals = [tstamp, num, num2, sntx, ctr] = list(find_data)
                if tstamp:
                    cur_tstamp = tstamp
                elif num:
                    vals[0] = cur_tstamp
                    rows.append(vals)
                    for k, v in zip(keys, vals):
                        cols[k].append(v)
            except IndexError:
                # expr.findall(line)[0] indexes an empty list when no match.
                pass
    return rows, cols
const_num = '.1.3.4.1.2.1.1.1.1.1.1'
const_syntax = 'STRING'
result_5 = get_dev_data(file)
# Use the results of the first dict retrieved to initialize the master
# dictionary.
master_dict = result_5[1]
df = pd.DataFrame.from_dict(master_dict)
df = df.loc[(df['InterfaceNum'] == '.1.2.3.4.5.6.7.8.9.10.11') & (df['SyntaxName'] == 'INT32' )]
print(f"\n{df}")
OUTPUT
    TimeStamp              InterfaceNum  IndexNum  SyntaxName  Counter
3  1579051725  .1.2.3.4.5.6.7.8.9.10.11       131       INT32        A