Task:
My task is to compare the strings in the first column of sha1_vsdt.csv with the strings in trendx.log. When a string matches, the description from the log file should go into the third column of the CSV; otherwise that column should say undetected.
But trendx.log can't be read. As a workaround, I copied the contents of trendx.log, pasted them into Notepad, and saved the result; that saved copy is readable.
Here is the readable log file: trend2.log. I think the Unicode format is the problem.
How can I read this log file? Is there any way to convert it? I already tried reading it with the utf-16le encoding, but then it only prints 3 lines.
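To check whether the encoding really is the issue, a minimal sketch along these lines could inspect the byte-order mark (BOM) at the start of the file and decode accordingly. It assumes trendx.log starts with a BOM, which has not been confirmed; `sniff_encoding` and `decode_log` are hypothetical helper names, and the UTF-8 fallback is only a guess.

```python
import codecs

# Byte-order marks paired with a Python codec that consumes them.
# UTF-32 is checked first because its little-endian BOM begins with
# the same two bytes as the UTF-16 little-endian BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(raw, default="utf-8"):
    """Guess the codec from a leading BOM; fall back to `default` (an assumption)."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name
    return default


def decode_log(raw):
    """Decode the raw bytes of a log file into text, dropping the BOM if present."""
    return raw.decode(sniff_encoding(raw))
```

With the file on disk, the lines would come from something like `decode_log(open("trendx.log", "rb").read()).splitlines()`. If no BOM is found and the UTF-8 fallback raises a decode error, the file is in some other encoding and this sketch does not cover it.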
Here is my code
import numpy as np
import pandas as pd
import csv
import io
import shutil
pd.set_option('display.max_rows', 1000)
logtext = "trendx.log"
#Log data into dataframe using genfromtxt
logdata = np.genfromtxt(logtext,invalid_raise = False,dtype=str, comments=None,usecols=np.arange(16))
logframe = pd.DataFrame(logdata)
#print (logframe.head())
#Dataframe trimmed to use only the SHA-1 and DESC columns
df2=(logframe[[10,11]]).rename(columns={10:'SHA-1', 11: 'DESC'})
#print (df2.head())
#sha1_vsdt data into dataframe using read_csv
df1=pd.read_csv("sha1_vsdt.csv",delimiter=",",error_bad_lines=False,engine = 'python',quoting=3)
#Using merge to compare the CSV data with the log data
df = pd.merge(df1, df2, on='SHA-1', how='left').fillna('undetected')
df1['DESC'] = df['DESC'].values
df1.to_csv("sha1_vsdt.csv",index=False)
Output in csv using: trendx.log
Every row, from 1 to 584, is undetected.
Correct output in csv using: trend2.log
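For reference, the matching step itself can be sketched without pandas once the log has been decoded into lines of text. This assumes the log fields are whitespace-separated with the SHA-1 at index 10 and the description at index 11, as in the `usecols` selection above; `build_lookup` and `annotate` are hypothetical names used only for illustration.

```python
def build_lookup(log_lines, sha_col=10, desc_col=11):
    """Map each SHA-1 found in the log to the first description seen for it."""
    lookup = {}
    for line in log_lines:
        fields = line.split()
        # Skip lines that are too short to contain both columns.
        if len(fields) > max(sha_col, desc_col):
            lookup.setdefault(fields[sha_col], fields[desc_col])
    return lookup


def annotate(sha1_values, lookup, missing="undetected"):
    """Return one description per SHA-1, or `missing` when the log has no match."""
    return [lookup.get(sha, missing) for sha in sha1_values]
```

The list returned by `annotate` has exactly one entry per input SHA-1, so it could be assigned to `df1['DESC']` directly. Keeping only the first description per SHA-1 is a design choice of this sketch, not something the log format guarantees.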