Task:

I need to compare the strings in the first column of sha1_vsdt.csv with the strings in trendx.log. When a string matches, the description from the log file should go into the third column of the CSV; otherwise that column should contain "undetected".

The problem is that trendx.log can't be read. As a workaround, I copied the contents of trendx.log, pasted them into Notepad and saved the result, and that copy is readable. Here is the readable log file: trend2.log. I think the Unicode format is the problem.

How can I read the original log file? Is there any way to convert it? I already tried decoding it as utf-16le, but it only prints 3 lines.

Here is my code:

import numpy as np
import pandas as pd
import csv
import io
import shutil


pd.set_option('display.max_rows', 1000)
logtext = "trendx.log"

#Log data into dataframe using genfromtxt
logdata = np.genfromtxt(logtext,invalid_raise = False,dtype=str, comments=None,usecols=np.arange(16))
logframe = pd.DataFrame(logdata)
#print (logframe.head())

#Dataframe trimmed to use only the SHA-1 and DESC columns
df2=(logframe[[10,11]]).rename(columns={10:'SHA-1', 11: 'DESC'})
#print (df2.head())

#sha1_vsdt data into dataframe using read_csv
df1=pd.read_csv("sha1_vsdt.csv",delimiter=",",error_bad_lines=False,engine = 'python',quoting=3)
#Using merge to compare the two CSV

df = pd.merge(df1, df2, on='SHA-1', how='left').fillna('undetected')
df1['DESC'] = df['DESC'].values

df1.to_csv("sha1_vsdt.csv",index=False)

Output in the CSV using trendx.log: every row from 1 to 584 is undetected.

[screenshot of the CSV output]

Correct output in the CSV using trend2.log:

[screenshot of the CSV output]

  • Possible duplicate of [Python - Decode UTF-16 file with BOM](https://stackoverflow.com/questions/22459020/python-decode-utf-16-file-with-bom) – Harvey Oct 08 '18 at 01:31
  • The `file` command helps. `file trendx.log` => `Little-endian UTF-16 Unicode text, with very long lines, with CRLF line terminators` – Harvey Oct 08 '18 at 01:34
  • Can you explain the code further, so I can accept it? – Oct 08 '18 at 01:38

1 Answer

This file is encoded as UTF-16-LE. Pass the `encoding` argument to `np.genfromtxt` when you read the file, like this:

logdata = np.genfromtxt(logtext, invalid_raise=False, dtype=str, comments=None, usecols=np.arange(16), encoding='utf-16-le')
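If you would rather convert the log once instead of passing an encoding everywhere, a rough sketch along these lines may help. It assumes the file starts with a byte-order mark (BOM), which is what the Notepad workaround and the `file` output suggest but which has not been verified here. The helper names `detect_bom_encoding` and `convert_to_utf8` are made up for this example and are not part of NumPy or pandas.

```python
import codecs
import shutil


def detect_bom_encoding(path, default="utf-8"):
    """Guess a codec name from the file's byte-order mark, if it has one."""
    with open(path, "rb") as fh:
        head = fh.read(4)
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Check UTF-32 before UTF-16: the UTF-32-LE BOM begins with the UTF-16-LE BOM.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # No BOM found: fall back to the caller's default (an assumption, not a detection).
    return default


def convert_to_utf8(src, dst):
    """Re-encode src as UTF-8 in dst and return the codec used to read src."""
    encoding = detect_bom_encoding(src)
    # newline="" keeps the original line terminators (e.g. CRLF) untouched.
    with open(src, "r", encoding=encoding, newline="") as fin:
        with open(dst, "w", encoding="utf-8", newline="") as fout:
            shutil.copyfileobj(fin, fout)
    return encoding
```

For example, `convert_to_utf8("trendx.log", "trendx_utf8.log")` should produce a file that the original `np.genfromtxt` call can read without an `encoding` argument. The generic `utf-16` codec is returned rather than `utf-16-le` because it consumes the BOM while decoding, so the marker does not end up glued to the first field.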

phihag