Can't read log file but can read after copy paste to notepad

Question

Task:

My task is to compare the strings in the first column of sha1_vsdt.csv with the strings in trendx.log. When a string matches, the script should take the description from the log file and put it in the third column of the CSV; otherwise it should put "undetected".

But trendx.log can't be read. What I did was copy the contents of trendx.log, paste them into Notepad, and save the result, after which the file was readable. Here is the readable log file: trend2.log. I think the Unicode format is the problem.

How can I read this log file? Is there any way to convert it? I already tried decoding it as utf-16le, but it only prints 3 lines.

Here is my code

import numpy as np
import pandas as pd
import csv
import io
import shutil


pd.set_option('display.max_rows', 1000)
logtext = "trendx.log"

#Log data into dataframe using genfromtxt
logdata = np.genfromtxt(logtext,invalid_raise = False,dtype=str, comments=None,usecols=np.arange(16))
logframe = pd.DataFrame(logdata)
#print (logframe.head())

#Dataframe trimmed to use only the SHA-1 and DESC columns
df2=(logframe[[10,11]]).rename(columns={10:'SHA-1', 11: 'DESC'})
#print (df2.head())

#sha1_vsdt data into dataframe using read_csv
df1=pd.read_csv("sha1_vsdt.csv",delimiter=",",error_bad_lines=False,engine = 'python',quoting=3)
#Using merge to compare the two CSV

df = pd.merge(df1, df2, on='SHA-1', how='left').fillna('undetected')
df1['DESC'] = df['DESC'].values

df1.to_csv("sha1_vsdt.csv",index=False)

Output in the CSV using trendx.log: every row from 1 to 584 is undetected

Correct output in csv using: trend2.log

Possible duplicate of [Python - Decode UTF-16 file with BOM](https://stackoverflow.com/questions/22459020/python-decode-utf-16-file-with-bom) — Harvey, Oct 08 '18 at 01:31
The `file` command helps. `file trendx.log` => `Little-endian UTF-16 Unicode text, with very long lines, with CRLF line terminators` — Harvey, Oct 08 '18 at 01:34

score 0 · Accepted Answer · answered Oct 08 '18 at 01:52

This file is encoded as UTF-16-LE. Pass the encoding argument when you read the file, like this:

logdata = np.genfromtxt(logtext, invalid_raise=False, dtype=str, comments=None, usecols=np.arange(16), encoding='utf-16-le')
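
The question also asks whether the file can be converted. A minimal sketch of that route, using only the standard library, might look like the following. The helper names `detect_utf16` and `convert_to_utf8` are made up for illustration, and the code assumes that a file without a byte order mark is UTF-16-LE, as the `file` output in the comments suggests.

```python
import codecs


def detect_utf16(path):
    """Return 'utf-16' if the file starts with a UTF-16 BOM, else None."""
    with open(path, "rb") as fh:
        head = fh.read(2)
    if head in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE):
        # The generic 'utf-16' codec reads the BOM, picks the byte
        # order from it, and drops it from the decoded text.
        return "utf-16"
    return None


def convert_to_utf8(src, dst, fallback="utf-16-le"):
    """Copy src to dst as UTF-8 and return the encoding used to read src.

    Assumption: without a BOM the file is UTF-16-LE (the fallback).
    """
    encoding = detect_utf16(src) or fallback
    # newline="" keeps the original CRLF line endings untouched.
    with open(src, "r", encoding=encoding, newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(line)
    return encoding
```

Calling `convert_to_utf8("trendx.log", "trendx_utf8.log")` (the output name is hypothetical) would produce a copy comparable to the one saved from Notepad, which the original script could then read in place of trendx.log.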

phihag
