I have the following string:

>>> line = '\x00\t\x007\x00\t\x00C\x00a\x00r\x00d\x00i\x00o\x00 \x00M\x00e\x00t\x00a\x00b\x00o\x00l\x00i\x00c\x00 \x00C\x00a\x00r\x00e\x00\t\x00\t\x00\t\x00\t\x00 \x001\x002\x00,\x007\x008\x008\x00,\x005\x002\x008\x00.\x000\x004\x00\r\x00\n'

When I type the variable line in the Python terminal, it shows the following:

>>> line
'\x00\t\x007\x00\t\x00C\x00a\x00r\x00d\x00i\x00o\x00 \x00M\x00e\x00t\x00a\x00b\x00o\x00l\x00i\x00c\x00 \x00C\x00a\x00r\x00e\x00\t\x00\t\x00\t\x00\t\x00 \x001\x002\x00,\x007\x008\x008\x00,\x005\x002\x008\x00.\x000\x004\x00\r\x00\n'

When I print it, it shows the following:

>>> print line
        7    Cardio Metabolic Care               12,788,528.04

In the variable line, each word is separated by \t, and I want to save it to a CSV file. So I tried the following code:

import csv
with open('test.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',')
    spamwriter.writerow(line.split('\t'))

When I look at the test.csv file, I get only the following:

,,,,,,

Is there any way to get the words into the CSV file? Kindly help.

Jeril
  • CSV doesn't actually stand for Comma. It stands for Tab as well. So you already have a CSV! – e4c5 May 16 '17 at 05:55
  • actually I am trying to convert a corrupted file to csv file. – Jeril May 16 '17 at 05:59
  • This may help: http://stackoverflow.com/questions/29230943/importing-a-text-file-gives-error – DYZ May 16 '17 at 06:01
  • what does `print(line.split("\t"))` give you? – e4c5 May 16 '17 at 06:02
  • @e4c5 it gives me the following: `['\x00', '\x007\x00', '\x00C\x00a\x00r\x00d\x00i\x00o\x00 \x00M\x00e\x00t\x00a\x00b\x00o\x00l\x00i\x00c\x00 \x00C\x00a\x00r\x00e\x00', '\x00', '\x00', '\x00', '\x00 \x001\x002\x00,\x007\x008\x008\x00,\x005\x002\x008\x00.\x000\x004\x00\r\x00\n']` – Jeril May 16 '17 at 06:05
  • You are reading your file incorrectly. Open it with `codecs.open("source.csv", "r", "utf-16")` or `io.open("source.csv", "r", encoding="utf-16")`. – DYZ May 16 '17 at 06:12
  • @DYZ i referred to your previous comment and its working. I used the following: `import io; file1 = io.open(filename, "r", encoding="utf-16")` its giving me the answer. With `utf-8` its giving me `UnicodeDecodeError`. Thanks a lot. – Jeril May 16 '17 at 06:19

1 Answer

Your input text is not corrupted; it is encoded as UTF-16 (big-endian in this case). It is also already delimited data, just with tab as the delimiter instead of a comma.
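To see where the `\x00` bytes come from, here is a minimal sketch. It assumes Python 3 (the question uses Python 2), and the short sample text is made up for illustration. Encoding ASCII text as big-endian UTF-16 puts a zero byte in front of every character, which is exactly the pattern in the question.

```python
# Assumption: Python 3. The sample is a shortened, illustrative version
# of the line from the question.
text = "\t7\tCardio"

# Each ASCII character becomes two bytes in UTF-16-BE: a zero byte
# followed by the character's own byte.
encoded = text.encode("utf-16-be")
print(repr(encoded))

# Decoding with the same codec recovers the original text.
round_trip = encoded.decode("utf-16-be")
print(repr(round_trip))
```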

You must decode it into a Unicode string first; after that you can work with it normally.

Ideally you declare the proper byte encoding when you read it from a source. For example, when you open a file you can state the encoding the file uses so that the file reader will decode the contents for you.
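As a rough illustration of declaring the encoding at read time, the sketch below assumes Python 3, where the built-in `open` accepts an `encoding` argument. The temporary file and its contents are hypothetical stand-ins for the real input file.

```python
import os
import tempfile

# Hypothetical sample data standing in for one line of the real file.
sample = "\t7\tCardio Metabolic Care\t\t\t\t 12,788,528.04\r\n"

# Write the sample as big-endian UTF-16 bytes, mimicking the source file.
path = os.path.join(tempfile.mkdtemp(), "input.txt")
with open(path, "wb") as f:
    f.write(sample.encode("utf-16-be"))

# Declaring the encoding lets the file object decode the bytes for us.
# newline="" keeps the original line ending untouched.
with open(path, "r", encoding="utf-16-be", newline="") as f:
    line = f.read()

# The decoded text can now be split on tabs as expected.
fields = line.rstrip("\r\n").split("\t")
print(fields)
```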

If you have that byte string from a source where you can't declare an encoding while reading it, you can decode manually:

line = '\x00\t\x007\x00\t\x00C\x00a\x00r\x00d\x00i\x00o\x00 \x00M\x00e\x00t\x00a\x00b\x00o\x00l\x00i\x00c\x00 \x00C\x00a\x00r\x00e\x00\t\x00\t\x00\t\x00\t\x00 \x001\x002\x00,\x007\x008\x008\x00,\x005\x002\x008\x00.\x000\x004\x00\r\x00\n'
decoded = line.decode('utf_16_be')

print decoded
#   7   Cardio Metabolic Care                12,788,528.04
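Once decoded, the fields can be written as a CSV row. The following is a sketch only, assuming Python 3 (so the input is a `bytes` literal rather than a Python 2 `str`); it writes to an in-memory buffer instead of `test.csv` so that the result is easy to inspect.

```python
import csv
import io

# The byte string from the question, written as a Python 3 bytes literal.
raw = (b'\x00\t\x007\x00\t\x00C\x00a\x00r\x00d\x00i\x00o\x00 \x00M\x00e'
       b'\x00t\x00a\x00b\x00o\x00l\x00i\x00c\x00 \x00C\x00a\x00r\x00e'
       b'\x00\t\x00\t\x00\t\x00\t\x00 \x001\x002\x00,\x007\x008\x008'
       b'\x00,\x005\x002\x008\x00.\x000\x004\x00\r\x00\n')

# Decode first, then strip the line ending and split on tabs.
decoded = raw.decode("utf-16-be")
fields = decoded.rstrip("\r\n").split("\t")

# Write one CSV row; the writer quotes the last field because it
# contains commas.
buffer = io.StringIO()
csv.writer(buffer, delimiter=",").writerow(fields)
csv_text = buffer.getvalue()
print(csv_text)
```

Note that the amount keeps its thousands separators, so it is stored as a quoted text field rather than a number.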

But since you are presumably reading it from a file, declare the encodings when you open the input and output files:

import csv
import codecs

# codecs.open decodes UTF-16 on read and encodes UTF-8 on write.
with codecs.open('input.txt', 'r', encoding='utf16') as in_file, \
        codecs.open('output.csv', 'w', encoding='utf8') as out_file:
    # Read tab-separated rows and write them back comma-separated.
    reader = csv.reader(in_file, delimiter='\t')
    writer = csv.writer(out_file, delimiter=',', quotechar='"')

    writer.writerows(reader)
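
For readers on Python 3, the same conversion might look like the sketch below. The function name `convert_tsv_to_csv`, its parameters, and the temporary file names are hypothetical; the built-in `open` replaces `codecs.open`. The plain `utf-16` codec relies on a byte order mark at the start of the file, so if the file has none, passing an explicit byte order such as `utf-16-be` is the safer choice.

```python
import csv
import os
import tempfile


def convert_tsv_to_csv(src_path, dst_path, src_encoding="utf-16"):
    """Read a tab-separated file in src_encoding, write UTF-8 CSV."""
    # newline="" is what the csv module recommends for file objects.
    with open(src_path, "r", encoding=src_encoding, newline="") as in_file, \
            open(dst_path, "w", encoding="utf-8", newline="") as out_file:
        reader = csv.reader(in_file, delimiter="\t")
        writer = csv.writer(out_file, delimiter=",", quotechar='"')
        writer.writerows(reader)


# Demo with a hypothetical one-line input file written as UTF-16 with a BOM.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "input.txt")
dst = os.path.join(workdir, "output.csv")
with open(src, "w", encoding="utf-16", newline="") as f:
    f.write("\t7\tCardio Metabolic Care\t\t\t\t 12,788,528.04\r\n")

convert_tsv_to_csv(src, dst)

with open(dst, "r", encoding="utf-8", newline="") as f:
    result = f.read()
print(result)
```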
Tomalak