-2

Using Python 2.7 on Ubuntu 14.04, I am trying to write to a CSV file:

csv_w.writerow( map( lambda x: flatdata.get( x, "" ), columns ))

this gives me the notorious

UnicodeEncodeError: 'ascii' codec can't encode character u'\u265b' in position 19: ordinal not in range(128)

error.

The usual advice on here is to use unicode(x).encode("utf-8"). I have tried this, and also just .encode("utf-8"), for both parameters in the get:

csv_w.writerow( map( lambda x: flatdata.get( unicode(x).encode("utf-8"), unicode("").encode("utf-8") ), columns ))

but I still get the same error.

Any help is much appreciated in getting rid of the error. (I imagine the unicode("").encode("utf-8") is clumsy but I'm still a newb).

EDIT: My full program is:

#!/usr/bin/env python
import json
import csv
import fileinput
import sys
import glob
import os
def flattenjson( b, delim ):
    val = {}
    for i in b.keys():
        if isinstance( b[i], dict ):
            get = flattenjson( b[i], delim )
            for j in get.keys():
                val[ i + delim + j ] = get[j]
        else:
            val[i] = b[i]
    return val
def createcolumnheadings(cols):
    #create column headings
    print ('a', cols)
    columns = cols.keys()
    columns = list( set( columns ) )
    print('b', columns)
    return columns
doOnce=True
out_file= open( 'Excel.csv', 'wb' )
csv_w = csv.writer( out_file, delimiter="\t"  )
print sys.argv, os.getcwd()
os.chdir(sys.argv[1])
for line in fileinput.input(glob.glob("*.txt")):
    print('filename:', fileinput.filename(),'line  #:',fileinput.filelineno(),'line:', line)
    data = json.loads(line)
    flatdata = flattenjson(data, "__")
    if doOnce:
        columns=createcolumnheadings(flatdata)     
        print('c', columns)
        csv_w.writerow(columns)                
        doOnce=False
    csv_w.writerow( map( lambda x: flatdata.get( unicode(x).encode("utf-8"), unicode("").encode("utf-8") ), columns ))

A single redacted tweet that throws the error UnicodeEncodeError: 'ascii' codec can't encode character u'\u2022' in position 14: ordinal not in range(128) is available here.

SOLUTION: As per Alastair's advice, I installed unicodecsv. The steps were:

Download the zip from here.

Install it: sudo pip install /path/to/zipfile/python-unicodecsv-master.zip

import unicodecsv as csv
csv_w = csv.writer(f, encoding='utf-8')
csv_w.writerow(flatdata.get(x, u'') for x in columns)
Alastair McCormack
schoon
  • Can you show a complete example, with sample data, so that I can reproduce the problem on my machine and help? – Will Jul 10 '16 at 21:38
  • Thanks!! I've added the program. the sample data is racist tweets! These are 1. racist and 2. have identifying information. Could I email them? – schoon Jul 11 '16 at 09:55
  • I've put a single redacted tweet on dropcanvas, Link at end of question. Thanks again!! – schoon Jul 11 '16 at 11:03
  • You have two exceptions which are complaining about different things - please clarify and provide the full stack trace from the exception so we can see which line is at fault. If the input data is offensive, you can easily recreate it without the offensive parts and paste it into your question. – Alastair McCormack Jul 15 '16 at 09:15

2 Answers

1

Without seeing your data, it would seem that it contains Unicode data types (see How to fix: "UnicodeDecodeError: 'ascii' codec can't decode byte" for a brief explanation of Unicode vs. str types).

Your approach of encoding it is therefore error-prone: any str containing non-ASCII bytes will throw an error when you call unicode() on it (see the previous link for an explanation).
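To make that failure mode concrete, here is a minimal sketch. In Python 2, calling unicode() on a byte string decodes it with the ASCII codec by default; the explicit .decode('ascii') below stands in for that implicit step so the snippet behaves the same on Python 2 and Python 3. The sample bytes are an assumption (the UTF-8 encoding of U+265B), not the asker's actual data.

```python
# UTF-8 bytes for U+265B, the character named in the traceback (sample data only).
raw = b'\xe2\x99\x9b'

# Python 2's unicode(raw) decodes with the ASCII codec by default.
# The explicit call below stands in for that implicit step.
try:
    raw.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding with the codec the bytes were actually written in succeeds.
text = raw.decode('utf-8')
```

So the fix is not to wrap everything in unicode(), but to decode bytes once with the right codec and keep Unicode values from then on.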

You should get all your data into Unicode types before writing to CSV. As Python 2.7's CSV module is broken, you will need to use the drop-in replacement: https://github.com/jdunck/python-unicodecsv.

You may also wish to break out your map into a separate statement to avoid confusion. Make sure to provide the full stack trace and examples of your code.
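One way the map could be broken out is sketched below. The flatdata and columns values are hypothetical stand-ins for one flattened record and the header row; they are not taken from the question's data. The values stay as Unicode strings, on the assumption that the writer (for example unicodecsv) does the encoding.

```python
# Hypothetical stand-ins for one flattened record and the header row.
flatdata = {u'user__name': u'alice', u'text': u'check \u2022 mate'}
columns = [u'text', u'user__name', u'missing']

row = []
for column in columns:
    # Look up with the unmodified key; fall back to an empty Unicode string.
    value = flatdata.get(column, u'')
    row.append(value)

# row is now a plain list that can be inspected or printed
# before it is handed to csv_w.writerow(row).
```

With the lookup on its own line, a traceback points at the exact step that failed rather than at one long expression.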

Alastair McCormack
  • Thanks. What do you mean 'Python 2.7's CSV module is broken'? I only use the map because I cut and pasted it. Any chance of guidance on how to expand it? – schoon Jul 15 '16 at 05:42
  • It does not support Python 2.x Unicode strings, meaning that you have to manually encode your data - the drop-in replacement handles encoding for you, so you can just use Unicode strings and not worry about encoding in the middle of your code. – Alastair McCormack Jul 15 '16 at 09:17
1
csv_w.writerow( map( lambda x: flatdata.get( unicode(x).encode("utf-8"), unicode("").encode("utf-8") ), columns ))

You've encoded the parameters passed to flatdata.get(), i.e. the dict key. But the Unicode characters aren't in the key, they're in the value. You should encode the value returned by get():

csv_w.writerow([flatdata.get(x, u'').encode('utf-8') for x in columns])
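Since flatdata comes from json.loads, some values may be numbers or None rather than strings, and calling .encode() on those would raise AttributeError. A minimal sketch of a helper that covers this case follows; to_utf8 and the sample record are hypothetical names introduced here for illustration. It targets Python 2's csv module, which expects byte strings; on Python 3 you would instead open the file with an encoding and write str values directly.

```python
text_type = type(u'')  # unicode on Python 2, str on Python 3

def to_utf8(value):
    """Return value as UTF-8 bytes; non-text values are converted to text first."""
    if isinstance(value, bytes):
        return value                  # already a byte string, leave untouched
    if not isinstance(value, text_type):
        value = text_type(value)      # e.g. 3 -> u'3' (None would become u'None')
    return value.encode('utf-8')

# Hypothetical flattened record: one text value, one integer value.
flatdata = {u'text': u'queen \u265b', u'retweets': 3}
columns = [u'text', u'retweets', u'missing']

# Encode the values returned by get(), not the keys passed to it.
row = [to_utf8(flatdata.get(x, u'')) for x in columns]
```

The row can then be passed to csv_w.writerow(row) as in the line above.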
bobince