UnicodeEncodeError: 'ascii' codec can't encode character error using writerow and map

Question

Using Python 2.7 on Ubuntu 14.04, I am trying to write to a CSV file:

csv_w.writerow( map( lambda x: flatdata.get( x, "" ), columns ))

this gives me the notorious

UnicodeEncodeError: 'ascii' codec can't encode character u'\u265b' in position 19: ordinal not in range(128)

error.

The usual advice on here is to use unicode(x).encode("utf-8"). I have tried this, and also plain .encode("utf-8"), for both parameters of the get:

csv_w.writerow( map( lambda x: flatdata.get( unicode(x).encode("utf-8"), unicode("").encode("utf-8") ), columns ))

but I still get the same error.

Any help is much appreciated in getting rid of the error. (I imagine the unicode("").encode("utf-8") is clumsy but I'm still a newb).

EDIT: My full program is:

#!/usr/bin/env python
import json
import csv
import fileinput
import sys
import glob
import os
def flattenjson( b, delim ):
    val = {}
    for i in b.keys():
        if isinstance( b[i], dict ):
            get = flattenjson( b[i], delim )
            for j in get.keys():
                val[ i + delim + j ] = get[j]
        else:
            val[i] = b[i]
    return val
def createcolumnheadings(cols):
    #create column headings
    print ('a', cols)
    columns = cols.keys()
    columns = list( set( columns ) )
    print('b', columns)
    return columns
doOnce=True
out_file= open( 'Excel.csv', 'wb' )
csv_w = csv.writer( out_file, delimiter="\t"  )
print sys.argv, os.getcwd()
os.chdir(sys.argv[1])
for line in fileinput.input(glob.glob("*.txt")):
    print('filename:', fileinput.filename(),'line  #:',fileinput.filelineno(),'line:', line)
    data = json.loads(line)
    flatdata = flattenjson(data, "__")
    if doOnce:
        columns=createcolumnheadings(flatdata)     
        print('c', columns)
        csv_w.writerow(columns)                
        doOnce=False
    csv_w.writerow( map( lambda x: flatdata.get( unicode(x).encode("utf-8"), unicode("").encode("utf-8") ), columns ))

A single redacted tweet that throws the error UnicodeEncodeError: 'ascii' codec can't encode character u'\u2022' in position 14: ordinal not in range(128) is available here.

SOLUTION: As per Alastair's advice, I installed unicodecsv. The steps were:

Download the zip from here.

Install it: sudo pip install /path/to/zipfile/python-unicodecsv-master.zip

Use it in place of the standard csv module:

import unicodecsv as csv
csv_w = csv.writer(f, encoding='utf-8')
csv_w.writerow(flatdata.get(x, u'') for x in columns)

Can you show a complete example, with sample data, so that I can reproduce the problem on my machine and help? – Will, Jul 10 '16 at 21:38
Thanks!! I've added the program. The sample data is racist tweets! These are 1. racist and 2. have identifying information. Could I email them? – schoon, Jul 11 '16 at 09:55
I've put a single redacted tweet on dropcanvas; the link is at the end of the question. Thanks again!! – schoon, Jul 11 '16 at 11:03
You have two exceptions which are complaining about different things. Please clarify and provide the full stack trace from the exception so we can see which line is at fault. If the input data is offensive, you can easily recreate it without the offensive parts and paste it into your question. – Alastair McCormack, Jul 15 '16 at 09:15

score 1 · Accepted Answer · edited May 23 '17 at 12:22

Without seeing your data, it would seem that it contains Unicode data types. (See How to fix: "UnicodeDecodeError: 'ascii' codec can't decode byte" for a brief explanation of Unicode vs. str types.)

Your solution of encoding it is therefore error prone: any str containing non-ASCII bytes will raise a UnicodeDecodeError when you call unicode() on it, because unicode() decodes with the ascii codec by default (see the previous link for an explanation).
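
To make the two failure modes concrete, here is a minimal sketch. In Python 2 the conversions happen implicitly; the explicit .encode('ascii') and .decode('ascii') calls below stand in for them (an assumption made so the snippet also runs on Python 3). The sample character is the u'\u265b' from the question's traceback.

```python
# Stand-ins for Python 2's implicit ASCII conversions.
text = u'queen \u265b'        # a unicode string, like the values json.loads returns
raw = text.encode('utf-8')    # the same text as UTF-8 bytes (a Python 2 str)

# 1. Roughly what happens when Python 2's csv.writer is handed a unicode value.
try:
    text.encode('ascii')
    encode_error = None
except UnicodeEncodeError as exc:
    encode_error = exc

# 2. Roughly what unicode(raw) does in Python 2 with a non-ASCII str.
try:
    raw.decode('ascii')
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

print(type(encode_error).__name__)
print(type(decode_error).__name__)

# Decoding with the codec the bytes were actually written in round-trips cleanly.
round_tripped = raw.decode('utf-8')
```

The first failure is the one in the question; the second is what the unicode(x).encode("utf-8") pattern risks as soon as x is already an encoded byte string.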

You should get all your data into Unicode types before writing to CSV. As Python 2.7's csv module is broken, you will need to use the drop-in replacement: https://github.com/jdunck/python-unicodecsv.

You may also wish to break out your map into a separate statement to avoid confusion. Make sure to provide the full stack trace and examples of your code.
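
As a rough illustration of breaking the map out, the sketch below builds the row in an explicit loop. The flatdata and columns values are made up for the example; in the question they come from flattenjson and createcolumnheadings.

```python
# Hypothetical sample data standing in for one flattened tweet.
flatdata = {u'user__name': u'chess \u265b fan', u'text': u'hello'}
columns = [u'text', u'user__name', u'lang']   # 'lang' is absent from flatdata

# Equivalent of: map(lambda x: flatdata.get(x, ""), columns)
row = []
for column in columns:
    value = flatdata.get(column, u'')   # empty unicode string when the key is missing
    row.append(value)

print(row)
```

With the row held in its own variable, a traceback points at the statement that actually fails, and the row can be inspected before it is passed to csv_w.writerow(row).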

answered Jul 11 '16 at 08:53

Alastair McCormack

Thanks. What do you mean 'Python 2.7's CSV module is broken'? I only use the map because I cut and pasted it. Any chance of guidance on how to expand it? – schoon Jul 15 '16 at 05:42
It does not support Python 2.x Unicode strings, meaning that you have to manually encode your data. The drop-in replacement handles encoding for you, so you can just use Unicode strings and not worry about encoding in the middle of your code. – Alastair McCormack Jul 15 '16 at 09:17

bobince · Answer 2 · 2016-07-17T23:17:29.213

csv_w.writerow( map( lambda x: flatdata.get( unicode(x).encode("utf-8"), unicode("").encode("utf-8") ), columns ))

You've encoded the parameters passed to flatdata.get(), i.e. the dict key and the default. But the Unicode characters aren't in the key, they're in the value. You should encode the value returned by get():

csv_w.writerow([flatdata.get(x, u'').encode('utf-8') for x in columns])
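
A small worked instance of that line, using invented sample data rather than the question's input, shows what the cells look like after encoding. Only the encoding step is shown; the resulting byte strings are what Python 2's csv.writer expects.

```python
# Hypothetical flattened record; u'\u2022' is the bullet from the second traceback.
flatdata = {u'text': u'bullet \u2022 point'}
columns = [u'text', u'missing']

# Encode each value returned by get(), not the key passed to it.
row = [flatdata.get(column, u'').encode('utf-8') for column in columns]

# Every cell is now a UTF-8 byte string; the missing key yields an empty one.
print(row)
```

This assumes every value in flatdata is a string; any other type has no .encode method and needs converting first.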

edited Jul 17 '16 at 23:17

answered Jul 11 '16 at 23:12

bobince

Thanks. This gives me the new error '_csv.Error: sequence expected'. Any idea why? – schoon Jul 15 '16 at 05:41
Added `[]` to argument, try that (it would seem `csv.writer` doesn't support iterators?) – bobince Jul 17 '16 at 23:18
Thanks. Now I get AttributeError: 'list' object has no attribute 'encode'. – schoon Jul 21 '16 at 11:48
One of the items in your JSON input is a list, not a string. (`flattenjson` only flattens dicts, not lists.) – bobince Jul 23 '16 at 21:09
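
One possible way to cope with non-string values is to convert every cell through a small helper before writing. The sketch below is an assumption, not something proposed in the thread: to_cell is a hypothetical name, and serialising lists and numbers as JSON text is just one reasonable choice. The sample record is invented.

```python
import json

text_type = type(u'')   # unicode on Python 2, str on Python 3

def to_cell(value):
    """Return value as a UTF-8 byte string (hypothetical helper)."""
    if isinstance(value, bytes):
        return value                    # already encoded, leave untouched
    if not isinstance(value, text_type):
        # Lists, numbers, booleans: serialise as ASCII-only JSON text.
        # Non-ASCII characters inside them become \uXXXX escapes.
        value = json.dumps(value)
    if isinstance(value, text_type):
        value = value.encode('utf-8')
    return value

# Invented flattened record containing a list and a number.
flatdata = {
    u'text': u'\u265b wins',
    u'entities__hashtags': [u'chess', u'queen'],
    u'retweet_count': 3,
}
columns = [u'text', u'entities__hashtags', u'retweet_count', u'lang']

row = [to_cell(flatdata.get(column, u'')) for column in columns]
print(row)
```

An alternative would be to extend flattenjson so that lists are flattened into indexed keys, at the cost of a column set that varies with list length.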
