
I'm using py2neo to export data from my Neo4j database (Python 2.7 on Mac OS X).

Here's the code I've been using:

import csv
from py2neo import neo4j, cypher, node, rel
import pprint

ofile  = open('mydata.csv', 'wb')
writer = csv.writer(ofile, delimiter='\t', quotechar='|', quoting = csv.QUOTE_ALL)
graph_db = neo4j.GraphDatabaseService("http://xx.xx.xx.xx:7474/db/data/")
qs = '''MATCH (a:MyLabel) 
WHERE NOT a.shortdesc = ""
RETURN a.name, a.shortdesc, a.longdesc 
ORDER BY a.name'''
query = neo4j.CypherQuery(graph_db, qs)
writer.writerows(query.stream())

In the properties a.shortdesc and a.longdesc there are clearly some strange characters, and I can't figure out how to encode them properly. I'm getting this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 148: ordinal not in range(128)

I've been trying all sorts of different things... how can I take the namedtuples and properly encode them so I can write them to a csv file?

Jed Christiansen

1 Answer


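You are trying to write Unicode data containing (among others) a U+201C LEFT DOUBLE QUOTATION MARK codepoint. The Python 2 csv module does not support Unicode, so each unicode value is implicitly encoded as ASCII when it is written, and that fails for any non-ASCII character.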

You'll need to encode your values to UTF-8 or find another way to represent the Unicode values as data.
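As a minimal sketch of what goes wrong, the snippet below reproduces the failure without py2neo or csv. The name `value` and the sample text are made up for illustration; the u'' literals keep it runnable on both Python 2.7 and Python 3.

```python
# Illustrative only: `value` stands in for one a.shortdesc property.
value = u'\u201cquoted\u201d text'

# Encoding with the ASCII codec is what happens implicitly when Python 2's
# csv writer receives a unicode object; it fails on U+201C.
try:
    value.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding explicitly as UTF-8 succeeds and yields a plain byte string.
encoded = value.encode('utf8')
```

The resulting byte string is something the Python 2 csv writer can write without any further conversion.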

Encoding can be done in a generator expression over the rows, with a nested list comprehension that encodes each column:

writer.writerows([unicode(c).encode('utf8') for c in row] for row in query.stream())

The unicode() call ensures that non-unicode values are first converted to unicode strings before attempting to encode.
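To see what that conversion does to non-string columns, here is a small sketch. The helper name `encode_row` and the sample row are hypothetical, and the `text_type` fallback is only there so the snippet also runs on Python 3, where `unicode` does not exist.

```python
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3 has no separate unicode type

def encode_row(row):
    """Return the row with every column converted to UTF-8 bytes."""
    return [text_type(c).encode('utf8') for c in row]

# Hypothetical row mixing text, a number and a missing value.
sample = encode_row([u'caf\xe9', 42, None])
```

Note that a missing value becomes the literal text None; if that is not what you want in the CSV file, map None to an empty string before encoding.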

You can also try to 'simplify' the values; the codepoint you found is a 'fancy' quote and is likely just there because a word processor or desktop spreadsheet application decided to replace regular quotes with those. If all of your data is otherwise just ASCII text or numbers, you can try to replace the 'fancy' stuff with ASCII equivalents.
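If only a handful of such characters occur, one option is a hand-written translation table. This is a rough sketch, not a complete solution: the mapping `ASCII_EQUIVALENTS` and the helper `simplify` are hypothetical names, and the table only covers a few common punctuation codepoints.

```python
# Hypothetical mapping; extend it with whatever codepoints your data contains.
ASCII_EQUIVALENTS = {
    0x2018: u"'",    # LEFT SINGLE QUOTATION MARK
    0x2019: u"'",    # RIGHT SINGLE QUOTATION MARK
    0x201C: u'"',    # LEFT DOUBLE QUOTATION MARK
    0x201D: u'"',    # RIGHT DOUBLE QUOTATION MARK
    0x2013: u'-',    # EN DASH
    0x2026: u'...',  # HORIZONTAL ELLIPSIS
}

def simplify(text):
    """Replace the mapped codepoints in a unicode string with ASCII text."""
    return text.translate(ASCII_EQUIVALENTS)

cleaned = simplify(u'\u201cHello\u201d \u2013 it\u2019s fine\u2026')
```

Any codepoint missing from the table is left untouched, so the result is only guaranteed to be ASCII if the table covers everything in your data.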

The Unidecode package can replace such codepoints with their closest ASCII equivalents:

from unidecode import unidecode

writer.writerows([unidecode(unicode(c)) for c in row] for row in query.stream())
Martijn Pieters