working with dataset containing Persian records

Question

I am working on a dataset containing Persian records. I installed Persian and unicodecsv, but I still get this error.

df = pd.read_csv('datasets\NSIA.Individuals.csv')
df.head()

the error I get is:

SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 8-9: malformed \N character escape

David · Accepted Answer · 2019-12-13T07:25:44.707

As the error shows, the problem is not the file content but the path itself. I'm guessing you're using Windows: in a normal Python string literal the backslash starts an escape sequence, and "\N" in particular begins a named Unicode escape of the form \N{name}, so "\NSIA" is malformed. You can read more about escape characters in the following link.
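The diagnosis can be checked without any CSV file, because the error is raised when Python parses the string literal, before pandas is ever called. The following is a minimal sketch: the names `bad_source` and `message` are illustrative, and the literal's source text is kept in a raw string so that the example itself parses.

```python
# Source text of the failing literal, kept in a raw string so this file parses.
bad_source = r"'datasets\NSIA.Individuals.csv'"

try:
    # Compiling the literal triggers the same parse-time error as in the question.
    compile(bad_source, "<example>", "eval")
    message = None
except SyntaxError as exc:
    message = str(exc)

print(message)

# For comparison, a well-formed named escape: \N{...} must name a Unicode character.
assert "\N{LATIN SMALL LETTER A}" == "a"
```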

Escaping the backslash in the path solves the problem:

df = pd.read_csv('datasets\\NSIA.Individuals.csv')

Another approach is to use a raw string, in which backslashes are kept literally:

df = pd.read_csv(r'datasets\NSIA.Individuals.csv')
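As a quick check, the two spellings above produce the same string, and a forward-slash spelling is a third option that needs no escaping. This is a small sketch using only the standard library; the variable names are illustrative, and `PureWindowsPath` is used only to compare the paths under Windows rules without touching the file system.

```python
from pathlib import PureWindowsPath

escaped = 'datasets\\NSIA.Individuals.csv'  # backslash escaped by doubling it
raw = r'datasets\NSIA.Individuals.csv'      # raw string: backslash kept literally

# Both literals denote the same characters.
assert escaped == raw

# Forward slashes contain no escape sequences; Windows path parsing
# treats them as the same separator.
forward = 'datasets/NSIA.Individuals.csv'
assert PureWindowsPath(forward) == PureWindowsPath(raw)
```

Any of the three strings can be passed to `pd.read_csv` in place of the original literal.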

There might still be problems with the file content afterward, though.
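One likely content problem with Persian text is an encoding mismatch. The sketch below assumes the data is UTF-8, which the question does not state, and uses a made-up two-row sample written to a temporary file rather than the real dataset. It uses only the standard library; `pd.read_csv` accepts an `encoding` argument that plays the same role as the one passed to `open` here.

```python
import csv
import os
import tempfile

# Hypothetical sample rows; the real file's columns are not known.
rows = [["name", "city"], ["علی", "تهران"]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")

    # Write the sample as UTF-8 (an assumption about the real file).
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(rows)

    # Reading with the matching encoding round-trips the Persian text.
    with open(path, encoding="utf-8", newline="") as f:
        decoded = list(csv.reader(f))

    # Reading with a mismatched codec does not fail here, but yields garbled text.
    with open(path, encoding="latin-1", newline="") as f:
        garbled = list(csv.reader(f))

assert decoded == rows
assert garbled != rows
```

If the real file uses a different encoding, the same idea applies with that encoding's name.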

edited Dec 13 '19 at 07:25

answered Dec 12 '19 at 06:18

David


What are the `'''` for? – AMC Dec 12 '19 at 07:17
@AlexanderCécile If, for instance, the string runs past one line (appearance-wise), then `''''''` is a way to break it across lines while it stays valid and readable – David Dec 12 '19 at 07:56
I'm confused, I don't expect people to use `'''` everywhere **just in case** there happens to be a longer string **at some point in the future**. If it's a single line, `'` is fine. – AMC Dec 12 '19 at 23:18
@AlexanderCécile Changed it, but it shouldn't confuse you; everyone has their own conventions – David Dec 13 '19 at 07:26
