I'm using pandas to load CSV files created by Excel, do some analysis, and then save the results back to CSV. I've noticed that the pandas to_csv and from_csv methods don't seem capable of handling special characters such as \r, but they don't raise any errors either.
In [7]: import pandas as pd

In [8]: data = {
    'A': ['one', 'two', 'three'],
    'B': ['four', 'five', 'six']
}

In [9]: df = pd.DataFrame(data)

In [10]: df
Out[10]:
       A     B
0    one  four
1    two  five
2  three   six

In [11]: df.loc[1, 'A'] = 't' + '\r' + 'o'

In [12]: df
Out[12]:
       A     B
0    one  four
1   t\ro  five
2  three   six

In [13]: df.to_csv("my_df.csv")

In [14]: df2 = pd.DataFrame.from_csv("my_df.csv")

In [15]: df2
Out[15]:
       A     B
0    one  four
1      t   NaN
o   five   NaN
2  three   six
Since I'm not specifying any encoding here, I assume it is using ASCII, but even when I specify encoding='utf-8' for both the writing and the reading I get the same result.
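For reference, the encoding variant I tried looks roughly like this (same DataFrame as above, only the encoding argument added):

import pandas as pd

df = pd.DataFrame({'A': ['one', 't\ro', 'three'],
                   'B': ['four', 'five', 'six']})

# Passing an explicit encoding to both the writer and the reader
# still gives the same split-row result as the default settings.
df.to_csv("my_df.csv", encoding='utf-8')
df2 = pd.DataFrame.from_csv("my_df.csv", encoding='utf-8')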
How do I write and read CSV files robustly, so that rows and columns are not corrupted and no other unexpected things happen? If the only solution is to check and 'clean' every string before writing to CSV, what is the easiest way to do that?
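The only workaround I can think of so far is to scrub every string myself before writing, along the lines of the sketch below (this assumes that replacing embedded carriage returns and newlines with spaces is acceptable for my data; clean_text is just a helper name I made up):

import pandas as pd

def clean_text(value):
    # Hypothetical helper: replace embedded CR/LF characters with a space
    # so that a single cell can never span more than one physical line.
    if isinstance(value, str):
        return value.replace('\r', ' ').replace('\n', ' ')
    return value

df = pd.DataFrame({'A': ['one', 't\ro', 'three'],
                   'B': ['four', 'five', 'six']})

# Clean every cell element-wise, then write as before.
cleaned = df.applymap(clean_text)
cleaned.to_csv("my_df.csv")

But this feels like it could miss other problem characters, which is why I'm asking whether there is a more robust way.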