
I'm using a Python API to fetch a .csv file from an email attachment I received in Gmail, loading it into a DataFrame for some data prep, and saving it as a .csv on my PC. That works well; the problem is that I get '\n' in some columns (it came like that in the source attachment).

The code I use to get the data, turn it into a DataFrame, and write the .csv:

r = io.BytesIO(part.get_payload(decode = True))
df = pd.DataFrame(r)
df.to_csv('C:/Users/x.csv', index = False)

Example of the df that I get:

+-------------+----------+---------+----------------------+
| Information | Modified | Created | MD_x0020_Agenda\r\n' |
+-------------+----------+---------+----------------------+
| c           | d        | f       | \r\n'                |
| b\n'        |          |         |                      |
| c           | e        | \r\n'   |                      |
+-------------+----------+---------+----------------------+

Example of the correct result:

+-------------+----------+---------+----------------------+
| Information | Modified | Created | MD_x0020_Agenda\r\n' |
+-------------+----------+---------+----------------------+
| c           | d        | f       | \r\n'                |
| b           | c        | e       | \r\n'                |
+-------------+----------+---------+----------------------+

I tried using line_terminator. My thinking was that if I forced it to treat only \r\n, and not \n, as a row break, it would work. It didn't.

df.to_csv('C:/Users/x.csv', index = False, line_terminator='\r\n')

Can somebody help me with this? It's really stressing me out, because I can't move forward with my project until it's fixed. Thanks.

  • If the payload is a csv, you should be able to use pd.read_csv(r) to capture the df and use its native line break parsing! Have you tried that? – mochsner Jul 05 '20 at 03:36
  • Thanks for the answer, I combined it with Renato Aranha's answer and it worked! – Felipe Ribeiro Jul 05 '20 at 15:28

2 Answers


Usually, "\n" marks a line break, i.e. the point where the Return key was pressed and the text continues on the next line.

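You can get rid of it by calling replace on your DataFrame. Pass regex=True so that '\n' is also removed when it is only part of a cell; without it, only cells that consist of exactly '\n' are replaced:

df = df.replace('\n', '', regex=True)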

For more details on the function, see the pandas documentation for DataFrame.replace.
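To illustrate how replace treats a line break that is embedded in a longer cell value, here is a minimal sketch. The sample data is made up for the demonstration and is not taken from the question's attachment.

```python
import pandas as pd

# Hypothetical sample data: one cell contains an embedded line break.
df = pd.DataFrame({"Information": ["c", "b\n", "c"], "Modified": ["d", "c", "e"]})

# Without regex=True, only cells exactly equal to "\n" are replaced,
# so "b\n" is left untouched.
whole_cell = df.replace("\n", "")

# With regex=True, the pattern is matched inside each string value.
substring = df.replace("\n", "", regex=True)

print(whole_cell["Information"].tolist())
print(substring["Information"].tolist())
```

The second call is the one that actually strips the line break from "b\n".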

Hope it works.

Renato Aranha

I combined the two suggestions and got it working, thanks!

PS: after some research I found that this is a Windows/Excel issue: when you export a .csv, it treats \n and \r\n (\r too?) as a new row. The DataFrame treats only \r\n as a new row (by default).

df = pd.read_csv(io.BytesIO(part.get_payload(decode=True)), header=None)

# grab the first row for the header
new_header = df.iloc[0]
# take the data less the header row
df = df[1:]
# set the header row as the df header
df.columns = new_header
# replace the \n which is creating new lines
df['Information'] = df['Information'].replace(regex='\n', value='')
df.to_csv('C:/Users/x.csv', index=False)
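For anyone who wants to try the approach without a mailbox, here is a self-contained sketch. It makes some assumptions: the `payload` bytes are a made-up stand-in for `part.get_payload(decode=True)`, and the embedded line break is assumed to sit inside a quoted field, which may not match the real attachment. It also lets read_csv pick up the header row itself rather than promoting it by hand, and cleans every column, not just 'Information'.

```python
import io

import pandas as pd

# Hypothetical payload standing in for part.get_payload(decode=True).
# The second data row has a quoted field with an embedded "\n".
payload = b'Information,Modified,Created\r\nc,d,f\r\n"b\n",c,e\r\n'

# read_csv parses the bytes as CSV; the first row becomes the header.
df = pd.read_csv(io.BytesIO(payload))

# Strip embedded line breaks from every string cell.
df = df.replace(r"[\r\n]+", "", regex=True)

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
print(csv_text)
```

To write to disk instead, pass a path as the first argument to to_csv, as in the code above.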