Pandas read csv is shifting columns

Question

I'm trying to create a DataFrame from a CSV file that has 4 empty columns. When I open it in LibreOffice or Excel, the empty columns are identified correctly. However, reading it with pd.read_csv() shifts the columns' values by one.

How can I solve this? It seems like a problem with pandas read_csv() method.

My code is really standard:

import pandas as pd
df = pd.read_csv('csv_file.csv', sep=',')
df.head()

I changed the headers and used this:

df = pd.read_csv('csv_file.csv', sep=',', index_col=False)

This solved the problem, but what in my previous headers was causing this?

Please share the CSV file. If it is too big, you can upload it to gist.github.com and post a link. – Haha TTpro, Aug 12 '17 at 16:58
Try this: `df = pd.read_csv('csv_file.csv', sep=',', index_col=0)` – MaxU - stand with Ukraine, Aug 12 '17 at 16:58

score 17 · Answer 1 · edited Nov 24 '18 at 16:53

It seems you need the parameter index_col=False so that read_csv does NOT use the first column as the index. The sep=',' parameter can be omitted because it is the default value:

df = pd.read_csv('csv_file.csv', index_col=False)

Your sample:

df = pd.read_csv('teste2.csv', index_col=False)
print (df)
  Header1 Header2  Header3  Unnamed: 3  Unnamed: 4  Header4  Header5  Header6  \
0     ptn  M00001        0         NaN         NaN        2        0        0   

   Header7  Header8    ...     Header22  Header23  Header24  Header25  \
0        0  -31.573    ...       -0.375       0.0   -64.168   276.586   

   Header26  Header27  Unnamed: 29  Unnamed: 30  Header28  Header29  
0    -0.232       0.0          NaN          NaN     0.702       1.0  

[1 rows x 33 columns]
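
A minimal sketch of why the values shift, assuming the file resembles the case the pandas documentation describes for index_col=False: every data row has one more field than the header, for example because of a trailing comma. The inline sample data is made up for illustration and is not the asker's file.

```python
import io

import pandas as pd

# Hypothetical sample: the header has 3 fields, but each data row has 4
# because of the trailing comma.
raw = "a,b,c\n1,2,3,\n4,5,6,\n"

# Default behaviour: the extra field makes pandas treat the first column
# as the index, so every value appears one column to the left.
shifted = pd.read_csv(io.StringIO(raw))

# index_col=False tells pandas not to use the first column as the index.
fixed = pd.read_csv(io.StringIO(raw), index_col=False)

print(shifted)
print(fixed)
```

In the first frame the values 1 and 4 end up in the index and column c is empty; in the second frame a, b and c line up with the header again.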

answered Aug 12 '17 at 16:59 by jezrael; edited Nov 24 '18 at 16:53 by petezurich

Neither worked. I've no idea what is going on. I can see the empty columns when the csv is opened with gedit. What pandas is doing is a mystery. – Marcos Santana Aug 12 '17 at 17:06
I tested it just now and it works perfectly, so it seems to be a data problem. If you want, send your file to the email address in my profile and I can check it. Also, what is your pandas version? `pd.show_versions()` – jezrael Aug 12 '17 at 17:07
pandas: 0.19.0. I'll send a modified version of the file. IP problem. – Marcos Santana Aug 12 '17 at 17:10
Just sent you the file – Marcos Santana Aug 12 '17 at 17:20
Thank you. index_col=False works for me; does it not work for you? – jezrael Aug 12 '17 at 17:23
Oh, I just realized: it worked with the file I sent you. However, when I tried it with my original file, it didn't! – Marcos Santana Aug 12 '17 at 17:27
I tried to use my original headers and the problem happened again on the file I sent you. I think there is something odd about these headers. – Marcos Santana Aug 12 '17 at 17:29
Hmmm, so you don't get output like mine in the answer? Is it a pandas version related problem? – jezrael Aug 12 '17 at 17:52
Not with my original headers. – Marcos Santana Aug 12 '17 at 17:53
What version are you using? – Marcos Santana Aug 12 '17 at 17:53
I use `pandas: 0.20.2` – jezrael Aug 12 '17 at 17:54
I just updated to `0.20.3` and it works fine there too. – jezrael Aug 12 '17 at 18:01
Same here. I updated and your solution works fine with the file I sent you. But still no effect on the original file. – Marcos Santana Aug 12 '17 at 18:03
It seems that some column names are missing in the original data. This is only a guess, because I don't have the data (maybe it is sensitive). – jezrael Aug 12 '17 at 18:04
When the CSV has delimiters at the end of each line, it is a malformed CSV file, and index_col=False solves that problem. Read more at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html – abhijat_saxena Mar 19 '21 at 13:14

score 4 · Answer 2 · edited Feb 26 '19 at 07:18

The problem occurs if your line ends with a delimiter (here a comma), which creates an empty cell that is generally not visible in MS Excel. If your CSV line looks like this:

1,2282816,102.97245065789474,2432,0.8333333333333334,0.1388888888888889,certain,

then modify it to:

1,2282816,102.97245065789474,2432,0.8333333333333334,0.1388888888888889,certain

and pd.read_csv(fileName) will work fine.
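
For a file with many rows, the same cleanup can be scripted. This is only a sketch: the helper name strip_trailing_delimiter and the sample rows are made up for illustration, and it assumes that no row has a legitimately empty last field, since that trailing comma would be removed as well.

```python
def strip_trailing_delimiter(text, delimiter=","):
    """Remove one trailing delimiter from every line that ends with one."""
    cleaned_lines = []
    for line in text.splitlines():
        if line.endswith(delimiter):
            # Drop only the final delimiter, not the rest of the line.
            line = line[: -len(delimiter)]
        cleaned_lines.append(line)
    return "\n".join(cleaned_lines) + "\n"


# Hypothetical sample: the data rows end with a comma, the header does not.
raw = "id,size,label\n1,2282816,certain,\n2,2432,certain,\n"
cleaned = strip_trailing_delimiter(raw)
print(cleaned)
```

The cleaned text can then be written back to disk or wrapped in io.StringIO and passed to pd.read_csv.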

answered Feb 26 '19 at 06:42 by SamanwaySahoo; edited Feb 26 '19 at 07:18 by Andronicus

That turned out to be the problem for me when trying to load transactions from a download from Chase bank in CSV format. Stripping the trailing comma from each data row solved it. Thanks. :) – Steve Jorgensen Nov 15 '19 at 10:25

score 4 · Answer 3 · edited Jun 23 '19 at 09:04

I had a similar problem. Here is how I have solved it:

Opened the Excel file with Google Sheets on Google Drive
Downloaded the spreadsheet as a CSV file
Read the CSV file via pandas.read_csv('filename', sep=',', index_col=False)

Problem resolved.

answered Jun 23 '19 at 07:53 by Kaan Karahan; edited Jun 23 '19 at 09:04 by DINA TAKLIT

score 2 · Answer 4 · edited Aug 20 '23 at 05:34

If the file has no header row, try writing a header on top of each column, including the empty ones. read_csv() then reads those headers as the column names and the values stay aligned.
After that, convert the DataFrame to an array with

df=df.values

and the headers are gone.
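
A possible alternative to editing the file by hand is to pass the column names to read_csv. This is a sketch under assumptions: the sample data and the column names (including "trailing" for the empty last field) are invented for illustration, and .to_numpy() is used as the counterpart of .values that the pandas documentation recommends.

```python
import io

import pandas as pd

# Hypothetical headerless sample in which every row ends with a comma.
raw = "1,2,3,\n4,5,6,\n"

# One name per field, including the empty field after the trailing comma.
names = ["a", "b", "c", "trailing"]
df = pd.read_csv(io.StringIO(raw), header=None, names=names)

# Drop the empty column, then convert to a plain NumPy array (no headers).
values = df.drop(columns="trailing").to_numpy()
print(values)
```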

answered Aug 24 '18 at 22:29 by Y. Yazarel; edited Aug 20 '23 at 05:34
