14

I'm trying to create a dataframe of a csv file that has 4 empty columns. When I open it on LibreOffice or Excel it correctly identifies the empty columns. However, opening with pd.read_csv() ends up shifting the columns' values by one.

How can I solve this? It seems like a problem with pandas read_csv() method.

My code is really standard:

import pandas as pd
df = pd.DataFrame.read_csv('csv_file.csv', sep=',')
df.head()

I changed the headers and used this:

df = pd.DataFrame.read_csv('csv_file.csv', sep=',', index_col=False).

This solved the problem, but what in my previous headers was causing this?

petezurich
  • 9,280
  • 9
  • 43
  • 57
Marcos Santana
  • 911
  • 5
  • 12
  • 21

4 Answers4

17

It seems you need the parameter index_col=False to NOT read the first column to index in read_csv, sep=',' parameter can be omitted, because it is the default value:

df = pd.read_csv('csv_file.csv', index_col=False)

Your sample:

df = pd.read_csv('teste2.csv', index_col=False)
print (df)
  Header1 Header2  Header3  Unnamed: 3  Unnamed: 4  Header4  Header5  Header6  \
0     ptn  M00001        0         NaN         NaN        2        0        0   

   Header7  Header8    ...     Header22  Header23  Header24  Header25  \
0        0  -31.573    ...       -0.375       0.0   -64.168   276.586   

   Header26  Header27  Unnamed: 29  Unnamed: 30  Header28  Header29  
0    -0.232       0.0          NaN          NaN     0.702       1.0  

[1 rows x 33 columns]
petezurich
  • 9,280
  • 9
  • 43
  • 57
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
4

The problems occurs if your line ends with an delimiter (here comma[,]), which creates an empty cell generally not visible in MS Excel. If your csv line looks like this:

1,2282816,102.97245065789474,2432,0.8333333333333334,0.1388888888888889,certain,

then modify it to:

1,2282816,102.97245065789474,2432,0.8333333333333334,0.1388888888888889,certain

and pd.read_csv(fileName) will work fine.

Andronicus
  • 25,419
  • 17
  • 47
  • 88
SamanwaySahoo
  • 41
  • 1
  • 2
  • That turned out to be the problem for me when trying to load transactions from a download from Chase bank in CSV format. Striping the trailing comma from each data row solved it. Thanks. :) – Steve Jorgensen Nov 15 '19 at 10:25
4

I had a similar problem. Here is how I have solved it:

  1. Opened excel file with google spreadsheet on google drive
  2. Downloaded spread sheet as csv file
  3. Read the csv file via pandas.read_csv('filename', sep=',', index_col=False))

Problem resolved.

DINA TAKLIT
  • 7,074
  • 10
  • 69
  • 74
2

Try writing headers on top of each column if there are none. This time, read_csv() also reads the headers and lists them.
After that convert the dataframe to an array by

df=df.values 

and the headers are gone.

Y. Yazarel
  • 1,385
  • 1
  • 8
  • 13