
I have a csv file that looks like this:

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
"1,0,3,""Braund, Mr. Owen Harris"",male,22,1,0,A/5 21171,7.25,,S"

Can I use pandas to read the csv such that it gets read in the obvious way?

In other words, I want the csv file to be read as if it looked like this:

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S

Any suggestions?

JustBaron
Kimarokko
  • why are there quotation marks in your csv file? Are the column names read in correctly? – pythonic833 May 10 '18 at 07:44
  • To be honest, I don't know where these quotation marks come from. I just downloaded the csv file from Kaggle, and when I open it on my machine (in notepad or in atom) these quotation marks are simply there... – Kimarokko May 10 '18 at 08:14
  • Perhaps I should have stated my question like this instead: Why are there "unnecessary" quotation marks in my Kaggle csv file? Should I learn to handle them or try to get rid of them? – Kimarokko May 10 '18 at 08:34

1 Answer

pd.read_csv(data)

is the answer to your problem.

Here is the code I used for this Kaggle dataset:

import pandas as pd
training_set = pd.read_csv('train.csv')

Output (first row only):

     PassengerId    Survived    Pclass  Name                        Sex     Age     SibSp   Parch   Ticket      Fare    Cabin   Embarked
0             1        0         3      Braund, Mr. Owen Harris     male    22         1        0   A/5 21171   7.2500  NaN     S
pythonic833
  • I tried that, of course, but it does not work on my machine. With that command, all of the data gets read into the first column (PassengerId) as one messy string, while all the other columns are just NaNs. – Kimarokko May 10 '18 at 07:27
  • Any ideas as to why the above suggestion might work on some machines but not on others? I use python 3 on a windows machine. – Kimarokko May 10 '18 at 07:34
  • Try giving the delimiter explicitly with `pd.read_csv(data, delimiter=',')`. If that does not work, then I have no idea why this shouldn't work on some machines. Is it just with this file? What happens if you construct a dummy datafile and try the same? – pythonic833 May 10 '18 at 07:41
  • Adding the explicit delimiter makes no difference for me, unfortunately. And I have the same issue if I make a new dummy datafile "dummy_train.csv" with just the first two lines of the original csv file. To get the datafile read correctly into a dataframe, I have to manually remove some of the quotation marks. Hm... Strange... But thanks for your suggestions. – Kimarokko May 10 '18 at 08:07
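
The manual removal of quotation marks described in the last comment could probably be automated. Below is a minimal sketch, not a definitive fix. It assumes that every affected data row is wrapped in one extra pair of quotes with the inner quotes doubled, exactly as in the sample line from the question, and that no field spans several lines. The helper names `unwrap_line` and `unwrap_csv_text` are made up for illustration. Only the standard `csv` module is used to undo the extra layer of quoting.

```python
import csv

def unwrap_line(line):
    """Remove one outer layer of CSV quoting from a line, if present.

    Hypothetical helper: assumes a wrapped row parses to exactly one
    field that still contains commas (the real columns).
    """
    fields = next(csv.reader([line]), [])
    if len(fields) == 1 and "," in fields[0]:
        return fields[0]
    # Unwrapped lines (such as the header) are returned unchanged.
    return line

def unwrap_csv_text(text):
    """Apply unwrap_line to every line of the file contents."""
    return "\n".join(unwrap_line(line) for line in text.splitlines())

# The two lines shown in the question.
sample = (
    "PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,"
    "Ticket,Fare,Cabin,Embarked\n"
    '"1,0,3,""Braund, Mr. Owen Harris"",male,22,1,0,A/5 21171,7.25,,S"\n'
)

cleaned = unwrap_csv_text(sample)
print(cleaned)
```

The cleaned text can then be handed to pandas without touching the file on disk, for example with `pd.read_csv(io.StringIO(cleaned))`. For a real file, read its contents with `open(path, encoding="utf-8").read()` first; the encoding is an assumption here.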