
So I have this very stupid problem that I have been stuck on for hours. I'm practicing on Kaggle's Titanic ML exercise, using GraphLab Create.

My data is as shown below: [image: Data]

Now I want to replace some values in the table. For example, I want to set (as a test) the age to 38 for Pclass==1, 30 for Pclass==2, and 26 for Pclass==3.

I have tried so many ways of doing this that I am lost.

All I have now is:

import graphlab as gl

df = gl.SFrame(data)
df[df["Pclass"] == 1]  # selects the rows of the table where Pclass == 1
df["Age"][df["Pclass"] == 1]  # an array containing only the "Age" values for Pclass == 1

Now I am trying to use SFrame.apply properly but I'm confused.

I have tried

df["Age"][df["Pclass"]==1].apply(lambda x: 38)

That returns an array with the correct values, but I was not able to apply it to the SFrame. For example, I have tried:

df = df["Age"][df["Pclass"]==1].apply(lambda x: 38)

But now my SFrame has been replaced by an array (obviously).

I have also tried:

df["Age"] = df["Age"][df["Pclass"]==1].apply(lambda x: 38)

But I get the following error: "RuntimeError: Runtime Exception. Column "__PassengerId-Survived-Pclass-Sex-Age-Fare" has different size than current columns!"

I'm sure the solution is pretty simple but I am too confused to find it by myself.

Ultimately I would like something like df["Age"] = something.apply(lambda x: 38 if Pclass==1 else 30 if Pclass==2 else 26)

Thanks.

Mike
  • So I made some progress by doing this: df["Age"] = df["Age"].apply(lambda x:38 if x==1 else 30 if x==26 else 26) ... However, I am still stuck, since writing df["Age"] = df["Age"].apply(lambda x:38 if x["Pclass"]==1 else 30 if x["Pclass"]==26 else 26) returns an error: TypeError: 'float' object has no attribute '__getitem__' – Mike Mar 09 '17 at 20:56
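
A note on the TypeError in the comment above: calling apply on the single column df["Age"] hands the lambda one age value at a time (a float), so x["Pclass"] has nothing to index into. The sketch below imitates that with plain Python, not GraphLab. The rows list and the age_for_class helper are hypothetical stand-ins, and modelling a row as a dict is an assumption about how a row-wise apply on the whole frame presents it.

```python
# Rows are modelled as plain dicts (an assumption, not the GraphLab API itself).
rows = [
    {"Pclass": 1, "Age": 22.0},
    {"Pclass": 2, "Age": 40.0},
    {"Pclass": 3, "Age": 4.0},
]


def age_for_class(row):
    """Hypothetical helper: pick the test age from the row's Pclass."""
    return 38 if row["Pclass"] == 1 else 30 if row["Pclass"] == 2 else 26


# Column-wise: the function receives each Age value (a float), so indexing fails.
try:
    [age_for_class(value) for value in (row["Age"] for row in rows)]
    column_wise_error = None
except TypeError as exc:
    column_wise_error = exc

# Row-wise: the function receives the whole row, so row["Pclass"] is available.
new_age = [age_for_class(row) for row in rows]

print(type(column_wise_error).__name__)
print(new_age)
```

If SFrame.apply on the whole frame passes each row as a dict, as this sketch assumes, the equivalent would presumably be df["Age"] = df.apply(lambda row: 38 if row["Pclass"] == 1 else 30 if row["Pclass"] == 2 else 26); that line is untested here.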

2 Answers


As an alternative, you can use a list comprehension:

Just create a new column 'Pclass_' in the original SFrame, then you can do:

df['Pclass_'] = [1 if item == 38 else 2 if item == 30 else 3 if item == 26 else 4 for item in df['Age']]

You can use any kind of if/else-if conditions in the list comprehension.
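
Note that the line above derives a new 'Pclass_' column from 'Age', while the question asks for the opposite direction. A minimal sketch of the same list-comprehension idea going from Pclass to Age is given below. It uses plain Python lists as stand-ins for df["Pclass"] and df["Age"], and the AGE_BY_PCLASS lookup table is a hypothetical name introduced only for illustration.

```python
# Plain Python lists stand in for the SFrame columns df["Pclass"] and df["Age"].
pclass = [1, 3, 2, 3, 1]
age = [22.0, None, 27.0, 54.0, None]

# Hypothetical lookup table holding the test ages from the question.
AGE_BY_PCLASS = {1: 38, 2: 30, 3: 26}

# One output value per input row, so the result has the same length as the frame.
# Rows with an unknown Pclass keep their original age.
new_age = [AGE_BY_PCLASS.get(p, a) for p, a in zip(pclass, age)]

print(new_age)  # [38, 26, 30, 26, 38]

# With a real SFrame this would presumably become (untested assumption):
# df["Age"] = [AGE_BY_PCLASS.get(p, a) for p, a in zip(df["Pclass"], df["Age"])]
```

Because the comprehension yields exactly one value per row, the result has the same length as the other columns, which is what the "different size than current columns" error in the question was complaining about.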

Chrinide Wu

OK, I have spent some time on this problem and found a solution: use pandas. I am used to pandas but new to GraphLab, which I will not be using much, so I decided to stop wasting time on this simple problem.

Here is what I have done:

import pandas as pd
df2 = pd.read_csv("./train.csv")
df2.loc[(df2.Age.isnull()) & (df2["Pclass"] == 1), "Age"] = 35
df2.loc[(df2.Age.isnull()) & (df2["Pclass"] == 2), "Age"] = 30
df2.loc[(df2.Age.isnull()) & (df2["Pclass"] == 3), "Age"] = 25

And I'm done, everything works fine.

Mike