I have the dataset df. I would like to get the last stage for each name as a new column.

Name     Stage     stage_number
a        Open          1
a        Paid          2
a        Transit       3
a        Wait          4
a        Complete      5
b        Open          1
b        Paid          2
b        Transit       3
b        Wait          4
b        Canceled      5

Expected Output:

Name     Stage     stage_number   Last_Stage
a        Open          1           Complete
a        Paid          2           Complete
a        Transit       3           Complete
a        Wait          4           Complete
a        Complete      5           Complete
b        Open          1           Canceled
b        Paid          2           Canceled
b        Transit       3           Canceled
b        Wait          4           Canceled
b        Canceled      5           Canceled

I tried the code below but got an error:

def stage(df):
    for x in df['Name']:
        return df['Stage'].iloc[-1]

df['last_stage'] = df.apply(stage, axis = 1)
df

My error

AttributeError: 'str' object has no attribute 'iloc'
  • You say: "I would like to get the last stage_number for each name as a new column" but your expected output is different. What is the correct one? – IoaTzimas Sep 18 '20 at 23:40
  • Sorry. It's getting the last stage – Amogh Katwe Sep 18 '20 at 23:46
  • Does this answer your question? [Get only the first and last rows of each group with pandas](https://stackoverflow.com/questions/53927414/get-only-the-first-and-last-rows-of-each-group-with-pandas) – Trenton McKinney Sep 19 '20 at 00:53
  • What do you understand from that error message? Have you done any debugging? Please provide a [mcve], as well as the entire error output, and see [ask], [help/on-topic]. – AMC Sep 19 '20 at 01:22

2 Answers

Does this work for you?

df["last_stage"] = df.groupby("Name")["Stage"].transform("last")

print(df)
  Name     Stage  stage_number last_stage
0    a      Open             1   Complete
1    a      Paid             2   Complete
2    a   Transit             3   Complete
3    a      Wait             4   Complete
4    a  Complete             5   Complete
5    b      Open             1   Canceled
6    b      Paid             2   Canceled
7    b   Transit             3   Canceled
8    b      Wait             4   Canceled
9    b  Canceled             5   Canceled
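
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same `transform("last")` call. The `sort_values("stage_number")` step is an added assumption, not part of the original answer: it is only there so that "last" means the highest stage number even if the rows arrive out of order.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Name": ["a"] * 5 + ["b"] * 5,
    "Stage": ["Open", "Paid", "Transit", "Wait", "Complete",
              "Open", "Paid", "Transit", "Wait", "Canceled"],
    "stage_number": [1, 2, 3, 4, 5] * 2,
})

# Sort first (assumption: stage_number defines the order), then broadcast
# the last Stage of each Name group back onto every row of that group.
# The result keeps the original index, so it aligns with df on assignment.
df["last_stage"] = (
    df.sort_values("stage_number")
      .groupby("Name")["Stage"]
      .transform("last")
)

print(df)
```

`transform` returns one value per input row rather than one per group, which is why the result can be assigned straight to a new column.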
Cameron Riddell

Cameron's solution is better, but if you really want to keep your own function, you can do it like this:

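def stage(df):
    # Build a lookup from each Name to the last Stage in its group
    last_by_name = {name: group['Stage'].iloc[-1] for name, group in df.groupby('Name')}
    # Map every row's Name to that value so the result lines up with df's index
    return df['Name'].map(last_by_name)

# Call the function once on the whole frame; df.apply(stage, axis=1) would pass single rows
df['last_stage'] = stage(df)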
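
As background on the `AttributeError` in the question: with `axis=1`, `DataFrame.apply` hands the function one row at a time as a Series, so `row['Stage']` is a single string rather than a column, and strings have no `.iloc`. The small frame and the `inspect` helper below are hypothetical, made up only to show this.

```python
import pandas as pd

# Tiny illustrative frame (not the question's full data)
df = pd.DataFrame({
    "Name": ["a", "a", "b"],
    "Stage": ["Open", "Complete", "Canceled"],
})

seen_types = []

def inspect(row):
    # `row` is a Series for one row; row["Stage"] is a plain Python str
    seen_types.append(type(row["Stage"]).__name__)
    return row["Stage"]

df.apply(inspect, axis=1)
print(set(seen_types))

# Calling .iloc on that string reproduces the error from the question
try:
    df.apply(lambda row: row["Stage"].iloc[-1], axis=1)
    message = None
except AttributeError as exc:
    message = str(exc)

print(message)
```

This is why a group-wise operation such as `groupby(...).transform(...)` is a better fit here than a row-wise `apply`.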
annicheez
  • Apologies, this should work now:
    ```
    def stage(df):
        for name, group in df.groupby('Name'):
            for i in range(0, len(group)):
                yield group['Stage'].iloc[-1]

    # assumes the rows of df are already sorted by Name
    df['last_stage'] = [label for label in stage(df)]
    ```
    – annicheez Oct 06 '20 at 04:37