2

I have a Pandas DataFrame that looks like this (currently without an index other than the in-built row index, but if it's easier to add indexes to "Person" and "Car", that's fine too):

before = pd.DataFrame({
  'Email': ['john@example.com','mary@example.com','jane@example.com','john@example.com','mary@example.com'],
  'Person': ['John','Mary','Jane','John','Mary'],
  'Car': ['Ford','Toyota','Nissan','Nissan','Ford']
})

I'd like to re-shape it to look like this:

after = pd.DataFrame({
  'Person': ['John','Mary','Jane'],
  'Email': ['john@example.com','mary@example.com','jane@example.com'],
  'Ford': [True,True,False],
  'Nissan': [True,False,True],
  'Toyota': [False,True,False]
})

Note that John has owned both a Ford and a Nissan, Mary has owned a Ford and a Toyota, and Paul has stuck with his trusty Nissan.

I've tried various permutations of stacking a multi-indexed DataFrame, grouping, pivoting -- I can't seem to figure out how to take the value from the "Car" column and transpose it to a new column with the value "True", merging people together by, say, their name.

Nader Hisham
  • 5,214
  • 4
  • 19
  • 35
Dustin
  • 153
  • 1
  • 7

2 Answers2

1

Not sure if this is the best way to do this , but one way would be -

In [26]: before.pivot_table(index=['Email','Person'],columns=['Car'], aggfunc=lambda x: True).fillna(False).reset_index()
Out[26]:
Car             Email Person   Ford Nissan Toyota
0    jane@example.com   Jane  False   True  False
1    john@example.com   John   True   True  False
2    mary@example.com   Mary   True  False   True
Anand S Kumar
  • 88,551
  • 18
  • 188
  • 176
  • Accepting this because of my fascination with elegant one-liners, and the lack of the disposable column. Thanks for the quick response. :) – Dustin Sep 15 '15 at 08:01
1
before['has_car'] = True

Out[93]:
car                Email    Person  has_car
Ford    john@example.com    John    True
Toyota  mary@example.com    Mary    True
Nissan  jane@example.com    Jane    True
Nissan  john@example.com    John    True
Ford    mary@example.com    Mary    True

df = before.pivot_table(index = ['Person' , 'Email'], columns= 'Car' , values='has_car')


Out[89]:
                            Ford    Nissan  Toyota
Person  Email           
Jane    jane@example.com    NaN     True    NaN
John    john@example.com    True    True    NaN
Mary    mary@example.com    True    NaN     True

df.fillna(False).reset_index()

Out[102]:
Car Person  Email               Ford    Nissan  Toyota
0   Jane    jane@example.com    False   True    False
1   John    john@example.com    True    True    False
2   Mary    mary@example.com    True    False   True
Nader Hisham
  • 5,214
  • 4
  • 19
  • 35