2

I have a pandas data frame in the following format:

Arrival Departure Park Station Count 
      8        10    5   [1,2]     1
      5        12    6   [3,4]     1
      8        10    5   [1,2]     1

I want to group this data frame by Arrival, Departure, Park and Station, but since Station holds lists, I get an error. The output should look like:

    Arrival Departure Park Station Count 
        8        10    5   [1,2]     2
        5        12    6   [3,4]     1

Could you please let me know if there is any way to solve this issue?

user36729

2 Answers

4

The problem is that a Python list is mutable and therefore unhashable, so it cannot serve as a group key. Where you would pass df.Station as a groupby criterion, pass df.Station.apply(tuple) instead. This converts the lists into tuples, which are immutable and hashable.

For example:

In [66]: df = pd.DataFrame({'Arrival': [8, 5, 4], 'Station': [[1, 2], [3, 4], [1, 2]]})

In [67]: df.groupby([df.Arrival, df.Station.apply(tuple)]).Arrival.sum()
Out[67]: 
Arrival  Station
4        (1, 2)     4
5        (3, 4)     5
8        (1, 2)     8
Name: Arrival, dtype: int64

Conversely,

df.groupby([df.Arrival, df.Station]).Arrival.sum()

won't work.
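Applied to the frame from the question, a minimal sketch might look like the following. It assumes the column names shown in the question and that summing Count is the aggregation wanted; the variable name result is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Arrival': [8, 5, 8],
    'Departure': [10, 12, 10],
    'Park': [5, 6, 5],
    'Station': [[1, 2], [3, 4], [1, 2]],
    'Count': [1, 1, 1],
})

# Lists are unhashable, so convert each one to a tuple before grouping.
df['Station'] = df['Station'].apply(tuple)

# sort=False keeps the groups in order of first appearance;
# reset_index() turns the group keys back into ordinary columns.
result = (
    df.groupby(['Arrival', 'Departure', 'Park', 'Station'], sort=False)['Count']
      .sum()
      .reset_index()
)
print(result)
```

If lists are needed again afterwards, result['Station'].apply(list) should convert the tuples back.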

Ami Tavory
  • It works but it gives me Series. How can I make it a dataframe? – user36729 Sep 21 '16 at 19:29
  • @user36729 Whenever you have a series `s`, and wish to make it into a DataFrame, you can use `s.to_frame()`. – Ami Tavory Sep 21 '16 at 19:34
  • Thanks. This way, both 'Arrival' and 'Station' end up together in the index of the data frame. Is there any way to split them? – user36729 Sep 21 '16 at 21:33
  • @user36729 Happy to answer, but these questions have less to do with your original problem, and more to do with general `groupby` stuff. It's a bit hard to do a dialog in the comments. Nevertheless, after `.to_frame()`, you can call `.reset_index()`. If it complains about existing columns, call `.rename(columns={'Arrival': 'count'})` in between (or whatever columns you have). – Ami Tavory Sep 21 '16 at 21:42
1
import pandas as pd

df = pd.DataFrame({'arrival': [8, 5, 8], 'departure': [10, 12, 10],
                   'park': [5, 6, 5], 'station': [[1, 2], [3, 4], [1, 2]]})

# Split each two-element station list into two scalar (hashable) columns.
df['arrival_station'] = df.station.apply(lambda x: x[0])
df['departure_station'] = df.station.apply(lambda x: x[1])
print(df)

   arrival  departure  park station  arrival_station  departure_station
0        8         10     5  [1, 2]                1                  2
1        5         12     6  [3, 4]                3                  4
2        8         10     5  [1, 2]                1                  2

Now the station values live in ordinary scalar columns, so you can group by them as usual.
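As a rough sketch of that last step, assuming the goal is a row count per group as in the question (the names keys and counts, and the output column 'count', are chosen here for illustration):

```python
import pandas as pd

df = pd.DataFrame({'arrival': [8, 5, 8], 'departure': [10, 12, 10],
                   'park': [5, 6, 5], 'station': [[1, 2], [3, 4], [1, 2]]})
df['arrival_station'] = df.station.apply(lambda x: x[0])
df['departure_station'] = df.station.apply(lambda x: x[1])

# Group on the scalar columns only; the list column is left out of the keys.
keys = ['arrival', 'departure', 'park', 'arrival_station', 'departure_station']

# size() counts the rows in each group; reset_index(name=...) returns a DataFrame.
counts = df.groupby(keys, sort=False).size().reset_index(name='count')
print(counts)
```

Note that this only works when every station list has exactly two elements; for lists of varying length, converting to tuples is the more general route.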

RoboCopNixon