how does pandas DataFrame use groupby() to split and combine data

Question

I have data like this: (a smaller version of actual DataFrame structure )

week day val 
1   0   8      
1   1   9
1   2   6
1   3   3
1   4   4
1   5   2
1   6   6
1   7   9
2   0   3
2   1   1
2   2   2
2   3   6
2   4   8
2   5   9
2   6   6
2   7   3
3   0   4
3   1   2
3   2   6
3   3   7
3   4   4
3   5   2
3   6   5
3   7   7
1   0   1
1   1   2
1   2   6
1   3   8
1   4   9
1   5   1
1   6   7
1   7   4
2   0   2
2   1   1
2   2   2
2   3   6
2   4   8
2   5   9
2   6   1
2   7   7
3   0   4
3   1   2
3   2   8
3   3   9
3   4   7
3   5   9
3   6   3
3   7   7

and, I want to use "week" and "day" as the group keys. It just like what I have done below:

data.loc[:,wd_val] = data.groupby([data['week'],data['day']]).mean()

I got an error:

ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long 
long'

so,(1) what does mean "longlong" ?

Second，I add the parameter as_index:

data.loc[:,'wd_val']= 
data[['val']].groupby([data['week'],data['day']],as_index=False).mean()
data

but, the values of "wd_val" is NaN:

week    day val wd_val
0   1   0   8   NaN
1   1   1   9   NaN
2   1   2   6   NaN
3   1   3   3   NaN

(2) why do I got the wrong?

Third, I got a dataframe using the code below:

temp = data[['val']].groupby([data['week'],data['day']]).mean()
temp

            val
week  day   
 1     1    5.5
       2    6.0
       3    5.5
       4    6.5
       5    1.5
       6    6.5
       7    6.5
  2    0    2.5
       1    1.0
       2    2.0

and , I want to switch the index("week" and "day") into columns of DataFrame. How can I do that?

Were you trying to group by a list of labels like `["week", "day"]`? If so, don't use `[data["week"], data["day"]]`. I'm not sure whether it's going to try to handle that as a mapping, or as a list-of-Series-interpreted-as-2D-array-interpreted-as-Series, or what, but I don't think it's what you want. — abarnert, Jul 14 '18 at 04:32
Meanwhile, `long long` means `int64`. Presumably that's the `dtype` of either your `week` column or your `day` column. — abarnert, Jul 14 '18 at 04:34
Also, what is the actual structure of your DataFrame? Can you show us a constructor call to create it (or, ideally, a smaller version of it), or something that can be passed to `read_csv` or `read_text`, or something else we could use to try to debug things for you? — abarnert, Jul 14 '18 at 04:36
Thank you , the data in the first part of my questions is a smaller version of actual DataFrame structure. You can use it to run. — user10025959, Jul 14 '18 at 05:05
My target is to generate a new variable , the mean of every day in a week. so , I group the "val" by the two variable of "week" and "day". — user10025959, Jul 14 '18 at 05:16

score 1 · Answer 1 · answered Jul 14 '18 at 04:49

input:

data = pd.DataFrame([
    [1,0,0],
    [1,0,1],
    [1,1,0],
    [1,1,1],
    [1,2,0],
    [2,2,1],
    [2,2,2],
    [2,2,2]], columns=['week','day','val'])

Try:

pd.merge(data, data.groupby(['week','day']).mean(),
    on=['week', 'day'], 
    suffixes=('_orig', '_wk_mean'))

output:

   week  day  val_orig  val_wk_mean
0     1    0         0     0.500000
1     1    0         1     0.500000
2     1    1         0     0.500000
3     1    1         1     0.500000
4     1    2         0     0.000000
5     2    2         1     1.666667
6     2    2         2     1.666667
7     2    2         2     1.666667

long long is a data type

Python type long vs C 'long long'

A similar sql statement might look something like this:

select A.week
    , A.day
    , A.val as val_orig
    , B.val_wk_mean from data as A
join (
    SELECT avg(val) as val_wk_mean
        , week
        , day
    from data
    group by week, day
) as B
on A.week=B.week
and A.day=B.day

also see:

@user10025959 i would guess it might be some extra bracket somewhere. hope it helps. — jmunsch, Jul 14 '18 at 05:41
Thank you for help. I got a error(KeyError: 'week') when I run your python code above. my python version is Python 3.6. why is this? And , I think you understand my question very well — user10025959, Jul 14 '18 at 05:56

score 1 · Accepted Answer · answered Jul 15 '18 at 00:42

IIUC, you think you need to use transform

df['wd_val'] = df.groupby(['week','day'])['val'].transform('mean')

Output:

    week  day  val  wd_val
0      1    0    8     4.5
1      1    1    9     5.5
2      1    2    6     6.0
3      1    3    3     5.5
4      1    4    4     6.5
5      1    5    2     1.5
6      1    6    6     6.5
7      1    7    9     6.5
8      2    0    3     2.5
9      2    1    1     1.0
10     2    2    2     2.0
11     2    3    6     6.0
12     2    4    8     8.0
13     2    5    9     9.0
14     2    6    6     3.5
15     2    7    3     5.0
16     3    0    4     4.0
17     3    1    2     2.0
18     3    2    6     7.0
19     3    3    7     8.0
20     3    4    4     5.5
21     3    5    2     5.5
22     3    6    5     4.0
23     3    7    7     7.0
24     1    0    1     4.5
25     1    1    2     5.5
26     1    2    6     6.0
27     1    3    8     5.5
28     1    4    9     6.5
29     1    5    1     1.5
30     1    6    7     6.5
31     1    7    4     6.5
32     2    0    2     2.5
33     2    1    1     1.0
34     2    2    2     2.0
35     2    3    6     6.0
36     2    4    8     8.0
37     2    5    9     9.0
38     2    6    1     3.5
39     2    7    7     5.0
40     3    0    4     4.0
41     3    1    2     2.0
42     3    2    8     7.0
43     3    3    9     8.0
44     3    4    7     5.5
45     3    5    9     5.5
46     3    6    3     4.0
47     3    7    7     7.0

Thank you, simple and elegant method. – user10025959 Jul 16 '18 at 11:36 — user10025959, Jul 16 '18 at 11:36
and,your answer is what I want to do. – user10025959 Jul 16 '18 at 12:11 — user10025959, Jul 16 '18 at 12:11

how does pandas DataFrame use groupby() to split and combine data

2 Answers2