0

I have a couple of questions about the group by function. 1. I would like to group by pandas data frame by single column without aggregation. 2. After group by, I would like to split the dataset into several datasets by the month date. So, I wasn't able to do so, I am requesting help. I would appreciate it if you guys can help me. I have provided the code, expected results, and dataset below.

Original dataframe

data = {'month': ['2022-01-01', '2022-02-01', '2022-03-01', '2022-01-01', '2022-02-01', '2022-03-01', '2022-01-01', '2022-02-01', '2022-03-01',], 
'Name': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'], 
'num': [1234, 1234, 1234, 456, 456, 456, 456, 100, 200,],
}

df = pd.DataFrame(data)
df

enter image description here

Expected Result for question #1

enter image description here

And I want to split the dataset into different datasets like this Expected Result for question #2 enter image description here

Thank You

Bad Coder
  • 177
  • 11

2 Answers2

2

IIUC, you can use a list comprehension and groupby:

dfs = [d for _,d in df.groupby('month')]

output:

[        month Name   num
 0  2022-01-01    A  1234
 3  2022-01-01    B   456
 6  2022-01-01    C   456,
         month Name   num
 1  2022-02-01    A  1234
 4  2022-02-01    B   456
 7  2022-02-01    C   100,
         month Name   num
 2  2022-03-01    A  1234
 5  2022-03-01    B   456
 8  2022-03-01    C   200]

Then you can access each dataframe with slicing:

dfs[0]

        month Name   num
0  2022-01-01    A  1234
3  2022-01-01    B   456
6  2022-01-01    C   456

Alternatively, add the months to the index to form a MultiIndex:

df2 = df.set_index(df['month'], append=True).swaplevel().sort_index()

output:

                   month Name   num
month                              
2022-01-01 0  2022-01-01    A  1234
           3  2022-01-01    B   456
           6  2022-01-01    C   456
2022-02-01 1  2022-02-01    A  1234
           4  2022-02-01    B   456
           7  2022-02-01    C   100
2022-03-01 2  2022-03-01    A  1234
           5  2022-03-01    B   456
           8  2022-03-01    C   200

Then use loc to get the subdataframes:

df2.loc['2022-01-01']

        month Name   num
0  2022-01-01    A  1234
3  2022-01-01    B   456
6  2022-01-01    C   456
mozway
  • 194,879
  • 13
  • 39
  • 75
  • @BadCoder which one? – mozway Sep 20 '22 at 16:54
  • the first solution but I have not tried the second solution and it might work as well. When I print single datasets, it prints in a data frame but if I run a loop, it will print in a different format and be hard to read. Why is that? – Bad Coder Sep 20 '22 at 17:00
1

try:

df.sort_values(['month', 'Name'])
    month       Name    num
0   2022-01-01  A       1234
3   2022-01-01  B       456
6   2022-01-01  C       456
1   2022-02-01  A       1234
4   2022-02-01  B       456
7   2022-02-01  C       100
2   2022-03-01  A       1234
5   2022-03-01  B       456
8   2022-03-01  C       200

D = []
for i in df['month'].unique():
    print(i)
    D.append(df.loc[df['month'].eq(i)])

#D is now list of separate dataframes

D[0]
    month       Name    num
0   2022-01-01  A       1234
3   2022-01-01  B       456
6   2022-01-01  C       456

D[1]
    month       Name    num
1   2022-02-01  A       1234
4   2022-02-01  B       456
7   2022-02-01  C       100

type(D[2])
pandas.core.frame.DataFrame

#-----
if you want to sort by only one column, then you need to create this one column as string:
df['monthName'] = df['month'] + df['Name']
df.sort_values('monthName')
    month       Name    num     monthName
0   2022-01-01  A       1234    2022-01-01A
3   2022-01-01  B       456     2022-01-01B
6   2022-01-01  C       456     2022-01-01C
1   2022-02-01  A       1234    2022-02-01A
4   2022-02-01  B       456     2022-02-01B
7   2022-02-01  C       100     2022-02-01C
2   2022-03-01  A       1234    2022-03-01A
5   2022-03-01  B       456     2022-03-01B
8   2022-03-01  C       200     2022-03-01C

you can drop this column as long as you sort:
df.sort_values('monthName').drop(columns='monthName')

#-------
if you want to read all the dataframes at once in jupyter:

for i in D:
    print(i)
   month         Name   num
0  2022-01-01    A      1234
3  2022-01-01    B       456
6  2022-01-01    C       456
   month         Name   num
1  2022-02-01    A      1234
4  2022-02-01    B       456
7  2022-02-01    C       100
   month         Name   num
2  2022-03-01    A      1234
5  2022-03-01    B       456
8  2022-03-01    C       200

to make it more easy to read them, you can also add a line or an empty line after each one:

for i in D:
    print(i)
    print('\n') #add empty line after each df
    #print('__________\n') #add line + empty line after each df

        month Name   num
0  2022-01-01    A  1234
3  2022-01-01    B   456
6  2022-01-01    C   456


        month    Name   num
1  2022-02-01    A      1234
4  2022-02-01    B       456
7  2022-02-01    C       100


        month    Name   num
2  2022-03-01    A      1234
5  2022-03-01    B       456
8  2022-03-01    C       200

[![screenshot][1]][1]
khaled koubaa
  • 836
  • 3
  • 14
  • Thank you for the solution. #1 In this case, sorting solves the issue but if we want to just group by a single column in pandas like SQL, how can we do that? – Bad Coder Sep 20 '22 at 16:22
  • we don't need to group, we need to sort, if you need to sort by one column, you should join both columns as string then sort, I will add it to my answer – khaled koubaa Sep 20 '22 at 16:25
  • Thank you for the solution. No, we don't need to combine and I think sort will solve the issue. How can we convert a list into a data frame? If we can have them in a data frame, it will be a more readable format. – Bad Coder Sep 20 '22 at 16:36
  • you mean the list of many dataframes ```D``` ? – khaled koubaa Sep 20 '22 at 16:37
  • Yes, Khaled. Having a list is hard to read large data sets and manipulate them. So, I was wondering if is it possible to have them in a data frame. – Bad Coder Sep 20 '22 at 16:38
  • you want to read them all at once in jupyter or csv ? – khaled koubaa Sep 20 '22 at 16:40
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/248206/discussion-between-bad-coder-and-khaled-koubaa). – Bad Coder Sep 20 '22 at 16:41