1

I have a dataset where I would like to rearrange and sort quarter values in numerical order, grouping by the 'id' column

Data

    id  date    stat
    aa  q1 22   y
    aa  q1 23   y
    aa  q2 22   y
    aa  q2 23   y
    aa  q3 22   y
    aa  q3 23   y
    aa  q4 22   y
    aa  q4 23   ok
    bb  q1 22   n
    bb  q1 23   n
    bb  q2 22   n
    bb  q2 23   n
    bb  q3 22   n
    bb  q3 23   n
    bb  q4 22   n
    bb  q4 23   ok

Desired

 id date    stat
aa  q1 22   y
aa  q2 22   y
aa  q3 22   y
aa  q4 22   y
aa  q1 23   y
aa  q2 23   y
aa  q3 23   ok
aa  q4 23   n
bb  q1 22   n
bb  q2 22   n
bb  q3 22   n
bb  q4 22   n
bb  q1 23   n
bb  q2 23   n
bb  q3 23   n
bb  q4 23   ok

Doing

Since my data is in quarters, I am using this

import pandas as pd    
pd.to_datetime(date).sort_values().to_period('Q')

However, I also need to group these by the 'id' column as the desired output shows. Any suggestion is appreciated

Lynn
  • 4,292
  • 5
  • 21
  • 44

2 Answers2

1

This should do the job:

import pandas as pd
pd.to_datetime(df['date'])
df.sort_values(by=['id', 'date'], inplace=True)
df.index = df['date']
df.index.to_period("Q")

Explanation:

df.sort_values(by=['id', 'date'], inplace=True) will first sort your data on the id column and then it'll sort that sorted data(i.e., on id column) on date column.

iamakhilverma
  • 564
  • 3
  • 9
  • Hi thank you @mrverma this didnt work- no change occurred – Lynn Nov 23 '21 at 03:39
  • It says ParserError: Unknown string format: q1 22 - although we are converting to datetime, however I think it has to do with how the quarters are displayed – Lynn Nov 23 '21 at 03:54
  • i think the problem might be in data as the solution to this question mentions here (https://stackoverflow.com/questions/34505579/pandas-to-datetime-valueerror-unknown-string-format). You may want to manually do the formatting of datetime using strftime from datetime module by passing it your date strings (https://docs.python.org/3/library/datetime.html). – iamakhilverma Nov 23 '21 at 03:59
  • also, as you mentioned in the reply to @wwnde 's answer to this problem that you're getting ValueError: cannot convert float NaN to integer, you may have some NaN values in your date column which you might want to deal with before using to_period('q') – iamakhilverma Nov 23 '21 at 04:04
  • ok thank you I do not have any NaN in the datsaet though, it looks exactly like post – Lynn Nov 23 '21 at 04:04
1

Rename axis, split q to extract integer, sort by

df[['temp1','temp2']]=df['date'].str.split('\s', expand=True)
df=df.sort_values(by=['id','temp2']).drop(columns=['temp1', 'temp2'])

    id   date stat
0   aa  q1 22    y
1   aa  q2 22    y
2   aa  q3 22    y
3   aa  q4 22    y
4   aa  q1 23    y
5   aa  q2 23    y
6   aa  q3 23   ok
7   aa  q4 23    n
8   bb  q1 22    n
9   bb  q2 22    n
10  bb  q3 22    n
11  bb  q4 22    n
12  bb  q1 23    n
13  bb  q2 23    n
14  bb  q3 23    n
15  bb  q4 23   ok
wwnde
  • 26,119
  • 6
  • 18
  • 32
  • Hi thank you @wwnde it is saying ValueError: cannot convert float NaN to integer - I am using the same values within my post - is there some conversion that needs to occur? – Lynn Nov 23 '21 at 03:38
  • Thank you @wwnde, I do not see a change with this – Lynn Nov 23 '21 at 04:07
  • 1
    See my edits Lynn – wwnde Nov 23 '21 at 04:30
  • wow @wwnde this was a tricky one- I will see how you did - thank you kindly – Lynn Nov 23 '21 at 04:35
  • was wondering @wwnde - my full dataset has about 20 columns- it says ValueError: Columns must be same length as key - when I place the code in- how do I adjust the code to fit the columns? or any insight? - maybe I need to add the appropriate temp columns I think – Lynn Nov 23 '21 at 04:39
  • What are the temp columns doing? @wwnde are these placeholders? – Lynn Nov 23 '21 at 04:48
  • The columns as generated cannot affect existing columns. Unless you have columns with the same names. – wwnde Nov 23 '21 at 04:48
  • 1
    The temp columns are place holders. – wwnde Nov 23 '21 at 04:49