1

I have a data frame as the following:

df = pd.DataFrame({'id':[3333311,3455572,6464544,2323322,2222111,4333311,5454566,3321767],'A':['12 days','35 days','36 days','56 days','54 days','44 days','56 days','54 days'],'B':['6 days','31 days','33 days','46 days','44 days','16 days','41 days','42 days'],'Percentage':[0.41,0.36,0.36,0.42,0.25,0.56,0.25,0.42]})

              id        A         B       Percentage
    1       3333311   12 days   6 days    0.41
    3953    3455572   35 days   31 days   0.36
    46458   6464544   36 days   33 days   0.36
    39378   2323322   56 days   46 days   0.42
    115880  2222111   54 days   44 days   0.25
    115882  4333311   44 days   16 days   0.56
    118882  5454566   56 days   41 days   0.25
    118884  3321767   54 days   42 days   0.42

I want to sort it first on Percentage.

Then when a tie breaker comes, it should sort on A and B simultaneously as

     if A.iloc[1] < A.iloc[2]
      and B.iloc[1] < B.iloc[2]
then df.iloc[2] should come first and vice versa.


But      if A.iloc[1] < A.iloc[2]
         and B.iloc[1] > A.iloc[2]

    or   if A.iloc[1] > A.iloc[2]
         and B.iloc[1] < A.iloc[2]

I have tried this but not getting exact result

df = df.sort_values(by='B').sort_values(by='A').sort_values(by='Percentage', ascending=False)

Expected output will be like this:

         id        A          B       Percentage
115882  4333311   44 days   16 days   0.56
39378   2323322   56 days   46 days   0.42
118884  3321767   54 days   42 days   0.42
1       3333311   12 days   6 days    0.41
46458   6464544   36 days   33 days   0.36
3953    3455572   35 days   31 days   0.36
118882  5454566   56 days   41 days   0.25
115880  2222111   54 days   44 days   0.25

Now here on 0.25 percentage tie breaker, there is the condition: if A.iloc[118882] > A.iloc[115880] and B.iloc[118882] < B.iloc[115880]

Here we will perform other operations and consider other aggregations.

Can we do it by sorted function with cmp Parameter?

DAme
  • 697
  • 8
  • 21
user9419602
  • 31
  • 1
  • 6

2 Answers2

3

Use sort_values on multiple columns and defining them as descending with the ascending=False argument.

df = df.sort_values(['Percentage', 'A', 'B'], ascending=[False, False, False])

Output

        id        A        B  Percentage
5  4333311  44 days  16 days        0.56
3  2323322  56 days  46 days        0.42
7  3321767  54 days  42 days        0.42
0  3333311  12 days   6 days        0.41
2  6464544  36 days  33 days        0.36
1  3455572  35 days  31 days        0.36
6  5454566  56 days  41 days        0.25
4  2222111  54 days  44 days        0.25
Erfan
  • 40,971
  • 8
  • 66
  • 78
  • Hey, thank you. But I have a condition also . And I have to take that condition for 2 columns simulatenously. – user9419602 Jul 29 '19 at 10:03
  • Explain your conditions in words and not with your code, since it does not make much sense. My output matches your expected output. So what is the problem exactly? @user9419602 – Erfan Jul 29 '19 at 10:09
  • The condition is : I have many columns in the dataframe. And I have to sort first on the Percentage criteria and when the tie breaker is there then I have to check 2 columns - A and B And A and B columns are related to each other. So suppose I have 3 rows on the tie breaker and now I have to check on A and B columns . – user9419602 Jul 29 '19 at 10:24
  • So I will first match between the rows that the value of A column and B column is greater than the previous row column A and column B value , then that row will be going on the first but if value of one column is greater than the previous value of column and other column value is lesser than the previous row then this is also considered as a tie breaker then we will sort the dataframe on the basis of other column – user9419602 Jul 29 '19 at 10:24
  • Any hints for this ? – user9419602 Jul 29 '19 at 10:42
0

Use df.sort_values(['Percentage','A','B'])

Nithin cp
  • 109
  • 2
  • 3
  • 11
  • Thank you but this is not giving the actual output as we have to check for the condition also. – user9419602 Jul 29 '19 at 09:44
  • What is the condition? Could you be a bit more clearer? – Nithin cp Jul 29 '19 at 09:50
  • Okay suppose I have two rows : And if (A.iloc[1] > A.iloc[2] and B.iloc[1] >B.iloc[2]) then row 1 should come on first position and vice versa. But whenever one of the condition is not satisfied then we will sort the dataframe on the basis of other columns. – user9419602 Jul 29 '19 at 10:00
  • Did you get it ? – user9419602 Jul 29 '19 at 10:04
  • The condition is : I have many columns in the dataframe. And I have to sort first on the Percentage criteria and when the tie breaker is there then I have to check 2 columns - A and B And A and B columns are related to each other. So suppose I have 3 rows on the tie breaker and now I have to check on A and B columns . – user9419602 Jul 29 '19 at 10:24
  • So I will first match between the rows that the value of A column and B column is greater than the previous row column A and column B value , then that row will be going on the first but if value of one column is greater than the previous value of column and other column value is lesser than the previous row then this is also considered as a tie breaker then we will sort the dataframe on the basis of other column – user9419602 Jul 29 '19 at 10:24
  • Any hints for this ? – user9419602 Jul 29 '19 at 10:42