2

I have a dataframe with 3 columns including A , B , C .i need to create columns using B,C columns as shown in the output section below.

dataframe:

A       B        C      
C_1    pink     1971    
C_1    pink     1972    
C_1    blue     1972    
C_1    red      1973    
C_1    pink     1973    
C_1    white    1974    

output:

Here for the first row , the C year is 1971 which is the least one and b column count is 1 so for this row the output should be 1 for B_cnt_C-0 and further it should be 0 as we dont info before 1971 but for 2nd and 3rd row v have info for 1971 and 1972.so output should be as shown below for 2nd and 3rd row and same procedure for all remaining rows.

A       B        C       B_cnt_C-0  B_cnt_C-1  B_cnt_C-2    B_cnt_C-3
C_1    pink     1971         1         0            0           0
C_1    pink     1972         2         1            0           0
C_1    blue     1972         2         1            0           0
C_1    red      1973         2         2            1           0
C_1    pink     1973         2         2            1           0
C_1    white    1974         1         2            2           1   

1 Answers1

0

Use:

#get number of unique values per groups
s = df.groupby(['A','C'])['B'].nunique()
#get max number of groups
a = df.groupby('A')['C'].nunique().max()
#shift per groups and join together
df1 = pd.concat([s.groupby(level=0).shift(x).fillna(0, downcast='int') 
                 for x in range(a)], axis=1)
#new columns names
df1.columns = ['B_cnt_C-{}'.format(x) for x in range(len(df1.columns))]
print (df1)
          B_cnt_C-0  B_cnt_C-1  B_cnt_C-2  B_cnt_C-3
A   C                                               
C_1 1971          1          0          0          0
    1972          2          1          0          0
    1973          2          2          1          0
    1974          1          2          2          1
#join to original
df = df.join(df1, on=['A','C'])
print (df)
     A      B     C  B_cnt_C-0  B_cnt_C-1  B_cnt_C-2  B_cnt_C-3
0  C_1   pink  1971          1          0          0          0
1  C_1   pink  1972          2          1          0          0
2  C_1   blue  1972          2          1          0          0
3  C_1    red  1973          2          2          1          0
4  C_1   pink  1973          2          2          1          0
5  C_1  white  1974          1          2          2          1

Solution with more groups:

print (df)
      A      B     C
0   C_1   pink  1971
1   C_1   pink  1972
2   C_1   blue  1972
3   C_1    red  1973
4   C_1   pink  1973
5   C_1  white  1974
6   C_2   pink  1975
7   C_2   pink  1976
8   C_3   blue  1976
9   C_3    red  1978
10  C_3   pink  1979
11  C_3  white  1979

s = df.groupby(['A','C'])['B'].nunique()
a = df.groupby('A')['C'].nunique().max()
df1 = pd.concat([s.groupby(level=0).shift(x).fillna(0, downcast='int') 
                 for x in range(a)], axis=1)

df1.columns = ['B_cnt_C-{}'.format(x) for x in range(len(df1.columns))]
df = df.join(df1, on=['A','C'])
print (df)
      A      B     C  B_cnt_C-0  B_cnt_C-1  B_cnt_C-2  B_cnt_C-3
0   C_1   pink  1971          1          0          0          0
1   C_1   pink  1972          2          1          0          0
2   C_1   blue  1972          2          1          0          0
3   C_1    red  1973          2          2          1          0
4   C_1   pink  1973          2          2          1          0
5   C_1  white  1974          1          2          2          1
6   C_2   pink  1975          1          0          0          0
7   C_2   pink  1976          1          1          0          0
8   C_3   blue  1976          1          0          0          0
9   C_3    red  1978          1          1          0          0
10  C_3   pink  1979          2          1          1          0
11  C_3  white  1979          2          1          1          0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thank u very much for the reply i will check u r solution and let u know if it is working or not for me –  Jan 12 '19 at 06:54
  • @ram - sure, if not, let me know what is problem. – jezrael Jan 12 '19 at 06:55
  • accepted it , i have another question , i want to post it but it says to wait for 90 mins to post. once i post the other one i will ping u the question url here.can u pls look into it –  Jan 12 '19 at 07:23
  • @ram - Are you sure? there is no green tick. – jezrael Jan 12 '19 at 07:23
  • yes jezrael , it says like less tha 15 reputation is recorded , it might take some time i guess and i 'm new to stackoverflow –  Jan 12 '19 at 08:41
  • @ram To mark an answer as accepted, click on the check mark beside the answer to toggle it from hollow to green. Then get +2points, so possible upvote - change `0` to `1` (upvoting) – jezrael Jan 12 '19 at 08:43
  • i think now it is showing as green , jezrael i have another question which i posted , can i share the question url here so that u can answer it. –  Jan 12 '19 at 08:57
  • pls check this question : https://stackoverflow.com/questions/54157922/pandas-get-rolling-data-for-past-year-to-check-a-list-of-values-for-a-column-and –  Jan 12 '19 at 09:01
  • kindly let me know if u have need any clarification –  Jan 12 '19 at 09:02
  • ,Are you there ? –  Jan 12 '19 at 09:24
  • @ram - Sorry, my problem is not understand it :( – jezrael Jan 12 '19 at 10:49
  • https://stackoverflow.com/questions/54157922/pandas-get-rolling-data-for-past-year-to-check-a-list-of-values-for-a-column-and –  Jan 12 '19 at 11:00
  • did u visited the above question url ,i can explain it in more detailed manner if you want –  Jan 12 '19 at 11:01