0

How can we split the data as mention below?

Sample Data:

  EmployeeId  City     join_month    Year
0   001        Mumbai        1       2018
1   001        Bangalore     3       2018
2   002        Pune          2       2019
3   002        Mumbai        6       2017
4   003        Delhi         9       2018
5   003        Mumbai        12      2019
6   004        Bangalore     11      2017
7   004        Pune          10      2018
8   005        Mumbai         5      2017

Required Output should be -

   EmployeeId  City     join_month       Year    2018_jan_count   2018_feb_count 2018_march_count 
    0   001        Mumbai        1       2018
    1   001        Bangalore     3       2018
    2   002        Pune          2       2019
    3   002        Mumbai        6       2017
    4   003        Delhi         9       2018
    5   003        Mumbai        12      2019
    6   004        Bangalore     11      2017
    7   004        Pune          10      2018
    8   005        Mumbai         5      2017
manoj kumar
  • 105
  • 5

2 Answers2

0

You can use df.apply

df.apply(pd.value_counts)

This will apply a column based aggregation function (in this case date)

Shabari nath k
  • 920
  • 1
  • 10
  • 23
  • Thanks! But I need to split them as mentioned above and how can split month and year into one column like i have 11/1/2020 I need to split them into one column(month & Year) i.e., jan-2020 @Anti_cse51 – manoj kumar Feb 02 '21 at 05:36
  • does this answer your question ? `https://stackoverflow.com/questions/29836477/pandas-create-new-column-with-count-from-groupby`. You can just replace the column names..It should work – Shabari nath k Feb 02 '21 at 06:02
0

I build the yearmonth value and then use it as a column on the pivot table. I count the employee id aggregated by the city and yearmonth

 months=[(1,'Jan'),(2,'Feb'),(3,'Mar'),(4,'Apr'),(5,'May'),(6,'Jun'),(7,'Jul'),(8,'Aug'),(9,'Sept'),(10,'Oct'),(11,'Nov'),(12,'Dec')]
 employeeId=['001','001','002','002','003','003','004','004','005']
 city=['Mumbai', 'Bangalore','Pune','Mumbai','Delhi','Mumbai','Bangalore','Pune','Mumbai']
 join_month=[1,3,2,6,9,12,11,10,1]
 char_month=[b for item in join_month for a,b in months if item==a ]
 year=[2018, 2018,2019,2017,2018,2017,2017,2018,2018]
 char_yearmonth=[]
 [char_yearmonth.append(str(year[i])+"_"+char_month[i]) for i in range(len(year))]

 df=pd.DataFrame({'EmployeeId': employeeId,'City':city,'YearMonth':char_yearmonth})

 fp=df.pivot_table(index=['City'], columns=['YearMonth'],aggfunc='count').fillna(0)
 print(fp)

 EmployeeId                                                         \
 YearMonth   2017_Dec 2017_Jun 2017_Nov 2018_Jan 2018_Mar 2018_Oct 2018_Sept   
 City                                                                          
 Bangalore        0.0      0.0      1.0      0.0      1.0      0.0       0.0   
 Delhi            0.0      0.0      0.0      0.0      0.0      0.0       1.0   
 Mumbai           1.0      1.0      0.0      2.0      0.0      0.0       0.0   
 Pune             0.0      0.0      0.0      0.0      0.0      1.0       0.0   

                
 YearMonth 2019_Feb  
 City                
 Bangalore      0.0  
 Delhi          0.0  
 Mumbai         0.0  
 Pune           1.0  
Golden Lion
  • 3,840
  • 2
  • 26
  • 35