3

I have something like this:

 fromJobtitle         toJobtitle         size
0              CEO                CEO    65
1              CEO     Vice President    23
2              CEO           Employee    56
3   Vice President                CEO   112
4         Employee                CEO    20

I would like to count number of co-occurences so that it combines the double occurences (showing only how many elements there are between the 2)

An example Output:

0              CEO     Vice President   135
1              CEO           Employee    76
2              CEO                CEO    65

3 Answers3

2
import pandas as pd
df = pd.DataFrame({
    'fromJobtitle': ['CEO', 'CEO', 'CEO', 'Vice President', 'Employee'],
    'toJobtitle': ['CEO', 'Vice President', 'Employee', 'CEO', 'CEO'],
    'size': [65, 23, 56, 112, 20]
    })
df['combination'] = df.apply(lambda row: tuple(sorted([
                                                       row['fromJobtitle'], 
                                                       row['toJobtitle']
                                                      ])), axis=1)

then:

df = df.groupby('combination').sum().reset_index()

results:

    combination             size
0   (CEO, CEO)              65
1   (CEO, Employee)         76
2   (CEO, Vice President)   135

finally:

df['from'] = df.apply(lambda row: row['combination'][0], axis=1)
df['to'] = df.apply(lambda row: row['combination'][1], axis=1)
df = df.drop('combination', axis=1)
df.head()

result:

    size    from    to
0   65      CEO     CEO
1   76      CEO     Employee
2   135     CEO     Vice President
Amin Ba
  • 1,603
  • 1
  • 13
  • 38
  • Also, how do you remove mappings to same elements ? (CEO to CEO etc) – darklight213 Jun 03 '21 at 16:59
  • @darklight213 I map the first element of tuple in the `cobination` column to the `to` column and the second one to the `from` column and then drop the `combination` column – Amin Ba Jun 03 '21 at 17:02
2

Try:

df.groupby(lambda x: tuple(sorted(df.loc[x, ['fromJobTitle', 'toJobTitle']]))).sum()

Here is the result:

                       size
(CEO, CEO)               65
(CEO, Employee)          76
(CEO, Vice President)   135
Riccardo Bucco
  • 13,980
  • 4
  • 22
  • 50
2

Here's a different solution:

First create a column which combines the names in alphabetical order

df['titles'] = np.where(df['fromJobtitle']<df['toJobtitle'],df['fromJobtitle']+"|"+df['toJobtitle'],df['toJobtitle']+"|"+df['fromJobtitle'])

0               CEO|CEO
1    CEO|Vice President
2          CEO|Employee
3    CEO|Vice President
4          CEO|Employee
Name: titles, dtype: object

Then group by that name and sum it

df_groups = df.groupby('titles').sum().reset_index()
df_groups

titles              size
CEO|CEO             65
CEO|Employee        76
CEO|Vice President  135

Then just split the combined name into individual pieces

df_groups[['fromJobTitle', 'toJobTitle']] = df_groups['titles'].str.split('|', expand=True)
df_groups

size fromJobTitle   toJobTitle
65    CEO           CEO
76    CEO           Employee
135   CEO           Vice President
Shubham Periwal
  • 2,198
  • 2
  • 8
  • 26