0

I have the table below (please see image). What I want to do is look at each department and determine which individual in every single department could serve as a subject matter expert for that department and explain to me what their department does. Therefore, what I did was group the DataFrame by Department ID then split it out into multiple DataFrames. I wrote some code to essentially say if there is an individual that 80% of the people in a given department are reporting up to, then use that individual. If not, NaN. The code is working as I would like, the issue that I am running into is that I can't get the DataFrames to recombine. i.e. when I export the DataFrame to CSV, it only exports one of the DataFrames. I think the information below should suffice, but if any more information is needed please let me know. Any help is greatly appreciated!

Raw Data

Current iteration of code:

import pandas as pd
import openpyxl

#Reads in the data.
df = pd.read_excel('C:/Users/WorleyMG/OneDrive - Crowe LLP/Desktop/Test/Test1.xlsx')

#Breaks the data out among the Department IDs.
df_group = df.groupby('Department ID')
df_split = [df_group.get_group(x) for x in df_group.groups]

#Figures out who has the most reports among teh different departments.
for df in df_split:
    df_count_mgr_id = df['Manager ID'].value_counts()
    df2 = pd.DataFrame(df_count_mgr_id)
    df2['percent'] = (df2['Manager ID'] / df2['Manager ID'].sum())
    df2 = df2.reset_index()
    df2 = df2.rename(columns={'index':'manager id','Manager ID':'count'})
    df2.loc[df2['percent'] >= .8, 'SME'] = 'This is the SME'
    df2 = df2[df2['SME'] == 'This is the SME']
    df = pd.concat((df2['manager id'],df), axis=0)
    df = df.rename(columns={0: 'manager id'})
    copy_down = df['manager id'].iloc[0]
    df['manager id']=copy_down
    df = df.dropna(subset=['ID'])
    print(df)

Current output:

       ID     Name  Department ID  ...  3rd Manager ID 3rd Manager Name  manager id
0  7838.0   Batman        11640.0  ...             NaN              NaN      7838.0
1  8636.0    Jacob        11640.0  ...          5356.0          William      7838.0
2  7618.0   Jingle        11640.0  ...          5356.0          William      7838.0
3  7848.0   Heimer        11640.0  ...          5356.0          William      7838.0
4  7872.0  Schmidt        11640.0  ...          5356.0          William      7838.0
5  7532.0    Sarah        11640.0  ...          5356.0          William      7838.0
6  7878.0   Connor        11640.0  ...          5356.0          William      7838.0
7  8500.0   Austin        11640.0  ...          5356.0          William      7838.0
8  7926.0   Powers        11640.0  ...          5356.0          William      7838.0

[9 rows x 10 columns]
        ID    Name  Department ID  ...  3rd Manager ID 3rd Manager Name  manager id
9   8222.0  Freddy        12122.0  ...          7504.0            Chris         NaN
10  9102.0  Kruger        12122.0  ...          7504.0            Chris         NaN
11  6020.0    Tony        12122.0  ...          5356.0          William         NaN
12  7608.0   Stark        12122.0  ...          5356.0          William         NaN
13  7920.0   Bruce        12122.0  ...          5356.0          William         NaN
14  6636.0   Wayne        12122.0  ...          5356.0          William         NaN
    
[6 rows x 10 columns]

Desired Output:

       ID     Name  Department ID  ...  3rd Manager ID 3rd Manager Name  manager id
0  7838.0   Batman        11640.0  ...             NaN              NaN      7838.0
1  8636.0    Jacob        11640.0  ...          5356.0          William      7838.0
2  7618.0   Jingle        11640.0  ...          5356.0          William      7838.0
3  7848.0   Heimer        11640.0  ...          5356.0          William      7838.0
4  7872.0  Schmidt        11640.0  ...          5356.0          William      7838.0
5  7532.0    Sarah        11640.0  ...          5356.0          William      7838.0
6  7878.0   Connor        11640.0  ...          5356.0          William      7838.0
7  8500.0   Austin        11640.0  ...          5356.0          William      7838.0
8  7926.0   Powers        11640.0  ...          5356.0          William      7838.0
9   8222.0  Freddy        12122.0  ...          7504.0            Chris         NaN
10  9102.0  Kruger        12122.0  ...          7504.0            Chris         NaN
11  6020.0    Tony        12122.0  ...          5356.0          William         NaN
12  7608.0   Stark        12122.0  ...          5356.0          William         NaN
13  7920.0   Bruce        12122.0  ...          5356.0          William         NaN
14  6636.0   Wayne        12122.0  ...          5356.0          William         NaN
    
[15 rows x 10 columns]
Scott Thompson
  • 22,629
  • 4
  • 32
  • 34
  • Welcome to Stack Overflow! Please, see how to provide a [Minimal Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example). Also, [Why should I not upload images of code/data/errors when asking a question?](https://meta.stackoverflow.com/q/285551/14627505) – Vladimir Fokow Aug 17 '22 at 02:53
  • Does this answer your question? [Concatenate rows of two dataframes in pandas](https://stackoverflow.com/questions/28135436/concatenate-rows-of-two-dataframes-in-pandas) – Kraigolas Aug 17 '22 at 02:56
  • 2
    Why do you split the data in the first place? Just write a function `f` that does the operations you want inputting and returning a `df`, then `df2 = df.groupby('Department ID').apply(f)` – mozway Aug 17 '22 at 04:33
  • @mozway - Thank you for the advice! I spent some time trying to find an example of what you have described above and am unable to find one. Do you have a source you are referencing or would you be able to provide any additional information? Thanks again! – Matt W. Aug 17 '22 at 11:06

0 Answers0