0

I have got 7 dataframes with same column names in python, but row indices are largely different and only few of the row indices match.

df1 
      col1 col2 col3
a     123  456  786 
b     121  454  787
c     111  444  777

df2 
      col1 col2 col3
d     13  46    86
e     11  44    87
c     11  44    77

df3 
      col1  col2  col3
d     1231  4567  1786
g     1214  4546  1787
h     1115  4445  1777
c     12    12    10

Answer should be:

newdf
newcol col1  col2  col3  dfcol
c      111   444   777   1
c      11    44    77    2
c      12    12    10    3
d      13    46    86    2
d      1231  4567  1786  3

It is similar to this How to get the common index of two pandas dataframes? but not exactly.

AAA
  • 332
  • 1
  • 10

1 Answers1

1

IIUC

This is my beginner way of doing it.

import pandas as pd

Create dataframe

df1 = pd.DataFrame({
  'id': ['a', 'b', 'c'],
  'col1': [123, 121, 111],
  'col2': [456, 454, 444],
  'col3': [786, 787, 777],
})

df2 = pd.DataFrame({
  'id': ['d', 'e', 'c'],
  'col1': [13, 46, 86],
  'col2': [11, 44, 87],
  'col3': [11, 44, 77],
})

df3 = pd.DataFrame({
  'id': ['d', 'g', 'h', 'c'],
  'col1': [1231, 1214, 1115, 12],
  'col2': [4567, 4546, 4445, 12],
  'col3': [1786, 1787, 1777, 10],
})

Create dataframe list as well as giving name to each dataframe

df_collection = [df1, df2, df3]
df1.name = 1
df2.name = 2
df3.name = 3

Create list:

id_col = [[df.loc[i, 'id'], df.loc[i, 'col1'], df.loc[i, 'col2'], df.loc[i, 'col3'], df.name] for df in df_collection for i in range(len(df))]

Construct new dataframe:

df = pd.DataFrame(id_col)
df.columns = ['id', 'col1', 'col2', 'col3', 'df']

Remove rows with only 1 occurence:

new_df = df[df.groupby('id').id.transform('count') > 1].sort_values('id')

This will give us the following:

    id  col1   col2   col3  df
2   c   111    444    777   1
5   c   86     87     77    2
9   c   12     12     10    3
3   d   13     11     11    2
6   d   1231   4567   1786  3
garagnoth
  • 214
  • 2
  • 10
  • This worked! But I had to convert index column into a new column in all dfs. Thanks! – AAA Oct 03 '21 at 21:54