Here is a simple program that transitively maps sets of columns:
import pandas as pd
df1vals = [{'c1': '1', 'c2': "2"}]
df1 = pd.DataFrame(df1vals, columns = ['c1' , 'c2'])
df2vals = [{'c2': '2', 'c3': "100"}]
df2 = pd.DataFrame(df2vals, columns = ['c2' , 'c3'])
df3vals = [{'c3': '100', 'c4': "x"}]
df3 = pd.DataFrame(df3vals, columns = ['c3' , 'c4'])
df4vals = [{'c1': '1', 'c4': "m"}]
df4 = pd.DataFrame(df4vals, columns = ['c1' , 'c4'])
df5vals = [{'c2': '2', 'c4': "k"}]
df5 = pd.DataFrame(df5vals, columns = ['c2' , 'c4'])
dfs = [df1,df2, df3, df4, df5]
merged_df = dfs[0]
for df in dfs[1:]:
common_cols = list(set(merged_df.columns) & set(df.columns))
merged_df = pd.merge(merged_df, df, on=common_cols, how='outer')
display(merged_df)
This works to produce this output:
c1 c2 c3 c4
0 1 2 100 x
1 1 NaN NaN m
2 NaN 2 NaN k
This data is all good, but it's missing the fact that c1 1 is associated with c4 k through c2 2. So, I would also want this row:
1 2 NaN k
I would think that the same logic that maps 1 -> 2 -> 100 -> x would also work on this, but it does not. Why?