Idea is convert both columns/Series to DataFrames by Series.str.split
, then use cross join by DataFrame.merge
, get differences and sum them , get rows with minimal diffs by DataFrameGroupBy.idxmin
and DataFrame.loc
, last join together with DataFrame.pop
for use and drop columns:
df11 = df1['col'].str.split('x', expand=True).astype(int)
df22 = df2['col'].str.split('x', expand=True).astype(int)
df = df11.assign(a=1).merge(df22.assign(a=1), on='a')
df['diff'] = df['0_x'].sub(df['0_y']).abs() + df['1_x'].sub(df['1_y']).abs()
df = df.loc[df.groupby(['0_x','1_x'])['diff'].idxmin()]
df['a'] = df.pop('0_x').astype(str).str.cat(df.pop('0_y').astype(str), 'x')
df['b'] = df.pop('1_x').astype(str).str.cat(df.pop('1_y').astype(str), 'x')
print (df)
a diff b
1 320x320 0 480x480
28 382x320 197 215x350
16 480x480 0 320x320
45 676x640 76 320x360
59 694x640 94 320x360
106 1080x1280 1196 2020x1024
78 1080x1280 1299 2123x1024
97 2094x1600 674 1080x900
Similar idea with euclidean distance, with sample data same output:
df11 = df1['col'].str.split('x', expand=True).astype(int)
df22 = df2['col'].str.split('x', expand=True).astype(int)
df = df11.assign(a=1).merge(df22.assign(a=1), on='a')
df['diff'] = np.sqrt(df['0_x'].sub(df['0_y']) ** 2 + df['1_x'].sub(df['1_y']) ** 2)
df = df.loc[df.groupby(['0_x','1_x'])['diff'].idxmin()]
df['a'] = df.pop('0_x').astype(str).str.cat(df.pop('0_y').astype(str), 'x')
df['b'] = df.pop('1_x').astype(str).str.cat(df.pop('1_y').astype(str), 'x')
print (df)
a diff b
1 320x320 0.000000 480x480
30 382x480 143.627992 215x320
16 480x480 0.000000 320x320
45 676x640 53.814496 320x360
59 694x640 67.201190 320x360
106 1080x1280 1015.881883 2020x1024
78 1080x1280 1117.050133 2123x1024
97 2094x1600 525.771814 1080x900
Another numpy solution:
df11 = df1['col'].str.split('x', expand=True).astype(int)
df22 = df2['col'].str.split('x', expand=True).astype(int)
a1 = np.sqrt(np.square(df11[0].to_numpy()[:, None] - df22[0].to_numpy()) +
np.square(df11[1].to_numpy()[:, None] - df22[1].to_numpy()))
df1['b1'] = df2['col'].to_numpy()[np.argmin(a1, axis=1)]
a2 = (np.abs(df11[0].to_numpy()[:, None] - df22[0].to_numpy()) +
np.abs(df11[1].to_numpy()[:, None] - df22[1].to_numpy()))
df1['b2'] = df2['col'].to_numpy()[np.argmin(a2, axis=1)]
print (df1)
col b1 b2
1 320x480 320x480 320x480
2 480x320 480x320 480x320
3 382x215 480x320 320x350
4 676x320 640x360 640x360
5 694x320 640x360 640x360
6 1080x2123 1280x1024 1280x1024
7 2094x1080 1600x900 1600x900
8 1080x2020 1280x1024 1280x1024