2

Using Pandas DataFrames, I am trying to implement self join on a data table with shop ids, product ids, and levels. The Shop and Product ID determine a level. Unknown levels are indicated by null. The problem is as follows: For Shop ID A, find the level of the product in other shops. For products not sold by other shops, report a null value. For products which are not sold by Shop A, report a null value.

I have provided the complete pandas code below as well as the expected solution.

from io import StringIO
import pandas as pd

SOURCE_DATA = StringIO("""
shop_id,product_id,level
A,p,1
A,q,2
A,r,3
B,p,2
B,q,1
B,s,3
C,p,3
C,q,1
C,t,2
D,p,3
D,q,3
D,r,3
E,s,1
E,t,2
E,u,3
""")

EXPECTED = StringIO("""
target_shop_id,shop_id,product_id,target_level,level
A,B,p,1,2
A,B,q,2,1
A,B,r,3,NULL
A,B,s,NULL,3
A,C,p,1,3
A,C,q,2,1
A,C,r,3,NULL
A,C,t,NULL,2
A,D,p,1,3
A,D,q,2,3
A,D,r,3,3
A,E,p,1,NULL
A,E,q,2,NULL
A,E,r,3,NULL
A,E,s,NULL,1
A,E,t,NULL,2
A,E,u,NULL,3
""")

df_source = pd.read_csv(SOURCE_DATA, sep=",")
df_expected = pd.read_csv(EXPECTED, sep=",")

print(df_source)
print(df_expected)

id = 'A'
df_target = df_source[(df_source['shop_id'] == id)]
df_non_target = df_source[(df_source['shop_id'] != id)]

df_result = df_target.merge(df_non_target, how="outer", on="product_id")
print(df_result)

Expected Result:

   target_shop_id shop_id product_id  target_level  level
0               A       B          p           1.0    2.0
1               A       B          q           2.0    1.0
2               A       B          r           3.0    NaN
3               A       B          s           NaN    3.0
4               A       C          p           1.0    3.0
5               A       C          q           2.0    1.0
6               A       C          r           3.0    NaN
7               A       C          t           NaN    2.0
8               A       D          p           1.0    3.0
9               A       D          q           2.0    3.0
10              A       D          r           3.0    3.0
11              A       E          p           1.0    NaN
12              A       E          q           2.0    NaN
13              A       E          r           3.0    NaN
14              A       E          s           NaN    1.0
15              A       E          t           NaN    2.0
16              A       E          u           NaN    3.0

My Result:

   shop_id_x product_id  level_x shop_id_y  level_y
0          A          p      1.0         B        2
1          A          p      1.0         C        3
2          A          p      1.0         D        3
3          A          q      2.0         B        1
4          A          q      2.0         C        1
5          A          q      2.0         D        3
6          A          r      3.0         D        3
7        NaN          s      NaN         B        3
8        NaN          s      NaN         E        1
9        NaN          t      NaN         C        2
10       NaN          t      NaN         E        2
11       NaN          u      NaN         E        3

My result is missing 5 rows where the 'level_y' is NaN.

Any suggestions as to how to fix my code?

Rob R
  • 21
  • 2

0 Answers0