How to generate a cross join table, having one of the IDs from a range in Python?

Question

I have a table with the total number of secondary IDs per each principal ID. So I need to generate a second table in which each row correspond to the combination of primary ID and Secondary ID. For example if for primary ID1 I have 5 as the number of secondary ids I would need 5 rows, each with the same Primary ID and the second ID going from 1 to 5. Doing this with a loop takes a lot of time so I was wondering if there is a more efficient way of doing this without involving loops.

Visual example:

Table

enter image description here

Table Output

enter image description here

score 0 · Answer 1 · answered Oct 15 '20 at 15:11

You can use np.repeat with .groupby:

df_out = pd.DataFrame( np.repeat(df['ID'], df['Number_of_IDs']) )
df_out['ID2'] = df_out.groupby(df_out['ID'])['ID'].transform(lambda x: range(len(x))) + 1

print(df_out)

Prints:

    ID  ID2
0  ID1    1
0  ID1    2
0  ID1    3
1  ID2    1
1  ID2    2
2  ID3    1
3  ID4    1
3  ID4    2

Or:

df_out = pd.DataFrame( np.repeat(df['ID'], df['Number_of_IDs']) )
df_out['ID2'] = 1
df_out['ID2'] = df_out.groupby(df_out['ID'])['ID2'].cumsum()

print(df_out)

score 0 · Answer 2 · answered Oct 15 '20 at 15:20

You can apply lambda function on the number of IDs and then use explode on the dataframe.

df['Number_of_IDs'] = df['Number_of_IDs'].apply(lambda x : range(1, x+1))
df = df.explode('Number_of_IDs'))
df = df.rename(columns={'Number_of_IDs': 'ID2'})

Output-

   ID   ID2
0  ID1    1
0  ID1    2
0  ID1    3
1  ID2    1
1  ID2    2
2  ID3    1
3  ID4    1
3  ID4    2

How to generate a cross join table, having one of the IDs from a range in Python?

2 Answers2