
I have the following dataframe:

df = b_150 h_200 b_250 h_300 b_350 h_400  c1  c2  q4
        1.    2.    3.     4    5.    6.  3.  4.   4
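To make the example reproducible, the frame can be built as below. This is a sketch: the dtypes are my assumption, read off the display (values written with a trailing `.` become floats, the others ints).

```python
import pandas as pd

# One-row frame with the same columns and values as the display above
df = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4, 5.0, 6.0, 3.0, 4.0, 4]],
    columns=["b_150", "h_200", "b_250", "h_300", "b_350", "h_400", "c1", "c2", "q4"],
)
print(df)
```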

I want to add rows in which the values are shuffled either within the group b_150, b_250, b_350 or within the group h_200, h_300, h_400, while the other columns stay unchanged.

For example:

df = add_shuffles(df, cols=['b_150', 'b_250', 'b_350'], n=1)
df = add_shuffles(df, cols=['h_200', 'h_300', 'h_400'], n=1)

would add 2 rows (one shuffle of the b_ columns and one of the h_ columns), giving:

df = b_150 h_200 b_250 h_300 b_350 h_400  c1  c2  q4
        1.    2.    3.     4    5.    6.  3.  4.   4
        3.    2.    5.     4    1.    6.  3.  4.   4
        1.    2.    3.     6    5.    4.  3.  4.   4

What is the most efficient way to do it?

Cranjis
  • Do you want n random shuffles? Or all possible combinations (here 36)? – mozway Mar 27 '22 at 07:26
  • @mozway I want to randomly choose n shuffles out of all possible ones for the columns cols (which are passed to the function) – Cranjis Mar 27 '22 at 07:30
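mozway's figure of 36 is 3! × 3!: each of the two groups can be put in 3! = 6 orders. A quick check with `itertools` (a sketch; the column lists are typed out by hand) also shows that excluding the unchanged order leaves 35 genuine combined shuffles, or 5 per group when each group is shuffled separately as in the add_shuffles calls above.

```python
from itertools import permutations, product

b_cols = ["b_150", "b_250", "b_350"]
h_cols = ["h_200", "h_300", "h_400"]

# Every reordering of the b_ group paired with every reordering of the h_ group
all_orders = list(product(permutations(b_cols), permutations(h_cols)))
print(len(all_orders))  # 36

# Drop the pair where both groups keep their original order
identity = (tuple(b_cols), tuple(h_cols))
shuffles = [o for o in all_orders if o != identity]
print(len(shuffles))  # 35

# Shuffling one group at a time: 3! - 1 non-identity orders per group
per_group = [p for p in permutations(b_cols) if p != tuple(b_cols)]
print(len(per_group))  # 5
```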

1 Answer


Try the following (each added row shuffles either the b_ group or the h_ group, chosen at random):

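import random

import numpy as np
import pandas as pd

n = 2  # number of shuffled rows to add

def columns_shuffler(cols):
    # Coin flip: shuffle either the b_ group or the h_ group, keep the other in order
    if random.random() < 0.5:
        return random.sample(cols[0], len(cols[0])) + cols[1]
    else:
        return cols[0] + random.sample(cols[1], len(cols[1]))

msk = df.columns.str.startswith('b_')
msk1 = df.columns.str.startswith('h_')
cols = dict(enumerate([df.columns[msk].tolist(), df.columns[msk1].tolist()]))
rest = df.columns[~(msk | msk1)].tolist()
# Each new row: the reordered b_/h_ values followed by the untouched columns
# (assumes df has a single row, as in the question)
shuffled = np.vstack([df[columns_shuffler(cols)].to_numpy() for _ in range(n)])
out = pd.concat([df, pd.DataFrame(np.c_[shuffled, np.tile(df[rest].to_numpy(), (n, 1))],
                                  columns=cols[0] + cols[1] + rest)],
                ignore_index=True)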
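The answer above picks one group at random for each added row, so it does not quite match the add_shuffles(df, cols=..., n=...) interface from the question, where n distinct shuffles are drawn from all possible ones for the given columns. A sketch of that interface could look like this. Assumptions: add_shuffles is a hypothetical helper (not a pandas function), the source values are taken from the first row of df, the extra seed parameter is something I added for reproducibility, and the identity order is excluded so every added row really differs.

```python
import random
from itertools import permutations

import pandas as pd


def add_shuffles(df, cols, n, seed=None):
    """Hypothetical helper: append n rows whose `cols` values are distinct
    non-identity reorderings of the first row; other columns are copied."""
    rng = random.Random(seed)
    base = df.iloc[0]
    # All orderings of the chosen columns except the original one
    candidates = [p for p in permutations(cols) if list(p) != list(cols)]
    if n > len(candidates):
        raise ValueError(f"only {len(candidates)} distinct shuffles exist for {cols}")
    new_rows = []
    for perm in rng.sample(candidates, n):
        row = base.copy()
        # Column cols[i] receives the value originally stored in column perm[i]
        row[list(cols)] = base[list(perm)].to_numpy()
        new_rows.append(row)
    return pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)


df = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4, 5.0, 6.0, 3.0, 4.0, 4]],
    columns=["b_150", "h_200", "b_250", "h_300", "b_350", "h_400", "c1", "c2", "q4"],
)
df = add_shuffles(df, cols=["b_150", "b_250", "b_350"], n=1, seed=0)
df = add_shuffles(df, cols=["h_200", "h_300", "h_400"], n=1, seed=0)
print(df)
```

Enumerating permutations is cheap for 3 columns (6 orders), but the count grows as k! for k columns; for long column lists, drawing random orders with rng.sample(cols, len(cols)) and discarding duplicates would likely scale better.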