pandas DataFrame: add rows that are shuffles of the values of specific columns

Question

I have the dataframe:

df = b_150  h_200  b_250  h_300  b_350  h_400  c1  c2  q4
        1.     2.     3.      4     5.     6.  3.  4.   4

I want to add rows containing possible shuffles of the values within b_150, b_250, b_350 and, separately, within h_200, h_300, h_400.

So for example

df = add_shuffles(df, cols=['b_150', 'b_250', 'b_350'], n=1)
df = add_shuffles(df, cols=['h_200', 'h_300', 'h_400'], n=1)

This would add 2 shuffled rows (one for the b columns and one for the h columns), giving:

df = b_150  h_200  b_250  h_300  b_350  h_400  c1  c2  q4
        1.     2.     3.      4     5.     6.  3.  4.   4
        3.     2.     5.      4     1.     6.  3.  4.   4
        1.     2.     3.      6     5.     4.  3.  4.   4

What is the most efficient way to do it?
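As a rough sketch of the `add_shuffles(df, cols, n)` call described above (the function is hypothetical, not a pandas API), assuming the shuffles are drawn from the first row of `df`, that every column outside `cols` is copied unchanged, and that each added row should actually differ from the original order:

```python
import random

import pandas as pd


def add_shuffles(df, cols, n=1, seed=None):
    """Hypothetical helper: append n rows in which only the values of
    `cols` (taken from the first row of df) are permuted; every other
    column is copied unchanged."""
    rng = random.Random(seed)
    base = df.iloc[0]
    values = base[cols].tolist()
    if len(set(values)) < 2:
        raise ValueError("cols must hold at least two distinct values")
    new_rows = []
    for _ in range(n):
        perm = values[:]
        # Re-shuffle until the order actually differs from the original
        while perm == values:
            rng.shuffle(perm)
        row = base.copy()
        row[cols] = perm
        new_rows.append(row)
    return pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)


df = pd.DataFrame(
    [[1., 2., 3., 4, 5., 6., 3., 4., 4]],
    columns=['b_150', 'h_200', 'b_250', 'h_300', 'b_350', 'h_400', 'c1', 'c2', 'q4'],
)
df = add_shuffles(df, cols=['b_150', 'b_250', 'b_350'], n=1, seed=0)
df = add_shuffles(df, cols=['h_200', 'h_300', 'h_400'], n=1, seed=0)
print(df)
```

Because each call reads `df.iloc[0]`, the second call shuffles the h columns of the original row, not of the row added by the first call, which matches the expected output in the question.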

Do you want n random shuffles? Or all possible combinations (here 36)? – mozway, Mar 27 '22 at 07:26
@mozway I want to randomly choose n shuffles out of all possible ones for the columns in cols (which is passed to the function) – Cranjis, Mar 27 '22 at 07:30
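To make the count of 36 concrete: with three columns per group there are 3! = 6 orderings of each group, so 6 × 6 = 36 joint orderings. A minimal sketch of enumerating them and drawing n distinct ones without replacement (the variable names and n = 4 are illustrative):

```python
import itertools
import random

b_vals = [1., 3., 5.]   # values of b_150, b_250, b_350
h_vals = [2., 4., 6.]   # values of h_200, h_300, h_400

# Every ordering of the b values paired with every ordering of the h values
b_perms = list(itertools.permutations(b_vals))
h_perms = list(itertools.permutations(h_vals))
all_pairs = list(itertools.product(b_perms, h_perms))
print(len(all_pairs))  # 36

# Draw n distinct orderings without replacement
n = 4
chosen = random.sample(all_pairs, n)
for b_order, h_order in chosen:
    print(b_order, h_order)
```

The first entry of `all_pairs` is the original order of both groups; drop it (`all_pairs[1:]`) if the original row should not be repeated. If only one group may change per added row, filter `all_pairs` to the pairs where one side is still in its original order.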

score 1 · Answer 1 · 2022-03-27T14:48:42.967


Try:

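import random

import numpy as np
import pandas as pd

def columns_shuffler(cols):
    # Pick one of the two groups at random and permute only its column
    # names; the other group keeps its original order
    if random.choice([True, False]):
        return random.sample(cols[0], len(cols[0])) + cols[1]
    return cols[0] + random.sample(cols[1], len(cols[1]))

n = 2  # number of shuffled rows to add
msk = df.columns.str.startswith('b_')
msk1 = df.columns.str.startswith('h_')
cols = dict(enumerate([df.columns[msk].tolist(), df.columns[msk1].tolist()]))
others = df.columns[~(msk | msk1)].tolist()  # extra columns, copied as-is
# Read the group values in permuted column order; relabelling them with the
# unpermuted names below is what moves the values between columns
shuffled = np.vstack([df[columns_shuffler(cols)].to_numpy() for _ in range(n)])
new_rows = pd.DataFrame(np.c_[shuffled, np.tile(df[others].to_numpy(), (n, 1))],
                        columns=cols[0] + cols[1] + others)
out = pd.concat([df, new_rows], ignore_index=True)[df.columns]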
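If speed matters for large n, one alternative (a different technique from the answer's code, sketched here as a hypothetical helper `add_shuffles_np`) is to let NumPy permute the group values of all new rows at once with `Generator.permuted`, added in NumPy 1.20. It assumes one group is shuffled per call and that the rows are copies of the first row of `df`; unlike a re-draw loop, it does not rule out a draw that leaves the original order unchanged:

```python
import numpy as np
import pandas as pd


def add_shuffles_np(df, cols, n=1, seed=None):
    """Hypothetical vectorised helper: append n copies of df's first row in
    which the values of `cols` are permuted independently in each copy."""
    rng = np.random.default_rng(seed)
    base = df.iloc[[0]]                                 # keep it a DataFrame
    new = pd.concat([base] * n, ignore_index=True)      # n identical rows
    vals = np.tile(base[cols].to_numpy(), (n, 1))       # shape (n, len(cols))
    new[cols] = rng.permuted(vals, axis=1)              # shuffle within each row
    return pd.concat([df, new], ignore_index=True)


df = pd.DataFrame(
    [[1., 2., 3., 4, 5., 6., 3., 4., 4]],
    columns=['b_150', 'h_200', 'b_250', 'h_300', 'b_350', 'h_400', 'c1', 'c2', 'q4'],
)
out = add_shuffles_np(df, ['b_150', 'b_250', 'b_350'], n=3, seed=42)
print(out)
```

`permuted(..., axis=1)` shuffles each row of `vals` independently, so the n new rows generally get different orderings without a Python-level loop.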

edited Mar 27 '22 at 14:48

answered Mar 27 '22 at 08:57

Thanks, please note that when the h columns are shuffled, the b columns should remain intact, and vice versa – Cranjis Mar 27 '22 at 09:16
Sorry, I forgot to mention that there could be additional columns that should be kept; please see the edit – Cranjis Mar 27 '22 at 10:15
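Both requirements from these comments (only one group changes in each added row, and any additional columns stay as they are) can be checked on a result. A minimal sketch with a hypothetical `only_one_group_shuffled` helper, assuming each added row is compared, as a pandas Series, against the original row:

```python
import pandas as pd


def only_one_group_shuffled(original, new_row, groups):
    """Hypothetical check: True if new_row reorders the values of exactly one
    group in `groups` and leaves every other column unchanged."""
    grouped = {c for g in groups for c in g}
    others = [c for c in original.index if c not in grouped]
    if list(original[others]) != list(new_row[others]):
        return False
    changed = [g for g in groups if list(original[g]) != list(new_row[g])]
    return (len(changed) == 1
            and sorted(original[changed[0]]) == sorted(new_row[changed[0]]))


columns = ['b_150', 'h_200', 'b_250', 'h_300', 'b_350', 'h_400', 'c1', 'c2', 'q4']
groups = [['b_150', 'b_250', 'b_350'], ['h_200', 'h_300', 'h_400']]
original = pd.Series([1, 2, 3, 4, 5, 6, 3, 4, 4], index=columns)
b_shuffled = pd.Series([3, 2, 5, 4, 1, 6, 3, 4, 4], index=columns)     # 2nd row in the question
both_shuffled = pd.Series([3, 2, 5, 6, 1, 4, 3, 4, 4], index=columns)

print(only_one_group_shuffled(original, b_shuffled, groups))     # True
print(only_one_group_shuffled(original, both_shuffled, groups))  # False
```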

@okuoub did it work? – Apr 01 '22 at 15:06
