
I want to randomly choose a given number of rows from my dataframe and, for those rows only, multiply the 'eta' column by a number that is itself randomly chosen within a range. I'm stuck on how to achieve this.

So far I have randomly drawn a list of indices from my dataframe, but I don't know how to apply the multiplication only to the rows with those indices and not to the rest.

Any help would be truly appreciated!

numExtreme = 50  # Number of rows to apply the random modification to.
mod_indices = np.random.choice(df.index, numExtreme, replace=False)  # numExtreme distinct indices, randomly chosen.
Alberto

2 Answers


Use DataFrame.loc to select the random indices and only the eta column, then multiply by an array of random integers between 1 and 49 (the upper bound of np.random.randint is exclusive):

import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame({'eta':np.random.randint(10, size=10),
                   'another':np.random.randint(10, size=10)})
print (df)
   eta  another
0    2        9
1    2        0
2    6        0
3    1        9
4    3        3
5    9        4
6    6        0
7    1        0
8    0        4
9    1        1

numExtreme = 5  # Number of rows to apply the random modification to.
mod_indices = np.random.choice(df.index, numExtreme, replace=False)

print (mod_indices)
[5 1 0 8 6]

arr = np.random.randint(1, 50, len(mod_indices))
print (arr)
[49  8 42 36 29]

df.loc[mod_indices, 'eta'] *= arr
print (df)
   eta  another
0   84        9
1   16        0
2    6        0
3    1        9
4    3        3
5  441        4
6  174        0
7    1        0
8    0        4
9    1        1
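
If the multipliers should be floats drawn from an arbitrary range rather than integers, the same .loc pattern can be sketched with NumPy's Generator API. This is only a sketch under assumptions: the bounds `low` and `high`, the seed, and the toy data are placeholders that do not come from the question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen arbitrarily, for reproducibility only

# Toy frame with a float 'eta' column plus an unrelated string column (assumed data).
df = pd.DataFrame({'eta': np.arange(1.0, 11.0),
                   'name': list('abcdefghij')})
original = df.copy()

numExtreme = 3         # number of rows to modify
low, high = 2.0, 5.0   # assumed multiplier range, not taken from the question

# Pick numExtreme distinct index labels, then draw one factor per picked row.
mod_indices = rng.choice(df.index.to_numpy(), size=numExtreme, replace=False)
factors = rng.uniform(low, high, size=numExtreme)

# Only the selected rows of 'eta' change; every other cell is left alone.
df.loc[mod_indices, 'eta'] *= factors
print(df)
```

Because `factors` has the same length as `mod_indices`, each selected row gets its own multiplier, and the 'name' column is never touched.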
jezrael
  • But I want to also multiply each one by a random number within a range, and using loc would multiply all of them by the same number – Alberto Mar 28 '23 at 08:06
  • @Alberto - Added sample data, is it what you need? – jezrael Mar 28 '23 at 08:14
  • @jezrael But I have a question: in your solution there is only one column, so it multiplies that column. If I have other columns (a string column and other numerical columns), how can I indicate that the operation should apply to the 'eta' column only? – Alberto Mar 28 '23 at 08:25
  • @Alberto - If you use `df.loc[mod_indices, 'eta'] *= ` it multiplies only the rows whose indices are in `mod_indices`, and only the `eta` column. I added another column to the answer for testing. – jezrael Mar 28 '23 at 08:27

Since the operation is a multiplication, choose for each row between 1 (no change) and numExtreme:

df['eta'] *= np.random.choice([1, numExtreme], len(df))

IIUC, to choose a random number from 0 up to (but not including) numExtreme as the multiplier, you can also use randint:

df['eta'] *= np.random.choice([1, np.random.randint(numExtreme)], len(df))
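
The "multiply by 1 to leave a row unchanged" idea can also be made to hit exactly numExtreme rows, each with its own multiplier. The following is a sketch under assumptions: the range `low`/`high`, the seed, and the toy data are illustrative and not from the question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # arbitrary seed, for reproducibility only

df = pd.DataFrame({'eta': np.arange(1.0, 11.0)})  # assumed toy data
before = df['eta'].to_numpy().copy()

numExtreme = 2         # number of rows to modify
low, high = 2.0, 5.0   # assumed multiplier range

# Start with a multiplier of 1 for every row, which means "no change" ...
factors = np.ones(len(df))
# ... then overwrite numExtreme distinct positions with random factors.
positions = rng.choice(len(df), size=numExtreme, replace=False)
factors[positions] = rng.uniform(low, high, size=numExtreme)

# One vectorised multiplication over the whole column.
df['eta'] *= factors
print(df)
```

This variant works on row positions rather than index labels, so it does not depend on the dataframe having a unique index.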
Corralien
  • But if I do that I would multiply all the rows of my dataframe by a random number. What I want is to multiply only some rows by random numbers, and those rows have to be chosen randomly too – Alberto Mar 28 '23 at 08:08
  • @Alberto. I already updated my answer. Can you check it? Do you want to choose a random number between 0 and numExtreme then choose some random lines and apply the multiplication? – Corralien Mar 28 '23 at 08:09
  • No, I think I'm not explaining it well because of my level of English. numExtreme is the number of rows that I want to multiply, and these rows will be randomly chosen. The multiplier applied to those randomly chosen rows will also be a random number, and it has to be different for each row. For example, with a dataset of 10 rows, I want to choose two of them (numExtreme=2) and multiply each of those two rows by a random number within a range, but not by the same number. – Alberto Mar 28 '23 at 08:18