
I am working with Python in BigQuery and have a large DataFrame df (about 7 million rows). I also have a list day_list that holds some dates (say, all days in a given month).

I am trying to create an additional column "rand_day" in df with a random value from day_list in each row.

I tried a loop and an apply function, but with a dataset this large both are proving impractical.

My first attempt was a loop:

from random import sample

df["rand_day"] = ""

# assumes row_nr holds the index labels of df
for i in df["row_nr"]:
  rand_day = sample(day_list,1)[0]
  df.loc[i,"rand_day"] = rand_day

My second attempt used apply, first defining a function and then calling it:

def random_day():
  rand_day = sample(day_list,1)[0]
  return rand_day

# axis=1 calls the function once per row
df["rand_day"] = df.apply(lambda row: random_day(), axis=1)

Any tips on this? Thank you

Jo Costa

1 Answer


Use numpy.random.choice to draw all the values in one vectorized call and, if necessary, convert the dates with to_datetime:

import numpy as np
import pandas as pd

# small sample frame standing in for the real data
df = pd.DataFrame({
        'A':list('abcdef'),
        'B':[4,5,4,5,5,4],
})

day_list = pd.to_datetime(['2015-01-02','2016-05-05','2015-08-09'])
#alternative
#day_list = pd.DatetimeIndex(['2015-01-02','2016-05-05','2015-08-09'])

# one random date per row, drawn with replacement
df["rand_day"] = np.random.choice(day_list, size=len(df))
print(df)
   A  B   rand_day
0  a  4 2016-05-05
1  b  5 2016-05-05
2  c  4 2015-08-09
3  d  5 2015-01-02
4  e  5 2015-08-09
5  f  4 2015-08-09
jezrael
    I have a follow up question to the above @jezrael - how can I create a list of values and then add them to a dataframe with a given distribution? The above works to randomly add in the elements of a list, but say I have a list of values [50, 40, 30, 20, 10] is there a way to assign x% of my df the 50 value, y% 40, z% 30 etc... or assign them to the dataframe in a normal distribution across the len(df)? – CJ90 May 21 '20 at 16:40
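
One possible way to approach the percentage case in the comment above is the `p` argument of `numpy.random.Generator.choice`, which takes one probability per value. This is only a sketch: the frame, the column name `value`, and the weights are made up for illustration, and the shares are expected proportions, so each run matches them only approximately.

```python
import numpy as np
import pandas as pd

# Hypothetical frame and weights, chosen only for illustration.
df = pd.DataFrame({"A": range(10_000)})
values = [50, 40, 30, 20, 10]
weights = [0.40, 0.25, 0.20, 0.10, 0.05]  # one probability per value, summing to 1

rng = np.random.default_rng(seed=0)  # seeded so the run is repeatable
df["value"] = rng.choice(values, size=len(df), p=weights)

# Observed shares should land close to the requested weights.
shares = df["value"].value_counts(normalize=True)
print(shares)
```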
    Small note that the numpy docs now recommend using [`numpy.random.Generator.choice`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.choice.html) instead of [`numpy.random.choice`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html) – lazappi Feb 07 '22 at 13:02
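
For reference, a minimal sketch of the same assignment using the Generator API mentioned in that note. The sample frame and dates are the ones from the answer; the seed is an arbitrary choice added here only to make the run repeatable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": list("abcdef"), "B": [4, 5, 4, 5, 5, 4]})
day_list = pd.to_datetime(["2015-01-02", "2016-05-05", "2015-08-09"])

# default_rng() returns a numpy.random.Generator; the seed is optional.
rng = np.random.default_rng(seed=42)

# Draw one date per row, with replacement, from the underlying datetime64 array.
df["rand_day"] = rng.choice(day_list.to_numpy(), size=len(df))
print(df)
```

On a frame with millions of rows this is still a single vectorized call, so it should scale the same way as the numpy.random.choice version.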