I am building a dataframe and need to assign values from a predefined list to a new column. I found an answer that shows how to assign list elements randomly to a new column (How to assign random values from a list to a column in a pandas dataframe?).

However, I want to control the distribution of the list elements in the new column, either by giving each element a frequency of occurrence or by some other method that determines how many times each element appears in the dataframe.

For example, given my_list = [50, 40, 30, 20, 10] and a dataframe (df) with n rows, how can I assign 50 to 10% of the rows, 40 to 20%, 30 to 30%, 20 to 35%, and 10 to 5%?

Any other method of controlling the distribution of list elements is welcome; the example above just illustrates one way that controlling the frequency might look.

CJ90

1 Answer

You can use the choice function from numpy.random and pass the desired probability distribution through its p parameter.

>>> import numpy as np
>>> import pandas as pd
>>> a = np.random.choice([50, 40, 30, 20, 10], size=100, p=[0.1, 0.2, 0.3, 0.35, 0.05])
>>> pd.Series(a).value_counts().sort_index(ascending=False)
50     9
40    25
30    19
20    38
10     9
dtype: int64

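Just set the size parameter to the number of values you need (the dataframe's length, e.g. len(df)). Because the values are drawn at random, the observed counts only approximate the requested proportions, as the output above shows.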
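To tie this back to the dataframe in the question, here is a minimal sketch of two ways the assignment might look. The dataframe, the column names "sampled" and "exact", and the row count of 200 are made up for illustration. The first option is the random draw described above. The second is a different technique (repeat each value a fixed number of times, then shuffle) that may be useful if the proportions have to be hit exactly rather than approximately; it assumes the rounded counts add up to the number of rows.

```python
import numpy as np
import pandas as pd

values = [50, 40, 30, 20, 10]
probs = [0.10, 0.20, 0.30, 0.35, 0.05]

# Hypothetical dataframe standing in for the asker's df (200 rows chosen for illustration)
df = pd.DataFrame({"id": range(200)})

# Seeded generator so the result is reproducible
rng = np.random.default_rng(0)

# Option 1: draw each row independently; the shares only approximate probs
df["sampled"] = rng.choice(values, size=len(df), p=probs)

# Option 2: exact counts; repeat each value a fixed number of times, then shuffle
counts = np.round(np.array(probs) * len(df)).astype(int)
# Assumption: rounding leaves the total equal to len(df); otherwise adjust counts by hand
assert counts.sum() == len(df)
df["exact"] = rng.permutation(np.repeat(values, counts))

print(df["exact"].value_counts().sort_index(ascending=False))
```

With option 2 the row order is still random, but each value appears exactly as often as its share of the rows dictates.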

Viacheslav Zhukov
  • So the 'p=...' parameter is what I was missing. I am unfamiliar with the np.random.choice function and probably should've explored it a bit further. Thanks for the answer! – CJ90 May 21 '20 at 19:42