How to assign values from a list to a pandas dataframe and control the distribution/frequency each list element has in the dataframe

Question

I am building a dataframe and need to assign values from a predefined list to a new column. I found an answer that shows how to assign list elements to a new column at random (How to assign random values from a list to a column in a pandas dataframe?).

But I want to be able to control the distribution of the elements in my list within the new dataframe by either assigning a frequency of occurrences or some other method to control how many times each list element appears in the dataframe.

For example, given my_list = [50, 40, 30, 20, 10] and a dataframe (df) with n rows, how can I assign 50 to 10% of the rows, 40 to 20%, 30 to 30%, 20 to 35%, and 10 to 5%?

Any other way of controlling the distribution of list elements is welcome; the above is just a simple illustration of what controlling the frequency might look like.

score 1 · Accepted Answer · answered May 21 '20 at 19:02


You can use the choice function from numpy.random and pass the desired probability distribution through its p parameter.

>>> import numpy as np
>>> import pandas as pd
>>> a = np.random.choice([50, 40, 30, 20, 10], size=100, p=[0.1, 0.2, 0.3, 0.35, 0.05])
>>> pd.Series(a).value_counts().sort_index(ascending=False)
50     9
40    25
30    19
20    38
10     9
dtype: int64

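Just set the size parameter to the number of rows in the dataframe, e.g. size=len(df). Because each row is drawn independently, the observed counts only approximate the requested proportions, as in the output above.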
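To tie this back to the dataframe in the question, here is a minimal sketch of two options: independent random draws (frequencies approximate the target) and a fixed-count variant built with np.repeat and a shuffle (frequencies match exactly when the proportions divide the row count evenly). The dataframe, the column names "sampled" and "exact", and the seed are hypothetical, chosen only for illustration; the second option is an alternative technique, not part of the original answer.

```python
import numpy as np
import pandas as pd

values = [50, 40, 30, 20, 10]
probs = [0.10, 0.20, 0.30, 0.35, 0.05]  # must sum to 1

# Hypothetical dataframe with 200 rows, standing in for the real one.
df = pd.DataFrame({"id": range(200)})

# Seeded generator so the example is reproducible (seed is arbitrary).
rng = np.random.default_rng(seed=0)

# Option 1: independent draws per row; observed frequencies are
# only approximately equal to probs.
df["sampled"] = rng.choice(values, size=len(df), p=probs)

# Option 2: fixed counts per value, then shuffle the row order.
counts = np.round(np.array(probs) * len(df)).astype(int)
# Assumption: any rounding remainder is absorbed by the last value.
# For awkward proportions this simple adjustment may need more care.
counts[-1] = len(df) - counts[:-1].sum()
df["exact"] = rng.permutation(np.repeat(values, counts))

# Compare the resulting frequencies of both columns.
print(df["sampled"].value_counts(normalize=True).sort_index(ascending=False))
print(df["exact"].value_counts(normalize=True).sort_index(ascending=False))
```

Option 1 is usually enough when rows only need to follow the distribution on average; option 2 is worth considering when the counts themselves must hit the stated percentages.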


Viacheslav Zhukov


so the 'p=...' parameter is what I was missing. I am unfamiliar with the np.random.choice function, probably should've explored that a bit further. Thanks for the answer! – CJ90 May 21 '20 at 19:42
