Frequency Table from All DataFrame Data

Question

I want to generate a frequency table from all the values in a DataFrame. I do not need the index values, and the index can be discarded.

Sample data:

import numpy as np
import pandas as pd
col_list = ['ob1', 'ob2', 'ob3', 'ob4', 'ob5']
df = pd.DataFrame(np.random.uniform(73.965, 74.03, size=(25, 5)), columns=col_list)

My attempt, based on this answer:

my_bins = [i for i in np.arange(73.965, 74.030, 0.005)]
df2 = df.apply(pd.Series.value_counts, bins=my_bins)

The code crashes, and I can't find another example that does what I'm trying to do.

The desired output is a frequency table with counts for all values in bins. Something like this:

data_range	Frequency
73.965<=73.97	1
73.97<=73.975	0
73.98<=73.985	3
73.99<=73.995	2

And so on.

your code runs fine on my system. Btw, `bins=np.arange(73.965, 74.030, 0.005)` works as well. — Quang Hoang, Feb 07 '23 at 02:07
@QuangHoang - I appreciate the hints to make my code better, thank you! — Programming_Learner_DK, Feb 08 '23 at 01:00

score 1 · Accepted Answer · answered Feb 07 '23 at 02:24

Your approach/code works fine for me.

my_bins = [i for i in np.arange(73.965, 74.030, 0.005)]

out1 = (
        df.apply(pd.Series.value_counts, bins=my_bins)
          .sum(axis=1).reset_index()
          .set_axis(['data_range', 'Frequency'], axis=1)
       )

#32.6 ms ± 803 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
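To make the two steps of the `apply` route visible, here is a minimal sketch on a small made-up DataFrame. The values and the explicit bin edges are assumptions for illustration, not the question's data; explicit edges are used so the result does not depend on where `np.arange` stops.

```python
import pandas as pd

# Small deterministic stand-in for the question's random data (made-up values).
df = pd.DataFrame({
    "ob1": [73.966, 73.972, 73.981],
    "ob2": [73.968, 73.983, 73.984],
})

# Explicit bin edges (an assumption; the question builds them with np.arange).
my_bins = [73.965, 73.970, 73.975, 73.980, 73.985]

# Step 1: one column of per-bin counts for each original column.
per_column = df.apply(pd.Series.value_counts, bins=my_bins)

# Step 2: add the columns together to get a single count per bin,
# then order the rows by interval.
total = per_column.fillna(0).sum(axis=1).astype(int).sort_index()

print(per_column.sort_index())
print(total)
```

The `sum(axis=1)` is what turns the per-column table into a table over all values; without it the result still has one column per `ob` column.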

Here is a different approach (using `pd.cut`) that seems to be ~12x faster than `apply`.

my_bins = np.arange(73.965, 74.030, 0.005)

labels = [f"{np.around(l, 3)}<={np.around(r, 3)}"
          for l, r in zip(my_bins[:-1], my_bins[1:])]

out2 = (
        pd.Series(pd.cut(df.to_numpy().flatten(),
                         my_bins, labels=labels))
            .value_counts(sort=False).reset_index()
            .set_axis(['data_range', 'Frequency'], axis=1)
       )

#2.42 ms ± 45.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Output:

print(out2)

       data_range  Frequency
0   73.965<=73.97         16
1   73.97<=73.975          0
2   73.975<=73.98         15
3   73.98<=73.985         12
4   73.985<=73.99          7
..            ...        ...
7    74.0<=74.005          8
8   74.005<=74.01          9
9   74.01<=74.015          7
10  74.015<=74.02          7
11  74.02<=74.025         11

[12 rows x 2 columns]
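One thing to watch with the `pd.cut` route: by default the bins are right-closed, `(left, right]`, so a value exactly on the lowest edge, or anything above the last edge, becomes NaN and is silently left out of the counts. That matters here because the sample data is drawn up to 74.03 while the last bin shown ends at 74.025. `Series.value_counts(bins=...)` includes the lowest edge, so the two approaches can differ on that boundary. A minimal sketch with made-up values (the edges and data below are assumptions for illustration):

```python
import numpy as np
import pandas as pd

edges = [73.965, 73.970, 73.975]
values = np.array([73.965, 73.968, 73.970, 73.972, 73.990])

# Default pd.cut: 73.965 sits on the lowest edge and 73.990 lies past the
# last edge, so both come back as NaN.
default_cut = pd.cut(values, edges)
n_dropped = int(pd.isna(default_cut).sum())

# include_lowest=True also closes the first bin on the left.
inclusive_cut = pd.cut(values, edges, include_lowest=True)
counts = pd.Series(inclusive_cut).value_counts(sort=False)

print(n_dropped)
print(counts)
```

If every value should be counted, extending the last edge past the data's maximum and passing `include_lowest=True` is one way to do it.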
