
I have a DataFrame consisting of two columns as follows:

col1      col2
0.33      4.33
0.21      4.89
3.2       18.78
6.22      0.05
6.0       2.1
...       ...
...       ...

Now I would like to create a 200 x 200 numpy array by binning both columns, with col1 on the x-axis and col2 on the y-axis. col1 should be binned logarithmically from 0 to 68 and col2 logarithmically from 0 to 35. I would like to use logarithmic binning because there are more small values than large values (i.e. the bins should get wider as the values grow). The 200 x 200 array should then store the number of samples in each bin (i.e. the count).
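To make "the bins get wider as the values grow" concrete, here is a minimal sketch of logarithmically spaced bin edges. It assumes a lower edge of 0.001 (an arbitrary stand-in for 0, since log10(0) is -inf and a log scale cannot start at exactly 0). np.geomspace(a, b, n) produces the same edges as np.logspace(np.log10(a), np.log10(b), n).

```python
import numpy as np

# A log scale cannot start at 0, so 0.001 (an assumed value) stands in for it.
edges1 = np.geomspace(0.001, 68, num=201)  # 201 edges -> 200 bins for col1
edges2 = np.geomspace(0.001, 35, num=201)  # 201 edges -> 200 bins for col2

widths1 = np.diff(edges1)
# Each bin is a constant factor wider than the previous one,
# so small values get narrow bins and large values get wide bins.
print(widths1[:3])   # narrowest bins, near the lower edge
print(widths1[-3:])  # widest bins, near 68
print(np.allclose(widths1[1:] / widths1[:-1], widths1[1] / widths1[0]))  # True
```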

Is this possible to do in an efficient way?

machinery

1 Answer


Something like this might work for you. Note that a logarithmic scale cannot start at exactly 0, so you have to choose how close to zero the lower end is (0.001 below):

import numpy as np

# 201 log-spaced edges per axis give 200 bins; 0.001 stands in for 0
bins1 = np.logspace(np.log10(0.001), np.log10(68), num=201)
bins2 = np.logspace(np.log10(0.001), np.log10(35), num=201)

result = np.histogram2d(df['col1'], df['col2'], bins=[bins1, bins2])

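...where result[0] is the 200 x 200 array of counts, and result[1] and result[2] are the bin edges (the same as bins1 and bins2). Samples that fall outside the edges (for example, exact zeros) are not counted. Also, col1 runs along the first axis (rows) of result[0], so transpose it if you want col1 on the horizontal axis of a plot.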
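A self-contained sketch of the approach above, using a small hypothetical DataFrame built from the sample rows in the question plus one extra row with col1 == 0.0 (made up to show what happens below the lowest edge). Clipping the values into the binned range with pandas `Series.clip` is an optional extra, assuming you want zeros counted in the first bin rather than dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the sample rows from the question plus a row with col1 == 0.0.
df = pd.DataFrame({
    "col1": [0.33, 0.21, 3.2, 6.22, 6.0, 0.0],
    "col2": [4.33, 4.89, 18.78, 0.05, 2.1, 1.0],
})

bins1 = np.logspace(np.log10(0.001), np.log10(68), num=201)
bins2 = np.logspace(np.log10(0.001), np.log10(35), num=201)

counts, xedges, yedges = np.histogram2d(df["col1"], df["col2"], bins=[bins1, bins2])
print(counts.shape)  # (200, 200)
print(counts.sum())  # 5.0 (the row with col1 == 0.0 lies below the first edge)

# Optional (assumption): clip into [first edge, last edge] so out-of-range
# values land in the first or last bin instead of being dropped.
x = df["col1"].clip(bins1[0], bins1[-1])
y = df["col2"].clip(bins2[0], bins2[-1])
clipped_counts, _, _ = np.histogram2d(x, y, bins=[bins1, bins2])
print(clipped_counts.sum())  # 6.0

# counts[i, j]: col1 bin i along axis 0, col2 bin j along axis 1.
# Transpose so col1 runs along the horizontal axis of an image plot.
image = counts.T
```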

Rick M