My data contains ages from 18 to 69 (inclusive), and I want to plot a histogram showing their distribution.
I want the bin edges on the x-axis to be integers and to cover only the range 18 to 69 (inclusive), not something like 15 to 75.
I achieved this with the code below:
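import matplotlib.pyplot as plt

data = df['age']  # df holds my data; 'age' runs from 18 to 69
num_bins = 5
# Ideal (possibly fractional) width of each bin
bin_width = (max(data) - min(data)) / num_bins
# Truncate each edge down to an integer
int_bin_edges = [int(min(data) + i * bin_width) for i in range(num_bins + 1)]
plt.hist(data, bins=int_bin_edges, edgecolor='black')
plt.xticks(int_bin_edges)  # put the ticks exactly on the bin edges
plt.show()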
The problem is that the bins now have unequal widths. The data representation is still accurate, though, and I can clearly see how many data points fall within the range covered by each bin.
Is it OK to have unequal bin widths? The bins are 18-28 (width 10), 28-38 (width 10), 38-48 (width 10), 48-58 (width 10), and finally 58-69 (width 11), so the last bin is wider than the others.
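To make the unequal widths concrete, here is a minimal sketch that reproduces the edge arithmetic without my DataFrame. The names `lo`, `hi`, and `widths` are only for illustration, and the minimum and maximum are hard-coded to 18 and 69 as an assumption in place of `min(data)` and `max(data)`:

```python
# Stand-ins for min(data) and max(data); hard-coded for illustration only
lo, hi = 18, 69
num_bins = 5

# Same edge computation as in my plotting code: 51 / 5 = 10.2 per bin
bin_width = (hi - lo) / num_bins
int_bin_edges = [int(lo + i * bin_width) for i in range(num_bins + 1)]

# Width of each bin = difference between consecutive integer edges
widths = [right - left for left, right in zip(int_bin_edges, int_bin_edges[1:])]

print(int_bin_edges)  # [18, 28, 38, 48, 58, 69]
print(widths)         # [10, 10, 10, 10, 11]
```

The ideal width is 10.2, so truncating each edge drops the fractional part every time, and the leftover accumulates in the last bin.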
Or do you recommend any other solution to this problem?