csv['Followers'] is a column in a pandas DataFrame with about 20k rows, holding the follower count for each tweet collected via the Twitter API. I am trying to plot a histogram that separates the data into 4 bins so I can label each row by its bin. However, I only see one bar. Can anyone help with this? Thank you.

x = csv['Followers'].astype(int)
print(x)
x.plot.hist(bins = 4)

[Screenshot: failed attempt]

NYC Coder
Josh C

1 Answer

Try setting bins to 15, 25, 50, 75, 100, 200, or 1000 and see how the histogram changes.
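One way to see the effect of the bin count without plotting is to count how many rows fall into each bin. This is only a minimal sketch: the `followers` series is synthetic, and the heavy right skew it has is an assumption about your data, not something taken from it. `bin_counts` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for csv['Followers'] (assumption: most accounts are small,
# a few are very large). Replace with your real column.
rng = np.random.default_rng(0)
followers = pd.Series(rng.lognormal(mean=6.0, sigma=2.0, size=20_000)).astype(int)

def bin_counts(values, n_bins):
    """Return the number of rows in each of n_bins equal-width bins."""
    counts, _edges = np.histogram(np.asarray(values), bins=n_bins)
    return counts

for n_bins in (4, 15, 100, 1000):
    counts = bin_counts(followers, n_bins)
    # Share of rows in the first bin, and how many bins hold any rows at all.
    print(n_bins, "bins:", counts[0] / counts.sum(), int((counts > 0).sum()))
```

If nearly all rows sit in the first bin, the other bars are still there but too short to see next to it, which would look like a single bar.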

Step 1: Find the smallest and largest data point. If your smallest and/or largest numbers are not whole numbers, go to Step 2. If they are whole numbers, go to Step 3.

Step 2: Lower the minimum a little and raise the maximum a little. For example, 1.2 as a minimum becomes 1, and 99.9 as a maximum becomes 100.

Step 3: Decide how many bins you need, using your best guess and the guidelines in the article referenced at the end of this answer.

Step 4: Divide your range (largest number minus smallest number) by the number of bins you chose in Step 3 to get the bin size. For example, if you have numbers that range from 0 to 50, and you chose 5 bins, your bin size is 50/5 = 10.

Step 5: Create the bin boundaries by starting with your smallest number (from Steps 1 and 2) and adding the bin size from Step 4. For example, if your smallest number is 0 and your bin size is 10 you would have bin boundaries of 0, 10, 20…
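The steps above can be sketched in a few lines of Python. This is an illustration under assumptions, not a fixed recipe: `equal_width_edges` is a hypothetical helper name, and rounding with floor/ceil is just one way to do Step 2.

```python
import math

def equal_width_edges(values, n_bins):
    """Bin boundaries per Steps 1-5: round the range outward, then split it evenly."""
    lo = math.floor(min(values))   # Steps 1-2: lower the minimum to a whole number
    hi = math.ceil(max(values))    # Steps 1-2: raise the maximum to a whole number
    width = (hi - lo) / n_bins     # Step 4: range divided by the number of bins
    # Step 5: start at the smallest number and keep adding the bin size
    return [lo + i * width for i in range(n_bins + 1)]

print(equal_width_edges([0, 12, 50], 5))
print(equal_width_edges([1.2, 37.5, 99.9], 4))
```

To label each row by its bin, the resulting list can be passed to pandas, for example `pd.cut(x, bins=edges, include_lowest=True)`, where `x` is the follower column from the question.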

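The number of bins (class intervals) can also be estimated from the number of observations N with Sturges' rule, K = 1 + 3.322 log(N), where log is the base-10 logarithm. For 10 observations in the set, the number of class intervals is: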

K = 1 + 3.322 log(10) = 4.322 ≅ 4

For 55 observations in the set, the number of class intervals is:

K = 1 + 3.322 log(55) = 6.781 ≅ 7

In your case, with about 20,000 rows:

K = 1 + 3.322 log(20000) = 15.288 ≅ 15

so about 15 bins is a reasonable starting point.
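The same arithmetic can be checked in Python. The formula is commonly known as Sturges' rule; `sturges_bins` is a hypothetical helper name, and rounding to the nearest integer is an assumption that matches the worked examples above.

```python
import math

def sturges_bins(n_observations):
    """K = 1 + 3.322 * log10(N), rounded to the nearest whole number of bins."""
    return round(1 + 3.322 * math.log10(n_observations))

for n in (10, 55, 20_000):
    print(n, "observations ->", sturges_bins(n), "bins")
```

The result can then be used directly as the bin count, e.g. `x.plot.hist(bins=sturges_bins(len(x)))`.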

Reference: https://www.statisticshowto.com/choose-bin-sizes-statistics/

Zesty Dragon