I have a huge data set that I would like to bin and then plot, because plotting the raw data gives a very ugly plot.
Based on this, I computed the mean, standard deviation and size of 'ratio' in bins of width 1 over 'value', dropped the bins containing NaN values, and reset the index with the following code:
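import numpy as np
import pandas as pd
# Bin 'value' into intervals of width 1 and aggregate 'ratio' per bin
test = df.groupby(pd.cut(df['value'], bins=np.arange(160900)))['ratio'].agg(['mean', 'std', 'size'])
# Drop bins with any NaN statistic and move the interval index into a column
test_filtered = test[test[['mean', 'std', 'size']].notnull().all(axis=1)].reset_index()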
After that I get this:
               value        mean        std  size
0   (160088, 160089]  17.5080464  0.0777015    43
1   (160089, 160090]  17.5167586  0.0637891    25
2   (160188, 160189]  17.5099577  0.0892071    13
3   (160189, 160190]  17.4971442  0.0917634    60
4   (160288, 160289]  17.5440752  0.0659020    51
5   (160289, 160290]  17.5638237  0.0615202    64
6   (160290, 160291]  17.5382187  0.0294264     2
7   (160388, 160389]  17.5282669  0.1120136     2
8   (160389, 160390]  17.5479696  0.0794665    64
9   (160390, 160391]  17.5716048  0.0892945    15
10  (160391, 160392]  17.4969686  0.0284094     2
11  (160488, 160489]  17.5587446  0.0449601     5
12  (160489, 160490]  17.5566764  0.0636091    62
13  (160490, 160491]  17.5279026  0.0561810     2
14  (160588, 160589]  17.5922320  0.0126914     2
15  (160589, 160590]  17.5832962  0.0733587    25
16  (160590, 160591]  17.5607141  0.0706487    32
17  (160688, 160689]  17.5186035  0.0773348     6
18  (160689, 160690]  17.5234588  0.0816204    51
19  (160690, 160691]  17.4688810  0.0981311     4
20  (160788, 160789]  17.5797546  0.0264994     6
21  (160789, 160790]  17.5517244  0.0470787    51
22  (160790, 160791]  17.5600856  0.0720480     2
23  (160889, 160890]  17.5355430  0.0584237    34
So now the question is: how do I plot the mean over the value? I tried some code, but I only get a bunch of errors. Furthermore, the bin width is fixed to 1, but I may need a different width. Do you know how to specify a "bin window" other than 1?
Alternatively, do you know a better way to bin the data with a specific "bin window"?
Thanks in advance if you know how to fix the problem. ;)
Greets
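
For reference, here is a minimal sketch of one possible approach to both points above (a configurable bin width, and a numeric x value to plot the mean against). It is not the only way to do it. It assumes the columns are called 'value' and 'ratio' as in the code above; the names bin_stats, plot_binned and mid are made up for this illustration. Note that it swaps pd.cut for plain floor division, so the bins are left-closed [a, b) rather than (a, b] as in the table. If you prefer to stay with pd.cut, the bin width is the step argument of np.arange(start, stop, step).

```python
import numpy as np
import pandas as pd


def bin_stats(df, width, x="value", y="ratio"):
    """Aggregate y in bins of the given width over x (hypothetical helper)."""
    df = df.dropna(subset=[x, y])
    # Integer bin number per row: bin k covers [k * width, (k + 1) * width).
    k = np.floor(df[x] / width).astype(int)
    # Only bins that actually contain rows show up, so no empty bins to filter.
    out = df.groupby(k)[y].agg(["mean", "std", "size"])
    # Bin centre as a plain float, usable directly as an x coordinate.
    out["mid"] = (out.index.to_numpy() + 0.5) * width
    return out.reset_index(drop=True)


def plot_binned(binned):
    """Plot the per-bin mean with the std as error bars (needs matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.errorbar(binned["mid"], binned["mean"], yerr=binned["std"],
                fmt="o", capsize=3)
    ax.set_xlabel("value")
    ax.set_ylabel("mean ratio")
    return ax


# Tiny made-up data set, only to show the shape of the result.
demo = pd.DataFrame({"value": [0.2, 0.7, 1.1, 1.9, 4.5, 5.0],
                     "ratio": [1.0, 3.0, 2.0, 4.0, 6.0, 8.0]})
binned = bin_stats(demo, width=2)
print(binned)
```

With the real data this would be called as bin_stats(df, width=10) or similar, followed by plot_binned(...) on the result. Plotting against the numeric mid column avoids handing interval objects to the plotting call.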