
I feel a bit dumb asking this here, but it has been bothering me for a day or so. I have implemented the EM algorithm to approximate a bimodal distribution with a mixture of Gaussians. The algorithm returns the best mean and scale for each Gaussian, and the weight of each Gaussian in the mixture.
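For readers who want something concrete, here is a minimal sketch of that kind of EM fit. It is not the code from this question: the function names (`normal_pdf`, `em_gaussian_mixture`, `mixture_pdf`), the quantile-based initialisation and the fixed iteration count are all assumptions made for illustration, and it uses NumPy only.

```python
import numpy as np

def normal_pdf(x, mean, scale):
    """Density of N(mean, scale**2) evaluated at x."""
    z = (x - mean) / scale
    return np.exp(-0.5 * z * z) / (scale * np.sqrt(2.0 * np.pi))

def em_gaussian_mixture(samples, n_components=2, n_iter=200):
    """Fit a 1D Gaussian mixture with plain EM; returns (weights, means, scales)."""
    x = np.asarray(samples, dtype=float)
    # Crude initialisation (an assumption): spread the means over data quantiles.
    means = np.quantile(x, (np.arange(n_components) + 0.5) / n_components)
    scales = np.full(n_components, x.std())
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        dens = weights * normal_pdf(x[:, None], means, scales)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters from the weighted samples.
        total = resp.sum(axis=0)
        weights = total / x.size
        means = (resp * x[:, None]).sum(axis=0) / total
        scales = np.sqrt((resp * (x[:, None] - means) ** 2).sum(axis=0) / total)
    return weights, means, scales

def mixture_pdf(grid, weights, means, scales):
    """Weighted sum of the component densities on a grid of x values."""
    grid = np.asarray(grid, dtype=float)
    return (weights * normal_pdf(grid[:, None], means, scales)).sum(axis=1)
```

The smooth bimodal curve is then `mixture_pdf(grid, weights, means, scales)` evaluated on an evenly spaced, sorted `grid`, rather than anything computed from the order of the samples.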

I then generate a set of points following this distribution. The result is an unordered list that looks like this:

probabilities = [11.140773523849642, 4.46967411994397, 2.9845155606082128, 4.419381681516789, 2.9013319483609785, 1.240943279030751, 0.5527225927933941, 6.521762109511402, 5.0838857450649275, 7.22416091793929, 7.708476274313367
, 3.400862091414172, 5.309143788487473, 3.291903306564958, 8.219220567906923, 9.28251312280537, 5.756527435192811, 6.366828453732332, 7.650882460361641, 4.415611358046998, 2.770399408745053, 15.042873350009614, 7.432490645081263
, 2.156658281070685, 7.7385977648371576, 2.763962238550341, 2.3274606159747737, 6.843009378453477, 4.809080984042136, 2.2419994302113686, 5.166937390203131, 1.7374639743668725, 9.513892488528096, 11.671618814779752, 3.7908835798529865
, 7.3618182593063715, -0.4988551537419097, 4.340143260727068, 0.39000277136649153, 4.309033069871369, 6.341859284697849, 6.614984995785273, 7.064947672232784, 8.617868805757066, 14.311479061377447, -2.557407817786438, 4.317808825622402
, 0.8914674001923482, 4.8502372139935686, 4.661350050585709, 11.964055805692936, 12.712706738790857, 7.607274979881618, 10.514296741569105, 8.340863294256486, 3.369278998535397, 4.113286882565437, 8.864041070325223, 7.632896203315526
, 7.772172307769646, 3.130089802821244, 7.649463362936439, 5.573949174780778, 7.3368028720535, 9.282688590033873, 9.514781150353496, 6.358875662879916, 7.3298483651508635, 3.936572664094797, 7.798860141431769, 6.6721797819684125, 5.953232555833141
, 13.633830280270168, 5.20833247375714, 3.8258674845071243, 7.075294142390803, 5.1404831417882555, 6.571932650027719, 9.265563643288782, 2.7621131753987105, 7.323891123937256, 9.649292857396894, 1.1358491942097384, 9.850782863622442, 5.185392936240422
, 8.429380574259401, 11.138837663809847, 4.831176524579094, 3.9684180086907914, 1.972944947897461, 7.360544591066922, 8.057646877461318, 4.531662599943007, 6.1719788684651, 4.310418956896079, 6.739391572783465, 8.753247921488397, 12.151131966194196
, 7.401395134982768, 5.678642089476471, 3.229726380709782, 8.897309202709783, 1.3568813355743616, 9.635245016963387, 3.9053564093596167, -0.16269248951346071, 5.889837693166381, 2.32554054242099, 9.3168961092242, 9.9837775188, 6.691579402792799
, 6.960218167936941, 9.723678253753178, 11.761014802805606, 5.238731998428516, 7.115670665738658, 5.232147042111274, 8.517606481431613, 6.93065253523378, 8.283640433945022, 4.065774772657084, 8.2485608277015, 4.688627413195853, 10.237982833055758
, 7.354896754342418, 4.479680153503405, 2.5341799496329602, 9.806804277297294, 2.017720575270583, 0.4967892254628987, 10.456791361028099, 4.06454522278911, 9.774581384664836, 4.030113768037493, 11.7977582410657, -2.8115025654479, 4.697449277148616
, 12.001856914828505, 8.577210567787047, 9.17855420888803, 4.149947066252863, 9.246913843716163, 1.5673371671927239, 5.3368376993574875, 4.809889514876936, 5.241156817514549, 12.104605906201611, 9.007434227535827, 7.440245727624712, 1.4287722762358603
, 7.54003798087395, 6.978683720922006, 8.716622266623464, 4.127875121767966, 2.417254469671048, 5.671870858036384, 9.917075995980944, 7.4454585414498835, 9.363426763767578, 2.029307377765554, 4.014842669892369, 3.6441589238817564, 1.8556319830187666
, 3.4102775313770324, 7.479393669667967, 7.3196504839344305, 4.208453267867538, 1.843436585802852, 5.930024050617857, 6.488276365115011, 4.198748449662057, 6.888521964160391, 9.300118155015578, 3.1789495839655655, 4.1879900480863395, 7.410442873000186
, 4.282158345785579, 2.6350817771747908, 2.7455284415518793, 7.362768191200179, 10.58260994814674, 6.543987535825946, 5.864688486956145, 4.934248966426946, 9.229065070852101, 11.279002194745184, 9.940060268658033, 0.7396792604288569, 7.3824015070125935
, 6.775491294300273, 1.7625221112576668, 5.418925441788629, 9.750887598610085, 2.387482368182369, 7.420889767169091, 5.160151668455128, 4.810398973970154, 0.42276731843979753, 14.69601461447657, 10.08275531787912]

When I plot it with matplotlib using the following code:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-6.339576693117235 * 2, 6.339576693117235 * 2, 200) # Centered on mean
plt.plot(x, probabilities)
plt.show()

Unsurprisingly, I get a nonsensical graph like this one: [image]

That is not a normal distribution at all. Is there a way to pass my existing data to some function and compute a normal distribution from it? I have seen that SciPy provides a normal distribution function, but that is not exactly what I am looking for, since I already have a set of data.

EDIT: As Evert suggested in the comments below, I computed a histogram of these values (sorry for the quality):

[image: histogram]

But what I would actually like is the bell curve rather than the histogram. Is that possible? Thank you very much for the help.
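One way to get a bell curve from raw samples is to estimate the mean and standard deviation, then evaluate the normal density on an evenly spaced, sorted grid and draw that over a normalised histogram. The sketch below assumes a single Gaussian and uses synthetic data as a stand-in for the list above. The helper names (`fit_normal`, `normal_curve`, `plot_fit`) are made up for illustration. As far as I know, `scipy.stats.norm.fit` returns the same two maximum-likelihood estimates.

```python
import numpy as np

def fit_normal(samples):
    """Maximum-likelihood mean and standard deviation of the samples."""
    x = np.asarray(samples, dtype=float)
    return x.mean(), x.std()

def normal_curve(mean, scale, n_points=200, width=4.0):
    """Evenly spaced grid around the mean and the normal density on it."""
    grid = np.linspace(mean - width * scale, mean + width * scale, n_points)
    density = np.exp(-0.5 * ((grid - mean) / scale) ** 2) / (scale * np.sqrt(2.0 * np.pi))
    return grid, density

def plot_fit(samples, bins=20):
    """Draw a normalised histogram with the fitted bell curve on top."""
    import matplotlib.pyplot as plt  # only needed for drawing
    mean, scale = fit_normal(samples)
    grid, density = normal_curve(mean, scale)
    plt.hist(samples, bins=bins, density=True, alpha=0.5)  # area sums to 1
    plt.plot(grid, density)
    plt.show()

# Synthetic stand-in for the data (an assumption, not the list from the question).
rng = np.random.default_rng(0)
samples = rng.normal(loc=6.0, scale=3.0, size=5000)
mean, scale = fit_normal(samples)
```

The difference from the original plot is that the samples only feed the estimates. The y-values of the curve come from the density formula, so the order of the list no longer matters.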

Ecterion
  • The scatter looks normally distributed to me. Perhaps you want a histogram instead? –  Mar 31 '17 at 01:55
  • I would actually like a nice Gaussian curve; that is what is bothering me. I am surely doing something wrong, but I don't know what ... – Ecterion Mar 31 '17 at 01:58
  • A bell curve? Then you'll indeed want to create a histogram instead. That'll probably show you a mean of around 5 and a width of around 3. –  Mar 31 '17 at 01:59
  • Oh, but isn't it possible to get a bell curve? In fact I already went from a histogram to this data, and I would like a bell curve that fits my original histogram, which is the goal of the EM algorithm. I will edit my post to be a bit clearer. – Ecterion Mar 31 '17 at 02:01
  • Your data does not follow a Gaussian curve, it follows a Gaussian distribution; not the same. You'll need to bin the data first, and fit the resulting bin numbers to a Gaussian curve. You may want to step back, grab some paper and a pencil or a colleague, and think about what your data actually is. –  Mar 31 '17 at 02:13
  • I didn't ask for a histogram, but yes, that histogram *is* your bell curve; you can fit the histogram numbers to `a exp(-bx^2)`. –  Mar 31 '17 at 02:14
  • Check out: http://stackoverflow.com/questions/20011122/fitting-a-normal-distribution-to-1d-data – Robbie Mar 31 '17 at 02:17
  • Oh, thank you for the links and tips, that is exactly what I was looking for! – Ecterion Mar 31 '17 at 02:24
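The "bin first, then fit" suggestion from the comments can be sketched as follows. This is only one possible route, under stated assumptions: the model is `a exp(-b (x - c)^2)`, with a centre `c` added to the form quoted above, and the fit is a weighted quadratic fit to the logarithm of the bin counts with `np.polyfit`. A non-linear fit with `scipy.optimize.curve_fit` would be the more usual choice. The function name `fit_gaussian_to_histogram` is hypothetical.

```python
import numpy as np

def fit_gaussian_to_histogram(samples, bins=20):
    """Fit counts ~ a * exp(-b * (x - c)**2) to the binned samples."""
    counts, edges = np.histogram(samples, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    keep = counts > 0  # log(0) is undefined, so empty bins are skipped
    # log(a * exp(-b (x - c)^2)) = -b x^2 + 2 b c x + (log a - b c^2),
    # a parabola in x. Weights sqrt(count) approximate 1 / std of log(count).
    p2, p1, p0 = np.polyfit(centers[keep], np.log(counts[keep]), 2,
                            w=np.sqrt(counts[keep]))
    b = -p2
    if b <= 0:
        raise ValueError("histogram is not bell-shaped enough for this fit")
    c = p1 / (2.0 * b)
    a = np.exp(p0 + b * c * c)
    return a, b, c

def gaussian_curve(grid, a, b, c):
    """Evaluate the fitted curve on a grid, in the same units as the counts."""
    grid = np.asarray(grid, dtype=float)
    return a * np.exp(-b * (grid - c) ** 2)
```

The standard deviation follows from `b` as `1 / sqrt(2 * b)`. The curve is in bin-count units, so it can be drawn directly over an unnormalised histogram that uses the same bins.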

0 Answers