I have the following data:

import pandas as pd
from plotnine import *

gd_sp_tmp = pd.DataFrame({ 'variable': {0: 'var1', 1: 'var1', 2: 'var1', 3: 'var1', 4: 'var1', 5: 'var1', 6: 'var1', 7: 'var1', 8: 'var1', 9: 'var1', 10: 'var1', 11: 'var1', 12: 'var1', 13: 'var1', 14: 'var1', 15: 'var1', 16: 'var1', 17: 'var1', 18: 'var1', 19: 'var1', 20: 'var1', 21: 'var1', 22: 'var1', 23: 'var1', 24: 'var1', 25: 'var1', 26: 'var1', 27: 'var1', 28: 'var1', 29: 'var1', 30: 'var1', 31: 'var1', 32: 'var1', 33: 'var1', 34: 'var1', 35: 'var1', 36: 'var1', 37: 'var1', 38: 'var1', 39: 'var1', 40: 'var1', 41: 'var1', 42: 'var1', 43: 'var1', 44: 'var1', 45: 'var1', 46: 'var1', 47: 'var1', 48: 'var1', 49: 'var1', 50: 'var2', 51: 'var2', 52: 'var2', 53: 'var2', 54: 'var2', 55: 'var2', 56: 'var2', 57: 'var2', 58: 'var2', 59: 'var2', 60: 'var2', 61: 'var2', 62: 'var2', 63: 'var2', 64: 'var2', 65: 'var2', 66: 'var2', 67: 'var2', 68: 'var2', 69: 'var2', 70: 'var2', 71: 'var2', 72: 'var2', 73: 'var2', 74: 'var2', 75: 'var2', 76: 'var2', 77: 'var2', 78: 'var2', 79: 'var2', 80: 'var2', 81: 'var2', 82: 'var2', 83: 'var2', 84: 'var2', 85: 'var2', 86: 'var2', 87: 'var2', 88: 'var2', 89: 'var2', 90: 'var2', 91: 'var2', 92: 'var2', 93: 'var2', 94: 'var2', 95: 'var2', 96: 'var2', 97: 'var2', 98: 'var2', 99: 'var2'}, 'value': {0: 0.6058597809345508, 1: 0.5793863580299581, 2: 0.8464980992038321, 3: 0.24855227431181698, 4: 1.8852877490212698, 5: 0.4234171954404873, 6: 0.3435477323074209, 7: 3.358464370031963, 8: 0.5253401196517882, 9: 2.358632857360592, 10: 0.15960003602748035, 11: 0.2882705893127418, 12: 1.0995070639266127, 13: 0.3492611123700738, 14: 0.656410247866536, 15: 1.7926397942332677, 16: 0.2809984468410994, 17: 2.146319743864339, 18: 1.6912849075574694, 19: 1.233812138850312, 20: 0.21044290817060624, 21: 0.7130666643073327, 22: 0.521102906290718, 23: 0.8191663841868542, 24: 0.20231016020355008, 25: 1.542239677553837, 26: 0.07752167395995535, 27: 0.07661799644296931, 28: 0.13728522388491152, 29: 1.4268916808352554, 30: 1.2219293081314697, 31: 1.089318287649674, 
32: 0.5889304040483466, 33: 3.871173476569569, 34: 0.2571045126240674, 35: 0.27332795371650104, 36: 1.2121464473427577, 37: 2.0229834870080117, 38: 0.5538327169626888, 39: 0.3354345395246616, 40: 0.39169801317212116, 41: 1.0415690828271393, 42: 0.9584774133158281, 43: 0.13738535777663943, 44: 1.874003757544322, 45: 1.7852374480589213, 46: 1.6370785639935181, 47: 0.8738310745465996, 48: 0.4777945179886022, 49: 0.7289840311727211, 50: 0.7922955784270402, 51: 0.9104711980757718, 52: 1.5561240516907253, 53: 0.3303774972464219, 54: 2.110632552079527, 55: 0.49383897345236455, 56: 0.5328351983603986, 57: 1.101045960316634, 58: 0.6511245820579645, 59: 1.1162218482680217, 60: 1.1528904383298124, 61: 0.34335972679097204, 62: 1.018800464369946, 63: 0.5416579415333236, 64: 1.214519609326636, 65: 0.23298089233642374, 66: 1.2353245009353024, 67: 0.41366066807689983, 68: 0.3922217060873213, 69: 0.47724897903224234, 70: 1.2372675447604105, 71: 0.860009005949974, 72: 0.975115860544153, 73: 0.34103695692671854, 74: 3.715667756746576, 75: 0.8245813402150265, 76: 1.0146261204408322, 77: 1.429071625166872, 78: 1.1575801036803262, 79: 0.8892865356335216, 80: 1.4682387127243648, 81: 0.2790711201452777, 82: 0.21458250943662763, 83: 1.626193381231688, 84: 0.7862776167644395, 85: 0.8063680366888433, 86: 2.1349518016852866, 87: 0.16790682625128348, 88: 2.6898324320852316, 89: 3.1017929388719687, 90: 2.2161796611039484, 91: 0.27323366047568587, 92: 0.9876405202465337, 93: 0.5878226010690092, 94: 0.975411448085179, 95: 0.7933992437453187, 96: 1.3443593604932238, 97: 1.5392784611233619, 98: 1.1729165101630914, 99: 0.7643250100538129}})

I create the following histograms:

plot_posterior_test = ggplot(data=gd_sp_tmp) + \
                           geom_histogram(aes(x='value', y='stat(density)')) + \
                           facet_wrap('~variable')

I would like to overlay the pdf of a lognormal distribution with scale = 0.8 and location = -0.5 on every panel of this plot. Any ideas how I could do that using plotnine?

quant

1 Answer

Use `stat_function`. For example, given the code in your question, try this:

import scipy.stats as stats

(ggplot(data=gd_sp_tmp)
 + geom_histogram(aes(x='value'))
 + stat_function(fun=stats.lognorm.pdf, args=dict(s=.95, loc=-0.5, scale=0.8))  # scale must be > 0
 + facet_wrap('~variable')
)

It is up to you to make sure that the parameters are valid for the distribution (for `lognorm`, both `s` and `scale` must be positive); otherwise the pdf evaluates to NaN.
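
The NaN caveat can be checked directly before plotting. This is a minimal sketch, assuming scipy's `lognorm` parametrization (`s` is the shape, `loc` a shift, `scale` a multiplier). The values 0.95, 0.8 and -0.5 come from this thread. The last part is only an assumption about what the question may intend: if 0.8 and -0.5 are the standard deviation and mean of `log(x)`, the scipy equivalent would be `s=0.8, scale=exp(-0.5)`.

```python
import numpy as np
from scipy import stats

x = np.linspace(0.1, 4.0, 5)

# A negative scale is invalid for scipy's lognorm: every value comes back NaN.
bad = stats.lognorm.pdf(x, s=0.95, loc=0.8, scale=-0.5)

# Positive shape and scale give finite densities for x > loc.
good = stats.lognorm.pdf(x, s=0.95, loc=-0.5, scale=0.8)

# Assumed alternative reading: mu = -0.5 and sigma = 0.8 on the log scale.
mu, sigma = -0.5, 0.8
alt = stats.lognorm.pdf(x, s=sigma, scale=np.exp(mu))

# Textbook lognormal density, written out for comparison with `alt`.
manual = np.exp(-(np.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (x * sigma * np.sqrt(2 * np.pi))

print(np.isnan(bad).all(), np.isfinite(good).all(), np.allclose(alt, manual))
```

Whichever reading applies, the same keyword arguments can be passed to `stat_function` through `args`.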

has2k1
  • Thank you. It works that way. Now the problem is that the bars of the histogram are too high (and therefore the pdf is not very clear). Is it possible to show the `y-axis` as a percentage (so that the bars become lower) and the shape of the pdf is clearer? And if so, how? – quant Oct 03 '18 at 09:07
  • @quant, For the histogram you should plot the density so that area under each plot is one. `geom_histogram(aes(x='value', y='stat(density)'))`. See the [documentation](https://plotnine.readthedocs.io/en/stable/generated/plotnine.stats.stat_bin.html) for more. – has2k1 Oct 03 '18 at 14:52
  • This is what I am plotting – quant Oct 03 '18 at 14:57
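
The density scaling suggested in the comments can be illustrated outside plotnine. This is a rough sketch using NumPy's `histogram`, not plotnine's `stat_bin`, and the sample is a hypothetical stand-in for one panel's `value` column. It shows why raw counts dwarf a pdf while density-scaled bars have total area one and are directly comparable to it.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in data: 50 lognormal draws, like one panel of the plot.
values = rng.lognormal(mean=-0.5, sigma=0.8, size=50)

# Raw counts per bin versus density-normalized bar heights on the same bins.
counts, edges = np.histogram(values, bins=10)
density, _ = np.histogram(values, bins=10, density=True)
widths = np.diff(edges)

# Total bar area: n * bin width for counts, exactly 1 for density.
area_counts = (counts * widths).sum()
area_density = (density * widths).sum()

print(area_counts, area_density)
```

Since a pdf also integrates to one, a curve drawn with `stat_function` sits on the same vertical scale as density-scaled bars.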