Understanding the relationship between lognormal and normal distribution using scipy.stats and numpy

Question

I'd be very grateful if someone could help me understand where I'm going wrong. I have some data describing probability distributions. The data provides me with values for P10, P50 and P90. I also know that the distribution is lognormal.

I've read that if a random variable X is log-normally distributed, then Y = ln(X) has a normal distribution - e.g. Wikipedia (https://en.wikipedia.org/wiki/Log-normal_distribution).

However, when I try to verify this using scipy.stats and numpy, I cannot get it to hold. Since I know it is true, and I know there are no issues with the simple functions I'm using from these Python libraries, there must be a gap in my understanding somewhere. I just, for the life of me, cannot see what I'm missing...

The code I'm using is:

    import numpy as np
    import scipy.stats as ss

    # build a lognormal distribution with scipy.stats (ss):

    # set parameters (based on the standard normal distribution, mu=0 and sigma=1):
    s, mu, sd, size = 0.5, 0, 1, 100000

    # save the distribution:
    X = ss.lognorm.rvs(s, loc=mu, scale=sd, size=size)

    # convert to normal distribution (i.e. calc the natural log of X):
    Y = np.log(X)

    # Check if Y is normal using ratio between p90-p50 and p50-p10 - should be 1:
    p10, p50, p90 = np.percentile(Y, [10, 50, 90])
    print((p90 - p50) / (p50 - p10))

The above returns 0.9932 - or something else pretty close to 1. So far so good. I can vary s and scale as much as I like (or have tried so far) and the normal test always comes close to 1. The problem comes if I vary mean (mu, loc):

    # build a lognormal distribution with scipy.stats (ss):

    # set parameters (normal distribution, mu=100 and sigma=10):
    s, mu, sd, size = 0.5, 100, 10, 100000

    # save the distribution:
    X = ss.lognorm.rvs(s, loc=mu, scale=sd, size=size)

    # convert to normal distribution (i.e. calc the natural log of X):
    Y = np.log(X)

    # Check if Y is normal using ratio between p90-p50 and p50-p10 - should be 1:
    p10, p50, p90 = np.percentile(Y, [10, 50, 90])
    print((p90 - p50) / (p50 - p10))

In this instance the answer I get is around 1.8 - i.e. not a normal distribution. Like I say, I'm clearly misunderstanding something, but I can't see what it is.

In summary, if I use ss.lognorm.rvs to calculate a series of log normally distributed random variables with loc of anything other than 0, and then use np.log to get the natural log of the random variables, then this new distribution is not normally distributed which, on the surface, appears to violate the rule described at the top of the wikipedia article linked at the top of this question!

I'm very grateful for any help anyone can give me - I just want to be confident that I understand how to relate the lognormal data to a normal curve!

score 0 · Answer 1 · answered May 04 '20 at 12:40

Have a look at these methods to check how things work in scipy.stats:

    In [95]: ss.lognorm(s=0.1).mean()
    Out[95]: 1.005012520859401

    In [96]: np.exp(0.1**2 / 2)
    Out[96]: 1.005012520859401

    In [97]: ss.lognorm(s=0.1).var()
    Out[97]: 0.010151172942587642

    In [98]: (np.exp(0.1**2) - 1) * np.exp(0.1**2)
    Out[98]: 0.010151172942587642

I find the scipy.stats conventions a bit confusing and have to work through them each time.
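To connect this to the question: in scipy.stats.lognorm, s is the sigma of the underlying normal, scale is exp(mu), and loc only shifts the whole distribution along the x-axis. So loc and scale are not the mean and standard deviation of ln(X). With a nonzero loc it is X - loc that is lognormal, which would explain the ratio of about 1.8. Below is a minimal sketch of that check, reusing the parameters from the question; the helper name p_ratio and the seed are my own choices for illustration.

```python
import numpy as np
import scipy.stats as ss


def p_ratio(y):
    """Return (P90 - P50) / (P50 - P10); close to 1 for symmetric data."""
    p10, p50, p90 = np.percentile(y, [10, 50, 90])
    return (p90 - p50) / (p50 - p10)


# Same values as the second example in the question.
s, loc, scale, size = 0.5, 100, 10, 100_000

# Seeded generator so the run is repeatable (seed chosen arbitrarily).
rng = np.random.default_rng(0)
X = ss.lognorm.rvs(s, loc=loc, scale=scale, size=size, random_state=rng)

# Taking the log of the shifted variable directly: not symmetric.
ratio_raw = p_ratio(np.log(X))

# Remove the shift first, then take the log: this should look normal.
Y = np.log(X - loc)
ratio_shifted = p_ratio(Y)

# Under this parameterization Y should have mean ln(scale) and std s.
mu_hat, sigma_hat = Y.mean(), Y.std()

print("ratio for log(X):      ", ratio_raw)
print("ratio for log(X - loc):", ratio_shifted)
print("mean of Y vs ln(scale):", mu_hat, np.log(scale))
print("std of Y vs s:         ", sigma_hat, s)
```

If the goal is a lognormal whose log has mean mu and standard deviation sigma, one option is to leave loc at 0 and pass s=sigma, scale=np.exp(mu).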
