I have a column, df['Air_temperature'] (dtype float64).

I want to convert this column to a normal distribution so I can use the empirical rule to find the 95% and 99% CI. Any other approach for finding the 95% and 99% CI is also fine.

zi=df['Air_temperature'] 
from sklearn.preprocessing import MinMaxScaler
min_max=MinMaxScaler()
df_minmax=pd.DataFrame(min_max.fit_transform(zi))
df_minmax.head()

I tried this code, but I get the error "Expected 2D array, got 1D array instead". Even after applying a reshape operation I still get errors. Please suggest an approach to convert the data to a normal or standard normal distribution and find the CI.

Akash Desai
  • Does this answer your question? [Sklearn transform error: Expected 2D array, got 1D array instead](https://stackoverflow.com/questions/58498187/sklearn-transform-error-expected-2d-array-got-1d-array-instead) – G. Anderson Apr 08 '21 at 18:08
  • I want to convert that column into a normal distribution – Akash Desai Apr 08 '21 at 18:12
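
For the "Expected 2D array, got 1D array" error itself, the usual fix is to give the scaler a one-column DataFrame (double brackets) or a reshaped array, as in the minimal sketch below. The temperature values are made up for illustration. Note that min-max scaling only rescales the values to [0, 1]; it does not change the shape of the distribution, so on its own it will not make the data normal.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the real data; these values are invented.
df = pd.DataFrame({"Air_temperature": [295.3, 298.1, 300.4, 302.6, 304.5]})

scaler = MinMaxScaler()

# Option 1: double brackets select a one-column DataFrame (2D), not a Series (1D).
scaled_a = scaler.fit_transform(df[["Air_temperature"]])

# Option 2: reshape the 1D values into a single-column 2D array.
values_2d = df["Air_temperature"].to_numpy().reshape(-1, 1)
scaled_b = scaler.fit_transform(values_2d)

df_minmax = pd.DataFrame(scaled_a, columns=["Air_temperature_scaled"])
print(df_minmax.head())
```

Both options produce the same scaled column; the double-bracket form is usually the shorter one when the data already lives in a DataFrame.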

1 Answer

I would fit a Gaussian (normal distribution) curve to the data, then use the fitted distribution with the scipy.stats method .interval(0.95) to get the endpoints that contain the central 95% of the distribution.

example:

import pandas as pd
from scipy.stats import norm
import numpy as np
from matplotlib import pyplot as plt

normal = np.random.normal(size=1000)
noise = np.random.uniform(size=500, low=-2, high=2)
data = np.concatenate([normal, noise])   # some dummy data
# make it a DataFrame
df = pd.DataFrame(data=data, index=range(len(data)), columns=["data"])  
df.plot(kind="density")

########### YOU ARE HERE ###################

data = df["data"].to_numpy()                      # 1D NumPy array of the column
mu, std = norm.fit(data)                          # Fit a normal distribution
print("Mu and Std: ", mu, std)

CI_95 = norm.interval(0.95, loc=mu, scale=std)    # Find the 95% CI endpoints
print("Confidence Interval: ", CI_95)

plt.vlines(CI_95, ymin=0, ymax=0.4)               # plotting stuff
x = np.linspace(mu - 3*std, mu + 3*std, 100)      # x range covering +/- 3 std
plt.plot(x, norm.pdf(x, mu, std))                 # fitted normal PDF
plt.show()

OUTPUT:

Mu and Std:  -0.014830093874393395 1.0238114937847707
Confidence Interval:  (-2.0214637486506972, 1.9918035609019102)

Plot
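
The question also asks for a 99% interval and mentions the empirical rule. Here is a minimal sketch along the same lines, assuming the real column is named `Air_temperature` (synthetic values stand in for it here, and the seed is arbitrary): drop missing values, fit, then request both levels from `norm.interval`. Strictly speaking, these are the ranges expected to contain 95% and 99% of the values under the fitted normal, which is only meaningful if the data are roughly normal.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
# Synthetic stand-in for the real column; replace with your own DataFrame.
df = pd.DataFrame({"Air_temperature": rng.normal(loc=300.0, scale=2.0, size=1000)})

temps = df["Air_temperature"].dropna().to_numpy()  # 1D array, NaNs removed
mu, std = norm.fit(temps)                          # maximum-likelihood mean and std

interval_95 = norm.interval(0.95, loc=mu, scale=std)  # about mu +/- 1.96 * std
interval_99 = norm.interval(0.99, loc=mu, scale=std)  # about mu +/- 2.58 * std

# Empirical-rule check: fraction of observations within 2 std of the mean.
within_2_std = np.mean(np.abs(temps - mu) <= 2 * std)

print("95% interval:", interval_95)
print("99% interval:", interval_99)
print("Fraction within 2 std:", within_2_std)
```

If the fraction within 2 standard deviations is far from roughly 0.95, the normal fit is probably a poor description of the column and the intervals should be treated with caution.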

  • Yes, I tried it, thanks. I'm still getting this error: TypeError: float() argument must be a string or a number, not 'AxesSubplot' – Akash Desai Apr 08 '21 at 18:17
  • Look at the example code. It sounds like you're passing a matplotlib object to float(), which it can't interpret as a number. – franklinscudder Apr 08 '21 at 18:59