
I am trying to calculate the kurtosis and skewness of my data. I managed to create a table, but for some reason the result covers only a few columns and not all of the fields.

For example, as you can see, I have many fields (columns): enter image description here

I calculate the skewness and kurtosis using the following code:

import pandas as pd

# data is the DataFrame shown above
sk = pd.DataFrame(data.skew())
kr = pd.DataFrame(data.kurtosis())
sk['kr'] = kr
sk.rename(columns={0: 'sk'}, inplace=True)

but then I get a result that contains only about half of the columns I have:

enter image description here

I have tried head(10), but that doesn't change the fact that some columns disappeared.

How can I calculate this for all the columns?

Reut
  • Please make your data reproducible so it is easy for people to work on your problem – Ashwini Nov 25 '19 at 14:13
  • Have you tried to fillna values? Please try `data.fillna(0, inplace=True)` and check if all your columns are numerical – FBruzzesi Nov 25 '19 at 14:38

1 Answer


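It is hard to reproduce the problem without the original data. Most likely the missing columns contain non-numerical values. Older pandas versions silently drop columns they cannot reduce from the output of skew() and kurtosis(), while recent versions may raise a TypeError instead. A small example that shows this behavior: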

import pandas as pd

# 'lg3' holds the string 'po' in row "5", so it is not a purely numeric column
dat = {"1": {'lg1': 0.12, 'lg2': 0.23, 'lg3': 0.34, 'lg4': 0.45},
       "2": {'lg1': 0.12, 'lg2': 0.23, 'lg3': 0.34, 'lg4': 0.45},
       "3": {'lg1': 0.12, 'lg2': 0.23, 'lg3': 0.34, 'lg4': 0.45},
       "4": {'lg1': 0.12, 'lg2': 0.23, 'lg3': 0.34, 'lg4': 0.45},
       "5": {'lg1': 0.12, 'lg2': 0.23, 'lg3': 'po', 'lg4': 0.45}}

df = pd.DataFrame.from_dict(dat).T

print(df)
    lg1   lg2   lg3   lg4
1  0.12  0.23  0.34  0.45
2  0.12  0.23  0.34  0.45
3  0.12  0.23  0.34  0.45
4  0.12  0.23  0.34  0.45
5  0.12  0.23    po  0.45

print(df.kurtosis())
lg1    0
lg2    0
lg4    0

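The solution would be to preprocess the data so that every column you want statistics for is numeric, for example by converting or removing the non-numerical values.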
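
One possible preprocessing step, as a minimal sketch: it assumes the affected columns hold numbers mixed with stray text, and it uses `pd.to_numeric(..., errors="coerce")` to turn unparseable cells into NaN, which `skew()` and `kurtosis()` skip by default. The helper name `skew_kurtosis_table` and the sample values are hypothetical and not taken from the question.

```python
import pandas as pd


def skew_kurtosis_table(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its skewness ('sk') and kurtosis ('kr')."""
    # Coerce every column to numeric; cells that cannot be parsed become NaN.
    numeric = frame.apply(pd.to_numeric, errors="coerce")
    # skew() and kurtosis() ignore NaN by default (skipna=True).
    return pd.DataFrame({"sk": numeric.skew(), "kr": numeric.kurtosis()})


# Hypothetical sample: 'lg3' contains one stray string, so its dtype is object.
sample = pd.DataFrame(
    {
        "lg1": [0.10, 0.20, 0.40, 0.80, 1.60],
        "lg3": [0.30, 0.10, 0.50, 0.20, "po"],
    }
)
result = skew_kurtosis_table(sample)
print(result)
```

Coercion discards the offending cells rather than repairing them, so it is worth checking how many values become NaN before trusting the statistics. Filling with 0 instead, as suggested in the comments, would change the skewness and kurtosis.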

One word of advice: check whether the error is consistent, i.e. is it always the same columns that go missing?
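
To see which columns are affected and where the non-numerical values sit, something like the following sketch may help. It assumes the problem is non-numeric cells in object-typed columns. The sample frame is hypothetical and only mirrors the shape of the example above.

```python
import pandas as pd

# Hypothetical sample: 'lg3' holds one stray string in row "5".
df = pd.DataFrame(
    {
        "lg1": [0.12, 0.23, 0.34, 0.45, 0.56],
        "lg3": [0.12, 0.23, 0.34, 0.45, "po"],
    },
    index=["1", "2", "3", "4", "5"],
)

# Columns whose dtype is not numeric are the candidates for going missing.
non_numeric_cols = df.select_dtypes(exclude="number").columns.tolist()

# Within those columns, flag cells that are present but cannot be parsed as numbers.
suspect = df[non_numeric_cols]
bad_cells = suspect.apply(pd.to_numeric, errors="coerce").isna() & suspect.notna()
bad_rows = bad_cells[bad_cells.any(axis=1)].index.tolist()

print(non_numeric_cols)  # columns to inspect
print(bad_rows)          # row labels containing unparseable values
```

If the same columns show up every time, the cause is most likely in the data itself (for example a text placeholder for missing values) rather than in the skew/kurtosis calls.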

bajah