
I'm trying to find out to what degree the chemical properties in a wine dataset influence its quality score (the quality column).

The error:

ValueError: y data is not in domain of logit link function. Expected domain: [0.0, 1.0], but found [3.0, 9.0]
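For context: LogisticGAM uses the logit link, log(mu / (1 - mu)), which models the mean of a binary (0/1) response. That is why pyGAM only accepts y values in [0, 1], while the quality scores here run from 3 to 9. The plain-Python sketch below reproduces that bounds check and shows why the link is undefined for a score like 6. The helper names logit and in_logit_domain are hypothetical illustrations, not pyGAM functions, and the cut-off of 7 for a "good" wine is an arbitrary example.

```python
import math


def logit(p):
    # logit link: maps a mean in (0, 1) onto the whole real line
    return math.log(p / (1 - p))


def in_logit_domain(values):
    # same bounds the error message reports: "Expected domain: [0.0, 1.0]"
    return all(0.0 <= v <= 1.0 for v in values)


# quality scores visible in the printed output of the question
quality = [6, 6, 6, 6, 6, 6, 5, 6, 7, 6]
print(in_logit_domain(quality))  # False

try:
    logit(6)  # 6 / (1 - 6) is negative, so the log is undefined
except ValueError as exc:
    print("logit(6) is undefined:", exc)

# hypothetical binarization: 1 = "good" wine (score >= 7), 0 = otherwise
good = [int(q >= 7) for q in quality]
print(in_logit_domain(good))  # True
```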

The code:

import pandas as pd

from pygam import LogisticGAM

white_data = pd.read_csv("winequality-white.csv", sep=";")

X = white_data[[
    "fixed acidity","volatile acidity","citric acid","residual sugar","chlorides","free sulfur dioxide",
    "total sulfur dioxide","density","pH","sulphates","alcohol"
]]

# without (), this prints the bound method (which shows the data itself); .describe() gives summary statistics
print(X.describe)

y = white_data["quality"]  # already a Series of integer quality scores

print(y.describe)

white_gam = LogisticGAM().fit(X, y)  # raises the ValueError shown above

The output of that code:

<bound method NDFrame.describe of       fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  \
0               7.0              0.27         0.36            20.7      0.045   
1               6.3              0.30         0.34             1.6      0.049   
2               8.1              0.28         0.40             6.9      0.050   
3               7.2              0.23         0.32             8.5      0.058   
4               7.2              0.23         0.32             8.5      0.058   
...             ...               ...          ...             ...        ...   
4893            6.2              0.21         0.29             1.6      0.039   
4894            6.6              0.32         0.36             8.0      0.047   
4895            6.5              0.24         0.19             1.2      0.041   
4896            5.5              0.29         0.30             1.1      0.022   
4897            6.0              0.21         0.38             0.8      0.020   

      free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  \
0                    45.0                 170.0  1.00100  3.00       0.45   
1                    14.0                 132.0  0.99400  3.30       0.49   
2                    30.0                  97.0  0.99510  3.26       0.44   
3                    47.0                 186.0  0.99560  3.19       0.40   
4                    47.0                 186.0  0.99560  3.19       0.40   
...                   ...                   ...      ...   ...        ...   
4893                 24.0                  92.0  0.99114  3.27       0.50   
4894                 57.0                 168.0  0.99490  3.15       0.46   
4895                 30.0                 111.0  0.99254  2.99       0.46   
4896                 20.0                 110.0  0.98869  3.34       0.38   
4897                 22.0                  98.0  0.98941  3.26       0.32   

      alcohol  
0         8.8  
1         9.5  
2        10.1  
3         9.9  
4         9.9  
...       ...  
4893     11.2  
4894      9.6  
4895      9.4  
4896     12.8  
4897     11.8  

[4898 rows x 11 columns]>
<bound method NDFrame.describe of 0       6
1       6
2       6
3       6
4       6
       ..
4893    6
4894    5
4895    6
4896    7
4897    6
Name: quality, Length: 4898, dtype: int64>
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-71-e1c5720823a6> in <module>
     16 print(white_quality.describe)
     17 
---> 18 white_gam = LogisticGAM().fit(X, y)

~/miniconda3/lib/python3.7/site-packages/pygam/pygam.py in fit(self, X, y, weights)
    893 
    894         # validate data
--> 895         y = check_y(y, self.link, self.distribution, verbose=self.verbose)
    896         X = check_X(X, verbose=self.verbose)
    897         check_X_y(X, y)

~/miniconda3/lib/python3.7/site-packages/pygam/utils.py in check_y(y, link, dist, min_samples, verbose)
    227                              .format(link, get_link_domain(link, dist),
    228                                      [float('%.2f'%np.min(y)),
--> 229                                       float('%.2f'%np.max(y))]))
    230     return y
    231 

ValueError: y data is not in domain of logit link function. Expected domain: [0.0, 1.0], but found [3.0, 9.0]

The files (I'm using Jupyter Notebook, but you shouldn't need it): https://drive.google.com/drive/folders/1RAj2Gh6WfdzpwtgbMaFVuvBVIWwoTUW5?usp=sharing

Jonathan Woollett-light

1 Answer


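You probably want to use LinearGAM. LogisticGAM is for classification tasks: it models a binary (0/1) target through the logit link, which is why it rejects quality scores between 3 and 9. Since quality is a numeric score, treat it as a regression target instead.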

Boris