
I am running a linear regression, but I am getting an error that I can't fix. Please help me with this error. Thank you so much.

import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model

data = pd.read_csv(r"C:\Users\quynh.tranngoc\Desktop\B32_DownloadTable_20230508_100440.csv")
print(data)

data.shape

data.plot(kind= 'scatter',x = 'total_order', y ='top_up_by_seller')
plt.show()

data.plot(kind='box')
plt.show()

data.corr()

order = pd.DataFrame(data['total_order'])
seller_top_up = pd.DataFrame(['top_up_by_seller'])
print(order)

lm = linear_model.LinearRegression()
model =lm.fit(order, seller_top_up)

The error I am getting is:

File ~\Desktop\untitled0.py:30 in <module>
    model =lm.fit(order, seller_top_up)

  File ~\Anaconda3\lib\site-packages\sklearn\linear_model\_base.py:662 in fit
    X, y = self._validate_data(

  File ~\Anaconda3\lib\site-packages\sklearn\base.py:581 in _validate_data
    X, y = check_X_y(X, y, **check_params)

  File ~\Anaconda3\lib\site-packages\sklearn\utils\validation.py:979 in check_X_y
    y = _check_y(y, multi_output=multi_output, y_numeric=y_numeric)

  File ~\Anaconda3\lib\site-packages\sklearn\utils\validation.py:997 in _check_y
    y = y.astype(np.float64)

ValueError: could not convert string to float: 'top_up_by_seller'

I would really like to fix this error.

  • Hey there. The error seems to suggest that your target (`'top_up_by_seller'`) has a `string` type and cannot be converted to `float` which is the only type of object that a `LinearRegression` can predict. What is it that you are trying to predict? Are you sure that linear regression is the right approach? – Binpord May 12 '23 at 08:47
  • Yes, I'm pretty sure. I also checked the raw input data to see whether the column contains numbers. After checking a few times, the numbers are there, but the program seems unable to read them. – ngọc quỳnh trần May 12 '23 at 08:51
  • Oh, sorry, my bad. The error here is that as a target you are specifying `pd.DataFrame(['top_up_by_seller'])`, which is a dataframe with a single value of `'top_up_by_seller'`. I guess what you were aiming for is `pd.DataFrame(data['top_up_by_seller'])`. If I may, I would also suggest getting rid of the `pd.DataFrame` in there and just using `seller_top_up = data['top_up_by_seller']`, and the same for `order`, since I am pretty sure that those will yield a similar result. – Binpord May 12 '23 at 19:50
  • Thank you so much!! The problem has been solved. I also needed to use `astype` to convert `top_up_by_seller` to float so that my code could run. Thank you so much for the help. – ngọc quỳnh trần May 13 '23 at 17:28
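
For anyone landing here with the same traceback, the fix discussed in the comments can be sketched roughly as follows. This is a minimal sketch, not the asker's actual script: the small in-memory `data` frame and its values are made up to stand in for the CSV, and storing `top_up_by_seller` as text is an assumption based on the last comment. One caveat on the suggestion above: scikit-learn expects the features to be two-dimensional, so the sketch selects `total_order` with double brackets rather than as a plain Series.

```python
import pandas as pd
from sklearn import linear_model

# Hypothetical stand-in for the CSV in the question; the values are made up.
data = pd.DataFrame({
    "total_order": [10, 20, 30, 40, 50],
    # Assumed to be stored as text, which is why astype(float) is needed below.
    "top_up_by_seller": ["105", "205", "305", "405", "505"],
})

# What the original line builds: a 1x1 frame holding the column *name*,
# which is the string that scikit-learn then fails to convert to float.
wrong_target = pd.DataFrame(["top_up_by_seller"])

# Select the columns from `data` instead.
# Double brackets keep X two-dimensional (n_samples, n_features).
X = data[["total_order"]]
# The target can be a one-dimensional Series; convert it to float explicitly.
y = data["top_up_by_seller"].astype(float)

lm = linear_model.LinearRegression()
model = lm.fit(X, y)

slope = model.coef_[0]
intercept = model.intercept_
```

If `astype(float)` itself raises, the column probably contains non-numeric entries (for example thousands separators or blanks) that would need cleaning first.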
