Hello, I am new to Dask-ML. I have been trying to use Dask-ML to train a logistic regression model to predict tweet sentiment. I converted a pandas DataFrame to a Dask DataFrame, then performed a train/test split, and then applied a HashingVectorizer to X_train and X_test.
To check the shapes, I executed Train_X_vect.compute().shape
and it returned (180224, 7000),
whereas y_train.compute().shape
returned (180224,).
Whenever I try to fit them with a logistic regression model, I get an error saying "cannot add intercept to array with unknown chunk".
This is my code:
import dask.dataframe as dd
from dask_ml.feature_extraction.text import HashingVectorizer
from dask_ml.model_selection import train_test_split
from dask_ml.linear_model import LogisticRegression

# pandas_df is my pandas DataFrame of tweets, with "preprocess" and "target" columns
dask_df = dd.from_pandas(pandas_df, npartitions=4)
X_train, X_test, y_train, y_test = train_test_split(dask_df["preprocess"], dask_df["target"], random_state=42)

# Hash each tweet into a 7000-feature vector
vectorizer = HashingVectorizer(n_features=7000)
vectorizer.fit(X_train)
Train_X_vect = vectorizer.transform(X_train)
Test_X_vect = vectorizer.transform(X_test)

# This is the call that raises the error
lr = LogisticRegression()
lr.fit(Train_X_vect, y_train)
I also tried fit_intercept=False, but then I get this error: "IndexError: Index dimension must be <= 2"
Could you please tell me what I am doing wrong and how I should fix it? Thank you.