Questions tagged [sklearn-pandas]

Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames


1336 questions
0
votes
0 answers

Repetition of raw dataset after clustering

from sklearn.feature_extraction.text import TfidfVectorizer tfidf_vectorizer = TfidfVectorizer(max_df=0.08, max_features=200, min_df=0.02, stop_words='english', use_idf=True,…
Jeet Dadhich
  • 71
  • 1
  • 1
  • 6
0
votes
1 answer

How can I organize data using Pandas?

I'm a newbie at Python. I'm trying to organize a CSV file into a readable grid. When I converted my Excel file to CSV, the output became garbled, a mess of commas and scattered values. I tried list, but it still didn't organize the data the way I…
dabberson567
  • 43
  • 2
  • 2
  • 11
0
votes
1 answer

Number of features of the model must match the input

For some reason the features of this dataset are being interpreted as rows: "Model n_features is 16 and input n_features is 18189", where 18189 is the number of rows and 16 is the correct number of features. The suspect code is here: for var in cat_cols: …
AaronS
  • 23
  • 5
0
votes
1 answer

Performing PCA on a dataframe with Python with sklearn

I have a sample input file that has many rows of all variants, and columns represent the number of components. A01_01 A01_02 A01_03 A01_04 A01_05 A01_06 A01_07 A01_08 A01_09 A01_10 A01_11 A01_12 A01_13 A01_14 A01_15 A01_16 A01_17 …
user5927494
  • 129
  • 1
  • 10
0
votes
1 answer

Imputer with different types of values

Can the Imputer in sklearn deal with different types of data? For example, strings and numbers are both represented as ?, and when applying the Imputer it works with only one strategy.
shermanv
  • 13
  • 5
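
One way to apply a different strategy per data type is sketched below. It assumes the `?` placeholders are first converted to NaN, and it uses `SimpleImputer` from `sklearn.impute`, the class that replaced the older `Imputer`. The column names `age` and `city` and the sample values are hypothetical and not taken from the question.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Hypothetical data: "?" marks a missing value in a numeric and a string column.
df = pd.DataFrame({"age": [25, "?", 40, 31],
                   "city": ["Oslo", "Rome", "?", "Rome"]})

# Convert the placeholder to NaN so the imputers can recognise it.
df["age"] = pd.to_numeric(df["age"], errors="coerce")  # "?" becomes NaN
df["city"] = df["city"].mask(df["city"] == "?")        # "?" becomes NaN

# One imputer per column type, each with its own strategy.
imputer = ColumnTransformer([
    ("num", SimpleImputer(strategy="mean"), ["age"]),
    ("cat", SimpleImputer(strategy="most_frequent"), ["city"]),
])
filled = pd.DataFrame(imputer.fit_transform(df), columns=["age", "city"])
```

The mixed result comes back as an object-typed array, so the numeric column may need converting back with `pd.to_numeric` before modelling.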
-1
votes
2 answers

Sklearn Random Forest: determine the name of features ascertained by parameter grid for model fit and prediction

New to ML here and trying my hand at fitting a model using Random Forest. Here is my simplified code: X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.15, random_state=42) model = RandomForestRegressor() …
-1
votes
1 answer

Error when trying to fit a dataset. (python)

I am trying to fit a sklearn linear regression model with many points from a pandas DataFrame. This is the program: features =["floors", "waterfront", "lat", "bedrooms", "sqft_basement", "view", "bathrooms", "sqft_living15", "sqft_above", "grade",…
-1
votes
1 answer

ValueError: Input contains NaN, infinity or a value too large for dtype('float64') when using randomizedSearch

I am trying to use RandomizedSearchCV from sklearn on an MLPRegressor model, and I have scaled the data using StandardScaler. The code for the model is presented below. When I try to run the code I get this error from the…
-1
votes
1 answer

Do we need to exclude OneHotEncoded columns while standardizing or normalizing using MinMaxScaler() or StandardScaler()?

This is the final cleaned DataFrame (df2) before standardizing. My code: scaler=StandardScaler() df2[list(df2.columns)]=scaler.fit_transform(df2[list(df2.columns)]) df2 This returns a DataFrame after standardizing every column including dummies and…
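
If the dummy columns should be left untouched (whether they should is a modelling choice, not a rule), one possible approach is to scale only the continuous columns with a `ColumnTransformer`. This is a minimal sketch: the frame below and its column names (`income`, `age`, `city_A`, `city_B`) are hypothetical stand-ins for `df2`.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical frame: two continuous columns and two one-hot (dummy) columns.
df2 = pd.DataFrame({
    "income": [30.0, 50.0, 70.0, 90.0],
    "age": [20.0, 30.0, 40.0, 50.0],
    "city_A": [1, 0, 1, 0],
    "city_B": [0, 1, 0, 1],
})
numeric_cols = ["income", "age"]
dummy_cols = ["city_A", "city_B"]

# Standardize the continuous columns; pass the dummies through unchanged.
ct = ColumnTransformer(
    [("scale", StandardScaler(), numeric_cols)],
    remainder="passthrough",
)
# Transformed columns come first, followed by the passthrough columns.
scaled = pd.DataFrame(
    ct.fit_transform(df2),
    columns=numeric_cols + dummy_cols,
    index=df2.index,
)
```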
-1
votes
1 answer

How to implement regularization

My task was to implement model parameter tuning using stochastic gradient descent. Below is my function implementation code. However, I would like to add some regularization. def gradient(X, y, w, batch, alpha): gradients = [] error =…
-1
votes
1 answer

Polynomial Features Error: X has 10 features, but PolynomialFeatures is expecting 9 features as input

Today I'm modeling a dataframe using PolynomialFeatures from sklearn but I keep encountering this error: ValueError: X has 10 features, but PolynomialFeatures is expecting 9 features as input. Coming from the line where I generate the new data frame…
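
This kind of error typically appears when the data passed to `transform` has a different number of columns than the data used in `fit`. The sketch below reproduces the mismatch with random arrays. The shapes, and the choice to drop the last column as the fix, are assumptions for illustration only; the question's real data is not shown.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X_train = rng.normal(size=(5, 9))   # fitted on 9 features
X_new = rng.normal(size=(3, 10))    # new data carries 1 extra column

poly = PolynomialFeatures(degree=2).fit(X_train)

# Transforming data with a different column count raises ValueError.
try:
    poly.transform(X_new)
    mismatch = None
except ValueError as exc:
    mismatch = str(exc)

# Assumed fix: keep only the columns that were present at fit time.
X_new_aligned = X_new[:, :poly.n_features_in_]
expanded = poly.transform(X_new_aligned)
```

Comparing `poly.n_features_in_` with `X_new.shape[1]` before calling `transform` is a quick way to spot the extra column, which is often a target or index column left in by mistake.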
-1
votes
1 answer

RuntimeWarning: invalid value encountered in divide in ML By Sklearn in Python

After I run my project, this error is shown and I don't know what I am doing wrong: :\Users\Alir\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\utils\extmath.py:1047: RuntimeWarning: invalid value encountered in divide updated_mean =…
-1
votes
1 answer

What is an acceptable enough difference between the accuracy of the Train_set and Test_set?

I am working on a Data Science project which is a model to predict whether the imports are Fake or not. I have a training database on which one of my models is achieving up to 92-93% accuracy but on 51% of the test database, it is achieving only…
-1
votes
1 answer

Sklearn can't convert string to float

I'm using Sklearn as a machine learning tool, but every time I run my code, it gives this error: Traceback (most recent call last): File "C:\Users\FakeUserMadeUp\Desktop\Python\Machine Learning\MachineLearning.py", line 12, in
-1
votes
1 answer

Pandas groupby -- get output value based on max value of another column

I have the following dataframe: df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'], 'Habitat':['Jungle', 'Jungle', 'Sky', 'Sky'], …
DumbCoder
  • 233
  • 2
  • 9
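
A common pattern for this kind of lookup is `groupby(...).idxmax()` followed by `.loc`. The sketch below reuses the `Animal` and `Habitat` columns from the excerpt. The `speed` and `tag` columns are hypothetical, since the rest of the question's DataFrame is truncated.

```python
import pandas as pd

df = pd.DataFrame({
    "Animal": ["Falcon", "Falcon", "Parrot", "Parrot"],
    "Habitat": ["Jungle", "Jungle", "Sky", "Sky"],
    "speed": [380, 370, 24, 26],       # hypothetical column to maximise
    "tag": ["f1", "f2", "p1", "p2"],   # hypothetical column to return
})

# idxmax gives, per group, the index label of the row with the largest speed.
best_rows = df.loc[df.groupby("Animal")["speed"].idxmax()]

# Read the value of the other column from those rows.
result = best_rows.set_index("Animal")["tag"]
```

When several rows tie for the maximum, `idxmax` returns the first of them.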