Questions tagged [sklearn-pandas]

Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames

1336 questions
6
votes
1 answer

Python sklearn installation windows

When I try to install Python's sklearn package on Windows 10 using pip, I get an EnvironmentError saying that a specific file or directory does not exist: ERROR: Could not install packages due to an EnvironmentError: [Errno 2]…
Chuckster
  • 69
  • 1
  • 4
6
votes
2 answers

Sklearn's SimpleImputer doesn't work in a pipeline?

I have a pandas dataframe that has some NaN values in a particular column: 1291 NaN 1841 NaN 2049 NaN Name: some column, dtype: float64 And I have made the following pipeline in order to deal with it: from sklearn.preprocessing import…
Marcel
  • 223
  • 1
  • 3
  • 5
6
votes
2 answers

IF else and for loop in one line

I need to apply an if-else condition and a for loop in a single line. I need to keep both 'RL' and 'RM' and set all other values to 'Others'. How can I do this? Is it possible? train['MSZoning']=['RL' if x=='RL' else 'Others' for x in…
Anesh
  • 453
  • 3
  • 7
  • 15
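
One reading of the question above is to keep both 'RL' and 'RM' and map every other value to 'Others' in a single expression. A minimal sketch of that idea follows; a plain list stands in for the `train['MSZoning']` column, and the sample values and the names `zoning`, `keep`, and `recoded` are made up for illustration.

```python
# Hypothetical sample values standing in for train['MSZoning'].
zoning = ["RL", "RM", "FV", "RH", "RL", "C (all)"]

# Keep 'RL' and 'RM' unchanged; replace every other value with 'Others'.
keep = {"RL", "RM"}
recoded = [x if x in keep else "Others" for x in zoning]

print(recoded)
```

On a pandas column, `train['MSZoning'].where(train['MSZoning'].isin(['RL', 'RM']), 'Others')` should give the same result without an explicit loop.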
6
votes
3 answers

Reverse Label Encoding giving error

I label-encoded my categorical data into numerical data using LabelEncoder: data['Resi'] = LabelEncoder().fit_transform(data['Resi']) But when I try to find how they are mapped internally…
NgBrandon
  • 183
  • 2
  • 7
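
The excerpt above is cut off, so the exact error is not known. One common cause is that the fitted encoder is discarded right after `fit_transform`, which leaves nothing to query for the mapping. A minimal sketch, assuming that is the situation, keeps a reference to the encoder; the sample values and variable names are hypothetical.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical category values standing in for data['Resi'].
values = ["rural", "urban", "suburban", "urban", "rural"]

# Keep a reference to the fitted encoder instead of discarding it.
encoder = LabelEncoder()
codes = encoder.fit_transform(values)

# classes_ is sorted; the position of each class is its integer code.
mapping = {label: code for code, label in enumerate(encoder.classes_)}

# inverse_transform recovers the original labels from the integer codes.
restored = list(encoder.inverse_transform(codes))

print(mapping)
print(restored)
```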
6
votes
1 answer

Jaccard Similarity for Texts in a pandas DataFrame

I want to measure the Jaccard similarity between texts in a pandas DataFrame. More precisely, I have some groups of entities, and there is some text for each entity over a period of time. I want to analyse the text similarity (here, the Jaccard…
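
For reference, the Jaccard similarity of two sets is the size of their intersection divided by the size of their union. The sketch below applies that definition to word sets of two texts. Splitting on whitespace and lowercasing are assumptions made here, not details from the question, and the function name `jaccard` is illustrative.

```python
def jaccard(text_a: str, text_b: str) -> float:
    """Jaccard similarity of the lowercase word sets of two texts."""
    a = set(text_a.lower().split())
    b = set(text_b.lower().split())
    if not a and not b:
        return 1.0  # convention chosen here: two empty texts count as identical
    return len(a & b) / len(a | b)


# Two of three words are shared and the union has four words.
score = jaccard("the cat sat", "the cat ran")
print(score)
```

A function like this could be applied pairwise to the text column of each group, for example by comparing each row with the previous one in time order.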
6
votes
1 answer

Removing rows with a duplicate column pandas dataframe (Python)

I have a csv, which I read using pandas and created a dataframe. The dataframe looks like this: description title lorem ipsum A ipsum lorem A dolor sit amet C amet sit dolor B It has 1034 rows and 2 columns Now I want to…
Vipul Mehra
  • 141
  • 1
  • 2
  • 8
6
votes
1 answer

LabelBinarizer for multiple columns in data frame

I have a CSV file with 25 columns: some are numeric, some are categorical, and some are names of actors and directors. I want to use regression models on this data. To do so, I have to convert the categorical columns' string types to…
aks_Nin
  • 147
  • 4
  • 13
5
votes
3 answers

ValueError: Input 0 of layer "sequential" is incompatible with the layer: expected shape=(None, 455, 30), found shape=(None, 30)

This is a small cancer-detection project that already has the dataset and Colab code, but I get an error when I execute model.fit(x_train, y_train, epochs=1000) The error is: ValueError: Input 0 of layer "sequential" is incompatible…
Weber Wang
  • 51
  • 1
  • 1
  • 5
5
votes
1 answer

Group by MinMaxScaler in pandas dataframe

I would like to apply a min-max scaler to columns X2 and X3 in dataframe df and add columns X2_Scale and X3_Scale for each month. df = pd.DataFrame({ 'Month': [1,1,1,1,1,1,2,2,2,2,2,2,2], 'X1': [12,10,100,55,65,60,35,25,10,15,30,40,50], …
melik
  • 1,268
  • 3
  • 21
  • 42
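
A per-month min-max scaling like the one described above can be sketched with `groupby(...).transform`. This version uses the plain formula `(x - min) / (max - min)` rather than `MinMaxScaler` itself; for non-constant columns it should match the scaler's default range of 0 to 1. The data below is hypothetical because the question's own values are truncated, and `min_max` is an illustrative helper.

```python
import pandas as pd

# Hypothetical data; the question's own X2 and X3 values are truncated.
df = pd.DataFrame({
    "Month": [1, 1, 1, 2, 2, 2],
    "X2": [10.0, 20.0, 30.0, 5.0, 10.0, 15.0],
    "X3": [1.0, 3.0, 2.0, 100.0, 200.0, 400.0],
})


def min_max(col: pd.Series) -> pd.Series:
    """Scale one column to [0, 1]; assumes the column is not constant."""
    return (col - col.min()) / (col.max() - col.min())


# transform applies min_max to each column within each Month group
# and returns a frame aligned with the original index.
scaled = df.groupby("Month")[["X2", "X3"]].transform(min_max)

# Attach the scaled columns as X2_Scale and X3_Scale.
result = df.join(scaled.add_suffix("_Scale"))
print(result)
```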
5
votes
0 answers

sklearn Stacking Estimator passthrough skips preprocessing and passes original data

This issue has been raised here, but there have been no comments: https://github.com/scikit-learn/scikit-learn/issues/16473 I have some numerical features and categorical features in X. The categorical features were one hot encoded. So my pipeline…
Lim Kaizhuo
  • 714
  • 3
  • 7
  • 16
5
votes
3 answers

ModuleNotFoundError: No module named 'sklearn.externals.joblib'

I'm using Python 3, and trying to use joblib. I have the following I am trying to import: import sklearn.externals as extjoblib import joblib I receive the error: ModuleNotFoundError: No module named 'sklearn.externals.joblib' I try to use pip3…
greendaysbomb
  • 364
  • 2
  • 6
  • 23
5
votes
1 answer

Sklearn partial dependence plot returns ValueError: percentiles are too close to each other

I wanted to draw a partial dependence plot of some of the input variables against the target value. Using sklearn, I trained a gradient boosting model and then, with the obtained model, I ran sklearn.inspection.plot_partial_dependence. But, I get…
Afshin Oroojlooy
  • 1,326
  • 3
  • 21
  • 43
5
votes
1 answer

Min Max scaling for whole dataframe python

I am using from sklearn.preprocessing import MinMaxScaler with the following code and dataset: df = pd.DataFrame({ "A" :…
Nightingale
  • 133
  • 1
  • 7
5
votes
2 answers

cx_Freeze gives TypeError: expected str, bytes or os.PathLike object, not NoneType

I am trying to prepare an executable using cx_Freeze but get the following error : Traceback (most recent call last): File "setup.py", line 19, in executables = executables File…
Sam
  • 161
  • 1
  • 9
5
votes
3 answers

scikit-learn transformer that bins data based on user supplied cut points

I am trying to include a transformer in a scikit-learn pipeline that will bin a continuous data column into 4 values based on my own supplied cut points. The current arguments to KBinsDiscretizer do not work mainly because the strategy argument only…
Steven M. Mortimer
  • 1,618
  • 14
  • 36
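
One possible approach to binning with user-supplied cut points, sketched here under assumptions, is to wrap `numpy.digitize` in a `FunctionTransformer` so that it can be used as a pipeline step. This is not necessarily what the answers to the question propose. The cut points and sample data are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Hypothetical cut points; three edges give four bins (codes 0 to 3).
cut_points = [10.0, 20.0, 30.0]

# np.digitize returns, for each value, the index of the bin it falls into.
# With the default right=False, a value equal to an edge goes to the upper bin.
binner = FunctionTransformer(np.digitize, kw_args={"bins": cut_points})

# One continuous column with four sample rows.
X = np.array([[5.0], [10.0], [25.0], [99.0]])
binned = binner.fit_transform(X)

print(binned.ravel())
```

Because the transformer is stateless, `fit` learns nothing from the data; the bin edges come entirely from `cut_points`.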