Python module providing a bridge between scikit-learn's machine learning methods and pandas-style DataFrames.
Questions tagged [sklearn-pandas]
1336 questions
6 votes
1 answer
Python sklearn installation windows
When trying to install Python's sklearn package on Windows 10 using pip, I get an EnvironmentError saying that a specific file or directory does not exist:
ERROR: Could not install packages due to an EnvironmentError: [Errno 2]…

Chuckster
- 69
- 1
- 4
6 votes
2 answers
Sklearn's SimpleImputer doesn't work in a pipeline?
I have a pandas dataframe that has some NaN values in a particular column:
1291 NaN
1841 NaN
2049 NaN
Name: some column, dtype: float64
And I have made the following pipeline in order to deal with it:
from sklearn.preprocessing import…

Marcel
- 223
- 1
- 3
- 5
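
The pipeline in the excerpt above is cut off, so the actual cause of the problem is not visible. As a minimal sketch only: `SimpleImputer` is imported from `sklearn.impute` and expects 2-D input, so a single column is selected with double brackets. The sample values and the step name `impute` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline

# Hypothetical data; only the column name "some column" comes from the excerpt.
df = pd.DataFrame({"some column": [1.0, np.nan, 3.0, np.nan]})

# One-step pipeline that replaces NaN with the column mean.
pipeline = Pipeline([("impute", SimpleImputer(strategy="mean"))])

# Double brackets keep a 2-D frame; SimpleImputer rejects 1-D input.
filled = pipeline.fit_transform(df[["some column"]])
```

Here `filled` is a NumPy array with one column, where each NaN has been replaced by the mean of the observed values.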
6 votes
2 answers
IF else and for loop in one line
I need to apply an if-else condition and a for loop in a single line. I need to handle both 'RL' and 'RM' at once and set all other values to 'Others'. How can I do this? Is it possible?
train['MSZoning']=['RL' if x=='RL' else 'Others' for x in…

Anesh
- 453
- 3
- 7
- 15
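
For the one-line conditional asked about above, one possible sketch keeps both 'RL' and 'RM' and maps everything else to 'Others' by testing membership instead of a single equality. The sample rows and the name `keep` are hypothetical; only the `MSZoning` column and the three labels come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
train = pd.DataFrame({"MSZoning": ["RL", "RM", "FV", "RH", "RL"]})

# Labels to preserve; every other value becomes "Others".
keep = ("RL", "RM")

# Conditional expression inside a list comprehension, on a single line.
train["MSZoning"] = [x if x in keep else "Others" for x in train["MSZoning"]]

print(train["MSZoning"].tolist())
```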
6 votes
3 answers
Reverse Label Encoding giving error
I label-encoded my categorical data into numerical data using LabelEncoder:
data['Resi'] = LabelEncoder().fit_transform(data['Resi'])
But when I try to find how they are mapped internally…

NgBrandon
- 183
- 2
- 7
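
The excerpt above does not show the error itself. Assuming the goal is to inspect the mapping and recover the original labels, a minimal sketch keeps a reference to the fitted encoder rather than discarding it after `fit_transform`. The sample values are hypothetical; only the column name `Resi` comes from the question.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical data; only the column name "Resi" comes from the question.
data = pd.DataFrame({"Resi": ["rural", "urban", "rural", "suburban"]})

# Keep the fitted encoder so the mapping can be inspected and reversed later.
encoder = LabelEncoder()
data["Resi"] = encoder.fit_transform(data["Resi"])

# classes_ holds the original labels; the label at position i has code i.
mapping = {label: code for code, label in enumerate(encoder.classes_)}

# inverse_transform turns the integer codes back into the original labels.
original = encoder.inverse_transform(data["Resi"])
```

`inverse_transform` only works on the same encoder instance that was fitted, which is why the encoder is stored in a variable here.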
6 votes
1 answer
Jaccard Similarity for Texts in a pandas DataFrame
I want to measure the jaccard similarity between texts in a pandas DataFrame.
More precisely, I have some groups of entities, and there is some text for each entity over a period of time. I want to analyse the text similarity (here, the Jaccard…

alex_rieber
- 61
- 1
- 4
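
The question above is truncated, so the exact comparison it asks for is unknown. As a sketch under the assumption that each text is compared with the previous text of the same entity, Jaccard similarity can be computed on whitespace-separated word sets. The column names `entity` and `text`, the sample rows, and the helper `jaccard` are all hypothetical.

```python
import pandas as pd


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the lower-cased word sets of two strings."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    # Convention chosen here: two empty texts have similarity 0.0.
    return len(set_a & set_b) / len(union) if union else 0.0


# Hypothetical data: two entities with two texts each, in time order.
df = pd.DataFrame({
    "entity": ["a", "a", "b", "b"],
    "text": ["the cat sat", "the cat ran", "red blue", "green yellow"],
})

# Previous text within the same entity; missing for each entity's first row.
df["prev_text"] = df.groupby("entity")["text"].shift()

# Similarity to the previous text, or NaN where there is no previous text.
df["jaccard"] = [
    jaccard(t, p) if isinstance(p, str) else float("nan")
    for t, p in zip(df["text"], df["prev_text"])
]
```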
6 votes
1 answer
Removing rows with a duplicate column pandas dataframe (Python)
I have a CSV file, which I read into a pandas DataFrame.
The DataFrame looks like this:
description      title
lorem ipsum      A
ipsum lorem      A
dolor sit amet   C
amet sit dolor   B
It has 1034 rows and 2 columns.
Now I want to…

Vipul Mehra
- 141
- 1
- 2
- 8
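
The question above is cut off after "Now I want to…". Going by its title, and assuming the goal is to keep one row per `title` value, a minimal sketch uses `DataFrame.drop_duplicates` with `subset`. The four rows are the ones shown in the excerpt; keeping the first occurrence is an assumption.

```python
import pandas as pd

# The four sample rows shown in the excerpt.
df = pd.DataFrame({
    "description": ["lorem ipsum", "ipsum lorem", "dolor sit amet", "amet sit dolor"],
    "title": ["A", "A", "C", "B"],
})

# Keep the first row for each distinct title and drop later repeats.
deduped = df.drop_duplicates(subset="title", keep="first")
```

Passing `keep=False` instead would drop every row whose title appears more than once.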
6 votes
1 answer
LabelBinarizer for multiple columns in data frame
I have a CSV file with 25 columns: some are numeric, some are categorical, and some contain names of actors and directors. I want to use regression models on this data. To do so, I have to convert the categorical columns' string types to…

aks_Nin
- 147
- 4
- 13
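
For encoding several categorical columns at once, as asked above, one common alternative to `LabelBinarizer` is `OneHotEncoder` wrapped in a `ColumnTransformer`. This is a sketch, not the accepted answer; the column names and sample rows are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: two categorical columns and one numeric column.
df = pd.DataFrame({
    "genre": ["drama", "comedy", "drama"],
    "rating": ["PG", "R", "PG"],
    "budget": [10.0, 20.0, 30.0],
})
categorical = ["genre", "rating"]

encoder = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)],
    remainder="passthrough",  # numeric columns pass through unchanged
    sparse_threshold=0,       # always return a dense array
)

# One-hot columns come first, followed by the passed-through columns.
encoded = encoder.fit_transform(df)
```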
5 votes
3 answers
ValueError: Input 0 of layer "sequential" is incompatible with the layer: expected shape=(None, 455, 30), found shape=(None, 30)
Here is a small cancer-detection project. It already has the dataset and Colab code, but I get an error when I execute
model.fit(x_train, y_train, epochs=1000)
The error is:
ValueError: Input 0 of layer "sequential" is incompatible…

Weber Wang
- 51
- 1
- 1
- 5
5 votes
1 answer
Group by MinMaxScaler in pandas dataframe
I would like to apply MinMaxScaler to columns X2 and X3 in DataFrame df and add columns X2_Scale and X3_Scale for each month.
df = pd.DataFrame({
'Month': [1,1,1,1,1,1,2,2,2,2,2,2,2],
'X1': [12,10,100,55,65,60,35,25,10,15,30,40,50],
…

melik
- 1,268
- 3
- 21
- 42
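
The data frame in the question above is truncated, so the sketch below uses a small hypothetical frame with the same column names. It applies min-max scaling per month with plain pandas arithmetic inside `groupby(...).transform`, rather than `MinMaxScaler` itself; the helper `min_max` is a hypothetical name.

```python
import pandas as pd

# Hypothetical data; the column names follow the question, the values do not.
df = pd.DataFrame({
    "Month": [1, 1, 1, 2, 2, 2],
    "X2": [10.0, 20.0, 30.0, 5.0, 10.0, 15.0],
    "X3": [1.0, 3.0, 5.0, 100.0, 150.0, 200.0],
})


def min_max(col):
    # Scale to [0, 1]; a constant group gives 0/0 and therefore NaN.
    return (col - col.min()) / (col.max() - col.min())


# transform returns a result aligned with the original rows, per month.
scaled = df.groupby("Month")[["X2", "X3"]].transform(min_max)

# Attach the scaled values as X2_Scale and X3_Scale.
df = df.join(scaled.add_suffix("_Scale"))
```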
5 votes
0 answers
sklearn Stacking Estimator passthrough skips preprocessing and passes original data
This issue has been raised here, but there have been no comments: https://github.com/scikit-learn/scikit-learn/issues/16473
I have some numerical features and categorical features in X. The categorical features were one hot encoded. So my pipeline…

Lim Kaizhuo
- 714
- 3
- 7
- 16
5 votes
3 answers
ModuleNotFoundError: No module named 'sklearn.externals.joblib'
I'm using Python 3 and trying to use joblib. I am trying to import the following:
import sklearn.externals as extjoblib
import joblib
I receive the error: ModuleNotFoundError: No module named 'sklearn.externals.joblib'
I try to use pip3…

greendaysbomb
- 364
- 2
- 6
- 23
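
The vendored `sklearn.externals.joblib` module was deprecated and later removed from scikit-learn, so the usual approach for the error above is to import the standalone `joblib` package directly. A minimal sketch, assuming `joblib` is installed; the `payload` object and file name are hypothetical.

```python
import os
import tempfile

import joblib  # standalone package, not sklearn.externals.joblib

# Hypothetical object to persist.
payload = {"weights": [0.1, 0.2, 0.3]}

# Round-trip the object through a file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "payload.joblib")
    joblib.dump(payload, path)
    restored = joblib.load(path)
```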
5 votes
1 answer
Sklearn partial dependence plot returns ValueError: percentiles are too close to each other
I wanted to draw a partial dependence plot of some of the input variables against the target value. Using sklearn, I trained a gradient boosting model and then ran sklearn.inspection.plot_partial_dependence on it. But I get…

Afshin Oroojlooy
- 1,326
- 3
- 21
- 43
5 votes
1 answer
Min Max scaling for whole dataframe python
I am using from sklearn.preprocessing import MinMaxScaler
with the following code and dataset:
df = pd.DataFrame({
"A" :…

Nightingale
- 133
- 1
- 7
5 votes
2 answers
cx_Freeze gives TypeError: expected str, bytes or os.PathLike object, not NoneType
I am trying to prepare an executable using cx_Freeze but get the following error:
Traceback (most recent call last):
File "setup.py", line 19, in <module>
executables = executables
File…

Sam
- 161
- 1
- 9
5 votes
3 answers
scikit-learn transformer that bins data based on user supplied cut points
I am trying to include a transformer in a scikit-learn pipeline that will bin a continuous data column into 4 values based on my own supplied cut points. The current arguments to KBinsDiscretizer do not work, mainly because the strategy argument only…

Steven M. Mortimer
- 1,618
- 14
- 36
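
For binning on user-supplied cut points inside a pipeline, as asked above, one possible sketch wraps `numpy.digitize` in a `FunctionTransformer`. This is not necessarily what the answers propose; the cut points, the helper `bin_by_cut_points`, and the sample values are hypothetical.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Hypothetical cut points: three edges give four bins, coded 0 to 3.
CUT_POINTS = [10.0, 20.0, 30.0]


def bin_by_cut_points(X):
    # For each value, return the index of the bin it falls into.
    # A value equal to an edge goes into the bin to the right of it.
    return np.digitize(X, bins=CUT_POINTS)


binner = FunctionTransformer(bin_by_cut_points)
pipeline = Pipeline([("bin", binner)])

# One continuous column with a value in each of the four bins.
X = np.array([[5.0], [10.0], [25.0], [99.0]])
binned = pipeline.fit_transform(X)
```

Because the cut points are fixed rather than learned, the transformer is stateless and `fit` does nothing beyond input checks.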