Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames
Questions tagged [sklearn-pandas]
1336 questions
8 votes, 1 answer
Arrange bar chart in ascending / descending order
I have a random forest feature importance procedure. The feature importances have been generated for each variable, and I have plotted them on a horizontal bar graph.
Now I would like to sort the bars into ascending / descending order.…

Ombre
- 103
- 1
- 3
- 10
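A minimal sketch of one way to get sorted bars, assuming the importances are held in a pandas Series. The feature names and values below are hypothetical; with a fitted forest they would typically come from its feature_importances_ attribute.

```python
import pandas as pd

# Hypothetical importances, indexed by hypothetical feature names.
importances = pd.Series(
    [0.10, 0.45, 0.05, 0.40],
    index=["age", "income", "height", "score"],
)

# Sort before plotting. A horizontal bar plot draws the first row at the
# bottom, so ascending order places the largest bar at the top.
ordered = importances.sort_values(ascending=True)
print(ordered)

# ordered.plot.barh() would then draw the bars in this order (needs matplotlib).
```

Passing ascending=False to sort_values reverses the order.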
8 votes, 4 answers
Standardization before or after categorical encoding?
I'm working on a regression algorithm, in this case k-nearest neighbors, to predict the price of a product.
I have a training set that has only one categorical feature with 4 possible values. I've dealt with it using a one-to-k categorical…

Franch
- 621
- 4
- 9
- 22
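One way to sidestep the ordering question is to scale only the numeric columns and one-hot encode the categorical one inside a single ColumnTransformer. The sketch below assumes that approach; the column names and values are made up for illustration, and whether the encoded columns should also be scaled for a distance-based model is left open.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data: two numeric features, one categorical feature with
# 4 possible values, and a price target.
df = pd.DataFrame({
    "weight": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "size": [10.0, 20.0, 15.0, 30.0, 25.0, 40.0],
    "category": ["a", "b", "c", "d", "a", "b"],
    "price": [5.0, 9.0, 8.0, 14.0, 11.0, 18.0],
})

preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["weight", "size"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["category"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

model = make_pipeline(preprocess, KNeighborsRegressor(n_neighbors=2))
X = df.drop(columns="price")
model.fit(X, df["price"])

# Inspect the preprocessed features: 2 scaled columns + 4 one-hot columns.
features = model[:-1].transform(X)
print(features.shape)
```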
8 votes, 2 answers
sklearn: Found input variables with inconsistent numbers of samples: [1, 99]
I'm trying to build a simple regression line with pandas in Spyder.
After executing the following code, I got this error:
Found input variables with inconsistent numbers of samples: [1, 99]
The code:
import numpy as np
import pandas as pd
dataset…

sheldonzy
- 5,505
- 9
- 48
- 86
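The excerpt does not show the full code, so the cause can only be guessed. A common source of this message is a feature array shaped (1, 99), which scikit-learn reads as 1 sample with 99 features, paired with a target of 99 values. A minimal sketch with synthetic data, assuming that is the situation:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: 99 observations of a single feature.
x = np.arange(99, dtype=float)
y = 2.0 * x + 1.0

# x.reshape(1, -1) would be 1 sample with 99 features and would not match y.
# reshape(-1, 1) gives 99 samples with 1 feature each.
X = x.reshape(-1, 1)

model = LinearRegression().fit(X, y)
print(X.shape, model.coef_, model.intercept_)
```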
8 votes, 2 answers
Sklearn LabelEncoder throws TypeError in sort
I am learning machine learning using the Titanic dataset from Kaggle. I am using sklearn's LabelEncoder to transform text data to numeric labels. The following code works fine for "Sex" but not for "Embarked".
encoder =…

Bhavani Ravi
- 2,130
- 3
- 18
- 41
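A plausible cause, assuming the "Embarked" column contains missing values: NaN is a float, and a column mixing floats with strings cannot be sorted in some scikit-learn versions. The sketch below uses a small stand-in Series rather than the real Titanic data, and fills the gaps with a hypothetical placeholder label before encoding.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Stand-in for the "Embarked" column, with one missing value.
embarked = pd.Series(["S", "C", np.nan, "Q", "S"])

# Replace NaN with a placeholder string so every value is a string.
filled = embarked.fillna("Unknown")

encoder = LabelEncoder()
codes = encoder.fit_transform(filled)
print(list(encoder.classes_), list(codes))
```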
8 votes, 3 answers
Combine Sklearn TFIDF with Additional Data
I am trying to prepare data for supervised learning. I have my Tfidf data, which was generated from a column in my dataframe called "merged":
vect = TfidfVectorizer(stop_words='english', use_idf=True, min_df=50, ngram_range=(1,2))
X =…

jrjames83
- 901
- 2
- 9
- 22
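One common way to combine a sparse TF-IDF matrix with extra numeric columns is scipy.sparse.hstack. A minimal sketch, assuming a tiny stand-in dataframe: the "merged" column name comes from the question, while the "price" column and all values are hypothetical, and the vectorizer uses default settings rather than the ones in the excerpt.

```python
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer

df = pd.DataFrame({
    "merged": ["red apple pie", "green apple tart", "red berry pie"],
    "price": [3.5, 4.0, 5.25],  # hypothetical additional feature
})

vect = TfidfVectorizer()
X_text = vect.fit_transform(df["merged"])  # sparse matrix, one row per document

# Wrap the extra column as a sparse matrix and append it on the right.
X_extra = csr_matrix(df[["price"]].to_numpy())
X = hstack([X_text, X_extra]).tocsr()
print(X.shape)
```

Keeping everything sparse avoids densifying the TF-IDF matrix, which matters when the vocabulary is large.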
7 votes, 2 answers
Consistent ColumnTransformer for intersecting lists of columns
I want to use sklearn.compose.ColumnTransformer sequentially (not in parallel, so the second transformer should be executed only after the first) for intersecting lists of columns in this way:
log_transformer = p.FunctionTransformer(lambda x:…

konstantin_doncov
- 2,725
- 4
- 40
- 100
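ColumnTransformer applies its transformers side by side, so one possible restructuring is to give the overlapping columns their own Pipeline that chains the steps in order. This is only a sketch of that idea: the column names, the log10 transform, and the choice of StandardScaler as the second step are assumptions, since the question's code is truncated.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Hypothetical data: "a" needs log then scaling, "b" needs scaling only.
df = pd.DataFrame({"a": [1.0, 10.0, 100.0], "b": [2.0, 4.0, 6.0]})

# Steps inside a Pipeline run one after another on the same columns.
log_then_scale = Pipeline([
    ("log", FunctionTransformer(np.log10)),
    ("scale", StandardScaler()),
])

ct = ColumnTransformer([
    ("log_scale", log_then_scale, ["a"]),
    ("scale_only", StandardScaler(), ["b"]),
])

out = ct.fit_transform(df)
print(out)
```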
7 votes, 1 answer
Difference between r2_score and scoring ='r2' in cross_val_score
I am trying to generate an R-squared value from cross_validation.cross_val_score, which is about 0.35. I then applied the model to the same training dataset and used the "r2_score" function to compute R-squared, which is about 0.87. I wonder I was given…

lionking19063
- 79
- 1
- 7
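The two numbers measure different things: cross_val_score scores each fold on data the model did not see during fitting, while r2_score on the training set scores data the model has already seen. A small sketch on synthetic data illustrates the gap. The dataset and the decision tree model are assumptions chosen to make the effect obvious, and the import uses sklearn.model_selection, where cross_val_score lives in current releases.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data with noise.
X, y = make_regression(n_samples=200, n_features=5, noise=20.0, random_state=0)
model = DecisionTreeRegressor(random_state=0)

# Out-of-sample R^2: each fold is scored on rows held out from fitting.
cv_r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()

# In-sample R^2: the model is scored on the same rows it was fitted on.
train_r2 = r2_score(y, model.fit(X, y).predict(X))

print(cv_r2, train_r2)
```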
7 votes, 2 answers
Is numerical encoding necessary for the target variable in classification?
I am using sklearn for text classification. All my features are numerical, but my target variable labels are text. I can understand the rationale behind encoding features as numbers, but I don't think this applies to the target variable?

Nanda kumar
- 81
- 1
- 1
- 3
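scikit-learn classifiers accept string class labels directly and return predictions in the same labels. A minimal sketch with made-up numeric features and hypothetical label names:

```python
from sklearn.linear_model import LogisticRegression

# Made-up numeric features with text labels as the target.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
y = ["sports", "sports", "sports", "politics", "politics", "politics"]

clf = LogisticRegression().fit(X, y)

# Predictions come back as the original string labels.
pred = clf.predict([[0.05, 0.05], [1.0, 1.0]])
print(list(clf.classes_), list(pred))
```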
7 votes, 2 answers
How to map categorical data to category_encoders.OrdinalEncoder in python pandas dataframe
I'm trying to use category_encoders.OrdinalEncoder to map categories to integers in a pandas dataframe, but I'm getting the following error with no other helpful hints:
TypeError: 'NoneType' object is not iterable
Code runs fine without the…

jeffhale
- 3,759
- 7
- 40
- 56
7 votes, 1 answer
pandas: Replicate / Broadcast single indexed DataFrame on MultiIndex DataFrame: HowTo and Memory Efficiency
Problem
ML data preparation for stock trading. I have a 3-dim MultiIndex on a large DataFrame (maybe n=800000 x f=20). One index-dimension is date with about dt=1000 levels, the others identify m=800 different stocks (with 20 features each, individual…

ascripter
- 5,665
- 12
- 45
- 68
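For the "how to" half of the question, pandas can join a singly indexed DataFrame onto a MultiIndexed one when the single index name matches one of the level names. The sketch below uses two index levels instead of three for brevity, and every name and value in it is hypothetical. Note that the join materializes the broadcast values, so it does not by itself address the memory concern.

```python
import pandas as pd

dates = pd.to_datetime(["2020-01-01", "2020-01-02"])

# Per-stock data on a (date, stock) MultiIndex.
index = pd.MultiIndex.from_product([dates, ["AAA", "BBB"]], names=["date", "stock"])
per_stock = pd.DataFrame({"close": [10.0, 20.0, 11.0, 21.0]}, index=index)

# Per-date data on a single index whose name matches the "date" level.
per_date = pd.DataFrame({"market": [100.0, 101.0]}, index=pd.Index(dates, name="date"))

# The join matches on the shared level name and repeats each date's row
# for every stock on that date.
combined = per_stock.join(per_date, how="left")
print(combined)
```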
7 votes, 1 answer
Access Rows by integers and Columns by labels Pandas
My data is like this:
[First row is headers]
Name,Email,Age
Sachith,ko@gmail.com,23
Sim,sm@gmail.com,234
Yoshi,yosi@hotmail.com,2345
sarla,sarla@gmail.com,234
I would like to access elements such that rows are specified as integers and columns by…

sir_osthara
- 154
- 2
- 9
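A short sketch of mixed positional and label access using the sample rows from the question. It shows two options: translating the row position into a label for loc, or translating the column label into a position for iloc.

```python
import io
import pandas as pd

csv_text = """Name,Email,Age
Sachith,ko@gmail.com,23
Sim,sm@gmail.com,234
Yoshi,yosi@hotmail.com,2345
sarla,sarla@gmail.com,234
"""
df = pd.read_csv(io.StringIO(csv_text))

# Option 1: look up the row label at integer position 1, then use loc.
email = df.loc[df.index[1], "Email"]

# Option 2: look up the column position for "Age", then use iloc.
age = df.iloc[2, df.columns.get_loc("Age")]

# Several rows by position and several columns by label.
subset = df.iloc[[0, 3]][["Name", "Age"]]
print(email, age)
print(subset)
```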
7 votes, 2 answers
How to run non-linear regression in python
I have the following information (a dataframe) in Python:
product  baskets  scaling_factor
12345    475      95.5
12345    108      57.7
12345    2        1.4
12345    38       21.9
12345    320      88.8
and I want to run the following non-linear regression…

Mukul
- 461
- 1
- 5
- 16
7 votes, 1 answer
How to quickly calculate cosine similarity for large number of vectors in Python?
I have a set of 100 thousand vectors and I need to retrieve the top 25 closest vectors based on cosine similarity.
Scipy and Sklearn have implementations for computing cosine distance/similarity between 2 vectors, but I will need to compute the Cosine Sim for…

silent_dev
- 1,566
- 3
- 20
- 45
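One approach, sketched here under assumptions: normalize every vector once, so that cosine similarity reduces to a dot product, then pick the top k with argpartition instead of a full sort. The data is random and much smaller than the 100 thousand vectors in the question, and top_k_similar is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 64))  # stand-in data

# After L2 normalization, a dot product equals the cosine similarity.
unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

def top_k_similar(query_index, k=25):
    """Return indices of the k most similar vectors, best first."""
    sims = unit @ unit[query_index]
    sims[query_index] = -np.inf  # exclude the query itself
    # argpartition finds the k largest similarities without a full sort.
    candidates = np.argpartition(-sims, k)[:k]
    # Sort only those k candidates by similarity, descending.
    return candidates[np.argsort(-sims[candidates])]

neighbours = top_k_similar(0)
print(neighbours)
```

For many queries at once, the same idea extends to a matrix product over a batch of query rows, at the cost of holding a batch-by-n similarity block in memory.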
7 votes, 3 answers
sklearn SVM fit() "ValueError: setting an array element with a sequence"
I am using sklearn to apply an SVM to my own set of images. The images are stored in a data frame.
I pass the fit function a numpy array of 2D lists, where each 2D list represents an image; the second input I pass to the function is the list of…

Perihan Gad
- 71
- 1
- 1
- 2
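Estimators such as SVC expect a 2D array of shape (n_samples, n_features), so a frequent fix for this error is to flatten each image into one row. The sketch below assumes that all images share the same shape; the random 8x8 images and the labels are stand-ins.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in data: ten 8x8 "images" with alternating class labels.
images = [rng.random((8, 8)) for _ in range(10)]
labels = [0, 1] * 5

# Stack into shape (10, 8, 8), then flatten each image to a 64-value row.
X = np.stack(images).reshape(len(images), -1)

clf = SVC().fit(X, labels)
print(X.shape)
```

If the images differ in size, they would need to be resized or padded to a common shape before stacking.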
7 votes, 1 answer
Using easy_install with sklearn-pandas
I am trying to install sklearn-pandas.
On my attempt:
easy_install sklearn-pandas
I get the result:
The package setup script has attempted to modify files on your system
that are not within the EasyInstall build area, and has been aborted.
This…

tumultous_rooster
- 12,150
- 32
- 92
- 149