Questions tagged [sklearn-pandas]

Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames


1336 questions
8
votes
1 answer

Arrange bar chart in ascending / descending order

I have a random forest feature importance procedure. All the feature importance parameters have been generated for each variable. I have also plotted it on a horizontal bar graph. Now I would like to sort the bars into ascending / descending order.…
Ombre
  • 103
  • 1
  • 3
  • 10
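A minimal sketch of one way to order the bars, assuming the importances are held in a pandas Series. The feature names and values below are made up for illustration; in practice they would come from a fitted forest's `feature_importances_` attribute.

```python
import pandas as pd

# Hypothetical importances, standing in for model.feature_importances_.
importances = pd.Series(
    [0.10, 0.45, 0.05, 0.40],
    index=["age", "income", "height", "score"],
)

# A horizontal bar chart draws the first row at the bottom, so sorting in
# ascending order places the most important feature at the top of the plot.
sorted_importances = importances.sort_values(ascending=True)
print(sorted_importances)
```

Calling `sorted_importances.plot.barh()` (which needs matplotlib installed) would then draw the bars in this order; `ascending=False` reverses it.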
8
votes
4 answers

Standardization before or after categorical encoding?

I'm working on a regression algorithm, in this case k-NearestNeighbors to predict a certain price of a product. So I have a Training set which has only one categorical feature with 4 possible values. I've dealt with it using a one-to-k categorical…
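One common arrangement, sketched here under assumptions rather than taken from the question, is to scale only the numeric columns and one-hot encode the categorical column side by side with `ColumnTransformer`. The column names `weight` and `kind` are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data: one numeric feature and one categorical feature
# with 4 possible values.
df = pd.DataFrame({
    "weight": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "kind": ["a", "b", "c", "d", "a", "b", "c", "d"],
})

# Each transformer sees only its own columns, so the one-hot columns
# are not standardized.
pre = ColumnTransformer([
    ("num", StandardScaler(), ["weight"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["kind"]),
])
Xt = pre.fit_transform(df)
print(Xt.shape)
```

Whether the encoded columns should also be scaled for a distance-based model such as k-nearest neighbors is a modelling choice; this sketch only shows how to keep the two steps separate.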
8
votes
2 answers

sklearn: Found input variables with inconsistent numbers of samples: [1, 99]

I'm trying to build a simple regression line with pandas in Spyder. After executing the following code, I got this error: Found input variables with inconsistent numbers of samples: [1, 99]. The code: import numpy as np import pandas as pd dataset…
sheldonzy
  • 5,505
  • 9
  • 48
  • 86
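The question's code is cut off, so the cause there is unknown. One frequent way to get this message is passing the feature array with shape (1, 99), which scikit-learn reads as 1 sample with 99 features. A small sketch with synthetic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: 99 observations of a single feature.
x = np.arange(99, dtype=float)
y = 2.0 * x + 1.0

# Shape (1, 99) means 1 sample with 99 features, which does not match
# the 99 target values.
X_wrong = x.reshape(1, -1)
try:
    LinearRegression().fit(X_wrong, y)
    message = ""
except ValueError as err:
    message = str(err)

# Shape (99, 1) means 99 samples with 1 feature each.
X = x.reshape(-1, 1)
model = LinearRegression().fit(X, y)
print(message)
print(model.coef_, model.intercept_)
```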
8
votes
2 answers

Sklearn LabelEncoder throws TypeError in sort

I am learning machine learning using the Titanic dataset from Kaggle. I am using sklearn's LabelEncoder to transform text data to numeric labels. The following code works fine for "Sex" but not for "Embarked". encoder =…
Bhavani Ravi
  • 2,130
  • 3
  • 18
  • 41
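A possible explanation, offered as an assumption because the question's code is truncated: a column that mixes strings with missing values (floats) cannot always be sorted. Filling the gaps with a placeholder string before encoding avoids the mixed types. The tiny frame below is invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical column with one missing value.
df = pd.DataFrame({"Embarked": ["S", "C", np.nan, "Q", "S"]})

# Replace NaN with a string so every value has the same type.
filled = df["Embarked"].fillna("missing")

encoder = LabelEncoder()
df["Embarked_code"] = encoder.fit_transform(filled)
print(encoder.classes_)
print(df)
```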
8
votes
3 answers

Combine Sklearn TFIDF with Additional Data

I am trying to prepare data for supervised learning. I have my Tfidf data, which was generated from a column in my dataframe called "merged": vect = TfidfVectorizer(stop_words='english', use_idf=True, min_df=50, ngram_range=(1,2)) X =…
jrjames83
  • 901
  • 2
  • 9
  • 22
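One way to join a sparse TF-IDF matrix with extra numeric columns is `scipy.sparse.hstack`. This is a sketch with made-up text and a hypothetical `price` column, and it uses default vectorizer settings rather than the ones in the question.

```python
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical frame: a text column plus one additional numeric column.
df = pd.DataFrame({
    "merged": ["red apple pie", "green apple tart", "red cherry pie"],
    "price": [3.0, 4.5, 5.0],
})

vect = TfidfVectorizer()
X_text = vect.fit_transform(df["merged"])          # sparse, one column per term

# Convert the extra column to a sparse matrix and append it on the right.
X_extra = csr_matrix(df[["price"]].to_numpy())
X = hstack([X_text, X_extra]).tocsr()
print(X.shape)
```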
7
votes
2 answers

Consistent ColumnTransformer for intersecting lists of columns

I want to use sklearn.compose.ColumnTransformer consistently (not parallel, so, the second transformer should be executed only after the first) for intersecting lists of columns in this way: log_transformer = p.FunctionTransformer(lambda x:…
konstantin_doncov
  • 2,725
  • 4
  • 40
  • 100
7
votes
1 answer

Difference between r2_score and scoring ='r2' in cross_val_score

I am trying to generate an R-squared value from cross_validation.cross_val_score, which is about 0.35. I then applied the model to the same training dataset and used the "r2_score" function to compute R-squared, which is about 0.87. I wonder I was given…
lionking19063
  • 79
  • 1
  • 7
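The gap can be reproduced with synthetic data: `cross_val_score` scores each fold on rows the model did not see, while `r2_score` on the training set measures in-sample fit. The sketch below assumes a decision tree and random data, and imports from `sklearn.model_selection`, where `cross_val_score` lives in current scikit-learn releases.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy data, not the asker's dataset.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.5, size=200)

model = DecisionTreeRegressor(random_state=0)

# Out-of-sample R-squared: each fold is scored on held-out rows.
cv_scores = cross_val_score(model, X, y, cv=5, scoring="r2")

# In-sample R-squared: the model is scored on the rows it was fitted to.
model.fit(X, y)
train_r2 = r2_score(y, model.predict(X))

print(cv_scores.mean(), train_r2)
```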
7
votes
2 answers

Is numerical encoding necessary for the target variable in classification?

I am using sklearn for text classification. All my features are numerical, but my target variable labels are text. I can understand the rationale behind encoding features as numbers, but I don't think this applies to the target variable?
Nanda kumar
  • 81
  • 1
  • 1
  • 3
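As a small illustration with invented data, scikit-learn classifiers accept string labels directly and return predictions as the same strings:

```python
from sklearn.linear_model import LogisticRegression

# Toy numeric features with text labels.
X = [[0.0], [0.2], [0.9], [1.1]]
y = ["spam", "spam", "ham", "ham"]

clf = LogisticRegression().fit(X, y)

# The fitted classes and the predictions are the original strings.
classes = list(clf.classes_)
predictions = list(clf.predict([[0.1], [1.0]]))
print(classes, predictions)
```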
7
votes
2 answers

How to map categorical data to category_encoders.OrdinalEncoder in python pandas dataframe

I'm trying to use category_encoders.OrdinalEncoder to map categories to integers in a pandas dataframe. But I'm getting the following error without any other helpful hints. TypeError: 'NoneType' object is not iterable Code runs fine without the…
jeffhale
  • 3,759
  • 7
  • 40
  • 56
7
votes
1 answer

pandas: Replicate / Broadcast single indexed DataFrame on MultiIndex DataFrame: HowTo and Memory Efficiency

Problem: ML data preparation for stock trading. I have a 3-dim MultiIndex on a large DataFrame (maybe n=800000 x f=20). One index dimension is date with about dt=1000 levels, the others identify m=800 different stocks (with 20 features each, individual…
ascripter
  • 5,665
  • 12
  • 45
  • 68
7
votes
1 answer

Access Rows by integers and Columns by labels Pandas

My data is like this (first row is headers):
Name,Email,Age
Sachith,ko@gmail.com,23
Sim,sm@gmail.com,234
Yoshi,yosi@hotmail.com,2345
sarla,sarla@gmail.com,234
I would like to access elements such that rows are specified as integers and columns by…
sir_osthara
  • 154
  • 2
  • 9
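Two ways to mix an integer row position with a column label, sketched on the sample rows from the question:

```python
import io

import pandas as pd

csv_text = """Name,Email,Age
Sachith,ko@gmail.com,23
Sim,sm@gmail.com,234
Yoshi,yosi@hotmail.com,2345
sarla,sarla@gmail.com,234
"""
df = pd.read_csv(io.StringIO(csv_text))

# Option 1: translate the row position into its index label, then use .loc.
email_a = df.loc[df.index[1], "Email"]

# Option 2: translate the column label into its position, then use .iloc.
email_b = df.iloc[1, df.columns.get_loc("Email")]

print(email_a, email_b)
```

Both lookups keep working when the index is not the default 0..n-1 range, because neither assumes that positions and labels coincide.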
7
votes
2 answers

How to run non-linear regression in python

I have the following information (a DataFrame) in Python:
product  baskets  scaling_factor
12345    475      95.5
12345    108      57.7
12345    2        1.4
12345    38       21.9
12345    320      88.8
and I want to run the following non-linear regression…
Mukul
  • 461
  • 1
  • 5
  • 16
7
votes
1 answer

How to quickly calculate cosine similarity for large number of vectors in Python?

I have a set of 100 thousand vectors and I need to retrieve the top-25 closest vectors based on cosine similarity. Scipy and Sklearn have implementations for computing cosine distance/similarity between 2 vectors, but I will need to compute the Cosine Sim for…
silent_dev
  • 1,566
  • 3
  • 20
  • 45
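A minimal NumPy sketch of one approach: normalize every vector once so that cosine similarity reduces to a dot product, then pick the top k with a partial sort. The data is random and smaller than in the question, and the helper name `top_k_cosine` is hypothetical.

```python
import numpy as np

# Random stand-in data: 1000 vectors of dimension 16.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 16))

# After scaling each row to unit length, a dot product equals cosine similarity.
unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)


def top_k_cosine(query_index, k=25):
    """Return indices of the k vectors most similar to the query, best first."""
    sims = unit @ unit[query_index]
    sims[query_index] = -np.inf               # exclude the query itself
    top = np.argpartition(-sims, k)[:k]       # k largest, in arbitrary order
    return top[np.argsort(-sims[top])]        # order those k by similarity


neighbors = top_k_cosine(0)
print(neighbors)
```

For many queries at once, the same idea extends to a matrix product over a batch of query rows, processed in chunks so the similarity matrix fits in memory.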
7
votes
3 answers

sklearn SVM fit() "ValueError: setting an array element with a sequence"

I am using sklearn to apply SVM on my own set of images. The images are put in a data frame. I pass to the fit function a numpy array that has 2D lists; these 2D lists represent images, and the second input I pass to the function is the list of…
7
votes
1 answer

Using easy_install with sklearn-pandas

I am trying to install sklearn-pandas. On my attempt: easy_install sklearn-pandas I get the result: The package setup script has attempted to modify files on your system that are not within the EasyInstall build area, and has been aborted. This…
tumultous_rooster
  • 12,150
  • 32
  • 92
  • 149