Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames
Questions tagged [sklearn-pandas]
1336 questions
5
votes
2 answers
XGBoost get classifier object from booster object?
I usually get to feature importance using
regr = XGBClassifier()
regr.fit(X, y)
regr.feature_importances_
where type(regr) is xgboost.sklearn.XGBClassifier.
However, I have a pickled XGBoost model, which when unpacked returns an object of type xgboost.core.Booster. This is the same object as if…

L Xandor
5
votes
1 answer
How to weigh data points with sklearn training algorithms
I am looking to train either a random forest or gradient boosting algorithm using sklearn. The data I have is structured in a way that it has a variable weight for each data point that corresponds to the number of times that data point occurs in the…

Stephen Strosko
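One possible approach, as a minimal sketch: both RandomForestClassifier and GradientBoostingClassifier accept a sample_weight argument in fit, so per-row occurrence counts can be passed there. The toy arrays and the name counts are hypothetical, not taken from the question.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Hypothetical toy data: one feature, binary target.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

# Hypothetical per-row weights, e.g. how many times each row occurs.
counts = np.array([10, 1, 1, 5])

# sample_weight makes each row count proportionally to its weight during training.
forest = RandomForestClassifier(n_estimators=20, random_state=0)
forest.fit(X, y, sample_weight=counts)

boosting = GradientBoostingClassifier(random_state=0)
boosting.fit(X, y, sample_weight=counts)

forest_pred = forest.predict(X)
boosting_pred = boosting.predict(X)
```

Weighting a row by an integer count is intended to act like repeating that row, without the memory cost of physically duplicating it.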
5
votes
3 answers
Making a string out of pandas DataFrame
I have a pandas DataFrame which looks like this:
Name Number Description
car 5 red
And I need to make a string out of it which looks like this:
"""Name: car
Number: 5
Description: red"""
I'm a beginner and I really don't get how…

primadonna
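A minimal sketch of one way to build such a string, assuming the DataFrame has a single row and the column names shown in the question. The variable names row and text are illustrative.

```python
import pandas as pd

# Rebuild the one-row DataFrame shown in the question.
df = pd.DataFrame({"Name": ["car"], "Number": [5], "Description": ["red"]})

# Take the first row as a Series and join "column: value" pairs with newlines.
row = df.iloc[0]
text = "\n".join(f"{column}: {value}" for column, value in row.items())

print(text)
```

For a DataFrame with several rows, the same join could be applied per row, for example inside a loop over df.iterrows().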
5
votes
3 answers
Read multiple CSV files in Pandas in chunks
How can I import and read multiple CSV files in chunks when their total size is around 20 GB?
I don't want to use Spark because I want to use a model from scikit-learn, so I want a solution in pandas itself.
My code is:
allFiles =…

pythonNinja
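A hedged sketch of one option: pd.read_csv with chunksize returns an iterator of DataFrames, and scikit-learn estimators that implement partial_fit (SGDClassifier here) can be trained one chunk at a time. The temporary files, the column names f1, f2 and label, and the chunk size are all assumptions made so the example is self-contained.

```python
import glob
import os
import tempfile

import numpy as np
import pandas as pd
from sklearn.linear_model import SGDClassifier

# Create three small hypothetical CSV files so the sketch runs on its own.
tmp_dir = tempfile.mkdtemp()
rng = np.random.default_rng(0)
for i in range(3):
    part = pd.DataFrame(rng.normal(size=(50, 2)), columns=["f1", "f2"])
    part["label"] = (part["f1"] + part["f2"] > 0).astype(int)
    part.to_csv(os.path.join(tmp_dir, f"part_{i}.csv"), index=False)

model = SGDClassifier(random_state=0)
all_classes = np.array([0, 1])  # partial_fit needs the full set of classes up front
rows_seen = 0

# Stream every file in chunks so only one chunk is in memory at a time.
for path in sorted(glob.glob(os.path.join(tmp_dir, "*.csv"))):
    for chunk in pd.read_csv(path, chunksize=20):
        X = chunk[["f1", "f2"]].to_numpy()
        y = chunk["label"].to_numpy()
        model.partial_fit(X, y, classes=all_classes)
        rows_seen += len(chunk)
```

Only estimators with partial_fit support this pattern; models such as random forests need the whole training set at once.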
5
votes
1 answer
How to use custom scoring function in sklearn cross_val_score
I want to use adjusted R-squared in the cross_val_score function. I tried the make_scorer function, but it is not working.
from sklearn.cross_validation import train_test_split
X_tr, X_test, y_tr, y_test = train_test_split(X, Y, test_size=0.2,…

merkle
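One way this could be done, as a sketch: cross_val_score accepts a callable with the signature scorer(estimator, X, y), which gives access to the number of features that adjusted R-squared needs. The function name adjusted_r2_scorer and the synthetic data are hypothetical; train_test_split and cross_val_score are imported from sklearn.model_selection, since sklearn.cross_validation was removed in scikit-learn 0.20.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_score


def adjusted_r2_scorer(estimator, X, y):
    """Adjusted R^2 on the fold passed in: 1 - (1 - R^2)(n - 1)/(n - p - 1)."""
    r2 = r2_score(y, estimator.predict(X))
    n_samples, n_features = X.shape
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Synthetic regression data standing in for the question's X and Y.
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)

# Each of the 5 folds is scored with the custom callable.
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring=adjusted_r2_scorer)
```

Note that n_samples here is the size of each validation fold, not of the full dataset, so the adjustment is stronger on small folds.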
5
votes
2 answers
Cross-validation gives Negative R2?
I am partitioning 500 samples out of a 10,000+ row dataset just for the sake of simplicity. Please copy and paste X and y into your IDE.
X =
array([ -8.93, -0.17, 1.47, -6.13, -4.06, -2.22, -2.11, -0.25,
0.25, 0.49, 1.7 , -0.77, …

Chipmunkafy
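As a small worked illustration (not the data from the question): R² is defined as 1 - SS_res/SS_tot, so it goes negative whenever the model's squared error exceeds that of always predicting the mean of y. The values below are made up to show the arithmetic.

```python
from sklearn.metrics import r2_score

# Hypothetical targets and deliberately bad predictions.
y_true = [1.0, 2.0, 3.0]
y_pred = [3.0, 2.0, 1.0]

# SS_res = (1-3)^2 + (2-2)^2 + (3-1)^2 = 8
# SS_tot = (1-2)^2 + (2-2)^2 + (3-2)^2 = 2   (mean of y_true is 2)
# R^2    = 1 - 8/2 = -3
score = r2_score(y_true, y_pred)
```

A negative cross-validated R² therefore suggests the model generalizes worse than a constant mean predictor on the held-out folds.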
5
votes
1 answer
python sklearn accuracy_score name not defined
x = df2.Tweet
y = df2.Class
from sklearn.cross_validation import train_test_split
SEED = 2000
x_train, x_validation_and_test, y_train, y_validation_and_test = train_test_split(x, y, test_size=.02, random_state=SEED)
x_validation, x_test,…

Shivam...
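A "name not defined" error for accuracy_score usually means the function was never imported. The sketch below shows the import from sklearn.metrics; it also uses sklearn.model_selection, because sklearn.cross_validation was removed in scikit-learn 0.20. The synthetic data and the LogisticRegression model are assumptions standing in for the question's tweets and classifier.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score  # the import the error points to
from sklearn.model_selection import train_test_split

SEED = 2000

# Synthetic stand-in for the question's features and labels.
X, y = make_classification(n_samples=200, random_state=SEED)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=SEED
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
```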
5
votes
3 answers
Install sklearn_pandas with conda via Windows command line
I'd like to install the sklearn_pandas library with conda via the Windows command line. The package is apparently "private" on the conda repository (admittedly this may well be why I cannot install it, but I prefer to ask for advice just in case…

ongenz
5
votes
1 answer
pd.get_dummies dataframe same size when sparse=True as when sparse=False
I have a dataframe with several string columns that I want to convert to categorical data so that I can run some models and extract important features.
However, due to the amount of unique values, the one-hot encoded data expands into a large…

trystuff
5
votes
1 answer
How to Select Top 1000 words using TF-IDF Vector?
I have a document with 5000 reviews and applied tf-idf to it. Here sample_data contains the 5000 reviews. I am applying the tf-idf vectorizer to sample_data with a unigram range. Now I want to get the top 1000 words
from the sample_data which…

merkle
5
votes
2 answers
How to get feature importance in logistic regression using weights?
I have a dataset of reviews which has a class label of positive/negative. I am applying logistic regression to that reviews dataset. First, I convert the text into a bag of words. Here sorted_data['Text'] is the reviews and final_counts is a sparse…

merkle
5
votes
2 answers
How to normalize dataframe by standard deviation using scikit-learn?
Given the following dataframe and left-x column:
| | left-x | left-y | right-x | right-y |
|-------|--------|--------|---------|---------|
| frame | | | | |
| 0 | 149 | 181 | 170 | 175 |
| 1 …

JP Ventura
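A hedged sketch, assuming "normalize by standard deviation" means standardizing each column to zero mean and unit variance: StandardScaler returns a NumPy array, so the result is wrapped back into a DataFrame with the original columns and index. Only the first row (149, 181) comes from the question; the remaining values are made up.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# First row is from the question; the other rows are hypothetical.
df = pd.DataFrame(
    {"left-x": [149, 170, 156, 161], "left-y": [181, 175, 180, 178]}
)

scaler = StandardScaler()

# fit_transform returns an ndarray, so restore the labels afterwards.
scaled = pd.DataFrame(
    scaler.fit_transform(df), columns=df.columns, index=df.index
)
```

StandardScaler uses the population standard deviation (ddof=0), whereas DataFrame.std defaults to ddof=1, so the two give slightly different results on small samples. Passing with_mean=False divides by the standard deviation without centering.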
5
votes
1 answer
How to group by and map by two columns in a pandas DataFrame
I have a problem in Python working with a pandas DataFrame. I'm trying to build a machine learning model that predicts the surface. I have the surface column in the train DataFrame but not in the test DataFrame. So, I would like to create some…

John Karimov
5
votes
4 answers
Getting an error on StandardScaler fit_transform
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values
from sklearn.preprocessing import StandardScaler
sc_X =…

Vikas Kyatannawar
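The excerpt is cut off before the error message, so the following is only a guess at a common cause: StandardScaler expects a 2-D array, while dataset.iloc[:, 2].values is 1-D. Under that assumption, reshaping the target to a single column before scaling avoids the error. The salary values are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical 1-D target, shaped like dataset.iloc[:, 2].values.
y = np.array([45000.0, 50000.0, 60000.0, 80000.0])

sc_y = StandardScaler()

# reshape(-1, 1) turns the shape (4,) vector into a (4, 1) column.
y_scaled = sc_y.fit_transform(y.reshape(-1, 1))

# Flatten again if a 1-D target is needed downstream.
y_flat = y_scaled.ravel()
```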
5
votes
1 answer
How to convert Countvectorized data back to text data in Python?
How can I convert count-vectorized text data back to textual form? I have text data that I turned into a sparse matrix using CountVectorizer for classification. Now I want the sparse matrix to be converted back into text data.
My…

aeapen
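A sketch of what is recoverable: CountVectorizer.inverse_transform maps each row of the count matrix back to the terms with nonzero counts. Word order and repeat counts are not restored, because the bag-of-words representation discards them. The two example documents are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical documents.
docs = ["the cat sat", "the dog barked"]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)  # sparse document-term matrix

# One array of terms per document; the original word order is lost.
terms_per_doc = vectorizer.inverse_transform(counts)
recovered = [" ".join(terms) for terms in terms_per_doc]
```

If the exact original text is needed later, keeping the raw documents alongside the matrix is the only reliable option.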