Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames
Questions tagged [sklearn-pandas]
1336 questions
0
votes
1 answer
MinMax Scaler in sklearn does not normalize values of column between 0 and 1
I'm working on a KNN algorithm in Python and tried to normalise my data frames with MinMaxScaler to transform the data into a range between 0 and 1.
However, when I return the output, I observe that the min / max of some columns exceeds 1. Am I using…

misctp asdas
- 973
- 4
- 13
- 35
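The excerpt above is truncated, so the actual cause is unknown. One common way to end up with values outside [0, 1] is fitting the scaler on one set and transforming another: the minimum and maximum are learned at fit time, so later rows outside that range map outside [0, 1]. A minimal sketch under that assumption, with made-up data and a hypothetical column name `height`:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up data: the test set contains a value above the training maximum.
train = pd.DataFrame({"height": [150.0, 160.0, 170.0]})
test = pd.DataFrame({"height": [165.0, 180.0]})

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train)  # min/max are learned from train only
test_scaled = scaler.transform(test)        # reuses the training min/max

print(train_scaled.ravel())  # stays within [0, 1]
print(test_scaled.ravel())   # 180 is above the training max, so it maps above 1
```

If this is the situation, values above 1 on unseen data are expected behaviour and not a bug in the scaler.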
0
votes
0 answers
sklearn.decomposition.KernelPCA
I have the following problem when I run the piece of code below. (FYI, I have already installed scikit-learn using Git Shell with the following command: pip install -U scikit-learn)
import pandas as pd
import pandas.io.data as web
from…

abhi_phoenix
- 387
- 1
- 5
- 19
0
votes
1 answer
sklearn SGDClassifier fit() vs partial_fit()
I am confused about the fit() and partial_fit() methods of SGDClassifier. The documentation says for both, "Fit linear model with Stochastic Gradient Descent.".
What I know about stochastic gradient descent is that it takes one (or a fraction of the whole) training…

AL AMIN
- 13
- 1
- 4
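As a rough illustration of the difference asked about above: fit() trains from scratch on the data it is given, for up to max_iter epochs, while partial_fit() performs a single pass over the batch it receives and keeps the weights from earlier calls. The sketch below uses made-up random data and is not taken from the question:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Made-up, linearly separable toy data.
rng = np.random.RandomState(0)
X = rng.randn(200, 3)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# fit(): trains on the full data set in one call.
full = SGDClassifier(random_state=0).fit(X, y)

# partial_fit(): one pass over each mini-batch, weights carried over between calls.
incremental = SGDClassifier(random_state=0)
classes = np.unique(y)  # all class labels must be supplied on the first call
for start in range(0, len(X), 50):
    batch = slice(start, start + 50)
    incremental.partial_fit(X[batch], y[batch], classes=classes)

print(full.score(X, y), incremental.score(X, y))
```

partial_fit() is mainly useful when the data arrives in chunks or does not fit in memory.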
0
votes
1 answer
Using IMDB data for the sci-kit regression models package which has text values in feature variables
I have a CSV file containing IMDB movie ratings data. The file has 27 features and 1 target variable. I have attached SampleData, and the data set can also be downloaded from KaggleData.
I have learnt that the sklearn package for Python requires all the…

aks_Nin
- 147
- 4
- 13
0
votes
2 answers
How to change the name of columns of a Pandas dataframe when it was saved with "pickle"?
I saved a Pandas DataFrame with "pickle". When I call it, it looks like Figure A (that is all right). But when I try to change the names of the columns, it looks like Figure B.
What am I doing wrong? What are the other ways to change the name of…

Aizzaac
- 3,146
- 8
- 29
- 61
0
votes
1 answer
CountVectorizer: transform method returns multidimensional array on a single text line
Firstly, I fit it on the corpus of sms:
from sklearn.feature_extraction.text import CountVectorizer
clf = CountVectorizer()
X_desc = clf.fit_transform(X).toarray()
Seems to work fine:
X.shape = (5574,)
X_desc.shape = (5574, 8713)
But then I…

Rocketq
- 5,423
- 23
- 75
- 126
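The excerpt above stops before showing the failing call, so the following is only a guess at the situation. transform() expects an iterable of documents; depending on the scikit-learn version, a bare string is either rejected or iterated character by character. A minimal sketch with a made-up corpus, wrapping the single message in a list:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up stand-in for the sms corpus.
corpus = ["free prize call now", "are we meeting today", "call me later"]

vectorizer = CountVectorizer()
X_desc = vectorizer.fit_transform(corpus).toarray()

new_sms = "call now"
# Wrap the single string in a list so it is treated as one document.
row = vectorizer.transform([new_sms]).toarray()

print(X_desc.shape, row.shape)  # row has one line and the same number of columns
```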
0
votes
1 answer
unorderable types: dict() <= int() in running OneVsRest Classifier
I am running a multilabel classification on input data with 330 features and about 800 records. I am using RandomForestClassifier with the following param_grid:
> param_grid = {"n_estimators": [20],
> "max_depth": [6],
> …

Abhi
- 1,153
- 1
- 23
- 38
0
votes
1 answer
ValueError: shapes (2,2) and (4,6) not aligned: 2 (dim 1) != 4 (dim 0)
Complaining about this line:
log_centers = pca.inverse_transform(centers)
Code:
# TODO: Apply your clustering algorithm of choice to the reduced data
clusterer = KMeans(n_clusters=2, random_state=0).fit(reduced_data)
# TODO: Predict the cluster…

user1072337
- 12,615
- 37
- 116
- 195
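The shapes in the error above, (2,2) and (4,6), would be consistent with cluster centers that have 2 columns being passed to a PCA object fitted with 4 components on 6 features. That reading is an assumption, since the full code is not shown. A sketch with made-up data in which the same PCA object is used for both the reduction and the inverse transform:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
log_data = rng.rand(100, 6)  # made-up stand-in for the 6-feature data

# Use one PCA object for both directions so the dimensions agree.
pca = PCA(n_components=2).fit(log_data)
reduced_data = pca.transform(log_data)          # shape (100, 2)

clusterer = KMeans(n_clusters=2, random_state=0, n_init=10).fit(reduced_data)
centers = clusterer.cluster_centers_            # shape (2, 2)

log_centers = pca.inverse_transform(centers)    # shape (2, 6)
print(log_centers.shape)
```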
0
votes
3 answers
How to get a list of useless features using sklearn?
I have a dataset to build a classifier:
dataset = pd.read_csv(sys.argv[1], decimal=",",delimiter=";", encoding='cp1251')
X=dataset.ix[:, dataset.columns != 'class']
Y=dataset['class']
I want to select important features only, so I…

Polly
- 1,057
- 5
- 14
- 23
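One possible approach to the question above, sketched here with made-up data and hypothetical column names, is to fit a tree ensemble and let SelectFromModel split the columns into kept and dropped according to their importances. This is only one of several feature selection techniques and not necessarily what the answers recommended:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Made-up data: only "signal" is related to the class.
rng = np.random.RandomState(0)
X = pd.DataFrame({
    "signal": rng.randn(300),
    "noise_a": rng.randn(300),
    "noise_b": rng.randn(300),
})
Y = (X["signal"] > 0).astype(int)

# By default, features whose importance is below the mean importance are dropped.
selector = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0))
selector.fit(X, Y)

kept = X.columns[selector.get_support()]
dropped = X.columns[~selector.get_support()]
print(list(kept), list(dropped))
```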
0
votes
1 answer
Elementwise operation on pandas series
I have a pandas Series x with values 1, 2 or 3.
I want it to have values monkey, gorilla, and tarzan depending on the values.
I guess I should do something like
values = ['monkey', 'gorilla', 'tarzan']
x = values[x - 1]
but it doesn't work. I…

Jamgreen
- 10,329
- 29
- 113
- 224
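A Python list cannot be indexed with a Series, which is why the attempt above fails. A minimal sketch of one alternative, Series.map with a dictionary (the sample values are made up):

```python
import pandas as pd

x = pd.Series([1, 3, 2, 1])  # made-up sample values

# Map each value to its label; values missing from the dict would become NaN.
names = {1: "monkey", 2: "gorilla", 3: "tarzan"}
x = x.map(names)

print(x.tolist())
```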
0
votes
0 answers
Unexpected StandardScaler fit_transform output
I am trying to scale a pandas Series with StandardScaler().fit_transform(). However, the output is always an array of zeros.
The input Series has a length of 201. When I do:
print values[:5]
I get a list of floats as below:
0 1943.0
1 …

user3391529
- 21
- 3
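The question above is truncated, so the cause can only be guessed at. StandardScaler standardises each column separately, so if the Series ends up as a single row (one sample with many features), every column holds one value and the result is all zeros. A sketch under that assumption, with made-up values apart from the first one:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

values = pd.Series([1943.0, 1950.0, 1961.0, 1975.0, 1988.0])  # mostly made-up

as_row = values.to_numpy().reshape(1, -1)     # one sample, five features
as_column = values.to_numpy().reshape(-1, 1)  # five samples, one feature

row_scaled = StandardScaler().fit_transform(as_row)
col_scaled = StandardScaler().fit_transform(as_column)

print(row_scaled)          # each column has a single value, so all zeros
print(col_scaled.ravel())  # standardised to zero mean and unit variance
```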
0
votes
1 answer
How to use GridSearchCV with scipy?
I have been trying to tune my SVM using GridSearchCV, but it is throwing errors.
My code is:
train = pd.read_csv('train_set.csv')
label = pd.read.csv('lebel.csv')
params = { 'C' : [ 0.01 , 0.1 , 1 , 10]
clf = GridSearchCV(SVC() , params , n_jobs =…

Anurag Pandey
- 373
- 2
- 5
- 21
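The error in the question above is not shown, so the following is only a generic sketch of tuning C for an SVC with GridSearchCV. It uses synthetic data from make_classification in place of the CSV files, and the cv=3 setting is an arbitrary choice:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the training data and labels.
X, y = make_classification(n_samples=120, n_features=5, random_state=0)

params = {"C": [0.01, 0.1, 1, 10]}  # the grid from the question, with the brace closed
clf = GridSearchCV(SVC(), params, cv=3)
clf.fit(X, y)

print(clf.best_params_)
```

When loading real data, note that the pandas function is pd.read_csv.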
0
votes
0 answers
pandas wrapper raise ValueError
I got the error below while running my Python script with pandas on a data set of 30 million records. Please advise what went wrong.
Traceback (most recent call last): File "extractyooochoose2.py", line 32, in totalitems=[len(x) for x…

Trinadh Gupta
- 306
- 5
- 18
0
votes
1 answer
How to use my own classifier in ensemble python
The main aim is to add a deep learning classification method such as a CNN as an individual member of an ensemble in Python.
The following code works fine:
clf1=CNN()
eclf1=VotingClassifier(estimators=[('lr', clf1)], voting='soft')
…

Amn Kh
- 531
- 3
- 7
- 19
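For a custom model to take part in a soft-voting VotingClassifier, it generally needs to behave like a scikit-learn classifier: cloneable parameters, fit(), predict_proba() and a classes_ attribute. The sketch below shows that interface with a hypothetical CentroidClassifier standing in for the CNN, which is not shown in the question. It is an illustration of the interface, not a deep learning model:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression


class CentroidClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical stand-in for a custom model such as a wrapped CNN."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One mean vector per class.
        self.centroids_ = np.vstack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        # Distance of every sample to every class centroid.
        dist = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        scores = np.exp(-dist)
        return scores / scores.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


X, y = make_classification(n_samples=100, n_features=4, random_state=0)
eclf = VotingClassifier(
    estimators=[("centroid", CentroidClassifier()), ("lr", LogisticRegression())],
    voting="soft",
)
eclf.fit(X, y)
predictions = eclf.predict(X)
```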
0
votes
2 answers
Create Sparse Matrix in Python
I am working with data and would like to create a sparse matrix to use later for clustering.
fileHandle = open('data', 'r')
for line in fileHandle:
    json_list = []
    fields = line.split('\t')
    json_list.append(fields[0])
…

jKraut
- 2,325
- 6
- 35
- 48
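The layout of the data file in the question above is not shown, so this sketch assumes made-up (row label, column label, value) records. It builds a SciPy sparse matrix from coordinate triplets and converts it to CSR, a format that scikit-learn clustering estimators such as KMeans accept:

```python
from scipy.sparse import coo_matrix

# Made-up records: (row label, column label, value).
records = [("u1", "itemA", 1.0), ("u1", "itemB", 2.0), ("u2", "itemA", 3.0)]

# Assign an integer index to every distinct row and column label.
row_ids = {name: i for i, name in enumerate(sorted({r[0] for r in records}))}
col_ids = {name: j for j, name in enumerate(sorted({r[1] for r in records}))}

rows = [row_ids[r] for r, _, _ in records]
cols = [col_ids[c] for _, c, _ in records]
vals = [v for _, _, v in records]

matrix = coo_matrix((vals, (rows, cols)), shape=(len(row_ids), len(col_ids))).tocsr()
print(matrix.toarray())
```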