Questions tagged [sklearn-pandas]

Python module providing a bridge between scikit-learn's machine learning methods and pandas-style DataFrames.

1336 questions
0
votes
1 answer

MinMax Scaler in sklearn does not normalize values of column between 0 and 1

I'm working on a KNN algorithm in Python and tried to normalise my data frames with MinMaxScaler to transform the data into the range 0 to 1. However, when I inspect the output, I observe that for some columns the min / max of the output exceeds 1. Am I using…
misctp asdas
  • 973
  • 4
  • 13
  • 35
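
The excerpt is cut off, so the actual cause is not visible here. One common way to see values outside 0 to 1 is to fit the scaler on one split and apply it to another. A minimal sketch under that assumption (the arrays are made-up illustration data, not the asker's):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: the test split contains values outside the training range.
X_train = np.array([[1.0], [5.0], [10.0]])
X_test = np.array([[0.0], [12.0]])

scaler = MinMaxScaler()                       # default feature_range=(0, 1)
train_scaled = scaler.fit_transform(X_train)  # min/max are learned from X_train only
test_scaled = scaler.transform(X_test)        # reuses the training min/max

print(train_scaled.min(), train_scaled.max())  # training data spans the target range
print(test_scaled.ravel())                     # test data can fall below 0 or above 1
```

If that is what is happening, the output is expected behaviour rather than a bug: the scaler only guarantees the range for the data it was fitted on.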
0
votes
0 answers

sklearn.decomposition.KernelPCA

I have the following problem when I run the piece of code below. (FYI: I have already installed scikit-learn from Git Shell with the command pip install -U scikit-learn.) import pandas as pd import pandas.io.data as web from…
abhi_phoenix
  • 387
  • 1
  • 5
  • 19
0
votes
1 answer

sklearn SGDClassifier fit() vs partial_fit()

I am confused about the fit() and partial_fit() methods of SGDClassifier. The documentation says for both, "Fit linear model with Stochastic Gradient Descent." What I know about stochastic gradient descent is that it takes one (or a fraction of the whole) training…
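
A rough sketch of how the two methods are typically used, on synthetic data (the data set, batch size, and variable names are assumptions for illustration). fit() trains from scratch and may run several epochs over everything it is given, while each partial_fit() call does a single pass over the batch it receives and keeps the model state between calls:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Synthetic two-class problem (hypothetical data).
rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# fit(): start from scratch and iterate over the full data set.
clf_full = SGDClassifier(random_state=0).fit(X, y)

# partial_fit(): incremental updates, one mini-batch at a time.
clf_inc = SGDClassifier(random_state=0)
classes = np.unique(y)  # all class labels must be passed on the first call
for start in range(0, len(X), 50):
    batch = slice(start, start + 50)
    clf_inc.partial_fit(X[batch], y[batch], classes=classes)

print(clf_full.score(X, y), clf_inc.score(X, y))
```

partial_fit() is mainly useful when the data does not fit in memory or arrives as a stream.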
0
votes
1 answer

Using IMDB data with text values in feature variables for scikit-learn regression models

I have a CSV file containing IMDB movie ratings data. The file has 27 features and 1 target variable. I have attached SampleData. The data set can also be downloaded from KaggleData. I have learnt that the sklearn package for Python requires all the…
0
votes
2 answers

How to change the name of columns of a Pandas dataframe when it was saved with "pickle"?

I saved a Pandas DataFrame with "pickle". When I load it, it looks like Figure A (which is correct). But when I try to change the names of the columns, it looks like Figure B. What am I doing wrong? What are the other ways to change the name of…
Aizzaac
  • 3,146
  • 8
  • 29
  • 61
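
The figures are not available here, so the following is only a sketch of two usual ways to rename columns after a pickle round trip. The frame and column names are hypothetical, and an in-memory pickle stands in for a file on disk:

```python
import pickle

import pandas as pd

# Hypothetical frame; pickle.dumps/loads stands in for saving and loading a file.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
restored = pickle.loads(pickle.dumps(df))

# Option 1: rename() returns a new DataFrame and leaves the original untouched.
renamed = restored.rename(columns={"a": "alpha", "b": "beta"})

# Option 2: assign a full list of names in place (length must match the column count).
restored.columns = ["x", "y"]

print(renamed.columns.tolist(), restored.columns.tolist())
```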
0
votes
1 answer

CountVectorizer: transform method returns multidimensional array on a single text line

First, I fit it on the corpus of SMS messages: from sklearn.feature_extraction.text import CountVectorizer clf = CountVectorizer() X_desc = clf.fit_transform(X).toarray() Seems to work fine: X.shape = (5574,) X_desc.shape = (5574, 8713) But then I…
Rocketq
  • 5,423
  • 23
  • 75
  • 126
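
The rest of the question is truncated, so this is a guess at the situation: transform() expects an iterable of documents, so a single message is normally wrapped in a list to get exactly one row back. A small sketch with a made-up corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical mini corpus standing in for the SMS data.
corpus = [
    "free entry in a weekly competition",
    "are you coming home tonight",
    "call me when you are free",
]
vectorizer = CountVectorizer()
X_desc = vectorizer.fit_transform(corpus)

single = "are you free tonight"
row = vectorizer.transform([single])  # a list holding one document

print(X_desc.shape, row.shape)  # row has one row and the same number of columns
```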
0
votes
1 answer

unorderable types: dict() <= int() in running OneVsRest Classifier

I am running a multilabel classification on the input data with 330 features and about 800 records. I am leveraging RandomForestClassifier with the following param_grid: > param_grid = {"n_estimators": [20], > "max_depth": [6], > …
Abhi
  • 1,153
  • 1
  • 23
  • 38
0
votes
1 answer

ValueError: shapes (2,2) and (4,6) not aligned: 2 (dim 1) != 4 (dim 0)

Complaining about this line: log_centers = pca.inverse_transform(centers) Code: # TODO: Apply your clustering algorithm of choice to the reduced data clusterer = KMeans(n_clusters=2, random_state=0).fit(reduced_data) # TODO: Predict the cluster…
user1072337
  • 12,615
  • 37
  • 116
  • 195
0
votes
3 answers

How to get a list of useless features using sklearn?

I have a dataset to build a classifier: dataset = pd.read_csv(sys.argv[1], decimal=",",delimiter=";", encoding='cp1251') X=dataset.ix[:, dataset.columns != 'class'] Y=dataset['class'] I want to select important features only, so I…
Polly
  • 1,057
  • 5
  • 14
  • 23
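
What counts as "useless" is not defined in the visible part of the question. Under the narrow assumption that it means features with zero variance, the dropped column names can be read from a selector's support mask. A sketch with a hypothetical frame:

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical features; "constant" carries no information.
X = pd.DataFrame({
    "f1": [1.0, 2.0, 3.0, 4.0],
    "constant": [7.0, 7.0, 7.0, 7.0],
    "f2": [0.0, 1.0, 0.0, 1.0],
})

selector = VarianceThreshold(threshold=0.0).fit(X)
kept_mask = selector.get_support()         # boolean mask, True = feature kept

kept = X.columns[kept_mask].tolist()
dropped = X.columns[~kept_mask].tolist()   # the "useless" features under this definition
print(kept, dropped)
```

Other selectors in sklearn.feature_selection expose the same get_support() method, so the same pattern applies to model-based selection.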
0
votes
1 answer

Elementwise operation on pandas series

I have a pandas Series x with values 1, 2 or 3. I want it to have values monkey, gorilla, and tarzan depending on the values. I guess I should do something like values = ['monkey', 'gorilla', 'tarzan'] x = values[x - 1] but it doesn't work. I…
Jamgreen
  • 10,329
  • 29
  • 113
  • 224
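
A Python list cannot be indexed with a Series, which is why values[x - 1] fails. Two possible alternatives, sketched with a made-up Series: map through a dict, or index a NumPy array instead of a list.

```python
import numpy as np
import pandas as pd

x = pd.Series([1, 3, 2, 1])  # hypothetical input

# Option 1: map each value through a lookup dict.
labels = {1: "monkey", 2: "gorilla", 3: "tarzan"}
named = x.map(labels)

# Option 2: a NumPy array, unlike a list, accepts an integer array as index.
values = np.array(["monkey", "gorilla", "tarzan"])
named_np = pd.Series(values[x.to_numpy() - 1], index=x.index)

print(named.tolist())
```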
0
votes
0 answers

Unexpected StandardScaler fit_transform output

I am trying to scale a pandas Series with StandardScaler().fit_transform(). However, the output is always an array of zeros. The input Series has a length of 201. When I do print values[:5], I get a list of floats as below: 0 1943.0 1 …
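
The code that produces the zeros is not shown, so this is only one possible explanation: if the Series is passed as a single row (one sample, many features), each feature has exactly one value, and standardising it gives zero. A sketch with hypothetical values (only the first one appears in the excerpt):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

values = pd.Series([1943.0, 1950.0, 1961.0, 1972.0, 1980.0])  # hypothetical values

as_row = values.to_numpy().reshape(1, -1)     # 1 sample, 5 features
as_column = values.to_numpy().reshape(-1, 1)  # 5 samples, 1 feature

zeros = StandardScaler().fit_transform(as_row)      # every feature equals its own mean
scaled = StandardScaler().fit_transform(as_column)  # zero mean, unit variance

print(zeros.ravel())
print(scaled.ravel())
```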
0
votes
1 answer

How to use GridSearchCV with scipy?

I have been trying to tune my SVM using GridSearchCV, but it is throwing errors. My code is: train = pd.read_csv('train_set.csv') label = pd.read.csv('lebel.csv') params = { 'C' : [ 0.01 , 0.1 , 1 , 10] clf = GridSearchCV(SVC() , params , n_jobs =…
Anurag Pandey
  • 373
  • 2
  • 5
  • 21
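
The error message is not part of the excerpt. For comparison, here is a minimal GridSearchCV run over the same C values on synthetic data; the data set, cv=3, and variable names are assumptions, and the CSV files from the question are not used:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the training data and labels.
X, y = make_classification(n_samples=120, n_features=6, random_state=0)

params = {"C": [0.01, 0.1, 1, 10]}          # note the closing brace
search = GridSearchCV(SVC(), params, cv=3)  # 3-fold cross-validation per candidate
search.fit(X, y)

print(search.best_params_, search.best_score_)
```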
0
votes
0 answers

pandas wrapper raises ValueError

I got the error below while trying to run my Python script via pandas on a data set of 30 million records. Please advise what went wrong. Traceback (most recent call last): File "extractyooochoose2.py", line 32, in totalitems=[len(x) for x…
Trinadh Gupta
  • 306
  • 5
  • 18
0
votes
1 answer

How to use my own classifier in an ensemble in Python

The main aim is to add a deep learning classification method such as a CNN as an individual member of an ensemble in Python. The following code works fine: clf1=CNN() eclf1=VotingClassifier(estimators=[('lr', clf1)], voting='soft') …
Amn Kh
  • 531
  • 3
  • 7
  • 19
0
votes
2 answers

Create Sparse Matrix in Python

I am working with data and would like to create a sparse matrix to be used later for clustering. fileHandle = open('data', 'r') for line in fileHandle: json_list = [] fields = line.split('\t') json_list.append(fields[0]) …
jKraut
  • 2,325
  • 6
  • 35
  • 48
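
The file format is not visible beyond the tab split, so the following only sketches the general idea: collect (row, column, value) triples while parsing, then build the matrix in one go with scipy. The triples here are hypothetical:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical (row, column, value) triples gathered while reading the file.
rows = np.array([0, 0, 1, 2])
cols = np.array([0, 3, 1, 2])
vals = np.array([1.0, 2.0, 3.0, 4.0])

# Build a 3 x 4 matrix in compressed sparse row format; unspecified cells are zero.
matrix = csr_matrix((vals, (rows, cols)), shape=(3, 4))

print(matrix.shape, matrix.nnz)
print(matrix.toarray())
```

Many scikit-learn clustering estimators, KMeans among them, accept such a matrix directly.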