Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames
Questions tagged [sklearn-pandas]
1336 questions
0
votes
1 answer
k means on structured data using python - more than one column
How does one do k-means on multiple columns in structured data?
In the example below it's been done on one column (name):
tfidf_matrix = tfidf_vectorizer.fit_transform(df_new['name'])
Here only name is used, but say we wanted to use name and country,…

Naresh MG
- 633
- 2
- 11
- 19
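For the k-means question above, one possible approach (a minimal sketch, not the answer posted on the question) is to join the text columns into a single string per row before vectorizing. The sample rows and the `country` column contents are made up for illustration; `df_new` and `tfidf_vectorizer` follow the names in the excerpt.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical sample data; only the column names come from the question.
df_new = pd.DataFrame({
    "name": ["alice smith", "bob jones", "alice jones", "carol white"],
    "country": ["france", "japan", "france", "japan"],
})

# Join the two text columns so each row becomes one document.
combined = df_new["name"] + " " + df_new["country"]

tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(combined)

# Cluster the rows; n_clusters=2 is an arbitrary choice for this toy data.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(tfidf_matrix)
```

If the columns should be weighted or vectorized differently, a `ColumnTransformer` with one vectorizer per column is an alternative worth considering.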
0
votes
0 answers
Unravel in sklearn pipeline
I'm trying to create a simple pipeline to transform categorical data into one-hot vectors. Unfortunately, it fails because for some reason the data needs to be passed through ravel() beforehand.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import…

Charles Fried
- 57
- 2
- 9
0
votes
1 answer
SciPy in Jupyter
Can someone help me figure out why I'm getting this error: ValueError: n_components must be < n_features; got 10 >= 0
import pandas as pd
from scipy.sparse import csr_matrix
users = pd.read_table(open('ml-1m/users.dat', encoding =…
user8571992
0
votes
1 answer
Feature selection from sklearn logistic regression
I have created a binary classification model for text using sklearn's logistic regression model. Now I want to select the features used by the model. My code looks like this:
train, val, y_train, y_test = train_test_split(np.arange(data.shape[0]), lab,…

Y0gesh Gupta
- 2,184
- 5
- 40
- 56
0
votes
0 answers
What is SpectralEmbedding in sklearn?
I am using Affinity Propagation to cluster my similarity matrix sims. My code is as follows. Following an answer to my previous question, I am using SpectralEmbedding to plot the data points of the similarity matrix sims.
import…
user8566323
0
votes
1 answer
Need to perform principal component analysis on a dataframe collection in Python using numpy or sklearn
I have a 'dataframe collection' df with the data below. I am trying to perform principal component analysis (PCA) on the dataframe collection using sklearn, but I am getting a TypeError.
from sklearn.decomposition import PCA
df # dataframe collection
pca…

Arvinth Kumar
- 964
- 3
- 15
- 32
0
votes
1 answer
Pandas Sklearn Pipeline - CV on DataMapper transforms?
I'm wondering how best to define parameters for DataFrameMapper transforms in a pipeline using sklearn-pandas.
Here is a reproducible example notebook using titanic data.
I'm setting it up as:
# use pandas sklearn to do some preprocessing
full_mapper =…

andrewm4894
- 1,451
- 4
- 17
- 37
0
votes
0 answers
KerasClassifier error with categorical data
I am trying to create a neural network for categorical data in Python (3.5).
I have a table with 47 independent variables (X) and a table with one column for the dependent variable (y). This variable is categorical and takes one of three possible values.
Because…

Marko Zadravec
- 8,298
- 10
- 55
- 97
0
votes
1 answer
Count vectorizer ValueError: Expected 2-dimensional array, got 1
I have an error message here:
ValueError: Expected 2-dimensional array, got 1
But it seems like my variables are all 2d already.
This is what my variables look like:
http://www.oldschool-samp.com/slike/?v=variables.png
#read preprocessed…
user8451312
0
votes
0 answers
Check certain data of confusion matrix
I have a confusion matrix:
I got it by running the scikit package for confusion matrices. I would now like to find, for example, the 4 false positives and the 2 true negatives, and which concrete feature values they had, to make a descriptive…

inneb
- 1,060
- 1
- 9
- 20
0
votes
0 answers
testing and training sklearn
I am working on a point cloud project. I have about 90 3D point cloud images. After all the preprocessing work on my data, I am using PCA for dimensionality reduction, and the output I get is in 2D (when I plot these principal components I get back…

Chaitanya
- 31
- 10
0
votes
0 answers
Python groupby imputing
I have a dataframe with 200 rows and 151 columns, with the output variable being of the type object.
I am trying to impute Null values in the input variables (150 columns) with the mean value of the column section grouped by output variable.
Is…

Hello_Boy
- 29
- 1
- 5
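For the group-wise imputation question above, a minimal sketch of one way to do it with pandas is shown here. The column names (`x1`, `x2`, `target`) and the values are hypothetical stand-ins for the 150 input columns and the output variable described in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: two input columns and a categorical output column.
df = pd.DataFrame({
    "x1": [1.0, np.nan, 3.0, 10.0, np.nan],
    "x2": [np.nan, 2.0, 4.0, 8.0, 6.0],
    "target": ["a", "a", "a", "b", "b"],
})
feature_cols = ["x1", "x2"]

# Per-row group means: transform("mean") returns a frame aligned with df,
# where each cell holds the mean of that column within the row's target group.
group_means = df.groupby("target")[feature_cols].transform("mean")

# Fill only the missing cells with the matching group mean.
df[feature_cols] = df[feature_cols].fillna(group_means)
```

A group whose values are all missing in some column would stay NaN under this approach, so a fallback such as the overall column mean may still be needed.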
0
votes
1 answer
How do I load gigabytes of data from Google Cloud Storage into a pandas dataframe?
I am trying to load gigabytes of data from Google Cloud Storage or Google BigQuery into a pandas DataFrame so that I can attempt to run scikit-learn's OneClassSVM and Isolation Forest (or any other unary or PU classification). So I tried pandas-gbq but…

Flair
- 2,609
- 1
- 29
- 41
0
votes
1 answer
How to convert Fantasy Premier League Data from JSON to CSV?
I am new to Python, and for my thesis work I am trying to convert JSON to CSV. I am able to download the data as JSON, but when I write it back using dictionaries, not every column is converted from JSON to CSV.
import pandas as pd
…

Anjana Aggarwal
- 1
- 3
0
votes
3 answers
How do I use scikit LabelEncoder for new labels?
So my code is like:
>>> le = preprocessing.LabelEncoder()
>>> le.fit(train["capital city"])
LabelEncoder()
>>> list(le.classes_)
['amsterdam', 'paris', 'tokyo']
>>> le.transform(["tokyo", "tokyo", "paris"])
array([2, 2, 1])
>>>…

Flair
- 2,609
- 1
- 29
- 41
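For the LabelEncoder question above, one option (a sketch, not the answer given on the question) is to swap in `OrdinalEncoder`, which in scikit-learn 0.24 and later can map labels unseen during fitting to a sentinel value. The label "berlin" and the sentinel -1 are illustrative choices; note that `OrdinalEncoder` is intended for input features and expects 2-D input, unlike `LabelEncoder`.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Training labels from the question, reshaped to one column (2-D input).
train = np.array([["amsterdam"], ["paris"], ["tokyo"]])

# Unknown categories at transform time are encoded as -1 instead of raising.
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
enc.fit(train)

# "berlin" (hypothetical new label) was not seen during fit.
new = np.array([["tokyo"], ["paris"], ["berlin"]])
codes = enc.transform(new).ravel()
```

Here `codes` holds floats, with the unseen label mapped to -1, so downstream code can treat that value as "unknown".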