Questions tagged [sklearn-pandas]

Python module providing a bridge between scikit-learn's machine learning methods and pandas-style DataFrames.


1336 questions
0 votes, 1 answer

k-means on structured data using Python: more than one column

How does one run k-means on multiple columns of structured data? In the example below it has been done on one column (name): tfidf_matrix = tfidf_vectorizer.fit_transform(df_new['name']). Here only name is used, but say we wanted to use name and country,…
Naresh MG
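One possible approach, sketched under assumptions: give each text column its own TfidfVectorizer inside a ColumnTransformer, which stacks the results side by side, and cluster on the combined matrix. The DataFrame contents below are hypothetical; only the column names `name` and `country` come from the question.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Hypothetical data; the column names follow the question.
df_new = pd.DataFrame({
    "name": ["acme corp", "acme corporation", "globex", "globex inc", "initech"],
    "country": ["usa", "usa", "germany", "germany", "usa"],
})

# TfidfVectorizer expects 1-D input, so each column is given as a plain string
# (not a list); ColumnTransformer then passes it a 1-D Series.
features = ColumnTransformer([
    ("name_tfidf", TfidfVectorizer(), "name"),
    ("country_tfidf", TfidfVectorizer(), "country"),
])

model = Pipeline([
    ("features", features),
    ("kmeans", KMeans(n_clusters=2, n_init=10, random_state=0)),
])

# One cluster label per row, computed from both columns' TF-IDF features.
labels = model.fit_predict(df_new)
print(labels)
```

Weighting one column more heavily than the other is not handled here; ColumnTransformer's `transformer_weights` parameter is one place to experiment with that.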
0 votes, 0 answers

Unravel in sklearn pipeline

I'm trying to create a simple pipeline to transform categorical data into one-hot vectors; unfortunately it fails because, for some reason, the data needs to be ravel()-ed beforehand. from sklearn.pipeline import Pipeline from sklearn.preprocessing import…
Charles Fried
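The imports in the question are truncated, so the encoder used is unknown. A frequent cause of this kind of shape complaint is mixing up 1-D and 2-D input: label-oriented transformers such as LabelBinarizer expect a 1-D array, while OneHotEncoder is designed for 2-D feature matrices and fits cleanly in a Pipeline. A minimal sketch with OneHotEncoder, assuming a hypothetical single categorical column `color`:

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})  # hypothetical column

pipe = Pipeline([
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

# df[["color"]] (double brackets) is a 2-D DataFrame with one column;
# df["color"] would be a 1-D Series, which OneHotEncoder rejects.
encoded = pipe.fit_transform(df[["color"]])
print(encoded.toarray())  # one row per sample, one column per category
```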
0 votes, 1 answer

SciPy in Jupyter

Can someone help me figure out why I'm getting this error: ValueError: n_components must be < n_features; got 10 >= 0 import pandas as pd from scipy.sparse import csr_matrix users = pd.read_table(open('ml-1m/users.dat', encoding =…
user8571992
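The message says the matrix handed to the decomposition had 0 feature columns. The question does not show which decomposition is used; the sketch below assumes TruncatedSVD on a user x item sparse matrix built from hypothetical long-format ratings, and checks the matrix shape before choosing `n_components`.

```python
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD

# Hypothetical ratings in long format: one row per (user, movie) pair.
ratings = pd.DataFrame({
    "user_id":  [1, 1, 2, 2, 3, 3, 4],
    "movie_id": [10, 20, 10, 30, 20, 40, 50],
    "rating":   [5, 3, 4, 2, 5, 1, 4],
})

# Pivot to a users x movies matrix (missing ratings become 0), then make it sparse.
matrix = ratings.pivot_table(index="user_id", columns="movie_id",
                             values="rating", fill_value=0)
sparse = csr_matrix(matrix.values)

n_features = sparse.shape[1]
# A zero here corresponds to the "got 10 >= 0" situation: the matrix has no columns.
if n_features == 0:
    raise ValueError("matrix has no feature columns; check how the data was parsed")

# Keep n_components strictly below the number of columns
# (the question asked for 10; this toy matrix only has 5 columns).
n_components = min(2, n_features - 1)
svd = TruncatedSVD(n_components=n_components, random_state=0)
reduced = svd.fit_transform(sparse)
print(sparse.shape, reduced.shape)
```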
0 votes, 1 answer

Feature selection from sklearn logistic regression

I have created a binary text classification model using sklearn's logistic regression. Now I want to select the features used by the model. My code looks like this: train, val, y_train, y_test = train_test_split(np.arange(data.shape[0]), lab,…
0 votes, 0 answers

What is SpectralEmbedding in sklearn?

I am using Affinity Propagation to cluster my similarity matrix sims. My code is as follows. Following an answer to my previous question, I am using SpectralEmbedding to plot the data points of the similarity matrix sims. import…
user8566323
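SpectralEmbedding is a manifold-learning (non-linear dimensionality reduction) estimator: with `affinity="precomputed"` it treats a similarity matrix as a weighted graph and returns coordinates taken from eigenvectors of that graph's Laplacian, which makes it usable for plotting precomputed similarities. A sketch, assuming `sims` is symmetric and non-negative; the matrix here is synthetic.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.manifold import SpectralEmbedding

# Hypothetical symmetric, non-negative similarity matrix standing in for `sims`.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0, 0.3, (5, 2)), rng.normal(2, 0.3, (5, 2))])
dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
sims = np.exp(-dists ** 2)

# Cluster directly on the similarities (higher value = more similar).
clusters = AffinityPropagation(affinity="precomputed", random_state=0).fit_predict(sims)

# Embed the same similarity graph in 2-D; the coordinates can be scatter-plotted
# and coloured by the cluster labels above.
coords = SpectralEmbedding(n_components=2, affinity="precomputed",
                           random_state=0).fit_transform(sims)
print(coords.shape, clusters)
```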
0 votes, 1 answer

Need to perform Principal component analysis on a dataframe collection in python using numpy or sklearn

I have a 'dataframe collection' df with the data below. I am trying to perform principal component analysis (PCA) on the dataframe collection using sklearn, but I am getting a TypeError. from sklearn.decomposition import PCA df # dataframe collection pca…
Arvinth Kumar
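The question does not show what the "dataframe collection" is; the sketch below assumes it is a dict of numeric DataFrames (a hypothetical structure). PCA needs a single 2-D numeric array, so the dict itself cannot be passed in. Two options are shown: one PCA per frame, or one PCA fitted on all frames stacked together.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical "dataframe collection": a dict mapping a key to a numeric DataFrame.
rng = np.random.default_rng(0)
df = {
    key: pd.DataFrame(rng.normal(size=(20, 4)), columns=["a", "b", "c", "d"])
    for key in ["group1", "group2"]
}

# Option 1: fit a separate PCA on each DataFrame.
components = {}
for key, frame in df.items():
    pca = PCA(n_components=2)
    components[key] = pd.DataFrame(pca.fit_transform(frame), index=frame.index,
                                   columns=["PC1", "PC2"])

# Option 2: stack all frames (keys become an index level) and fit one PCA.
stacked = pd.concat(df, names=["group", "row"])
pooled = PCA(n_components=2).fit_transform(stacked)
print({k: v.shape for k, v in components.items()}, pooled.shape)
```

Option 2 puts every group in the same component space, which is usually what you want if the groups will be compared on one plot.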
0 votes, 1 answer

Pandas Sklearn Pipeline - CV on DataMapper transforms?

I'm wondering how best to define parameters for datamapper transforms in a pipeline using pandas-sklearn. Here is a reproducible example notebook using titanic data. I'm setting it up as: # use pandas sklearn to do some preprocessing full_mapper =…
andrewm4894
0 votes, 0 answers

KerasClassifier error with categorical data

I am trying to create a neural network for categorical data in Python (3.5). I have a table with 47 independent variables (X) and a table with one column for the dependent variable (y). This variable is categorical and takes one of three possible values. Because…
Marko Zadravec
0 votes, 1 answer

Count vectorizer ValueError: Expected 2-dimensional array, got 1

I get this error: ValueError: Expected 2-dimensional array, got 1. But it seems like all my variables are already 2D. This is what my variables look like: http://www.oldschool-samp.com/slike/?v=variables.png #read preprocessed…
user8451312
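Without the full code the exact cause is an assumption, but this message usually comes from an estimator receiving a 1-D array where it expects one row per sample. CountVectorizer itself takes a 1-D sequence of strings and returns a 2-D sparse matrix; the sketch below, with hypothetical texts and a MultinomialNB classifier chosen for illustration, shows how to keep the shapes consistent, including for a single new document.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training data.
train_texts = ["free prize now", "meeting at noon", "win a free prize", "lunch meeting today"]
train_labels = [1, 0, 1, 0]

# CountVectorizer takes a 1-D sequence of strings and returns a 2-D sparse matrix
# with one row per document.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
clf = MultinomialNB().fit(X_train, train_labels)

# A single new document is still wrapped in a list, and it goes through
# transform (not fit_transform) so its columns match the training vocabulary.
new_doc = "free lunch prize"
X_new = vectorizer.transform([new_doc])
prediction = clf.predict(X_new)
print(X_train.shape, X_new.shape, prediction)
```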
0 votes, 0 answers

Check certain data of confusion matrix

I have a confusion matrix, which I got by running scikit-learn's confusion matrix function. I would now like to find out, for example, which samples are the 4 false positives and the 2 true negatives, and which concrete feature values they had, to make a descriptive…
inneb
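A confusion matrix only stores counts, so the individual samples have to be recovered from the labels themselves. A minimal sketch, assuming the test features are in a DataFrame whose index lines up with `y_true` and `y_pred` (all values here are hypothetical): boolean masks select the rows behind each cell.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Hypothetical features, true labels and predictions sharing one index.
X_test = pd.DataFrame({"age": [25, 40, 33, 51, 29, 46], "income": [30, 80, 55, 90, 35, 70]})
y_true = pd.Series([0, 1, 0, 1, 0, 0], index=X_test.index)
y_pred = pd.Series([1, 1, 0, 0, 1, 0], index=X_test.index)

# For binary labels [0, 1], ravel() yields the cells in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Boolean masks recover the actual rows (and feature values) behind each cell.
false_positives = X_test[(y_true == 0) & (y_pred == 1)]
true_negatives = X_test[(y_true == 0) & (y_pred == 0)]
print(tn, fp, fn, tp)
print(false_positives)
```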
0 votes, 0 answers

testing and training sklearn

I am working on a point cloud project. I have about 90 3D point cloud images. After all the preprocessing of my data, I am using PCA for dimensionality reduction, and the output I get is 2D (when I plot these principal components I get back…
Chaitanya
0 votes, 0 answers

Python groupby imputing

I have a dataframe with 200 rows and 151 columns, where the output variable is of type object. I am trying to impute null values in the input variables (150 columns) with the column mean computed per group of the output variable. Is…
Hello_Boy
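Group-wise mean imputation can be done in pandas with `groupby(...).transform("mean")`, which returns each group's mean aligned back to the original rows, so it can be handed straight to `fillna`. A sketch on a small hypothetical frame with an object-typed output column named `y` (the real column names are not in the question):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: numeric input columns plus an object-typed output column "y".
df = pd.DataFrame({
    "x1": [1.0, np.nan, 3.0, 10.0, np.nan, 14.0],
    "x2": [np.nan, 2.0, 4.0, 20.0, 22.0, np.nan],
    "y": ["a", "a", "a", "b", "b", "b"],
})

input_cols = [c for c in df.columns if c != "y"]

# transform("mean") gives each row the mean of its own group (NaNs are skipped),
# so fillna replaces every missing value with its class-specific mean.
group_means = df.groupby("y")[input_cols].transform("mean")
df[input_cols] = df[input_cols].fillna(group_means)
print(df)
```

If this imputation feeds a model, compute the group means on the training split only; using the output variable on test data would leak the target.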
0 votes, 1 answer

How do I load gigabytes of data from Google Cloud Storage into a pandas dataframe?

I am trying to load gigabytes of data from Google Cloud Storage or Google BigQuery into a pandas dataframe so that I can run scikit-learn's OneClassSVM and Isolation Forest (or any other unary or PU classification). So I tried pandas-gbq but…
Flair
0 votes, 1 answer

How to convert Fantasy Premier League Data from JSON to CSV?

I am new to Python, and for my thesis work I am trying to convert JSON to CSV. I am able to download the data as JSON, but when I write it back out using dictionaries, the CSV does not contain every column. import pandas as pd …
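Nested JSON records usually lose columns when written out via plain dictionaries because the nested objects are not flattened. `pandas.json_normalize` (pandas 1.0 or later) flattens nested dicts into dotted column names before writing CSV. The record layout below, including the `elements` key and field names, is hypothetical and only meant to resemble player data.

```python
import io
import json
import pandas as pd

# Hypothetical JSON: a list of player records, each with a nested "stats" object.
raw = json.loads("""
{"elements": [
  {"id": 1, "web_name": "Smith", "stats": {"goals": 3, "assists": 1}},
  {"id": 2, "web_name": "Jones", "stats": {"goals": 0, "assists": 4}}
]}
""")

# json_normalize flattens nested dicts into columns such as "stats.goals".
players = pd.json_normalize(raw["elements"])

# Writing to a buffer here; pass a file path such as "players.csv" to write a file.
buffer = io.StringIO()
players.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```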
0 votes, 3 answers

How do I use scikit-learn's LabelEncoder for new labels?

So my code is: >>> le = preprocessing.LabelEncoder() >>> le.fit(train["capital city"]) LabelEncoder() >>> list(le.classes_) ['amsterdam', 'paris', 'tokyo'] >>> le.transform(["tokyo", "tokyo", "paris"]) array([2, 2, 1]) >>>…
Flair
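LabelEncoder.transform raises a ValueError for labels it was not fitted on. One workaround, sketched here with the classes from the question, is to build a lookup from the fitted `classes_` and send unseen labels to a sentinel code; the helper `encode_with_unknown` and the choice of -1 are hypothetical, not part of scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit(["amsterdam", "paris", "tokyo"])

# classes_ is sorted, so a label's position in it is its integer code.
mapping = {label: code for code, label in enumerate(le.classes_)}

def encode_with_unknown(values, unknown=-1):
    """Hypothetical helper: encode known labels, map unseen ones to `unknown`."""
    return np.array([mapping.get(v, unknown) for v in values])

codes = encode_with_unknown(["tokyo", "paris", "berlin"])
print(codes)  # [ 2  1 -1]
```

If the column is an input feature rather than a target, `OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)` (scikit-learn 0.24 or later) provides the same behaviour inside a pipeline.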