Questions tagged [sklearn-pandas]

Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames

1336 questions
0
votes
1 answer

SVM from scratch: generating the model after training

How can I generate my model after training? I didn't use the sklearn package for my fit and predict. My code looks like this: class SVM(object): def __init__(self, kernel=polynomial_kernel, C=None): self.kernel = kernel self.C = C if self.C…
0
votes
1 answer

In which order are PCA components printed? I need the parameters to solve the PCA formula. How do I know which values are the betas?

I'm using the sklearn PCA technique. I need to solve: pca1 = beta1. c1 + beta2. c2 + beta3. c3 + beta4. c4 + beta5. c5 I read in the documentation that the components are sorted by explained_variance_. How do I know which values are the betas? d = {'c1':…
Thaise
  • 1,043
  • 3
  • 16
  • 28
0
votes
1 answer

Error when using the `DataFrameMapper` class from pickle

I am trying to save a DataFrameMapper object to use on new data for a model. mapper = DataFrameMapper([ (['price', 'Argentina', 'Canada', 'Australia', 'barcat_numeric'], None), ('TTL',CountVectorizer( ngram_range=(1, 2))), …
eliavs
  • 2,306
  • 4
  • 23
  • 33
0
votes
1 answer

Unknown label type error when fitting x_train and y_train to Perceptron and MLPClassifier using sklearn

This is a snippet of my code (I can't add more for some reason): per = Perceptron() per.fit(x_train,y_train) and this is the resulting error: ValueError: Unknown label type: (array([0.055, 0.09 , 0.095, 0.1 , 0.105, 0.11 , 0.115, 0.12 , 0.125, …
0
votes
0 answers

set parameters for BayesianRidge

What is the difference between alpha and lambda in linear_model.BayesianRidge() of sklearn? I would like to estimate a linear regression y = w_0 + w_1 x_1 + w_2 x_2 + e with priors for w_0, w_1, w_2 to be normally distributed. w_0 = N(0, sigma0), w1…
zmicer
  • 1
  • 2
0
votes
1 answer

Sklearn: how to get mean squared error on classifying training data

I'm working on some classification problems using sklearn for the first time in Python, and was wondering what the best way is to calculate the error of my classifier (like an SVM) solely on the training data. My sample code for…
Joe J.
  • 119
  • 1
  • 7
0
votes
1 answer

How to tell Pandas/Scikit-Learn how one field impacts predictive model

I am trying to create/validate a predictive model using a fictitious dataset, using Python with sklearn, following this tutorial. The dataset contains information about baseball pitcher throws, and these are the most important fields: Result…
Irina
  • 1,333
  • 3
  • 17
  • 37
0
votes
1 answer

pandas return index of rows having more than one 'NA' value

my code: import pandas as pd from sklearn.preprocessing import LabelEncoder column_names =…
Pratik Kumar
  • 2,211
  • 1
  • 17
  • 41
0
votes
1 answer

TypeError when trying to run GaussianNB on data (Python beginner)

I'm trying to build a prediction model using GaussianNB. I have a csv file that looks like this: csv data My code looks as follows: encoded_df = pd.read_csv('path to file') y = encoded_df.iloc[:,12] X = encoded_df.iloc[:,0:12] model =…
0
votes
0 answers

PermissionError when loading fetch_20newsgroups from sklearn.datasets

from sklearn.datasets import fetch_20newsgroups data = fetch_20newsgroups() data.target_names PermissionError: [WinError 5] Access is denied: 'C:\Users\liu.h\scikit_learn_data\20news_home\20news-bydate-test\sci.crypt'
0
votes
0 answers

Replace column with rows pandas

How do I reshape pivot(using pandas): 0 1 \ trans -0.521058 -0.521058 serie -0.521816 -0.521816 recor -0.468133 -0.468133 to: trans serie recor 0 -0.521058 -0.521816 …
0
votes
0 answers

LabelEncoder in sklearn_pandas mapper with pipeline after cross_val_score returns error

I have a strange error that I cannot understand. I have this data: import numpy as np import pandas as pd from sklearn.ensemble import RandomForestClassifier from sklearn.cross_validation import cross_val_score from sklearn.pipeline import…
Shin
  • 251
  • 1
  • 3
  • 8
0
votes
2 answers

Error using KFold (from sklearn.model_selection import KFold)

I am getting an error while using from sklearn.model_selection import KFold in my Jupyter notebook. The error says "No module named 'sklearn.model_selection'". When I ran print(sklearn.__version__), I got version 0.17.1. Can anyone…
Khan
  • 81
  • 2
  • 7
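The error in the question above is consistent with the reported version: `sklearn.model_selection` was introduced in scikit-learn 0.18, so it does not exist in 0.17.1. Upgrading scikit-learn is usually the simplest fix. The following is only a minimal sketch for cases where an upgrade is not immediately possible; `parse_major_minor`, `has_model_selection`, and `import_kfold` are hypothetical helper names, not part of scikit-learn.

```python
import re


def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '0.17.1'."""
    numbers = []
    for piece in version.split(".")[:2]:
        match = re.match(r"\d+", piece)  # keep leading digits, e.g. '18rc1' -> 18
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 2:
        numbers.append(0)
    return tuple(numbers)


def has_model_selection(version):
    """True if this scikit-learn version ships sklearn.model_selection (0.18+)."""
    return parse_major_minor(version) >= (0, 18)


def import_kfold():
    """Import KFold from the new module, falling back to the pre-0.18 location."""
    try:
        from sklearn.model_selection import KFold
    except ImportError:
        # Assumption: an old install (< 0.18) where KFold lived here instead.
        from sklearn.cross_validation import KFold
    return KFold


print(has_model_selection("0.17.1"))
print(has_model_selection("0.18"))
```

Note that the two `KFold` classes take different constructor arguments (the older one expects the number of samples, the newer one `n_splits`), so code written for one may need adjusting for the other.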
0
votes
1 answer

Counting matrix pairs using a threshold

I have a folder with hundreds of txt files I need to analyse for similarity. Below is an example of a script I use to run similarity analysis. In the end I get an array or a matrix I can plot etc. I would like to see how many pairs there are with…
aviss
  • 2,179
  • 7
  • 29
  • 52
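The excerpt above is truncated, so the exact criterion is not known. As a rough sketch, assuming a square, symmetric similarity matrix and that a pair counts when its similarity is at or above a threshold, the count could be computed as below. The function name `count_pairs_at_or_above` and the sample values are hypothetical and not taken from the question.

```python
from itertools import combinations


def count_pairs_at_or_above(similarity, threshold):
    """Count unordered pairs (i, j), i < j, whose similarity >= threshold.

    Only the upper triangle is visited, so each pair is counted once and the
    diagonal (self-similarity) is ignored.
    """
    n = len(similarity)
    return sum(
        1
        for i, j in combinations(range(n), 2)
        if similarity[i][j] >= threshold
    )


# Illustrative 3x3 matrix (made-up values).
similarity = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.6],
    [0.2, 0.6, 1.0],
]

print(count_pairs_at_or_above(similarity, 0.5))
```

The same function accepts a NumPy array, since it only uses `len` and `similarity[i][j]` indexing. For large matrices, a vectorized approach based on the upper-triangle indices would likely be faster.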
0
votes
1 answer

Applying LabelEncoder to multiple columns in pandas

I'm currently working on the Titanic dataset. It consists of 4-5 non-numeric columns. I want to apply the sklearn.LabelEncoder class to get encoded values for these non-numeric columns. I can, no doubt, apply this method one by one to each column. But the…