Questions tagged [sklearn-pandas]

Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames


1336 questions
0
votes
1 answer

Error in making train - test sets from iris data by sklearn.train_test_split()

I'm trying to use a simple command, train_test_split, on the iris dataset and then use an SVM for prediction, but when I use "fit" as follows: dat_iris = datasets.load_iris() x1 = dat_iris.data[:,2] y1 = dat_iris.target x_train,y_train,x_test,y_test =…
user3111496
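
The excerpt above is truncated, so the actual error is not visible. One plausible cause, offered only as a guess, is the unpacking order: train_test_split returns the feature splits first and the target splits second (x_train, x_test, y_train, y_test). A second possible cause is that dat_iris.data[:,2] is one-dimensional, while fit expects a 2-D feature array. A minimal sketch under those assumptions (test_size, random_state and the default SVC are illustrative choices, not taken from the question):

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

dat_iris = datasets.load_iris()

# Index with a list so the result stays 2-D: shape (n_samples, 1).
x1 = dat_iris.data[:, [2]]
y1 = dat_iris.target

# Return order is X_train, X_test, y_train, y_test.
# test_size and random_state are illustrative assumptions.
x_train, x_test, y_train, y_test = train_test_split(
    x1, y1, test_size=0.25, random_state=0
)

clf = svm.SVC()  # default settings, chosen only for illustration
clf.fit(x_train, y_train)
accuracy = clf.score(x_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```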
0
votes
1 answer

sklearn tfidfvectorizer: how to intersect a tfidf frame on a column?

In R, I can extract rows (documents) which contain a particular term, say 'toyota' by intersecting a document term matrix (dtm) with required column name like so: dtm <- DocumentTermMatrix(mycorpus, control = list(tokenize =…
Pradeep
  • 350
  • 3
  • 16
0
votes
0 answers

impose node using DecisionTreeClassifier

I'm using a classification tree to explore the MNIST dataset. The data used to build the tree currently consist of the 26x26 pixels of each image. My idea is to compute the number of connected components for each image and add this result to the data. I…
razzi
  • 11
  • 1
0
votes
2 answers

increase accuracy of model in sklearn

The decision tree classifier gives an accuracy of 0.52, but I want to increase it. How can I increase the accuracy using any of the classification models available in sklearn? I have used knn, decision tree, and cross-validation but…
0
votes
1 answer

ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 0

import numpy as np import pandas as pd import matplotlib as mpl import matplotlib.pyplot as plt import seaborn as sn from sklearn import preprocessing, cross_validation, svm from sklearn.linear_model import LogisticRegression from…
0
votes
1 answer

my first deep network

This is my first deep neural network, and I have a problem with this code while implementing it. Besides the error, the code is also slow. Here is the code: import matplotlib.pyplot as plt import pandas as pd import numpy as np from…
Sayed Gouda
  • 605
  • 3
  • 9
  • 22
0
votes
1 answer

Logistic regression sklearn - train and apply model

I'm new to machine learning and trying Sklearn for the first time. I have two dataframes, one with data to train a logistic regression model (with 10-fold cross-validation) and another one to predict classes ('0,1') using that model. Here's my code…
0
votes
1 answer

Create new column in pandas dataframe based on if/elif/and functions

I have searched for my exact issue to no avail. These two threads, "Creating a new column based on if-elif-else condition" and "create new pandas dataframe column based on if-else condition with a lookup", guided my code, though my code fails to execute.…
wrangler
  • 3
  • 3
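
The question body is cut off, so the failing code is not shown. As a general illustration of building a column from if/elif/and logic, here is a sketch using numpy.select, which evaluates a list of boolean conditions in order and picks the matching choice. The DataFrame, column names, and thresholds below are hypothetical and not taken from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical data; the question's real columns are not visible.
df = pd.DataFrame(
    {"score": [95, 72, 40], "passed_exam": [True, True, False]}
)

# Conditions are checked in order, like if / elif.
# Combine tests with & (and) and wrap each comparison in parentheses.
conditions = [
    (df["score"] >= 90) & df["passed_exam"],
    (df["score"] >= 60) & df["passed_exam"],
]
choices = ["high", "medium"]

# default plays the role of the final else branch.
df["grade"] = np.select(conditions, choices, default="low")
print(df)
```

Using the Python keywords and/or on whole columns raises an error because the truth value of a Series is ambiguous, which is why the element-wise & operator is used here.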
0
votes
1 answer

Python sklearn-pandas Transform Multiple Columns at the same time error

I am using python with pandas and sklearn and trying to use the new and very convenient sklearn-pandas. I have a big data frame and need to transform multiple columns in a similar way. I have multiple column names in the variable other the source…
thebeancounter
  • 4,261
  • 8
  • 61
  • 109
0
votes
0 answers

get_dummies not working properly in python

I don't know why, but I'm getting this error. get_dummies is removing one column for an unknown reason. I want both 'train' and 'test' data to have the same number of columns. data = pd.read_csv('data/trainData.csv') train , test = train_test_split(data ,…
RAM
  • 211
  • 1
  • 4
  • 14
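
The rest of the question is truncated, so the cause of the missing column is unknown. One common situation, assumed here for illustration, is that a category present in the training split never appears in the test split, so get_dummies produces fewer columns for it. A sketch of aligning the test columns to the training columns with reindex (the toy frames and column names are hypothetical):

```python
import pandas as pd

# Hypothetical splits: "blue" appears only in the training data.
train = pd.DataFrame({"color": ["red", "green", "blue"], "size": [1, 2, 3]})
test = pd.DataFrame({"color": ["red", "green"], "size": [4, 5]})

train_enc = pd.get_dummies(train, columns=["color"])
test_enc = pd.get_dummies(test, columns=["color"])

# Give the test frame exactly the training columns, in the same order.
# Columns missing from the test split are filled with 0.
test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=0)

print(train_enc.shape, test_enc.shape)
```

Encoding the full dataset before splitting is an alternative, though it lets information about the test categories influence the encoding.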
0
votes
0 answers

pandas - non-aligned dataframes

I have two data frames: df_train Data types in the dataset: ['uint8', 'int64', 'float64'] Number of features: 233 Shape: (1457, 233) df_test Data types in the dataset: ['uint8', 'int64', 'float64'] Number of features: 216 Shape: (1447,…
0
votes
1 answer

python JupyterNotebook with pandas matrix()

Hi there, this is my code. When I try to run it, I get an error. df = pd.read_csv(file, sep='|', encoding='latin-1') arreglox = df[df.columns['id':'date_in':'date_out':'objetive':'comments']].as_matrix() arregloy =…
kenny
  • 1
0
votes
0 answers

Memory error in Python while using CountVectorizer() for pandas dataframe

I am using the code below to construct a document-term matrix in Python. # Importing the libraries import pandas as pd import nltk from nltk.corpus import stopwords from nltk.tokenize import word_tokenize from nltk.stem.wordnet import…
0
votes
1 answer

How to tie ngram frequency of a column back to the original data frame?

I have a pandas data frame that has account information and a reason for canceling. I have cleaned the data/lemmatized/removed my own stop words to come up with n grams and frequency. How do I add all of the ngrams back to the original data set so…
0
votes
1 answer

Machine learning OneHotEncoding in Python

I am new to machine learning and scikit-learn. I was going through the documentation and tried OneHotEncoder() with some sample data. Can someone please explain what is happening with encoder.feature_indices_ and how I get the output of …