Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames
Questions tagged [sklearn-pandas]
1336 questions
0
votes
1 answer
Error in making train - test sets from iris data by sklearn.train_test_split()
I'm trying to use a simple command, train_test_split, on the iris dataset and an SVM for prediction, but when I use "fit" as follows:
dat_iris = datasets.load_iris()
x1 = dat_iris.data[:,2]
y1 = dat_iris.target
x_train,y_train,x_test,y_test =…
user3111496
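The excerpt is cut off before the error message, so the actual cause is not known. As a minimal sketch of the same steps, assuming the usual pitfalls apply: train_test_split returns the splits in the order X_train, X_test, y_train, y_test, and a single feature column has to stay two-dimensional for scikit-learn estimators. The test_size and random_state values below are assumptions, not taken from the question.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

dat_iris = datasets.load_iris()

# Index with a list so the single feature keeps shape (n_samples, 1).
x1 = dat_iris.data[:, [2]]
y1 = dat_iris.target

# Return order is X_train, X_test, y_train, y_test.
x_train, x_test, y_train, y_test = train_test_split(
    x1, y1, test_size=0.25, random_state=0  # assumed split settings
)

clf = svm.SVC()
clf.fit(x_train, y_train)
accuracy = clf.score(x_test, y_test)
```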
0
votes
1 answer
sklearn TfidfVectorizer: how to intersect a tfidf frame on a column?
In R, I can extract rows (documents) which contain a particular term, say 'toyota', by intersecting a document term matrix (dtm) with the required column name like so:
dtm <- DocumentTermMatrix(mycorpus, control = list(tokenize =…

Pradeep
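One possible Python counterpart, sketched under assumptions: the documents live in a pandas Series (the sample texts here are made up), and the term's column is looked up through the vectorizer's vocabulary_ mapping. Rows with a non-zero weight in that column are the documents containing the term.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical corpus; the question's own data is not shown.
docs = pd.Series(["toyota makes cars", "honda makes bikes", "I drive a toyota"])

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)  # sparse matrix, one row per document

# vocabulary_ maps each term to its column index.
col = vectorizer.vocabulary_["toyota"]

# Keep documents whose tf-idf weight for the term is non-zero.
mask = tfidf[:, col].toarray().ravel() > 0
matching_docs = docs[mask]
```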
0
votes
0 answers
Impose a node using DecisionTreeClassifier
I'm using a classification tree to explore the MNIST dataset.
The data used to build the tree currently consists of the 26x26 pixels of each image.
My idea is to compute the number of connected components for each image and add this result to the data. I…

razzi
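A minimal sketch of the feature idea described in the excerpt, assuming the images are available as a NumPy array and that "connected parts" means connected components of non-zero pixels. The tiny 5x5 images are made up, and scipy.ndimage.label is one way to count components, not necessarily what the asker used.

```python
import numpy as np
from scipy import ndimage

# Two hypothetical 5x5 images standing in for the digit images.
images = np.zeros((2, 5, 5))
images[0, 0, 0] = 1      # first image: two isolated pixels
images[0, 4, 4] = 1
images[1, 2, 1:4] = 1    # second image: one horizontal stroke

# ndimage.label returns (labeled_array, number_of_components).
n_components = np.array([ndimage.label(img > 0)[1] for img in images])

# Flatten the pixels and append the component count as one extra feature.
X = np.column_stack([images.reshape(len(images), -1), n_components])
```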
0
votes
2 answers
increase accuracy of model in sklearn
The decision tree classifier gives an accuracy of 0.52, but I want to increase it. How can I increase the accuracy using any of the classification models available in sklearn?
I have used knn, decision tree, and cross-validation but…

bibek
0
votes
1 answer
ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 0
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sn
from sklearn import preprocessing, cross_validation, svm
from sklearn.linear_model import LogisticRegression
from…

Osro_db40
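The question's code is truncated, so why only class 0 reached the solver is unknown. As a hedged sketch with made-up data: checking the labels that actually arrive at fit, and splitting with stratify so both classes land in the training set, is one way to rule out the split as the cause. Note that sklearn.cross_validation no longer exists in current scikit-learn; the splitting helpers live in sklearn.model_selection.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 15 samples of class 0, 5 of class 1.
X = np.arange(40, dtype=float).reshape(20, 2)
y = np.array([0] * 15 + [1] * 5)

# stratify=y keeps the class proportions in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Inspect which classes the model will see before fitting.
classes = np.unique(y_train)
if len(classes) < 2:
    raise ValueError(f"training labels contain only {classes}")

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
```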
0
votes
1 answer
My first deep network
This is my first deep neural network, and I have a problem while implementing this code.
Besides the error, the code is also slow, which is a separate issue.
Here is the code:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from…

Sayed Gouda
0
votes
1 answer
Logistic regression sklearn - train and apply model
I'm new to machine learning and trying Sklearn for the first time. I have two dataframes, one with data to train a logistic regression model (with 10-fold cross-validation) and another one to predict classes ('0,1') using that model.
Here's my code…

Marcos Santana
0
votes
1 answer
Create new column in pandas dataframe based on if/elif/and functions
I have searched for my exact issue to no avail. These two threads, "Creating a new column based on if-elif-else condition" and "create new pandas dataframe column based on if-else condition with a lookup", guided my code, though my code fails to execute.…

wrangler
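The asker's conditions are not visible in the excerpt. As an illustration only, with hypothetical columns score and passed_exam, numpy.select is one common way to express an if/elif/else chain with combined ("and") conditions without a row-by-row loop.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; column names are not from the question.
df = pd.DataFrame({"score": [95, 72, 40], "passed_exam": [True, True, False]})

# Conditions are checked in order, like if / elif.
conditions = [
    (df["score"] >= 90) & df["passed_exam"],
    (df["score"] >= 60) & df["passed_exam"],
]
choices = ["high", "medium"]

# default plays the role of the final else branch.
df["grade"] = np.select(conditions, choices, default="low")
```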
0
votes
1 answer
Python sklearn-pandas Transform Multiple Columns at the same time error
I am using Python with pandas and sklearn, and I'm trying to use the new and very convenient sklearn-pandas.
I have a big data frame and need to transform multiple columns in a similar way.
I have multiple column names in the variable other.
the source…

thebeancounter
0
votes
0 answers
get_dummies not working properly in Python
I don't know why, but I'm getting this error: get_dummies is removing one column for an unknown reason. I want both the 'train' and 'test' data to have the same number of columns.
data = pd.read_csv('data/trainData.csv')
train , test = train_test_split(data ,…

RAM
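A likely explanation, though the truncated code does not confirm it, is that a category present in one split is absent from the other, so get_dummies produces different columns for each. A minimal sketch with made-up frames: encode the training data, then reindex the test columns to match it.

```python
import pandas as pd

# Hypothetical splits: 'green' and 'blue' never appear in test.
train = pd.DataFrame({"color": ["red", "green", "blue"], "size": [1, 2, 3]})
test = pd.DataFrame({"color": ["red", "red"], "size": [4, 5]})

train_enc = pd.get_dummies(train, dtype=int)

# Reuse the training columns; categories missing from test become all-zero.
test_enc = pd.get_dummies(test, dtype=int).reindex(
    columns=train_enc.columns, fill_value=0
)
```

Columns that exist only in the test data are dropped by the reindex, which matches what a model fitted on the training columns can accept.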
0
votes
0 answers
pandas - non-aligned dataframes
I have two data frames:
df_train
Data types in the dataset: ['uint8', 'int64', 'float64']
Number of features: 233
Shape: (1457, 233)
df_test
Data types in the dataset: ['uint8', 'int64', 'float64']
Number of features: 216
Shape: (1447,…

P. Prunesquallor
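One way to bring two frames with different feature sets onto common columns is DataFrame.align. The sketch below uses small made-up frames in place of df_train and df_test and keeps only the shared columns (join="inner"); whether dropping the extra features is acceptable depends on the asker's goal, which the excerpt does not state.

```python
import pandas as pd

# Hypothetical stand-ins for the 233- and 216-feature frames.
df_train = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
df_test = pd.DataFrame({"a": [7], "c": [8], "d": [9]})

# axis=1 aligns on columns; join="inner" keeps only columns present in both.
train_aligned, test_aligned = df_train.align(df_test, join="inner", axis=1)
```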
0
votes
1 answer
python JupyterNotebook with pandas matrix()
Hi there this is my code:
When I try to run this I get an error.
df = pd.read_csv(file, sep='|', encoding='latin-1')
arreglox = df[df.columns['id':'date_in':'date_out':'objetive':'comments']].as_matrix()
arregloy =…

kenny
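The error itself is not shown, but two things in the snippet stand out: df.columns cannot be sliced with a chain of names separated by colons, and as_matrix() has been removed from current pandas in favor of to_numpy(). A minimal sketch, assuming the five names are the wanted columns; the inline sample data and the extra label column are made up.

```python
import io
import pandas as pd

# Hypothetical pipe-separated data standing in for the asker's file.
raw = io.StringIO(
    "id|date_in|date_out|objetive|comments|label\n"
    "1|2020-01-01|2020-01-02|x|ok|0\n"
    "2|2020-02-01|2020-02-03|y|late|1\n"
)
df = pd.read_csv(raw, sep="|")

# Select several columns with a list of names, then convert to an array.
feature_cols = ["id", "date_in", "date_out", "objetive", "comments"]
arreglox = df[feature_cols].to_numpy()
```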
0
votes
0 answers
Memory error in Python while using CountVectorizer() for a pandas dataframe
I am using the code below to construct a document term matrix in Python.
# Importing the libraries
import pandas as pd
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem.wordnet import…

Ranjana Girish
0
votes
1 answer
How to tie ngram frequency of a column back to the original data frame?
I have a pandas data frame that has account information and a reason for canceling. I have cleaned the data, lemmatized it, and removed my own stop words to come up with n-grams and their frequencies. How do I add all of the n-grams back to the original data set so…

mpmartin618
0
votes
1 answer
Machine learning OneHotEncoding in Python
I am new to machine learning and scikit-learn. I was going through the documentation and tried OneHotEncoder() with some sample data. Can someone please explain what encoder.feature_indices_ represents and how I get the output of …

Sumi
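For readers on a current scikit-learn release: the feature_indices_ attribute mentioned in the question belongs to the older OneHotEncoder interface, and the fitted categories are now exposed through categories_. A minimal sketch with made-up sample data shows how the output columns line up with those categories.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sample: two categorical features per row.
X = np.array([["red", "S"], ["green", "M"], ["red", "M"]])

encoder = OneHotEncoder()
encoded = encoder.fit_transform(X).toarray()  # sparse output made dense

# One array of sorted categories per input feature; the output columns
# follow this order, feature by feature.
categories = [list(c) for c in encoder.categories_]
```

Here the first two output columns correspond to green and red, and the last two to M and S, so the row ["red", "S"] is encoded as 0, 1, 0, 1.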