Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames
Questions tagged [sklearn-pandas]
1336 questions
-1
votes
1 answer
Evaluating logistic regression using cross validation and ROC
I am trying to evaluate logistic regression using the AUROC curve and cross-validate my scores. When I don't cross-validate I have no issues, but I really want to use cross-validation to help decrease bias in my method.
Anyway, below is the code…

jmoore00
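A minimal sketch of one way to get cross-validated AUROC scores for logistic regression. The asker's code is truncated, so this is not their method: the synthetic data from `make_classification`, the fold count, and the variable names are all assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (assumption: the original data is not shown).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

model = LogisticRegression(max_iter=1000)

# Stratified folds keep the class balance similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# scoring="roc_auc" computes the area under the ROC curve on each held-out fold.
auc_scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")

print("AUROC per fold:", auc_scores)
print("Mean AUROC:", auc_scores.mean())
```

Reporting the mean and spread across folds usually gives a steadier estimate than a single train/test split.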
-1
votes
1 answer
read word from each row in a dataframe
I want to find the word "risk" in every row of a dataframe. If a row contains the word "risk", a new column should hold 1 for that row, otherwise 0. How can I achieve this?

chetan parmar
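One possible approach, sketched under the assumption that the text lives in a single column (here given the hypothetical name `text`), is pandas' `Series.str.contains` with a word-boundary pattern:

```python
import pandas as pd

# Hypothetical data; the column name "text" is an assumption.
df = pd.DataFrame(
    {"text": ["High risk of failure", "A brisk walk", "RISK assessment due", None]}
)

# \b marks word boundaries, so "brisk" does not count as a match.
# na=False treats missing values as "no match".
df["has_risk"] = (
    df["text"].str.contains(r"\brisk\b", case=False, regex=True, na=False).astype(int)
)

print(df)
```

If the word may appear in any of several columns, the same check could be applied per column and combined, but that depends on the actual layout of the data.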
-1
votes
1 answer
Function does not finish executing in `hist` function only on second time
In a Python DataFrame, I'm trying to generate a histogram. It is generated the first time the function is called. However, when the create_histogram function is called a second time, it gets stuck at h = df.hist(bins=3, column="amount"). When I say…

Temp O'rary
-1
votes
1 answer
Python 3 Cosine Nearest Neighbor Format
I am working on some data mining self-learning from a free online resource I found. Basically I got a csv file with a bunch of names, movie titles, and what each person rated it. I'm trying to get the K-Nearest Neighbor from it using a cosine metric…

Bucketman86
-1
votes
2 answers
encoder gives value error when I call function on the data frame
I am trying to one-hot encode one column of my data frame, and the remaining columns are label encoded. I am using the code below:
def OneHotEncoder(repair,field):
    oe=preprocessing.OneHotEncoder()
    oe.fit(repair[field])
    …

sayo
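The excerpt is truncated, so the actual cause of the error is not known. One common cause of a ValueError in this situation is passing a 1-D Series to `OneHotEncoder`, which expects 2-D input. A hedged sketch with made-up data (the column names `part` and `cost` are assumptions):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the asker's "repair" frame.
repair = pd.DataFrame(
    {"part": ["engine", "brake", "engine"], "cost": [100, 50, 120]}
)
field = "part"

oe = OneHotEncoder(handle_unknown="ignore")

# Double brackets select a one-column DataFrame (2-D) rather than a Series (1-D).
encoded = oe.fit_transform(repair[[field]]).toarray()

print(oe.categories_)
print(encoded)
```

Naming the helper something other than `OneHotEncoder` also avoids shadowing the scikit-learn class of the same name.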
-1
votes
1 answer
word count in graphlab vs sklearn
Is there any function in pandas or sklearn, like "graphlab.text_analytics.count_words" in graphlab-create, to count the words in every row and add a new word-count column to a CSV data sheet?

Neeraj Singh
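A rough pandas-only sketch of per-row word counting, assuming simple whitespace tokenization and a hypothetical column named `text`. It is not a drop-in equivalent of the graphlab function; it only illustrates the idea.

```python
from collections import Counter

import pandas as pd

# Hypothetical data; the column name "text" is an assumption.
df = pd.DataFrame({"text": ["the quick brown fox", "hello hello world", ""]})

# Lowercase and split on whitespace to get a list of tokens per row.
tokens = df["text"].str.lower().str.split()

# Per-row dictionary of word -> count.
df["word_counts"] = tokens.apply(lambda words: dict(Counter(words)))

# Total number of words per row.
df["n_words"] = tokens.str.len()

print(df)
```

The result could then be written back with `df.to_csv(...)` if a CSV file is the desired output.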
-1
votes
1 answer
Working with the sklearn Boston Housing Dataset: Trying to create dataframe for coefficients
I've run the following lines of code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from sklearn.datasets import load_boston
boston = load_boston()
print(boston.data.shape)
from…

Ian M.
-1
votes
1 answer
Merge different columns Values - Pandas
I have nine columns: 'instlevel1', 'instlevel2', 'instlevel3', 'instlevel4', 'instlevel5', 'instlevel6', 'instlevel7', 'instlevel8', 'instlevel9'.
The values in these columns are populated as follows: if the instlevel1 value is 1, all other values are 0, if…
-1
votes
1 answer
OneHotEncoding method is failing in sklearn
I have a data frame, which I will denote df for now, and I obtain an ndarray as follows
X=df.iloc[:,5:].values
which I want to use for a machine learning model. I need to one-hot-encode the 12th column of X.
Using sklearn, I first label-encoded it as…

Iltl
-1
votes
1 answer
ShuffleSplit of Sklearn issue
I have a data set named df_noyau_yes and I want to apply ShuffleSplit to split it into train and test sets to train an autoencoder.
The problem is that this function returns indices of the shuffled data. I tried to extract the data of these…

Mari
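Since `ShuffleSplit.split` yields positional index arrays, one way to turn them into actual train and test frames is `DataFrame.iloc`. The sketch below uses a small made-up frame in place of `df_noyau_yes`; the shape, column names, and test size are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import ShuffleSplit

# Hypothetical stand-in for df_noyau_yes.
df_noyau_yes = pd.DataFrame(np.arange(40).reshape(10, 4), columns=list("abcd"))

splitter = ShuffleSplit(n_splits=1, test_size=0.2, random_state=0)

# split() yields (train_indices, test_indices) pairs of positional indices.
train_idx, test_idx = next(splitter.split(df_noyau_yes))

# .iloc selects rows by position, which matches what split() returns.
train_df = df_noyau_yes.iloc[train_idx]
test_df = df_noyau_yes.iloc[test_idx]

print(train_df.shape, test_df.shape)
```

For a single split, `train_test_split` from the same module returns the data directly and may be simpler.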
-1
votes
1 answer
Why is my y_pred model only close to zero?
I am new to Python and am also learning machine learning. I have a Titanic dataset and am trying to predict who survived and who did not. But my code seems to have an issue with y_pred, as none of the values are close to 1 or above. Find attached…

Banky
-1
votes
1 answer
How to binary encode two mixed features?
I have a dataset looking like this one:
import pandas as pd
pd.DataFrame({"A": [2, 2, 1, 0, 5, 3, 0, 4, 5], "B": [1, 0, 0, 0, 1, 1, 1, 0, 0]})
A B
0 2 1
1 2 0
2 1 0
3 0 0
4 5 1
5 3 1
6 0 1
7 4 0
(I know that A is between 0 and…

stellasia
-1
votes
1 answer
Python sklearn df issue - Field Cady sample code issue
I'm working through Field Cady's "The Data Science Handbook", with sample code here: https://github.com/field-cady/the_data_science_handbook/blob/master/chapter08_classifiers/example.py
I get syntax error from line 23 of this code, namely:
File…

justdata
-1
votes
1 answer
Python Sklearn Predicting values on an unseen data set
I have a set of football data in a database that I am trying to predict values for.
import MySQLdb
import pandas as pd
from sklearn.feature_selection import RFE
from sqlalchemy import create_engine
import mysql.connector
from matplotlib import…

Nick Stanford
-1
votes
2 answers
Pandas how update values with counts greater x
I have a pandas column that contains a lot of strings that appear fewer than 5 times. I do not want to remove these values; however, I do want to replace them with a placeholder string called "pruned". What is the best way to do this?
df=…

Ari
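A small sketch of one way to do this, assuming the column is named `label` (a hypothetical name, since the asker's code is cut off): map each value to its frequency, then keep only values that meet the threshold.

```python
import pandas as pd

# Hypothetical data; the column name "label" is an assumption.
df = pd.DataFrame({"label": ["a"] * 6 + ["b"] * 5 + ["c"] * 2 + ["d"]})

# Frequency of each row's value, aligned with the original rows.
counts = df["label"].map(df["label"].value_counts())

# Keep values seen at least 5 times; replace the rest with "pruned".
df["label"] = df["label"].where(counts >= 5, "pruned")

print(df["label"].value_counts())
```

`Series.where` keeps entries where the condition is true and substitutes the second argument elsewhere, so no rows are dropped.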