Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames
Questions tagged [sklearn-pandas]
1336 questions
0 votes, 1 answer
Learning_curve error
I tried to use plot_learning_curve to plot the learning curve of the logistic regression below, but got an error. Could anyone help?
from sklearn.linear_model import LogisticRegression
lg = LogisticRegression(random_state=42, penalty='l1')
parameters = {'C':[0.5]}
# Use…

Frank Hee
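A minimal sketch of one possible cause, not a diagnosis of the asker's code: in recent scikit-learn versions the default `lbfgs` solver does not support `penalty='l1'`, so `LogisticRegression(penalty='l1')` fails at fit time unless a compatible solver such as `liblinear` or `saga` is passed. The data below is synthetic (`make_classification`), and the curve values come from `sklearn.model_selection.learning_curve`; `plot_learning_curve` in the question is a helper from the scikit-learn examples and is not reproduced here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Synthetic binary-classification data (stand-in for the asker's data)
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# An L1 penalty needs a solver that supports it, e.g. liblinear or saga
lg = LogisticRegression(random_state=42, penalty="l1", C=0.5, solver="liblinear")

# Train on growing fractions of each CV training fold and score on the held-out fold
train_sizes, train_scores, valid_scores = learning_curve(
    lg, X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=5
)

# Mean score per training-set size: the values a learning-curve plot draws
train_mean = train_scores.mean(axis=1)
valid_mean = valid_scores.mean(axis=1)
for n, tr, va in zip(train_sizes, train_mean, valid_mean):
    print(f"{int(n):4d} samples: train={tr:.3f} validation={va:.3f}")
```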
0 votes, 0 answers
sklearn.feature_selection and RFECV
import pandas as pd
from sklearn.cross_validation import StratifiedKFold
from sklearn.feature_selection import SelectPercentile
a = pd.read_csv('NCAA_2003-2016_with_diff.csv')
logreg = lm.LogisticRegression()
rfecv = RFECV(estimator=logreg,…

Hong
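A minimal sketch, assuming the goal is recursive feature elimination with cross-validation around a logistic regression. Note that `sklearn.cross_validation` was removed in scikit-learn 0.20; `StratifiedKFold` now lives in `sklearn.model_selection`, and the excerpt uses `RFECV` and `lm` without showing their imports. The NCAA CSV isn't available, so synthetic data with hypothetical column names stands in for it.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the CSV: 8 features, 3 of them informative
X_arr, y = make_classification(n_samples=200, n_features=8, n_informative=3,
                               n_redundant=0, random_state=0)
a = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])  # hypothetical names

logreg = LogisticRegression(max_iter=1000)
# Drop the weakest feature one at a time, scoring each subset with 5-fold stratified CV
rfecv = RFECV(estimator=logreg, step=1, cv=StratifiedKFold(n_splits=5),
              scoring="accuracy")
rfecv.fit(a, y)

print("Optimal number of features:", rfecv.n_features_)
# support_ is a boolean mask, so it can index the DataFrame's columns directly
print("Selected columns:", list(a.columns[rfecv.support_]))
```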
0 votes, 0 answers
Wrong type when using sklearn and pandas.DataFrame
I want to use sklearn to make some predictions, and I stored my data in a DataFrame.
Data = DataFrame(columns = columns,index = range(1,501))
The data has no problem.
from sklearn.cross_validation import train_test_split
Xtrain,Xtest,Ytrain,Ytest =…

Mengyang LIU
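A hedged sketch of one common cause, not a diagnosis of the asker's code: `DataFrame(columns=columns, index=range(1, 501))` with no data creates columns of `object` dtype, and values filled in afterwards can stay `object`, which scikit-learn may reject or misinterpret. Converting every column with `pd.to_numeric` before splitting avoids that. The column names are hypothetical, and `train_test_split` is imported from `sklearn.model_selection`, which replaced `sklearn.cross_validation`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

columns = ["x1", "x2", "target"]  # hypothetical column names

# A frame built from columns and index only: every column is object dtype
empty = pd.DataFrame(columns=columns, index=range(1, 501))
print(empty.dtypes.unique())

# Simulate numeric values that ended up stored in object columns
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(500, 3)), columns=columns,
                    index=range(1, 501)).astype(object)

# Convert each column to a proper numeric dtype before using scikit-learn
data = data.apply(pd.to_numeric)

X = data[["x1", "x2"]]
y = data["target"]
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, y, test_size=0.2, random_state=0)
print(Xtrain.dtypes.tolist(), Xtrain.shape, Xtest.shape)
```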
0 votes, 1 answer
Does binary log loss exclude one part of equation based on y?
Assuming the log loss equation is:
$$\text{logLoss} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\right]$$
where $N$ is the number of samples, $y_1, \dots, y_N$ are the actual values of the dependent variable, and $p_1, \dots, p_N$ are the predicted probabilities from logistic regression.
How…

Liam Hanninen
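Because each $y_i$ is either 0 or 1, one of the two terms is multiplied by zero for every sample: only $-\log(p_i)$ contributes when $y_i = 1$, and only $-\log(1 - p_i)$ when $y_i = 0$. The sketch below, on made-up values, computes the loss both ways and compares it with `sklearn.metrics.log_loss`.

```python
import numpy as np
from sklearn.metrics import log_loss

y = np.array([1, 0, 1, 0])          # actual labels (made-up example values)
p = np.array([0.9, 0.2, 0.6, 0.4])  # predicted P(y = 1)

# Full formula: both terms written out for every sample
full = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Equivalent form: for each sample, keep only the term matching its label
per_sample = np.where(y == 1, -np.log(p), -np.log(1 - p))
selected = per_sample.mean()

print(full, selected, log_loss(y, p))
```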
0 votes, 2 answers
Receiving a value error when using OneHotEncoder and fitting data
I'm working on an assignment, and we are using OneHotEncoder in scikit-learn to make all categories print out. Here is a sample of the data and the code I used to transform it:
grade sub_grade short_emp emp_length_num home_ownership …

macshaggy
0 votes, 0 answers
Using train_test_split to generate test and train data causes changes in underlying data
I am using train_test_split from sklearn.cross_validation to split the source CSV data file into training and test data using simple Python code like this:
from sklearn.cross_validation import train_test_split
import pandas as pd
dataset =…

VS_FF
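A quick check, assuming the "changes" are the shuffled row order: `train_test_split` shuffles by default and returns new objects; it does not modify the DataFrame passed in. The original index labels travel with each row, so the split can be traced back to the source, or the order kept with `shuffle=False`. A small made-up frame replaces the asker's CSV.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

dataset = pd.DataFrame({"feature": range(10), "label": [0, 1] * 5})
snapshot = dataset.copy()

train, test = train_test_split(dataset, test_size=0.3, random_state=0)

# The source frame is untouched; only the returned pieces are shuffled
print(dataset.equals(snapshot))
print(train.index.tolist(), test.index.tolist())

# Index labels are preserved, so sorting restores the original file order
print(train.sort_index().head())

# shuffle=False keeps rows in order (the test set is the last 30% of rows)
train_ordered, test_ordered = train_test_split(dataset, test_size=0.3, shuffle=False)
print(test_ordered.index.tolist())
```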
0 votes, 1 answer
Error in implementing SVC in sklearn
I am trying to implement SVC to predict a continuous variable:
print("X_train_dtm type ", type(X_train_dtm))
print("y_train type ", type(y_train))
svc = svm.SVC(kernel='linear', C=C).fit(X_train_dtm, y_train)
However I am getting the following…

Bonson
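`svm.SVC` is a classifier: it treats `y` as discrete class labels, so a continuous target is rejected with an "Unknown label type" ValueError in recent versions. For a continuous target, the regression counterpart is `svm.SVR`. A minimal sketch on synthetic data; the question's `X_train_dtm` (likely a document-term matrix) is replaced by a small dense array, and `C=1.0` is an assumed value.

```python
import numpy as np
from sklearn import svm

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 3))  # stand-in for X_train_dtm
# Continuous target: a linear combination of the features plus a little noise
y_train = X_train @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

# SVC expects class labels, so it refuses a continuous y
try:
    svm.SVC(kernel="linear").fit(X_train, y_train)
    svc_rejected = False
except ValueError as err:
    svc_rejected = True
    print("SVC rejected the continuous target:", err)

# SVR is the support-vector regressor and accepts continuous targets
svr = svm.SVR(kernel="linear", C=1.0).fit(X_train, y_train)
pred = svr.predict(X_train[:5])
print(pred)
```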
0 votes, 0 answers
Python - sklearn making predictions on the wrong column
I'm currently trying to make predictions for the next month's worth of business days for stock prices pulled from Quandl. I got this idea from a tutorial on pythonprogramming.net (which heavily influences the structure of the code here); however, when…

Connor McCluskey
0 votes, 1 answer
Difference between statsmodels OLS and scikit-learn linear regression; different models give different R-squared
I am new to Python and trying to calculate a simple linear regression. My model has one dependent variable and one independent variable. I am using linear_model.LinearRegression() from the sklearn package, and I got an R-squared value of 0.16.
Then I used…

SAM244776
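One frequent cause, offered as a hedged sketch rather than a diagnosis: statsmodels' `OLS` does not add an intercept unless the design matrix contains a constant column (for example via `sm.add_constant`), and without a constant it reports an uncentered R². `LinearRegression` fits an intercept by default and reports the usual centered R². The sketch reproduces both quantities with NumPy least squares on synthetic data, so statsmodels itself isn't needed to run it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 + 0.5 * x + rng.normal(scale=2.0, size=100)  # synthetic data with an intercept
X = x.reshape(-1, 1)

# scikit-learn: intercept fitted by default, centered R²
r2_sklearn = LinearRegression().fit(X, y).score(X, y)

# Least squares with an explicit constant column (what sm.add_constant would add)
X_const = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X_const, y, rcond=None)
resid = y - X_const @ beta
r2_with_const = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

# No constant column: the fit goes through the origin and R² is uncentered
beta_nc, *_ = np.linalg.lstsq(X, y, rcond=None)
resid_nc = y - X @ beta_nc
r2_uncentered = 1 - resid_nc @ resid_nc / (y @ y)

print(r2_sklearn, r2_with_const, r2_uncentered)
```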
0 votes, 1 answer
Divide a DataFrame into two sets according to a column
I have a DataFrame df. I chose some of its columns, and I want to divide them into xtrain and xtest according to a column called Service, so that rows with 1 and 0 go into xtrain and rows with NaN go into xtest.
Service
1
0
0
1
NaN
NaN
xtrain =…
user7308269
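A minimal sketch using a boolean mask, assuming the missing entries in `Service` are real NaN values (not the string "Nan"). The `feature` column and the `columns` list are hypothetical placeholders for the columns the asker selected.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Service": [1, 0, 0, 1, np.nan, np.nan],
    "feature": [10, 20, 30, 40, 50, 60],  # hypothetical feature column
})
columns = ["feature"]                      # hypothetical subset of columns to keep

has_label = df["Service"].notna()          # True for rows where Service is 1 or 0
xtrain = df.loc[has_label, columns]
ytrain = df.loc[has_label, "Service"]
xtest = df.loc[~has_label, columns]        # rows where Service is NaN
print(xtrain.shape, xtest.shape)
```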
0 votes, 1 answer
What's the difference between importing a whole module vs importing just the required method from the module in python?
When using scikit-learn or other similar Python libraries, what's the difference between doing:
import sklearn.cluster as sk
model = sk.KMeans(n_clusters=n)
And
from sklearn.cluster import KMeans
model = KMeans(n_clusters=n)
Is there any…

Alex Kinman
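Both forms execute and cache the whole `sklearn.cluster` module in `sys.modules`; they differ only in which name is bound in the current namespace (the module as `sk`, or the class as `KMeans`). A short check:

```python
import sys

import sklearn.cluster as sk
from sklearn.cluster import KMeans

# Both names point at the very same class object
print(sk.KMeans is KMeans)

# The module was loaded once and cached; neither form loads "less" of it
print("sklearn.cluster" in sys.modules)

# n_init is set explicitly only to avoid version-dependent default warnings
model_a = sk.KMeans(n_clusters=3, n_init=10)
model_b = KMeans(n_clusters=3, n_init=10)
print(type(model_a) is type(model_b))
```

The choice is mostly about readability and avoiding name clashes, not performance.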
0 votes, 1 answer
Mapping any sklearn result back to the original DataFrame
I'd like to analyze the predicted values of my random forest in Excel, with the original test data as a reference.
The prediction comes back as an array, since I use this:
predict = rf.predict(test[columns])
How do I map back the predicted…

galeej
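A minimal sketch, assuming `test` is a DataFrame and `columns` lists its feature columns as in the question: `predict` returns one value per row, in the same order as `test[columns]`, so the array can be attached positionally as a new column while keeping the original index. The training and test frames here are made up.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Made-up frames; `columns` lists the feature columns
train = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6], "b": [0, 1, 0, 1, 0, 1],
                      "target": [0, 0, 0, 1, 1, 1]})
test = pd.DataFrame({"a": [1.5, 5.5, 3.0], "b": [0, 1, 1]}, index=[101, 102, 103])
columns = ["a", "b"]

rf = RandomForestClassifier(n_estimators=50, random_state=0)
rf.fit(train[columns], train["target"])

# Predictions follow the row order of test[columns], so they line up with test
predict = rf.predict(test[columns])
result = test.assign(predicted=predict)
print(result)

# Hypothetical file name; to_excel additionally needs an engine such as openpyxl
# result.to_csv("predictions.csv")
```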
0 votes, 1 answer
Converting a DataFrame column of years to month/day/year
I'm doing this for homework.
My goal is to have an entirely new column with just the days elapsed. There are 500,000+ rows of this, so my goal is to:
In the pandas DataFrame, I have these two date columns, which are in different formats. I'd like…

jhub1
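A minimal sketch, assuming one column is stored as `YYYY-MM-DD` and the other as `MM/DD/YYYY`; the real formats aren't shown in the excerpt, so the column names and formats below are hypothetical. Parsing each column with an explicit format is vectorized, which scales to large frames far better than row-by-row parsing.

```python
import pandas as pd

# Hypothetical columns in two different date formats
df = pd.DataFrame({
    "start": ["2016-01-15", "2016-03-01", "2015-12-31"],  # YYYY-MM-DD
    "end": ["01/20/2016", "03/15/2016", "01/01/2016"],    # MM/DD/YYYY
})

# Parse each column with its own explicit format
start = pd.to_datetime(df["start"], format="%Y-%m-%d")
end = pd.to_datetime(df["end"], format="%m/%d/%Y")

# Subtracting datetimes gives Timedeltas; .dt.days extracts whole days
df["days_elapsed"] = (end - start).dt.days
print(df)
```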
0 votes, 1 answer
Unable to fit_transform data from csv file in sklearn
I am trying to learn some classification in Scikit-learn. However, I couldn't figure out what this error means.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
data_frame = pd.read_csv('data.csv', header=0)…

Jhooma
0 votes, 2 answers
Does using dummy variables make a model's performance better?
I see that many feature-engineering pipelines apply get_dummies to object-typed features. For example, the sex column, which contains 'M' and 'F', is turned into two columns in one-hot representation.
Why don't we directly encode 'M' and 'F' as 0 and…

yanachen
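For a two-valued column like `sex`, a 0/1 mapping and one-hot encoding carry the same information; the second dummy column is redundant (always 1 minus the first), which is why `drop_first=True` is often used. The distinction matters for columns with more than two unordered categories, where integer codes impose an artificial order that a linear model would treat as meaningful. A small sketch on made-up values, with `city` as a hypothetical multi-category column:

```python
import pandas as pd

df = pd.DataFrame({
    "sex": ["M", "F", "F", "M"],
    "city": ["Paris", "Oslo", "Rome", "Oslo"],  # hypothetical 3-category column
})

# Binary column: a 0/1 mapping and a single dummy column are identical
mapped = df["sex"].map({"F": 0, "M": 1})
dummy = pd.get_dummies(df["sex"], drop_first=True)["M"].astype(int)
print((mapped == dummy).all())

# Multi-category column: integer codes imply Oslo < Paris < Rome,
# while one-hot columns make no ordering assumption
codes = df["city"].astype("category").cat.codes
one_hot = pd.get_dummies(df["city"]).astype(int)
print(codes.tolist())
print(one_hot)
```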