Questions tagged [sklearn-pandas]

Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames


1336 questions
0
votes
1 answer

Learning_curve error

I tried to use plot_learning_curve to plot the logistic regression below, but got an error. Could anyone help? from sklearn.linear_model import LogisticRegression lg = LogisticRegression(random_state=42, penalty='l1') parameters = {'C':[0.5]} # Use…
Frank Hee
0
votes
0 answers

sklearn.feature_selection and RFECV

import pandas as pd from sklearn.cross_validation import StratifiedKFold from sklearn.feature_selection import SelectPercentile a = pd.read_csv('NCAA_2003-2016_with_diff.csv') logreg = lm.LogisticRegression() rfecv = RFECV(estimator=logreg,…
Hong
0
votes
0 answers

Wrong type when using sklearn and pandas.DataFrame

I want to use sklearn to make some predictions, and I stored my data in a DataFrame. Data = DataFrame(columns = columns,index = range(1,501)) The data itself has no problems. from sklearn.cross_validation import train_test_split Xtrain,Xtest,Ytrain,Ytest =…
0
votes
1 answer

Does binary log loss exclude one part of equation based on y?

Assuming the log loss equation to be: $\text{logLoss} = -\frac{1}{N}\sum_{i=1}^{N}\left(y_i\log(p_i) + (1-y_i)\log(1-p_i)\right)$ where $N$ is the number of samples, $y_1, \dots, y_N$ are the actual values of the dependent variable, and $p_1, \dots, p_N$ are the predicted likelihoods from logistic regression. How…
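As a hedged illustration of the formula in this excerpt (not the asker's or the answerer's code), the sketch below evaluates the two per-sample terms separately using only the standard library. The helper names `per_sample_terms` and `binary_log_loss` are made up for this example, and it assumes every predicted probability is strictly between 0 and 1. For a sample with y = 1 the second term is multiplied by zero, and for y = 0 the first term is, so only one part of the sum contributes per sample.

```python
import math


def per_sample_terms(y, p):
    """Return the two summands of the log loss for one sample.

    Assumes y is 0 or 1 and 0 < p < 1 (hypothetical helper for illustration).
    """
    first = y * math.log(p)             # contributes only when y == 1
    second = (1 - y) * math.log(1 - p)  # contributes only when y == 0
    return first, second


def binary_log_loss(y_true, p_pred):
    """Mean binary log loss, term by term as in the formula above."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        first, second = per_sample_terms(y, p)
        total += first + second
    return -total / len(y_true)


# One positive sample predicted at 0.8 and one negative sample predicted at 0.2.
loss = binary_log_loss([1, 0], [0.8, 0.2])
```

Both samples here are predicted with the same confidence in the correct class, so the loss reduces to `-log(0.8)`.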
0
votes
2 answers

Receiving a value error when using OneHotEncoder and fitting data

I'm working on an assignment and we are using OneHotEncoder in scikit-learn to make all categories print out. Here is a sample of the data and the code I used to transform it: grade sub_grade short_emp emp_length_num home_ownership …
macshaggy
0
votes
0 answers

Using train_test_split to generate test and train data causes changes in underlying data

I am using train_test_split from sklearn.cross_validation to split the source CSV data file into training and test data using simple Python code like this: from sklearn.cross_validation import train_test_split import pandas as pd dataset =…
VS_FF
0
votes
1 answer

Error in implementing SVC in sklearn

I am trying to implement svc for predicting a continuous variable: print("X_train_dtm type ", type(X_train_dtm)) print("y_train type ", type(y_train)) svc = svm.SVC(kernel='linear', C=C).fit(X_train_dtm, y_train) However I am getting the following…
Bonson
0
votes
0 answers

Python - sklearn making predictions on the wrong column

I'm currently trying to make predictions for the next month's worth of business days for stock prices pulled from Quandl. I got this idea from a tutorial on pythonprogramming.net (which heavily influences the structure of the code here). However, when…
0
votes
1 answer

Difference between statsmodels OLS and scikit-learn linear regression; different models give different R-squared

I am new to Python and trying to calculate a simple linear regression. My model has one dependent variable and one independent variable. I am using linear_model.LinearRegression() from the sklearn package. I got an R-squared value of 0.16. Then I used…
SAM244776
0
votes
1 answer

Divide dataframe into two sets according to a column

I have a DataFrame df. I chose some columns of it and want to divide them into xtrain and xtest according to a column called Service, so that rows with 1 and 0 go into xtrain and rows with NaN go into xtest. Service 1 0 0 1 Nan Nan xtrain =…
user7308269
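One possible way to do the split described in this excerpt is a boolean mask built with `Series.notna()`. This is only a sketch under stated assumptions: the `feature` column and its values are invented for illustration, and only the `Service` values come from the excerpt.

```python
import numpy as np
import pandas as pd

# Toy frame: "feature" is a hypothetical column; "Service" mirrors the excerpt.
df = pd.DataFrame({
    "feature": [10, 20, 30, 40, 50, 60],
    "Service": [1, 0, 0, 1, np.nan, np.nan],
})

# True where Service has a value (1 or 0), False where it is NaN.
known = df["Service"].notna()

xtrain = df[known]   # rows with Service equal to 1 or 0
xtest = df[~known]   # rows where Service is NaN
```

The mask keeps the original index on both parts, so rows can still be traced back to `df`.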
0
votes
1 answer

What's the difference between importing a whole module vs importing just the required method from the module in Python?

When using scikit-learn or other similar Python libraries, what's the difference between doing: import sklearn.cluster as sk model = sk.KMeans(n_clusters=n) And from sklearn.cluster import KMeans model = KMeans(n_clusters=n) Is there any…
Alex Kinman
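The two import styles in this excerpt can be compared with a small sketch. As an assumption made so that it runs without scikit-learn installed, the standard-library `collections` module stands in for `sklearn.cluster` and `OrderedDict` stands in for `KMeans`. In both styles Python loads the whole module once; the difference is only which name gets bound in the current namespace.

```python
import sys

# Style 1: bind the module under an alias and reach the class through it.
import collections as c

# Style 2: bind only the class name in the current namespace.
from collections import OrderedDict

# Both names refer to the very same class object.
same_object = c.OrderedDict is OrderedDict

# The full module is loaded and cached in sys.modules in either style.
module_loaded = "collections" in sys.modules
```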
0
votes
1 answer

Mapping back any sklearn result to the original dataframe

I'd like to analyze the predicted values of my random forest results in Excel with the original test data as a reference. The predicted result comes back as an array when I use this: predict = rf.predict(test[columns]) How do I map back the predicted…
galeej
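A minimal sketch of one way to attach predictions to the rows they came from, assuming the prediction array is in the same row order as the frame passed to `predict` (which is how scikit-learn estimators return results). The toy data and the hard-coded `predictions` list are placeholders standing in for `rf.predict(test[columns])`; they are not from the question.

```python
import pandas as pd

# Hypothetical test data with a non-default index.
test = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=[10, 11, 12])
columns = ["a", "b"]

# Placeholder for: predictions = rf.predict(test[columns])
predictions = [0, 1, 0]

# Work on a copy so the original test frame is left untouched.
result = test.copy()

# A list or NumPy array is assigned by position, so row i gets prediction i.
result["predicted"] = predictions
```

From here `result.to_csv(...)` or `result.to_excel(...)` would write the combined table for inspection in a spreadsheet.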
0
votes
1 answer

Converting dataframe column of years to month day year

I'm doing this for homework. My goal is to have an entirely new column with just the days elapsed. There are 500,000+ rows of this, so my goal is to: In the pandas DataFrame, I have these two date columns, which are in different formats. I'd like…
jhub1
0
votes
1 answer

Unable to fit_transform data from csv file in sklearn

I am trying to learn some classification in Scikit-learn. However, I couldn't figure out what this error means. import pandas as pd from sklearn.feature_extraction.text import CountVectorizer data_frame = pd.read_csv('data.csv', header=0)…
0
votes
2 answers

Does using dummy values improve a model's performance?

I see that many feature-engineering workflows apply a get_dummies step to the object features. For example, the sex column, which contains 'M' and 'F', is expanded into two columns in a one-hot representation. Why not directly encode 'M' and 'F' as 0 and…
yanachen