Questions tagged [sklearn-pandas]

Python module providing a bridge between scikit-learn's machine learning methods and pandas-style DataFrames


1336 questions
2 votes, 1 answer

Accessing attributes in sklearn pipeline

I'm having trouble accessing attributes of intermediate steps in my sklearn pipeline. Here's my code:
from sklearn.pipeline import make_pipeline, make_union
from sklearn.compose import make_column_transformer
from sklearn.impute import…
mrgoldtech
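A minimal sketch of how fitted attributes of intermediate steps can be reached, assuming a simple imputer → scaler → classifier pipeline (the toy data and step choice are illustrative, not the asker's code). `make_pipeline` names each step after its class in lowercase, and those names are the keys of `named_steps`:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy data with missing values (illustrative only)
X = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, 5.0], [6.0, np.nan]])
y = np.array([0, 0, 1, 1])

pipe = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler(), LogisticRegression())
pipe.fit(X, y)

# make_pipeline derives step names from the lowercased class names
print(list(pipe.named_steps))  # ['simpleimputer', 'standardscaler', 'logisticregression']

# Fitted attributes of intermediate steps, looked up by step name
imputer_means = pipe.named_steps["simpleimputer"].statistics_
scaler_means = pipe.named_steps["standardscaler"].mean_
# Positional indexing also works on a Pipeline (scikit-learn >= 0.21)
coef = pipe[-1].coef_
```

For a step built with `make_column_transformer`, the fitted sub-transformers are exposed through the ColumnTransformer's `named_transformers_` attribute, so the same lookup pattern can be chained one level deeper.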
2 votes, 0 answers

Multinomial Naive Bayes ValueError: shapes not aligned, only when using chi2 test

I'm trying to make a pos/neg review classifier and wanted to use Multinomial Naive Bayes (or regular Naive Bayes). If I don't do feature selection with SelectKBest and chi2, it works fine. But if I do, I get the following error: Traceback (most recent call…
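The traceback is truncated, so this is an assumption: a frequent cause of "shapes not aligned" here is fitting `SelectKBest` on the training matrix but predicting on a test matrix that still has the full feature count. A minimal sketch that avoids this by chaining vectorizer, selector, and classifier in one `Pipeline`, so the same fitted transforms are applied at predict time (the tiny corpus is hypothetical):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy reviews: 1 = positive, 0 = negative
train_texts = ["great movie", "loved this film", "terrible movie", "hated this film"]
train_labels = [1, 1, 0, 0]

clf = Pipeline([
    ("vect", CountVectorizer()),            # text -> token counts
    ("select", SelectKBest(chi2, k=4)),     # keep the 4 highest-scoring tokens
    ("nb", MultinomialNB()),
])
clf.fit(train_texts, train_labels)

# New texts go through the *same* fitted vectorizer and selector
preds = clf.predict(["great film", "terrible film"])
```

Because the selector lives inside the pipeline, the test data can never reach the classifier with a different number of columns than it was trained on.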
2 votes, 1 answer

Error Making prediction with python onnxruntime

I have created a very basic decision tree using the sklearn library. This tree is trained on 4 features: feat1 (INT), feat2 (INT), feat3 (FLOAT) and feat4 (FLOAT), and the label/target feature is a boolean value (0 or 1). I converted the tree into an ONNX…
user7432713
2 votes, 1 answer

How to choose data columns and target columns in a dataframe for train_test_split?

I'm trying to set up a train_test_split with data I have read from a CSV into a pandas DataFrame. The book I am reading says I should separate it into x_train as the data and y_train as the target, but how can I define which column is the target and…
James
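A minimal sketch of separating features from the target before splitting, assuming the target lives in a single column; the column names `feature_a`, `feature_b` and `target` are hypothetical placeholders for whatever the CSV contains:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data; in practice this would come from pd.read_csv(...)
df = pd.DataFrame({
    "feature_a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "feature_b": [0.5, 0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 1.0],
    "target":    [0, 0, 1, 0, 1, 0, 1, 0, 1, 1],
})

X = df.drop(columns=["target"])  # every column except the target
y = df["target"]                 # the single target column

# 20% of rows go to the test set; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

To use only some columns as features, `X = df[["feature_a", "feature_b"]]` selects them explicitly instead of dropping the target.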
2 votes, 2 answers

Get prediction confidence through Decision Tree Regression in sklearn

Is there a way I can attach some sort of confidence to my predictions from a Decision Tree Regression in Python?
from sklearn.tree import DecisionTreeRegressor
dt = DecisionTreeRegressor(random_state=0, criterion="mae")
dt_fit =…
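A single regression tree returns one leaf value per input and has no built-in confidence output. One common workaround, a swapped-in technique rather than a `DecisionTreeRegressor` feature, is to fit a random forest and use the spread of the individual trees' predictions as a rough, uncalibrated uncertainty signal. A minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic 1-D regression problem (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

X_new = np.array([[2.0], [5.0], [8.0]])
# Predictions of every individual tree: shape (n_trees, n_samples)
per_tree = np.stack([tree.predict(X_new) for tree in forest.estimators_])
mean = per_tree.mean(axis=0)   # matches forest.predict(X_new)
spread = per_tree.std(axis=0)  # larger spread means the trees disagree more
```

Note that in scikit-learn 1.0 the `criterion="mae"` spelling used in the question was renamed to `criterion="absolute_error"`.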
2 votes, 1 answer

"A column-vector y was passed when a 1d array was expected" error message

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
clf = LinearDiscriminantAnalysis()
clf.fit(np.matrix(X_train), np.matrix(y_train))
but I get the error message specified above. I checked the shape of y_train but it's…
financial_physician
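scikit-learn emits this message (a `DataConversionWarning`) when `y` has shape `(n_samples, 1)` instead of `(n_samples,)`; wrapping `y_train` in `np.matrix` always produces a 2-D column. A minimal sketch of the usual fix, flattening `y` with `ravel()` and passing plain arrays (the random data is illustrative):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 2))
y_train = np.array([[0], [1]] * 10)  # shape (20, 1): a column vector

# np.asarray also converts an np.matrix; ravel() flattens to shape (20,)
y_flat = np.asarray(y_train).ravel()

clf = LinearDiscriminantAnalysis()
clf.fit(X_train, y_flat)  # no column-vector warning with 1-D y
```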
2 votes, 2 answers

Pandas: get the cumulative sum of a column only if the timestamp is greater than that of another column

For each customer, I would like to get the cumulative sum of a column (Dollar Value) only when Timestamp 1 is less than Timestamp 2. I could do a cartesian join of the values based on Customer or iterate through the dataframe, but wanted to see if…
minnymate
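The excerpt is truncated, so this sketch rests on one interpretation (an assumption): for each row, sum "Dollar Value" over rows of the same customer whose "Timestamp 1" is strictly earlier than that row's "Timestamp 2". Sorting each customer's rows by "Timestamp 1", taking a running total, and looking up positions with `np.searchsorted` avoids both the cartesian join and a row-by-row loop. The data and the helper name `cumsum_before` are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical data using the column names from the question
df = pd.DataFrame({
    "Customer": ["A", "A", "A", "B"],
    "Timestamp 1": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-04", "2020-01-01"]),
    "Timestamp 2": pd.to_datetime(["2020-01-03", "2020-01-02", "2020-01-05", "2020-01-01"]),
    "Dollar Value": [10.0, 20.0, 5.0, 7.0],
})

def cumsum_before(frame):
    """For each row, sum "Dollar Value" over same-customer rows whose
    "Timestamp 1" is strictly earlier than this row's "Timestamp 2"."""
    result = pd.Series(0.0, index=frame.index)
    for _, group in frame.groupby("Customer"):
        ordered = group.sort_values("Timestamp 1")
        # Running total, with a leading 0 for "no qualifying rows"
        running = np.concatenate([[0.0], ordered["Dollar Value"].cumsum().to_numpy()])
        # side="left" counts entries strictly less than each Timestamp 2
        counts = np.searchsorted(ordered["Timestamp 1"].to_numpy(),
                                 group["Timestamp 2"].to_numpy(), side="left")
        result.loc[group.index] = running[counts]
    return result

df["Cum Dollar Value"] = cumsum_before(df)
```

Switching to `side="right"` would include rows where the two timestamps are equal.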
2 votes, 2 answers

Optimize K-Nearest Neighbors Algorithm on 50 variables x 100k row dataset

I want to optimize a piece of code that calculates the nearest neighbour for every item in a given dataset with 100k rows. The dataset contains 50 variable columns, which describe each row item, and most of the cells contain a…
d_-
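A minimal sketch using scikit-learn's `NearestNeighbors`, assuming the 50 columns are numeric and Euclidean distance is appropriate (the truncated excerpt doesn't say what the cells contain). Calling `kneighbors()` with no argument queries the fitted points themselves and leaves each point out of its own neighbour list. Random data stands in for the real table:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))  # stand-in for the 100k x 50 dataset

# n_jobs=-1 parallelizes the query across all CPU cores
nn = NearestNeighbors(n_neighbors=1, n_jobs=-1).fit(X)

# With no query argument, each fitted point is excluded from its own result
distances, indices = nn.kneighbors()
nearest = indices[:, 0]  # row index of each row's nearest neighbour
```

At 50 dimensions, tree-based indexes lose much of their advantage, so the search may fall back to brute force; for categorical or binary cells a different `metric` would likely be needed.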
2 votes, 2 answers

Pandas Dataframe apply custom function to certain rows with NULL columns

I have a Dataframe that looks like:
------------------------------
|Date       | Deal | Country |
------------------------------
|2019-01-02 | ABC  | US      |
------------------------------
|2019-02-01 | ABC  | US …
CodeSsscala
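A minimal sketch of applying a function only to rows where a column is null: build a boolean mask with `isna()`, run `apply(..., axis=1)` on the masked rows, and assign the result back through `.loc`. The filling rule `fill_country` and the extra row are hypothetical, since the question's actual function is cut off:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2019-01-02", "2019-02-01", "2019-03-01"],
    "Deal": ["ABC", "ABC", "XYZ"],
    "Country": ["US", None, None],
})

def fill_country(row):
    # Hypothetical rule, for illustration only
    return "US" if row["Deal"] == "ABC" else "UNKNOWN"

mask = df["Country"].isna()  # True only for rows with a NULL Country
if mask.any():
    # apply runs only on the masked rows; .loc aligns the result on the index
    df.loc[mask, "Country"] = df.loc[mask].apply(fill_country, axis=1)
```

Rows that already have a value are never passed to the function.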
2 votes, 4 answers

What does X in imputer = imputer.fit(X[:,1:3]) stand for, and what is the meaning of imputer.fit(X[:,1:3])?

I'm preprocessing a data set, and I get an error caused by the line imputer = imputer.fit(X[:,1:3]), which I don't understand. I understand that imputer = Imputer(missing_values = "NaN", strategy = "mean") means replace missing values with the mean value…
Dulangi_Kanchana
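`X[:, 1:3]` is NumPy slicing: all rows (`:`) and columns 1 and 2 (the stop index 3 is excluded). The error itself isn't in the excerpt, but one likely source is that the `Imputer` class was removed in scikit-learn 0.22 in favour of `sklearn.impute.SimpleImputer`, which expects `missing_values=np.nan` rather than the string `"NaN"`. A minimal sketch with hypothetical data:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data: column 0 is an already-encoded category,
# columns 1 and 2 are numeric with missing values
X = np.array([
    [0.0, 44.0, 72000.0],
    [1.0, 27.0, np.nan],
    [2.0, np.nan, 54000.0],
    [0.0, 38.0, 61000.0],
])

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
imputer = imputer.fit(X[:, 1:3])          # learns the mean of columns 1 and 2
X[:, 1:3] = imputer.transform(X[:, 1:3])  # replaces each NaN with its column mean
```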
2 votes, 1 answer

NameError : name 'metrics' is not defined

It gives an error when calculating accuracy metrics. I imported the library to calculate accuracy metrics, but it still gives me an error that the name metrics is not defined: from sklearn.feature_extraction.text import TfidfVectorizer tf_idf_vect =…
Hafsa Naveed
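The import line is cut off, so this is an assumption: `NameError: name 'metrics' is not defined` usually means the code calls `metrics.something(...)` while only `import sklearn` or `from sklearn.metrics import accuracy_score` was run, neither of which binds the name `metrics`. A minimal sketch of both working forms:

```python
from sklearn import metrics                  # binds the name `metrics`
from sklearn.metrics import accuracy_score  # binds only `accuracy_score`

y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]

acc = metrics.accuracy_score(y_true, y_pred)  # module-qualified call
acc2 = accuracy_score(y_true, y_pred)         # direct function call
```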
2 votes, 1 answer

Split list into columns in pandas

I have a dataframe like this:
df = (pd.DataFrame({'ID': ['ID1', 'ID2', 'ID3'], 'Values': [['AB', 'BC'], np.NaN, ['AB', 'CD']]}))
df
    ID    Values
0  ID1  [AB, BC]
1  ID2       NaN
2  ID3  [AB, CD]
I want to split the item inside…
Hardik Gupta
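A minimal sketch of splitting list items into positional columns, assuming that is the goal (the excerpt is truncated). Missing entries are treated as empty lists and shorter lists are padded with `None`, so every row ends up with the same width. The output column names `Value_1`, `Value_2` are hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": ["ID1", "ID2", "ID3"],
                   "Values": [["AB", "BC"], np.nan, ["AB", "CD"]]})

# Treat missing entries as empty lists so every row is a list
lists = [v if isinstance(v, list) else [] for v in df["Values"]]
width = max(len(v) for v in lists)
# Pad shorter lists with None so all rows have the same number of items
padded = [v + [None] * (width - len(v)) for v in lists]

split = pd.DataFrame(padded, index=df.index,
                     columns=[f"Value_{i + 1}" for i in range(width)])
result = df.drop(columns="Values").join(split)
```

If the goal is instead one indicator column per distinct item (AB, BC, CD), scikit-learn's `MultiLabelBinarizer` is the usual tool.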
2 votes, 1 answer

How to fix Value Error with train_test_split in Python Numpy

I am using sklearn with a NumPy array. I have 2 arrays (x, y) and they should be split with test_size=0.2 and train_size=0.8. This is my current code:
def predict():
    sample_data = pd.read_csv("includes\\csv.csv")
    x = np.array(sample_data["day"])
    y =…
python_beginner
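The error text is truncated, so this is an assumption: a frequent cause is that `np.array(sample_data["day"])` is 1-D, while scikit-learn estimators expect a 2-D `X` of shape `(n_samples, n_features)`. Selecting the column with double brackets keeps it 2-D. A minimal sketch; the target column name `value` and the data are hypothetical, since the original `y =` line is cut off:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for pd.read_csv("includes\\csv.csv")
sample_data = pd.DataFrame({"day": range(1, 11),
                            "value": [3, 5, 7, 9, 11, 13, 15, 17, 19, 21]})

x = sample_data[["day"]]  # double brackets: 2-D, shape (10, 1)
y = sample_data["value"]  # 1-D target, shape (10,)

# train_size defaults to the complement of test_size (0.8 here)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(x_train, y_train)
```

With a NumPy array instead of a DataFrame, `x.reshape(-1, 1)` achieves the same 2-D shape.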
2 votes, 2 answers

After choosing k components in PCA, how do we find out which components (names of the columns) the algorithm has selected?

I am new to data science and need some help understanding PCA. I know that each column constitutes one axis, but when PCA is done and the components are reduced to some value k, how do I know which columns were selected?
Ravi Biradar
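PCA does not select a subset of the original columns: each principal component is a linear combination of all of them. What can be inspected is `components_`, whose rows give each column's weight (loading) in each component. A minimal sketch with hypothetical column names and random data:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical data with named columns
df = pd.DataFrame(rng.normal(size=(100, 4)),
                  columns=["height", "weight", "age", "income"])

pca = PCA(n_components=2).fit(df)

# One row per component, one column per original feature
loadings = pd.DataFrame(pca.components_, columns=df.columns, index=["PC1", "PC2"])

# The original column that contributes most (in absolute value) to each component
top_columns = loadings.abs().idxmax(axis=1)
```

If the real goal is to keep a subset of the original columns, a feature-selection method (for example `SelectKBest`) is a better fit than PCA.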
2 votes, 1 answer

How to load this kind of data in pandas

Background: I have logs which are generated during the testing of the devices after manufacture. Each device has a serial number and a corresponding CSV log file with all the data. Something like this:
DATE,TESTSTEP,READING,LIMIT,RESULT
01/01/2019…
NotAgain
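A minimal sketch, assuming each device's log is a plain CSV with the header shown and the serial number comes from the file name: read every file, tag its rows with the serial, and concatenate everything into one DataFrame. The file contents below are hypothetical and inlined with `io.StringIO` so the example runs on its own:

```python
import io
import pandas as pd

# Hypothetical file contents; with real files, loop over e.g.
# pathlib.Path("logs").glob("*.csv") and use each path's .stem as the serial.
logs = {
    "SN0001": "DATE,TESTSTEP,READING,LIMIT,RESULT\n"
              "01/01/2019,1,3.2,5.0,PASS\n"
              "01/01/2019,2,6.1,5.0,FAIL\n",
    "SN0002": "DATE,TESTSTEP,READING,LIMIT,RESULT\n"
              "02/01/2019,1,2.9,5.0,PASS\n",
}

frames = []
for serial, text in logs.items():
    frame = pd.read_csv(io.StringIO(text))
    frame["SERIAL"] = serial  # tag every row with its device
    frames.append(frame)

# One long table: one row per test step per device
all_logs = pd.concat(frames, ignore_index=True)
```

From here, a per-device view such as `all_logs.pivot(index="SERIAL", columns="TESTSTEP", values="READING")` can be built if each step appears once per device.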