Highest Voted 'sklearn-pandas' Questions

2

votes

1 answer

Accessing attributes in sklearn pipeline

I'm having trouble accessing attributes of intermediate steps in my sklearn pipeline. Here's my code: from sklearn.pipeline import make_pipeline, make_union from sklearn.compose import make_column_transformer from sklearn.impute import…

python scikit-learn sklearn-pandas

asked Dec 26 '19 at 05:55

mrgoldtech

73
1
4

2

votes

0 answers

Multinomial naive bayes ValueError: shapes not aligned, only when using chi2 test

I'm trying to make a positive/negative review classifier and wanted to use multinomial naive Bayes (or regular naive Bayes). If I don't do feature selection with SelectKBest and chi2, it works fine. But if I do, I get the following error: Traceback (most recent call…

python machine-learning scikit-learn sklearn-pandas

asked Dec 18 '19 at 00:53

user12195705

147
2
10

2

votes

1 answer

Error Making prediction with python onnxruntime

I have created a very basic decision tree using the sklearn library. This tree is trained on 4 features: feat1 INT, feat2 INT, feat3 FLOAT, feat4 FLOAT. The label/target feature is a boolean value (0 or 1). I converted the tree into an ONNX…

python scikit-learn sklearn-pandas onnx onnxruntime

asked Nov 26 '19 at 09:08

user7432713

197
3
17

2

votes

1 answer

How to choose data columns and target columns in a dataframe for test_train_split?

I'm trying to set up a test_train_split with data I have read from a csv into a pandas dataframe. The book I am reading says I should separate into x_train as the data and y_train as the target, but how can I define which column is the target and…

python machine-learning scikit-learn sklearn-pandas

asked Nov 04 '19 at 16:46

James

395
2
8
16

2

votes

2 answers

Get prediction confidence through Decision Tree Regression in sklearn

Is there a way I can attach some sort of confidence with my predictions from Decision Tree Regression output in python? from sklearn.tree import DecisionTreeRegressor dt = DecisionTreeRegressor(random_state=0, criterion="mae") dt_fit =…

scikit-learn regression decision-tree sklearn-pandas confidence-interval

asked Oct 28 '19 at 20:54

ayadav

75
8

2

votes

1 answer

"A column-vector y was passed when a 1d array was expected" error message

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis clf = LinearDiscriminantAnalysis() clf.fit(np.matrix(X_train), np.matrix(y_train)) but I get the error message specified above. I checked the shape of y_train but it's…

sklearn-pandas

asked Oct 10 '19 at 00:23

financial_physician

1,672
1
14
34

2

votes

2 answers

Pandas: get the cumulative sum of a column only if the timestamp is greater than that of another column

For each customer, I would like to get the cumulative sum of a column (Dollar Value) only when Timestamp 1 is less than Timestamp 2. I could do a cartesian join of the values based on Customer or iterate through the dataframe, but wanted to see if…

python pandas sklearn-pandas

asked Sep 29 '19 at 01:45

minnymate

55
6

2

votes

2 answers

Optimize K-Nearest Neighbors Algorithm on 50 variables x 100k row dataset

I want to optimize a piece of code that calculates the nearest neighbour for every item in a given dataset with 100k rows. The dataset contains 50 variable columns, which help to describe each row item, and most of the cells contain a…

python scikit-learn knn sklearn-pandas euclidean-distance

asked Sep 24 '19 at 14:08

d_-

1,391
2
19
37

2

votes

2 answers

Pandas Dataframe apply custom function to certain rows with NULL columns

I have a Dataframe that looks like: ------------------------------ |Date | Deal | Country | ------------------------------ |2019-01-02 | ABC | US | ------------------------------ |2019-02-01 | ABC | US …

python pandas pandas-groupby sklearn-pandas django-pandas

asked Sep 16 '19 at 17:38

CodeSsscala

729
3
11
23

2

votes

4 answers

What does X in imputer = imputer.fit(X[:,1:3]) stand for, and what does imputer.fit(X[:,1:3]) mean?

I'm working on preprocessing a data set, and I get an error because of the line imputer = imputer.fit(X[:,1:3]), which I don't understand. I understand that imputer = Imputer(missing_values = "NaN", strategy = "mean") means replace missing values with the mean value…

python-3.x pandas data-science sklearn-pandas

asked Sep 12 '19 at 01:39

Dulangi_Kanchana

1,135
10
21

2

votes

1 answer

NameError : name 'metrics' is not defined

It gives an error when calculating the accuracy metrics. I imported the library to calculate accuracy metrics, but it is still giving me an error that the name metrics is not defined: from sklearn.feature_extraction.text import TfidfVectorizer tf_idf_vect =…

python-3.x sklearn-pandas

asked Aug 27 '19 at 09:07

Hafsa Naveed

33
1
1
4

2

votes

1 answer

Split list into columns in pandas

I have a dataframe like this df = (pd.DataFrame({'ID': ['ID1', 'ID2', 'ID3'], 'Values': [['AB', 'BC'], np.NaN, ['AB', 'CD']]})) df ID Values 0 ID1 [AB, BC] 1 ID2 NaN 2 ID3 [AB, CD] I want to split the item inside…

python pandas dataframe sklearn-pandas

asked Jul 17 '19 at 11:06

Hardik Gupta

4,700
9
41
83

2

votes

1 answer

How to fix Value Error with train_test_split in Python Numpy

I am using sklearn with a numpy array. I have 2 arrays (x, y) and they should be: test_size=0.2 train_size=0.8 This is my current code: def predict(): sample_data = pd.read_csv("includes\\csv.csv") x = np.array(sample_data["day"]) y =…

python pandas numpy sklearn-pandas

asked May 31 '19 at 14:27

python_beginner

105
2
4
12

2

votes

2 answers

After choosing K components in PCA, how do we find out which components (names of the columns) the algorithm has selected?

I am new to data science and I need some help understanding PCA. I know that each of the columns constitutes one axis, but when PCA is done and the components are reduced to some value k, how do I know which columns got selected?

python-3.x k-means pca sklearn-pandas

asked May 26 '19 at 20:25

Ravi Biradar

61
3
7

2

votes

1 answer

How to load this kind of data in pandas

Background: I have logs which are generated during the testing of the devices after manufacture. Each device has a serial number and a corresponding csv log file with all the data. Something like this. DATE,TESTSTEP,READING,LIMIT,RESULT 01/01/2019…

python pandas scikit-learn sklearn-pandas

asked Apr 30 '19 at 06:18

NotAgain

1,927
3
26
42
