Highest Voted 'sklearn-pandas' Questions

3

votes

1 answer

AttributeError: 'Series' object has no attribute 'to_coo'

I am trying to use a Naive Bayes classifier from the sklearn module to classify whether movie reviews are positive. I am using a bag of words as the features for each review and a large dataset with sentiment scores attached to reviews. df_bows =…

asked Jul 21 '20 at 19:00

Luke Turvey

31
3

3

votes

1 answer

Train Test Split sklearn based on group variable

My X is as follows: EDIT1: Unique ID. Exp start date. Value. Status. 001 01/01/2020. 4000. Closed 001 12/01/2019 4000. Archived 002 01/01/2020. 5000. Closed 002 12/01/2019 …

python scikit-learn sklearn-pandas train-test-split

asked May 15 '20 at 19:08

Zee

81
1
8
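
One possible approach to the group-based split asked about above is to split the unique IDs rather than the rows, so that all rows sharing an ID land on the same side. The sketch below uses only the standard library; the rows, the `split_by_group` helper, and the 50% test fraction are illustrative assumptions, not taken from the question. scikit-learn's `GroupShuffleSplit` provides a library version of the same idea.

```python
import random

def split_by_group(rows, key, test_fraction=0.5, seed=0):
    """Split rows so that all rows sharing a group key end up on the same side."""
    groups = sorted({key(r) for r in rows})
    rng = random.Random(seed)
    rng.shuffle(groups)  # shuffle the groups, not the individual rows
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [r for r in rows if key(r) not in test_groups]
    test = [r for r in rows if key(r) in test_groups]
    return train, test

# Hypothetical rows: (group_id, value). Each group_id appears more than once.
rows = [("A", 1), ("A", 2), ("B", 3), ("B", 4), ("C", 5), ("C", 6), ("D", 7), ("D", 8)]
train, test = split_by_group(rows, key=lambda r: r[0])

train_ids = {r[0] for r in train}
test_ids = {r[0] for r in test}
print("train groups:", sorted(train_ids))
print("test groups:", sorted(test_ids))
```

Because the shuffle is applied to group IDs, no ID can appear in both the training and the test set.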

3

votes

2 answers

Generate Centroids of KMeans in Ascending Order

I am trying to use the k-means algorithm in Python with the sklearn library. Is there any way to generate the centroids in ascending order? For example, here is my code: kmeanDataFrame = pd.DataFrame({'x':X,'y':Y}) kmean =…

python algorithm scikit-learn k-means sklearn-pandas

asked Mar 01 '20 at 12:03

Imran Ahmad Shahid

793
8
29
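
K-means does not promise any particular ordering of its centroids, so one way to get them in ascending order is to sort them after fitting and remap the cluster labels to match. This is a minimal sketch under that assumption; `sort_centroids` and the sample values are hypothetical. With scikit-learn, the inputs would come from the `cluster_centers_` and `labels_` attributes of a fitted `KMeans`.

```python
def sort_centroids(centers, labels):
    """Sort centroids in ascending order and renumber labels consistently."""
    # order[new_index] = old_index, sorted by centroid coordinates
    order = sorted(range(len(centers)), key=lambda i: centers[i])
    remap = {old: new for new, old in enumerate(order)}
    sorted_centers = [centers[i] for i in order]
    new_labels = [remap[label] for label in labels]
    return sorted_centers, new_labels

# Hypothetical output of a clustering run: three 2-D centroids and five labels.
centers = [(5.0, 1.0), (1.0, 2.0), (3.0, 0.5)]
labels = [0, 1, 2, 1, 0]

sorted_centers, new_labels = sort_centroids(centers, labels)
print(sorted_centers)
print(new_labels)
```

Sorting here compares the coordinate tuples left to right, so the first coordinate dominates; a different key (for example, distance from the origin) could be swapped in if that ordering fits better.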

3

votes

0 answers

Pyinstaller: Permission Error when using --hiddenimports flag

I am trying to create a single file executable using pyinstaller. I used cython to convert my source code, source_code.pyx, into a DLL called source_code.cp37-win_amd64.pyd written in python bytecode. I then wrote a python script,…

python numpy pyinstaller sklearn-pandas

asked Feb 27 '20 at 16:20

abasi

31
3

3

votes

1 answer

Temporal Disaggregation of Time Series in Python

I am trying to find a package that enables temporal disaggregation of timeseries. There is a package in R called tempdisagg. https://journal.r-project.org/archive/2013/RJ-2013-028/RJ-2013-028.pdf Is there any similar package in python anyone is…

r python-3.x time-series rpy2 sklearn-pandas

asked Feb 04 '20 at 13:05

rsc05

3,626
2
36
57

3

votes

1 answer

Is it possible to change pandas column data type within a sklearn pipeline?

The sklearn pipeline I am using has multiple transformers, but one of the initial transformers returns a numerical type and the next one takes object-type variables. Basically, I need to squeeze in a: data[col] = data[col].astype(object) for the…

machine-learning scikit-learn sklearn-pandas

asked Dec 25 '19 at 07:30

k92

375
3
15

3

votes

1 answer

How do I Label Encode using the Pipeline API?

I want to incorporate Label Encoding through the scikit learn pipeline. Unfortunately, LabelEncoder() is broken with the pipeline API so that's not an option right now. I tried creating my own class which calls .map() to map categories to…

python pandas scikit-learn pipeline sklearn-pandas

asked Dec 20 '19 at 09:58

mrgoldtech

73
1
4

3

votes

1 answer

ModuleNotFoundError: No module named 'sklearn.neighbours'

I am a beginner in machine learning and could not find what is going wrong with this module error... from sklearn.neighbours import KNeighborsClassifier ModuleNotFoundError: No module named 'sklearn.neighbours'

python django scikit-learn module sklearn-pandas

asked Dec 19 '19 at 05:48

Sandeep Agrawal

175
1
8
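
The error in the question above comes down to spelling: scikit-learn names the subpackage `sklearn.neighbors` (American spelling), so `sklearn.neighbours` cannot be found. The sketch below checks both spellings without assuming scikit-learn is installed; `module_available` is a hypothetical helper written for this illustration.

```python
import importlib.util

def module_available(name):
    """Return True if `name` can be located as an importable module."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when the parent package (here, sklearn) is not installed.
        return False

for name in ("sklearn.neighbours", "sklearn.neighbors"):
    print(name, "->", module_available(name))

# With scikit-learn installed, the working import is:
# from sklearn.neighbors import KNeighborsClassifier
```

The misspelled name reports `False` whether or not scikit-learn is present, which matches the `ModuleNotFoundError` in the question.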

3

votes

1 answer

Jupyter Notebook PySpark OSError [WinError 123] The filename, directory name, or volume label syntax is incorrect:

System Configuration: Operating System: Windows 10 Python Version: 3.7 Spark Version: 2.4.4 SPARK_HOME: C:\spark\spark-2.4.4-bin-hadoop2.7 Problem I am using PySpark to do parallel computations on all the columns of a row in a dataframe. I convert…

python pyspark anaconda rdd sklearn-pandas

asked Nov 28 '19 at 12:07

Mahima

132
9

3

votes

1 answer

How to find optimal parameters for DBSCAN?

Is there any tool which calculates the optimal values of minpts and eps for the DBSCAN algorithm? Currently I use the sklearn library to apply the DBSCAN algorithm: from sklearn.cluster import DBSCAN I tried the algorithm with several minpts and eps values but without any…

python parameters sklearn-pandas dbscan

asked Nov 21 '19 at 20:50

Sascha

687
1
8
22
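
A commonly used heuristic for choosing eps, not a guarantee of optimality, is to compute each point's distance to its k-th nearest neighbour (with k tied to the chosen minpts), sort those distances, and look for the point where they rise sharply. The sketch below computes that sorted k-distance list with the standard library only; `k_distances` and the sample points are hypothetical, and it needs Python 3.8+ for `math.dist`.

```python
import math

def k_distances(points, k):
    """Return the sorted distances from each point to its k-th nearest neighbour."""
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

# Hypothetical data: four points on a unit square plus one distant outlier.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
kd = k_distances(points, k=2)
print(kd)
```

In this toy data the first four values are 1.0 and the last is far larger, so an eps slightly above 1.0 would keep the square together and leave the outlier as noise. The brute-force loop is quadratic in the number of points and is meant only for illustration.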

3

votes

2 answers

Imputing missing values using sklearn IterativeImputer class for MICE

I'm trying to learn how to implement MICE in imputing missing values for my datasets. I've heard about fancyimpute's MICE, but I also read that sklearn's IterativeImputer class can accomplish similar results. From sklearn's docs: Our implementation…

python dataframe missing-data sklearn-pandas

asked Oct 29 '19 at 18:03

Glenn G.

419
3
7
18

3

votes

3 answers

Transform users (repeated over multiple rows) and items in a dataframe into a label binarized dataframe

I have a DataFrame that looks like this df = pd.DataFrame([ ['a', 1], ['b', 1], ['c', 1], ['a', 2], ['c', 3], ['b', 4], ['c', 4] ], columns=['item', 'user']) Where each user is repeated across multiple rows (with…

python pandas scikit-learn sklearn-pandas

asked Oct 16 '19 at 20:36

emehex

9,874
10
54
100

3

votes

1 answer

How to get predicted values along with test data, and visualize actual vs predicted?

from sklearn import datasets import numpy as np import pandas as pd from sklearn.model_selection import train_test_split from sklearn.linear_model import Perceptron data = pd.read_csv('student_selection.csv') x =…

python pandas numpy scikit-learn sklearn-pandas

asked Sep 17 '19 at 16:52

Output Scream

27
1
4

3

votes

0 answers

Input contains infinity or a value too large for dtype('float64')

I've seen many similar questions here, but none of the answers solved my problem. I am trying to apply a Power Transform to my dataset, but I still get this error. The dataset does not contain inf or nan values, and I make sure that they are…

pandas scikit-learn sklearn-pandas

asked Jun 16 '19 at 18:25

Gabriela Pontes

31
4

3

votes

3 answers

get_feature_names not found in countvectorizer()

I'm mining the Stack Overflow data dump of posts about deep learning libraries. I'd like to identify stop words in my corpus (like 'python' for instance). I want to get my feature names so I can identify the words with highest term frequencies. I…

python pandas sklearn-pandas countvectorizer

asked Apr 04 '19 at 19:21

maddie

1,854
4
30
66
