Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames
Questions tagged [sklearn-pandas]
1336 questions
3
votes
1 answer
AttributeError: 'Series' object has no attribute 'to_coo'
I am trying to use a Naive Bayes classifier from the sklearn module to classify whether movie reviews are positive. I am using a bag of words as the features for each review and a large dataset with sentiment scores attached to reviews.
df_bows =…

Luke Turvey
- 31
- 3
3
votes
1 answer
Train Test Split sklearn based on group variable
My X is as follows:
EDIT1:
Unique ID. Exp start date. Value. Status.
001 01/01/2020. 4000. Closed
001 12/01/2019 4000. Archived
002 01/01/2020. 5000. Closed
002 12/01/2019 …

Zee
- 81
- 1
- 8
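One common way to approach the group-based split asked about above is scikit-learn's `GroupShuffleSplit`, which keeps all rows that share a group key on the same side of the split. This is a minimal sketch, not the asker's code: the frame below is made-up toy data, and the column names `unique_id`, `value`, and `status` are assumptions for illustration.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Made-up toy data: two rows per ID (column names are hypothetical).
df = pd.DataFrame({
    "unique_id": ["A", "A", "B", "B", "C", "C"],
    "value": [10, 11, 20, 21, 30, 31],
    "status": ["Closed", "Archived", "Closed", "Archived", "Closed", "Archived"],
})

# An integer test_size is the number of *groups* held out, not rows.
splitter = GroupShuffleSplit(n_splits=1, test_size=1, random_state=0)
train_idx, test_idx = next(splitter.split(df, groups=df["unique_id"]))

train_df = df.iloc[train_idx]
test_df = df.iloc[test_idx]

# No ID appears on both sides of the split.
shared_ids = set(train_df["unique_id"]) & set(test_df["unique_id"])
print(shared_ids)
```

`GroupKFold` follows the same idea when cross-validation folds are needed instead of a single split.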
3
votes
2 answers
Generate Centroids of Kmeans in Ascending Order
I am trying to use the K-means algorithm in Python with the Sklearn library. Is there any way to generate the centroids in ascending order?
For example, here is my code:
kmeanDataFrame = pd.DataFrame({'x':X,'y':Y})
kmean =…

Imran Ahmad Shahid
- 793
- 8
- 29
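For the centroid-ordering question above, K-means itself returns centroids in an arbitrary order, so one option is to sort them after fitting. The sketch below assumes "ascending" means sorted by the first coordinate (x); the synthetic points and variable names are illustrative and not taken from the question.

```python
import numpy as np
from sklearn.cluster import KMeans

# Three synthetic blobs, deliberately stacked out of order (illustrative data).
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=(10.0, 10.0), scale=0.3, size=(20, 2)),
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(20, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.3, size=(20, 2)),
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(points)

# Indices that sort the centroids by their x coordinate.
order = np.argsort(kmeans.cluster_centers_[:, 0])
sorted_centers = kmeans.cluster_centers_[order]

# Remap labels so that label 0 refers to the centroid with the smallest x.
rank = np.empty_like(order)
rank[order] = np.arange(len(order))
sorted_labels = rank[kmeans.labels_]
```

Sorting after the fit leaves the clustering unchanged; only the numbering of the clusters differs.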
3
votes
0 answers
Pyinstaller: Permission Error when using --hiddenimports flag
I am trying to create a single file executable using pyinstaller. I used cython to convert my source code, source_code.pyx, into a DLL called source_code.cp37-win_amd64.pyd written in python bytecode. I then wrote a python script,…

abasi
- 31
- 3
3
votes
1 answer
Temporal Disaggregation of Time Series in Python
I am trying to find a package that enables temporal disaggregation of timeseries. There is a package in R called tempdisagg.
https://journal.r-project.org/archive/2013/RJ-2013-028/RJ-2013-028.pdf
Is there any similar package in python anyone is…

rsc05
- 3,626
- 2
- 36
- 57
3
votes
1 answer
Is it possible to change pandas column data type within a sklearn pipeline?
The sklearn pipeline I am using has multiple transformers, but one of the initial transformers returns a numerical type and the next one takes object-type variables.
Basically I need to squeeze in a:
data[col] = data[col].astype(object)
for the…

k92
- 375
- 3
- 15
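One possible way to put a dtype cast like the one in the question above inside a pipeline is to wrap it in a `FunctionTransformer`. This is a sketch under assumptions: the helper `to_object`, the step name, and the one-column frame are hypothetical, and a real pipeline would have the other transformers before and after this step.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def to_object(frame):
    """Cast every column of the incoming DataFrame to object dtype."""
    return frame.astype(object)


# In a real pipeline this step would sit between the numeric transformer
# and the transformer that expects object-typed columns.
pipeline = Pipeline([
    ("cast_to_object", FunctionTransformer(to_object)),
])

data = pd.DataFrame({"col": [1, 2, 3]})
result = pipeline.fit_transform(data)
print(result.dtypes)
```

Note that this only works when the preceding step returns a DataFrame; many scikit-learn transformers return NumPy arrays by default.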
3
votes
1 answer
How do I Label Encode using the Pipeline API?
I want to incorporate Label Encoding through the scikit learn pipeline. Unfortunately, LabelEncoder() is broken with the pipeline API so that's not an option right now. I tried creating my own class which calls .map() to map categories to…

mrgoldtech
- 73
- 1
- 4
3
votes
1 answer
ModuleNotFoundError: No module named 'sklearn.neighbours'
I am a beginner in machine learning and can't find what's going wrong with this module error...
from sklearn.neighbours import KNeighborsClassifier
ModuleNotFoundError: No module named 'sklearn.neighbours'

Sandeep Agrawal
- 175
- 1
- 8
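The error in the question above comes from the module name: scikit-learn uses the American spelling, `sklearn.neighbors`. A minimal sketch with the corrected import follows; the tiny dataset is made up for illustration.

```python
# Note the spelling: "neighbors", not "neighbours".
from sklearn.neighbors import KNeighborsClassifier

# Tiny made-up dataset: two well-separated classes on a line.
X = [[0.0], [1.0], [10.0], [11.0]]
y = [0, 0, 1, 1]

clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
prediction = clf.predict([[0.5], [10.5]])
print(prediction)
```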
3
votes
1 answer
Jupyter Notebook PySpark OSError [WinError 123] The filename, directory name, or volume label syntax is incorrect:
System Configuration:
Operating System: Windows 10
Python Version: 3.7
Spark Version: 2.4.4
SPARK_HOME: C:\spark\spark-2.4.4-bin-hadoop2.7
Problem
I am using PySpark to do parallel computations on all the columns of a row in a dataframe. I convert…

Mahima
- 132
- 9
3
votes
1 answer
How to find optimal parameters for DBSCAN?
Is there any tool which calculates the optimal values of minpts and eps for the DBSCAN algorithm?
Currently I use the sklearn library to apply the DBSCAN algorithm
from sklearn.cluster import DBSCAN
I tried the algorithm with several minpts and eps but without any…

Sascha
- 687
- 1
- 8
- 22
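For the DBSCAN question above, a widely used heuristic (not an automatic optimizer) is the k-distance curve: compute each point's distance to its k-th nearest neighbour, sort the distances, and look for the "knee" as a candidate `eps`. The sketch below assumes `min_samples = 4` and synthetic data; both are illustrative choices, and reading off the knee is left to a plot or to the reader's judgment.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Two synthetic blobs (illustrative data only).
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=0.0, scale=0.2, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.2, size=(50, 2)),
])

min_samples = 4  # assumed value; DBSCAN's min_samples counts the point itself

# Querying the training points returns each point as its own first
# neighbour (distance 0), so the last column is the distance to the
# (min_samples - 1)-th other point.
neighbors = NearestNeighbors(n_neighbors=min_samples).fit(points)
distances, _ = neighbors.kneighbors(points)

# Sorted k-distances; plotting this curve and locating its knee gives
# a candidate eps for DBSCAN(eps=..., min_samples=min_samples).
k_distances = np.sort(distances[:, -1])
```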
3
votes
2 answers
Imputing missing values using sklearn IterativeImputer class for MICE
I'm trying to learn how to implement MICE in imputing missing values for my datasets. I've heard about fancyimpute's MICE, but I also read that sklearn's IterativeImputer class can accomplish similar results. From sklearn's docs:
Our implementation…

Glenn G.
- 419
- 3
- 7
- 18
3
votes
3 answers
Transform users (repeated over multiple rows) and items in a dataframe into a label binarized dataframe
I have a DataFrame that looks like this
df = pd.DataFrame([
['a', 1],
['b', 1],
['c', 1],
['a', 2],
['c', 3],
['b', 4],
['c', 4]
], columns=['item', 'user'])
Where each user is repeated across multiple rows (with…

emehex
- 9,874
- 10
- 54
- 100
3
votes
1 answer
How to get predicted values along with test data, and visualize actual vs predicted?
from sklearn import datasets
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Perceptron
data = pd.read_csv('student_selection.csv')
x =…

Output Scream
- 27
- 1
- 4
3
votes
0 answers
Input contains infinity or a value too large for dtype('float64')
I've seen many similar questions here, but none of the answers solved my problem.
I am trying to apply a Power Transform to my dataset, but I still get this error.
The dataset does not contain inf or nan values, and I make sure that they are…

Gabriela Pontes
- 31
- 4
3
votes
3 answers
get_feature_names not found in countvectorizer()
I'm mining the Stack Overflow data dump of posts about deep learning libraries. I'd like to identify stop words in my corpus (like 'python' for instance). I want to get my feature names so I can identify the words with highest term frequencies.
I…

maddie
- 1,854
- 4
- 30
- 66