Questions tagged [sklearn-pandas]

A Python module that provides a bridge between scikit-learn's machine learning methods and pandas-style DataFrames.

1336 questions
3 votes, 1 answer

AttributeError: 'Series' object has no attribute 'to_coo'

I am trying to use a Naive Bayes classifier from the sklearn module to classify whether movie reviews are positive. I am using a bag of words as the features for each review and a large dataset with sentiment scores attached to reviews. df_bows =…
3 votes, 1 answer

Train Test Split sklearn based on group variable

My X is as follows (EDIT1):
Unique ID   Exp start date   Value   Status
001         01/01/2020       4000    Closed
001         12/01/2019       4000    Archived
002         01/01/2020       5000    Closed
002         12/01/2019       …
Zee
  • 81
  • 1
  • 8
3 votes, 2 answers

Generate Centroids of KMeans in Ascending Order

I am trying to use the K-means algorithm in Python with the sklearn library. Is there any way to generate the centroids in ascending order? For example, here is my code: kmeanDataFrame = pd.DataFrame({'x':X,'y':Y}) kmean =…
3 votes, 0 answers

Pyinstaller: Permission Error when using --hiddenimports flag

I am trying to create a single-file executable using PyInstaller. I used Cython to convert my source code, source_code.pyx, into a DLL called source_code.cp37-win_amd64.pyd written in python bytecode. I then wrote a Python script,…
abasi
  • 31
  • 3
3 votes, 1 answer

Temporal Disaggregation of Time Series in Python

I am trying to find a package that enables temporal disaggregation of time series. There is a package in R called tempdisagg: https://journal.r-project.org/archive/2013/RJ-2013-028/RJ-2013-028.pdf Is there any similar package in Python anyone is…
rsc05
  • 3,626
  • 2
  • 36
  • 57
3 votes, 1 answer

Is it possible to change pandas column data type within a sklearn pipeline?

The sklearn pipeline I am using has multiple transformers, but one of the initial transformers returns a numerical type and the next one takes object-type variables. Basically I need to squeeze in a: data[col] = data[col].astype(object) for the…
k92
  • 375
  • 3
  • 15
3 votes, 1 answer

How do I Label Encode using the Pipeline API?

I want to incorporate label encoding through the scikit-learn pipeline. Unfortunately, LabelEncoder() is broken with the pipeline API, so that's not an option right now. I tried creating my own class which calls .map() to map categories to…
mrgoldtech
  • 73
  • 1
  • 4
3 votes, 1 answer

ModuleNotFoundError: No module named 'sklearn.neighbours'

I am a beginner in machine learning and could not find what's going wrong with this module error... from sklearn.neighbours import KNeighborsClassifier ModuleNotFoundError: No module named 'sklearn.neighbours'
Sandeep Agrawal
  • 175
  • 1
  • 8
3 votes, 1 answer

Jupyter Notebook PySpark OSError [WinError 123] The filename, directory name, or volume label syntax is incorrect:

System Configuration:
Operating System: Windows 10
Python Version: 3.7
Spark Version: 2.4.4
SPARK_HOME: C:\spark\spark-2.4.4-bin-hadoop2.7
Problem: I am using PySpark to do parallel computations on all the columns of a row in a dataframe. I convert…
Mahima
  • 132
  • 9
3 votes, 1 answer

How to find optimal parameters for DBSCAN?

Is there any tool which calculates the optimal values of minpts and eps for the DBSCAN algorithm? Currently I use the sklearn library to apply DBSCAN: from sklearn.cluster import DBSCAN I tried the algorithm with several minpts and eps values but without any…
Sascha
  • 687
  • 1
  • 8
  • 22
3 votes, 2 answers

Imputing missing values using sklearn IterativeImputer class for MICE

I'm trying to learn how to implement MICE in imputing missing values for my datasets. I've heard about fancyimpute's MICE, but I also read that sklearn's IterativeImputer class can accomplish similar results. From sklearn's docs: Our implementation…
Glenn G.
  • 419
  • 3
  • 7
  • 18
3 votes, 3 answers

Transform users (repeated over multiple rows) and items in a dataframe into a label binarized dataframe

I have a DataFrame that looks like this: df = pd.DataFrame([ ['a', 1], ['b', 1], ['c', 1], ['a', 2], ['c', 3], ['b', 4], ['c', 4] ], columns=['item', 'user']) where each user is repeated across multiple rows (with…
emehex
  • 9,874
  • 10
  • 54
  • 100
3 votes, 1 answer

How to get predicted values along with test data, and visualize actual vs predicted?

from sklearn import datasets
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Perceptron
data = pd.read_csv('student_selection.csv')
x =…
Output Scream
  • 27
  • 1
  • 4
3 votes, 0 answers

Input contains infinity or a value too large for dtype('float64')

I've seen many similar questions here, but none of the answers solved my problem. I am trying to apply a power transform to my dataset, but I still get this error. The dataset does not contain inf or nan values, and I make sure that they are…
3 votes, 3 answers

get_feature_names not found in countvectorizer()

I'm mining the Stack Overflow data dump of posts about deep learning libraries. I'd like to identify stop words in my corpus (like 'python' for instance). I want to get my feature names so I can identify the words with highest term frequencies. I…
maddie
  • 1,854
  • 4
  • 30
  • 66