Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames
Questions tagged [sklearn-pandas]
1336 questions
3
votes
2 answers
sparse matrix length is ambiguous
I'm very new to machine learning so this question might sound stupid.
I'm following a tutorial on Text Classification but I'm facing an error that I have no idea how to solve.
This is the code I have (it is basically what is found in…

fatmau
- 87
- 1
- 8
3
votes
0 answers
TypeError: can only perform ops with scalar values
I would appreciate it if you could let me know how to plot some informative charts for the table provided here.
For example, I need a bar chart for the column named "Domestic unlisted companies:Use of IFRSs by unlisted companies" which shows in how…

ebrahimi
- 912
- 2
- 13
- 32
3
votes
1 answer
Pandas JSON_Normalize only specific columns
I have a nested JSON structure which I need to flatten. Using JSON normalize flattens all the keys, but I want to flatten only specific keys while keeping the other keys nested. How can I achieve this with JSON normalize? The detailed description of…

Bhavani Ravi
- 2,130
- 3
- 18
- 41
3
votes
1 answer
ValueError: Number of features of the model must match the input (sklearn)
I am trying to run a classifier on some movie review data. The data had already been separated into reviews_train.txt and reviews_test.txt. I then loaded the data in and separated each into review and label (either positive (0) or negative (1)) and…

C.G
- 87
- 1
- 7
3
votes
1 answer
Extract DataFrame from a list of indices of another DataFrame
I've a DataFrame "A" and a list of indices "I". I want to generate/get a DataFrame "B" which contains only the data in those indices "I" of the original DataFrame "A". How can I achieve this?
Assuming I = [1, 3] , I tried this A.filter(items=I,…

Temp O'rary
- 5,366
- 13
- 49
- 109
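For the row-selection question above, a minimal sketch of one common approach, assuming `I` holds index labels of `A`. The sample data and column names are made up for illustration, since the question does not show its DataFrame. `DataFrame.loc` selects by label and `DataFrame.iloc` by integer position; which one applies depends on how `I` was built.

```python
import pandas as pd

# Hypothetical data; the original question does not show its DataFrame.
A = pd.DataFrame({"x": [10, 20, 30, 40], "y": list("abcd")})
I = [1, 3]

# Label-based selection: rows whose index label appears in I.
B = A.loc[I]

# Position-based selection: rows at the integer positions listed in I.
B_pos = A.iloc[I]
```

With the default integer index used here, both selections return the same rows; they differ once the index has been reordered or replaced with other labels.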
3
votes
3 answers
Sklearn_pandas in a pipeline returns TypeError: 'builtin_function_or_method' object is not iterable
I have a data set with categorical and numerical features on which I want to apply some transformations followed by XGBClassifier.
Link to data set: https://www.kaggle.com/blastchar/telco-customer-churn
As the transformations are different for the…

Bert Carremans
- 1,623
- 4
- 23
- 47
3
votes
1 answer
In sklearn.preprocessing module I get ValueError: Found array with 0 feature(s)
I saw a bunch of questions have this error but I could not understand the relation with my code or problem.
I am trying to fix the NaN values in the data which I got from a sample CSV file that I found on the internet. My code is very simple…

teoman
- 959
- 2
- 10
- 21
3
votes
2 answers
Extract rule path of data point through decision tree with sklearn python
I'm using a decision tree model and I want to extract the decision path for each data point in order to understand what caused the Y rather than to predict it.
How can I do that? I couldn't find any documentation.

Adi Cohen
- 31
- 1
- 3
3
votes
2 answers
Scikit learn GaussianProcessClassifier memory error when using fit() function
I have X_train and y_train as 2 numpy.ndarrays of size (32561, 108) and (32561,) respectively.
I am receiving a memory error every time I call fit for my GaussianProcessClassifier.
>>> import pandas as pd
>>> import numpy as np
>>> from…

yalpsid eman
- 3,064
- 6
- 45
- 71
3
votes
2 answers
decision tree repeating class names
I have a very simple sample of data/labels. The problem I'm having is that the generated decision tree (PDF) repeats the class name:
from sklearn import tree
from sklearn.externals.six import StringIO
import pydotplus
features_names =…

Hula Hula
- 553
- 8
- 20
3
votes
1 answer
Calculating accuracy scores of predicted continuous values
from sklearn.metrics import accuracy_score
accuracy_score(y_true, y_pred)
I believe this code will return the accuracy of our predictions. However, I am comparing predicted and actual continuous values, and I believe that most of them are…

Aditya
- 89
- 1
- 1
- 6
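On the continuous-values question above: `accuracy_score` is a classification metric based on exact matches, so it is generally not suited to continuous targets, and error-based measures are the more usual choice. A minimal sketch in plain Python with made-up sample values; the helper names are illustrative, and scikit-learn's `sklearn.metrics` module offers `mean_absolute_error` and `mean_squared_error` for the same purpose.

```python
import math

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

# Hypothetical values, not taken from the question.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)
```

Both measures are in the units of the target, so smaller is better and zero means every prediction matched exactly.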
3
votes
0 answers
Runtime crashes on Google Colab
Why does the runtime keep crashing on Google Colab?
I have a simple MLP code that runs on my machine. I tried running the same code on Colab but it crashes immediately after loading the data files.
The data files are around 3GB total. The CPU and…

user3828311
- 907
- 4
- 11
- 20
3
votes
3 answers
How to make polynomial features using sparse matrix in Scikit-learn
I am using Scikit-learn to convert my training data to polynomial features and then fit a linear model to it.
model = Pipeline([('poly', PolynomialFeatures(degree=3)),
('linear', LinearRegression(fit_intercept=False))])
model.fit(X,…

Niyamat Ullah
- 2,384
- 1
- 16
- 26
3
votes
3 answers
Expected 2D array, got 1D array instead, Reshape Data
I'm really stuck on this problem. I'm trying to use OneHotEncoder to encode my data into a matrix after using LabelEncoder, but I'm getting this error: Expected 2D array, got 1D array instead.
At the end of the error message (included below) it said to…

wolfbagel
- 468
- 2
- 11
- 21
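For the 2D-array error above, a minimal sketch assuming the label-encoded data is a 1D NumPy array (the values are made up). scikit-learn transformers such as `OneHotEncoder` expect input of shape `(n_samples, n_features)`, so a single column is usually reshaped with `reshape(-1, 1)` first.

```python
import numpy as np

# Hypothetical label-encoded values; the question's data is not shown.
labels = np.array([0, 2, 1, 2])

# -1 lets NumPy infer the number of rows; 1 means a single feature column.
column = labels.reshape(-1, 1)
```

For a single sample with several features, the corresponding call is `reshape(1, -1)`.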
3
votes
2 answers
AttributeError: module 'sklearn.datasets' has no attribute 'load_titanic'
I am trying to load the file titanic and I face the following problem. My code is:
from sklearn import datasets
titanic = datasets.load_titanic()
I get the following:
AttributeError: module 'sklearn.datasets' has no attribute 'load_titanic'
While…

mar
- 43
- 1
- 3