Questions tagged [sklearn-pandas]

Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames

1336 questions
3
votes
2 answers

sparse matrix length is ambiguous

I'm very new to machine learning, so this question might sound stupid. I'm following a tutorial on text classification, but I'm facing an error that I have no idea how to solve. This is the code I have (it is basically what is found in…
fatmau
3
votes
0 answers

TypeError: can only perform ops with scalar values

I would appreciate it if you could let me know how to plot some informative charts for the table provided here. For example, I need a bar chart for the column named "Domestic unlisted companies:Use of IFRSs by unlisted companies" which shows in how…
ebrahimi
3
votes
1 answer

Pandas JSON_Normalize only specific columns

I have a nested JSON structure which I need to flatten. When I use json_normalize, it flattens all the keys, but I want to flatten only specific keys while keeping the other keys nested. How can I achieve this with json_normalize? The detailed description of…
Bhavani Ravi
3
votes
1 answer

ValueError: Number of features of the model must match the input (sklearn)

I am trying to run a classifier on some movie review data. The data had already been separated into reviews_train.txt and reviews_test.txt. I then loaded the data in and separated each into review and label (either positive (0) or negative (1)) and…
3
votes
1 answer

Extract DataFrame from a list of indices of another DataFrame

I have a DataFrame "A" and a list of indices "I". I want to get a DataFrame "B" which contains only the data at those indices "I" of the original DataFrame "A". How can I achieve this? Assuming I = [1, 3], I tried this: A.filter(items=I,…
Temp O'rary
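
A minimal sketch of one way to approach the question above, under assumptions: if "I" holds index labels, `A.loc[I]` selects those rows; if it holds integer positions, `A.iloc[I]` does. The frame contents are made up for illustration; only the names A, I and B come from the question.

```python
import pandas as pd

# Hypothetical data; only the names A, I and B come from the question.
A = pd.DataFrame({"x": [10, 20, 30, 40], "y": ["a", "b", "c", "d"]})
I = [1, 3]

# Label-based selection: rows whose index labels appear in I.
B = A.loc[I]

# Position-based selection: rows at the integer positions listed in I.
B_pos = A.iloc[I]

print(B)
```

With the default integer index used here, both selections return the same rows. They differ once the index has been reordered or filtered, so the choice depends on what "indices" means in the data at hand.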
3
votes
3 answers

Sklearn_pandas in a pipeline returns TypeError: 'builtin_function_or_method' object is not iterable

I have a data set with categorical and numerical features on which I want to apply some transformations followed by XGBClassifier. Link to data set: https://www.kaggle.com/blastchar/telco-customer-churn As the transformations are different for the…
3
votes
1 answer

In sklearn.preprocessing module I get ValueError: Found array with 0 feature(s)

I saw a bunch of questions with this error, but I could not understand how they relate to my code or problem. I am trying to fix the NaN values in the data which I got from a sample CSV file that I found on the internet. My code is very simple…
teoman
3
votes
2 answers

Extract rule path of data point through decision tree with sklearn python

I'm using a decision tree model and I want to extract the decision path for each data point, in order to understand what caused the Y rather than to predict it. How can I do that? I couldn't find any documentation.
Adi Cohen
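
A possible sketch for the question above, assuming a fitted scikit-learn `DecisionTreeClassifier`: its `decision_path` method returns the nodes a sample passes through, and the `tree_.feature` and `tree_.threshold` arrays describe the test at each node. The helper name `rule_path`, the iris data and the tree settings are illustrative assumptions, not part of the question.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def rule_path(clf, x):
    """Return the (feature index, operator, threshold) tests one sample passes.

    Hypothetical helper, written for illustration only.
    """
    x = np.asarray(x, dtype=float).reshape(1, -1)
    node_ids = clf.decision_path(x).indices  # nodes visited by this sample
    leaf_id = clf.apply(x)[0]                # the leaf the sample ends in
    tree = clf.tree_
    rules = []
    for node in node_ids:
        if node == leaf_id:
            continue  # a leaf holds no test
        feat = int(tree.feature[node])
        thr = float(tree.threshold[node])
        op = "<=" if x[0, feat] <= thr else ">"
        rules.append((feat, op, thr))
    return rules


# Example data and model settings are assumptions for the demo.
iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(iris.data, iris.target)

rules = rule_path(clf, iris.data[0])
for feat, op, thr in rules:
    print(f"{iris.feature_names[feat]} {op} {thr:.2f}")
```

Each printed line is one test on the way from the root to the leaf, so the list reads as the rule that led to the prediction for that sample.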
3
votes
2 answers

Scikit learn GaussianProcessClassifier memory error when using fit() function

I have X_train and y_train as 2 numpy.ndarrays of size (32561, 108) and (32561,) respectively. I am receiving a memory error every time I call fit for my GaussianProcessClassifier. >>> import pandas as pd >>> import numpy as np >>> from…
yalpsid eman
3
votes
2 answers

decision tree repeating class names

I have a very simple sample of data/labels. The problem I'm having is that the decision tree generated (pdf) is repeating the class name: from sklearn import tree from sklearn.externals.six import StringIO import pydotplus features_names =…
Hula Hula
3
votes
1 answer

Calculating accuracy scores of predicted continuous values

from sklearn.metrics import accuracy_score accuracy_score(y_true, y_pred) I believe this code will return the accuracy of our predictions. However, I am comparing predicted and actual continuous values and I believe that most of them are…
Aditya
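
A hedged sketch related to the question above: `accuracy_score` counts exact matches, which suits class labels, so for continuous targets one common alternative is to report regression metrics. The sample values below are made up, and the choice of metrics is an assumption about what the asker needs.

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up continuous targets and predictions, for illustration only.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)  # average absolute error
mse = mean_squared_error(y_true, y_pred)   # average squared error
r2 = r2_score(y_true, y_pred)              # share of variance explained

print(f"MAE={mae:.3f}  MSE={mse:.3f}  R2={r2:.3f}")
# MAE=0.500  MSE=0.375  R2=0.949
```

Lower MAE and MSE mean closer predictions, while R2 approaches 1 as the fit improves.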
3
votes
0 answers

Runtime crashes on Google Colab

Why does the runtime keep crashing on Google Colab? I have a simple MLP code that runs on my machine. I tried running the same code on Colab but it crashes immediately after loading the data files. The data files are around 3GB total. The CPU and…
3
votes
3 answers

How to make polynomial features using sparse matrix in Scikit-learn

I am using Scikit-learn to convert my training data to polynomial features and then fit a linear model to it. model = Pipeline([('poly', PolynomialFeatures(degree=3)), ('linear', LinearRegression(fit_intercept=False))]) model.fit(X,…
3
votes
3 answers

Expected 2D array, got 1D array instead, Reshape Data

I'm really stuck on this problem. I'm trying to use OneHotEncoder to encode my data into a matrix after using LabelEncoder, but I'm getting this error: Expected 2D array, got 1D array instead. At the end of the error message (included below), it said to…
wolfbagel
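
A small sketch of the usual cause of the error in the question above, under assumptions: `LabelEncoder` returns a 1D array, while `OneHotEncoder` expects a 2D array of shape (n_samples, n_features), so reshaping with `reshape(-1, 1)` turns the codes into a single column. The colour values are hypothetical sample data.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical categorical column.
colors = np.array(["red", "green", "blue", "green"])

# LabelEncoder yields a 1D array of integer codes, shape (4,).
codes = LabelEncoder().fit_transform(colors)

# reshape(-1, 1) makes it one feature column, shape (4, 1),
# which is the 2D layout OneHotEncoder expects.
onehot = OneHotEncoder().fit_transform(codes.reshape(-1, 1)).toarray()

print(codes)
print(onehot.shape)
```

In recent scikit-learn versions `OneHotEncoder` also accepts string categories directly, so the `LabelEncoder` step may be unnecessary; the reshape to a 2D column is still required either way.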
3
votes
2 answers

AttributeError: module 'sklearn.datasets' has no attribute 'load_titanic'

I am trying to load the file titanic and I face the following problem. My code is: from sklearn import datasets titanic = datasets.load_titanic() I get the following: AttributeError: module 'sklearn.datasets' has no attribute 'load_titanic' While…
mar