Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames
Questions tagged [sklearn-pandas]
1336 questions
-1
votes
3 answers
Python Sklearn linear regression not callable
I am implementing simple linear regression and multiple linear regression using pandas and sklearn.
My code is as follows:
import pandas as pd
import numpy as np
import scipy.stats
from sklearn import linear_model
from sklearn.metrics import…

fashioncoder
- 79
- 2
- 8
-1
votes
1 answer
What does TruncatedSVD get_params([deep]) really do?
I don't understand the get_params([deep]) method available for TruncatedSVD in sklearn. Can someone please explain it to me?

user77005
- 1,769
- 4
- 18
- 26
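For the get_params question above, a minimal sketch may help. It assumes a TruncatedSVD built with arbitrary illustrative values (n_components=5, n_iter=7, random_state=42 are not from the question) and wraps it in a Pipeline, with a hypothetical step name "svd", only to show where the deep flag makes a difference.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import Pipeline

# Illustrative values only; they are not taken from the question.
svd = TruncatedSVD(n_components=5, n_iter=7, random_state=42)

# get_params returns a dict mapping constructor argument names
# to the values this estimator currently holds.
params = svd.get_params()

# The dict can be passed back to the constructor to build an
# equivalent, unfitted estimator.
svd_copy = TruncatedSVD(**params)

# `deep` matters when an estimator contains other estimators:
# with deep=True the nested parameters are included under
# "<step name>__<parameter>" keys.
pipe = Pipeline([("svd", svd)])  # "svd" is a hypothetical step name
shallow = pipe.get_params(deep=False)
deep = pipe.get_params(deep=True)

print(sorted(params))
print("svd__n_components" in shallow, "svd__n_components" in deep)
```

For a plain TruncatedSVD, which holds no nested estimators, deep=True and deep=False should return the same dictionary. The flag becomes relevant mainly for composite objects such as Pipeline or grid-search wrappers.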
-1
votes
1 answer
Pandas - find the index satisfying conditions of each row
I am trying to find the indices that satisfy certain conditions in a pandas DataFrame.
For example, we have the following dataframe
and want to find the index such that
argmin(j) df['A'].iloc[j] >= (df['A'].iloc[i] + 3 ) for all i
so the result will be given…

user155214
- 115
- 1
- 5
-1
votes
1 answer
How to get per classification accuracy for a given data set using NaivebayesClassifier
I am very new to machine learning. I have a problem to solve using supervised machine learning.
Problem: Learn from the training data and understand the labels (I have training data in .csv format where column1 is data and column2 is…

Abhinaw Kaushik
- 607
- 6
- 18
-1
votes
1 answer
sklearn.neighbors.KNeighborsClassifier could not convert string to float
I am trying to clean my data in Python using sklearn.neighbors.KNeighborsClassifier. In the classifier's fit function I provide the training data as a DataFrame generated by pandas from a CSV file.
The fit function throws an error…

Zeshan Khan
- 294
- 2
- 15
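The error in the KNeighborsClassifier question above typically appears when a feature column holds strings, since the classifier needs numeric features. The asker's data is not shown, so the sketch below uses a made-up DataFrame (the column names "size", "color", and "label" are hypothetical) and one possible remedy, one-hot encoding with pd.get_dummies, rather than whatever fix the accepted answer used.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Made-up data: "color" is a string-valued feature that fit() cannot
# convert to float on its own.
df = pd.DataFrame({
    "size": [1.0, 2.0, 8.0, 9.0],
    "color": ["red", "red", "blue", "blue"],
    "label": ["small", "small", "large", "large"],
})

# One-hot encode the string feature so every feature column is numeric.
X = pd.get_dummies(df[["size", "color"]], columns=["color"], dtype=float)
y = df["label"]  # string class labels are fine for classifiers

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X, y)
preds = knn.predict(X)
print(list(X.columns))
print(list(preds))
```

One-hot encoding treats every category as equally distant from the others, which is usually a reasonable default for distance-based models. An ordinal encoding could be considered instead when the categories have a natural order.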
-1
votes
1 answer
Impute values of a vector using Cosine similarity in Python
The Scenario
I have a Dataset whose last column has NaN values in it, which need to be imputed using only Vector Cosine & Pearson Correlation; after which the data will be further taken for Clustering.
The Problem
It is mandatory for my case to use…

T3J45
- 717
- 3
- 12
- 32
-1
votes
1 answer
Grouping arrays with common classes for classification in CNN
I have a data set with three columns: the first two columns are the features and the third column contains the classes. There are 4 classes; part of it can be seen here.
The data set is big, let's say 100,000 rows and 3 columns (two column features and one…

dm5
- 350
- 1
- 6
- 18
-1
votes
1 answer
error in calculating AUC ROC in python
I am implementing linear regression in Python using sklearn.
I have successfully trained the model using the linear_model.LinearRegression() function.
Now, I want to measure the goodness of fit of the model using the AUC ROC method.
I am using the following code for…

KrunalParmar
- 1,062
- 2
- 18
- 31
-1
votes
1 answer
Extending the column name in pandas DataFrame
I have a data frame which contains 34 rows and 10 columns. I called the data frame "comp" and then computed "invcomp = 1/comp", so the values changed but the column names stayed the same. I want to replace or rename my column names, suppose the earlier name of my…

Avanish Mishra
- 163
- 1
- 1
- 7
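The excerpt above is cut off before it states the desired naming scheme, so the following is only a sketch under the assumption that each inverted column should get a common prefix. The column names "a" and "b" and the prefix "inv_" are hypothetical; add_prefix and add_suffix are standard DataFrame methods.

```python
import pandas as pd

# Stand-in for the asker's 34 x 10 frame; names and values are made up.
comp = pd.DataFrame({"a": [1.0, 2.0], "b": [4.0, 5.0]})

# Element-wise inverse keeps the original column labels...
invcomp = 1 / comp

# ...so relabel them afterwards. add_prefix returns a new DataFrame
# and leaves `invcomp` itself untouched.
invcomp_renamed = invcomp.add_prefix("inv_")

print(list(invcomp.columns))
print(list(invcomp_renamed.columns))
```

If the new names do not follow a single pattern, `DataFrame.rename(columns={...})` with an explicit mapping is the more general tool.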
-1
votes
1 answer
Number of features of the model must match the input. Model n_features is 40 and input n_features is 38
I am getting this error. Please give me any suggestions to resolve it. Here is my code. I am taking training data from train.csv and testing data from another file, test.csv. I am new to machine learning, so I could not understand what the problem is. Give me…

Shiv
- 105
- 6
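The asker's code is not shown, so the cause of the 40 versus 38 mismatch cannot be confirmed. One frequent cause, assumed here, is one-hot encoding train.csv and test.csv separately, so that categories missing from the test file yield fewer columns. The sketch below uses small made-up frames (the column names "city" and "x" are hypothetical) and aligns the test columns to the training columns with reindex.

```python
import pandas as pd

# Made-up stand-ins for train.csv and test.csv; the test data lacks
# two of the categories seen in training.
train = pd.DataFrame({"city": ["a", "b", "c"], "x": [1, 2, 3]})
test = pd.DataFrame({"city": ["a", "a"], "x": [4, 5]})

# Encoding each frame on its own produces different column sets.
X_train = pd.get_dummies(train, columns=["city"], dtype=int)
X_test = pd.get_dummies(test, columns=["city"], dtype=int)

# Align the test frame to the training columns: missing dummy columns
# are added and filled with 0, and the column order matches training.
X_test_aligned = X_test.reindex(columns=X_train.columns, fill_value=0)

print(X_train.shape[1], X_test.shape[1], X_test_aligned.shape[1])
```

Note that reindex also drops any test-only columns, which is usually what a model fitted on the training columns expects.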
-1
votes
1 answer
Pandas IOError: [Errno 13] Permission denied
I've been trying to run pandas using Python 2.7 on a MacBook Pro and keep getting the following error:
File "/Users/Hofstadter/anaconda/lib/python2.7/site-packages/pandas/io/common.py", line 376, in _get_handle
f = open(path_or_buf,…

114
- 876
- 3
- 25
- 51
-1
votes
1 answer
DataFrameMapper scikit-learn ValueError: all the input array dimensions except for the concatenation axis must match exactly
I have been trying to use DataFrameMapper to add multiple pre-processing transformations on my dataframe into my scikit-learn Pipeline.
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data"
names = ['Sex', 'Length',…

Larissa Leite
- 1,358
- 3
- 21
- 36
-1
votes
1 answer
List still being treated as a set even after converting
So I have an instance where, even after converting my sets to lists, they aren't recognized as lists.
The idea is to delete extra columns from a data frame by comparing with the columns in another. I have two data frames, say df_test and df_train. I…

Kris
- 21
- 5
-1
votes
1 answer
How to get the top N frequent words in each cluster? Sklearn
I have a text corpus that contains 1000+ articles, each on a separate line. I used hierarchical clustering with sklearn in Python to produce clusters of related articles. This is the code I used to do the clustering.
Note: X is a sparse NumPy 2D array…

user6872853
- 53
- 2
- 7
-1
votes
2 answers
Avoid collision in importing data in R
I got an error trying to import a CSV with multiple duplicate columns into R. Is there a way I can ignore those columns?
It's easy to do that for small files with a small number of columns, but mine is a big one: ~3k columns and 10M rows.

Ayush
- 479
- 2
- 9
- 24