Questions tagged [sklearn-pandas]

Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames

1336 questions
-1
votes
3 answers

Python Sklearn linear regression not callable

I am implementing simple linear regression and multiple linear regression using pandas and sklearn. My code is as follows: import pandas as pd import numpy as np import scipy.stats from sklearn import linear_model from sklearn.metrics import…
-1
votes
1 answer

What does TruncatedSVD get_params([deep]) really do?

I don't understand the get_params([deep]) method available for TruncatedSVD in sklearn. Can someone please explain it to me?
user77005
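For readers landing on this question, a minimal sketch of what get_params does on a TruncatedSVD instance may help. The parameter values below are arbitrary and chosen only for illustration. As far as I know, get_params returns the estimator's constructor arguments as a dict, and `deep` only matters when an estimator contains other estimators (for example a Pipeline), which TruncatedSVD does not.

```python
from sklearn.decomposition import TruncatedSVD

# Arbitrary illustrative settings, not taken from the question.
svd = TruncatedSVD(n_components=2, n_iter=7)

# get_params returns the constructor arguments as a plain dict.
params = svd.get_params()
print(params["n_components"], params["n_iter"])

# The dict can be used to build an identically configured estimator.
clone = TruncatedSVD(**params)

# set_params is the counterpart: it updates parameters in place.
svd.set_params(n_components=3)
```

This get_params/set_params pair is what utilities such as GridSearchCV rely on to read and change hyperparameters.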
-1
votes
1 answer

Pandas - find the index satisfying conditions of each row

I am trying to find the indices that satisfy certain conditions in a pandas DataFrame. For example, we have the following dataframe and want to find the index such that argmin(j) df['A'].iloc[j] >= (df['A'].iloc[i] + 3 ) for all i so the result will be given…
user155214
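One possible reading of the condition, sketched with NumPy broadcasting: for each row i, find the smallest j with A[j] >= A[i] + 3. The sample values and the `first_j` column name are made up here, and the use of -1 for "no such j" is my own convention, since the expected result in the excerpt is cut off.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the question's own dataframe is not shown.
df = pd.DataFrame({"A": [1, 2, 4, 5, 8]})
a = df["A"].to_numpy()

# reaches[i, j] is True when A[j] >= A[i] + 3.
reaches = a[None, :] >= (a[:, None] + 3)

# argmax returns the first True per row (and 0 when a row has none),
# so rows without any match are replaced by -1.
first_j = np.where(reaches.any(axis=1), reaches.argmax(axis=1), -1)
df["first_j"] = first_j
print(df)
```

The comparison matrix takes O(n²) memory, so this only suits modest frame sizes.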
-1
votes
1 answer

How to get per classification accuracy for a given data set using NaivebayesClassifier

I am very new to machine learning. I have a problem to solve using supervised machine learning. Problem: Learn from the training data and understand the labels (I have got training data in .csv format where column1 is data and column2 is…
-1
votes
1 answer

sklearn.neighbors.KNeighborsClassifier could not convert string to float

I am trying to clean my data in Python using sklearn.neighbors.KNeighborsClassifier. In the fit function of the classifier I have provided training data in the form of a DataFrame generated by pandas from a CSV file. The fit function throws an error…
Zeshan Khan
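Going by the title, the error typically appears when a feature column still holds strings at fit time. Here is a minimal sketch of one common workaround, one-hot encoding the text column with pandas before fitting. The column names and values are invented for illustration; the question's actual CSV is not shown.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical data standing in for the CSV from the question.
df = pd.DataFrame({
    "colour": ["red", "blue", "red", "green"],
    "size": [1.0, 2.0, 1.5, 3.0],
    "label": [0, 1, 0, 1],
})

# Replace the string column with numeric indicator columns.
X = pd.get_dummies(df[["colour", "size"]], columns=["colour"], dtype=float)
y = df["label"]

# With only numeric features, fit no longer has strings to convert.
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
pred = clf.predict(X)
print(list(pred))
```

Whether one-hot encoding is the right choice depends on the data; because KNN is distance based, the encoded columns and the numeric ones may also need scaling.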
-1
votes
1 answer

Impute values of a vector using Cosine similarity in Python

The Scenario: I have a dataset whose last column has NaN values in it, which need to be imputed using only vector cosine and Pearson correlation, after which the data will be used for clustering. The Problem: It is mandatory for my case to use…
T3J45
-1
votes
1 answer

Grouping arrays with common classes for classification in CNN

I have a data set with three columns: the first two columns are the features and the third column contains the classes. There are 4 classes; part of it can be seen here. The data set is big, let's say 100,000 rows and 3 columns (two feature columns and one…
dm5
-1
votes
1 answer

error in calculating AUC ROC in python

I am implementing linear regression in Python using sklearn. I have successfully trained a model using the linear_model.LinearRegression() function. Now I want to measure the goodness of fit of the model using the AUC ROC method. I am using the following code for…
KrunalParmar
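A note that may explain the error: ROC AUC is a classification metric, so it expects class labels as the true values and is not normally applicable to a continuous regression target. Here is a minimal sketch of regression-appropriate alternatives, assuming a plain LinearRegression model. The toy data are invented, and swapping AUC for R² and mean squared error is my suggestion, not something the question states.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Toy data following y = 2x + 1 exactly, for illustration only.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

model = LinearRegression().fit(X, y)
pred = model.predict(X)

# Goodness-of-fit measures suited to a continuous target.
r2 = r2_score(y, pred)
mse = mean_squared_error(y, pred)
print(r2, mse)
```

If the underlying task is really binary classification, a classifier such as LogisticRegression together with roc_auc_score on its predicted probabilities would be the usual pairing.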
-1
votes
1 answer

Extending the column name in pandas DataFrame

I have a data frame which contains 34 rows and 10 columns. I called the data frame "comp" and then did "invcomp = 1/comp", so the values changed but the column names stayed the same. I want to replace or rename my column names. Suppose the earlier name of my…
Avanish Mishra
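A minimal sketch of one way to do this, assuming the goal is to put a common prefix on every column name after the inversion. The frame contents and the "inv_" prefix are made up, since the desired naming in the excerpt is cut off.

```python
import pandas as pd

# Small stand-in for the 34x10 "comp" frame from the question.
comp = pd.DataFrame({"a": [1.0, 2.0], "b": [4.0, 5.0]})

# Invert the values, then prefix every column name in one step.
invcomp = (1 / comp).add_prefix("inv_")
print(invcomp.columns.tolist())
```

DataFrame.add_suffix works the same way for trailing text, and DataFrame.rename(columns=...) accepts a dict or function when each column needs a different new name.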
-1
votes
1 answer

Number of features of the model must match the input. Model n_features is 40 and input n_features is 38

I am getting this error. Please give me any suggestion to resolve it. Here is my code. I am taking training data from train.csv and testing data from another file, test.csv. I am new to machine learning, so I could not understand what the problem is. Give me…
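The question's code is not shown, so the cause can only be guessed. One frequent source of this mismatch is one-hot encoding train and test separately, which yields different column sets when a category is missing from one file. A minimal sketch under that assumption, with invented column names:

```python
import pandas as pd

# Hypothetical frames standing in for train.csv and test.csv.
train = pd.DataFrame({"city": ["a", "b", "c"], "x": [1, 2, 3]})
test = pd.DataFrame({"city": ["a", "a"], "x": [4, 5]})

X_train = pd.get_dummies(train, columns=["city"])
X_test = pd.get_dummies(test, columns=["city"])  # lacks city_b and city_c

# Align the test columns to the training columns, filling gaps with 0.
X_test = X_test.reindex(columns=X_train.columns, fill_value=0)
print(X_train.shape, X_test.shape)
```

After the reindex both frames have the same columns in the same order, which is what the fitted model expects at predict time.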
-1
votes
1 answer

Pandas IOError: [Errno 13] Permission denied

I've been trying to run pandas using python 2.7 on a macbook pro and keep getting the following error: File "/Users/Hofstadter/anaconda/lib/python2.7/site-packages/pandas/io/common.py", line 376, in _get_handle f = open(path_or_buf,…
114
-1
votes
1 answer

DataFrameMapper scikit-learn ValueError: all the input array dimensions except for the concatenation axis must match exactly

I have been trying to use DataFrameMapper to add multiple pre-processing transformations on my dataframe into my scikit-learn Pipeline. url = "https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data" names = ['Sex', 'Length',…
Larissa Leite
-1
votes
1 answer

List still being treated as a set even after converting

So I have an instance where even after converting my sets to lists, they aren't recognized as lists. The idea is to delete extra columns from a data frame by comparing with the columns in another. I have two data frames, say df_test and df_train. I…
Kris
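A minimal sketch of the column-dropping step that sidesteps the set-to-list conversion altogether, assuming the goal is to remove from df_test any column that df_train lacks. The frame contents are invented; only the names df_test and df_train come from the question.

```python
import pandas as pd

# Hypothetical frames; only their column names matter here.
df_train = pd.DataFrame({"a": [1], "b": [2]})
df_test = pd.DataFrame({"a": [3], "b": [4], "extra": [5]})

# A list comprehension yields a real list and keeps the column order.
extra_cols = [c for c in df_test.columns if c not in df_train.columns]

# Drop the columns that only exist in df_test.
df_test = df_test.drop(columns=extra_cols)
print(extra_cols, df_test.columns.tolist())
```

Index.difference gives the same columns in one call, though it returns them sorted rather than in their original order.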
-1
votes
1 answer

How to get the top N frequent words in each cluster? Sklearn

I have a text corpus that contains 1000+ articles, each on a separate line. I used hierarchical clustering with sklearn in Python to produce clusters of related articles. This is the code I used to do the clustering. Note: X is a sparse NumPy 2D array…
-1
votes
2 answers

Avoid collision in importing data in R

I faced an error trying to import a CSV into R which had multiple duplicate columns. Is there a way I can ignore those columns? It's easy to do that for small files with a small number of columns, but mine is a big one: ~3k columns and 10M rows.
Ayush