Questions tagged [sklearn-pandas]

Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames


1336 questions
0 votes, 1 answer

Inverse Transform Predicted Results

I have a training data CSV with three columns (two for data and a third for targets) and I successfully predicted the target column for my test CSV. The problem is I need to inverse transform the results back to strings for further analysis. Below…
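Here is a minimal sketch of one common approach. It assumes the string targets were encoded with scikit-learn's LabelEncoder, and the data below is hypothetical. The idea is to keep the fitted encoder and call its inverse_transform on the integer predictions.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data: two numeric feature columns and a string target.
X_train = np.array([[1.0, 2.0], [1.1, 1.9], [5.0, 6.0], [5.2, 6.1]])
y_train = np.array(["low", "low", "high", "high"])
X_test = np.array([[1.05, 2.0], [5.1, 6.0]])

# Encode the string targets as integers and keep the fitted encoder around.
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y_train)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_encoded)
pred_codes = model.predict(X_test)

# Map the integer predictions back to the original strings with the same encoder.
pred_labels = encoder.inverse_transform(pred_codes)
print(pred_labels)
```

scikit-learn classifiers also accept string targets directly. In that case, predict already returns strings and no inverse step is needed.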
0 votes, 1 answer

Visualize Sparse Input from sklearn KMeans with Matplotlib

from sklearn.feature_extraction.text import TfidfVectorizer from sklearn.cluster import KMeans cc_tfid = TfidfVectorizer().fit_transform(cc_corpus) cc_km = KMeans(n_clusters = 3, init = 'k-means++', max_iter = 99, n_init = 4, verbose = False…
SarahJessica
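Below is a minimal sketch, assuming the goal is a 2-D scatter plot of the clusters. It projects the sparse TF-IDF matrix down to two dimensions with TruncatedSVD, which accepts scipy sparse input directly, and colors each point by its KMeans label. The small corpus is a hypothetical stand-in for the question's cc_corpus.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for cc_corpus from the question.
cc_corpus = [
    "cats purr and meow", "dogs bark loudly", "cats and kittens sleep",
    "dogs fetch sticks", "stocks rose today", "markets and stocks fell",
]
cc_tfid = TfidfVectorizer().fit_transform(cc_corpus)  # scipy sparse matrix
cc_km = KMeans(n_clusters=3, init="k-means++", max_iter=99, n_init=4, random_state=0)
labels = cc_km.fit_predict(cc_tfid)

# Project the sparse TF-IDF rows into 2 dimensions for plotting.
coords = TruncatedSVD(n_components=2, random_state=0).fit_transform(cc_tfid)

fig, ax = plt.subplots()
ax.scatter(coords[:, 0], coords[:, 1], c=labels)
ax.set_title("KMeans clusters of TF-IDF vectors (2-D SVD projection)")
# Call plt.show() or fig.savefig("clusters.png") to view the result.
```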
0 votes, 0 answers

Same number of outliers in LOF

I am running the LOF algorithm on around 100k 2-D points. Each time I run LOF with a different n_neighbors parameter, I get the same number of outliers: it is always 10% of the points. Is this how this algorithm is…
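A fixed 10% is what the contamination parameter of scikit-learn's LocalOutlierFactor produces. Older releases (before 0.22) defaulted to contamination=0.1, and the score threshold is then set so that this fraction of points is labeled -1, whatever n_neighbors is. The sketch below uses synthetic data and contrasts that with contamination="auto", where a fixed score threshold is used and the outlier count can change with n_neighbors.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(0)
X = rng.randn(1000, 2)  # synthetic stand-in for the ~100k 2-D points

fixed_counts, auto_counts = {}, {}
for k in (10, 20, 50):
    # contamination=0.1: the threshold is chosen so that ~10% of points get -1.
    fixed = LocalOutlierFactor(n_neighbors=k, contamination=0.1).fit_predict(X)
    fixed_counts[k] = int(np.sum(fixed == -1))
    # contamination="auto": a fixed score threshold, so the count can vary with k.
    auto = LocalOutlierFactor(n_neighbors=k, contamination="auto").fit_predict(X)
    auto_counts[k] = int(np.sum(auto == -1))

print(fixed_counts)
print(auto_counts)
```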
0 votes, 0 answers

from sklearn.decomposition import PCA raises an error

I tried using "from sklearn.decomposition import PCA" in my program with Python 2.7 on Windows, but it raised an error like this: Traceback (most recent call last): File "", line 1, in from sklearn.decomposition…
Nomad
0 votes, 1 answer

How to match and merge a pandas DataFrame with a list?

I have a simple pandas DataFrame and a list, as follows: import pandas as pd frame = pd.DataFrame({'a' : ['the cat is blue', 'the sky is green', 'the dog is black']}) mylist =['cat blue', 'sky green', 'dog black'] How do I find the match…
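Here is a minimal sketch, assuming that "match" means every word of a list entry appears in the sentence. The question is truncated, so this interpretation is itself an assumption. find_match is a hypothetical helper, not a pandas function.

```python
import pandas as pd

frame = pd.DataFrame({'a': ['the cat is blue', 'the sky is green', 'the dog is black']})
mylist = ['cat blue', 'sky green', 'dog black']

def find_match(text, candidates):
    """Hypothetical helper: return the first candidate whose words all occur in text."""
    words = set(text.split())
    for cand in candidates:
        if set(cand.split()) <= words:  # subset test on word sets
            return cand
    return None

# Add the matched list entry as a new column, one lookup per row.
frame['match'] = frame['a'].apply(lambda t: find_match(t, mylist))
print(frame)
```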
0 votes, 0 answers

t-SNE memory error

I am running t-SNE on a dataset with 314k records. I took one text column from the dataset and converted it into a bag of words. When I run it, I get a memory error. Could anyone help me solve it? from sklearn.manifold…
merklexy
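The scikit-learn TSNE documentation recommends first reducing very high-dimensional input with another method, such as TruncatedSVD for sparse data, to a moderate number of dimensions. The sketch below does that and also subsamples rows before running t-SNE. The texts are hypothetical, and the sizes (200 rows, 10 components) are arbitrary choices for a small demo.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.manifold import TSNE

# Hypothetical text column; the real data has ~314k rows.
texts = [f"word{i % 17} token{i % 11} term{i % 7}" for i in range(300)]

bow = CountVectorizer().fit_transform(texts)  # sparse document-term matrix

# 1) Subsample rows: t-SNE's cost grows quickly with the number of points.
rng = np.random.RandomState(0)
idx = rng.choice(bow.shape[0], size=200, replace=False)

# 2) Reduce the sparse bag of words to a small dense matrix before t-SNE.
reduced = TruncatedSVD(n_components=10, random_state=0).fit_transform(bow[idx])

embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(reduced)
print(embedding.shape)
```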
0 votes, 1 answer

sklearn.model_selection fails to load DLL

I'm trying to work through a TensorFlow example that uses sklearn, and I keep getting a DLL load error. I've cut the code down to the bare minimum in order to debug: import sklearn print(sklearn.__version__) from…
0 votes, 1 answer

Getting a dimension mismatch error when I try to predict with Naive Bayes (Python)

I've created a sentiment script that uses Naive Bayes to classify reviews. I trained and tested my model and saved it as a pickle. Now I would like to run predictions on a new dataset, but I always get the following error message: raise…
Nika
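A frequent cause of this error is fitting a new vectorizer on the new data (fit_transform instead of transform). That yields a different number of features than the classifier was trained on. The minimal sketch below uses hypothetical reviews and pickles the vectorizer and classifier together as one pipeline, so new text is transformed with the training vocabulary.

```python
import pickle
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical labeled reviews.
train_texts = ["great movie", "loved it", "terrible plot", "awful acting"]
train_labels = ["pos", "pos", "neg", "neg"]

# Bundle the vectorizer with the classifier so the vocabulary learned at
# training time is reused at prediction time.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

blob = pickle.dumps(model)  # in practice: pickle.dump(model, file_opened_in_wb_mode)
loaded = pickle.loads(blob)

# predict() runs the fitted vectorizer's transform (not fit_transform),
# so the feature count matches what MultinomialNB was trained on.
preds = loaded.predict(["loved it", "awful plot"])
print(preds)
```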
0 votes, 2 answers

How to find and add frequency column for ID?

I am a beginner at Python, so bear with me! My dataset comes from Excel, and I want to find and add a frequency column for my ID. I first grouped by ID and date with: dfcount = dfxyz.groupby(["ID", "Date"]) and…
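Here is a minimal sketch using groupby(...).transform. It returns a result aligned with the original rows, so the frequency can be added as a new column directly. The data is a hypothetical stand-in for the Excel sheet. "count" counts non-null values, which equals the group size here because ID has no missing values.

```python
import pandas as pd

# Hypothetical stand-in for the Excel data.
dfxyz = pd.DataFrame({
    "ID": [1, 1, 2, 1, 2, 3],
    "Date": ["2020-01-01", "2020-01-01", "2020-01-01",
             "2020-01-02", "2020-01-02", "2020-01-02"],
})

# How often each ID occurs in the whole table, broadcast back to every row.
dfxyz["ID_freq"] = dfxyz.groupby("ID")["ID"].transform("count")

# How often each (ID, Date) pair occurs, also aligned row by row.
dfxyz["ID_Date_freq"] = dfxyz.groupby(["ID", "Date"])["ID"].transform("count")
print(dfxyz)
```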
0 votes, 1 answer

Confused about sklearn’s implementation of OSVM

I have recently started experimenting with OneClassSVM (using sklearn) for unsupervised learning, and I followed this example. I apologize for the silly questions, but I'm a bit confused about two things: Should I train my SVM on both…
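The minimal sketch below shows the typical novelty-detection setup with scikit-learn's OneClassSVM on synthetic data. The model is fit on normal observations only, and predict returns +1 for inliers and -1 for outliers. The nu and gamma values are illustrative choices, not tuned settings.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(42)
X_train = 0.3 * rng.randn(200, 2)               # "normal" observations only
X_test_normal = 0.3 * rng.randn(20, 2)          # new normal observations
X_outliers = rng.uniform(low=-4, high=4, size=(20, 2))  # abnormal observations

# nu bounds the fraction of training errors; gamma sets the RBF kernel width.
clf = OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1)
clf.fit(X_train)

pred_normal = clf.predict(X_test_normal)    # +1 = inlier, -1 = outlier
pred_outliers = clf.predict(X_outliers)
print((pred_normal == -1).sum(), (pred_outliers == -1).sum())
```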
0 votes, 2 answers

Text field concatenation in sklearn pipeline

I have a multi-line JSON dataset with multiple fields that may or may not exist, and that can contain textual data as a string, a list of strings, or a more complicated mapping (a list of dicts), e.g.: {"yvalue":1.0,"field1":"Some text",…
Tom Lous
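One way to do this is to put a FunctionTransformer at the front of the pipeline. It flattens and joins the text fields into one string per record, and a TfidfVectorizer follows it. This is a minimal sketch. The field names, the flatten and concat_fields helpers, and the records are all hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

TEXT_FIELDS = ["field1", "field2", "field3"]  # hypothetical field names

def flatten(value):
    """Hypothetical helper: recursively collect strings from str/list/dict values."""
    if value is None:
        return []  # missing field contributes no text
    if isinstance(value, str):
        return [value]
    if isinstance(value, dict):
        return [s for v in value.values() for s in flatten(v)]
    if isinstance(value, (list, tuple)):
        return [s for v in value for s in flatten(v)]
    return [str(value)]

def concat_fields(records):
    """Join all text found in the selected fields into one string per record."""
    return [" ".join(s for f in TEXT_FIELDS for s in flatten(r.get(f))) for r in records]

records = [
    {"yvalue": 1.0, "field1": "Some text", "field2": ["more", "words"]},
    {"yvalue": 0.0, "field3": [{"name": "nested text"}]},
]

# validate=False lets the transformer pass the list of dicts through unchanged.
pipe = make_pipeline(FunctionTransformer(concat_fields, validate=False), TfidfVectorizer())
X = pipe.fit_transform(records)
print(X.shape)
```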
0 votes, 1 answer

UnicodeDecodeError when classifying Arabic datasets in Python

I have Arabic datasets to classify with Python: two directories (negative and positive) inside a Twitter directory. I want to use Python classes to classify the data. When I run the attached code, this error occurs: > File…
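A UnicodeDecodeError often means the files were read with a codec other than the one they were saved in. Assuming the tweets are stored as UTF-8 text files in one sub-directory per class, scikit-learn's load_files can decode them explicitly. The minimal sketch below builds a tiny hypothetical layout in a temporary directory.

```python
import os
import tempfile
from sklearn.datasets import load_files

# Build a tiny hypothetical layout: <root>/negative and <root>/positive.
root = tempfile.mkdtemp()
samples = {"negative": "خدمة سيئة جدا", "positive": "خدمة ممتازة"}
for label, text in samples.items():
    os.makedirs(os.path.join(root, label))
    with open(os.path.join(root, label, "tweet1.txt"), "w", encoding="utf-8") as fh:
        fh.write(text)

# Decode the files as UTF-8 instead of relying on the platform default;
# decode_error="replace" substitutes undecodable bytes rather than raising.
data = load_files(root, encoding="utf-8", decode_error="replace")
print(data.target_names)  # class names come from the sub-directory names
```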
0 votes, 1 answer

Create a training/validation split using sklearn

I have a training set consisting of X and Y. X has shape (4000, 32, 1) and Y has shape (4000, 1). I would like to split it into training and validation sets. Here is what I have been trying: from sklearn.model_selection import…
user785099
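train_test_split indexes arrays along their first axis, so a 3-D X can be split directly alongside Y. Here is a minimal sketch with random placeholder data of the shapes given in the question. The 20% validation size is an arbitrary choice.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(4000, 32, 1)  # placeholder data with the question's shapes
Y = np.random.rand(4000, 1)

# Rows are shuffled and split along axis 0; the trailing (32, 1) dims are kept.
X_train, X_val, Y_train, Y_val = train_test_split(X, Y, test_size=0.2, random_state=42)
print(X_train.shape, X_val.shape, Y_train.shape, Y_val.shape)
```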
0 votes, 1 answer

Writing a custom transformer in sklearn that returns the estimator's .predict in .transform

We have a custom transformer class EstimatorTransformer(base.BaseEstimator, base.TransformerMixin): def __init__(self, estimator): self.estimator = estimator def fit(self, X, y): self = self.estimator.fit(X,y) …
Rudrani Angira
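In the excerpt, fit assigns to self, which only rebinds a local name. The usual scikit-learn convention is to fit the wrapped estimator in place and return self. The minimal sketch below follows that convention. LinearRegression and the toy data are illustrative choices.

```python
import numpy as np
from sklearn import base
from sklearn.linear_model import LinearRegression

class EstimatorTransformer(base.BaseEstimator, base.TransformerMixin):
    """Wrap an estimator so that .transform returns its predictions as a feature column."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        # Fit the wrapped estimator in place, then return the transformer itself.
        self.estimator.fit(X, y)
        return self

    def transform(self, X):
        # Reshape 1-D predictions into a column so the output is 2-D.
        return np.asarray(self.estimator.predict(X)).reshape(-1, 1)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])  # y = 2x + 1
features = EstimatorTransformer(LinearRegression()).fit_transform(X, y)
print(features)
```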
0 votes, 1 answer

Naive Bayes classifier - empty vocabulary

I am trying to use Naive Bayes to detect humor in texts. I took this code from here, but I get some errors and I don't know how to resolve them because I am pretty new to machine learning and these algorithms. My training data contains…
Mr. Wizard
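CountVectorizer and TfidfVectorizer raise an "empty vocabulary" ValueError when no token survives tokenization. This can happen when the documents are empty or were read incorrectly, or when they contain only stop words or only one-character tokens. The default token_pattern keeps only tokens of two or more word characters. The minimal sketch below reproduces the error on hypothetical input and shows one possible fix: a token_pattern that also keeps one-character words.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["a", "I", "!"]  # every token is a single character or punctuation

message = None
try:
    CountVectorizer().fit(docs)
except ValueError as err:
    message = str(err)  # the message starts with "empty vocabulary"

# Keep one-character words as well, so the vocabulary is not empty.
vec = CountVectorizer(token_pattern=r"(?u)\b\w+\b").fit(docs)
print(message)
print(sorted(vec.vocabulary_))
```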