Highest Voted 'dictvectorizer' Questions

4

votes

3 answers

AttributeError: 'Pipeline' object has no attribute 'partial_fit'

I am trying to train my binary classifier on a huge dataset. Previously, I could train it using sklearn's fit method. But now I have more data than I can handle at once. I am trying to fit it incrementally but couldn't get rid of…

asked May 10 '18 at 08:31

kntgu

184
1
2
14
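
For the question above: Pipeline does not expose partial_fit, so one commonly suggested workaround is to apply a stateless transformer yourself and call partial_fit on the final estimator, batch by batch. The sketch below is only an illustration under assumptions: the asker's actual pipeline steps are not shown in the excerpt, so it swaps in FeatureHasher and SGDClassifier, and the toy data and the `batches` helper are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction import FeatureHasher
from sklearn.linear_model import SGDClassifier

# Hypothetical toy data: one dict of features per sample, binary labels.
samples = [
    {"word": "good", "length": 4},
    {"word": "bad", "length": 3},
    {"word": "great", "length": 5},
    {"word": "awful", "length": 5},
]
labels = [1, 0, 1, 0]


def batches(X, y, size):
    """Yield consecutive (X, y) chunks of at most `size` samples."""
    for start in range(0, len(X), size):
        yield X[start:start + size], y[start:start + size]


# FeatureHasher is stateless, so it needs no fit pass over the full data.
hasher = FeatureHasher(n_features=2**10, input_type="dict")
clf = SGDClassifier(random_state=0)
classes = np.array([0, 1])  # every label must be declared on the first call

for X_batch, y_batch in batches(samples, labels, size=2):
    clf.partial_fit(hasher.transform(X_batch), y_batch, classes=classes)

predictions = clf.predict(hasher.transform(samples))
```

DictVectorizer is not used here because it has to learn its feature mapping in a fit pass, which is awkward when the data arrives in chunks; hashing trades that for the loss of readable feature names.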

4

votes

3 answers

How to encode categorical features in sklearn?

I have a dataset with 41 features (columns 0 to 40), of which 7 are categorical. This categorical set is divided into two subsets: a subset of string type (column features 1, 2, 3) and a subset of int type, in binary form 0 or 1 (the…

python scikit-learn categorical-data one-hot-encoding dictvectorizer

asked Nov 15 '16 at 19:11

Gil

111
1
7

4

votes

2 answers

How can I encode features with more than one value per column? MultiDictVectorizer needed?

I am vectorizing some features in sklearn, and I have run into a problem. DictVectorizer works well if your data can be encoded into one dict key per item. What if your items can have two or more values of the same column? For instance,…

python scikit-learn feature-extraction dictvectorizer one-hot-encoding

asked Feb 15 '16 at 23:46

rjurney

4,824
5
41
62
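
One way to approach the multi-valued case in the question above, sketched under assumptions: expand each list-valued field into one "field=value" key per value before handing the dicts to DictVectorizer. The `expand` helper and the sample records are hypothetical and not taken from the question.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical records: "genre" can hold several values per item.
records = [
    {"genre": ["action", "comedy"], "year": 1999},
    {"genre": ["drama"], "year": 2005},
]


def expand(record):
    """Turn each list-valued field into one 'field=value' key set to 1."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, (list, tuple, set)):
            for item in value:
                flat[f"{key}={item}"] = 1
        else:
            flat[key] = value
    return flat


vec = DictVectorizer(sparse=False)
X = vec.fit_transform([expand(r) for r in records])
# vec.feature_names_ lists the learned columns in sorted order.
```

Each value gets its own indicator column, so an item with two genres simply has two ones in its row.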

3

votes

1 answer

How to use Scikit Learn dictvectorizer to get encoded dataframe from dense dataframe in Python?

I have a dataframe as follows:

   user  item  affinity
0     1    13       0.1
1     2    11       0.4
2     3    14       0.9
3     4    12       1.0

From this I want to create an encoded dataset (for fastFM) as follows: user1 user2 user4 user4…

python pandas scikit-learn encode dictvectorizer

asked May 13 '16 at 06:17

exAres

4,806
16
53
95

1

vote

0 answers

How to solve a model-fitting shape error with DictVectorizer?

I'm working on a POS tagging problem and using a LogisticRegressionCV model to solve it. I extracted features of words and vectorized them with DictVectorizer(). However, I'm getting an error while the model is fitting. After the model.fit part, the console…

python vectorization logistic-regression model-fitting dictvectorizer

asked Jan 09 '21 at 13:31

Nilay Yilmaz

11
2

1

vote

2 answers

Is it possible to create an equivalent "restrict" method for CountVectorizer as is available for DictVectorizer in Scikit-learn?

For DictVectorizer it is possible to subset the object by using the restrict() method. Here is an example where I have explicitly listed the features to retain by using a boolean array. import numpy as np v = DictVectorizer() D = [{'foo': 1,…

python scikit-learn feature-selection countvectorizer dictvectorizer

asked Dec 03 '19 at 20:55

Billy Franks

23
6
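
A rough sketch for the question above, under assumptions: DictVectorizer.restrict accepts a boolean mask, and a comparable effect for CountVectorizer can be approximated by building a second CountVectorizer from the retained terms via its `vocabulary` parameter. This is a workaround rather than a built-in method; the sample documents and the names `terms`, `kept`, and `cv_small` are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import CountVectorizer

mask = np.array([True, False, True])  # features to keep, by column position

# DictVectorizer: restrict() prunes the fitted vectorizer in place.
v = DictVectorizer()
v.fit([{"foo": 1, "bar": 2}, {"foo": 3, "baz": 1}])
v.restrict(mask)  # columns were bar, baz, foo; baz is dropped

# CountVectorizer: rebuild a vectorizer from the retained vocabulary.
docs = ["foo bar bar", "foo foo foo baz"]
cv = CountVectorizer().fit(docs)
terms = sorted(cv.vocabulary_, key=cv.vocabulary_.get)  # terms in column order
kept = [term for term, keep in zip(terms, mask) if keep]

cv_small = CountVectorizer(vocabulary=kept)
X_small = cv_small.transform(docs).toarray()
```

The rebuilt vectorizer re-tokenizes the text, so any non-default settings of the original (tokenizer, n-gram range, and so on) would need to be passed to it as well.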

1

vote

1 answer

Python sklearn MultinomialNB: Dimension mismatch using DictVectorizer

I'm trying to use MultinomialNB, but I got ValueError: dimension mismatch. I'm using DictVectorizer for the training data and LabelEncoder for the class. This is my code: def create_token(inpt): return inpt.split(' ') def tok_freq(inpt): tok =…

python-3.x scikit-learn valueerror dictvectorizer

asked Apr 24 '18 at 00:23

jted95

1,084
1
9
23

1

vote

0 answers

Method for converting variable-length vectors to fixed length (NLP)

Recently I have been reading about natural language processing, its vectorization methods, and the advantages of each vectorizer. I am interested in character-level vectorization, but it seems the main concern with a character vectorizer for each word…

nlp word2vec word-embedding dictvectorizer

asked Apr 17 '18 at 02:47

Isaac Sim

539
1
7
23

1

vote

5 answers

using DictVectorizer to convert strings

satisfaction_level last_evaluation number_project average_montly_hours time_spend_company Work_accident left promotion_last_5years dept salary
0.38 0.53 2 157 3 0 1 0 TECHNICAL low
0.8 0.86 5 262 6 0 1 0 …

python pandas machine-learning scikit-learn dictvectorizer

asked Nov 16 '17 at 16:22

Vineeth

11
1
3

1

vote

1 answer

Converting string data to float before passing to SVM classifier

I have a dataset as follows:

X_data =
BankNum  | ID    |
00987772 | AB123 |
00987772 | AB123 |
00987772 | AB123 |
00987772 | ED245 |
00982123 | GH564 |

And another one as:

y_data =
ID    | Labels
AB123 | High
ED245 | Low
GH564 | Low

I'm…

python scikit-learn svm prediction dictvectorizer

asked Sep 12 '17 at 23:06

Xavier

227
1
3
11

1

vote

1 answer

Why would DictVectorizer change the number of features?

I have a dataset of 324 rows and 35 columns. I split it into training and testing data: X_train, X_test, y_train, y_test = train_test_split(tempCSV[feaure_names[0:34]], tempCSV[feaure_names[34]], test_size=0.2, random_state=32) This seems to…

python scikit-learn categorical-data dictvectorizer

asked Apr 12 '17 at 23:38

Nicholas Hassan

949
2
10
27
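
The excerpt above is truncated, so the actual cause in the asker's data is not known. One documented behaviour that does change the column count is that DictVectorizer expands every string-valued feature into one column per (feature, value) pair, while numeric features keep a single column. A small sketch with hypothetical data:

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical rows: one string-valued column and one numeric column.
rows = [
    {"color": "red", "size": 1.0},
    {"color": "blue", "size": 2.0},
    {"color": "green", "size": 3.0},
]

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(rows)

n_input_columns = len(rows[0])  # columns before vectorizing
n_output_columns = X.shape[1]   # columns after one-hot expansion
```

Here two input columns become four output columns, because "color" takes three distinct string values.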

1

vote

0 answers

Different results when using pd.get_dummies() and DictVectorizer() with categorical variables

I have a problem when I try to use categorical variables in a pipeline. pd.get_dummies() is a terrific tool, but we cannot use it directly in a pipeline. So I had to use DictVectorizer(). I do it as below (toy example) import numpy as np import pandas as…

python-3.x pipeline dummy-variable dictvectorizer

asked Jan 21 '17 at 16:01

Edward

4,443
16
46
81

1

vote

1 answer

Categorical variables in pipeline: dimension mismatch

I try to build a pipeline with categorical variables import numpy as np import pandas as pd import sklearn from sklearn.base import BaseEstimator, TransformerMixin from sklearn import linear_model from sklearn.pipeline import Pipeline df =…

python pipeline categorical-data dictvectorizer

asked Oct 25 '16 at 20:29

Edward

4,443
16
46
81

1

vote

1 answer

N-gram vectorization: what should I do with a new token that does not exist in the corpus?

I'm building a custom n-gram vectorizer for a bag-of-words model. I'm curious: what should I do if, while vectorizing a short text, I find a new token that does not exist in the corpus vocabulary? Should it just be skipped, or something else?

nlp vectorization dictvectorizer

asked Oct 20 '16 at 13:38

Ph0en1x

9,943
8
48
97
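
For the question above, one common choice is simply to skip n-grams that are not in the corpus vocabulary, which is also what scikit-learn's CountVectorizer does at transform time. Below is a minimal pure-Python sketch of that option; the function names and the toy corpus are hypothetical, and returning a count of skipped n-grams is an extra added for illustration, not something the question asks for.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the word n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vocab(corpus, n=2):
    """Assign a column index to every n-gram seen in the corpus."""
    vocab = {}
    for text in corpus:
        for gram in ngrams(text.split(), n):
            vocab.setdefault(gram, len(vocab))
    return vocab


def vectorize(text, vocab, n=2):
    """Count known n-grams; unseen ones are skipped and tallied separately."""
    vector = [0] * len(vocab)
    skipped = 0
    for gram, count in Counter(ngrams(text.split(), n)).items():
        index = vocab.get(gram)
        if index is None:
            skipped += count  # out-of-vocabulary n-gram: no column for it
        else:
            vector[index] = count
    return vector, skipped


corpus = ["the cat sat", "the cat ran"]
vocab = build_vocab(corpus)
vector, skipped = vectorize("the cat jumped", vocab)
```

Skipping keeps the vector length fixed, which downstream models require; if many n-grams are skipped, that count can hint that the vocabulary is too small for the new text.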

1

vote

1 answer

Categorical variables in sklearn pipeline with DictVectorizer

I want to apply a pipeline with numeric & categorical variables as below import numpy as np import pandas as pd from sklearn import linear_model, pipeline, preprocessing from sklearn.feature_extraction import DictVectorizer df =…

python pipeline categorical-data dictvectorizer

asked Oct 17 '16 at 20:21

Edward

4,443
16
46
81
