Combining Word vectors and Scalar Features for classification

Question

I am working on a short-sentence classification problem where I get the following information:

Input:
- Age of the person (1-100)
- Gender of the person (Male or Female)
- Content of the sentence

Output: Label (type of content)

To model the sentences, I'm using word2vec combined with tf-idf. I would also like to give age and gender to the classifier as features alongside the sentence embedding. What is the correct way to do this? The embedding is an n-dimensional array, while age and gender are scalars, so I'm confused about how to combine them and how to visualise the data.

Any luck finding a solution? I'm dealing with a similar problem. — Bee, Mar 12 '21 at 19:22

score 0 · Answer 1 · answered May 16 '17 at 05:36

Word embeddings, as n-dimensional vectors, are just n scalars.

So if, for example, you have a 300-dimensional sentence vector derived from word vectors, plus an age scalar (1-100) and a gender scalar (perhaps 0 or 1), you have 302 dimensions of data for your classifier.
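A minimal sketch of that concatenation for a single example, assuming the sentence vector has already been computed (a random 300-dimensional stand-in is used here) and assuming gender is encoded as 0 for Male and 1 for Female. The helper name `build_features` is hypothetical.

```python
import numpy as np

def build_features(sentence_vec, age, gender):
    """Append age and a 0/1 gender code to a sentence embedding.

    Assumed encoding (illustrative only): "Female" -> 1.0, anything else -> 0.0.
    """
    gender_code = 1.0 if gender == "Female" else 0.0
    return np.concatenate([np.asarray(sentence_vec, dtype=float),
                           [float(age), gender_code]])

# Random stand-in for a 300-dimensional word2vec/tf-idf sentence vector.
rng = np.random.default_rng(0)
sentence_vec = rng.normal(size=300)

row = build_features(sentence_vec, age=42, gender="Female")
print(row.shape)  # (302,)
```

Stacking one such row per sentence gives an `(n_samples, 302)` matrix that any standard classifier can consume.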

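See the scikit-learn FeatureUnion transformer for an example of concatenating such varied features. (Some classifiers, particularly linear and distance-based models, may perform better if such varied features are scaled to have more similar ranges/distributions; otherwise an age value of up to 100 can dominate the much smaller embedding components.)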
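A sketch of the FeatureUnion approach, assuming each row of `X` is laid out as `[300 embedding dims | age | gender (0/1)]` (the `select_*` helpers and `EMB_DIM` are hypothetical names). Only the age column is standardized here; the data are synthetic and purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

EMB_DIM = 300  # assumed sentence-embedding size

# Column selectors for the assumed layout [embedding | age | gender].
def select_embedding(X):
    return X[:, :EMB_DIM]

def select_age(X):
    return X[:, EMB_DIM:EMB_DIM + 1]

def select_gender(X):
    return X[:, EMB_DIM + 1:EMB_DIM + 2]

features = FeatureUnion([
    ("embedding", FunctionTransformer(select_embedding)),
    ("age", Pipeline([
        ("select", FunctionTransformer(select_age)),
        ("scale", StandardScaler()),  # age 1-100 -> zero mean, unit variance
    ])),
    ("gender", FunctionTransformer(select_gender)),  # already 0/1, left as-is
])

model = Pipeline([
    ("features", features),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Synthetic stand-in data (not real sentences or labels).
rng = np.random.default_rng(0)
n = 200
emb = rng.normal(size=(n, EMB_DIM))
age = rng.integers(1, 101, size=(n, 1)).astype(float)
gender = rng.integers(0, 2, size=(n, 1)).astype(float)
X = np.hstack([emb, age, gender])
y = rng.integers(0, 3, size=n)  # three hypothetical content labels

model.fit(X, y)
print(model.named_steps["features"].transform(X).shape)  # (200, 302)
```

`sklearn.compose.ColumnTransformer` does the same column selection more directly, e.g. `ColumnTransformer([("age", StandardScaler(), [300])], remainder="passthrough")`; FeatureUnion is used above only because the answer mentions it.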
