
I am working on a short-sentence classification problem where I have the following information:

Input:
- Age of the person (1-100)
- Gender of the person (Male or Female)
- Content of the sentence

Output:
- Label (type of content)

To model the sentences, I'm using word2vec combined with TF-IDF. I would also like to give the classifier age and gender as features alongside the sentence embedding. What is the correct way to do this? The embedding is an n-dimensional array, while age and gender are scalars, so I'm confused about how to combine them and how to visualise the data.

chaithu



An n-dimensional word embedding is just n scalars.

So if, for example, you have a 300-dimensional vector derived from word vectors, plus an age scalar (1-100) and a gender scalar (perhaps 0 or 1), you have 302 dimensions of data for your classifier.
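A minimal sketch of that concatenation with NumPy, assuming you already have one 300-dimensional sentence vector per row (random numbers stand in for them here) and that gender is encoded as 0 = Male, 1 = Female. The sample ages and genders are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: one 300-d sentence vector per sample
# (e.g. your TF-IDF-weighted word2vec representation).
embeddings = rng.normal(size=(4, 300))

ages = np.array([23, 57, 35, 81], dtype=float)  # ages in the range 1-100
raw_genders = ["Male", "Female", "Female", "Male"]
# Assumed encoding: 0 = Male, 1 = Female
genders = np.array([g == "Female" for g in raw_genders], dtype=float)

# column_stack keeps the 2-D embedding block as-is and turns each 1-D array
# into one extra column: 300 + 1 + 1 = 302 features per row.
X = np.column_stack([embeddings, ages, genders])
print(X.shape)  # (4, 302)
```

Each row of `X` can then be fed to any scikit-learn classifier as an ordinary 302-feature sample.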

See the scikit-learn FeatureUnion transformer for an example of concatenating such heterogeneous features. (Some classifiers may perform better if such varied features are scaled to have more similar ranges/distributions.)
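One way to wire this up with FeatureUnion, as a sketch rather than a definitive recipe: it assumes the input matrix has the 300 embedding columns first, then age, then gender (0/1). Because every FeatureUnion branch receives the full matrix, each branch first selects its own columns with a FunctionTransformer. Only the age branch is standardized, since its 1-100 range differs most from the embedding values. The data, labels and `EMB_DIM` constant are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline, make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

EMB_DIM = 300  # assumed embedding size: columns 0..299 hold the sentence vector

# Each branch sees the whole matrix, so it selects its own columns first.
embedding_part = FunctionTransformer(lambda X: X[:, :EMB_DIM])
age_part = make_pipeline(
    FunctionTransformer(lambda X: X[:, [EMB_DIM]]),  # age column, kept 2-D
    StandardScaler(),  # rescale age to zero mean, unit variance
)
gender_part = FunctionTransformer(lambda X: X[:, [EMB_DIM + 1]])  # already 0/1

model = Pipeline([
    ("features", FeatureUnion([
        ("embedding", embedding_part),
        ("age", age_part),
        ("gender", gender_part),
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical training data: 20 samples laid out as [embedding | age | gender].
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(size=(20, EMB_DIM)),
    rng.integers(1, 101, size=20),  # ages 1-100
    rng.integers(0, 2, size=20),    # gender 0/1
])
y = np.arange(20) % 3  # three made-up content labels

model.fit(X, y)
print(model.predict(X[:5]))
```

The union's output is again 302 columns, with age now standardized. In newer scikit-learn versions, `ColumnTransformer` does the per-column selection directly and can replace the FunctionTransformer selectors.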

gojomo