A first, naive solution is simply to concatenate the embedding vector with a one-hot encoded vector representing the POS tag.
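For concreteness, here is a minimal sketch of that concatenation; the 300-dimensional embedding and the 17-tag POS set are just illustrative assumptions, not requirements:

```python
import numpy as np

# Assumed sizes for illustration only: a 300-d embedding and 17 POS tags
# (roughly the Universal POS tag set).
EMB_DIM = 300
N_POS_TAGS = 17

word_embedding = np.random.rand(EMB_DIM)  # stand-in for a real embedding lookup
pos_index = 5                             # index of the word's POS tag in the tag set

pos_one_hot = np.zeros(N_POS_TAGS)
pos_one_hot[pos_index] = 1.0

# Naive combination: plain concatenation -> a (300 + 17)-dimensional feature vector
combined = np.concatenate([word_embedding, pos_one_hot])
print(combined.shape)  # (317,)
```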
If you want to do something fancier, however, you need a proper way to weight these different features.
For example, you could use XGBoost: given an unnormalized set of features (embeddings + POS tags, in your case), it learns how much weight to give each of them for a specific task.
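A rough sketch of that idea, assuming you have labels for some downstream supervised task (the shapes, labels, and hyperparameters below are placeholders): train XGBoost on the concatenated features and inspect its learned feature importances.

```python
import numpy as np
import xgboost as xgb

# Hypothetical setup: 1000 tokens, each described by a 300-d embedding plus a
# 17-d one-hot POS vector, with a stand-in binary label for your task.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 300 + 17))   # embeddings + POS, no normalization needed
y = rng.integers(0, 2, size=1000)       # replace with your real task labels

model = xgb.XGBClassifier(n_estimators=100, max_depth=4)
model.fit(X, y)

# Per-feature importances learned for this task: a rough indication of how much
# weight the model gives the embedding dimensions versus the POS indicators.
importances = model.feature_importances_
print(importances[:300].sum(), importances[300:].sum())  # embedding block vs. POS block
```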
As an alternative, you can use a neural network to combine these features into a single, meaningful hidden representation.
Assuming that the context of each word matters for your task, you could do the following (a minimal sketch follows the list):
- compute the word embeddings (N-dimensional vectors);
- compute the POS tags (one-hot encoded vectors);
- run an LSTM (or a similar recurrent layer) over the POS sequence;
- for each word, build a representation by concatenating its word embedding with the corresponding output of the LSTM layer;
- apply a fully connected layer to produce the final hidden representation.
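Here is a minimal Keras sketch of that recipe; every dimension and layer size below is an assumption made for illustration, and the inputs stand in for your precomputed embeddings and one-hot POS tags:

```python
from tensorflow.keras.layers import Input, LSTM, Concatenate, Dense, TimeDistributed
from tensorflow.keras.models import Model

# Assumed sizes, purely illustrative
SEQ_LEN, EMB_DIM, N_POS_TAGS, POS_LSTM_UNITS, HIDDEN_DIM = 50, 300, 17, 32, 128

# Inputs: precomputed word embeddings and one-hot POS tags, one per token
emb_in = Input(shape=(SEQ_LEN, EMB_DIM), name="word_embeddings")
pos_in = Input(shape=(SEQ_LEN, N_POS_TAGS), name="pos_one_hot")

# Recurrent layer over the POS sequence (one output per token)
pos_context = LSTM(POS_LSTM_UNITS, return_sequences=True)(pos_in)

# Per-token concatenation of the word embedding with the LSTM output
combined = Concatenate(axis=-1)([emb_in, pos_context])

# Fully connected layer producing the final hidden representation for each token
hidden = TimeDistributed(Dense(HIDDEN_DIM, activation="relu"))(combined)

model = Model(inputs=[emb_in, pos_in], outputs=hidden)
model.summary()
```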
P.S. Note that the recurrent layer is not mandatory: you could also concatenate the POS vector and the embedding directly and then apply the fully connected layer (see the short sketch below).
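A sketch of that simpler variant, with the same illustrative shapes as above:

```python
from tensorflow.keras.layers import Input, Concatenate, Dense
from tensorflow.keras.models import Model

# Same assumed sizes as above; no recurrent layer, just concatenation + dense
SEQ_LEN, EMB_DIM, N_POS_TAGS, HIDDEN_DIM = 50, 300, 17, 128

emb_in = Input(shape=(SEQ_LEN, EMB_DIM), name="word_embeddings")
pos_in = Input(shape=(SEQ_LEN, N_POS_TAGS), name="pos_one_hot")

combined = Concatenate(axis=-1)([emb_in, pos_in])
hidden = Dense(HIDDEN_DIM, activation="relu")(combined)  # applied per token

simple_model = Model(inputs=[emb_in, pos_in], outputs=hidden)
```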