I've been wondering about this too. It's not really clear to me what they're doing, but this is what I found.
In the paper on wide and deep learning, they describe the embedding vectors as being randomly initialized and then adjusted during training to minimize error.
Normally when you do embeddings, you take some arbitrary vector representation of the data (such as one-hot vectors) and multiply it by a matrix that represents the embedding. This matrix can be computed ahead of time (for example with PCA or t-SNE) or learned during training, as word2vec does.
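To make that concrete, here's a tiny NumPy sketch (the names and sizes are just illustrative, not anything from TensorFlow) showing that multiplying a one-hot vector by an embedding matrix is the same as selecting one of its rows, which is why libraries implement it as a lookup:

import numpy as np

vocab_size, embedding_dim = 5, 3

# An embedding matrix: one row per vocabulary entry.
embedding_matrix = np.random.randn(vocab_size, embedding_dim)

# One-hot representation of token id 2.
one_hot = np.zeros(vocab_size)
one_hot[2] = 1.0

# Multiplying by the matrix just picks out row 2...
via_matmul = one_hot @ embedding_matrix
# ...which is why it's done as a cheap row lookup instead.
via_lookup = embedding_matrix[2]

assert np.allclose(via_matmul, via_lookup)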
The actual code for the embedding_column is here, and it's implemented as a class called _EmbeddingColumn, which is a subclass of _FeatureColumn. Its sparse_id_column attribute holds the underlying sparse column whose ids get embedded, and its to_dnn_input_layer method creates the embedding matrix (via _create_embedding_lookup) and applies it to produce the embeddings for the next layer.
def to_dnn_input_layer(self,
                       input_tensor,
                       weight_collections=None,
                       trainable=True):
  output, embedding_weights = _create_embedding_lookup(
      input_tensor=self.sparse_id_column.id_tensor(input_tensor),
      weight_tensor=self.sparse_id_column.weight_tensor(input_tensor),
      vocab_size=self.length,
      dimension=self.dimension,
      weight_collections=_add_variable_collection(weight_collections),
      initializer=self.initializer,
      combiner=self.combiner,
      trainable=trainable)
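For intuition, here's a rough NumPy sketch of what that lookup does in the forward pass (this is my own stand-in, not the TensorFlow implementation; sketch_embedding_lookup and its arguments are made up): gather the embedding rows for each example's sparse ids and combine them with the chosen combiner.

import numpy as np

def sketch_embedding_lookup(embedding_weights, sparse_ids, combiner="mean"):
    """Rough stand-in for the forward pass of _create_embedding_lookup.

    embedding_weights: [vocab_size, dimension] matrix (randomly initialized,
        then trained along with the rest of the model).
    sparse_ids: one list of ids per example.
    """
    rows = []
    for ids in sparse_ids:
        vectors = embedding_weights[ids]      # gather this example's rows
        if combiner == "sum":
            rows.append(vectors.sum(axis=0))
        else:                                 # "mean" combiner
            rows.append(vectors.mean(axis=0))
    return np.stack(rows)                     # [batch_size, dimension]

embedding_weights = np.random.randn(10, 4)    # vocab_size=10, dimension=4
batch = [[1, 3], [7]]                         # two examples' sparse ids
dnn_input = sketch_embedding_lookup(embedding_weights, batch)
print(dnn_input.shape)                        # (2, 4)

The real code also handles weighted ids (weight_tensor) and other combiners, but the shape of the idea is the same: the output fed to the DNN is built from rows of a trainable weight matrix.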
So as far as I can tell, the embeddings are formed by applying whatever learning rule you're using (gradient descent, etc.) to the embedding matrix, just like any other weights in the network.
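That matches what a hand-rolled version would look like: initialize the matrix randomly, then let gradient descent update its rows along with the rest of the model. A minimal NumPy sketch with a toy squared-error objective (the objective and numbers are made up purely to show the update):

import numpy as np

vocab_size, dim = 10, 4
np.random.seed(0)
embedding_matrix = np.random.randn(vocab_size, dim) * 0.01  # random init
learning_rate = 0.1

# Toy objective: make the embedding of id 3 match some target vector.
token_id, target = 3, np.ones(dim)

for _ in range(100):
    vec = embedding_matrix[token_id]                     # embedding lookup
    grad = 2 * (vec - target)                            # gradient of ||vec - target||^2
    embedding_matrix[token_id] -= learning_rate * grad   # gradient descent step

print(np.round(embedding_matrix[token_id], 3))           # ~ target after training

In the real model the gradient comes from the network's loss instead of this toy objective, but the embedding rows are updated the same way as every other trainable variable.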