Is there a method for converting Hugging Face Transformer embeddings back to text?
Suppose I have text embeddings created with Hugging Face's CLIPTextModel as follows:
import torch
from transformers import CLIPTokenizer, CLIPTextModel
class_list = [
"i love going home and playing with my wife and kids",
"i love going home",
"playing with my wife and kids",
"family",
"war",
"writing",
]
model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

inputs = tokenizer(class_list, padding=True, return_tensors="pt")
outputs = model(**inputs)

hidden_state = outputs.last_hidden_state  # per-token hidden states
embeddings = outputs.pooler_output        # one pooled vector per input string
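For reference, a quick check of the tensor shapes (as I understand them for this checkpoint, whose text encoder has a hidden size of 768):

print(hidden_state.shape)  # (6, padded_seq_len, 768) for the six strings above
print(embeddings.shape)    # (6, 768)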
My embeddings are in the variable "embeddings". Questions:
- Is it possible to convert my embeddings back to the input strings in "class_list"? To be precise: if I sent the embeddings to a person who had no foreknowledge of the original list of strings, what steps would they need to take to recover those strings? (A concrete sketch of what I would send is included after these questions.)
- If so, how can I do this?
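To make the scenario concrete, here is roughly what I would do on my side and what the recipient would start from (the filename is only illustrative):

# sender side: serialize nothing but the pooled embeddings
torch.save(embeddings, "embeddings.pt")

# recipient side: all they receive is a (6, 768) float tensor,
# with no tokens, vocabulary, or tokenizer state attached
received = torch.load("embeddings.pt")
print(received.shape)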