Converting Hugging Face Transformer Text Embeddings Back to Text

Asked Nov 06 '22 at 11:45

Active Nov 09 '22 at 06:26

Viewed 1,221 times

Is there a method for converting Hugging Face Transformer embeddings back to text?

Suppose I have text embeddings created with Hugging Face's CLIPTextModel as follows:

import torch
from transformers import CLIPTokenizer, CLIPTextModel

class_list = [
    "i love going home and playing with my wife and kids",
    "i love going home",
    "playing with my wife and kids", 
    "family",
    "war",
    "writing",
]
    
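model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

# Tokenize the whole batch, padding to the longest string.
inputs = tokenizer(class_list, padding=True, return_tensors="pt")

# Inference only, so skip gradient tracking.
with torch.no_grad():
    outputs = model(**inputs)

hidden_state = outputs.last_hidden_state  # per-token hidden states
embeddings = outputs.pooler_output        # one pooled vector per input string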

My embeddings are in the variable "embeddings". Questions:

Is it possible to convert my embeddings back to the input strings in "class_list"? To be precise: if I sent the embeddings to a person who had no prior knowledge of the original strings, what steps would they need to take to recover them?
If so, how can I do this?
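
To make the scenario concrete: one fallback that needs no decoder is nearest-neighbour retrieval, where the recipient embeds a pool of candidate strings with the same model and picks the closest one by cosine similarity. This is not true inversion, since it can only return strings that are already in the pool. Below is a minimal sketch of that idea; `nearest_texts` is a hypothetical helper (not part of transformers), and the toy 3-dimensional vectors are made-up stand-ins for real CLIP embeddings.

```python
import numpy as np

def nearest_texts(query_embeddings, candidate_embeddings, candidate_texts):
    """For each query embedding, return (closest candidate text, cosine similarity)."""
    q = np.asarray(query_embeddings, dtype=np.float64)
    c = np.asarray(candidate_embeddings, dtype=np.float64)
    # L2-normalise the rows so that a dot product equals cosine similarity.
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    similarity = q @ c.T  # shape: (num_queries, num_candidates)
    best = similarity.argmax(axis=1)
    return [(candidate_texts[i], float(similarity[row, i]))
            for row, i in enumerate(best)]

# Toy vectors for illustration only; real inputs would be model outputs.
candidate_texts = ["family", "war", "writing"]
candidate_embeddings = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.1], [0.1, 0.0, 1.0]]
received = [[0.9, 0.2, 0.0], [0.0, 0.1, 2.0]]

matches = nearest_texts(received, candidate_embeddings, candidate_texts)
for text, score in matches:
    print(f"{text}: {score:.3f}")
```

With the real model, the received array would be something like `embeddings.detach().numpy()`, and the candidate embeddings would have to come from the same model and tokenizer. What I am asking is whether anything better exists when no candidate pool is available.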

edited Nov 09 '22 at 06:26

asked Nov 06 '22 at 11:45

john_mon

2

You may not get the exact text back. Some strategies are discussed [here](https://huggingface.co/blog/how-to-generate). – Azhar Khan Nov 06 '22 at 12:21
Tried that unsuccessfully. – john_mon Nov 07 '22 at 06:37


0 Answers