
Is there a method for converting Hugging Face Transformer embeddings back to text?

Suppose that I have text embeddings created with Hugging Face's CLIPTextModel as follows:

import torch
from transformers import CLIPTokenizer, CLIPTextModel

class_list = [
    "i love going home and playing with my wife and kids",
    "i love going home",
    "playing with my wife and kids",
    "family",
    "war",
    "writing",
]

model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

# Tokenize all strings as one padded batch.
inputs = tokenizer(class_list, padding=True, return_tensors="pt")

# Inference only, so skip gradient tracking.
with torch.no_grad():
    outputs = model(**inputs)

hidden_state = outputs.last_hidden_state  # per-token vectors
embeddings = outputs.pooler_output  # one vector per input string

My embeddings are in the variable "embeddings". Questions:
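
For context, here is a minimal sketch of what "embeddings" holds, assuming (as far as I understand the Transformers implementation) that pooler_output is the row of last_hidden_state at the first end-of-text token of each sequence. It uses random dummy tensors instead of the real model; the sizes, the token id 49407, and the EOS positions are illustrative assumptions, not values read from the checkpoint.

```python
import torch

# Dummy stand-ins (assumed sizes): 6 strings, 12 token positions, hidden width 768.
batch, seq_len, hidden = 6, 12, 768
eos_id = 49407  # assumed id of CLIP's end-of-text token (largest id in the vocabulary)

torch.manual_seed(0)
last_hidden_state = torch.randn(batch, seq_len, hidden)

# Fake token ids: ordinary tokens first, then EOS, then padding that reuses the EOS id.
input_ids = torch.randint(1, 1000, (batch, seq_len))
eos_positions = torch.tensor([11, 5, 7, 3, 3, 3])  # hypothetical EOS position per string
for row, pos in enumerate(eos_positions.tolist()):
    input_ids[row, pos:] = eos_id

# argmax returns the first index of the largest id, i.e. the first EOS token.
picked = input_ids.argmax(dim=-1)

# Keep one hidden vector per string: the one at that string's EOS position.
pooled = last_hidden_state[torch.arange(batch), picked]

print(pooled.shape)  # torch.Size([6, 768])
```

Under that assumption, each string is reduced to a single fixed-size vector regardless of its length, and that vector is all the recipient would have to work from.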

  1. Is it possible to convert my embeddings back to the input strings in "class_list"? To be precise: if I sent the embeddings to a person who had no foreknowledge of the original strings, what steps would they need to take to recover the list of original strings?
  2. If so, how can I do this?
john_mon

0 Answers