For some models (such as USE and ELMo), the vocabulary is serialized inside the SavedModel protocol buffer itself, so you have to locate it manually within the SavedModel and extract it (I adapted the logic for extracting the USE vocab from here):
import tensorflow_hub as hub
from tensorflow.python.saved_model.loader_impl import parse_saved_model
# `hub.resolve` downloads the model (if not cached yet) and returns its local cache path.
model_path = hub.resolve("https://tfhub.dev/google/universal-sentence-encoder/4")
saved_model = parse_saved_model(model_path)
# The location of the tensor holding the vocab is model-specific.
graph = saved_model.meta_graphs[0].graph_def
function_ = graph.library.function
embedding_node = function_[5].node_def[1] # Node name is "Embedding_words".
words_tensor = embedding_node.attr.get("value").tensor
word_list = [s.decode('utf-8') for s in words_tensor.string_val]
word_list[100:105] # ['best', ',▁but', 'no', 'any', 'more']
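Since the vocab tensor's location differs between models, a small helper can scan every function's nodes for large string-valued constants instead of hardcoding the indices. This is my own sketch, not part of the original recipe; it only assumes the proto structure returned by parse_saved_model above:

```python
def find_string_tensors(saved_model, min_size=1000):
    """Yield (function_name, node_name, num_strings) for constant nodes
    whose tensor holds at least `min_size` strings -- vocab candidates."""
    for meta_graph in saved_model.meta_graphs:
        for function in meta_graph.graph_def.library.function:
            for node in function.node_def:
                # Only constant-like nodes carry a "value" attr.
                if "value" not in node.attr:
                    continue
                strings = node.attr["value"].tensor.string_val
                if len(strings) >= min_size:
                    yield function.signature.name, node.name, len(strings)
```

Running list(find_string_tensors(saved_model)) on the USE proto points you at the embedding node without guessing function_[5].node_def[1] by hand.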
For other models, like google/Wiki-words-500/2, we are luckier: the vocabulary has been exported to the assets/ directory:
model_path = hub.resolve("https://tfhub.dev/google/Wiki-words-500/2")
!head -n 40000 {model_path}/assets/tokens.txt | tail
# Antisense
# Antiseptic
# Antiseptics
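Once you have located the assets file, you can read it from Python directly instead of shelling out. A minimal sketch, assuming the file follows the usual one-token-per-line layout of tokens.txt (the path argument is whatever assets/tokens.txt path the cache holds on your machine):

```python
def load_vocab(path):
    """Read a one-token-per-line vocab file into a list.

    Line order matters: line i corresponds to row i of the embedding matrix.
    """
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()
```

This gives you the same index-to-token mapping the model uses internally, e.g. load_vocab(path)[39999] matches the last line printed by the head/tail pipeline above.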