I am following this example from the Spark documentation for calculating TF-IDF over a set of documents. Spark uses the hashing trick for this calculation, so at the end you get a Vector containing the hashed words and their corresponding weights, but... how can I get the words back from the hashes?
Do I really have to hash all the words myself and save them in a map, so that I can later iterate through it looking for my keywords? Is there no more efficient way built into Spark?
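For context, this is roughly the workaround I have in mind, sketched against MLlib's `HashingTF` (the input path and variable names are just placeholders, and I'm assuming an existing `SparkContext` named `sc`):

    import org.apache.spark.mllib.feature.{HashingTF, IDF}
    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.rdd.RDD

    // Each document is a sequence of words (placeholder input path)
    val documents: RDD[Seq[String]] = sc.textFile("docs.txt").map(_.split(" ").toSeq)

    val hashingTF = new HashingTF()
    val tf: RDD[Vector] = hashingTF.transform(documents)
    tf.cache()
    val tfidf: RDD[Vector] = new IDF().fit(tf).transform(tf)

    // The manual workaround: build a reverse lookup from hashed index
    // back to the word(s) that produced it. Several words can collide
    // on the same index, hence the Seq of words per index.
    val indexToWords: Map[Int, Seq[String]] = documents
      .flatMap(identity)
      .distinct()
      .map(word => (hashingTF.indexOf(word), word))
      .groupByKey()
      .mapValues(_.toSeq)
      .collect()
      .toMap

This works, but it means touching every word twice and keeping the whole vocabulary in a driver-side map, which is what I was hoping to avoid.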
Thanks in advance.