0

I've been working on performing Latent Semantic Analysis using the SparseVectorsFromSequenceFiles, RowIdJob and RowSimilarityJob Hadoop jobs provided by Mahout, all of which run as Map/Reduce jobs. I've been trying to find an equivalent implementation of this functionality that runs in memory, either in a single thread or, preferably, in multiple threads.

Is there such a thing?

Julian Ortega

1 Answer

2

I don't know of one, and I don't think one exists, but it would be trivial to write. You just open a SequenceFile.Reader and, for each record, get the Vector from the value Writable and do what you want with it. It's probably 10 lines of code and not worth a tool.
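To give a rough idea of the "do what you want" part, here is a minimal sketch of an in-memory, multi-threaded row similarity computation in plain Java. It is not Mahout code: the class name RowSimilaritySketch, the use of a Map<Integer, Double> as a stand-in for a sparse Mahout Vector, and the choice of cosine similarity are all assumptions made for illustration. It also assumes the rows have already been read into memory (for example with the SequenceFile.Reader loop described above, copying the non-zero entries of each Vector into a map).

```java
import java.util.List;
import java.util.Map;
import java.util.stream.IntStream;

// Hypothetical in-memory stand-in for a row similarity job (cosine similarity).
// Each row is a sparse vector: column index -> non-zero value.
class RowSimilaritySketch {

    // Sparse dot product; iterates over the smaller of the two maps.
    static double dot(Map<Integer, Double> a, Map<Integer, Double> b) {
        Map<Integer, Double> small = a.size() <= b.size() ? a : b;
        Map<Integer, Double> large = small == a ? b : a;
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : small.entrySet()) {
            Double other = large.get(e.getKey());
            if (other != null) {
                sum += e.getValue() * other;
            }
        }
        return sum;
    }

    // Cosine similarity; defined here as 0 when either vector is all zeros.
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double denom = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
        return denom == 0.0 ? 0.0 : dot(a, b) / denom;
    }

    // Full n x n similarity matrix. Each worker thread fills a distinct
    // row of the result, so no synchronization is needed.
    static double[][] similarities(List<Map<Integer, Double>> rows) {
        int n = rows.size();
        double[][] sim = new double[n][n];
        IntStream.range(0, n).parallel().forEach(i -> {
            for (int j = 0; j < n; j++) {
                sim[i][j] = cosine(rows.get(i), rows.get(j));
            }
        });
        return sim;
    }

    public static void main(String[] args) {
        // Three tiny example rows; rows 0 and 1 point in the same direction.
        List<Map<Integer, Double>> rows = List.of(
            Map.of(0, 1.0, 2, 2.0),
            Map.of(0, 2.0, 2, 4.0),
            Map.of(1, 3.0));
        double[][] sim = similarities(rows);
        System.out.println("sim(0,1) = " + sim[0][1]);
        System.out.println("sim(0,2) = " + sim[0][2]);
    }
}
```

This computes all n x n pairs, which is only reasonable while the matrix fits comfortably in memory. For larger inputs you would probably want to keep just the top few similar rows per row rather than the dense result.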

Sean Owen