I'm implementing MinHash and LSH in Octave/MATLAB. I'm trying to get a set (cell array or array) of shingles of size k from a given document, and I don't know how to do it.
What I have so far is this simple code:
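doc = fopen(document, 'r');  % document holds the path to the text file
txt = {};
i = 1;
line = fgetl(doc);
while ischar(line)  % fgetl returns -1 (not a string) at end of file
  txt{i} = strread(line, '%s');  % words of line i, as a cell array
  i = i + 1;
  line = fgetl(doc);
end
fclose(doc);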
This creates a cell array in which each element holds the words of one line of the document. This cell array is the argument to the function I'm trying to write.
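To make the goal concrete, here is a minimal sketch of what such a function might look like. It assumes word-level shingles (k consecutive words joined by a space) that may span line boundaries, and that duplicates should be dropped. The name word_shingles is hypothetical, and character-level shingles would need a different approach.

```octave
% Sketch only: word-level k-shingles from the txt cell array built above.
% Assumes txt{i} is a cell array holding the words of line i.
% word_shingles is a hypothetical name, not a built-in function.
function shingles = word_shingles(txt, k)
  % Flatten all lines into one column cell array of words.
  words = {};
  for i = 1:numel(txt)
    words = [words; txt{i}(:)];
  end

  % Number of windows of k consecutive words.
  n = numel(words) - k + 1;
  if n <= 0
    shingles = {};  % fewer than k words: no shingles
    return;
  end

  % Join each window of k words into a single string.
  shingles = cell(n, 1);
  for j = 1:n
    shingles{j} = strjoin(words(j:j+k-1)', ' ');
  end

  % Keep each distinct shingle once (unique also sorts them).
  shingles = unique(shingles);
end
```

With this sketch, calling shingles = word_shingles(txt, 3) after the reading loop would give a column cell array of distinct 3-word shingles.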