
I have a huge vector of words, and I want a vector containing only the unique words, along with the frequency of each word. I've already tried hist and histc, but they only work on numeric values. I know the function tabulate, but it wraps the words in quotes (e.g. this becomes 'this'). If you have any idea how to do this in MATLAB, that would be great. Thanks.

ironzionlion
Yaeli778
  • For the 1st part have you tried 'unique'? It works with cell arrays of strings – Benoit_11 Dec 15 '14 at 16:11
  • Yes, unique works, but what is the next step? I thought about a loop that counts every word, but I guess there is a better way. – Yaeli778 Dec 15 '14 at 16:19

1 Answer


You were on the right track! Just use unique first to prepare the numeric input for hist. The third output of unique assigns each word the index of its entry in the list of unique words, and those indices can be passed to hist, so you get the counts without an explicit for loop:

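words = {'abba' 'bed' 'carrot' 'damage' 'bed'};
% occurrences(k) is the index of words{k} within unique_words
[unique_words, ~, occurrences] = unique(words);
% count how often each index appears, using one bin per unique word
unique_counts = hist(occurrences, 1:max(occurrences));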

This yields:

>> unique_words 
    'abba'    'bed'    'carrot'    'damage'

>> unique_counts
     1     2     1     1
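
As an alternative that avoids hist entirely, the same index vector can be passed to accumarray, which adds up a 1 for every occurrence of each index. This is only a minimal sketch using the example data from above; the `(:)` reshape is an assumption added here so that the snippet does not depend on whether unique returns the indices as a row or a column.

```matlab
words = {'abba' 'bed' 'carrot' 'damage' 'bed'};

% occurrences(k) is the index of words{k} within unique_words
[unique_words, ~, occurrences] = unique(words);

% accumarray needs a column of subscripts; it sums the value 1 per subscript
unique_counts = accumarray(occurrences(:), 1).';

% print each unique word next to its count
for k = 1:numel(unique_words)
    fprintf('%s: %d\n', unique_words{k}, unique_counts(k));
end
```

Because accumarray works on subscripts rather than bin centers or edges, there is no bin vector to build, and the case of a single unique word needs no special handling.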
ojdo
  • I've tried that, but I get this error: Error using histc Edge vector must be monotonically non-decreasing. Error in hist (line 119) nn = histc(y,edgesc,1); – Yaeli778 Dec 15 '14 at 16:28
  • Apparently not in the same way: I don't use `histc`, and the `word_ids` are - depending on your MATLAB version - already the monotonically increasing range from 1 to number of unique words. – ojdo Dec 15 '14 at 16:30
  • @Yaeli778 `histc(id_positions_in_word_vector, 1:max(id_positions_in_word_vector));` might do it. Source - http://stackoverflow.com/a/27281656/3293881 – Divakar Dec 15 '14 at 16:34