I have a huge vector of words, and I want a vector containing only the unique words, along with the frequency of each word. I've already tried `hist` and `histc`, but they only work on numeric values. I know the function `tabulate`, but it wraps the words in quotes (e.g. this becomes 'this'). If you have any idea how to do this in MATLAB, that would be great. Thanks.
Viewed 542 times
1

ironzionlion

Yaeli778
-
For the 1st part have you tried 'unique'? It works with cell arrays of strings – Benoit_11 Dec 15 '14 at 16:11
-
Yes, unique works. But what is the next step? I thought about a loop that counts every word, but I guess there is a better way. – Yaeli778 Dec 15 '14 at 16:19
1 Answer
6
You were on the right track! Just use `unique` first to prepare the numeric input for `hist`. The trick is that the word occurrence IDs returned by `unique` (its third output) can be used as input for the `hist` function, so you can get the counts without explicit `for` loops:
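words = {'abba' 'bed' 'carrot' 'damage' 'bed'};
% third output maps each word to its index in unique_words
[unique_words, ~, occurrences] = unique(words);
% one bin per index, so each bin count is that word's frequency
unique_counts = hist(occurrences, 1:max(occurrences));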
This yields:
>> unique_words
'abba' 'bed' 'carrot' 'damage'
>> unique_counts
1 2 1 1
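As an alternative to `hist`, the same index vector can be fed to `accumarray`, which sums a 1 for every occurrence of each index. This is only a sketch of a different technique, not what the answer above uses; it reuses the answer's variable names and the same sample `words`.

```matlab
words = {'abba' 'bed' 'carrot' 'damage' 'bed'};

% occurrences(k) is the position of words{k} in unique_words
[unique_words, ~, occurrences] = unique(words);

% accumarray expects column-vector subscripts, so force a column with (:);
% adding 1 per occurrence gives the count for each unique word.
% The result is a column, transposed here to match the row layout above.
unique_counts = accumarray(occurrences(:), 1).';
```

For the sample input this gives the same counts as above, `1 2 1 1`. Forcing the column shape with `(:)` keeps the call independent of whether `unique` returns its index output as a row or a column.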

ojdo
-
I've tried that, but I get this error: "Error using histc: Edge vector must be monotonically non-decreasing. Error in hist (line 119): `nn = histc(y,edgesc,1);`" – Yaeli778 Dec 15 '14 at 16:28
-
Apparently not in the same way: I don't use `histc`, and the `word_ids` are - depending on your MATLAB version - already the monotonically increasing range from 1 to number of unique words. – ojdo Dec 15 '14 at 16:30
-
@Yaeli778 `histc(id_positions_in_word_vector, 1:max(id_positions_in_word_vector));` might do it. Source - http://stackoverflow.com/a/27281656/3293881 – Divakar Dec 15 '14 at 16:34
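For reference, the `histc` suggestion in the last comment could look roughly like this when written out in full. It is a sketch that assumes the index vector is the third output of `unique`, as in the answer (named `occurrences` there), and it is not confirmed to resolve the error reported above.

```matlab
words = {'abba' 'bed' 'carrot' 'damage' 'bed'};
[unique_words, ~, occurrences] = unique(words);

% Use the integers 1..N as bin edges, where N is the number of unique words.
% histc counts the values falling on each edge, including the last one.
% occurrences(:).' forces a row vector so the counts also come back as a row.
edges = 1:numel(unique_words);
unique_counts = histc(occurrences(:).', edges);
```

With the sample input, `unique_counts` is again `1 2 1 1`. Taking the number of bins from `numel(unique_words)` is equivalent to `max(occurrences)` whenever `words` is non-empty.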