
Is there any way to pull the count from a Multiset into a list?

String[] data = loadStrings("data/data.txt"); 

Multiset<String> myMultiset = ImmutableMultiset.copyOf(data);

for (String word : Multisets.copyHighestCountFirst(myMultiset).elementSet()) {
    System.out.println(word + ": " + myMultiset.count(word));
    // ...
}

As it stands I can output the most commonly occurring words into the console in Processing. I was wondering if it is at all possible to add the corresponding words and their count into an array or a list. I have tried like so:

for (String word : Multisets.copyHighestCountFirst(myMultiset).elementSet()) {
    float a[] = myMultiset.count(word);
}

but I only received errors stating that an int cannot be converted to a float[].

Is this even possible? Am I going about it all wrong? I've never used Multisets before so any help would be really useful

UPDATE: I have used this to get a copy of the highest count but am unable to convert it into a list.

Multiset<String> sortedList = Multisets.copyHighestCountFirst(myMultiset);
Nic
Nebbyyy

1 Answer


Please see Multiset.entrySet() docs:

Returns a view of the contents of this multiset, grouped into Multiset.Entry instances, each providing an element of the multiset and the count of that element.

So, for example, to get the top 5 most frequently occurring words, I'd loop over the entrySet():

ImmutableMultiset<String> top = Multisets.copyHighestCountFirst(myMultiset);

Iterator<Multiset.Entry<String>> it = top.entrySet().iterator();

for (int i = 0; (i < 5) && it.hasNext(); i++) {
    Multiset.Entry<String> entry = it.next();

    String word = entry.getElement();
    int count = entry.getCount();

    // do something fancy with word and count...
}
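To put the words and counts into lists, as the question asks, the same loop can fill two parallel lists. The following is a minimal sketch, assuming Guava is on the classpath; the class name `TopWords`, the helper `topEntries`, and the sample words are hypothetical and only there for illustration.

```java
import com.google.common.collect.ImmutableMultiset;
import com.google.common.collect.Multiset;
import com.google.common.collect.Multisets;

import java.util.ArrayList;
import java.util.List;

public class TopWords {

    // Hypothetical helper: returns at most `limit` entries, highest count first.
    static List<Multiset.Entry<String>> topEntries(Multiset<String> words, int limit) {
        List<Multiset.Entry<String>> result = new ArrayList<Multiset.Entry<String>>();
        for (Multiset.Entry<String> entry : Multisets.copyHighestCountFirst(words).entrySet()) {
            if (result.size() == limit) {
                break;
            }
            result.add(entry);
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample data standing in for the contents of data.txt (assumption).
        String[] data = {"a", "b", "a", "c", "a", "b"};
        Multiset<String> myMultiset = ImmutableMultiset.copyOf(data);

        // Parallel lists: labels.get(i) occurs counts.get(i) times.
        List<String> labels = new ArrayList<String>();
        List<Integer> counts = new ArrayList<Integer>();
        for (Multiset.Entry<String> entry : topEntries(myMultiset, 5)) {
            labels.add(entry.getElement());
            counts.add(entry.getCount());
        }

        System.out.println(labels + " " + counts);
    }
}
```

The counts are kept as `Integer` because `count()` and `getCount()` return an `int`; they can be converted to `float` at the point where the bars are drawn.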

I'm assuming you need to show the top 5 most occurring words and their frequencies. If you only need the words, just use the asList() method:

ImmutableMultiset<String> top = Multisets.copyHighestCountFirst(myMultiset);

ImmutableList<String> list = top.asList();

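and iterate over list to get the first 5 elements. Note that asList() includes every occurrence, so a word with a count of 3 appears three times in a row; for distinct words, iterate over top.elementSet() instead.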

fps
  • Wow thanks for taking the time :), you think its possibly easier to just use a map? – Nebbyyy Mar 26 '15 at 01:58
  • Main reason for doing this is i thought it would be easy to represent the values in a bar chart in processing this way... – Nebbyyy Mar 26 '15 at 02:01
  • @Nebbyyy I think you could perfectly use a map for that. Just use a `LinkedHashMap`, because it preserves insertion order. This way, you'll have the top word first, then the second topmost word, etc. Just use the keys (which hold the words) for the X axis and the values (which holds the frequencies) for the Y axis. I'll edit my answer so that it's clear that a `LinkedHashMap` should be used. – fps Mar 26 '15 at 02:12
  • There's nothing a map would do that you couldn't just use the `Multiset` for directly. If you want to preserve insertion order, use a `LinkedHashMultiset` -- or just keep using the multiset returned by `copyHighestCountFirst` without copying it. Copying it seems honestly pointless. – Louis Wasserman Mar 26 '15 at 03:58
  • @LouisWasserman How do you get the collection of the frequencies? – fps Mar 26 '15 at 04:35
  • There's no view for that collection directly, but what do you actually want to do with it? e.g. you could always iterate over the `entrySet`, and you can easily get the total count, etc. and everything else you need for the actual histogram that way. – Louis Wasserman Mar 26 '15 at 04:54
  • What im trying to do is display the top 5 most occurring words in a bar chart when pulling the data from twitter into a .txt file – Nebbyyy Mar 26 '15 at 11:55
  • @Nebbyyy Iterate the `ImmutableMultiset` only 5 times with a classic `for(int i = 0; i < 5; i++)` – fps Mar 26 '15 at 12:19
  • ive had to redo it as ive been getting syntax error with the code. at the for loop an error on token"<", – Nebbyyy Mar 26 '15 at 12:30
  • @Nebbyyy Fixed. It's `Multiset` (instead of `MultiSet`) (the 's' from set must be lowercase). – fps Mar 26 '15 at 12:34
  • oh my god you life saver, can't believe i missed that, you are a true gent – Nebbyyy Mar 26 '15 at 12:42
  • @Nebbyyy It was my fault! Your code was OK from the beginning. I just didn't try the code I posted (my bad!), so I had that syntax error. Hope everything works well now that it's fixed. – fps Mar 26 '15 at 12:44
  • @Nebbyyy Which version of java are you using? – fps Mar 26 '15 at 12:50
  • I'm still receiving the error I just checked online and im on version 8 Update 40, and using processing 2.2.1. Do you think it might be an issue because of my java version as the error is simply with Multiset.Entry> using the "<" – Nebbyyy Mar 26 '15 at 13:02
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/73847/discussion-between-magnamag-and-nebbyyy). – fps Mar 26 '15 at 13:03
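
For reference, the `LinkedHashMap` idea from the comments can be sketched with the standard library alone. This is only an illustration, assuming plain Java 8 outside of Processing; the class name `TopWordsMap` and the helper `topCounts` are hypothetical. As noted above, the multiset returned by `copyHighestCountFirst` already provides the same ordering, so a map is not required.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopWordsMap {

    // Hypothetical helper: maps each of the `limit` most frequent words to its count.
    // The LinkedHashMap keeps insertion order, so iteration runs from highest count down.
    static Map<String, Integer> topCounts(String[] words, int limit) {
        // Count every word.
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }

        // Sort the entries by descending count; the order of ties is unspecified.
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<Map.Entry<String, Integer>>(counts.entrySet());
        entries.sort((a, b) -> b.getValue().compareTo(a.getValue()));

        // Copy the first `limit` entries into an insertion-ordered map.
        Map<String, Integer> top = new LinkedHashMap<String, Integer>();
        for (int i = 0; i < entries.size() && i < limit; i++) {
            top.put(entries.get(i).getKey(), entries.get(i).getValue());
        }
        return top;
    }

    public static void main(String[] args) {
        // Sample data standing in for the contents of data.txt (assumption).
        String[] data = {"a", "b", "a", "c", "a", "b"};

        // Keys would label the X axis, values would give the bar heights.
        for (Map.Entry<String, Integer> entry : topCounts(data, 5).entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```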