What is the best way to convert a long string to a data structure that holds words and their counts?

I would call .split(" ") to split on spaces, which gives an array (or, after copying, an ArrayList), then loop over it and add each item to a HashMap or a multiset. I'm not sure whether that is the best approach, or whether it can be done directly with some sort of HashMap without building a list first.

Thanks!

LemonMan
  • Can you give an example of what your data looks like? – Hunter McMillen Jun 28 '13 at 19:14
  • Just lots and lots of strings, something like "word bla bla word another word /moren+!*&^@#random words". – LemonMan Jun 28 '13 at 19:15
  • How do you define 'best'? As fast as possible, using as little memory as possible, or code that is easy to read and understand? – Simon Forsberg Jun 28 '13 at 19:15
  • Well, I'm not too concerned about it being optimal; any acceptable way is good. I'm just not sure what people usually do for such a common task. Likely I'll make an ArrayList, then go through it and add each item to a HashMap with a count, then merge multiple HashMaps for different strings. – LemonMan Jun 28 '13 at 19:17

2 Answers

If you're referring to a Guava Multiset, this is just one line:

HashMultiset.create(
  Splitter.on(CharMatcher.whitespace()).omitEmptyStrings()
    .split(string));
Louis Wasserman
  • Interesting, this looks quite simple, though I may go with the other approach so that people using my code don't need Guava. – LemonMan Jun 28 '13 at 19:26
import java.util.HashMap;
import java.util.Map;

public class Test {
    private static Map<String, Integer> count = new HashMap<String, Integer>();

    public static void main(String[] args) {
        // Each call adds the words of one string to the running totals.
        addToCountMap("This is my test string and it contains Test and test and string and some more");
        addToCountMap("This is my test string and it contains Test and test and string and some more");

        // Merge in counts that were built separately (merging count into itself would only double it).
        Map<String, Integer> other = new HashMap<String, Integer>();
        other.put("test", 2);
        other.put("more", 1);
        mergeWithCountMap(other);

        System.out.println(count);
    }

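    // Splits on runs of whitespace and adds one to the count of each word (case-sensitive).
    private static void addToCountMap(String test) {
        String[] split = test.trim().split("\\s+");
        for (String string : split) {
            if (string.isEmpty()) {
                continue; // an empty or all-whitespace input yields one empty token
            }
            if (!count.containsKey(string)) {
                count.put(string, 0);
            }
            count.put(string, count.get(string) + 1);
        }
    }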

    // Adds every count from mapToMerge to the running totals in count.
    private static void mergeWithCountMap(Map<String, Integer> mapToMerge) {
        for (String string : mapToMerge.keySet()) {
            if (!count.containsKey(string)) {
                count.put(string, 0);
            }
            count.put(string, count.get(string) + mapToMerge.get(string));
        }
    }
}
Davey Chu
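
For readers on Java 8 or later, the same count-and-merge idea can probably be written more compactly with Map.merge, which was not available when this question was asked. The following is only a sketch under that assumption: the class name WordCount and the helpers countWords and mergeInto are made up for illustration, words are split on runs of whitespace, and matching is case-sensitive with punctuation left attached.

```java
import java.util.HashMap;
import java.util.Map;

public class WordCount {
    // Hypothetical helper: counts whitespace-separated tokens in one string.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                // Inserts 1 for a new word, otherwise adds 1 to the existing count.
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Hypothetical helper: adds every count from source into target.
    static void mergeInto(Map<String, Integer> target, Map<String, Integer> source) {
        source.forEach((word, n) -> target.merge(word, n, Integer::sum));
    }

    public static void main(String[] args) {
        Map<String, Integer> total = countWords("word bla bla word another word");
        mergeInto(total, countWords("another  word"));
        // HashMap does not guarantee iteration order, so the printed order may vary.
        System.out.println(total);
    }
}
```

No intermediate ArrayList is needed here: the array returned by split is iterated directly, and separate per-string maps can be combined afterwards with mergeInto.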