
I'm implementing a splay tree to hold words and their frequencies, and chose to create a Pair class to hold each word-frequency (key-value) pair. That is, each node of the splay tree holds one instance of the Pair class. The Pair class looks like this:

    public class SplayEntry<K, V> implements Comparable<SplayEntry<K, V>> {

        public K word;
        public V frequency;

        public SplayEntry(K word, V frequency) {
            this.word = word;
            this.frequency = frequency;
        }

        // getters, setters, hashCode, equals, compareTo etc...
    }

The SplayTree:

    public class SplayTree<AnyType extends Comparable<? super AnyType>> {

        public SplayTree( )
        {
            nullNode = new BinaryNode<AnyType>( null );
            nullNode.left = nullNode.right = nullNode;
            root = nullNode;
        }

        // ...
    }

It also has a BinaryNode class.

What I'm having trouble with is how to put every word-frequency pair into the tree while also checking whether the pair already exists and, if so, increasing its frequency by one. I read a text file line by line, split each line into words, and then call a countWords() method that right now is a mess:

    public void countWords(String line) {
        line = line.toLowerCase();
        String[] words = line.split("\\P{L}+");
        SplayEntry<String, Integer> entry = new SplayEntry<String, Integer>(null, null);
        for (int i = 0, n = words.length; i < n; i++) {
            Integer occurances = 0;
            entry.setWord(words[i]);
            entry.setFrequency(occurances);

            if (tree.contains(entry.equals(entry)) && entry.getFrequency() == 0) {
                occurances = 1;

            } else {
                int value = occurances.intValue();
                occurances = new Integer(value + 1);
                entry.setFrequency(occurances);
            }

            entry = new SplayEntry<String, Integer>(words[i], occurances);
            tree.insert(entry);
        }
    }

I know this isn't really working, and I need help figuring out how and in what order I should instantiate the SplayEntry class. For every word in the words array, I want the method to check whether it already exists in a SplayEntry inside the tree (contains). If the word is new, its frequency should be 1; otherwise, its frequency should increase by 1. Finally, I just add the new SplayEntry to the splay tree and let the tree put it in an appropriate node.

Right now I've just confused myself by working on the same piece of code for far more hours than should be necessary. I would very much appreciate some pointers in the right direction!

Please tell me if I've not made myself clear.

Brian Tompsett - 汤莱恩
Snorkelfarsan
  • Did you implement hashCode()? – JustinKSU Oct 12 '11 at 20:59
  • I would create a new entry at the start of the for loop (not before it). Looks like you might be updating the previous entry instead of creating a new one. – JustinKSU Oct 12 '11 at 21:01
  • I would certainly consider using an entry that is mutable rather than creating a new `SplayEntry` each time. So your code would say something like `splay = tree.get(); if splay == null tree.insert(new SplayEntry...) else splay.add(value + 1)` or so. – Gray Oct 12 '11 at 21:11
  • Yes, there is a hashCode() function in SplayEntry. – Snorkelfarsan Oct 12 '11 at 22:12
  • @Gray The SplayTree only has a contains method (boolean), plus findMin and findMax, as methods that can be used for searching. So I have to check the tree for existing nodes that hold the entry I'm currently looking at. – Snorkelfarsan Oct 12 '11 at 23:20
  • Right. Might be worthwhile switching to a `SkipList` or some other structure. Or store in a `HashMap` and unload into a `Tree` only at the end for sorting. But you may have other reasons to use it. Best of luck. – Gray Oct 13 '11 at 01:27
  • Wouldn't that defeat the purpose of using a splay tree, if you put the word-frequency pairs in a HashMap first only to unload them into a tree before "printing" out the result? – Snorkelfarsan Oct 13 '11 at 02:52
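
The mutable-entry idea from the comments above could look roughly like the following. This is only a minimal sketch under several assumptions: `WordEntry` and `WordCounter` are hypothetical names, the entry compares on the word alone, and `java.util.TreeSet` stands in for the splay tree, which would need some lookup method that returns the stored element (the `SplayTree` in the question only exposes a boolean `contains`).

```java
import java.util.TreeSet;

// Hypothetical entry type: ordering and equality depend on the word only,
// so the frequency can change while the entry sits inside a sorted structure.
class WordEntry implements Comparable<WordEntry> {
    final String word;
    int frequency;

    WordEntry(String word, int frequency) {
        this.word = word;
        this.frequency = frequency;
    }

    @Override
    public int compareTo(WordEntry other) {
        return word.compareTo(other.word);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof WordEntry && word.equals(((WordEntry) o).word);
    }

    @Override
    public int hashCode() {
        return word.hashCode();
    }
}

class WordCounter {
    // Assumption: TreeSet is only a stand-in for the splay tree here.
    final TreeSet<WordEntry> tree = new TreeSet<>();

    void countWords(String line) {
        for (String w : line.toLowerCase().split("\\P{L}+")) {
            if (w.isEmpty()) continue; // split can yield a leading empty string
            WordEntry probe = new WordEntry(w, 1);
            WordEntry existing = tree.ceiling(probe); // smallest entry >= probe
            if (existing != null && existing.compareTo(probe) == 0) {
                existing.frequency++;   // word already stored: bump its count
            } else {
                tree.add(probe);        // new word: store it with frequency 1
            }
        }
    }

    int frequencyOf(String word) {
        WordEntry probe = new WordEntry(word, 0);
        WordEntry found = tree.ceiling(probe);
        return (found != null && found.compareTo(probe) == 0) ? found.frequency : 0;
    }
}
```

The key design choice is that a fresh entry is created per word and the stored entry, not the probe, is the one whose frequency is incremented. With a real splay tree, the `ceiling` call would be replaced by whatever find-and-return operation the tree offers.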

1 Answer


I suggest using a standard implementation of a splay tree, i.e. without the counters, and keeping a separate HashMap for the frequencies. This does not worsen the asymptotic complexity, since operations on a splay tree are amortized O(log n), while operations on a HashMap are O(1) on average. To preserve encapsulation and invariants, you can put both within a larger class that exposes the required operations.
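
As a rough illustration of the "larger class" idea, here is a minimal sketch. The class name `WordFrequencies` and its methods are hypothetical, and `java.util.TreeSet<String>` is used purely as a stand-in for a `SplayTree<String>`, so that the example is self-contained. Both structures are private and only updated together, which is what keeps them in sync.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical wrapper: the ordered structure holds each distinct word once,
// the HashMap holds the counts, and callers can only change both at once.
class WordFrequencies {
    // Assumption: TreeSet stands in for the splay tree of words.
    private final TreeSet<String> words = new TreeSet<>();
    private final Map<String, Integer> counts = new HashMap<>();

    void add(String word) {
        // merge returns the updated count; 1 means the word was not seen before
        if (counts.merge(word, 1, Integer::sum) == 1) {
            words.add(word);
        }
    }

    int frequency(String word) {
        return counts.getOrDefault(word, 0);
    }

    // Read-only view of the distinct words in sorted order.
    Iterable<String> sortedWords() {
        return Collections.unmodifiableSet(words);
    }
}
```

Because the tree only ever receives a word the first time it appears, it contains no duplicates, and looking up a frequency never requires comparing map keys against tree nodes.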

Rok Strniša
  • That's an interesting suggestion. For example, I would have an instance of the SplayTree hold a word in each node (will there be duplicates?), and then create an ordinary HashMap that would hold pairs of words and their frequency (count)? I would then have to compare the keys of the HashMap with the elements (words) in each node to find a match, right? – Snorkelfarsan Oct 12 '11 at 22:43
  • Actually, if your elements are hashable, you might as well use only the `HashMap`. I think you would only want to use a splay tree when your elements are not hashable, but are comparable. – Rok Strniša Oct 12 '11 at 23:14
  • I have to use a Splaytree for this implementation of the program. Already have a Map implementation. – Snorkelfarsan Oct 12 '11 at 23:41
  • Could you elaborate a bit on what you mean by: "To preserve encapsulation and invariants, you can put both within a larger class that exposes the required operations." And how is the complexity of a HashMap O(1)? – Snorkelfarsan Oct 13 '11 at 19:17
  • @Snorkelfarsan If you use two or more structures to represent your logical structure, you need to keep them in sync in order to preserve the logical relations of the data entered into that larger logical structure. For hashtable complexity, see [a stackoverflow post on the topic](http://stackoverflow.com/questions/3949217/time-complexity-of-hash-table). – Rok Strniša Oct 13 '11 at 19:56