First of all, I am new to programming, so I would appreciate simple, well-explained answers. Secondly, this is a very specific question, and I would rather it not be closed as off-topic or too broad.

I want to implement Huffman coding in Java. I was thinking of using a splay tree, since it will not be covered in my course's syllabus and I want to learn a new data structure. The main question is: does the Huffman coding algorithm need a splay tree in the first place?

What can I use a splay tree for in my Huffman-based data compression project? Or would you suggest a better data structure for this project (better in efficiency, and perhaps in creativity, in the sense that it is unusual and not widely known)?

Thanks

  • A splay tree is a particular type of binary tree with self-balancing properties (specifically, accessed elements are rotated to the root). For Huffman coding, I think you want to use a more general binary tree, as you'll be figuring out the structure based on the frequency of symbols in a string as opposed to access frequencies. However, splay trees are often used in *online* compression, as you can learn about here: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.137.1924&rep=rep1&type=pdf – Ryan Marcus Mar 09 '16 at 23:55
  • Thanks for your comment and thanks for that PDF. –  Mar 10 '16 at 02:05

1 Answer

Any Huffman code can be represented by the structure of a binary tree whose leaves are the symbols to be encoded. When following a path from the root to the symbol to be encoded, left and right branches can be represented as 0 and 1 bits; the result is a valid prefix code, in which each symbol's code length equals its depth in the tree.
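To make the tree-to-code mapping concrete, here is a minimal Java sketch of ordinary (static) Huffman coding. It builds the tree with a `PriorityQueue` and then reads each symbol's code off its root-to-leaf path. The class and method names (`HuffmanSketch`, `buildTree`, `collectCodes`, `codesFor`) are made up for illustration, and codes are kept as strings of `'0'` and `'1'` for readability rather than packed into real bits.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class HuffmanSketch {
    // A tree node: leaves carry a symbol, internal nodes carry two children.
    static final class Node {
        final char symbol;      // meaningful only for leaves
        final int weight;       // symbol count, or sum of the children's weights
        final Node left, right; // both null for a leaf

        Node(char symbol, int weight, Node left, Node right) {
            this.symbol = symbol;
            this.weight = weight;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    // Repeatedly merge the two lightest nodes until one tree remains.
    static Node buildTree(Map<Character, Integer> freq) {
        PriorityQueue<Node> queue =
                new PriorityQueue<>((a, b) -> Integer.compare(a.weight, b.weight));
        for (Map.Entry<Character, Integer> e : freq.entrySet()) {
            queue.add(new Node(e.getKey(), e.getValue(), null, null));
        }
        while (queue.size() > 1) {
            Node a = queue.poll();
            Node b = queue.poll();
            queue.add(new Node('\0', a.weight + b.weight, a, b));
        }
        return queue.poll(); // null if the input had no symbols
    }

    // Walk the tree: a left branch appends '0', a right branch appends '1'.
    static void collectCodes(Node node, String prefix, Map<Character, String> out) {
        if (node == null) {
            return;
        }
        if (node.isLeaf()) {
            // A one-symbol input would otherwise get an empty code.
            out.put(node.symbol, prefix.isEmpty() ? "0" : prefix);
            return;
        }
        collectCodes(node.left, prefix + "0", out);
        collectCodes(node.right, prefix + "1", out);
    }

    // Count symbol frequencies in the text and return the symbol-to-code table.
    static Map<Character, String> codesFor(String text) {
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : text.toCharArray()) {
            freq.merge(c, 1, Integer::sum);
        }
        Map<Character, String> codes = new HashMap<>();
        collectCodes(buildTree(freq), "", codes);
        return codes;
    }

    public static void main(String[] args) {
        System.out.println(codesFor("abracadabra"));
    }
}
```

The exact bit patterns depend on how ties between equal weights are broken, but the total encoded length does not. Note that nothing here needs a search tree: the only ordering structure is the priority queue used while building.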

Ideally, you would use the structure of the splay tree directly, to determine the Huffman code for each symbol. However, splay trees maintain their data in the nodes, not the leaves. You will either need to find some way to use a splay tree based on data in the leaves, or come up with a transformation that computes a valid (and efficient) set of prefix codes from node locations instead.

One possibility is to maintain the leftmost and rightmost leaf of each subtree in its root node (to be updated as the tree is splayed, of course). This should allow you to search for leaves, even though you don't actually care about your node data as such. Conventional splaying operations should then naturally generate a dynamic Huffman code biased towards recently occurring symbols.
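As a rough illustration of an adaptive code whose symbols live in the leaves, here is a minimal Java sketch. It is not a conventional splay tree and not necessarily the scheme described above: it assumes a fixed alphabet, starts from a balanced code tree, and after each symbol swaps nodes on the leaf's path with their "uncles", a simplified semi-splaying step in the spirit of the online-compression approach linked in the comments. All names (`SplayPrefixSketch`, `semiSplay`, and so on) are hypothetical, and bits are again kept as characters for readability.

```java
class SplayPrefixSketch {
    private static final int ROOT = 0;
    private final int n;                  // alphabet size
    private final int[] up, left, right;  // parent and child links, by node index

    SplayPrefixSketch(int alphabetSize) {
        if (alphabetSize < 2) {
            throw new IllegalArgumentException("need at least 2 symbols");
        }
        n = alphabetSize;
        int nodes = 2 * n - 1;
        up = new int[nodes];
        left = new int[nodes];
        right = new int[nodes];
        // Heap layout: nodes 0..n-2 are internal, nodes n-1..2n-2 are leaves.
        for (int i = 0; i < n - 1; i++) {
            left[i] = 2 * i + 1;
            right[i] = 2 * i + 2;
            up[2 * i + 1] = i;
            up[2 * i + 2] = i;
        }
    }

    private boolean isLeaf(int node) {
        return node >= n - 1;
    }

    // Emit the current code for a symbol, then adapt the tree.
    String encodeSymbol(int symbol) {
        int leaf = n - 1 + symbol;
        StringBuilder bits = new StringBuilder();
        for (int node = leaf; node != ROOT; node = up[node]) {
            bits.append(right[up[node]] == node ? '1' : '0');
        }
        semiSplay(leaf);
        return bits.reverse().toString(); // bits were collected leaf-to-root
    }

    // Read one symbol starting at pos[0], advance pos[0], then adapt the tree
    // exactly as the encoder did so both sides stay in sync.
    int decodeSymbol(String bits, int[] pos) {
        int node = ROOT;
        while (!isLeaf(node)) {
            node = bits.charAt(pos[0]++) == '1' ? right[node] : left[node];
        }
        semiSplay(node);
        return node - (n - 1);
    }

    // Walk up from the leaf, swapping every second node on the path with its
    // uncle. Each swap lifts the accessed path by one level.
    private void semiSplay(int leaf) {
        int a = leaf;
        while (a != ROOT) {
            int c = up[a];          // parent
            if (c == ROOT) {
                break;
            }
            int d = up[c];          // grandparent
            int b;                  // uncle: the other child of d
            if (left[d] == c) {
                b = right[d];
                right[d] = a;
            } else {
                b = left[d];
                left[d] = a;
            }
            if (left[c] == a) {
                left[c] = b;
            } else {
                right[c] = b;
            }
            up[a] = d;
            up[b] = c;
            a = d;
        }
    }

    // Assumes every character fits in 8 bits (an assumption of this sketch).
    static String encode(String text) {
        SplayPrefixSketch tree = new SplayPrefixSketch(256);
        StringBuilder out = new StringBuilder();
        for (char ch : text.toCharArray()) {
            out.append(tree.encodeSymbol(ch & 0xFF));
        }
        return out.toString();
    }

    static String decode(String bits) {
        SplayPrefixSketch tree = new SplayPrefixSketch(256);
        StringBuilder out = new StringBuilder();
        int[] pos = {0};
        while (pos[0] < bits.length()) {
            out.append((char) tree.decodeSymbol(bits, pos));
        }
        return out.toString();
    }
}
```

Because the decoder applies the same update after every symbol, no code table has to be transmitted. A symbol that was just used sits closer to the root the next time, so it gets a shorter code; the resulting code is adaptive but not guaranteed to match the optimal Huffman lengths.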

comingstorm
  • Your answer was exactly what I was looking for. However, what do you mean by your last sentence? "Conventional splaying operations should then naturally generate a dynamic Huffman code biased towards recently occurring symbols." Huffman coding generates short codes for frequently occurring symbols, while splay trees are useful when recently accessed elements need to be reached quickly. So how would I combine the two? Why would Huffman coding need fast access to recently occurring symbols? –  Mar 11 '16 at 08:08
  • That is exactly my question: is there any advantage to using a splay tree here, and can I use the splay tree's behavior for some extra benefit in Huffman coding? Thanks –  Mar 11 '16 at 08:09
  • Different kinds of inputs have different symbol frequencies. If you don't know in advance what kind of input you have, you will have to adapt. Splay trees move frequently used nodes towards the root for faster access; in terms of a Huffman tree, this translates into shorter bit codes, which is presumably what you want... – comingstorm Mar 11 '16 at 08:16
  • Also, different sections of the same file can have different statistics; fast adaptivity within a file may be able to take better advantage of that. – comingstorm Mar 11 '16 at 08:18
  • Correct me if I am wrong, but wouldn't splay trees move recently accessed nodes towards the root, rather than frequently used nodes? That's what I read on Wikipedia and elsewhere. –  Mar 11 '16 at 08:20
  • Sure, and my point is that they're generally the same thing. If you haven't accessed a node recently, it's not currently very frequent, is it? – comingstorm Mar 11 '16 at 17:04
  • Oh now I understand what you meant. I will try to implement it and see if I can do it. Thanks for your answer. –  Mar 12 '16 at 08:53