2

I'm new of C++ and I'm making a huffman tree written in c++, and I'm in trouble in generating tree structure. Here is my code:

    void Huffman::generateTree() {
        std::vector<Node> nodes;
        for (auto itr : freq) {
            nodes.push_back(*(new Node(itr.first, itr.second)));
        }
        while (nodes.size() > 1) {
            Node::sort(&nodes);
            Node leftNode = nodes.back();
            nodes.pop_back();
            Node rightNode = nodes.back();
            nodes.pop_back();
            Node newAnode = *new Node();
            newAnode.merge(&leftNode,&rightNode);
            nodes.push_back(newAnode);
        }
        root = &nodes.back();
        displayTree();
    }

It finally combined to one node, but the left node and right node is wrong: (For debug purpose, every node have a random ID when created so that I found problem.)

NODE freq26|id96
|-Left:
   NODE freq10|id17
   |-Left:
   |--ERROR:SELF freq10|id17
   |-Right:
      NODE freq16|id19
      |-Left:
         NODE freq10|id17
         |-Left:
         |--ERROR:SELF freq10|id17
         |-Right:
            NODE freq16|id19
            |-Left:
               NODE freq10|id17
               |-Left:
               |--ERROR:SELF freq10|id17
               |-Right:
                  NODE freq16|id19
...endless loop

Just like the output, start from the first subnode, every nodes' left child node is it self and right node is another node but only two switching each other, and finally looped.

I searched and read several post about this and I know it maybe caused by pop_back and push_back elements to vector, the pointer may be point to another element, but how I can resolved it? Is there any way to get the REAL element in vector that not affected by operating vector?

Here are some code of my Node:

//header
    class Node {
    private:
        char value;
        int frequency;
        Node *left;
        Node *right;
        String code;
...
    class Huffman{
    private:
        Node * root;
        std::map<char,int> freq;// calculated frequency of data
        bool generated;
//source
    bool Node::sortMethod::operator()(const Node &nodeA, const Node &nodeB) {
        return nodeA.getFrequency() > nodeB.getFrequency();//getFrequency(): simply return node's frequency int
    }
    void Node::sort(std::vector<Node> *nodes) {
        std::sort(nodes->begin(), nodes->end(), sortMethod());
    }
    Node *Node::merge(Node *pLeft, Node *pRight) {
        left = pLeft;
        right = pRight;
        frequency = left->getFrequency() + right->getFrequency();
        return this;
    }

Everything helpful is welcome, thanks!

  • 4
    `nodes.push_back(*(new Node(itr.first, itr.second)));` this is an immediate memory leak. – Fantastic Mr Fox Jun 08 '21 at 07:20
  • I'm sorry but why it is a memory leak? –  Jun 08 '21 at 07:27
  • 1
    Storing pointers you acquire with `&` for later use is usually a disaster. – molbdnilo Jun 08 '21 at 07:30
  • 1
    @CKylinMark `newAnode.merge(&leftNode,&rightNode);` -- You are passing pointers to temporaries. Every iteration of that loop, they go away. – PaulMcKenzie Jun 08 '21 at 07:31
  • 2
    @CKylinMark -- It is a memory leak since there is no way to get that pointer value back again so as to make the call to `delete` later on to clean up the memory. It looks like you may have written code previously in another language, maybe Java or C#, where usage of `new` is the way to create objects and the garbage collector cleans up for you. Needless to say, C++ isn't one of those languages. – PaulMcKenzie Jun 08 '21 at 07:32
  • `for (auto itr : freq) {nodes.push_back({itr.first, itr.second});}` -- That gets rid of that one leak. – PaulMcKenzie Jun 08 '21 at 07:38
  • @PaulMcKenzie Yes, I rewrote this from Java code... Pointers is hard to understand for me so my code looks so stupid... Thanks for comments! –  Jun 08 '21 at 07:47
  • 3
    @CKylinMark Do not use Java as a model in writing C++ code. There is much wrong with what you've written, you might as well start from scratch. First is the memory leak. The second is that you're using addresses of local variables that will go out of scope on each loop iteration. Third is that you're holding onto the address of items stored in a vector -- if that vector is resized, those addresses become invalidated. Four, you are assuming that C++ is a reference-based language when it is value based: `Node rightNode = nodes.back();` -- that does not give you a reference to the last item. – PaulMcKenzie Jun 08 '21 at 07:50
  • Bottom line is that you can't simply copy Java code and convert it line-by-line to C++. Again, this: `Node rightNode = nodes.back();` -- is one thing that burns a lot of Java programmers who do not know that C++ is a value-based language. If you want a reference to the last item, you have to explicitly declare a reference: `Node& rightNode = nodes.back();`. But then, you have problems later by storing the reference to the last item, and then resizing the vector, invalidating the reference. Again, so much is wrong, might as well start over, but this time, properly learn C++. – PaulMcKenzie Jun 08 '21 at 07:54
  • @PaulMcKenzie Well...I must say you are right...If I have time, I really want to learn C++ seriously... In fact I never touched C++ before and I start to learn it just for about 2 week and for submitting as homework...and really thank to your help for the comments above. I think I need totally rewrite my project, because of the ton of the code have the same problem I shown in my question. And if you have more tips for that I'll be appreciate. –  Jun 08 '21 at 08:08

1 Answers1

1

As the comments suggest, you need to stop thinking that C++ is similar to Java, it really isn't.

Objects in C++ have explicit lifetimes, and the language doesn't stop you from holding onto a reference or pointer to an object after it has ceased to exist.

If you want something to outlive the call it was created in, it needs to be dynamically allocated, and something should track it's lifetime. The default is std::unique_ptr, which deletes what it points to when the pointer is destroyed. You can transfer ownership by moving the unique_ptr.

Unfortunately, you can't usefully have a std::priority_queue of std::unique_ptr as you can't move the top, and pop doesn't return the value. Otherwise I would suggest using that rather than re-implementing it.

I don't think you have to sort your nodes each loop, as the merged node will always have the greatest frequency, so they go at the back.

class Node;
// Aliases for types we will use
using NodePtr = std::unique_ptr<Node>;
using Nodes = std::vector<NodePtr>;

class Node {
private:
    char value;
    int frequency;
    NodePtr left;
    NodePtr right;
    std::string code;

    Node(char value, int frequency) : value(value), frequency(frequency) /*, code?*/{}
    Node(NodePtr left, NodePtr right) : /*value? ,*/ frequency(left->frequency + right->frequency), left(std::move(left)), right(std::move(right)) /*, code?*/{}
... 
};

// N.b. not members of Node
bool compareNodePtrs(const NodePtr & left, const NodePtr & right) {
    return left->getFrequency() > right->getFrequency();
}

void sortNodes(Nodes & nodes) {
    std::sort(nodes.begin(), nodes.end(), compareNodePtrs);
}

void Huffman::generateTree() {
    Nodes nodes;
    for (auto itr : freq) {
        nodes.emplace_back(std::make_unique<Node>(itr.first, itr.second));
    }
    sortNodes(nodes);
    while (nodes.size() > 1) {
        auto left = std::move(nodes.back());
        nodes.pop_back();
        auto right = std::move(nodes.back());
        nodes.pop_back();            
        nodes.emplace_back(std::move(left), std::move(right));
    }
    root = std::move(nodes.back());
    displayTree();
}
Caleth
  • 52,200
  • 2
  • 44
  • 75
  • Thanks for your answer! It teaches me a lot! –  Jun 09 '21 at 07:16
  • I'm confused about what's the difference of `sort` belong to `Node` or not? When should I define it as a member of one class? –  Jun 09 '21 at 07:25
  • 1
    @CKylinMark it only needs to be a member of `Node` if it uses private details of `Node`. Java forces every function into *some* class, C++ has no such restriction – Caleth Jun 09 '21 at 07:30
  • OK, I understood. Thanks! –  Jun 09 '21 at 09:34