12

We're learning B-trees in class and have been asked to implement them in code. The teacher has left choice of programming language to us and I want to try and do it in C#. My problem is that the following structure is illegal in C#,

unsafe struct BtreeNode
{
    int key_num;        // The number of keys in a node
    int[] key;          // Array of keys
    bool leaf;          // Is it a leaf node or not?
    BtreeNode*[] c;     // Pointers to next nodes
}

Specifically, one is not allowed to create a pointer to point to the structure itself. Is there some work-around or alternate approach I could use? I'm fairly certain that there MUST be a way to do this within the managed code, but I can't figure it out.

EDIT: Eric's answer pointed me in the right direction. Here's what I ended up using,

class BtreeNode
{
    public List<BtreeNode> children;    // The child nodes
    public static int MinDeg;           // The minimum degree of the tree
    public bool IsLeaf { get; set; }    // Is the current node a leaf or not?
    public List<int> key;               // The list of keys
...
}
Tim
chronodekar
  • 4
    Why do you want to use a struct instead of a class? – CodesInChaos Feb 03 '12 at 17:39
  • 1
    of course you can use C# for B trees – Adrian Feb 03 '12 at 17:41
  • 9
    Do not try to use unsafe code in C# until you are an expert; you will get it wrong and it will be painful and difficult. Rather, learn the safe way of doing things first; C# is designed so that the safe way of doing things is almost always easier than the unsafe way. – Eric Lippert Feb 03 '12 at 17:52
  • A BTree is intended to be file-based with nodes fitted to multiples of segments. So the struct/pointer approach could make sense. chronodekar: do you use a file or will it be a BTree in memory? – H H Feb 03 '12 at 17:56
  • 1
    @HenkHolterman This is merely an academic exercise to get a basic understanding of how B-trees function, and for that C# with classes and references is fine. If the OP was tasked with using a B-tree to implement an index for a database engine then I'd tell him to use pointers/structs, and to work in C++. – Servy Feb 03 '12 at 18:06
  • @HenkHolterman (Since you asked.) How do I know that it's an academic exercise? The OP stated so specifically. How do I know that C# classes are fine? This isn't absolute, but I highly doubt the teacher would have given them free choice of language if they wouldn't have accepted a solution given in any language. I'm also basing it on past experiences as a student, as well as all of the work that I've done with students. As for my recommendation, I think I am a suitable reference for what I would or would not recommend. – Servy Feb 03 '12 at 18:22
  • @CodesInChaos, there is no reason for me to be using structs in particular. I could very well move on to classes if they work for my purposes – chronodekar Feb 04 '12 at 00:50
  • 1
    @Henk Holterman, as Servy mentioned above, this IS an academic exercise and the intent is to get us familiar with the working of B-trees. Specifically, our teacher wants us to implement the textbook algorithms to insert/delete/search for nodes. On a conceptual level, I understand (I think) what's written in the text, but implementing the same in C# has me stumped. If I can't figure this out, I WAS planning to move on to VC++, but that would also mean studying another language from scratch... :( – chronodekar Feb 04 '12 at 00:53

3 Answers

29

Coincidentally I actually just did implement a btree in C#, for a personal project. It was fun. I built a btree of lexicographically ordered variable size (up to 64 byte) keys which presented a number of challenges, particularly around figuring out when a page of storage was too full or too empty.

My advice, having just done that, is to build an abstraction layer that captures just the btree algorithms in their most abstract form, as an abstract base class. Once I got all the btree rules captured in that form, I specialized the base class in several different ways: as a regular fixed-key-size 2-3 btree, as one of my fancy variable-size-key btrees, and so on.

To start with, under no circumstances should you be doing this with pointers. Unsafe code is seldom necessary and never easy. Only the most advanced C# programmers should be turning off the safety system; when you do that, you are taking responsibility for the type and memory safety of the program. If you're not willing to do that, leave the safety system turned on.

Second, there's no reason to make this a struct. Structs are copied by value in C#; a btree node is not a value.

Third, you don't need to keep the number of keys in a node; the array of keys knows how many keys are in it.

Fourth, I would use a List<T> rather than an array; they are more flexible.
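As a rough illustration of the third and fourth points, here is a minimal sketch of a sorted key container built on `List<int>`. The class name `KeyList` and its members are hypothetical and not part of the question's code; the point is only that the count is derived from the list rather than stored next to it.

```csharp
using System.Collections.Generic;

// Hypothetical helper for illustration only; the names are assumptions.
class KeyList
{
    private readonly List<int> keys = new List<int>();

    // Derived from the list itself, so it can never get out of sync with it.
    public int Count { get { return keys.Count; } }

    // Read-only view of the keys, always in ascending order.
    public IReadOnlyList<int> Keys { get { return keys; } }

    // Inserts a key while keeping the list sorted; returns false for a duplicate.
    public bool Add(int key)
    {
        int index = keys.BinarySearch(key);
        if (index >= 0)
        {
            return false;           // key is already present
        }
        keys.Insert(~index, key);   // ~index is the sorted insertion point
        return true;
    }
}
```

With an array you would have to track the used length and shift elements by hand; `List<T>.Insert` and `Count` take care of both.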

Fifth, you need to decide whether the key lives in the node or in the parent. Either way can work; my preference is for the key to live in the node, because I see the key as being associated with the node.

Sixth, it is helpful to know whether a btree node is the root or not; you might consider having two bools, one "is this a leaf?" and one "is this the root?" Of course a btree with a single item in it has a single node that is both leaf and root.

Seventh, you are probably going to build this thing to be mutable; normally one does not make public mutable fields on a C# class. You might consider making them properties. Also, the list of children can be grown and shrunk, but its identity does not change, so make it referentially read-only.

So I would probably structure my basic node as:

class Node
{
    public int Key { get; set; }
    public bool IsRoot { get; set; }
    public bool IsLeaf { get; set; }
    private List<Node> children = new List<Node>();
    public List<Node> Children { get { return this.children; } }
}

Make sense?

Eric Lippert
  • 1
    Putting `struct` nodes into a single array backing the btree based collection could still be a good idea as performance optimization. But of course one would use indices instead of pointers in that case. Of course this question is mainly about learning how btrees work, so the much clearer code using classes is preferable here. – CodesInChaos Feb 03 '12 at 18:59
  • @Eric Lippert, Honestly? The idea of "Lists" is new to me. It's nearly time for me to go to class now, but I'll try out your suggestion later in the day and report back. Regarding your 3rd point - I keep the number of keys in the node just because that's how my text (Introduction to Algorithms by Cormen, Leiserson, et al.) shows things. True, the array has that info as well, but I think my teacher would prefer it to be explicitly mentioned. – chronodekar Feb 04 '12 at 00:58
  • 8
    @chronodekar: Remember, the algorithms presented in CLR assume a very C-like approach to the world. In more modern languages there are higher level abstractions than arrays, and objects are far more self-describing. And also remember: **every redundancy in a data structure is not only a waste of memory, it is also a bug waiting to happen**. Fields that have to be exactly the same as other fields present an opportunity for them to get out of sync. – Eric Lippert Feb 04 '12 at 07:12
  • 1
    @chronodekar: you can have a property that will explicitly show the children count, but you don't have to store a separate value for it: `public int ChildrenCount { get { return children.Count; } }` - that way you won't have a redundancy in your data, but you will still have some redundancy in public interface of the `Node` class. – Dyppl Feb 06 '12 at 04:06
  • @Eric Lippert: UPDATE: I'm taking your advice and have removed the key_num field. Though, lists (or collections) are tricky. I *think* I've got my "BtreeSplitChild()" function done right, but until I finish everything, I can't really tell. Will update again in a few more days. – chronodekar Feb 08 '12 at 15:10
14

Use a class instead of a struct. And throw out the pointers.

class BtreeNode
{
    int key_num;        // The number of keys in a node
    int[] key;          // Array of keys
    bool leaf;          // Is it a leaf node or not?
    BtreeNode[] c;      // References to child nodes
}

When you declare a variable of a class type, it is implicitly a reference (very similar to a pointer in C), since every class is a reference type.
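A small sketch may make the difference visible. `StructNode`, `ClassNode` and `CopyDemo` are hypothetical names used only for this illustration: assigning a struct copies the value, while assigning a class variable copies only the reference.

```csharp
using System;

// Hypothetical types for illustration only; not taken from the question.
struct StructNode { public int Key; }
class ClassNode { public int Key; }

static class CopyDemo
{
    // Returns the keys seen through the ORIGINAL variables
    // after their copies have been modified.
    public static (int structKey, int classKey) Run()
    {
        var s1 = new StructNode { Key = 1 };
        var s2 = s1;        // copies the whole value
        s2.Key = 99;        // s1 is not affected

        var c1 = new ClassNode { Key = 1 };
        var c2 = c1;        // copies the reference; both name the same object
        c2.Key = 99;        // the change is visible through c1 as well

        return (s1.Key, c1.Key);
    }
}
```

This is why a class field of type `BtreeNode` can stand in for the C-style pointer: it already refers to a node stored elsewhere rather than containing one.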

CodesInChaos
8

All you need to realize is that a pointer in C is "somewhat similar" to a reference in C#. (There are various differences, but for the purposes of this question you can concentrate on the similarities.) Both allow a level of indirection: the value isn't the data itself, it's a way of getting to the data.

The equivalent of the above would be something like:

class BtreeNode
{
    private int keyNumber;
    private int[] keys;
    private bool leaf;
    private BtreeNode[] subNodes;

    // Members (constructors etc)
}

(I don't remember much about B-trees, but if the "keys" array here corresponds to the "keyNumber" value of each subNode, you may not want the keys variable at all.)
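To show references doing the job of the C pointers, here is a minimal sketch of a textbook-style B-tree search. It is only an illustration under assumptions: the class name `SearchNode` is hypothetical, it uses lists where the class above uses arrays, and it treats a node with no children as a leaf instead of storing a separate flag.

```csharp
using System.Collections.Generic;

// Hypothetical node shape for illustration; names and layout are assumptions.
class SearchNode
{
    // Keys in ascending order.
    public List<int> Keys { get; } = new List<int>();

    // Children[i] holds the keys below Keys[i]; the last child holds the keys
    // above the last key. The list is empty for a leaf.
    public List<SearchNode> Children { get; } = new List<SearchNode>();

    public bool IsLeaf { get { return Children.Count == 0; } }

    // Returns the node that contains 'key', or null if the key is absent.
    public static SearchNode Search(SearchNode node, int key)
    {
        while (node != null)
        {
            // Find the first key that is not smaller than the one sought.
            int i = 0;
            while (i < node.Keys.Count && key > node.Keys[i])
            {
                i++;
            }

            if (i < node.Keys.Count && node.Keys[i] == key)
            {
                return node;            // found in this node
            }
            if (node.IsLeaf)
            {
                return null;            // nowhere left to descend
            }
            node = node.Children[i];    // follow the reference, as C would follow a pointer
        }
        return null;
    }
}
```

Following `node.Children[i]` is the level of indirection described above; no `unsafe` code or pointer syntax is needed.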

Jon Skeet
  • Just a note (although it's quite irrelevant to the question): having keys[] separately may allow fewer cache misses during lookup by key. The keys[] are likely to occupy a single(?, depends on the size) cache line, so lookup is much faster than going through the indirection of BtreeNode. Again, it's totally irrelevant to the OP's question. – bestsss Feb 03 '12 at 18:05
  • @bestsss: On the other hand, it means that there are more objects in total, so you may well end up with more cache misses at a higher level. I'd definitely implement it *without* the optimization first, and then benchmark it if performance were an issue. – Jon Skeet Feb 03 '12 at 18:11
  • Of course.. no keys[] to start. Such optimizations are mostly unnecessary anyways. It was pointing out that having explicit keys can be a performance boost. – bestsss Feb 03 '12 at 18:49
  • @Jon Skeet: yes, keyNumber DOES correspond to the number of keys. I'm leaving that data in as I feel my teacher (and non-.NET classmates) would understand my code better that way. – chronodekar Feb 04 '12 at 01:01