-2

I want to create a simple implementation of a B+-Tree in Java and i need some help. I want my program to implement these functions: search, insert, delete.

My questions:

  1. What is the best data structure to use to represent the Tree? I was thinking TreeMap.
  2. In a B+-Tree the data is stored olny in the leaf nodes (K,V) and in the inner nodes instead of data in every record there is a pointer to a child node (K,P). I would like a suggestion on how to point to an other node since i cant use pointers in java.

Also if you have any recommendations or you have in mind a simple implementation i could use as a reference, please tell me.

Thanks

mellamokb
  • 56,094
  • 12
  • 110
  • 136
Christos Baziotis
  • 5,845
  • 16
  • 59
  • 80

1 Answers1

7

The whole point of a B-tree (or any of the small variations that exist) is to store data on disk so that it can be read with a small number of disk accesses. If you're going to keep everything in memory, you should use a balanced binary search tree (maybe a red-black tree or splay tree), or even a vanilla BST, but neither of your questions seems to consider this fact.

  1. What is the best data structure to use to represent the Tree? I was thinking TreeMap.

TreeMap is an in-memory data structure, so it's unclear how this will help to represent an on-disk tree. Also, this implements a binary search tree for you, so you aren't really implementing the B-tree yourself if you use TreeMap.

  1. In a B+-Tree the data is stored olny in the leaf nodes (K,V) and in the inner nodes instead of data in every record there is a pointer to a child node (K,P). I would like a suggestion on how to point to an other node since i cant use pointers in java.

You don't need actual pointers to represent a B-tree, just file offsets. You'll need to define a way of representing these offsets (either the number of bytes or blocks from the beginning of the file, depending on how the rest of your implementation is structured) and access everything in terms of file offsets. In fact, you should not use standard C-style pointers to point between nodes in your B+-tree. If you did, those pointers would be meaningless the next time the program started, so you would lose the persistence benefit of an on-disk data structure.

To access the file contents cleanly, I recommend memory mapping. One useful method for creating a memory-mapped file object in Java is FileChannel.map. That method returns a MappedByteBuffer, which you can use to read a chunk of bytes at a particular file offset.

Adam Mihalcin
  • 14,242
  • 4
  • 36
  • 52
  • Firstly thanks for your answer! About the storage of the tree i was thinking using Serializable. Also about the data structure since the TreeMap is not what i want do you have any data structure to suggest or do you think it would be better if i created a Node class of my own? – Christos Baziotis Jan 07 '13 at 22:40
  • @sijoune How much do you care about performance? If you care, you probably shouldn't use `Serializable` or create a Node class of your own. You should instead view each block as an array of N-byte items (you define N), and directly manipulate each item. If you serialize a Java object, a lot of extra information gets saved along with the object data, and you don't know for sure how big each of the objects is on disk. If you don't even know the size of each object, you don't know how many you can fit in a page, and performance goes down the tubes. – Adam Mihalcin Jan 07 '13 at 22:44