I have built my own B+tree index, with all the operations for insert/delete/search over the index. To speed up insertion, I would like to implement bulk loading as well, so that I can experiment with large datasets.
What I have been trying to do is sort the data and start filling pages at the leaf level; keys are copied or pushed to the upper levels when necessary. I always keep track of the frontier of the index, i.e. one in-memory page per level. For example, if my index is of height 3 (root, one level of internal nodes, and the leaf level), I only keep 3 pages in memory, and once a page gets full, or there is no more data, I write it to disk.
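Roughly, the idea looks like the sketch below. This is not my real code, just a minimal Python outline of the frontier approach, assuming a fixed `MAX_KEYS` per page and hypothetical `write_page()` / `allocate_page_id()` helpers:

```python
MAX_KEYS = 4  # assumed page capacity, for illustration only

class PageBuffer:
    """One in-memory page per level of the frontier (the rightmost page on that level)."""
    def __init__(self, level):
        self.level = level
        self.keys = []      # separator keys (internal) or actual keys (leaf)
        self.children = []  # child page ids (internal) or record values (leaf)

def bulk_load(sorted_records, write_page, allocate_page_id):
    frontier = {0: PageBuffer(0)}  # level 0 = leaves

    def flush(level):
        """Write the full page at `level` to disk and push a separator key upward."""
        page = frontier[level]
        page_id = allocate_page_id()
        write_page(page_id, page)
        parent = frontier.setdefault(level + 1, PageBuffer(level + 1))
        # copy (from a leaf) / push (from an internal node) a routing key into the parent
        parent.keys.append(page.keys[0])
        parent.children.append(page_id)
        frontier[level] = PageBuffer(level)  # start a fresh rightmost page
        if len(parent.keys) == MAX_KEYS:     # parent is full: flush it too
            flush(level + 1)

    for key, value in sorted_records:        # data must already be sorted by key
        leaf = frontier[0]
        leaf.keys.append(key)
        leaf.children.append(value)
        if len(leaf.keys) == MAX_KEYS:
            flush(0)

    # Any partially filled pages left in `frontier` still have to be written out;
    # handling them without violating minimum-occupancy limits is exactly my question.
    return frontier
```

(The sketch simplifies the internal-node layout: it stores one routing key per child instead of the usual n keys / n+1 children, and it flushes every page only when it is completely full.)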
The problem is deciding how many entries to write to each page so that the occupancy limits of every individual node are respected. These limits can be found here. I could not find any useful resource with implementation details on bulk loading, or a good strategy for choosing a fill factor that guarantees the node limits.
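To make the problem concrete, here is a toy example (the limits are made up: a leaf must hold between 2 and 4 keys):

```python
MIN_KEYS, MAX_KEYS = 2, 4          # assumed leaf limits, for illustration only
keys = list(range(9))              # 9 sorted keys to bulk-load

# Filling every page to capacity leaves an underfull last page:
pages = [keys[i:i + MAX_KEYS] for i in range(0, len(keys), MAX_KEYS)]
print([len(p) for p in pages])     # [4, 4, 1] -> last page violates MIN_KEYS

# A lower fill factor happens to work here, but how do I pick it in general,
# and for the internal levels as well?
pages = [keys[i:i + 3] for i in range(0, len(keys), 3)]
print([len(p) for p in pages])     # [3, 3, 3]
```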
Any ideas?