39

Which data structure is best for file organization? Are B-trees the best, or is there another data structure that gives faster access to files along with good organization? Thanks

templatetypedef
Bernice
  • 1
    I'm a fan of using databases to store the information. I believe most DBs use a b-structure. Is there a specific task you're trying to accomplish? – kevingreen Jan 02 '13 at 17:43
  • I'm just curious which data structure OSes use for file organization, since I'm learning data structures and have implemented a few of them: red-black trees, AVL trees, B-trees, skip lists. I would like to know which of them I can use for a more useful task (not storing numbers). – Bernice Jan 02 '13 at 17:47
  • I'm not sure specifically how most OSes store the data. Good luck with the research. – kevingreen Jan 02 '13 at 17:59
  • 1
    Red-black trees, AVL trees, and all the others are in-memory data structures. They are not a good fit for persistent (on-disk) storage, which is what matters more for file systems. Also, I'm not sure if you know this, but a B-tree and a binary tree are totally different pairs of shoes. Just to clarify. – dmeister Jan 04 '13 at 16:39
  • Thanks for the info @dmeister. Yes, you are right: RBTs, AVL trees, etc. are used in memory. That's why I needed to research persistent data structures. Of course there's a huge difference between a B-tree and a binary tree! – Bernice Jan 05 '13 at 20:52

1 Answer

55

All file systems are different, so there are a huge number of data structures that actually get used in file systems.

Many file systems use some sort of bit vector (usually referred to as a bitmap) to track which blocks are free. A bitmap gives excellent performance when querying whether a specific block of disk is in use, and (for disks that aren't overwhelmingly full) it supports reasonably fast lookups of free blocks.
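To make the bitmap idea concrete, here is a minimal in-memory sketch in Python. The class name `BlockBitmap` and its methods are made up for illustration and do not correspond to any real file system's on-disk layout; a real implementation would also persist the bitmap and scan a word at a time.

```python
class BlockBitmap:
    """One bit per disk block: 1 = in use, 0 = free (hypothetical sketch)."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        # Round up so every block has a bit; padding bits stay 0.
        self.bits = bytearray((num_blocks + 7) // 8)

    def is_used(self, block):
        # Constant-time query: pick the byte, then test the bit inside it.
        return bool(self.bits[block >> 3] & (1 << (block & 7)))

    def set_used(self, block):
        self.bits[block >> 3] |= 1 << (block & 7)

    def set_free(self, block):
        self.bits[block >> 3] &= ~(1 << (block & 7)) & 0xFF

    def find_free(self):
        """Return the lowest-numbered free block, or None if the disk is full."""
        for i, byte in enumerate(self.bits):
            if byte == 0xFF:
                continue  # all eight blocks in this byte are in use
            for bit in range(8):
                block = i * 8 + bit
                if block >= self.num_blocks:
                    return None  # only padding bits remain
                if not byte & (1 << bit):
                    return block
        return None
```

The byte-level skip in `find_free` is why the search stays reasonably fast unless the disk is nearly full: fully used regions are passed over eight blocks at a time, and the scan only slows down when almost every byte is `0xFF`.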

Many older file systems (ext and ext2) stored directory structures using simple linked lists, so looking up a name meant scanning the entries one by one. Apparently this was actually fast enough for most applications, though some types of applications that used lots of large directories suffered noticeable performance hits.

The XFS file system is famous for using B+-trees for just about everything, including directory structures, free-space tracking, and file extent maps. From what I remember from my undergrad OS course, the philosophy was that since it took so long to write, debug, and performance-tune the B+-tree implementation, it made sense to use it as much as possible.

Other file systems (ext3 and ext4) index directories with a variant of the B-tree called the HTree, which I'm not very familiar with. As I understand it, entries are keyed on a hash of the file name, which keeps the branching factor high and the tree shallow, so very few disk accesses are needed per lookup.
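A rough sketch of the general idea of keying directory entries on a hash of the name, in Python. This is not the ext3/ext4 on-disk format: a sorted in-memory list stands in for the tree's sorted key order, `zlib.crc32` is an arbitrary stand-in hash, and the `HashedDirectory` class is invented for illustration.

```python
import bisect
import zlib


class HashedDirectory:
    """Directory entries kept sorted by a fixed-size hash of the file name."""

    def __init__(self):
        self._hashes = []   # sorted name hashes (the search keys)
        self._entries = []  # parallel list of (name, inode_number)

    @staticmethod
    def _hash(name):
        # Assumption: any reasonably uniform hash works for this sketch.
        return zlib.crc32(name.encode("utf-8"))

    def add(self, name, inode_number):
        h = self._hash(name)
        i = bisect.bisect_right(self._hashes, h)
        self._hashes.insert(i, h)
        self._entries.insert(i, (name, inode_number))

    def lookup(self, name):
        h = self._hash(name)
        i = bisect.bisect_left(self._hashes, h)
        # Different names can share a hash, so compare the names as well.
        while i < len(self._hashes) and self._hashes[i] == h:
            if self._entries[i][0] == name:
                return self._entries[i][1]
            i += 1
        return None
```

Because the keys are small fixed-size integers rather than variable-length names, many of them fit in one index block, which is where the high branching factor comes from.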

I have heard anecdotally that some operating systems tried using splay trees to store their directory structures but ran into trouble with them. Specifically, splay trees prevented concurrent access to the same directory from multiple readers (since in a splay tree, each access reshapes the tree), and they hit an edge case where the tree would degenerate into a linked list if all of its elements were accessed sequentially. That said, I don't know if this is just an urban legend, since these problems would have been apparent before anyone tried to code them up.
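The degenerate case is easy to reproduce with a small bottom-up splay tree. The Python below is a minimal sketch written for this answer (the names `SplayTree` and `height_after_sequential_access` are my own), not code from any operating system.

```python
import random


class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


class SplayTree:
    def __init__(self):
        self.root = None

    def _rotate(self, x):
        """Rotate x above its parent, keeping the in-order sequence intact."""
        p, g = x.parent, x.parent.parent
        if p.left is x:
            p.left = x.right
            if x.right:
                x.right.parent = p
            x.right = p
        else:
            p.right = x.left
            if x.left:
                x.left.parent = p
            x.left = p
        p.parent, x.parent = x, g
        if g is None:
            self.root = x
        elif g.left is p:
            g.left = x
        else:
            g.right = x

    def _splay(self, x):
        while x.parent is not None:
            p, g = x.parent, x.parent.parent
            if g is not None:
                zig_zig = (g.left is p) == (p.left is x)
                # zig-zig rotates the parent first; zig-zag rotates x twice.
                self._rotate(p if zig_zig else x)
            self._rotate(x)

    def insert(self, key):
        if self.root is None:
            self.root = Node(key)
            return
        node = self.root
        while key != node.key:
            child = node.left if key < node.key else node.right
            if child is None:
                child = Node(key)
                child.parent = node
                if key < node.key:
                    node.left = child
                else:
                    node.right = child
            node = child
        self._splay(node)

    def access(self, key):
        """Look up key; even a read splays, i.e. mutates the tree."""
        node, last = self.root, None
        while node is not None and node.key != key:
            last, node = node, (node.left if key < node.key else node.right)
        if node or last:
            self._splay(node or last)
        return node is not None

    def height(self):
        best, stack = 0, [(self.root, 1)] if self.root else []
        while stack:
            n, depth = stack.pop()
            best = max(best, depth)
            stack.extend((c, depth + 1) for c in (n.left, n.right) if c)
        return best


def height_after_sequential_access(n, seed=0):
    keys = list(range(n))
    random.Random(seed).shuffle(keys)
    tree = SplayTree()
    for k in keys:
        tree.insert(k)
    for k in range(n):  # read every key in ascending order
        tree.access(k)
    return tree.height()
```

`height_after_sequential_access(64)` returns 64: after the in-order sweep, every node hangs off the left of its successor, so the tree is a single chain. The amortized bounds still hold, but the next lookup of a small key walks the entire chain, and because `access` rewrites pointers, readers cannot share the tree without locking.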

Microsoft's FAT32 system uses a huge array (the file allocation table) that stores which files are stored where and which disk sectors follow one another logically in a file. The main drawback is that the table has to be set up in advance, so there end up being upper limits on the sizes of files that can be stored on the disk. However, the array-based system is pretty easy to implement.
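Here is a toy illustration in Python of how an allocation table chains a file's clusters together. The sentinel values `FREE` and `END_OF_CHAIN` and both function names are assumptions made for this sketch; the real FAT32 format uses reserved entry values and also needs directory entries to record each file's first cluster.

```python
FREE = -2          # hypothetical marker: cluster is unallocated
END_OF_CHAIN = -1  # hypothetical marker: last cluster of a file


def allocate_chain(fat, count):
    """Claim `count` free clusters, link them, and return the first one."""
    free = [i for i, entry in enumerate(fat) if entry == FREE][:count]
    if count < 1 or len(free) < count:
        raise ValueError("not enough free clusters")
    for current, following in zip(free, free[1:]):
        fat[current] = following  # each entry names the next cluster
    fat[free[-1]] = END_OF_CHAIN
    return free[0]


def cluster_chain(fat, first_cluster):
    """Follow the table from a file's first cluster to its end marker."""
    chain = []
    cluster = first_cluster
    while cluster != END_OF_CHAIN:
        # Guard against corrupt tables: bad indices or cycles.
        if not 0 <= cluster < len(fat) or len(chain) >= len(fat):
            raise ValueError("corrupt allocation chain")
        chain.append(cluster)
        cluster = fat[cluster]
    return chain
```

For example, with an 8-entry table in which cluster 1 already holds a one-cluster file, `allocate_chain(fat, 3)` links clusters 0, 2, and 3, and `cluster_chain(fat, 0)` walks them back in order. The table length is fixed when the list is created, which mirrors the "set up in advance" limitation.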

This is not an exhaustive list - I'm sure that other file systems use other data structures. However, I hope it helps give you a push in the right direction.

Hope this helps!

templatetypedef
  • 2
    Very useful post, thank you! I will research bit vectors then, and do some more research on other OSes. I heard that splay trees were troublesome! I am most familiar with B-trees, but I look forward to learning other data structures that will be useful for this kind of stuff! Thanks for your long answer :) – Bernice Jan 02 '13 at 18:11