7

I've been playing around with database programming lately, and I noticed something a little bit alarming.

I took a binary flat file saved in a proprietary, non-compressed format that holds several different types of records, built schemas to represent the same records, and uploaded the data into a Firebird database. The original flat file was about 7 MB. The database is over 70 MB!

I can understand that there's some overhead to describe the tables themselves, and I've got a few minimal indices (mostly PKs) and FKs on various tables, and all that is going to take up some space, but a factor of 10 just seems a little bit ridiculous. Does anyone have any ideas as to what could be bloating up this database so badly, and how I could bring the size down?

skaffman
  • 398,947
  • 96
  • 818
  • 769
Mason Wheeler
  • 82,511
  • 50
  • 270
  • 477

3 Answers3

1

Firebird fill pages in some factor not full. e.g. db page can contain 70% of data and 30% free space to speed up future record updates, deletes without moving to new db page.

CONFIGREVISIONSTORE (213)
    Primary pointer page: 572, Index root page: 573
    Data pages: 2122, data page slots: 2122, average fill: 82%
    Fill distribution:
         0 - 19% = 1
        20 - 39% = 0
        40 - 59% = 0
        60 - 79% = 79
        80 - 99% = 2042

The same is for indexes.

You can see how really db size is when you do backup and restore with option

-USE_ALL_SPACE

then database will be restored without this space preservation. You must know also that not only pages with data are allocated but also some pages are preallocated (empty) for future fast use without expensive disc allocation and fragmentation.

as "Peter G." say - database is much more then flat file and is optimized to speed up thinks.

and as "Harriv" say - you can get details about database file with gstat

use command like gstat - here are details about its output

Livius
  • 958
  • 1
  • 6
  • 19
1

From Firebird FAQ:

Many users wonder why they don't get their disk space back when they delete a lot of records from database.

The reason is that it is an expensive operation, it would require a lot of disk writes and memory - just like doing refragmentation of hard disk partition. The parts of database (pages) that were used by such data are marked as empty and Firebird will reuse them next time it needs to write new data.

If disk space is critical for you, you can get the space back by doing backup and then restore. Since you're doing the backup to restore right away, it's wise to use the "inhibit garbage collection" or "don't use garbage collection" switch (-G in isql), which will make backup go A LOT FASTER. Garbage collection is used to clean up your database, and as it is a maintenance task, it's often done together with backup (as backup has to go throught entire database anyway). However, you're soon going to ditch that database file, and there's no need to clean it up.

Noel
  • 600
  • 16
  • 37
  • 1
    Umm... this has nothing to do with my question. I'm not backing up and restoring anything, and I'm not deleting data from the database. I'm simply creating a database, creating tables, and filling them from data that originally came from a flat file. – Mason Wheeler May 11 '11 at 20:02
  • 1
    @Mason Wheeler - I think the point is that it **doesn't** free up the memory (ie. hard drive space) when it does an expensive operation. It just flags unused sectors to be overwritten later. So, while your DB file size is large, it may be because not all of that space is actually in use. – Richard Marskell - Drackir May 11 '11 at 20:49
  • 1
    Mason Wheeler: If you backup and then restore, is the database still over 70 MB? – Harriv May 11 '11 at 20:50
1

Gstat is the tool to examine table sizes etc, maybe it will give you some hints what's using space.

In addition, you may also have multiple snapshots or other garbage in database file, it depends on how you add data to the database. The database file never shrinks automatically, but backup/restore cycle gets rid of junk and empty space.

Harriv
  • 6,029
  • 6
  • 44
  • 76
  • Sounds interesting. Where do I obtain it? The manual seems to suggest that it should come with Firebird, but it's not in the zipfile I got from Sourceforge, and Googling for "gstat firebird download" turns up nothing useful. – Mason Wheeler May 11 '11 at 21:27
  • gbak , gstat etc are inside the C:\Program Files\Firebird\Firebird_2_5\bin folder. – Yiannis Mpourkelis May 12 '11 at 06:40
  • Yes, Gstat is included in the bin directory. If you're running Windows, you may want to use the installer, it will put everything in right place. Some tools (like DBWorkbench from Upscene) provide more user friendly interface to statistics. – Harriv May 12 '11 at 12:45