
I was following along with an example from a book that illustrates the Python shelve module, running it on macOS High Sierra.

As shown below, only two small tuples of short strings are stored in the shelf. Yet, as the very last line shows, the resulting file is 16 MB in size.

The file only gets that large when I run the example on macOS High Sierra with a Python version installed through Homebrew (either 3.6.4 or 2.7.14). If I run it on a Linux host, with the pre-installed Python (2.7.10), or with Python 3.6.4 installed through the official macOS installer, the resulting addresses file is only a few kilobytes in size, as others reported in the comments (thanks!).

 ~/tmp> rm addresses
 ~/tmp> python3
Python 3.6.4 (default, Jan  6 2018, 18:43:09)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin
[...]
>>> import shelve
>>> book = shelve.open("addresses")
>>> book['flintstone'] = ('fred', '555-1234', '1233 Bedrock Place')
>>> book['rubble'] = ('barney', '555-4321', '1235 Bedrock Place')
>>> book.close()
>>>
 ~/tmp> ll
total 32768
-rw-r--r--  1 moritz  staff    16M Jan 24 13:05 addresses
anothernode
  • It's only 16 **kB** on my system. Also Python 3.6.4, but on Arch Linux. It looks like `shelve` can use one of several implementations, which might account for the difference. – Thomas Jan 24 '18 at 12:43
  • Good to know, must be something with my setup, then! Strange... – anothernode Jan 24 '18 at 12:45
  • I bet your addresses shelve file contains more than just that. If I try to reproduce your problem, I get three files (bak, dat, dir), with about 650 bytes altogether. – L3viathan Jan 24 '18 at 12:46
  • Did you remove any file starting with `addresses` prior to executing the above code? – L3viathan Jan 24 '18 at 12:47
  • @L3viathan: Yes, I did, you can see it in the code I posted, the very first line says `rm addresses`. I just tried the exact same example on a Linux host and the resulting file is just 13K! – anothernode Jan 24 '18 at 12:48
  • What flags is `ll` using? – Emil Vikström Jan 24 '18 at 13:14
  • "Just 13K" is still pretty unbelievable. In my recent Python 3.5.3 on Windows 10, I also get a file as little as 650 bytes. (Most of it is zeroes; more than 400, between Fred and Barney.) – Jongware Jan 24 '18 at 13:16
  • @EmilVikström: `ls -lh`. @usr2564301: Well, yeah, I don't know, I'd still be happy with 13K instead of 16M... :) – anothernode Jan 24 '18 at 13:23
  • 16 kB on my system. Python 3.6.1 on macOS. – Emil Vikström Jan 24 '18 at 15:53
  • @EmilVikström: How did you install Python? I tried again with Python 2.7.14 installed through Homebrew, and the resulting file is also 16M large. Looks like it's something specific about the Homebrew version! – anothernode Jan 24 '18 at 16:13
  • Yup, it's definitely something about the Homebrew version. I installed Python 3.6.4 through the official installer and with that the shelf file also only gets to 16K. – anothernode Jan 24 '18 at 16:22
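
Since the comments suggest that `shelve` may sit on top of different dbm implementations, a quick way to compare setups is to ask Python which backend it picked and how large the resulting files are. The following is only a minimal sketch: the helper name `inspect_shelf` is hypothetical, and the backend names and file names it reports depend on how the interpreter was built.

```python
import dbm
import os
import shelve
import tempfile


def inspect_shelf(directory, name="addresses"):
    """Create the small example shelf and report backend and file sizes.

    `inspect_shelf` is a hypothetical helper, not part of the shelve module.
    """
    path = os.path.join(directory, name)
    with shelve.open(path) as book:
        book["flintstone"] = ("fred", "555-1234", "1233 Bedrock Place")
        book["rubble"] = ("barney", "555-4321", "1235 Bedrock Place")

    # dbm.whichdb guesses which dbm implementation wrote the file(s).
    backend = dbm.whichdb(path)

    # Some backends create several files (for example .dat/.dir/.bak or .db),
    # so collect everything that starts with the shelf name.
    sizes = {
        entry: os.path.getsize(os.path.join(directory, entry))
        for entry in sorted(os.listdir(directory))
        if entry.startswith(name)
    }
    return backend, sizes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        backend, sizes = inspect_shelf(tmp)
        print("backend:", backend)
        for entry, size in sizes.items():
            print(entry, size, "bytes")
```

Running this with each of the Python installations in question should show whether they use the same backend and where the size difference comes from.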

1 Answer

I can confirm that this behavior was introduced in gdbm 1.14. gdbm is the library that shelve uses here to access the database file.

With change 2e8a5e0, gdbm tries to extend the file size to match next_block_size. next_block_size is calculated as 4 * block_size, where block_size is the optimal I/O block size of the underlying filesystem, taken from the st_blksize field returned by stat(2). On my macOS 10.13.3, for a file on an APFS volume on an SSD, st_blksize is 4194304 bytes, so next_block_size is 16777216 bytes; therefore the initial db file size is 16 MB.

PS: I also checked an HFS+ filesystem on an HDD volume I had at hand; there the st_blksize value is 4096 bytes.
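
To check the numbers above on your own machine, you can read st_blksize from Python and multiply it by 4. This is only a rough sketch: the function name `predicted_initial_size` is made up for illustration, the factor 4 is taken from the description above rather than read from gdbm, and `st_blksize` is only available on Unix-like systems.

```python
import os


def predicted_initial_size(path="."):
    """Return (st_blksize, 4 * st_blksize) for the filesystem holding `path`.

    Hypothetical helper: the factor 4 mirrors the next_block_size
    calculation described in the answer; it is not queried from gdbm.
    """
    block_size = os.stat(path).st_blksize  # preferred I/O block size
    return block_size, 4 * block_size


if __name__ == "__main__":
    block_size, next_block_size = predicted_initial_size(".")
    print("st_blksize:    ", block_size, "bytes")
    print("4 * st_blksize:", next_block_size, "bytes")
```

With the APFS value quoted above, 4 * 4194304 gives 16777216 bytes (16 MB); with the HFS+ value of 4096 the same calculation gives 16384 bytes.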

georgexsh