Why are tape compression capacities always double that of the native?
5 Answers
It's just how tape drive marketing has shaped up over the years. The compressed capacities are quoted at a 2:1 ratio (which is, frankly, a lot more than the 15-30% compression I usually see on real data).
I plan from the native capacity, and any extra capacity I get is a "bonus", unless I've got a very clear idea of the nature of the data. Given the likelihood that you're going to run into pre-compressed data in a lot of environments (ZIP, JPG, AVI, MPG, MP3, etc), unless you're sure you've got only uncompressed data to back up, I'd recommend not planning on anything more than the native capacity.
I describe the 2:1 compression world to my customers as a "jolly, happy fantasy land where everyone stores only long runs of the letter 'A'".
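That "long runs of the letter 'A'" joke is easy to demonstrate. Here's a quick sketch using Python's zlib as a stand-in for the drive's hardware compressor (an assumption for illustration only; real drives use LZ-family schemes like SLDC, not DEFLATE):

```python
import os
import zlib

def ratio(data: bytes) -> float:
    """Native size divided by compressed size for a sample."""
    return len(data) / len(zlib.compress(data))

# The marketing fantasy: highly repetitive data
print(f"runs of 'A':  {ratio(b'A' * 1_000_000):7.1f}:1")

# Reality for pre-compressed files (random bytes stand in for ZIP/JPG/MP3)
print(f"random bytes: {ratio(os.urandom(1_000_000)):7.2f}:1")
```

Expect the first ratio to come out in the hundreds and the second to land at or slightly below 1:1, i.e. the "compressed" copy of incompressible data is no smaller, and can even be marginally bigger.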
Edit:
Don't forget that Office 2007 documents are already ZIP files (in disguise). If you're backing up a lot of those, don't assume you're going to get any compression.
Edit:
It's counter-intuitive, but you may also find a speed increase on over-the-wire backups by using "software compression" in your backup software rather than hardware compression. Backup Exec (at least version 11 and newer) does its "software compression" at the remote host when performing an over-the-wire backup. If you're backing up data that compresses well, you may be able to trade host CPU for network bandwidth and end up with higher throughput. Obviously, this is one of the things you need to tune and monitor, but generally speaking you should be monitoring your throughput anyway (some tape drives show throughput degradation caused by read/write head damage or tape wear, both of which can serve as an ad hoc predictive indicator of failure).
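The CPU-for-bandwidth trade can be illustrated with a toy model (zlib stands in for whatever algorithm the backup software uses, and the payload and function name are made up for the example):

```python
import zlib

def wire_bytes(payload: bytes, software_compression: bool) -> int:
    """Bytes that actually cross the network for one chunk of backup data."""
    return len(zlib.compress(payload)) if software_compression else len(payload)

payload = b"2009-11-12,ACME Corp,invoice,1042.50\n" * 30_000  # compressible data

print(wire_bytes(payload, software_compression=False),
      "bytes over the wire with hardware compression at the drive")
print(wire_bytes(payload, software_compression=True),
      "bytes over the wire when compressed at the remote host")
```

For compressible data the second number is a small fraction of the first; whether that wins overall depends on how much CPU the remote host can spare, which is exactly what you'd tune and monitor.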

Or database export files. Those compress nicely. But for normal user data, which constitutes the vast majority of my total data, 1.5:1 is doing VERY well. – sysadmin1138 Nov 12 '09 at 19:30
+1 sysadmin1138. I consistently get 1.5:1 with very little deviation. – Nic Nov 13 '09 at 05:48
Because it sounds more impressive than "somewhere between nothing and about (holds hands out) this much". Same reason hard drive manufacturers quote their drive sizes in millions of bytes -- marketing.
Less cynically, compression ratios are never fixed; they're always dependent on the data being compressed. Thus, if you want to quote any sort of "compressed capacity", you need to estimate. Presumably, based on an exhaustive survey of compression patterns in archived data (there's that cynicism again) one tape manufacturer or another worked out that a 2:1 ratio was somewhere in the right area, and just left it at that.
Personally, I would have been happier had they just stuck with "this is how many bits you can write to this tape", and left compression to someone else, but "monkey see, monkey do" applies to hardware manufacturers as much as four-year-olds, so once one manufacturer started writing "You can put 200MB (mumblecompressedmumble) on this tape!", they all had to follow suit...

The "K = 1,000, M = 1,000,000, G = ..." thing is not marketing, contrary to popular belief. It is convenient for that purpose, but that isn't why they do it. Network speeds and physical storage have just always been quoted in SI units. The 1024-based measurements came later as a convenience for programmers and certain varieties of chip engineer. – David Spillett Nov 12 '09 at 19:17
Compression embedded in the tape drive isn't necessarily bad, especially because it doesn't impact host system CPU usage. Tape backup management software could do a better job of reporting on the compression factor achieved, though. I think we're pretty safe to say that compression technology isn't going to regress, so if you know how well your data compresses it's perfectly fine to calculate the number of media needed, etc., based on the compressed size. *Finding* that compressed size, though, can be challenging. – Evan Anderson Nov 12 '09 at 19:46
@David: Like Bill, I'd like an authoritative citation for those claims, as I distinctly remember when disk capacities were quoted in mebibytes (gawd what a nasty word), and the stink when manufacturers started fiddling the numbers. Also compare the size of "old-skool" SCSI drives versus their "metric" equivalent sizes... – womble Nov 12 '09 at 19:52
@Evan: Being able to rely on a certain compression ratio really only works if you're always compressing the same data. A dream that will never be fulfilled where I work. – womble Nov 12 '09 at 19:52
@Bill This is a reasonable reference on the topic: http://www.iec.ch/zone/si/si_bytes.htm – Helvick Nov 12 '09 at 20:03
I've actually found that roughly double is generally what I've actually got, but again it depends on the data. – Maximus Minimus Nov 13 '09 at 00:03
The vendors are just assuming a compression ratio of 2:1. It's one of those things that should be a reasonable assumption, and it looks better than just quoting the raw capacity, but YMMV.

Not all tapes claim a 2:1 ratio. A Sony AIT-2 Turbo tape we have lying around claims 80GB/208GB.
According to the media info in Backup Exec we are currently getting between 1.9:1 and 2.1:1. While that may sound unusual, in my previous role we were getting between 1.8:1 and 2:1. That was in a completely different industry, with a different file mix, using ARCserve. Clearly the claimed ratio is achievable, but it really does depend on what you're backing up. If you have a significant percentage of compressed files (e.g. ZIP, RAR, JPG, etc.) you may even lose capacity by having compression enabled, due to the overhead.

I think all the vendors simply agreed to use 2:1 instead of trying to justify "our compression is 1.8:1 vs. 1.6:1 for the other guys." We use 1.5:1 when estimating how many tapes we need for something.
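That estimate is simple arithmetic; here's a sketch with hypothetical numbers (4 TB of data onto 800 GB native tapes):

```python
import math

data_gb = 4000        # hypothetical backup set size
native_gb = 800       # hypothetical native tape capacity
assumed_ratio = 1.5   # conservative planning ratio instead of the advertised 2:1

tapes = math.ceil(data_gb / (native_gb * assumed_ratio))
print(tapes)  # → 4
```

At the advertised 2:1 the same data would "fit" on 3 tapes, which is exactly the kind of optimism that leaves you one tape short at 2 a.m.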
