3

I have a server used for scientific computation; each user has their own virtual machine (Linux or Windows). The problem is that the reports generated by these computations take up a huge amount of storage when many users use the server.

I want to know whether there is any way to compress the outputs, not after a report is completed but as it is being written to disk.

Update 1: We use vSphere as the hypervisor and HDDs for storage.

Yashar

2 Answers

3

You could inline compress and/or deduplicate your storage. There are several ways to pull this off - some easier and some more effective.

To start, Linux and UNIX systems can use ZFS - a filesystem and volume manager that supports both compression and deduplication at the block level. Any shared or local storage system built on top of this can use these features, so something as simple as NFS on top of ZFS can do what you want with a shared pool across all VMs.
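A minimal sketch of what enabling this could look like, assuming a ZFS pool already exists (the pool name `tank` and dataset name `reports` are hypothetical):

```shell
# Create a dataset for reports with LZ4 compression enabled.
zfs create -o compression=lz4 tank/reports

# Optionally enable deduplication as well; the dedup table needs a lot of RAM.
zfs set dedup=on tank/reports

# Later, check how much space compression is actually saving.
zfs get compressratio tank/reports
```

Both settings only affect data written after they are turned on, so existing reports would need to be rewritten to benefit.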

Linux can also use btrfs, a multi-device filesystem that supports transparent inline compression. Deduplication on btrfs is done out-of-band, after the data has been written, with tools such as duperemove or bees, rather than inline. Otherwise the same ideas as above apply. Btrfs has lower hardware requirements than ZFS, but deduplication is still pretty intensive (you would be best served doing it across a larger shared dataset with either filesystem). Keep in mind that btrfs is a filesystem first and a volume manager second. It also does not offer block-based abstractions the way ZFS does, so it is purely a file-based system.
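As a rough sketch for btrfs, assuming a reasonably recent kernel with zstd support and an existing btrfs filesystem (the device `/dev/sdX1` and mount point `/mnt/reports` are hypothetical placeholders):

```shell
# Mount with transparent zstd compression for newly written data.
mount -o compress=zstd /dev/sdX1 /mnt/reports

# Recompress files that were already on the filesystem.
btrfs filesystem defragment -r -czstd /mnt/reports

# Deduplicate existing files after the fact with duperemove, if it is installed.
duperemove -dr /mnt/reports
```

`compress=lzo` or `compress=zlib` can be used instead of zstd on older kernels.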

There are several NAS/SAN offerings that include this functionality. Using one for shared storage would make good use of deduplication and compression while being a canned and supported product. FreeNAS is an example of one such system, which can use ZFS. Synology NAS devices also can and often do use btrfs.

Spooler
1

I agree with @SmallLoanOf1M, but another option is to compress the reports within the VM. That way the load does not land on the underlying hypervisor (e.g. Xen Dom0) and stays within the CPU limits of the VM.

If you can pipe the data, you could always do something like:

report-generator-program | lz4 > report-file.lz4

Or use gzip if you want slower but better compression, or xz if you want to melt your CPU.
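A slightly fuller sketch of the same idea, assuming the report program writes to stdout. The `generate_report` function is a hypothetical stand-in for the real program, and the script falls back to gzip where lz4 is not installed:

```shell
#!/bin/sh
# Hypothetical stand-in for the real report generator; emits repetitive text.
generate_report() {
    i=0
    while [ "$i" -lt 1000 ]; do
        echo "step $i: residual=0.000123 status=ok"
        i=$((i + 1))
    done
}

# Pick a compressor: lz4 if installed, otherwise gzip.
if command -v lz4 >/dev/null 2>&1; then
    compressor="lz4 -c"; decompressor="lz4 -dc"; ext="lz4"
else
    compressor="gzip -c"; decompressor="gzip -dc"; ext="gz"
fi

# Write into a scratch directory so the sketch does not touch real data.
outdir=$(mktemp -d)
out="$outdir/report-file.$ext"

# The report is compressed as it streams; the uncompressed form never hits the disk.
generate_report | $compressor > "$out"

# Reading it back later is another pipe.
$decompressor "$out" | tail -n 1
```

This only works when the program can write its report to stdout; if it insists on writing to a named file, a named pipe (`mkfifo`) in place of that file is one possible workaround.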

Brennen Smith
  • 1
    I have found `brotli` recently to be excellent, compressing (at quality level q1) about 4 times faster than gzip at the same compression ratio without totally cooking a turkey-on-die. q11 is melt-your-CPU territory again. – Spooler Mar 02 '18 at 22:09