
I have a virtual machine (Debian) running on a physical machine host. The virtual machine acts as a buffer for data that it frequently receives over the local network (the period for this data is 0.5s, so a fairly high throughput). Any data received is stored on the virtual machine and repeatedly forwarded to an external server over UDP. Once the external server acknowledges (over UDP) that it has received a data packet, the original data is deleted from the virtual machine and not sent to the external server again. The internet connection that connects the VM and the external server is unreliable, meaning it could be down for days at a time.

The physical machine that hosts the VM gets its power cut several times per day at random. There is no way to tell when this is about to happen and it is not possible to add a UPS, a battery, or a similar solution to the system.

Originally, the data was stored on a file-based HSQLDB database on the virtual machine. However, the frequent power cuts eventually cause the database script file to become corrupted (not at the file system level, i.e. it is readable, but HSQLDB can't make sense of it), which leads to my question:

How should data be stored in an environment where power cuts can and do happen frequently?

One option I can think of is using flat files, saving each packet of data as a file on the file system. This way, if a file is corrupted by a loss of power, it can be ignored and the rest of the data remains intact. This poses a few issues, however, mainly related to the amount of data likely to be stored on the virtual machine. At 0.5s between each piece of data, 1,728,000 files would be generated in 10 days. That at least means using a file system with an increased number of inodes to store this data (the current file system setup ran out of inodes at ~250,000 messages and 30% disk space used). It is also hard (though not impossible) to manage.

Are there any other options? Are there database engines that run on Debian that would not get corrupted by power cuts? Also, what file system should be used for this? ext3 is what is used at the moment.

The software that runs on the virtual machine is written using Java 6, so hopefully the solution would not be incompatible.

Sevas
    "The physical machine that hosts the VM gets its power cut several times per day at random. There is no way to tell when this is about to happen and it is not possible to add a UPS, a battery, or a similar solution to the system." I **really** want to know how that's possible. Is it in the International Space Station so it requires $20 million to send a UPS up or something? – ceejayoz Nov 07 '12 at 17:23
    Does the machine at least have a RAID controller with battery backed cache? – Zoredache Nov 07 '12 at 17:40
    We could recommend very interesting, creative and perhaps theoretically correct solutions to this problem. *However*, we don't know what hypervisor and hardware is running on the host, so there would be no guarantee that disk writes are really written, or written in the correct order… – pino42 Nov 07 '12 at 19:39
    There are 50+ installations with this configuration live and modifying them at the moment is not an option. Besides, adding a battery or UPS to the mix would introduce an additional maintenance cost in the form of periodic battery checks in addition to the material and labour costs involved in such a modification. I fully agree that trying to solve this using a software approach is NOT the right way to do it, but it seems that it is the only option at the moment. The goal is to minimize the possibility of inevitable corruption having an impact on the software if possible. – Sevas Nov 07 '12 at 19:43
    Non-constructive comment incoming: I'd say you must be working for the US Military (or other affiliated 'company') but your profile states you're in Dublin. – PenguinCoder Nov 07 '12 at 20:35
    If you ever have to deal with flat files, XFS filesystem will be happy to manage zillions of files with acceptable performance – PPC Nov 07 '12 at 20:50
    @Sevas Sounds like it's not your call, but I'd suggest that it's worthwhile to point out that 50 basic, cheap UPSs would cost $2500, and don't need maintenance (you replace them after a couple years when the batteries start to go). The cost of trying to solve this in software is going to be much higher than that, unless you know a bunch of coders who work for free. Might be helpful to getting management to solve this for $50/unit, instead of dozens or hundreds of skilled man-hours @ 3-figures an hour. – HopelessN00b Nov 07 '12 at 22:14
  • So is your software going to address the damage done to hardware as a result of the power spikes/outages? – Steve Nov 07 '12 at 23:32
    This sounds like either oil production remote locations on land or offshore drilling rigs or something. Interesting problem to have! – Mark Allen Nov 07 '12 at 23:52
    This actually sounds like a malicious program. The user does not know the "VM" is running on their computer. It is stealing data from across the whole network, then funneling it out through one connection to hide itself. The user "turns the computer off and on" randomly, so you can't just add a UPS. – Laurence Nov 08 '12 at 00:51
  • Hah, interesting theory TheShiftExchange, I wish it really was that and I could just forget the whole thing... – Sevas Nov 08 '12 at 08:32
  • "the goal is to minimize the possibility of inevitable corruption having an impact on the software if possible." If the goal is to minimize corruption, as @HopelessN00b said, I would assume too that the cost of UPSs (or a generator?) would greatly outweigh the cost of whatever other trickery you'd have to do to work around this pile of... crap. "Building, maintaining and dealing with the inevitable bugs are present will cost $$$$ in hours and man power. UPS/Generator will cost $$. You make the call, Mr Boss." – WernerCD Nov 08 '12 at 15:05
    I think you should look into preventing corruption of the host system and the VM images too, not just the database. – frozenkoi Nov 09 '12 at 22:33

4 Answers


Honestly, your best approach here is to either fix the power cuts or deploy a different system in a better location.

Yes, there are systems such as Redis which will store data in an append-only log for replay, but you risk corruption at lower levels; e.g., if your filesystem is scrambled, then the data on disk is potentially at risk.

I appreciate that any improvement would be useful to you, but the problem is really not one that can be solved given the scenario you've outlined.

    +1 The correct answer is "Don't do that" – Chris S Nov 07 '12 at 17:36
    +1 Eventually random power cuts will corrupt your filesystem. Electronics do weird unpredictable things as their power fails. – Grant Nov 07 '12 at 17:45
  • -1 (virtual -1). I think that such a system *must* be built on the assumption that power cuts happen from time to time. This assumption is a real-world fact that you have to deal with. – Igal Serban Nov 13 '12 at 13:12

Your approach can work. Let me suggest some enhancements to it. There was a question on Stack Overflow about atomically writing to a file. Essentially, you save each packet of data to a temporary file and then rename it to its final name. Renaming is an atomic operation that is safe from power failures. That way you are guaranteed that all the files in your final destination have been saved correctly, with no corruption.
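A minimal Java 6 sketch of this write-then-rename idea (class and file names here are illustrative, not from the original post; `FileDescriptor.sync()` is used to push the bytes to disk before the rename):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class AtomicWriter {

    // Write data to destDir/name "atomically": write a temp file, force it
    // to disk, then rename it into place. A power cut mid-write leaves only
    // a stale .tmp file, which recovery code can simply discard.
    public static void writeAtomically(File destDir, String name, byte[] data)
            throws IOException {
        File tmp = new File(destDir, name + ".tmp");
        FileOutputStream out = new FileOutputStream(tmp);
        try {
            out.write(data);
            out.getFD().sync(); // flush OS buffers to the physical device
        } finally {
            out.close();
        }
        File dest = new File(destDir, name);
        // renameTo is atomic on POSIX filesystems such as ext3;
        // note it can fail on some platforms if dest already exists.
        if (!tmp.renameTo(dest)) {
            throw new IOException("rename failed: " + tmp + " -> " + dest);
        }
    }

    public static void main(String[] args) throws IOException {
        writeAtomically(new File("."), "packet-000001.dat",
                "payload".getBytes("UTF-8"));
    }
}
```

On recovery, anything ending in `.tmp` is a half-written leftover and can be deleted; everything else is known-good.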

Then, to deal with the issue of having millions of files, you can cron a job that runs every hour or so, takes all the files older than an hour, combines them into one big file (again using atomic file operations, so that the job runs safely even during a power failure), and then deletes the old files. Kind of like log rotation. An hour's worth of data would be around 7,200 files, so at any point in time you shouldn't have more than 20,000 files on disk.
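The hourly consolidation step could look something like this in Java 6 (a sketch under assumptions: directory layout, file naming, and the one-hour cutoff are all illustrative):

```java
import java.io.*;
import java.util.*;

public class HourlyCompactor {

    // Combine all data files older than one hour into a single archive
    // file, using the same write-then-rename trick, then delete the
    // originals only after the archive is safely on disk.
    public static void compact(File dataDir, File archiveDir) throws IOException {
        long cutoff = System.currentTimeMillis() - 3600L * 1000L;
        File[] files = dataDir.listFiles();
        if (files == null) return;

        File tmp = new File(archiveDir, "batch-" + cutoff + ".tmp");
        FileOutputStream out = new FileOutputStream(tmp);
        List<File> merged = new ArrayList<File>();
        try {
            for (File f : files) {
                // skip directories and any leftover temp files
                if (f.isFile() && f.lastModified() < cutoff
                        && !f.getName().endsWith(".tmp")) {
                    copy(f, out);
                    merged.add(f);
                }
            }
            out.getFD().sync(); // force the archive to the physical disk
        } finally {
            out.close();
        }
        if (merged.isEmpty()) { // nothing old enough; discard the empty batch
            tmp.delete();
            return;
        }
        File dest = new File(archiveDir, "batch-" + cutoff + ".dat");
        if (!tmp.renameTo(dest)) {
            throw new IOException("rename failed: " + tmp);
        }
        // Only now is it safe to delete the originals.
        for (File f : merged) f.delete();
    }

    private static void copy(File src, OutputStream out) throws IOException {
        InputStream in = new FileInputStream(src);
        try {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) > 0) out.write(buf, 0, n);
        } finally {
            in.close();
        }
    }
}
```

If power is lost mid-compaction, the worst case is a stale `.tmp` batch plus the untouched originals, so no data is lost.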

Marwan Alsabbagh
    Not a bad answer, but the problem with it is in assuming that the write itself is an atomic operation, which it's not. So a power failure at the wrong time could still create data or FS corruption. Probably about the best option short of fixing the power, or plugging the thing into a UPS, though, so +1. – HopelessN00b Nov 07 '12 at 17:52
  • @HopelessN00b [renaming will be an atomic operation this is a POSIX requirement](http://stackoverflow.com/a/2333979/1699750) – Marwan Alsabbagh Nov 07 '12 at 18:15
  • Yes, renaming the file *once written* is an atomic operation. Writing the file in the first place, is not. – HopelessN00b Nov 07 '12 at 19:01
    @HopelessN00b It doesn't matter that the new file is half-written or corrupt. You have the old file which is in a good state. When you recover the system you destroy the half-written file. – DJClayworth Nov 07 '12 at 21:23
    @HopelessN00b Exactly! Only temporary files (in a temporary directory, say) could ever be half written. All the files in your final destination directory will always be non-corrupt and safely on disk. – Marwan Alsabbagh Nov 08 '12 at 04:14
  • @MarwanAlsabbagh How can software requirements help against power cuts? If I interrupt the renaming operation by cutting the power, the writing of the corresponding file entry in the file system will be corrupt. I cannot see how a software layer could avoid that. Do you have any implementation details? – Alberto Nov 09 '12 at 10:10

You install a UPS or a RAID card with a battery-backed write cache to the system, and for as little as $49.95, you accomplish what is simply impossible to accomplish in software alone.

Your claim that it's somehow not possible to hook this server up to a UPS or battery... is simply not believable.

HopelessN00b
    Bureaucratic stupidity is always believable. – Dan Is Fiddling By Firelight Nov 07 '12 at 21:16
    @DanNeely `My PHB won't let me hook this up to a UPS/battery` is a very different thing from `it is not possible to add a UPS, a battery, or a similar solution to the system.` Not to get too pedantic, but it's an important distinction because it changes the approach and solutions available. – HopelessN00b Nov 07 '12 at 21:36
  • Or, as mentioned elsewhere, the hijacked computer's user would be surprised if I asked to install a UPS. Situation is a bit unbelievable otherwise. Anyone would, within reason, accept a UPS over corrupted data given the proper business case. – WernerCD Nov 08 '12 at 15:03
  • @WernerCD I'd like you to meet our CIO. While I agree that hijacking someone's computer is a possible use-case for this, I can think of legitimate ones as well, so I'll give the guy the benefit of the doubt. Think about embedded systems and controllers, or like a Raspberry Pi - it can definitely be the case that the "computer" you're using is worth less than the $50 it would take to attach it to a UPS. – HopelessN00b Nov 08 '12 at 15:08
  • Even if the computer is worth less than the $50 UPS - it's the data on the computer that is actually worth something. Google was built on "worthless" computers. More important than the cost of the CPU is the cost of lost data, lost man-power (This programming adventure, data corruption chasing, bug tracking in the old system as well as this new part), lost customers value (Lost my data? Next company please.), etc. – WernerCD Nov 08 '12 at 16:29
  • @WernerCD I don't disagree, but a PHB or accountant or MBA (etc) won't necessarily see it that way, and will often make the wrong decision. I've had many side-jobs for that exact reason, in fact. `Oh, you saved $200 doing it this way? How nice for you. My services fixing it will cost $5,000.` – HopelessN00b Nov 08 '12 at 16:50
  • Which is where becoming a better "sales" person will help as an IT person. If you give the costs "$$$$ to do it the 'easy' way, because of X,Y,Z,Time,Customer Happiness Lost, etc... or $ to add some UPS's". Speak the bean counters language. – WernerCD Nov 08 '12 at 17:12

Mount the entire system read-only, except for a block device that stores all your data. Use that block device directly and implement your own data storage mechanism using that raw block device.
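A hedged sketch of what such a raw-device store might look like in Java: an append-only record log with a per-record checksum, so a half-written tail from a power cut is detected on recovery. The device path, record layout, and class name are all assumptions; `"rwd"` mode makes each write synchronous to the underlying device, and the same code works against a regular file for testing.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.zip.CRC32;

public class RawLog {
    private final RandomAccessFile dev;
    private long writePos;

    // devicePath would be something like "/dev/sdb1" in production
    // (an assumption); "rwd" means writes go synchronously to the device,
    // bypassing the OS page cache's delayed writeback.
    public RawLog(String devicePath, long startOffset) throws IOException {
        dev = new RandomAccessFile(devicePath, "rwd");
        writePos = startOffset;
    }

    // Record layout: [4-byte length][8-byte CRC32][payload].
    // The three writes are not atomic together, but on recovery you scan
    // forward and stop at the first length/CRC mismatch: everything before
    // it is intact, and the torn tail is simply overwritten.
    public void append(byte[] payload) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(payload);
        dev.seek(writePos);
        dev.writeInt(payload.length);
        dev.writeLong(crc.getValue());
        dev.write(payload);
        writePos += 4 + 8 + payload.length;
    }

    public void close() throws IOException {
        dev.close();
    }
}
```

As the comments below note, this only helps if the drive's own write cache (and NCQ reordering) is disabled, so that "written" really means "on the platter".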

MikeyB
    ...and invest in a battery-backed disk controller card, and make sure there's no write-cache on the disk, or you're still screwed. – voretaq7 Nov 07 '12 at 17:43
  • Came here to say they should be booting off of a Live-CD or equivalent ROM system, with some solid state storage used with the flat file solutions. – Mark Allen Nov 07 '12 at 23:55
    The write cache can be disabled. This approach is viable. An append-only storage mechanism is advised. Blocks are written atomically (I assume), so you can have two "pointer" blocks which point to the start and end of the section with new/todo data. The pointers are updated after writing/finishing the data. NCQ should be disabled too. – sleeplessnerd Mar 19 '13 at 17:59