
I'm building a backup station.

I'd like to be able to get an image of an HDD containing its partition table and all its partitions (not one partition at a time) so that restoring will be easy to do. I'd like to do this with several HDDs simultaneously, each connected via USB.

I tried partimage, but it seems to back up one partition at a time. I tried clonezilla, but it seems to need a client machine, and that's not what I need.

A Linux solution would be appreciated, but I could run it in a virtual machine if needed, although I'd like the process to be as automated as possible.

It must support NTFS, because most of the drives I'll back up use NTFS.

Note:
clonezilla seems interesting because, as I understand it, the client builds a package and sends it to the server over the network. I'd like to build the same easily-restorable package from an HDD plugged in via USB, without any extra machine or network involvement.

Andrea Ambu

4 Answers


Echoing womble's concern, I don't think you want the server trying to do big data-copy jobs in parallel.

Whether you are copying multiple partitions on one disk, which womble predicts would cause the disk heads to thrash and slow things down, or copying multiple disks over one USB bus, where each data stream generates interrupts that slow the others down, running the jobs in parallel will slow things down unless the transmission technology is specifically designed to handle high throughput from multiple clients.

For example, trying to ftp a single file over 10BaseT Ethernet, I could get over 1 MByte/sec (over 8 Mbit/sec) of throughput, but if I tried to ftp two files from different machines, even to the same server, the throughput would fall to about 150 KByte/sec per transfer (i.e., about 300 KByte/sec total, 2.4 Mbit/sec). (This is from memory, and it may have taken 3 transmitting stations to drop the 10BaseT throughput from ~90% to ~30%. Still, adding a second station did decrease overall efficiency, due to collisions.)

Besides, it's a catch-22: the protocols that can gracefully handle multiplexing high-throughput streams generally introduce high overhead. Classic examples of networking protocols that multiplex gracefully: Token Ring, FDDI, ATM. ATM, for instance, imposes a minimum of roughly 10% overhead on the transmission (5 of the 53 bytes in each cell are header).

Whether you use dd, partimage, or clonezilla, I would suggest a script that:

  1. checks whether there is a disk waiting to be copied
  2. copies one disk at a time
  3. loops back to step 1

Then, whenever you add a disk to the chain, it will get copied, much like BitTorrent clients that periodically check a watch folder for new torrents and then process them automatically.
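As a minimal sketch of that check/copy/loop idea, something like the following could work. The `./queue` and `./images` directories are illustrative stand-ins so the sketch is safe to run; on a real system the loop would glob something like `/dev/disk/by-id/usb-*` instead:

```shell
#!/bin/sh
# Sketch of the check/copy/loop script. QUEUE_DIR stands in for the
# real device namespace (e.g. /dev/disk/by-id/usb-*); BACKUP_DIR is
# where the compressed images land.
BACKUP_DIR=./images
QUEUE_DIR=./queue

mkdir -p "$BACKUP_DIR" "$QUEUE_DIR"

process_pending() {
    for dev in "$QUEUE_DIR"/*; do
        [ -e "$dev" ] || continue                       # glob matched nothing
        name=$(basename "$dev")
        [ -e "$BACKUP_DIR/$name.img.gz" ] && continue   # already copied
        # One disk at a time: dd streams the whole device, gzip compresses it.
        dd if="$dev" bs=4M 2>/dev/null | gzip -c > "$BACKUP_DIR/$name.img.gz"
    done
}

# In production this would run forever, e.g.:
#   while true; do process_pending; sleep 30; done
# A single pass is shown here so the sketch terminates.
process_pending
```

Because the loop skips any disk that already has an image, re-running it is harmless; newly plugged disks get picked up on the next pass.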

I would also suggest not using USB if you can avoid it, or at least getting multiple USB cards so that each disk can have its own USB bus.

pcapademic
  • My main issue for now is making an image that I'm able to restore. I'd like to do it in parallel, but I'm not sure which problem you're mentioning. I'll use different USB PCI cards; what's the problem? The disk where I'll save the images? Anyway, I'll give this `dd` a look. – Andrea Ambu Jan 06 '10 at 00:09
  • If you have two CPU-intensive applications, both will finish more quickly if you run them serially rather than in parallel. The context switching inherent in parallelism incurs overhead. The only way parallel takes less time is if all the resources in the chain are under-utilized by a large margin. – pcapademic Jan 06 '10 at 01:08
  • That's true but I guess the USB bottleneck should come first, and it would be ok for me. – Andrea Ambu Jan 06 '10 at 10:23
  • If you have two high-throughput data transfers occurring on one hub, both will finish more quickly if you do them serially rather than in parallel (i.e., the comment about CPU was an example, rather than a statement of concern about CPU usage in this situation; although, the more I think about it, gzip can be pretty CPU-intensive on large datasets). But if the system works well enough for you, great. – pcapademic Jan 06 '10 at 19:02

With regard to clonezilla: presumably the client and the server could reside on the same machine. Install the server, perhaps testing with a separate machine first, and then install the client and have it connect to localhost or to an IP assigned to the server.

pcapademic

No, you don't want to be able to do this. Reading one partition at a time is the right thing to do, because then the disk heads can just stream data off the disk. If you try to read multiple partitions on the same disk simultaneously, the drive will spend half its time whipping between different parts of the disk, and you won't get anywhere near the same data transfer speed, which means your backups will take longer.

If you want to take a single image of the entire hard drive, including the partition table, then just use dd to read the whole device into a file (pipe the output through gzip to avoid wasting lots of disk space storing the empty areas of the disk).
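A minimal sketch of that dd + gzip approach, using an ordinary file as a stand-in for the real device so it can be run safely (in real use, `if=` would point at the raw disk, e.g. `/dev/sdX`, which includes the partition table and every partition):

```shell
#!/bin/sh
# Sketch of whole-disk imaging with dd + gzip. A small file stands in
# for the raw device here; substitute the real disk node in practice.
DISK=./fake-disk
IMAGE=./fake-disk.img.gz

# Make a small stand-in "disk".
dd if=/dev/urandom of="$DISK" bs=64K count=16 2>/dev/null

# Back up: read the raw device end to end, compressing on the fly.
dd if="$DISK" bs=4M 2>/dev/null | gzip -c > "$IMAGE"

# Restore: decompress and write the image straight back to a disk.
gunzip -c "$IMAGE" | dd of=./restored-disk bs=4M 2>/dev/null

cmp -s "$DISK" ./restored-disk && echo "restore matches original"
```

Restoring to a disk at least as large as the original reproduces the partition table and all partitions in one step, which is exactly the easily-restorable package the question asks for.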

womble
  • Not to mention the extra wear from the heads thrashing about. – David Jan 05 '10 at 22:13
  • I don't want to back up more than one partition from the same HDD simultaneously. I want to get just *one* image of *one* hard drive, no matter how many partitions it has. Acronis does something similar. I want to be able to back up *different* hard drives at the same time, but that's not what you were talking about, right? – Andrea Ambu Jan 05 '10 at 22:22
  • No, I wasn't talking about multiple hard drives, although there are USB bus capacity limitations to worry about. Answer updated in light of this new interpretation of the question. – womble Jan 05 '10 at 22:35
  • Andrea's question seemed to have two parts: multiple USB-attached hard disks, and multiple partitions on each disk (e.g., the "not one partition at a time" comment). – pcapademic Jan 05 '10 at 22:54
  • @EricJLN: right. Think about notebook hard disks, for example; they almost always have more than one partition. – Andrea Ambu Jan 06 '10 at 10:27

Can you not just spawn multiple copies of dd?
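If you do go down this road, "spawning multiple copies of dd" is just backgrounding one pipeline per disk and waiting for them all, sketched here with two small files standing in for two separate USB disks:

```shell
#!/bin/sh
# Sketch: imaging two disks in parallel by backgrounding dd pipelines.
# diskA and diskB are stand-in files for two separate USB devices.
dd if=/dev/urandom of=./diskA bs=64K count=8 2>/dev/null
dd if=/dev/urandom of=./diskB bs=64K count=8 2>/dev/null

# Spawn one dd | gzip pipeline per disk; each runs in its own subshell.
( dd if=./diskA bs=4M 2>/dev/null | gzip -c > ./diskA.img.gz ) &
( dd if=./diskB bs=4M 2>/dev/null | gzip -c > ./diskB.img.gz ) &
wait   # block until both background pipelines finish
```

Whether this is actually faster than running the two copies serially depends on the bus contention concerns raised in the other answers; if both disks share one USB bus, serial will likely win.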

Dennis Williamson
  • I didn't find a way to use it to copy a whole disk, and it would be nice to do it *knowing* the filesystem so that no space is wasted. Anyway, I only saw how to copy a single partition; do you have any practical example of its usage for my purposes? – Andrea Ambu Jan 06 '10 at 12:02