
My department does data migrations when a client switches from another software vendor to us. Often we need the client to make a copy of their old data (whatever format that may be in) and send it to us.

The big challenge we face is that some systems have hundreds of thousands of files (mainly document/image repositories), and the whole collection can be tens of gigabytes in size. We grab a copy of their data at the start of the conversion process, then we grab a 2nd set right before the install, which could be months later.

We are looking for a better way to upload that 2nd set of data. Right now the main method is creating one large zip of the whole directory and FTPing it (via a write-only account) to our server. That of course has a large overhead, because a large portion of the files have likely not changed since the initial data grab.

Tools like rsync seem like the perfect solution, but from what I have researched there is no easy way to set up a "write only" account like we did with the FTP. Preventing un-authorized downloading of another client's data is a big concern from the higher ups.

In summary, what kind of tools should I be using given these requirements:

  • Does not allow downloading of other clients' data.
  • Minimal setup work to be performed client side. Usually instructions are given over the phone on how to upload the data and we don't have anyone on site. Also the person on the other end of the phone is often VERY unskilled in computer use.
  • Windows compatibility of the client. 95% of our clients are Windows users, the other 5% are on Macs, but Mac compatibility is not a major concern (though it would be a +).
  • Allows us to not send redundant files that have not changed.
  • Reliability on the client side. We have attempted to use BITS in the past to upload but we found that a fairly large number of XP era machines just could not get it to work correctly. Any client we use needs to work 99% of the time on any Windows machine XP SP2 or newer.
  • Minimal setup work server side per client. We do not want to have to create a separate user for every single client who uploads, but if we had to, it would not rule out a tool; it would only count as a -.
  • The server side program runs inside Windows. We are mainly a Windows/C# shop, so having to set up and manage a Linux box is not preferred. However, if the tool in question fills all the other requirements well, it would not be ruled out for not running on Windows.

Currently the frontrunner is rsync, combined with writing some sort of user manager that would create a separate user account on the rsync server per client, but I am sure there are other options I do not know about which could be better suited.

Scott Chamberlain
  • "We do not want to have to create a separate user for every single client who uploads" conflicts severely with "Preventing un-authorized downloading of another client's data is a big concern from the higher ups." If the person on the other end of the line is using a shared account, they can download or over-write anything that they have access to. – mfinni Jun 27 '13 at 17:21
  • @mfinni Overwriting is not a concern, and using some form of service that allows uploading but not download solves the shared account issue. – Scott Chamberlain Jun 27 '13 at 17:48

2 Answers


IMO, the solution you've already described is your best bet. Separate accounts for each client are the only way I can think of off-hand to satisfy the first requirement, and using SSH keys with rsync (as opposed to passwords) helps with that. Rsync itself addresses the other points.

John

After doing some more research since asking this question, I think I have found a solution that fits all the criteria: rdiff.

All that needs to be done is to write a wrapper application that acts as a self-extractor for rdiff.exe, cygwin1.dll, and cygpopt-0.dll, and then provides an easy-to-use GUI for the relevant rdiff operations.

After doing that, we just make a signature file before the initial zip is created and transferred, and keep a copy of it on file. When the 2nd transfer is due, we use that original signature file to generate a diff and upload only the diff to the FTP server.
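To make the signature/diff idea concrete, here is a minimal Python sketch of the same three steps (signature, delta, patch). It is a deliberately simplified stand-in, not rdiff's actual algorithm: it only matches blocks at fixed offsets and uses SHA-256 rather than a rolling checksum, and the names `make_signature`, `make_delta`, `apply_delta` and the tiny `BLOCK` size are all hypothetical choices for illustration. The point it shows is that the client side needs only the saved signature, not the old data, to work out what to upload.

```python
import hashlib

BLOCK = 4  # tiny block size for illustration only; real tools use far larger blocks


def make_signature(old: bytes, block: int = BLOCK) -> dict:
    """Map the hash of each fixed-size block of the old data to its block index."""
    sig = {}
    for i in range(0, len(old), block):
        digest = hashlib.sha256(old[i:i + block]).hexdigest()
        sig.setdefault(digest, i // block)  # keep the first index for duplicate blocks
    return sig


def make_delta(sig: dict, new: bytes, block: int = BLOCK) -> list:
    """Describe the new data as ('copy', block_index) or ('data', raw_bytes) steps.

    Only the signature is consulted here, never the old data itself.
    """
    delta = []
    for i in range(0, len(new), block):
        chunk = new[i:i + block]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest in sig:
            delta.append(("copy", sig[digest]))
        else:
            delta.append(("data", chunk))
    return delta


def apply_delta(old: bytes, delta: list, block: int = BLOCK) -> bytes:
    """Rebuild the new data on the server from its old copy plus the delta."""
    out = bytearray()
    for op, value in delta:
        if op == "copy":
            out += old[value * block:(value + 1) * block]
        else:
            out += value
    return bytes(out)


if __name__ == "__main__":
    old = b"AAAABBBBCCCCDDDD"   # first data grab
    new = b"AAAAXXXXCCCCDDDD"   # second grab, one block changed
    delta = make_delta(make_signature(old), new)
    print(delta)
    print(apply_delta(old, delta) == new)
```

In this sketch only the `('data', ...)` steps carry file content, so they are the part that would actually need to travel over the FTP upload; the server rebuilds the rest from the copy it already holds.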

Scott Chamberlain