The company I work for produces software that takes in gigabytes of data a day and outputs a (somewhat smaller) number of gigabytes a day. We have a significant infrastructure for managing data flow, and by significant I mean significantly bad. Whenever our software needs a new type of data, we write a script that pulls it from an HTTP or FTP source, drop the script on a server, and go. There are frequent complaints about over- or under-utilization across most of our servers, but we have no good way to balance the resources spent retrieving and writing out data.
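To give a sense of what "a script" means here, the pattern is roughly the sketch below. The feed names, URL, and paths are hypothetical, but every feed is its own copy-pasted variation of this:

```python
#!/usr/bin/env python3
"""Illustrative only: the shape of one of our ad-hoc pull scripts.
Hypothetical feed/URL/paths; each feed gets its own copy of this pattern."""
import shutil
import urllib.request

FEED_URL = "http://example.com/feeds/daily_prices.csv"   # hypothetical source
DROP_PATH = "/data/incoming/daily_prices.csv"             # hypothetical landing spot


def main():
    # Pull the file over HTTP and write it straight to the landing directory.
    # No retries, no shared logging, no central record that it ran at all --
    # which is exactly the problem.
    with urllib.request.urlopen(FEED_URL, timeout=60) as resp, open(DROP_PATH, "wb") as out:
        shutil.copyfileobj(resp, out)


if __name__ == "__main__":
    main()
```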
Is there a software package that is designed to manage data transfer externally (retrieval) as well as passing it around from server-to-server inside our network?
I'm looking for something that offers:
- A common base for simplifying the design of download scripts (see the sketch after this list for the kind of thing I mean).
- A logging/monitoring framework for knowing what's going on, what may not have been retrieved, etc. (Bonus points if the monitoring aspect can integrate with Nagios.)
- A single location for viewing the status of download tasks, etc. (i.e., a dashboard).
- An implementation meant to be run across multiple servers, with what ultimately amounts to multiple download-slave servers.
- Simplicity (relatively speaking). Again, we're trying to get off of a nightmare infrastructure.
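By "common base" I mean something along the lines of the following sketch: a shared skeleton that handles logging and status reporting, so each new feed only has to implement the actual retrieval. All class/feed names and paths here are hypothetical; this is the abstraction I'd like a package to provide, not something we have:

```python
#!/usr/bin/env python3
"""Sketch of the kind of common base I'm after (all names hypothetical)."""
import logging
import time
from abc import ABC, abstractmethod

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(name)s %(levelname)s %(message)s")


class DownloadTask(ABC):
    """Shared skeleton: subclasses implement only fetch(); the base handles
    logging and status reporting so every feed behaves the same way."""

    def __init__(self, name: str):
        self.name = name
        self.log = logging.getLogger(name)

    @abstractmethod
    def fetch(self) -> str:
        """Retrieve the data and return the path it was written to."""

    def report(self, status: str, detail: str = "") -> None:
        # In a real system this would push to a central store, a dashboard,
        # or a Nagios passive check; here it just logs.
        self.log.info("status=%s %s", status, detail)

    def run(self) -> None:
        started = time.time()
        self.report("started")
        try:
            path = self.fetch()
        except Exception as exc:  # report *any* failure before re-raising
            self.report("failed", repr(exc))
            raise
        self.report("ok", f"wrote {path} in {time.time() - started:.1f}s")


class PricesFeed(DownloadTask):
    """Example subclass: one feed == one small fetch() implementation."""

    def fetch(self) -> str:
        path = "/data/incoming/daily_prices.csv"   # hypothetical
        # ... HTTP/FTP retrieval would go here ...
        return path


if __name__ == "__main__":
    PricesFeed("daily_prices").run()
```

Is there an existing package (open source or commercial) that already provides this kind of framework, scheduling, and dashboard, rather than us building and maintaining it ourselves?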