I am about to launch my startup company, and we will be going live in a few weeks' time. We have really tight budgetary constraints, since we are bootstrapping and would prefer not to raise external capital.

I can't use shared hosting because I need more control of the server machine (for technical reasons, e.g. using proprietary extensions to PHP, Apache and the database layer), but I want to control costs and don't want to go the fully private server route until we have determined the market size etc. So the only real alternatives, AFAIK, are a virtual server and the cloud.

At the moment, cloud services seem a bit "vague" to me. My understanding is that they allow an entity to outsource its IT infrastructure, which in my mind (at least), is indistinguishable from what a hosting provider provides (at least from a functional point of view) - I would like to seek some clarification on exactly what the difference between the two is.

Back to my original question, my requirements are:

  1. IT infrastructure that can scale with growth
  2. Ability to have control of the machine (e.g. to install our internally developed libraries etc.)
  3. Backup software that is flexible and comprehensive enough (yet simple to use) to allow a secured backup strategy to be implemented. On this issue, I have always wondered where the backed-up data is actually stored (since the physical machines are remote, and one can't get access to any actual tapes etc. that the data is backed up onto). I would also like some advice and recommendations in this area. Regarding data size, I am expecting the dataset to grow by a few megabytes every day (initially, say, 10 MB a day; in about a year's time, possibly 50 MB a day).

As an aside, I have decided to deploy on a Debian server (most of my additional libraries etc were compiled and built on a Debian machine).

Mindful of all of the above, I would like some advice (and reason) as to which route to take. I would also like some advice on which backup software to use, from people who have walked a similar path.

user35402

4 Answers

"Cloud" services are a bit like web 2.0 -- you take an idea that isn't well understood but isn't new either, and you give it a catchy name, and suddenly everybody is talking about it.

Cloud services are based on the "virtualized data center" idea sold to us 10 years ago.

Cloud hosting is (usually) simply a virtual private server environment, except with the understanding that you may want to quickly provision additional hardware at a moment's notice. Amazon EC2, for example, simply gives you a slice of a server using Xen (albeit sometimes a fairly beefy slice if you're willing to pay for it) which you load with a VM image you store in S3.

The initial setup may be a bit daunting your first time through it, but when you're done, you can launch any number of identical instances within minutes from your browser. Another click, and the server disappears. It costs the same to run 5 instances for 1 hour as to run 1 instance for 5 hours. That's what they mean by "elastic". You can see how this has some significant implications in the area of scaling. You only pay for the hardware you use, and only when you use it. If you want, for example, you can run 5 servers during business hours and just 1 in the evening, and they don't give you any grief about constantly adding and removing hardware.

Remember, cloud services aren't necessary in order to scale.

You scale a cloud service by adding new hardware. You scale a dedicated service by adding new hardware. The mechanism involved, the steps you take, the planning you have to do ahead of time to parallelize your workflow -- it's all the same. Cloud services allow you to scale, and then un-scale, very quickly and cheaply, putting additional hardware wherever you need it. If you run a business the size of Amazon, this sort of thing is really the only way to go. They had been running a cloud network long before they started selling it as a service.

Performance on virtual private servers can be annoyingly inconsistent if you don't control the whole box.

If you share the hardware, you share the resources. If you share with yourself, that's no big deal. If you share with someone who's trying to run their own search engine, you may feel like you're getting less than you paid for.

Some cloud providers "simplify" the process by giving you less control.

Properly scaling with demand is actually not a simple problem. I mean, it's simple if you know how to do it, but if you've never done it, then you probably don't know how. Some of the smaller places try to differentiate themselves by taking over some of that complexity. You may not want that, or you may find it very helpful.

Reliability matters

All virtual servers run on normal server hardware. If the underlying computer catches fire, virtual servers die just as fast. However, most cloud providers also provide a SAN for your permanent storage. It's worth pointing out that, if properly managed, a SAN is dramatically more reliable than a single server hard drive, and can quickly be assigned to a different server should you have trouble with your current machine. It's also much more expensive per byte.

And so...

Dedicated is simplest, and definitely gives you the most resources for the money. It's also the least flexible. Traditional VPSes give you less, but should cost less.

And for backup: there's nothing wrong with rsync -- it's the basis for most backup tools. Add hard links for snapshots, and you have a real solution.
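
As a rough sketch of the rsync-plus-hard-links idea, assuming rsync is installed and using its `--link-dest` option: each run writes a new dated snapshot directory, and files unchanged since the previous snapshot are hard-linked rather than copied. The directory layout and the helper names (`build_snapshot_command`, `take_snapshot`) are made up for illustration.

```python
import datetime
import subprocess
from pathlib import Path


def build_snapshot_command(source, snapshot_root, today=None):
    """Return (rsync argv, destination path) for a hard-linked snapshot."""
    today = today or datetime.date.today()
    root = Path(snapshot_root)
    dest = root / today.isoformat()
    # Earlier snapshots are the dated directories already present under root.
    previous = []
    if root.is_dir():
        previous = sorted(
            p for p in root.glob("????-??-??") if p.is_dir() and p != dest
        )
    cmd = ["rsync", "-a", "--delete"]
    if previous:
        # Unchanged files become hard links into the newest earlier snapshot.
        cmd.append(f"--link-dest={previous[-1].resolve()}")
    # A trailing slash on the source copies its contents, not the directory itself.
    cmd += [str(source).rstrip("/") + "/", str(dest)]
    return cmd, dest


def take_snapshot(source, snapshot_root):
    """Run rsync for today's snapshot; raises CalledProcessError on failure."""
    cmd, dest = build_snapshot_command(source, snapshot_root)
    dest.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(cmd, check=True)
    return dest
```

Something like `take_snapshot("/var/www", "/backup/snapshots")` from a daily cron job would then give one browsable directory per day, with old snapshots removable by simply deleting their directories. Hard links only work within one filesystem, so the snapshots should live on a single backup volume.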

tylerl

I'm assuming you are referring to Amazon's cloud services (EC2, S3, EBS, etc.) or similar, as "cloud" is a big, vague term that can mean pretty much any service on the internet.

In regards to Amazon over a standard VPS, the big differences are:

Modular Pricing
With Amazon your pricing is very much about usage. The big item is the per-hour charge for having the machine turned on, but the pricing also breaks out allocated storage, data transfer, static IPs, etc. Most VPSes are sold at a packaged monthly price: they specify what kind of machine you get, the storage amount, included bandwidth, etc. for a fixed monthly fee. This is great for anticipating costs, but you may end up paying for more than you actually use.

A side benefit of the way they have broken up the pricing, is the flexibility of configuration also. Some apps need tons of data storage, but very little processing power, others are just the opposite. You don't have to go up to a bigger package to cover the one requirement and be paying for a bunch of extras you don't need.

Hardware on the fly
Amazon has created an API and tools for its various services so that you can control them programmatically. If you design your application properly for horizontal scaling (shared session management, load balancers, etc.), you can run on the minimal box by default and then add more boxes into the mix based on time of day, system load, concurrent users, etc.
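
A minimal sketch of that kind of scaling rule, and not Amazon's API: it only decides how many boxes should be running, given the time and the current load. The `desired_instances` name, the per-box capacity, the peak window and the limits are all hypothetical; actually starting or stopping instances would go through whatever tools your provider offers.

```python
import math


def desired_instances(hour, weekday, concurrent_users,
                      users_per_box=200, peak_hours=(10, 16),
                      min_boxes=1, max_boxes=5):
    """Decide how many identical boxes should be running right now.

    hour is 0-23 and weekday is 0 (Monday) to 6 (Sunday).
    Every threshold here is an illustrative assumption, not a measured value.
    """
    # Enough boxes to carry the current load.
    needed = math.ceil(concurrent_users / users_per_box)
    # Keep one spare box warm during the expected weekday peak.
    if weekday < 5 and peak_hours[0] <= hour < peak_hours[1]:
        needed += 1
    # Never drop below the baseline or exceed the budget cap.
    return max(min_boxes, min(max_boxes, needed))
```

A scheduled job could call this every few minutes and compare the result with the number of boxes currently running. The cap matters as much as the rule itself, since it bounds the hourly bill if the load figure is ever wrong.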

This is a great way to save hosting costs, depending on the usage patterns of your audience. For example, let's say your load is concentrated between 10am and 4pm, Monday to Friday (6 hours each weekday), and the rest of the time access is light and sporadic.

If you need a large box to handle those 6 hours and are running it 24/7 you'd be spending $0.34/hr * 24 hrs * 30 days = about $245/mo (with some additional costs for storage, bandwidth, etc but the hourly is the bulk of the cost).

Instead, if you ran a small box 24/7 and had a large one come in to help out during those 6 hours, you'd be talking $0.085/hr (small) * 24 hrs * 30 days = about $60, plus $0.34/hr (large) * 6 hrs * 20 weekdays = about $40, for a total of about $100, less than half the cost of running the large box 24/7.
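
The same arithmetic as a short script, using only the hourly rates quoted in this answer (they will be out of date, so treat them as placeholders). Storage and bandwidth charges are left out, as in the estimate above.

```python
SMALL_RATE = 0.085  # $/hour for the small box, figure quoted in this answer
LARGE_RATE = 0.34   # $/hour for the large box, figure quoted in this answer


def monthly_cost(rate_per_hour, hours_per_day, days_per_month):
    """Instance-hours for the month multiplied by the hourly rate."""
    return rate_per_hour * hours_per_day * days_per_month


# Option 1: one large box running around the clock.
large_always_on = monthly_cost(LARGE_RATE, 24, 30)

# Option 2: a small box around the clock, plus a large box
# for the 6 peak hours on each of roughly 20 weekdays.
small_plus_peak = monthly_cost(SMALL_RATE, 24, 30) + monthly_cost(LARGE_RATE, 6, 20)

print(f"Large 24/7:         ${large_always_on:.2f}")
print(f"Small + peak large: ${small_plus_peak:.2f}")
print(f"Ratio:              {small_plus_peak / large_always_on:.0%}")
```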

Environment
They are all VPSes (Amazon uses Xen as its hypervisor). Amazon has just created an interface for controlling the services, running things on the fly, modular pricing, etc., but the machines are still just virtual machines. There is no difference to you as far as configuring Apache, maintaining patches, etc. is concerned. Be it a VPS, Amazon, Rackspace's cloud, etc., you will still SSH onto the box and be in a full Linux environment.

ManiacZX

1: needs cloud.

2: needs physical machine or virtual machine

3: can run with anything. Have fun paying for it, though, except with a separate dedicated physical server.

Depending on your needs, I would get a powerful machine and run virtualization on it. That allows you to have a pretty big layout (like VMs for development etc.), and the costs are lower than with the cloud. You can always switch later.

TomTom
  • I disagree slightly with your first point. It sounds like he is using some kind of LAMP stack, in which case, if his architecture is shared-nothing, then scaling horizontally with VPSes is pretty easy. – Mark May 07 '10 at 08:25
  • The point on 1 is that only the cloud allows you to change the amount of power you have FAST, up and down. VPSes normally require longer commitments. You cannot scale up for 4 hours while you are mentioned on TV ;) That is pretty much the ONLY advantage clouds have. Financially they are stupid for a base load: way too expensive when run for a complete month. But scaling they do ;) Fast up and fast down. – TomTom May 07 '10 at 08:56

Cloud services are just a modern name for what many companies have been providing for several years. They try to abstract the services they provide so you don't feel like you're dealing with physical infrastructure. For example, Amazon's EC2 ('Elastic Compute Cloud') allows you to boot a number of virtual private servers via their API, instead of ordering a new VPS through a salesperson. They also charge by the hour instead of by a monthly or yearly contract. This allows you to programmatically manage your resources in a very flexible way. For example, you could commission more servers when load is high or when you need a test server, or boot 100 servers when you need to crunch numbers and decommission them the very same day to manage cost. Cloud storage (e.g. Amazon S3) does the same for storage. The flexibility of dealing with infrastructure as if it were a consumable resource is what makes 'the cloud' so appealing.

To answer your questions:

  1. It would be easiest to start using Amazon EC2 or Rackspace Cloud Servers, because if you can set up your application to be flexible, using a VPS provider that can commission more or faster servers and only charges for what you use, it makes it easier to control your cost and still provide a good service when your popularity grows. If you don't need that kind of flexibility you could try Slicehost, a regular VPS hoster.
  2. I would definitely go with a virtual machine for its flexibility, as described above. A physical machine would be recommended if you have high I/O needs or cannot accept the slight overhead of a virtual machine.
  3. Most providers can provide some sort of backup for their servers. You can also provide your own off-site backup. I'm personally quite fond of Jungle Disk Server Edition, which is affordable, secure and uses Amazon S3 or Rackspace Cloud Files as storage providers. In effect you only pay for what you use, so your cost will only grow when your dataset or retention period grows. There are many offerings, so be sure to shop around and test their claims. Backups are only useful when proven.
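
To get a feel for how pay-per-use backup storage would grow, here is a back-of-the-envelope sketch using the figures from the question (about 10 MB of new data per day now, perhaps 50 MB per day a year on). The linear ramp between those two rates is an assumption, and the function name is made up.

```python
def cumulative_megabytes(days, start_rate_mb=10.0, end_rate_mb=50.0, ramp_days=365):
    """Total data added after `days`, with the daily rate rising linearly."""
    total = 0.0
    for day in range(days):
        # Fraction of the ramp completed, capped once the end rate is reached.
        progress = min(day / ramp_days, 1.0)
        total += start_rate_mb + (end_rate_mb - start_rate_mb) * progress
    return total


first_year_gb = cumulative_megabytes(365) / 1024
print(f"Data added in the first year: about {first_year_gb:.1f} GB")
```

This counts a single copy of the data only. Keeping several full copies multiplies the figure by the number of copies retained, so the retention period deserves as much thought as the dataset size.
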
Martijn Heemels