
We recently moved from a traditional Data Center to cloud computing on AWS. We are developing a product in partnership with another company, and we need to create a database server for the product we'll release.

I have been using Amazon Web Services for the past 3 years, but this is the first time I received a spec with this very specific hardware configuration.

I know there are trade-offs and that real hardware will always be faster than virtual machines. Knowing that beforehand, what would you recommend?

1) Amazon EC2?
2) Amazon RDS?
3) Something else?
4) Forget it baby, stick to the real hardware

Here are the hardware requirements:

This server will be focused on I/O and MySQL for the statistics, and on memory size and disk space for image hosting.

Server 1

I/O: The main workload on this server will be I/O processing. FusionIO cards have proven extremely efficient; they are currently the best you can get in this domain.
  o Fusion ioDrive2 MLC 365GB (http://www.fusionio.com/load/-media-/1m66wu/docsLibrary/FIO_ioDrive2_Datasheet.pdf)

CPU: MySQL will use fewer CPU cores than Apache, but it will use them very hard. The E7 family has 30 MB of L3 cache, which boosts performance.
  o 1x Intel E7-2870 will be OK.

Storage: SAS will be good enough in terms of performance, especially considering the space required.
  o RAID 10 of 4 x SAS 10k or 15k RPM drives, for a total usable space of 512 GB.

Memory:
  o 64 GB minimum is required on this server, considering the size of the statistics database. Warning: the statistics database will grow quickly; if possible, consider starting with 128 GB right away, as it will help.
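The storage line implies a per-disk size that is worth spelling out: RAID 10 mirrors disks in pairs and stripes across the pairs, so only half of the raw capacity is usable. A minimal Python sketch of that arithmetic (the function names and the 300 GB drive size are illustrative assumptions; filesystem and formatting overhead are ignored):

```python
def _check_raid10(disk_count: int) -> None:
    # RAID 10 stripes across mirrored pairs, so it needs an even number of disks, at least 4.
    if disk_count < 4 or disk_count % 2 != 0:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")


def raid10_usable_gb(disk_count: int, disk_size_gb: float) -> float:
    """Usable capacity: every block is stored twice, so half the raw space is usable."""
    _check_raid10(disk_count)
    return disk_count * disk_size_gb / 2


def raid10_min_disk_gb(disk_count: int, usable_target_gb: float) -> float:
    """Smallest per-disk size that reaches the usable target."""
    _check_raid10(disk_count)
    return usable_target_gb * 2 / disk_count


if __name__ == "__main__":
    # The spec: 4 disks, 512 GB usable -> each disk must hold at least 256 GB.
    print(raid10_min_disk_gb(4, 512))  # 256.0
    # Hypothetical 300 GB drives would give 600 GB usable.
    print(raid10_usable_gb(4, 300))    # 600.0
```

In other words, four drives of at least 256 GB each are needed to reach 512 GB usable.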

Server 2

I/O: The main workload on this server will be I/O processing. FusionIO cards have proven extremely efficient; they are currently the best you can get in this domain.
  o Fusion ioDrive2 MLC 365GB (http://www.fusionio.com/load/-media-/1m66wu/docsLibrary/FIO_ioDrive2_Datasheet.pdf)

CPU: MySQL will use fewer CPU cores than Apache, but it will use them very hard. The E7 family has 30 MB of L3 cache, which boosts performance.
  o 1x Intel E7-2870 will be OK.

Storage: SAS will be good enough in terms of performance, especially considering the space required.
  o RAID 10 of 4 x SAS 10k or 15k RPM drives, for a total usable space of 512 GB.

Memory:
  o 64 GB minimum is required on this server, considering the size of the statistics database. Warning: the statistics database will grow quickly; if possible, consider starting with 128 GB right away, as it will help.

Thanks in advance.

Best,

rhossi
  • You'll never get FusionIO-level performance out of AWS... – ceejayoz Mar 28 '12 at 19:51
  • I think he wants to know which Amazon setup would deliver results closest to the self-hosted solution. – armandomiani Mar 28 '12 at 19:59
  • Exactly. Just an analogy... If I ask for a sports car, someone might suggest a Lamborghini Reventon. But today I deal with Porsche. I know Porsche doesn't have any sports car that delivers the same performance as a Lamborghini, but by the same token its cars cost much less than a Lamborghini; it is a trade-off. The underlying question is: even knowing I won't find Lamborghini performance in any Porsche, which Porsche will bring me some adventure and make me happy? – rhossi Mar 28 '12 at 20:07

3 Answers


To get maximum performance from a cloud solution, you need to architect for the cloud. If you are worried about how many IOPS you can get from a service that's designed to be fully scalable (both up and down), you are thinking about virtual machines, not cloud computing. As for the number of IOPS available, I think this article on EBS explains the issues to consider.

Jim B

The problems:

  • The two servers you have listed above are absolutely identical.
  • You talk about FusionIO but you also talk about running MySQL and Apache on the same box.
  • You don't mention whether the Apache files or the MySQL database (or parts of it such as the ib_logfile) will be on the FusionIO drives.

The misconception:

It's not necessarily true that "real hardware will always be faster than virtual machines". It is true that on the same hardware the same application will perform better for not being in a virtual machine but since you don't have access to Amazon's hardware, that comparison is moot.

The point about the cloud is that it scales horizontally, so if you can serve 100 simultaneous visitors with one server, you can serve 1000 simultaneous visitors with 10 servers and each visitor receives the same speed of response, no matter how many of them you have.
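The 100-visitors-per-server example boils down to a ceiling division. A minimal sketch, assuming load spreads evenly across identical servers and per-server capacity does not drop as servers are added (a shared bottleneck such as a single database breaks that assumption); the function name is hypothetical:

```python
def servers_needed(concurrent_visitors: int, visitors_per_server: int) -> int:
    """Identical servers needed to serve the given load, rounded up.

    Assumes perfect horizontal scaling: load is spread evenly and each
    server keeps its capacity no matter how many servers there are.
    """
    if visitors_per_server <= 0:
        raise ValueError("visitors_per_server must be positive")
    if concurrent_visitors < 0:
        raise ValueError("concurrent_visitors must not be negative")
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-concurrent_visitors // visitors_per_server)


if __name__ == "__main__":
    print(servers_needed(100, 100))   # 1
    print(servers_needed(1000, 100))  # 10
    print(servers_needed(1001, 100))  # 11
```

Rounding up matters: one visitor over a multiple of the per-server capacity costs a whole extra server.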

The cloud:

There are a few key differences with cloud providers compared to colocation. If you are able to take advantage of them, they will make hosting in the cloud a clear winner.

  1. You can spin instances up and down very quickly. If your traffic is very bursty (say, you run a ticket sales website), you can very easily clone your web tier, database tier and/or storage tier out to hundreds of virtual machines an hour before the Justin Bieber tickets go on sale and shut them all down an hour afterwards to save money. Hardware-based solutions will usually take weeks to increase your capacity, and they continue costing money when they aren't being fully utilised.
  2. The up-front cost can be much lower. The hardware you mention probably costs tens of thousands of dollars in addition to your other hosting costs. My Amazon server costs me about $15 per month, and yet I could easily scale it up to a much beefier virtual machine and scale it out to dozens of load-balanced instances with an hour's notice.
  3. They do a lot of the work for you. Amazon have other services, such as DynamoDB, which automatically scale out or in to match the workload or storage requirements you give them. They run on SSDs for speed and are replicated to multiple places, giving you redundancy and availability.
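Point 1 is essentially arithmetic on instance-hours. A sketch with made-up numbers (a baseline of 2 instances, a peak of 100, a 3-hour burst, a 30-day month), counting instance-hours rather than dollars because prices vary by instance type and change over time; all names here are hypothetical:

```python
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month


def elastic_instance_hours(baseline: int, peak: int, burst_hours: int,
                           hours: int = HOURS_PER_MONTH) -> int:
    """Run the baseline all month; add the extra instances only during the burst."""
    if peak < baseline:
        raise ValueError("peak must be >= baseline")
    return baseline * hours + (peak - baseline) * burst_hours


def fixed_instance_hours(peak: int, hours: int = HOURS_PER_MONTH) -> int:
    """Provision for peak load and keep every machine running all month."""
    return peak * hours


if __name__ == "__main__":
    elastic = elastic_instance_hours(baseline=2, peak=100, burst_hours=3)
    fixed = fixed_instance_hours(peak=100)
    print(elastic, fixed)  # 1734 72000
```

The gap only exists because the traffic is bursty; if load sits near peak most of the month, the two figures converge.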

That said, your application has to be capable of scaling horizontally. You can't simply throw it into the cloud and expect it to scale forever. For instance, default PHP sessions have two problems:

  1. They are stored on a local disk, meaning you either need to use sticky sessions or a shared disk, which will be a bottleneck.
  2. They are opened with flock() which is an exclusive, blocking file lock. Only one PHP process can be using a session file at a time. This can be a serious problem when you start firing off lots of AJAX calls.
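To see why an exclusive flock() serializes requests, here is a small Python sketch of the same lock semantics. It is Unix-only, calls Python's `fcntl.flock` directly rather than anything in PHP, and uses a made-up session file name; two separate opens of one file stand in for two requests carrying the same session ID:

```python
import fcntl
import os
import tempfile
from typing import IO, List, Optional


def try_exclusive_lock(path: str) -> Optional[IO[str]]:
    """Open the file and try to take an exclusive flock() without waiting.

    Returns the open file on success, or None if another open of the same
    file already holds the lock (a blocking handler would wait here instead).
    """
    f = open(path, "a+")
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f


def simulate_two_requests() -> List[str]:
    events = []
    with tempfile.TemporaryDirectory() as tmp:
        session_file = os.path.join(tmp, "sess_example")  # hypothetical session file
        first = try_exclusive_lock(session_file)  # request 1 opens the session
        events.append("request 1: locked" if first is not None else "request 1: would block")
        second = try_exclusive_lock(session_file)  # request 2 (e.g. an AJAX call), same session
        events.append("request 2: locked" if second is not None else "request 2: would block")
        if second is not None:
            second.close()
        if first is not None:
            first.close()  # closing the file releases request 1's lock
        second = try_exclusive_lock(session_file)  # request 2 retries
        events.append("request 2: locked" if second is not None else "request 2: would block")
        if second is not None:
            second.close()
    return events


if __name__ == "__main__":
    for event in simulate_two_requests():
        print(event)
```

With a blocking lock, as PHP's default file handler uses, the second request simply waits rather than failing, so AJAX calls that share a session run one after another.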

This is only a single example but applications that have not been written with horizontal scaling in mind are usually full of exclusive resources like that one.

If you are running a distributed database (which Amazon's database services are) then your app also needs to be able to deal with the trade-offs inherent in the CAP theorem. This states that you can get two of the three aspects: Consistency, Availability, Partition tolerance. You will need to know which of the three you don't have and have your app compensate for it.

If your application suits hardware, go for hardware. If it suits the cloud, go for the cloud.

Note: I have used Amazon as an example here, but there are other cloud hosting providers with a similar ability to spin instances up and down very quickly and charge you only for what you actually use.

Ladadadada
  • Q: The two servers you have listed above are absolutely identical. A: True. Q: You talk about FusionIO but you also talk about running MySQL and Apache on the same box. A: I know. Unfortunately, the web software requires that. Q: You don't mention whether the Apache files or the MySQL database (or parts of it such as the ib_logfile) will be on the FusionIO drives. A: The Fusion drives are for the MySQL database. – rhossi Mar 29 '12 at 00:55
  • Note that an application sometimes performs better on a virtualized platform than directly on the same hardware, due to the way virtualization handles certain CPU and networking aspects, so it's not true to say that the same hardware always performs better without virtualization; it depends on the application, operating system, and hardware. As for point 2, upfront costs can be lower; however, high-usage scenarios can become much more expensive very quickly. What is a cost saving is not having to scale up (presuming the app is architected for the cloud). – Jim B Mar 29 '12 at 19:23
  • Also, your explanation of the real-world application of the CAP theorem is off: it's not "choose 2", it's a choice between CA/CP and AP, and it ignores the fact that availability is worthless if time to service is too large. Read Dan Abadi's write-up on current availability theorems at http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and-yahoos-little.html – Jim B Mar 29 '12 at 19:28

Another point that hasn't been mentioned is the ability to scale the database's capacity in the future. The hardware requirements aren't static; they will change over time as the application grows. @rhossi, for each of the hardware options you should consider your future scalability path. In brief:

  1. Amazon EC2: to scale up, you can upgrade to a bigger instance [easy]. To scale out, you'll need to install MySQL on additional instances and define clustering between them [hard, and network performance is suboptimal unless you request special cluster instances].
  2. Amazon RDS: to scale up, the same as EC2 [easy]. To scale out, add more RDS instances and define clustering [hard].
  3. Something else: you might look into Xeround's cloud database solution on EC2, which has auto-scaling and does not require clustering. However, I think it's limited in storage capacity, so it may not be suitable. There may be additional options that offer auto-scaling, and I think they are worth investigating.
  4. Real hardware: scaling up requires manually upgrading the hardware [small hassle + upfront cost]. Scaling out requires more machines and manually defining clustering [medium-hard], but this will be easier to achieve than in the cloud, because IPs are static and networking performance is better.

HTH

Lena Weber