
I'm going to try to make this as brief as possible while covering all the points - I currently work as a PHP/MySQL developer, and a friend and I have a mobile app idea that we're about to start developing.

I'm not saying it's going to be fantastic, but if it catches on, we're going to have a LOT of data.

For example, we'd have "clients," for lack of a better term, each of whom would have anywhere from 100 to 250,000 "products" listed. Assuming the best, we could have hundreds of clients.

The client would edit data through a web interface; the mobile app would just make calls to the web server, which would return JSON (probably).
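Just to make this concrete, here's roughly how I picture one of those endpoints working; the `products` table, its columns, and the connection details are placeholders I've made up:

```php
<?php
// Rough sketch of a read-only JSON endpoint; all names here are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

// Fetch one client's products and hand them back as JSON.
$stmt = $pdo->prepare(
    'SELECT id, product_name, price FROM products WHERE client_id = ? LIMIT 100'
);
$stmt->execute([(int) ($_GET['client_id'] ?? 0)]);

header('Content-Type: application/json');
echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));
```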

I'm a lowly CMS-developing kinda guy, so I'm not sure how to handle this. My question is more or less about performance: the most I've ever seen in a MySQL table was 340k rows, and it was already sort of slow (granted, it wasn't the best server either).

I just can't fathom a table with 40 million rows (and potential to continually grow) running well.

My plan was to have a "core" database that held the name of each client's "real" database: when a user came in and tried to access a client's data, the app would hit the core database first to figure out which database to pull the information from. I'm not concerned with data separation or data security (it's not private information).
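In PHP terms, I picture the lookup going something like this (the `core` schema, `clients` table, and column names are all invented for the example):

```php
<?php
// Sketch of the "core" database routing idea; every name is a placeholder.
$core = new PDO('mysql:host=localhost;dbname=core', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

// Step 1: ask the core database which database holds this client's data.
$stmt = $core->prepare('SELECT db_name FROM clients WHERE id = ?');
$stmt->execute([(int) ($_GET['client_id'] ?? 0)]);
$dbName = $stmt->fetchColumn();

// Step 2: open a second connection to the client's own database.
$clientDb = new PDO("mysql:host=localhost;dbname=$dbName", 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);
```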

Josh Toth
  • Reading between the lines, some form of [NoSQL](http://en.wikipedia.org/wiki/NoSQL) database might serve you better for storing this "product" information. – DaveRandom Jun 06 '12 at 19:17
  • 40 million rows is nothing for a **properly set up** database. – deceze Jun 06 '12 at 19:17
  • @deceze of course I set them up properly (to the best of my ability and knowledge). I've just never dealt with this much data and I don't want to find out down the road that I have to change the way everything works. – Josh Toth Jun 06 '12 at 19:20
  • That's why: 40 million rows is nothing. *If* your service really takes off eventually, you can worry about redoing everything with the experience you have gained by then. Until then, just get *something* out the door without unnecessarily over-engineering it. – deceze Jun 06 '12 at 19:23
  • That.... is a very good point. :) – Josh Toth Jun 06 '12 at 19:24

1 Answer


Yes, it's possible, and my company does it. I'm certainly not going to say it's smart, though. We have a SaaS marketing automation system. Some clients' databases have 1 million+ records. We also maintain a second "common" database with a "fulfillment" table tracking emails, letters, phone calls, etc., with over 4 million records, plus numerous other very large shared tables. With proper indexing, optimizing, maintaining a separate DB-only server, and possibly clustering (which we don't yet have to do), you can handle a LOT of data. In many cases, those who think MySQL can only handle a few hundred thousand records work on a competing product for a living. If you still doubt whether it's valid, consider that per MySQL's clustering metrics, an 8-server cluster can handle 2.5 million updates PER SECOND. Not too shabby at all.
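To make "proper indexing" concrete: a composite index that matches your hottest query pattern is usually the single biggest win at this scale, because MySQL can seek straight to one client's rows instead of scanning the whole table. A sketch, with invented table and column names:

```php
<?php
// Illustration only: table, columns, and credentials are made up.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

// A composite index matching the WHERE and ORDER BY lets MySQL seek to one
// client's slice of a 40-million-row table instead of scanning all of it.
$pdo->exec('CREATE INDEX idx_client_name ON products (client_id, product_name)');

// This query can now use the index for both filtering and sorting.
$stmt = $pdo->prepare(
    'SELECT id, product_name FROM products
     WHERE client_id = ? ORDER BY product_name LIMIT 50'
);
$stmt->execute([42]);
```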

The problem with using two databases is juggling multiple connections. Is it tough? No, not really. You create different objects and reference your connection classes based on which database you want. In our case, we hit the main database's company class to deduce the client DB name and then build the second connection based on that. But when you're juggling those connections back and forth, you can run into errors that require extra debugging. It's not just "Is my query valid?" but "Am I actually getting the correct database connection?" In our case, a dropped session can cause all sorts of PDO errors to fire because the system can no longer keep track of which client database to access. Plus, from a maintainability standpoint, it's a scary process trying to push table structure updates to 100 different live databases. Yes, it can be automated. But one slip-up and you've knocked a LOT of people down and made a ton of extra work for yourself. Now calculate the extra development and testing required to juggle connections and push updates; that will be your measure of whether it's worthwhile.
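To illustrate the kind of automation I mean, here's a bare-bones sketch of pushing one schema change across every client database. All names are placeholders, and a real version would need logging, backups, and a way to resume after a failure:

```php
<?php
// Bare-bones sketch of rolling a schema change out to every client database.
// Names and credentials are placeholders; do NOT run anything like this
// against live databases without backups and a rollback plan.
$opts = [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION];
$core = new PDO('mysql:host=localhost;dbname=core', 'user', 'pass', $opts);

$dbNames   = $core->query('SELECT db_name FROM clients')->fetchAll(PDO::FETCH_COLUMN);
$migration = 'ALTER TABLE products ADD COLUMN sku VARCHAR(64) NULL';

foreach ($dbNames as $dbName) {
    try {
        $client = new PDO("mysql:host=localhost;dbname=$dbName", 'user', 'pass', $opts);
        $client->exec($migration);
    } catch (PDOException $e) {
        // One slip-up here is exactly the "knocked a LOT of people down" scenario.
        echo "FAILED on $dbName: {$e->getMessage()}\n";
    }
}
```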

My recommendation? Find a host that allows you to put two machines on the same local network. We chose Linode, but which host you use is irrelevant. Start out with a dedicated database server and plan ahead to do clustering when it becomes necessary. Keep all your content in one DB, and index and optimize religiously. Finally, find a REALLY good DB guy and treat him well. With that much data, a great DBA would be a must.

bpeterson76
  • Wonderful answer, very thorough :) I never thought of the debugging/connection potential issues. And with the potential disasters, I think you've scared me into staying with one database, haha. PS I love your Quizno's guy in the icon! – Josh Toth Jun 06 '12 at 19:38
  • "We love the Subs!" Actually, it's a spongmonkey from http://www.rathergood.com/moon_song But it was a brilliant campaign and caused me to buy a LOT of subs. – bpeterson76 Jun 06 '12 at 19:56