At my old job, when a report was needed, I could sit down with someone, pull up results, get immediate feedback, then refine my queries and ultimately have the data we needed, in the format we needed, within 30-90 minutes.

I just started working for a new company with a database containing millions of records, and I spent my whole 8 hours making a report that I feel I could have made in less than 2 hours, were it not for the massive amount of data the queries are working with and the fact that I couldn't ask the person needing the data to sit down with me and give me feedback as I pulled up results, the way I am used to.

So I am trying to think of how we can make the server faster... much faster, so that I can have the same level of productivity I'm used to. One thought that just came to mind is that memory is so cheap these days; by my calculations I could buy ten 8 GB RAM sticks for $1,000. What I have never heard of, though, is a device that would let me combine these into a huge RAM drive.

So I'd like to know if any such device exists, and if not, what is the largest RAM drive I could realistically make, and how would I go about doing so?

EDIT: To you guys who are saying the database schema needs to be analyzed... you can't make a query such as "Select f1, f2, f3, etc from SomeTable" run any faster by normalizing or indexing the table. What I'm talking about IS ABSOLUTELY a need for improved performance at the hardware level.

I am used to having results come back in a few seconds, not a few minutes, much less half an hour. Maybe that feels fast to those of you with 100-billion-record tables, but I'm looking for results from tables with about 10 million records to come back to me in under half a minute, TOPS.
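
To make it concrete, here is a simplified sketch of the kind of query I mean (table and field names are made up); there is no join or filter for an index to help with, the server just has to chew through every row:

```sql
-- A plain column pull over a ~10 million row table: no WHERE clause,
-- so this is a full scan by design.
SET STATISTICS TIME ON;   -- report server-side CPU and elapsed time

SELECT f1, f2, f3, f4
FROM SomeTable;

SET STATISTICS TIME OFF;
```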

Brandon Moore
  • "[M]assive amount of data the queries are working with"--are full table scans occurring? – Mark Wagner Mar 22 '12 at 06:04
  • To make a huge [RAM](http://en.wikipedia.org/wiki/Dodge_Ram) drive, first you must obtain the keys to the RAM. Then you get in, put the key in the ignition and turn it to Start. Then you should be able to make the RAM drive. – Tom O'Connor Mar 22 '12 at 09:53
  • @MarkWagner Pretty much. For example I was asked to include results for one report where a particular field was >= 5. Well, that was about 96% of the records in the table. – Brandon Moore Mar 23 '12 at 04:48
  • @TomO'Connor Thanks. That was at least as helpful as a couple of other answers I've gotten :) – Brandon Moore Mar 23 '12 at 04:52

6 Answers


One thought that just came to mind is that memory is so cheap these days, and by my calculations I could buy 10 8gig ram sticks for 1000 bucks. What I have never heard of though is a device that would let me combine these into a huge ram drive.

This is one of those ideas that looks good when you don't know better and falls apart when you think it through.

See, the problem is this: you talk about making an 80 GB RAM drive, but why not plug the RAM into the server and let SQL Server's own caching use the memory? 80 GB is nothing these days for a server.

There is absolutely no need for a RAM drive; SQL Server just needs enough RAM to cache all the data it works with.
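
For example, instead of carving the new memory out as a RAM drive, just hand it to SQL Server's buffer pool (a minimal sketch; the 80 GB figure is simply the amount of RAM priced out in the question):

```sql
-- Let SQL Server use the extra RAM directly for caching data pages.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 81920;  -- ~80 GB; adjust to your box
RECONFIGURE;
```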

Otherwise, a nice SSD-based disk system also helps - very fast and quite cost effective.

Btw:

a database containing millions of records

That is tiny these days. Come back when you are talking hundreds of billions.

TomTom
  • Tom, that's like saying the sun is tiny because there are much larger stars out there; compared to the rock we live on every day it's pretty large, just as a million records is compared to the average db. Did you answer this question because you wanted to be helpful or because you wanted to be arrogant? I found this: http://www.fusionio.com/platforms/iodrive/ - that would totally be worth the cost considering that I could be multiple times more productive with this kind of speed, and it's a pretty small fraction of what they're paying me. – Brandon Moore Mar 23 '12 at 04:16
  • And can you get results back from a 100-billion-record table in a few seconds? That's the kind of speed I'm after, because I believe it's possible with just a million records. I don't think your solution will provide that. – Brandon Moore Mar 23 '12 at 04:16
  • Yes, you surely can. You can get servers with 512 GB RAM these days and then put a high-end storage solution behind them. The database server I am working on at the moment delivers 5 gigabytes per second of random IO performance and has a 21,000 GB storage subsystem based on 3 storage nodes. We have tables that grow by 60 million entries - per day. – TomTom Mar 23 '12 at 05:00
  • Ok TomTom, run this query: "Select * From AnyTable", where 'AnyTable' has a minimum of 20 fields, and tell me how fast the query results are returned. That is, not how fast you see results start to come up in SSMS, but how long before the entire query has been processed (see the timing sketch after these comments). – Brandon Moore Mar 23 '12 at 05:05
  • I.e. not trying to be argumentative there... I actually would like to know. – Brandon Moore Mar 23 '12 at 05:06
  • (and of course, specify how many records are in AnyTable) – Brandon Moore Mar 23 '12 at 05:07
  • Seconds. Seriously. Our most complex fact table currently has about a billion rows on the test system and I can sum the numbers up in about 5 seconds. That table has about 80-odd fields - only numeric types. – TomTom Mar 23 '12 at 05:09
  • TomTom, do you realize that summing the numbers takes much less time than doing a select *? – Brandon Moore Mar 23 '12 at 05:12
  • I just did a test 3 times on "select id from customers" versus "select sum(id) from customers". The latter query averaged out to be about 5 times faster. So back to my question, would it be possible for you to actually try that test and let me know a real result? – Brandon Moore Mar 23 '12 at 05:16
  • Also, 80 numeric fields can easily be less than even 20 text fields. We have a mix of text, numeric, and datetime. – Brandon Moore Mar 23 '12 at 05:19
  • Yes, but it may not be representative. See, the problem is not the select... in my case it would be bandwidth. If I pull that from my workstation, the data has to trickle down from a connection in a data center. Processing time is irrelevant. Second - whoever does not normalize his text fields either earns big bucks or works at McDonald's, please. Our database is a data warehouse. No text in fact tables. – TomTom Mar 23 '12 at 06:22
  • Just because it's a text field doesn't mean it's not normalized. Unless you're speaking of normalizing past third normal form, which can become counterproductive depending on the application, but that's another conversation. Anyway, I was assuming you had the ability to find out what the actual query run time was. Sounds like you don't even have access to see an execution plan, though? – Brandon Moore Mar 23 '12 at 22:59
  • We have, and that starts making sense. But as you show on another answer, you don't even run on a real server (VM, 12 GB RAM - there is your problem). Get that on a high-end dual-socket machine with 128 GB RAM and see it fly. Basically, try to fit the database into RAM anyway (by having enough RAM, not with a RAM drive). – TomTom Mar 24 '12 at 05:39
  • Yes, it's on a 12 GB VM now, and the idea is to upgrade that situation first. But even with 12 GB... that should be more than enough memory to handle a 1,000,000-row table just fine, so it leads me to believe there are other bottlenecks besides just the server RAM. – Brandon Moore Mar 24 '12 at 07:36
  • 1 million rows can be larger than that with - well - multiple text fields + indices. Plus VMs often have severe limitations in CPU size. I mean severe for database servers. You could easily be CPU bound. Not having enough CPU is a good starter. – TomTom Mar 24 '12 at 08:39
  • Only 25ish fields, mostly numbers, and the text fields are titles of books or stores, so never that large. But... for some reason I was thinking that VMs were not limited by CPU except that other VMs on the same machine may be competing for resources. In the case of Microsoft's VMs, do you know if that's the case? – Brandon Moore Mar 28 '12 at 00:51
  • Sorry, ALL VM setups have limited CPU capacity. Hyper-V allows 4 cores now, 32 cores in the future. As every virtual core is mapped to one physical core, that means a 4-core VM is pathetic in a world where database servers have 12+ physical cores. This is extremely low end (the setup I was talking about has 48 cores + hyperthreading). – TomTom Mar 28 '12 at 04:58
  • I see. I don't think that's a bottleneck for us though. At any given time there's generally not more than one query hitting the db so it's not like SQL would ever make use of that many cores if we had them. I think a whole lotta more RAM has got to be our first priority. – Brandon Moore Mar 28 '12 at 05:17
  • Tom, you think more CPUs will move data from the hard drive into RAM faster? The queries I have to run are not exactly that complex, so unless it can do that I don't think it will make any significant difference. – Brandon Moore Mar 28 '12 at 05:23
  • This db is used to create reports for a relatively small number of decision makers. They get their reports on time so there is no big push from them to upgrade. But they do realize that upgrading it will allow me to be more productive... but I'm not gonna be seeing a 32 core server anytime soon :) – Brandon Moore Mar 28 '12 at 05:30
  • Funny, given that this is quite cheap these days. A 32-core server is a dual Opteron - not a lot of money. I just got a new workstation with 12 threads. Not sure we live in the same time, but 32 cores is the low end in my world for a server today: 2 Opterons or 2 octo-core Xeons. Anything smaller gets virtualized ;) – TomTom Mar 28 '12 at 06:16
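
A sketch of one way to settle the timing argument in these comments: assign every column to throwaway variables so the server does all the scan work but ships almost nothing back, taking SSMS rendering and network bandwidth out of the measurement (the column names and types here are hypothetical):

```sql
-- Time the query itself, not SSMS rendering or network transfer.
-- Assigning columns to variables forces a full read of every row
-- while returning (almost) nothing to the client.
DECLARE @f1 int, @f2 varchar(100), @f3 datetime;  -- hypothetical columns

SET STATISTICS TIME ON;   -- prints server-side CPU and elapsed time

SELECT @f1 = f1, @f2 = f2, @f3 = f3
FROM AnyTable;            -- the table from the comment thread above

SET STATISTICS TIME OFF;
```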

As TomTom already said, a RAM drive won't help a bit, but more RAM and/or faster drives for the database might help.

But other than throwing more hardware at this problem, it might be that your problem is really located elsewhere (and more hardware will have a lot less benefit than you expect):

  • Are you sure your indices and general DB design are good? They can have a huge performance impact.
  • The same is true for your queries. Can they be optimized for your specific data structures? Use your database's analysis tools to help you with both issues.
  • It appears that you work with a trial-and-error approach when creating the report queries, which I think is kind of odd. Is this because you don't get good requirements from the people needing the report, or because you don't fully understand the data structures and/or the query language? Both problems should be fixed, but if that's not possible, maybe you can work with a small subset of your production data to formulate the queries and only run them against the full dataset when you are satisfied with them (see the sketch after this list).
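
For example (a rough sketch, assuming SQL Server; the table names are made up):

```sql
-- Build a small working copy for iterating on report queries.
SELECT TOP (100000) *
INTO dbo.SomeTable_Sample
FROM dbo.SomeTable;

-- Or take a rough random sample of pages instead of the first N rows:
-- SELECT * INTO dbo.SomeTable_Sample
-- FROM dbo.SomeTable TABLESAMPLE (1 PERCENT);
```

Once the query shape is right against the sample, run it once against the full table.
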
Sven
  • You say that "a RAM drive won't help a bit", and then say "but ... faster drives for the database might help." You realize that a RAM drive *is* a much faster drive... right? What am I missing here? Those two statements seem completely contradictory to me. – Brandon Moore Mar 23 '12 at 04:26
  • To answer your questions: 1) The indexes pretty much suck on this db and I'm working on that, but read the edit and other comments I've made to understand why this isn't relevant for my question. 2) No, the queries can't be further optimized. Just trust me on this one... they definitely aren't that complex to begin with. 3) Like I said it's a new job (only been there a week) and I got to spend almost no time with the previous employee so yes... I'm working on learning the schema and also this is a new industry for me. However... – Brandon Moore Mar 23 '12 at 04:57
  • those impediments are minute compared to the slowness of the db. If I could have gotten results back as fast as I am used to, then I could have finished in a fraction of the time. Instead I spent more time creating small subsets of the data to formulate the queries with than I did actually creating the queries. – Brandon Moore Mar 23 '12 at 04:59
  • @BrandonMoore: Re. the RAM drive: Having more actual RAM available for the database's caching will usually help more than having an ultra-fast disk, at least once the data is fully cached. The caching strategies are optimized for the case of fast RAM and slower disks. You can improve overall performance even more if you increase the speed of the disks (e.g. with SSDs), but not if you sacrifice RAM for a RAM disk. – Sven Mar 23 '12 at 10:57
  • Aaaah, I see where the miscommunication is now. The hope was to somehow run the db on some kind of RAM drive, but NOT with the intention that the RAM for this drive would come from the RAM the server itself is running on. I.e. FusionIO is exactly the answer I was looking for, because I previously didn't know whether something like it existed. – Brandon Moore Mar 23 '12 at 22:51

Check out Fusion-IO's ioMemory Platform. They provide flash-based hardware with high capacity and great performance.

pkhamre
  • Thanks, I actually found that right after I posted this question, and given the performance gain I believe it will offer versus the amount of knowledge and expertise I would need to otherwise optimize and micro-optimize to match that... it seems like a no-brainer to me. Definitely going to look into getting one of these. – Brandon Moore Mar 23 '12 at 04:29
  • I realize this solution annoys the heck out of people who have all that knowledge and expertise though... which is probably why my question got downvoted. – Brandon Moore Mar 23 '12 at 04:30
  • Didn't deserve that. Hope you can figure it out! :) – pkhamre Mar 23 '12 at 07:32

It sounds like the database is designed pretty badly or the server hardware is seriously underpowered if it is taking such a long time to run queries on a database of that size. As the poster above said, millions of records is not so large these days.

From your original question about adding hardware to ease the issue I would guess that you are either in charge of the server(s) or have some influence over it. Is normalizing the DB an option? What hardware is it running on?

If you really wanted to use a RAM disk, an SSD is probably a more reliable and effective means of achieving the same thing these days. However, I really would suggest analysing the database schema and whether the current hardware is up to scratch first, and then making a decision.
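
If this is SQL Server, a quick, low-effort sanity check is to ask the engine which indexes it has been missing during real workload (a sketch using the standard missing-index DMVs; the ranking expression is just a rough heuristic):

```sql
-- Top candidate indexes SQL Server wishes it had, by estimated impact.
SELECT TOP (10)
    d.statement AS table_name,
    d.equality_columns,
    d.inequality_columns,
    d.included_columns,
    s.user_seeks,
    s.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
    ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s
    ON s.group_handle = g.index_group_handle
ORDER BY s.user_seeks * s.avg_user_impact DESC;
```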

We run databases (SQL Server, MySQL and PostgreSQL) that have millions of records on fairly lowly hardware (think Pentium D, 7200 RPM drives in RAID 10) and get very fast database responses, and this is due to the efficient normalization of the DB schemas.

takesides
  • Maybe I should have provided more detail when I wrote the question... running a "select field1, field2, field3, etc from sometable" can take a few minutes for a million records. Not much an index or normalization can do to help that query. It's on a virtual machine with 12 gigs of ram. Like I said... just started at the job and no I'm not a db or networking admin, but I know they are fixing to move to a clustered environment and my understanding is that this only improves availability and not really performance. So I just want to give what input I can to make sure their efforts are fruitful. – Brandon Moore Mar 23 '12 at 04:34
  • See. Crappy hardware. A VM with 12 GB RAM is laughable as a high-end server. The system I was talking of has 96 GB RAM and 24 real cores + hyperthreading. AND uses multiple such machines at the same time. – TomTom Mar 24 '12 at 05:38
  • Ha, see what? You were emphasizing the need to analyze the db and 'I' was pointing out it was a hardware issue (though I don't know the best solution). But at least we're in agreement :) – Brandon Moore Mar 24 '12 at 07:33

Well, both SQL Server and Oracle can run from RAM even though the database still sits physically on the hard drive, and other RDBMS vendors support this functionality too. These days blade centers and other modular server systems are in fashion, but when looking for something really special, I always look at TYAN motherboards, which support huge amounts of installed RAM.

But still, I think millions of records is not something that can't be handled in a short time; I would rather first take a look at the schema, especially the indexes and the columns that relate the tables.

I would also propose taking a look at the Fusion-IO platform, but when you are thinking about big storage it can become really expensive. When trying to configure big IO throughput, you could think about combining 2 cheap RAID arrays via an external RAID card into RAID 100, which is really hard to beat. Even though no array controller directly supports RAID 100, you can build 2 disk arrays, each in RAID 10, and then join them with RAID 0 using an additional external array controller, so you have mirrored storage striped on 2 levels; the real-world IO performance boost is about 150-180%, and this solution can be cheap. If you don't want to lose half the storage, you could think about RAID 600 instead, which is still stable but uses just 2 discs per array for parity. RAID 600 is the better option against RAID 100 if each underlying RAID 6 array is built from at least 5 hard drives.

kensai
  • +1. Thanks, I agree that FusionIO seems pretty expensive... but if the results come back fast enough that I can run ad-hoc queries in front of the people needing reports so they can give me immediate feedback, allowing me to refine the reports and give them what they need in 2 hours instead of 8, then I think it's well worth the cost. – Brandon Moore Mar 23 '12 at 04:39

Oracle will in fact load the most frequently used data into RAM. And if your database fits in RAM, it will place it all in RAM for you. It does need to be configured correctly to do that.

I assume that the data you access sits alongside other data competing for server resources. You could copy the data that you need into a separate database; it could even be Oracle. That is in fact what a lot of people do.
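
A rough sketch of that approach on SQL Server (the database, table, and date filter are all hypothetical; keep only the slice the reports actually need):

```sql
-- Copy the reporting slice into a separate database so report queries
-- don't compete with the main workload.
SELECT *
INTO ReportingDB.dbo.BookSales
FROM ProductionDB.dbo.BookSales
WHERE SaleDate >= DATEADD(year, -1, GETDATE());
```

Refresh the copy on a schedule (nightly is common) so the reports stay reasonably current.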