
In one of my processes I have an SQL query that takes 10-20% of the total execution time. This query filters my database and loads a list of PricingGrid objects, so I want to improve its performance. So far I have thought of 2 solutions:

Use a NoSQL solution. AFAIK these are good at speeding up reads.

  • But the migration seems hard and would need a lot of work (like importing the data from SQL Server to the NoSQL store on a regular basis)
  • I don't have any experience with them; I don't even know which one I should use (the first I'd try is RavenDB, because I follow Ayende and it comes from the .NET community)
  • I might have to change some things in my model so my objects fit a NoSQL database

Load all my PricingGrid objects into memory (in a static IEnumerable) and filter them with LINQ (see the sketch below)

  • This could be a problem if my server doesn't have enough memory to load everything
  • I might end up reinventing the wheel (indexes, ...) that the NoSQL providers have already built
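
To make this second option concrete, here is a minimal sketch of what I have in mind (PricingGridCache, LoadFromSqlServer, and SourceId are made-up names; the real loading would be my existing data-access code):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up sketch: load the grids once, then filter in memory with LINQ.
public static class PricingGridCache
{
    private static readonly object Sync = new object();
    private static List<PricingGrid> _grids;

    public static IEnumerable<PricingGrid> All
    {
        get
        {
            lock (Sync)
            {
                if (_grids == null)
                    _grids = LoadFromSqlServer(); // the existing SQL query, minus the filter
                return _grids;
            }
        }
    }

    private static List<PricingGrid> LoadFromSqlServer()
    {
        // existing data-access code goes here
        throw new NotImplementedException();
    }
}

// Usage - a LINQ Where replaces the SQL filter:
// var matching = PricingGridCache.All.Where(g => g.SourceId == sourceId).ToList();
```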

I'm surely not the first one to wonder about this, so what would be the best solution? Are there any tools that could help me?

.NET 3.5, SQL Server 2005, Windows Server 2005

remi bourgarel
  • There is also the possibility that your query can be sped up. Can you give more detail on the kind of query you are running, the size of the table(s) you are querying and the number of results, and the indexing you are using? – DNA Feb 10 '12 at 16:03
  • The query mainly concerns 5 tables. I created some indexes using the execution plan (no table scans); there are 60k rows (14k of them selected) in the PricingGrid table (the main one), and there are 5 tables in the output. Posting the details of my SQL query would totally change the purpose of my question. – remi bourgarel Feb 10 '12 at 16:16
  • @remi: The ***stated*** purpose of your question is "So I want to improve its performance." DNA rightly points out that the easiest way to do that is probably better use of your existing SQL Server and SQL design. I can assure you that unless you have half a dozen servers lying around that you plan to distribute your query across, there is nothing of substance NoSQL can do for your performance that you cannot already get out of SQL Server. *Especially* for a 5-table query, which is likely to be *slower* on a NoSQL solution than on a SQL one. – RBarryYoung Feb 14 '12 at 04:27
  • @RBarryYoung, I actually have half a dozen servers. I was just curious to see how to migrate from one solution to the other, to evaluate which would be the fastest to implement. This query has existed for 4 years and I've already posted questions multiple times on how to optimize it (and that worked). That's why I'm thinking about other solutions instead of reading my execution plan for the 1000th time to finally gain 1 ms. – remi bourgarel Feb 14 '12 at 08:15

2 Answers


Migrating your data from SQL is only the first step. Moving to a document store (like RavenDB or MongoDB) also means that you need to:

  • Denormalize your data (see the sketch below)
  • Perform schema validation in your code
  • Handle concurrency of complex operations in your code, since you no longer have transactions (at least not in the same way)
  • Perform rollbacks in the event of partial commits (changes)
  • Depending on your updates, reads, and network model, you might also need to handle conflicts
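
To make the first point concrete, here is a hypothetical sketch (all class and property names invented for illustration) of how data that is spread over joined tables in SQL Server typically becomes one self-contained document:

```csharp
using System;
using System.Collections.Generic;

// Relational model: a PricingGrid row joined to price rows via foreign keys.
// Document model: one self-contained document per grid, prices embedded.
public class PricingGridDocument
{
    public string Id { get; set; }               // e.g. "pricinggrids/42"
    public string Source { get; set; }
    public DateTime ValidFrom { get; set; }
    public List<PriceLine> Prices { get; set; }  // embedded instead of joined
}

public class PriceLine
{
    public int ProductId { get; set; }
    public decimal Amount { get; set; }
    public string Currency { get; set; }
}
```

The upside is that a read fetches everything in one round trip; the downside is that data shared between documents gets duplicated, which is exactly why the concurrency and rollback points above land in your code.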

You provided very limited information, but it sounds like your needs include a single database server and that your data fits the relational model well.

In such a case I would vote against a NoSQL solution; it is more likely that you can speed up your queries with database optimizations and still retain all the added value of an RDBMS.

Non-relational databases are tools for a specific job (no matter how they are marketed). If you need them, it is usually because your data doesn't fit the relational model well, or because you need to distribute your data over multiple machines (for size or availability). For instance, I use MongoDB for a write-intensive, high-throughput job-management application. It is centralized and the data is very transient, so the "cost" of low durability is acceptable. This doesn't sound like your case.

If you prefer a NoSQL-style solution, perhaps you should try Memcached + MySQL (InnoDB). This would give you the speed benefits of an in-memory cache (in the form of a memcached daemon plugin) with the underlying protection and capabilities of an RDBMS (MySQL). It should also ease data migration and somewhat reduce the amount of change required in your code. I have never used it myself; I find that I either need NoSQL for the reasons stated above, or that I can optimize the RDBMS using stored procedures, indexes, and table views in a way that is sufficient for my needs.
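
As a rough illustration of the cache-aside pattern that setup implies, here is a sketch using the Enyim memcached client for .NET (the key scheme, the PricingGrid type, and LoadGridsFromDatabase are placeholders invented for the example; only MemcachedClient.Get/Store are real API):

```csharp
using System;
using System.Collections.Generic;
using Enyim.Caching;
using Enyim.Caching.Memcached;

public class CachedPricingGridRepository
{
    private readonly MemcachedClient _cache = new MemcachedClient();

    public List<PricingGrid> GetGrids(string filterKey)
    {
        // 1. Try the in-memory cache first.
        var cached = _cache.Get<List<PricingGrid>>("grids:" + filterKey);
        if (cached != null)
            return cached;

        // 2. Cache miss: fall back to the database.
        List<PricingGrid> grids = LoadGridsFromDatabase(filterKey);

        // 3. Store for subsequent reads (cached objects must be serializable).
        _cache.Store(StoreMode.Set, "grids:" + filterKey, grids,
                     TimeSpan.FromMinutes(10));
        return grids;
    }

    private List<PricingGrid> LoadGridsFromDatabase(string filterKey)
    {
        // the existing SQL query goes here
        throw new NotImplementedException();
    }
}
```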

Asaf
  • I don't have a single server here but 5: 1 master that receives all the write queries, and 4 slaves used for reads. My data fits the relational model well because we designed it that way, and when we did we had no other alternative. There is no need for reporting here, and writes represent maybe 1% of the queries on this data. I'll have a look at memcached or any MS alternative... – remi bourgarel Feb 14 '12 at 08:09
  • Sounds like caching is the way to go. With 1% writes and a master-slave configuration, going NoSQL is more a matter of how much data you have and what availability you are trying to achieve. BTW, thinking about how to represent your data up front is a good thing; document storage is usually the way to go when you can't really structure your data in any meaningful way - for instance, storing JSON objects in a CLOB. – Asaf Feb 14 '12 at 09:28
  • Here caching is not really easy or efficient: if I cache all the data, I'll still have to filter it (and that's what costs the most; the data volume is not very big). If I cache every distinct request, I'll never read the cache, because every request is different (some are on 10 products, some on 10,000, some come from different sources, and there is also the date component, so there is an infinity of possibilities). – remi bourgarel Feb 15 '12 at 08:55

Asaf has provided great information regarding the usage of NoSQL and when it is most appropriate. Given that your main concern is performance, I tend to agree with his opinion - it would take you much more time and effort to adopt a completely new (and very different) data-persistence platform than it would to trick out your SQL Server cluster. That said, my answer is mainly to address the "how" part of your question.

Addressing misunderstandings:

  1. Denormalizing Data - You do not need to manually denormalize your existing data; that happens as part of the migration. More than anything, you simply need to think about your data in a different fashion - root aggregates, entity and value types, etc.

  2. Concurrency/Transactions - Transactions are possible in both Mongo and Raven; they are simply done in a different fashion. One of the inherent ways Raven does this is with an ORM-like "unit of work" pattern via its session objects (see the sketch below). Yes, your data validation needs to be done in code, but you should already be doing it there anyway. In my experience this is an over-hyped con.
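
For instance, a minimal sketch of the session-as-unit-of-work pattern with the RavenDB client of that era (the server URL, document id, and PricingGrid type are placeholders):

```csharp
using Raven.Client;
using Raven.Client.Document;

public static class RavenUnitOfWorkExample
{
    public static void UpdateGrid()
    {
        // one DocumentStore per application
        using (var store = new DocumentStore { Url = "http://localhost:8080" })
        {
            store.Initialize();

            // the session is the unit of work: it tracks changes and
            // sends them to the server as one batch on SaveChanges()
            using (IDocumentSession session = store.OpenSession())
            {
                var grid = session.Load<PricingGrid>("pricinggrids/42");
                grid.Source = "import";  // tracked change
                session.SaveChanges();   // single atomic write
            }
        }
    }
}
```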

How:

  1. Install Raven or Mongo on a primary server, run it as a service.

  2. Create a new application, or extend an existing one, that uses the database you intend to port. This application needs all the model classes/libraries that your SQL database provides persistence for.

    a. In your "data layer" you likely have a repository class somewhere. Extract an interface from it, and use that interface to build another repository class for your Raven/Mongo persistence (see the sketch after this list). Both DBs have plenty of good documentation for using their APIs to push/pull/update changes in the document graphs. It's pretty damn simple.

    b. Load your SQL data into C# objects in memory. Pull back your top-level objects (just the entities) and load their inner collections and related data into memory. Your repository is probably already doing this (e.g. when fetching an Order object, ensure that not only its properties but also associated collections like Items are loaded into memory).

    c. Instantiate your Raven/Mongo repository and push the data to it. Primary entities become "top-level documents" or "root aggregates" serialized as JSON, with their collections' data nested within. Save changes and close the repository. Note: you may break this step down into as many small pieces as your data requires.

  3. Once your data is migrated, play around with it and make sure you are satisfied. You may want to modify your application models a little to adjust the way they are persisted to Raven/Mongo - for instance, you may want to make both Orders and Items top-level documents and simply use reference values (much like relationships in an RDBMS). Watch out here, though, as doing so somewhat goes against the principle (and the performance) behind NoSQL: now you have to tap the DB twice to get the Order and the Items.

  4. If satisfied, shard/replicate your Mongo/Raven servers across your remaining available server boxes.
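
To sketch steps 2a-2c (every type and member name below is invented for illustration; only the Raven session calls - OpenSession, Query, Store, SaveChanges - are real API):

```csharp
using System.Collections.Generic;
using System.Linq;
using Raven.Client;

// 2a. The interface extracted from the existing SQL repository.
public interface IOrderRepository
{
    IEnumerable<Order> GetAllWithItems(); // entities plus their inner collections
    void Save(Order order);
}

// 2c. A Raven-backed implementation of the same interface.
public class RavenOrderRepository : IOrderRepository
{
    private readonly IDocumentStore _store;
    public RavenOrderRepository(IDocumentStore store) { _store = store; }

    public IEnumerable<Order> GetAllWithItems()
    {
        using (var session = _store.OpenSession())
            // No join needed: Items are nested inside each Order document.
            // (A real migration would page through results, since Raven
            // caps the number of documents returned per query.)
            return session.Query<Order>().ToList();
    }

    public void Save(Order order)
    {
        using (var session = _store.OpenSession())
        {
            session.Store(order); // whole aggregate serialized as one JSON document
            session.SaveChanges();
        }
    }
}

// 2b + 2c. One-off migration: pull fully loaded aggregates from the SQL
// repository and push them into the Raven one.
public static class Migrator
{
    public static void Run(IOrderRepository sqlRepo, IOrderRepository ravenRepo)
    {
        foreach (var order in sqlRepo.GetAllWithItems())
            ravenRepo.Save(order);
    }
}

public class Order
{
    public string Id { get; set; }
    public List<OrderItem> Items { get; set; } // nested collection
}

public class OrderItem
{
    public int ProductId { get; set; }
    public decimal Price { get; set; }
}
```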

Obviously there are tons of little details I did not explain, but that is the general process. Much of it depends on the applications already consuming the database, and it may be tricky if more than one app/system talks to it.

Lastly, just to reiterate what Asaf said: learn as much as you can about NoSQL and its best use-cases. It is an amazing tool, but not a golden solution for all data persistence. In your case, try to really find the bottlenecks in your current solution and see if they are solvable. As one of my systems guys says, "technology for technology's sake is bullshit".

one.beat.consumer
  • I totally agree with your colleague; that's why I'm asking this question: to learn about this new technology and see whether the proposed solution would be efficient in terms of performance and development time. I really liked your answer because you went that way. – remi bourgarel Feb 15 '12 at 08:51
  • @remibourgarel very good. In all seriousness, just remember to pick the right tool for the job. Spin up a quick single-server instance of Raven (if you have a .NET app, its tooling is better than Mongo's for now) and play with it. Write a little console app that pulls a collection from your DB and stuffs it into Raven... measure the time you spend and think about how you could do it on a large scale. If it's worth your time, great; if not, hit the SQL performance tutorials and books at Tekpub or Pluralsight, or all the articles here on SO. Good luck. – one.beat.consumer Feb 15 '12 at 09:34
  • Very well said, @one.beat.consumer. I will stress that doing "JOIN"s on your data, in my view, kind of defeats the purpose of using something like MongoDB. But again, without knowing the exact use-case I can't really make any generalizations. – Asaf Feb 15 '12 at 11:31