
Edit: I need help rewriting these LINQ queries as SQL queries for the highest possible performance.

I have a table with about 10 million rows. It consists of 7 columns including Id: first the Id, then three keys to "TradeObjectModel", and finally three integers holding the different TradeObjectModels' rating values. Like this: [screenshot of the table schema]

When a user, e.g. To1Id (the TradeObjectModel1 owner) with key 71, handles her ratings of other trade objects, only one row is sufficient for the current view.

My attempt to solve this looks like this (explanation below the code sample):

IEnumerable<RatingListTriangleModel> allTriangleModels1 =
                this._ratingListTriangleRepository.All.Where(
                    ratingListRow =>
                    ratingListRow.To1Id == myTradeObject.TradeObjectId);
            var filteredallTriangleModels1 = from row in allTriangleModels1
                                             group row by row.To2Id into g
                                             select g.First();


            IEnumerable<RatingListTriangleModel> allTriangleModels2 =
                this._ratingListTriangleRepository.All.Where(
                    ratingListRow =>
                   ratingListRow.To2Id == myTradeObject.TradeObjectId);
            var filteredallTriangleModels2 = from row in allTriangleModels2
                                             group row by row.To3Id into g
                                             select g.First();

            IEnumerable<RatingListTriangleModel> allTriangleModels3 =
                this._ratingListTriangleRepository.All.Where(
                    ratingListRow =>
                   ratingListRow.To3Id == myTradeObject.TradeObjectId);

            var filteredallTriangleModels3 = from row in allTriangleModels3
                                             group row by row.To1Id into g
                                             select g.First();


            var fileredallTriangleModels =
                filteredallTriangleModels1.Union(filteredallTriangleModels2).Union(filteredallTriangleModels3).ToList();


            ViewBag.TriangleCount = fileredallTriangleModels.Count();

            foreach (var ratingListRow in fileredallTriangleModels)
            {
                //Find which one is my ad and set me as setter and their object as receiver
                if (ratingListRow.To1Id == customer.TradeObjectId)
                {
                    var ri = new TriangleViewModel(
                        customer.TradeObjectId,
                        this._customerRepository.FindTradeObjectId(ratingListRow.To2Id),
                        ratingListRow,
                        this._tradeobjectRepository.Find(ratingListRow.To2Id));

                    model.Models3.Add(ri);
                    continue;
                }

                if (ratingListRow.To2Id == customer.TradeObjectId)
                {
                    var ri = new TriangleViewModel(
                        customer.TradeObjectId,
                        this._customerRepository.FindTradeObjectId(ratingListRow.To3Id),
                        ratingListRow,
                        this._tradeobjectRepository.Find(ratingListRow.To3Id));

                    model.Models3.Add(ri);
                    continue;
                }

                if (ratingListRow.To3Id == customer.TradeObjectId)
                {
                    var ri = new TriangleViewModel(
                        customer.TradeObjectId,
                        this._customerRepository.FindTradeObjectId(ratingListRow.To1Id),
                        ratingListRow,
                        this._tradeobjectRepository.Find(ratingListRow.To1Id));
                    model.Models3.Add(ri);
                }
            }

First I get all rows where my object is in the first column, group them to select only one, and then continue to do the same with my object in the second and third columns. The ToList() here is just temporary so I can run a stopwatch on each of them; each takes 0-12 seconds. Then I union them and loop through them all to create the model used by the webgrid in the front-end code.

This causes two problems: 1. It takes much too long. 2. If my trade object id is in more than one column, I will get more than one row for the trade object I'm interested in.
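One way to collapse the three grouped lookups into a single round trip is a UNION ALL over the three key positions plus ROW_NUMBER() to keep one row per partner, which also removes the cross-column duplicates from problem 2. A minimal sketch, using SQLite as a stand-in for SQL Server, with the table and column names (RatingListTriangle, To1Id, To2Id, To3Id) assumed from the description; ROW_NUMBER() works the same way in T-SQL:

```python
import sqlite3

# Tiny in-memory stand-in for the real ~10M-row table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RatingListTriangle (
    Id INTEGER PRIMARY KEY,
    To1Id INTEGER, To2Id INTEGER, To3Id INTEGER,
    Rating1 INTEGER, Rating2 INTEGER, Rating3 INTEGER
);
INSERT INTO RatingListTriangle VALUES
    (1, 71, 10, 20, 1, 2, 3),  -- my id (71) in column 1, partner = To2Id
    (2, 71, 10, 30, 4, 5, 6),  -- duplicate partner 10: only one row should survive
    (3, 40, 71, 50, 7, 8, 9),  -- my id in column 2, partner = To3Id
    (4, 60, 80, 71, 1, 1, 1);  -- my id in column 3, partner = To1Id
""")

# One query instead of three: tag each match with its partner id, then keep
# a single row per partner with ROW_NUMBER() (SQLite >= 3.25; same syntax
# is available in SQL Server).
query = """
WITH matched AS (
    SELECT *, To2Id AS PartnerId FROM RatingListTriangle WHERE To1Id = :me
    UNION ALL
    SELECT *, To3Id AS PartnerId FROM RatingListTriangle WHERE To2Id = :me
    UNION ALL
    SELECT *, To1Id AS PartnerId FROM RatingListTriangle WHERE To3Id = :me
),
ranked AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY PartnerId ORDER BY Id) AS rn
    FROM matched
)
SELECT Id, PartnerId FROM ranked WHERE rn = 1 ORDER BY Id
"""
rows = conn.execute(query, {"me": 71}).fetchall()
print(rows)  # one row per distinct partner
```

On SQL Server this shape could live in a view or stored procedure and be materialized once, instead of three separately materialized LINQ queries plus a Union in memory.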

Magnus Karlsson
  • If the repository is coming from a remote source, one optimization is to remove the `.ToList()` call on the individual lists and apply it on the `Union`. – Tilak Jan 01 '13 at 19:10
  • @Tilak They are placed where they are so I can measure the performance of the LINQ expressions. – Magnus Karlsson Jan 01 '13 at 19:13
  • @MagnusKarlsson They are most likely *affecting* your performance. – Andrew Barber Jan 01 '13 at 19:26
  • @Andrew Yes they are, but they were only there temporarily, to see what part I need to optimize. – Magnus Karlsson Jan 01 '13 at 22:50
  • It's actually in the question: "The ToList() here is just temporary for me to be able to run stopwatch on them". – Magnus Karlsson Jan 01 '13 at 23:28
  • Linq is hiding a lot of complexity, but the side effect is that Linq is also hiding what happens under the hood. This is OK for a simple query, when you can "feel" what happens, but when queries get complex, I think it's far preferable to write a pure, optimized SQL query, in order to control exactly what happens at the database layer. – Steve B Jan 02 '13 at 09:36
  • @MagnusKarlsson But `ToList()` is a *major* change in how the query is run. You simply can *not* compare performance in this way. ToList() causes your queries to be materialized before they would otherwise be. – Andrew Barber Jan 02 '13 at 11:57
  • @AndrewBarber OK, I now have my stopwatch beginning above the first LINQ statement and the last stopwatch right below the Union statement, where the materialization now occurs. It takes 25 seconds. Total of 42 million records in the db. If I remove the ToList() from the Union statement and let each and every LINQ query materialize, I get 24.5 seconds. This way I have concluded that it is not the Union statement that needs optimizing but the LINQ queries before it. – Magnus Karlsson Jan 02 '13 at 12:29
  • Please post which SQL product, edition, and release you're using. E.g. SQL Server Standard 2008 R2, or MySQL Community 5.1, etc. – JM Hicks Jan 05 '13 at 17:50
  • Please post the execution plan. It is impossible to give a useful answer without that. – usr Jan 05 '13 at 19:46
  • I would simply rewrite the query into a view and then call the data from the view. This also follows the idea of Steve B. – Schuere Jan 09 '13 at 11:35
  • It would help to post the generated SQL, `db.Log = Console.Out;` see http://msdn.microsoft.com/en-us/library/bb386961.aspx – Colonel Panic Jan 10 '13 at 18:30
  • @JMHicks It used to be LocalDB locally and SQL Azure in production, but I changed to SQL Server 2008 R2 locally to get you guys a profile with queries etc. Haven't had time to figure out how to get something useful from it yet though. – Magnus Karlsson Jan 10 '13 at 20:06
  • @usr I don't know how to do that ("Please post the execution plan."). – Magnus Karlsson Jan 10 '13 at 20:07

3 Answers


Try using Database Engine Tuning Advisor to see if adding/removing/changing the indices on your tables significantly improves the performance of the workload presented by your LINQ query.

tomfanning
  • I couldn't get anything useful from it. Takes a little too much time to learn it... – Magnus Karlsson Jan 10 '13 at 20:10
  • Unless you really understand the suggestions, please don't just blindly trust the "solution" offered by DTA. It might cause you more troubles. – milivojeviCH Jan 11 '13 at 06:29
  • Quite right, it's a reasonable starting point. @magnuskarlsson seriously, persist; or another option is to pick up a good book that covers SQL Server indexing. – tomfanning Jan 11 '13 at 11:27
  • @tomfanning I read up on indexes and made a workload with SQL Profiler, ran it with DTA, and based on the results I realized I will have to rewrite the workflow of the application and make some smaller changes, then get back to evaluating indexes. Thanks for your time; since you suggested indexing as a performance enhancer first, I'll mark yours as the right answer even though it led me in a new direction rather than answering the specific question. BR Magnus – Magnus Karlsson Jan 12 '13 at 20:39

Try capturing your queries with the profiler and isolate the top 3 longest-running ones. Copy them into SSMS and execute them. Look at the actual execution plan. Look for table scans or a huge discrepancy between estimated and actual record counts. From there, either statistics are off, or you might consider placing an index to cover the query.
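The scan-vs-seek distinction above can be seen in miniature with SQLite's EXPLAIN QUERY PLAN standing in for SQL Server's execution plan; a hedged sketch, with the table and column names assumed from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE RatingListTriangle ("
    "Id INTEGER PRIMARY KEY, To1Id INTEGER, To2Id INTEGER, To3Id INTEGER, "
    "Rating1 INTEGER, Rating2 INTEGER, Rating3 INTEGER)"
)

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail)
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

lookup = "SELECT * FROM RatingListTriangle WHERE To1Id = 71"
before = plan(lookup)  # without an index: a full table scan

# One index per key column, so each of the three WHERE clauses can seek
# instead of scanning the whole table.
for col in ("To1Id", "To2Id", "To3Id"):
    conn.execute(f"CREATE INDEX ix_{col} ON RatingListTriangle({col})")

after = plan(lookup)   # now the plan mentions the ix_To1Id index
print(before, after)
```

The same pattern applies in SSMS: a table scan on a 10M-row table in the actual plan is the signal that one of these key columns is missing an index.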

milivojeviCH

From a performance perspective it may be better to use some sort of stored procedure. LINQ tends to slow things down a lot with these types of queries/lookups. However, if a stored procedure is not enough, there are a few things you can do.

First off, you may want to have a look at Incremental Search, which works by basically keeping a search going, or "deferring" execution, for a given amount of time. That should work on any IEnumerable.

The next thing I would recommend is to see whether you can integrate something similar to the Incremental Search above (by using the Add functions, etc.) into whatever works with your program.

Seriously though: I have seen HUGE improvements from stored procedures in the past, and they will definitely increase the speed (in my case one query was reduced almost 10-fold!).