3

I am currently working on a project which uses Linq2Sql as its database access framework. There are a lot of LINQ queries which basically do the following:

var result =    from <some_table>
                join <some_other_table>
                join <another_table>
                select <some_other_domain_model> // This is a non linq2SQL poco

return result.Where(<Some_Predicate>);

So, for example, assume you read 3 tables and then collate the contents into one big higher-level model for sending to a view. Ignore the mixing of domains, as that doesn't bother me too much; it's the final where clause which does.

I have not used Linq2Sql much before, so would I be right in saying that what is going to happen is:

  1. Generate SQL based off the from, join, join, select linq
  2. Retrieve all rows
  3. Map all this data into one big model (in memory)
  4. Loop through all models and then return only the applicable ones

This is the crux of my question. The above flow is what would make sense in my mind, but it has been disputed by people who apparently know the framework a lot better than I do: they say the 4th step is somehow factored into the SQL generation, so it will not pull back all records. I don't know how it could do that, as it NEEDS all the data up front to populate the model, to which it then applies a separate where clause, so I assume that by the 4th point the rows have all been read and are already in memory.

I am trying to push for them to move their where clause into the LINQ query so that it filters out unneeded records at the database level. Can anyone advise whether my assumptions above are right?

== Edit ==

I have added a comment to draw more attention to the fact that the selected model is not a Linq2Sql-generated object but some random POCO hand-rolled elsewhere, just to narrow down the main focus of the question. The question is LESS about "does it matter where I put the where clause" and more about "does the where clause still get factored into the underlying query when it is applied to a non-Linq2Sql object generated from a Linq2Sql query".

Here is another, more concise example of what I mean, hopefully drawing the point more towards where my lack of understanding is:

/*
    I am only going to put auto properties into the linq2sql entities,
    although in the real world they would be a mix of private backing
    fields with public properties doing the notifying.
*/

[global::System.Data.Linq.Mapping.TableAttribute(Name="dbo.some_table_1")]
public class SomeLinq2SqlTable1
{
    [global::System.Data.Linq.Mapping.ColumnAttribute(Storage="some_table_1_id", AutoSync=AutoSync.OnInsert, DbType="Int NOT NULL IDENTITY", IsPrimaryKey=true, IsDbGenerated=true)]
    public int Id {get;set;}
}

[global::System.Data.Linq.Mapping.TableAttribute(Name="dbo.some_table_2")]
public class SomeLinq2SqlTable2
{
    [global::System.Data.Linq.Mapping.ColumnAttribute(Storage="some_table_2_id", AutoSync=AutoSync.OnInsert, DbType="Int NOT NULL", IsPrimaryKey=true, IsDbGenerated=true)]
    public int Id {get;set;}

    [global::System.Data.Linq.Mapping.ColumnAttribute(Storage="some_table_2_name", AutoSync=AutoSync.OnInsert, DbType="Varchar NOT NULL", IsPrimaryKey=false)]
    public string Name {get;set;}
}

[global::System.Data.Linq.Mapping.TableAttribute(Name="dbo.some_table_3")]
public class SomeLinq2SqlTable3
{
    [global::System.Data.Linq.Mapping.ColumnAttribute(Storage="some_table_3_id", AutoSync=AutoSync.OnInsert, DbType="Int NOT NULL", IsPrimaryKey=true, IsDbGenerated=true)]
    public int Id {get;set;}

    [global::System.Data.Linq.Mapping.ColumnAttribute(Storage="some_table_3_other", AutoSync=AutoSync.OnInsert, DbType="Varchar NOT NULL", IsPrimaryKey=false)]
    public string Other {get;set;}
}

/*
    This is some hand rolled Poco, has NOTHING to do with Linq2Sql, think of it as 
    a view model of sorts.
*/
public class SomeViewModel
{
    public int Id {get;set;}
    public string Name {get;set;}
    public string Other {get;set;}
}

/*
    Here is a query to join all tables, then populate the
    viewmodel item from the query and finally do a where clause
    on the viewmodel objects.
    Note: db is assumed to be a System.Data.Linq.DataContext for this database.
*/
var result =    from t1 in db.GetTable<SomeLinq2SqlTable1>()
                join t2 in db.GetTable<SomeLinq2SqlTable2>() on t1.Id equals t2.Id
                join t3 in db.GetTable<SomeLinq2SqlTable3>() on t1.Id equals t3.Id
                select new SomeViewModel { Id = t1.Id, Name = t2.Name, Other = t3.Other };

return result.Where(viewModel => viewModel.Name.Contains("some-guff"));

So given the example above, will the final Where statement be factored into the underlying query, or will the where on the viewModel cause a retrieval and then evaluate in memory?

Sorry for the verbosity of this question, but there is very little documentation about it, and this is quite a specific question.

Grofit

3 Answers

5

You do not need to push the Where clause any higher. It is fine where it is, as long as result is IQueryable<T> (for some T). LINQ is composable. Indeed, there's absolutely no difference between using the LINQ query syntax and using the extension-method syntax; either would work identically. Basically, when you create a query, it is only building a model of what has been requested. Nothing is executed until you start iterating it (foreach, ToList(), etc). So adding an extra Where on the end is fine: that will get built into the composed query.

You can verify this very simply by monitoring the SQL connection; you'll see that it includes the where clause in the TSQL, and filters at the SQL server.

This allows for some interesting scenarios, for example a flexible search:

IQueryable<Customer> query = db.Customers;
if(name != null) query = query.Where(x => x.Name == name);
if(region != null) query = query.Where(x => x.Region == region);
...
if(dob != null) query = query.Where(x => x.DoB == dob);
var results = query.Take(50).ToList();

In terms of your assumptions, they are incorrect - it is really:

  1. build composable query, composing (separately) from, join, join, select
  2. further compose the query, adding a where (no different to the above compositions)
  3. at some point later, iterate the query
    1. generate sql from the fully-composed query
    2. retrieve rows
    3. map into model
    4. yield the results

Note that the SQL generation only happens when the query is iterated; until then you can keep composing it all day long. It doesn't touch the SQL server until it is iterated.
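The deferral described above can be seen without a database. The following is only a minimal sketch using LINQ to Objects via AsQueryable(), not LINQ to SQL itself; the class DeferredExecutionDemo, the Names() source and the Enumerations counter are all hypothetical names made up for illustration. The counter stands in for "touching the SQL server": it only changes when the query is iterated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Counts how many times the underlying source is actually read.
    public static int Enumerations;

    // Hypothetical stand-in for a table. With LINQ to SQL, reading the
    // source is the point at which the database would be queried.
    private static IEnumerable<string> Names()
    {
        Enumerations++;
        yield return "alpha";
        yield return "some-guff-1";
        yield return "beta";
    }

    public static List<string> Run()
    {
        Enumerations = 0;

        IQueryable<string> query = Names().AsQueryable();

        // Composing only extends the query; the source is not read here.
        query = query.Where(n => n.Contains("some-guff"));
        Console.WriteLine("After composing: " + Enumerations);

        // Iterating (here via ToList) is what finally reads the source.
        List<string> results = query.ToList();
        Console.WriteLine("After ToList: " + Enumerations);

        return results;
    }

    public static void Main()
    {
        Run();
    }
}
```

With a real DataContext the same shape applies, except that the provider translates the composed query into SQL at the point of iteration instead of running it in memory.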

Marc Gravell
  • ok so its all lazy loaded, that makes sense, the bit I dont get though is say if there were 100 rows and I were to be taking the result of these 3 joined tables and putting it into some random model i.e. `Select new RandomModel { id = table1.Id, name = table2.Name, other = table3.data}` then applying a where clause, such as `randomModel => randomModel.Name.Contains("foo");` surely it can only do this by either pulling back all 100 rows, composing the list then evaluating each one in memory, or pull back each individual row to create the object, then in memory do the where clause. – Grofit Nov 29 '12 at 10:28
  • @Grofit there is absolutely no difference **whatsoever** between including the `where` in the original LINQ, vs applying a `Where` separately. It can compose the where *into the SQL*. LINQ syntax is **just** a convenience wrapper around the underlying .Where, .Join, etc. All LINQ queries are composed with multiple separate calls. The system **won't even be able to detect** that the `Where` was added later, vs added in the original LINQ. Tl;dr: it can put the `where` into the TSQL. – Marc Gravell Nov 29 '12 at 10:39
  • How does it know though? I am more than happy to accept it's space magic and move on with my life, but in your example you are working on a linq2sql entity called Customer and filtering on that, which is fine, that's 100% comprehensible. However in my scenario I am doing the `Where` clause against a non linq2sql object, which is just some other random domain entity, which has no knowledge of linq2sql; it is just a poco. So how does linq2sql know that when I do Where on the selected pocos the `randomModel.name.contains("foo")` would need to generate `table2.name LIKE '%foo%'` in the sql? – Grofit Nov 29 '12 at 10:45
  • Sorry to pester, but I have edited the original question to highlight the part I'm having trouble wrapping my head around. As your example only works with Linq2Sql objects and mine is a mix of both, I was wondering if you could just confirm that it would act as you say given the 2nd example above? Then I will be happy to give out the answer once this specific area of the question is answered. – Grofit Nov 30 '12 at 09:20
  • @Grofit have you tried simply profiling the sql generated? either sql-trace, or just set `db.Log = Console.Out;`? that is *fine*. It never actually creates a `SomeViewModel` per row - it just tracks that the view-model `Name` is the same as `t2.Name`, and uses that. If you look at how `join` and `let` etc work under the hood, the reason that this is trivial becomes clear: in reality, there are **lots** of hidden wider types being generated to represent the composed joined model (they are called something like "opaque identifiers" in the spec). The parser already knows how to do that. – Marc Gravell Nov 30 '12 at 09:26
  • Thanks for clearing this up, I would LOVE to go on the SQL server and look at the trace calls, but I am unable to due to political reasons. Anyway you have been SUPER helpful and I will give you the answer! – Grofit Nov 30 '12 at 09:31
  • @Grofit you don't have to: LINQ-to-SQL has the `.Log` property, to which you can attach any text output (such as `Console.Out`, or a `StringWriter` IIRC) - allowing you to monitor it. Alternatively, tools like [mini-profiler](http://nuget.org/packages/miniprofiler) can be used to log all ADO.NET traffic (for developers, obviously). If I (as a developer) load a page on my sites, I can see **all** the SQL operations that happened. And lots of other things. – Marc Gravell Nov 30 '12 at 09:46
0

I did a little research into LINQ to SQL best practices, because I always use this technology in my projects. Take a look at my blog post; maybe it can help you.

http://msguy.net/post/2012/03/20/LINQ-to-SQL-Practices-and-approaches.aspx

Michael Samteladze
  • Thanks for the link, I never use it and prefer NHibernate or if MS tech is required EntityFramework, I believe microsoft stopped supporting Linq2Sql as it was just a filler until EF was released. So you may want to look towards one of the newer frameworks if you are still using this framework. – Grofit Nov 29 '12 at 16:54
0

The provider knows how the populated properties of your custom model are mapped (because of the select clause in your query) to the actual columns on the database table. So it knows which column on the table it needs to filter when you filter on a property of your custom model. Think of your selected model (whether it be a designer-defined entity with all of the columns of the table, or a custom model defined somewhere, or an anonymous type with just the data you need) as just the selected columns before the FROM clause in the SQL query. Selecting an anonymous model makes it easily recognizable that the fields in the model correspond to the SELECT list in SQL.
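To make that mapping concrete, here is a small sketch that inspects the expression tree a provider would receive. It is an illustration built on an in-memory IQueryable, not LINQ to SQL's actual translator; Table2Row and ExpressionTreeDemo are hypothetical names, and SomeViewModel mirrors the class from the question. The point is that the Where call wraps the Select call, and the Select's lambda records which source member each view-model property was assigned from.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-in for a mapped table row.
public class Table2Row
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Plain view model with no knowledge of the data layer.
public class SomeViewModel
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Other { get; set; }
}

public static class ExpressionTreeDemo
{
    public static IQueryable<SomeViewModel> BuildQuery()
    {
        IQueryable<Table2Row> rows = new Table2Row[0].AsQueryable();

        IQueryable<SomeViewModel> projected =
            rows.Select(r => new SomeViewModel { Id = r.Id, Name = r.Name });

        // Filter applied afterwards, on the view model rather than the row.
        return projected.Where(vm => vm.Name.Contains("some-guff"));
    }

    public static void Main()
    {
        // The outermost node is the Where call; its source argument is the Select call.
        var where = (MethodCallExpression)BuildQuery().Expression;
        var select = (MethodCallExpression)where.Arguments[0];

        // Queryable operators pass lambdas as quoted expressions, so unwrap the quote.
        var selector = (LambdaExpression)((UnaryExpression)select.Arguments[1]).Operand;
        var init = (MemberInitExpression)selector.Body;

        Console.WriteLine(where.Method.Name + " wraps " + select.Method.Name);

        // Each binding says which source member feeds which view-model property.
        foreach (MemberAssignment binding in init.Bindings.Cast<MemberAssignment>())
        {
            Console.WriteLine(binding.Member.Name + " <- " + binding.Expression);
        }
    }
}
```

Because the binding for Name points back at the row's Name member, a provider walking this tree has enough information to turn a filter on the view model's Name into a filter on the underlying column.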

Most important: always remember that var result = from ... is just a query until it gets iterated (for example by result.ToArray()). Try calling your variable query instead of result, and the world may take on new colors when you look again.

Lawrence Ward