67

I have to perform the following SQL query:

select answer_nbr, count(distinct user_nbr)
from tpoll_answer
where poll_nbr = 16
group by answer_nbr

The LINQ to SQL query

from a in tpoll_answer 
where a.poll_nbr = 16 select a.answer_nbr, a.user_nbr distinct 

maps to the following SQL query:

select distinct answer_nbr, distinct user_nbr
from tpoll_answer
where poll_nbr = 16

So far, so good. However the problem raises when trying to GROUP the results, as I'm not being able to find a LINQ to SQL query that maps to the first query I wrote here (thank you LINQPad for making this process a lot easier). The following is the only one that I've found that gives me the desired result:

from answer in tpoll_answer where answer.poll_nbr = 16 _
group by a_id = answer.answer_nbr into votes = count(answer.user_nbr)

Which in turns produces the follwing ugly and non-optimized at all SQL query:

SELECT [t1].[answer_nbr] AS [a_id], (
    SELECT COUNT(*)
    FROM (
        SELECT CONVERT(Bit,[t2].[user_nbr]) AS [value], [t2].[answer_nbr], [t2].[poll_nbr]
        FROM [TPOLL_ANSWER] AS [t2]
        ) AS [t3]
    WHERE ([t3].[value] = 1) AND ([t1].[answer_nbr] = [t3].[answer_nbr]) AND ([t3].[poll_nbr] = @p0)
    ) AS [votes]
FROM (
    SELECT [t0].[answer_nbr]
    FROM [TPOLL_ANSWER] AS [t0]
    WHERE [t0].[poll_nbr] = @p0
    GROUP BY [t0].[answer_nbr]
    ) AS [t1]
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [16]
-- Context: SqlProvider(Sql2008) Model: AttributedMetaModel Build: 3.5.30729.1

Any help will be more than appreciated.

Brian Webster
  • 30,033
  • 48
  • 152
  • 225
Leandro López
  • 2,185
  • 1
  • 15
  • 17

6 Answers6

105

There isn't direct support for COUNT(DISTINCT {x})), but you can simulate it from an IGrouping<,> (i.e. what group by returns); I'm afraid I only "do" C#, so you'll have to translate to VB...

 select new
 {
     Foo= grp.Key,
     Bar= grp.Select(x => x.SomeField).Distinct().Count()
 };

Here's a Northwind example:

    using(var ctx = new DataClasses1DataContext())
    {
        ctx.Log = Console.Out; // log TSQL to console
        var qry = from cust in ctx.Customers
                  where cust.CustomerID != ""
                  group cust by cust.Country
                  into grp
                  select new
                  {
                      Country = grp.Key,
                      Count = grp.Select(x => x.City).Distinct().Count()
                  };

        foreach(var row in qry.OrderBy(x=>x.Country))
        {
            Console.WriteLine("{0}: {1}", row.Country, row.Count);
        }
    }

The TSQL isn't quite what we'd like, but it does the job:

SELECT [t1].[Country], (
    SELECT COUNT(*)
    FROM (
        SELECT DISTINCT [t2].[City]
        FROM [dbo].[Customers] AS [t2]
        WHERE ((([t1].[Country] IS NULL) AND ([t2].[Country] IS NULL)) OR (([t1]
.[Country] IS NOT NULL) AND ([t2].[Country] IS NOT NULL) AND ([t1].[Country] = [
t2].[Country]))) AND ([t2].[CustomerID] <> @p0)
        ) AS [t3]
    ) AS [Count]
FROM (
    SELECT [t0].[Country]
    FROM [dbo].[Customers] AS [t0]
    WHERE [t0].[CustomerID] <> @p0
    GROUP BY [t0].[Country]
    ) AS [t1]
-- @p0: Input NVarChar (Size = 0; Prec = 0; Scale = 0) []
-- Context: SqlProvider(Sql2008) Model: AttributedMetaModel Build: 3.5.30729.1

The results, however, are correct- verifyable by running it manually:

        const string sql = @"
SELECT c.Country, COUNT(DISTINCT c.City) AS [Count]
FROM Customers c
WHERE c.CustomerID != ''
GROUP BY c.Country
ORDER BY c.Country";
        var qry2 = ctx.ExecuteQuery<QueryResult>(sql);
        foreach(var row in qry2)
        {
            Console.WriteLine("{0}: {1}", row.Country, row.Count);
        }

With definition:

class QueryResult
{
    public string Country { get; set; }
    public int Count { get; set; }
}
Marc Gravell
  • 1,026,079
  • 266
  • 2,566
  • 2,900
  • 1
    Thank you Marc. Yes, I also considered this approach, in fact a very similar query that returns the correct results. Sadly it is a shame the SQL generated is not optimal, but it works and that´s enough for the time being. – Leandro López Jan 16 '09 at 10:26
  • 4
    @Leandro - it might be more TSQL, but I'd compare the actual query plans before claiming anything about optimal - it might well be identical to the optimiser. – Marc Gravell Jan 16 '09 at 10:34
  • IF the sql is like crap, why don't you write it in plain sql instead in the first place? – marko Aug 29 '10 at 10:14
  • 1
    This is great, thanks! It's exactly what I needed for my POCO collections :-) I'm not using a SQL backend, so plain SQL wasn't an option for me. – Bennett Dill Mar 11 '11 at 03:35
  • FYI, distinct is totally supported now. – mcfea Oct 03 '16 at 22:43
  • I compared both queries in execution plan and it shows linq generated query cost as 55% and the optimum query as 45%. So, sadly the query generated by linq is not optimal :( – Oncel Umut TURER Aug 18 '17 at 10:40
11

The Northwind example cited by Marc Gravell can be rewritten with the City column selected directly by the group statement:

from cust in ctx.Customers
where cust.CustomerID != ""
group cust.City /*here*/ by cust.Country
into grp
select new
{
        Country = grp.Key,
        Count = grp.Distinct().Count()
};
Alex Kamburov
  • 405
  • 4
  • 10
  • I was excited about this syntax, but the SQL generated exactly the same as above cited by Marc Gravell – Sam May 25 '22 at 23:05
1

Linq to sql has no support for Count(Distinct ...). You therefore have to map a .NET method in code onto a Sql server function (thus Count(distinct.. )) and use that.

btw, it doesn't help if you post pseudo code copied from a toolkit in a format that's neither VB.NET nor C#.

Frans Bouma
  • 8,259
  • 1
  • 27
  • 28
  • Thank you Frans, I have considering creating a view or stored procedure. Thank you very much Marc. – Leandro López Jan 16 '09 at 10:13
  • @Leandro - you might also consider a table-valued function (UDF) in place of a stored procedure; the system can get the metadata more accurately, and it is composable at the server. Only works with LINQ-to-SQL, though - not Entity Framework (AFAIK). – Marc Gravell Jan 16 '09 at 10:31
1

This is how you do a distinct count query. Note that you have to filter out the nulls.

var useranswercount = (from a in tpoll_answer
where user_nbr != null && answer_nbr != null
select user_nbr).Distinct().Count();

If you combine this with into your current grouping code, I think you'll have your solution.

GeekyMonkey
  • 12,478
  • 6
  • 33
  • 39
  • Thank you GeekyMonkey. Columns are NOT NULL, so it is one less thing to worry about. If I'm not mistaken, I think your query the total count and not a grouped by count. – Leandro López Jan 16 '09 at 10:24
1

simple and clean example of how group by works in LINQ

http://www.a2zmenu.com/LINQ/LINQ-to-SQL-Group-By-Operator.aspx

rs.emenu
  • 35
  • 1
-5

I wouldn't bother doing it in Linq2SQL. Create a stored Procedure for the query you want and understand and then create the object to the stored procedure in the framework or just connect direct to it.