1

I always thought LINQ to SQL equivalent for an exists query is to use Any(). But i recently wrote a query in LINQ , which basically is trying to find if duplicate records exists in single table.

Anycontext.Contacts.Any(c => ((c.FirstName == contact.FirstName && c.LastName == contact.LastName && c.AddressLine1 == contact.AddressLine1 && c.Zip == contact.Zip)|| (!String.IsNullOrEmpty(contact.Email) && c.Email == contact.Email)))

matching criteria is simple to find contacts with same FirstName, LastName and AddressLine1 or same Email. This query times out in 30 sec(default), there are just 500K rows in this table.

Wherecontext.Contacts.Where(c => ((c.FirstName == contact.FirstName && c.LastName == contact.LastName && c.AddressLine1 == contact.AddressLine1 && c.Zip == contact.Zip)|| (!String.IsNullOrEmpty(contact.Email) && c.Email == contact.Email))).Count()>0

I was forced to use Where clause and then do count greater than 0 to find if any duplicate exists in the set. What i can not understand is, why LINQ to SQL on simple Any clause timing out. Any explanation will be really great here.

EDIT

SQL From from LINQ Pad ANY

SELECT 
(CASE 
    WHEN EXISTS(
        SELECT NULL AS [EMPTY]
        FROM [Accounts].[Contacts] AS [t0]
        WHERE ([t0].[CompanyID] = @p0) AND ((([t0].[FirstName] = @p1) AND ([t0].[LastName] = @p2) AND ([t0].[AddressLine1] = @p3) AND ([t0].[Zip] = @p4)) OR (([t0].[FirstName] = @p5) AND ([t0].[LastName] = @p6) AND (EXISTS(
            SELECT NULL AS [EMPTY]
            FROM [Accounts].[PhoneNumbers] AS [t1]
            WHERE ([t1].[ContactNumber] = @p7) AND ([t1].[ContactID] = [t0].[ContactID])
            ))))
        ) THEN 1
    ELSE 0
 END) AS [value]

Where

SELECT [t0].[ContactID]
    ,[t0].[CompanyID]
    ,[t0].[CompanyTitle]
    ,[t0].[FirstName]
    ,[t0].[LastName]
    ,[t0].[AddressLine1]
    ,[t0].[AddressLine2]
    ,[t0].[City]
    ,[t0].[State]
    ,[t0].[Zip]
    ,[t0].[Email]
    ,[t0].[Salutation]
    ,[t0].[IsActive]
FROM [Accounts].[Contacts] AS [t0]
WHERE ([t0].[CompanyID] = @p0)
    AND (
        (
            ([t0].[FirstName] = @p1)
            AND ([t0].[LastName] = @p2)
            AND ([t0].[AddressLine1] = @p3)
            AND ([t0].[Zip] = @p4)
            )
        OR (
            ([t0].[FirstName] = @p5)
            AND ([t0].[LastName] = @p6)
            AND (
                EXISTS (
                    SELECT NULL AS [EMPTY]
                    FROM [Accounts].[PhoneNumbers] AS [t1]
                    WHERE ([t1].[ContactNumber] = @p7)
                        AND ([t1].[ContactID] = [t0].[ContactID])
                    )
                )
            )
        )
valverij
  • 4,871
  • 1
  • 22
  • 35
Apurav
  • 198
  • 1
  • 11
  • 1
    Problably the parser of linq to sql in `Any()` method is not generating a good query. Do you have a profile to check it? – Felipe Oriani May 29 '13 at 16:44
  • That what is thought, but i dont have the generated SQL. I used LINQ pad and generated SQL was almost same for both except the count part – Apurav May 29 '13 at 16:52
  • I hate to say it, but this is really where a nice LINQ performance tool comes in handy. You might want to take a look at http://stackoverflow.com/questions/1389930/tools-and-techniques-to-optimize-a-linq-to-sql-query I've used Hibernating Rhino's performance tool with great success. It's quite interesting to see what your LINQ queries generate. It helped me a lot in understanding exactly what NOT to do ;-) – Robert McKee May 29 '13 at 17:05
  • Thanks, i ll check it out. i have edited my post with LINQ Pad's SQL equivalent. The query still times out, but if i execute the generated SQL in Mangements studio the result is instantaneous. Know i m getting very confused. – Apurav May 29 '13 at 17:10

1 Answers1

0

Not completely sure if this is what you want, but you could compare, I'm guessing you wont run the test often, as it would be better to test for existence before you input the data to the DB.

If you want to find the duplicates then

var queryA = from a in db.someTable
            select a.value;
foreach(var row in queryA){
Console.Write(queryA.Where(b => b == row).Count() > 1 ? row: "");
}

If you just want to test if it exist then.

var queryA = from a in db.someTable
            select a.value;
var queryB = queryA;
queryB.Distinct();
Console.Write(queryB.Count() != queryA.Count() ? "Yes" : "No");
Thomas Andreè Wang
  • 3,379
  • 6
  • 37
  • 53
  • This will work, but i have to do this run multiple times. Same logic is used to search records and also before insert. And i think foreach will actually increase the algorithm time. so will not be useful. – Apurav Jun 04 '13 at 18:05
  • If you check the important columns before you insert you don't have to do it after. You want to do a double loop on the table anyways as you have to check the data against it self with itself. – Thomas Andreè Wang Jun 05 '13 at 05:17