Improving a foreach using LINQ

Question

Possible Duplicate:
Delete object and all its child objects in Entity Framework?

This code:

        int WebsiteID = int.Parse(Request.QueryString["id"]);
        Website websiteObj = db.Websites
            .SingleOrDefault(x => x.website_id == WebsiteID);

        foreach (BusinessObjects.Page pageObj in websiteObj.Pages)
        {
            foreach (SubPage subpageObj in pageObj.SubPages)
            {
                pageObj.SubPages.Remove(subpageObj);
            }
            websiteObj.Pages.Remove(pageObj);
        }

        foreach (Sector sectorObj in websiteObj.Sectors)
        {
            foreach (Product productObj in sectorObj.Products)
            {
                sectorObj.Products.Remove(productObj);
            }
            websiteObj.Sectors.Remove(sectorObj);
        }
        db.Websites.DeleteObject(websiteObj);

A website has multiple pages, a page has multiple subpages. A website also has multiple sectors and each sector has multiple products.

I want to delete the website and clear all relationships + entities related to it. I'm sure there are better ways to write the above.

Is there way to improve the logic?

Those are the times where I wonder why people don't just use SQL... — Lucero, Aug 26 '12 at 10:37
Wouldn't it be worse to use SQL? Clearing all relationships and deleting rows etc...? — user1027620, Aug 26 '12 at 10:40
@Lucero Just imagine if databases had some kind of cascade delete functionality — podiluska, Aug 26 '12 at 10:41
No, because SQL works on sets - and if done properly you can have foreign keys cascade the deletion anyways, so that doing a simple delete of the website might even be enough. — Lucero, Aug 26 '12 at 10:41
@Lucero, this are times where I wonder why EF is considered the future rather than Linq2SQL (which has a set delete method though still often not not as clean as the SQL relevant call). — Jon Hanna, Aug 26 '12 at 10:50

score 4 · Accepted Answer · edited Aug 26 '12 at 12:54

The simplest way, is to set up the relationships in the database as ON DELETE CASCADE

Then the Linq becomes:

var sites = db.Websites;
var site_id = int.Parse(Request.QueryString["id"]);
var site = sites.FirstOrDefault(x => x.website_id == site_id);
sites.DeleteObject(site);

The only real changes I've made to your linq apart from removing what is no longer needed if deletes cascade is:

Using FirstOrDefault for a slight performance boost. This isn't too big a deal with databases, as it just does a TOP 2 to implement. It's worth not doing Single when First works, because in other cases it can mean having to examine every single object (only way to do Single against a List<T>)*. Of course, if there's any possibility of there being two sites with the same id, then I've just introduced a bug, but then the real bug is in not having the id as a primary key to ensure that this could never happen.

Taking Obj out of the name of the object. This seemed to be short for "object" and hence to be tautologous - every other variable in every piece of .NET code ever written is for an object, so there's no point in pointing this out.

*A bit more on this because I don't want to encourage bad habits. When there can be only one matching object, then First gets that object while Single gets that object and also makes sure it is the only one (the same applies to the OrDefault variants except they differ when there is no object found at all).

The only way to know there's only one matching object, is to try to find at least two of them and then check you only got one. This is of minor impact with a database table if it's indexed on columns relevant to your search (it does a TOP 2 instead of a TOP 1), but can be serious otherwise.

The question is how important that is for that check to be made. Sometimes you don't care; use First. Sometimes you should have a strong degree of confidence that there is only one such object; use First but do check on the reason for that confidence to make sure you don't have a bug (in this case, check there's a primary or unique key on the index column). If you really need to check then do use Single as using First in such a case is the very epitome of a bad optimisation - being faster but wrong is never the goal.

The only downside is that some people are wary of cascading anything, but their reason is precisely to make sure someone goes to the effort of deleting the related objects and can't delete a whole swathe of data with one delete - so it's not really a relevant objection. — Jon Hanna, Aug 26 '12 at 10:59
@JonHanna exactly, I was just thinking how a simple mistake could wipe a whole lot of data. Should I just force the users to do a manual delete then enable removal of a website? — user1027620, Aug 26 '12 at 11:02
@JonHanna also, another question please. Is it preferable to have the cascade feature on foreign keys in junction tables so that relationships clear when deleting a parent object? or it doesn't work that way. — user1027620, Aug 26 '12 at 11:04
I love `ON DELETE CASCADE` because it lets me do the above (or equivalent with non-EF approaches). The thing to think about is "will I always want to delete the lot, or might I want to transfer some sub-objects to another object instead". In the former case, no bug can be caused by it, in the latter it can though I still wouldn't rule it out - just be that bit more careful about any code (or manual op) that does the delete. It can be v. useful on junction tables in particular, because it only deletes the junction entry. `ON UPDATE CASCADE` can be slightly more surprising so I'm less keen on it — Jon Hanna, Aug 26 '12 at 11:17

Improving a foreach using LINQ

1 Answers1