How does the Repository Pattern work if Entities are related to each other?

Question

There is a question about IRepository and what it is used for, that has a seemingly good answer.

My problem though: How would I cleanly deal with entities that are related to each other, and isn't IRepository then just a layer without real purpose?

Let's say I have these business objects:

public class Region {
    public Guid InternalId {get; set;}
    public string Name {get; set;}
    public ICollection<Location> Locations {get; set;}
    public Location DefaultLocation {get; set;}
}

public class Location {
    public Guid InternalId {get; set;}
    public string Name {get; set;}
    public Guid RegionId {get; set;}
}

There are rules:

Every Region MUST have at least one location
Newly created Regions are created with a location
No SELECT N+1 please

So what would my RegionRepository look like?

public class RegionRepository : IRepository<Region>
{
    // Linq To Sql, injected through constructor
    private Func<DataContext> _l2sfactory;

    public ICollection<Region> GetAll(){
         using(var db = _l2sfactory()) {
             return db.GetTable<DbRegion>()
                      .Select(dbr => MapDbObject(dbr))
                      .ToList();
         }
    } 

     private Region MapDbObject(DbRegion dbRegion) {
         if(dbRegion == null) return null;

         return new Region {
            InternalId = dbRegion.ID,
            Name = dbRegion.Name,
            // Locations is EntitySet<DbLocation>
            Locations = dbRegion.Locations.Select(loc => MapLoc(loc)).ToList(),
            // DefaultLocation is EntityRef<DbLocation>
            DefaultLocation = MapLoc(dbRegion.DefaultLocation)
         };
     }

     private Location MapLoc(DbLocation dbLocation) {
         // Where should this come from?
     }
}

So as you can see, a RegionRepository needs to fetch locations as well. In my example, I use the Linq To Sql EntitySet/EntityRef, but now RegionRepository needs to deal with mapping Locations to Business Objects (because I have two sets of objects, business and L2S objects).

Should I refactor this to something like:

public class RegionRepository : IRepository<Region>
{
    private IRepository<Location> _locationRepo;

    // snip

    private Region MapDbObject(DbRegion dbRegion) {
         if(dbRegion == null) return null;

         return new Region {
            InternalId = dbRegion.ID,
            Name = dbRegion.Name,
            // Now, LocationRepo needs to concern itself with Regions...
            Locations = _locationRepo.GetAllForRegion(dbRegion.ID),
            // DefaultLocation is a uniqueidentifier
            DefaultLocation = _locationRepo.Get(dbRegion.DefaultLocationId)
         };
    }
}

Now I have nicely separated my data layer into atomic repositories, each dealing with only one type. I fire up the Profiler and... Whoops, SELECT N+1, because each Region calls the location service. We only have a dozen regions and 40 or so locations, so the natural optimization is to use DataLoadOptions. The problem is that RegionRepository doesn't know whether LocationRepository is using the same DataContext or not. We are injecting factories here after all, so LocationRepository might spin up its own. And even if it doesn't, I'm calling a service method that provides business objects, so the DataLoadOptions may not be used anyway.
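To make the SELECT N+1 concern concrete, here is a minimal in-memory sketch. It does not use Linq To Sql at all: `FakeLocationTable`, its `QueryCount` counter, `GetAllForRegions` and `NPlusOneDemo` are hypothetical stand-ins invented for illustration, and each method call is assumed to represent one database round trip. It only shows the difference between asking for locations once per region and asking once for all regions and grouping the result in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Location {
    public Guid InternalId { get; set; }
    public string Name { get; set; }
    public Guid RegionId { get; set; }
}

// Hypothetical stand-in for a database table; counts each simulated round trip.
public class FakeLocationTable {
    private readonly List<Location> _rows;
    public int QueryCount { get; private set; }

    public FakeLocationTable(IEnumerable<Location> rows) { _rows = rows.ToList(); }

    // One simulated round trip per call: locations of a single region.
    public List<Location> GetAllForRegion(Guid regionId) {
        QueryCount++;
        return _rows.Where(l => l.RegionId == regionId).ToList();
    }

    // One simulated round trip in total: locations of many regions, grouped by region.
    public ILookup<Guid, Location> GetAllForRegions(ICollection<Guid> regionIds) {
        QueryCount++;
        return _rows.Where(l => regionIds.Contains(l.RegionId))
                    .ToLookup(l => l.RegionId);
    }
}

public static class NPlusOneDemo {
    // N location queries for N regions (on top of the query that fetched the regions).
    public static int PerRegion(FakeLocationTable table, IList<Guid> regionIds) {
        foreach (var id in regionIds) table.GetAllForRegion(id);
        return table.QueryCount;
    }

    // A single location query, no matter how many regions there are.
    public static int Batched(FakeLocationTable table, IList<Guid> regionIds) {
        var byRegion = table.GetAllForRegions(regionIds);
        foreach (var id in regionIds) {
            var locations = byRegion[id].ToList(); // in-memory lookup, no round trip
        }
        return table.QueryCount;
    }
}
```

With a dozen regions, `PerRegion` would count twelve location queries while `Batched` counts one. Whether a batched call like this can sit behind a single-type repository without leaking knowledge of Regions is exactly the open question here.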

Ah, I overlooked something. IRepository is supposed to have a method like this:

public IQueryable<T> Query()

So now I would do

         return new Region {
            InternalId = dbRegion.ID,
            Name = dbRegion.Name,
            // Now, LocationRepo needs to concern itself with Regions...
            Locations = _locationRepo.Query()
                        .Where(loc => loc.RegionId == dbRegion.ID)
                        .ToList(),
            // DefaultLocation is a uniqueidentifier
            DefaultLocation = _locationRepo.Get(dbRegion.DefaultLocationId)
         };

That looks good at first. On second inspection, I have separate business and L2S objects, so I still don't see how this avoids SELECT N+1, since Query cannot just return GetTable<DbLocation>.

The problem seems to be having two different sets of objects. But if I decorate Business Objects with all the System.Data.Linq.Mapping attributes ([Table], [Column] etc.), that breaks the abstraction and defeats the purpose of IRepository. Maybe I also want to be able to use some other ORM, at which point I would have to decorate my Business Entities with other attributes (also, if the business entities are in a separate .Business assembly, consumers of it now need to reference all ORMs as well for the attributes to be resolved - yuck!).

To me, it seems that IRepository should be IService, and the above class should look like this:

public class RegionService : IRegionService {
      private Func<DataContext> _l2sfactory;

      public void Create(Region newRegion) {
        // Responsibility 1: Business Validation
        // This could of course move into the Region class as
        // a bool IsValid(), but that doesn't change the fact that
        // the service concerns itself with validation
        if(newRegion.Locations == null || newRegion.Locations.Count == 0){
           throw new Exception("...");
        }

        if(newRegion.DefaultLocation == null){
          newRegion.DefaultLocation = newRegion.Locations.First();
        }

        // Responsibility 2: Data Insertion, incl. Foreign Keys
        using(var db = _l2sfactory()){
            var dbRegion = new DbRegion {
                ...
            };

            // Use EntitySet to insert Locations as well
            foreach(var location in newRegion.Locations){
                var dbLocation = new DbLocation {

                };
                dbRegion.Locations.Add(dbLocation);
            }

            // Insert Region AND all Locations (InsertOnSubmit is a Table<T> method)
            db.GetTable<DbRegion>().InsertOnSubmit(dbRegion);
            db.SubmitChanges();
        }
      }
}

This also solves a chicken-egg problem:

DbRegion.ID is generated by the database (as newid()) and IsDbGenerated = true is set
DbRegion.DefaultLocationId is a non-nullable GUID
DbRegion.DefaultLocationId is a FK into Location.ID
DbLocation.RegionId is a non-nullable GUID and a FK into Region.ID

Doing this without EntitySet is pretty much impossible, so unless you sacrifice data integrity in the database and move it into the business logic, it's impossible to keep responsibility for Locations out of the Region provider.

I see how this posting can be seen as not a real question, subjective and argumentative, so please allow me to formulate some objective questions:

What exactly is the Repository Pattern supposed to abstract away?
In the real world, how do people optimize their database layer without breaking the abstraction the Repository Pattern is supposed to achieve?
Specifically, how does the real world deal with SELECT N+1 and data integrity concerns?

I guess my real question is this:

When already using an ORM (like Linq To Sql), isn't DataContext already my Repository, and thus a Repository on top of DataContext is just abstracting the very same thing again?

score 4 · Answer 1 · answered Nov 12 '11 at 09:06


When designing your repositories you should think about what is known as the aggregate root. Essentially this means that if an entity can exist alone within the domain, it will more than likely have its own repository. In your case this would be the Region.

Consider the classic customer/order scenario. The Customer repository would provide access to Orders since an order cannot exist without a customer and therefore unless you have a valid business case for it, you are unlikely to need a separate Order repository.
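Applied to the Region/Location case from the question, the aggregate root idea could be sketched roughly as follows. This is an in-memory illustration under stated assumptions, not a Linq To Sql implementation: `IRegionRepository` and `InMemoryRegionRepository` are hypothetical names, and the only repository is the one for the root, which takes care of its Locations itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Location {
    public Guid InternalId { get; set; }
    public string Name { get; set; }
    public Guid RegionId { get; set; }
}

public class Region {
    public Guid InternalId { get; set; }
    public string Name { get; set; }
    public ICollection<Location> Locations { get; set; }
    public Location DefaultLocation { get; set; }
}

// Hypothetical interface: only the aggregate root (Region) gets a repository.
public interface IRegionRepository {
    Region Get(Guid regionId);
    void Add(Region region);
}

// In-memory implementation, for illustration only.
public class InMemoryRegionRepository : IRegionRepository {
    private readonly Dictionary<Guid, Region> _regions = new Dictionary<Guid, Region>();

    // Returns the whole aggregate (Region plus Locations), or null if unknown.
    public Region Get(Guid regionId) {
        Region region;
        return _regions.TryGetValue(regionId, out region) ? region : null;
    }

    public void Add(Region region) {
        // Rule from the question: every Region must have at least one Location.
        if (region.Locations == null || region.Locations.Count == 0)
            throw new InvalidOperationException("A Region needs at least one Location.");

        // Assign keys and wire the child rows to their root.
        if (region.InternalId == Guid.Empty) region.InternalId = Guid.NewGuid();
        foreach (var location in region.Locations) {
            if (location.InternalId == Guid.Empty) location.InternalId = Guid.NewGuid();
            location.RegionId = region.InternalId;
        }

        // Fall back to the first Location when no default was chosen.
        if (region.DefaultLocation == null) region.DefaultLocation = region.Locations.First();

        _regions[region.InternalId] = region;
    }
}
```

In this sketch there is no separate Location repository, so the question of which repository maps Locations does not arise; the cost is that any Location-only query has to go through the Region repository.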

In a simple application your assumption may well be correct but remember that unless you provide an abstraction of your L2S context you'll struggle to perform effective unit testing. Coding against an interface, whether that be an IServiceX, IRepositoryX or whatever affords you that level of separation.
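As a rough sketch of the testing argument, the consuming code below depends only on an interface and can be exercised with a hand-written fake instead of a live DataContext. `IRegionRepository`, `RegionNameService` and `FakeRegionRepository` are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Region {
    public Guid InternalId { get; set; }
    public string Name { get; set; }
}

// Hypothetical abstraction over the Linq To Sql context.
public interface IRegionRepository {
    ICollection<Region> GetAll();
}

// Code under test depends only on the interface, not on DataContext.
public class RegionNameService {
    private readonly IRegionRepository _regions;

    public RegionNameService(IRegionRepository regions) { _regions = regions; }

    // Returns all region names in ordinal order.
    public List<string> GetSortedNames() {
        return _regions.GetAll()
                       .Select(r => r.Name)
                       .OrderBy(n => n, StringComparer.Ordinal)
                       .ToList();
    }
}

// Test double: serves canned data, no database needed.
public class FakeRegionRepository : IRegionRepository {
    private readonly List<Region> _data;

    public FakeRegionRepository(params string[] names) {
        _data = names.Select(n => new Region { InternalId = Guid.NewGuid(), Name = n }).ToList();
    }

    public ICollection<Region> GetAll() { return _data; }
}
```

A unit test would construct `new RegionNameService(new FakeRegionRepository("West", "East"))` and assert on the returned names, without touching Linq To Sql.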

The decision as to whether Service interfaces come into the design is generally related, again, to the complexity of the business logic and the need for an extensible API into that logic that may be consumed by several disparate clients.


Darren Lewis



The problem I see with the Order example is a ripple effect: An Order has OrderItems, which cannot exist without an Order. However, I may have a function for Inventory Tracking that tracks how many units were sold. This is a simple `SELECT ItemId, SUM(Quantity) FROM OrderItems GROUP BY ItemId` SQL call, but now my Inventory has to go to the Customer Repository to get all Orders to get all OrderItems. Unless CustomerRepository provides an `IDictionary GetItemSalesQuantities`, which seems way out of place on a CustomerRepository. – Michael Stum Nov 12 '11 at 09:13

I wouldn't consider it unreasonable to see an InventoryRepository in this instance. There's no direct relationship between customer and inventory, and therefore no requirement for a set-based operation across the two here, which is what causes the pain and SELECT N+1. – Darren Lewis Nov 12 '11 at 09:22
And now we are getting a new Business Rule: I want the counts, but only for VIP Customers in South America, and only for Orders placed in the last three months. (Something like that actually happened :(). This is still the job of the InventoryRepository, but now it needs to access more than one database table. At that point I'd argue it's no longer a simple repository but a business service class? – Michael Stum Nov 12 '11 at 09:27

Your rule covers Customers and Orders and is therefore a good fit for the Customer repository, as I wouldn't see this as a specific inventory query. However, we are getting into the design of your specific system here, so I fear this question may get closed if we continue. I'd happily continue this discussion over on Programmers if needed? – Darren Lewis Nov 12 '11 at 09:39

score 1 · Answer 2 · answered Nov 12 '11 at 08:53

I have several thoughts about all this:

1. AFAIK the Repository pattern was invented a bit earlier than ORMs. Back in the days of plain SQL queries it was quite a good idea to implement a Repository, and by this abstract your code from the actual database used.
2. I could say that a Repository is completely unnecessary now, but unfortunately, from my experience, I can't say that any ORM can truly abstract you from all database details. E.g. I have never been able to create an ORM mapping once and then just use it with any other DB server that the ORM claims to support (I'm talking about Microsoft EF in particular). So if you really want to be able to use different database servers, then you probably still need to use a Repository.
3. Another concern is very simple: code duplication. For sure, there are some queries that you call frequently in your code. If you leave only the ORM as your repository, then you'll be duplicating those queries, so it will be better to have some level of abstraction over the ORM container that holds those commonly used queries.
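The code duplication point can be illustrated with a small sketch: a frequently used query gets one named home instead of being repeated at every call site. The `LocationRepository` class below is a hypothetical example that wraps any `IQueryable<Location>` (an in-memory list here; with an ORM it is assumed the same expression would be translated by the provider).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Location {
    public Guid InternalId { get; set; }
    public string Name { get; set; }
    public Guid RegionId { get; set; }
}

// Hypothetical repository holding a commonly used query in one place.
public class LocationRepository {
    private readonly IQueryable<Location> _source;

    public LocationRepository(IQueryable<Location> source) { _source = source; }

    // All locations of one region, ordered by name.
    public List<Location> GetAllForRegion(Guid regionId) {
        return _source.Where(l => l.RegionId == regionId)
                      .OrderBy(l => l.Name)
                      .ToList();
    }
}
```

Callers then write `repo.GetAllForRegion(id)` rather than repeating the `Where`/`OrderBy` chain, so a change to the filter or ordering happens in a single method.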
