I understand that the repository pattern abstracts the persistence of domain objects, allowing a developer to read/write/delete objects from persistent storage without knowing how the object is stored (SQL, NoSQL, flat files, etc). I'm quite fond of the repository pattern and find it works well in many situations, e.g., abstracting the business logic from the persistence logic, allowing lazy-loading of objects where appropriate, etc.

However, what I'm not clear on is whether the repository object maintains a reference to all the objects it returns. For example:

Repository repository;
std::shared_ptr<Person> pPerson = repository.retrievePersonById("bob");

pPerson->updateDetails("Bob", "the Builder");
repository.savePerson(pPerson);

  1. In the above hypothetical C++ example, should repository maintain a reference to the Person instance returned? Yes - thanks Guillaume31 for your in-memory analogy.
  2. If the answer to Q1 is "yes", when and how should repository remove that instance of Person? Presumably this is when the reference count reaches zero.
  3. If the answer to Q1 is "no", how do you handle situations where another area of the application wants access to the same object, but because repository no longer stores an internal reference to it, it hydrates a fresh copy from the database, and you effectively have two instances of Person when you really should be referring to the same instance? Given that the answer to Q1 was "Yes", the repository always maintains a reference and so should always return the same object.

I'm actually writing a PHP application, despite the fact that the above example is in C++.

Magnus

1 Answer

A more appropriate metaphor for a repository might be that it is the illusion of an in-memory collection of objects. Take your basic collection type from any OO language. If you get an element from that collection and modify the element, you typically don't have to save it back to the collection afterwards, because it has never ceased being in the collection.

Same goes for a repository -- it serves objects and can add objects to itself, but doesn't expose any features for saving modifications to an underlying storage. Indeed, it's all about hiding the existence of an underlying storage. It doesn't expose any methods to "update" the state of an entity either: the entity it served is in memory and you can modify it freely, so it's never out of sync.
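To make the collection metaphor concrete, here is a minimal sketch in C++ (the language of the question's example). The names `PersonRepository`, `InMemoryPersonRepository`, `add`, `findById` and `remove` are assumptions chosen for illustration, not part of any library. Note that the interface has no `save` or `update` method.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical entity, loosely modelled on the Person from the question.
class Person {
public:
    Person(std::string id, std::string firstName, std::string lastName)
        : id_(std::move(id)),
          firstName_(std::move(firstName)),
          lastName_(std::move(lastName)) {}

    const std::string& id() const { return id_; }
    const std::string& firstName() const { return firstName_; }
    const std::string& lastName() const { return lastName_; }

    void updateDetails(std::string firstName, std::string lastName) {
        firstName_ = std::move(firstName);
        lastName_ = std::move(lastName);
    }

private:
    std::string id_;
    std::string firstName_;
    std::string lastName_;
};

// Collection-like interface: add, look up, remove. No save() or update().
class PersonRepository {
public:
    virtual ~PersonRepository() = default;
    virtual void add(std::shared_ptr<Person> person) = 0;
    virtual std::shared_ptr<Person> findById(const std::string& id) = 0;
    virtual void remove(const std::string& id) = 0;
};

// In-memory stand-in used only for illustration. A real implementation
// would sit in front of a database or an ORM.
class InMemoryPersonRepository : public PersonRepository {
public:
    void add(std::shared_ptr<Person> person) override {
        const std::string key = person->id();
        people_[key] = std::move(person);
    }

    std::shared_ptr<Person> findById(const std::string& id) override {
        auto it = people_.find(id);
        if (it == people_.end()) {
            return nullptr;  // not found
        }
        return it->second;
    }

    void remove(const std::string& id) override { people_.erase(id); }

private:
    std::map<std::string, std::shared_ptr<Person>> people_;
};
```

With this shape, client code calls `findById("bob")`, modifies the returned object through `updateDetails`, and never hands it back to the repository. Whether a database-backed implementation keeps its own reference to the object is an implementation detail hidden behind the interface.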

It's also better if the Repository keeps its hands off transaction management and committing units of work. You should delegate that to the client (see Domain Driven Design p. 156).

To answer your questions: inside a business transaction, you shouldn't assume anything about the freshness of objects returned by a Repository. They just reflect the states of some entities at some point in time; all you have to do is take them as they are and use them. At a more global level, some external mechanism (typically, an ORM tool) will provide you with the ability to manage how you isolate your little business transaction from others, usually in the form of a Unit of Work implementation. Flushing changes to the database and handling potential stale-entity problems is not a per-repository-query decision. It's a more global, business-transaction-level decision that should happen only when you decide your use case is finished and you want to commit it.
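As a rough illustration of committing at the business-transaction level, here is a much-simplified Unit of Work sketch. Everything in it (`UnitOfWork`, `registerDirty`, `commit`, the `Writer` callback standing in for the storage layer) is a hypothetical name chosen for this example. Real ORMs track changes far more thoroughly and wrap the flush in a database transaction.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified entity.
struct Person {
    std::string id;
    std::string firstName;
    std::string lastName;
};

// Collects entities touched during one business transaction and writes
// them out only when the client decides the use case is finished.
class UnitOfWork {
public:
    // Stand-in for the real storage layer (assumption for this sketch).
    using Writer = std::function<void(const Person&)>;

    explicit UnitOfWork(Writer writer) : writer_(std::move(writer)) {}

    // Mark an entity as modified. Nothing is written yet.
    void registerDirty(std::shared_ptr<Person> person) {
        dirty_.push_back(std::move(person));
    }

    // Flush every registered entity in its current state, then forget them.
    // Returns the number of entities written.
    std::size_t commit() {
        std::size_t written = 0;
        for (const auto& person : dirty_) {
            writer_(*person);
            ++written;
        }
        dirty_.clear();
        return written;
    }

private:
    Writer writer_;
    std::vector<std::shared_ptr<Person>> dirty_;
};
```

The point of the sketch is the timing: the repository hands out objects, the client modifies them freely, and nothing reaches storage until `commit()` is called once at the end of the use case.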

guillaume31
  • This answer is correct, and the question has been updated to reflect this. However, it still doesn't answer Q2. E.g., if the repository is never "flushed" (that is, reduced in size internally, as opposed to having its information flushed to storage), then for a long-running process, the entire database (potentially hundreds of TB) would be stored in memory. How would one get around this situation? – Magnus Jan 18 '15 at 21:39
  • You inferred that my answer to your first question was yes, but it isn't. A Repository is the *illusion* of an in-memory collection in that it *externally* behaves as such. But the Repository pattern doesn't say anything about whether it should keep references to returned objects. In an implementation where it fetches data from a third party, why would it? – guillaume31 Jan 19 '15 at 08:11
  • The reason for my question was so that I didn't hydrate multiple instances of a class that represent the same object. If the objects are not cached for some predefined period (e.g., per HTTP request), then the illusion of an in-memory data structure reveals itself as an illusion, and you end up with multiple `Person` objects with the same ID but different properties (my domain is actually not a simple `Person` object). I'm happy to edit my question if you can explain how to resolve this issue. – Magnus Jan 19 '15 at 21:37
  • I get your point. Some ORMs implement the Identity Map pattern (http://martinfowler.com/eaaCatalog/identityMap.html) to solve that problem. You could have it in your own Repository implementation if you don't use an ORM. However, it's a cache that only acts inside a unit of work, so I wouldn't recommend letting it span the duration of a "long running process". Also, it's always a good idea to override equality in your entities so that it compares IDs instead of references. – guillaume31 Jan 20 '15 at 09:33
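
The Identity Map idea and ID-based equality from the last comment can be sketched as follows. This is a minimal illustration under stated assumptions: `IdentityMap`, `get` and the `Loader` callback (standing in for the database query) are hypothetical names, and the map is meant to live only as long as one unit of work, for example one HTTP request.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical, simplified entity.
struct Person {
    std::string id;
    std::string name;
};

// Entity equality compares identity (the ID), not addresses or other fields.
inline bool operator==(const Person& a, const Person& b) { return a.id == b.id; }
inline bool operator!=(const Person& a, const Person& b) { return !(a == b); }

// Hands out one shared instance per ID for as long as the map lives.
// Destroying the map at the end of the unit of work releases its
// references, so memory does not grow for the life of the process.
class IdentityMap {
public:
    // Stand-in for the real database query (assumption for this sketch).
    using Loader = std::function<std::shared_ptr<Person>(const std::string&)>;

    explicit IdentityMap(Loader loader) : loader_(std::move(loader)) {}

    std::shared_ptr<Person> get(const std::string& id) {
        auto it = loaded_.find(id);
        if (it != loaded_.end()) {
            return it->second;  // already hydrated in this unit of work
        }
        std::shared_ptr<Person> person = loader_(id);
        if (person) {
            loaded_.emplace(id, person);
        }
        return person;
    }

private:
    Loader loader_;
    std::unordered_map<std::string, std::shared_ptr<Person>> loaded_;
};
```

Within one map, two calls to `get("bob")` return the same instance, so a change made through one pointer is visible through the other. A new map created for the next unit of work hydrates a fresh copy, which bears on Q2: the references are dropped when the unit of work ends rather than being held for the whole process.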