
I'm trying to build a lazy loading tree using Doctrine MongoDB. My document is structured as follows:

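namespace CmsPage\Document;

use Doctrine\Common\Collections\ArrayCollection;
use Doctrine\ODM\MongoDB\Mapping\Annotations as ODM;

/**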
 * @ODM\Document(repositoryClass="CmsPage\Repository\PageRepository")
 */
class Page
{
    /**
     * @ODM\String
     * @var string
     */
    protected $title;

    /**
     * @ODM\ReferenceOne(targetDocument="CmsPage\Document\Page", inversedBy="children")
     * @ODM\Index
     * @var Page
     */
    protected $parent;

    /**
     * @ODM\ReferenceMany(
     *     targetDocument="CmsPage\Document\Page", mappedBy="parent",
     *     sort={"title": "asc"}
     * )
     * @var array
     */
    protected $children;

    /**
     * Default constructor
     */
    public function __construct()
    {
        $this->children = new ArrayCollection();
    }

    /**
     * @return ArrayCollection|Page[]
     */
    public function getChildren()
    {
        return $this->children;
    }

    /**
     * @param ArrayCollection $children
     */
    public function setChildren($children)
    {
        $this->children = $children;
    }

    /**
     * @return Page
     */
    public function getParent()
    {
        return $this->parent;
    }

    /**
     * @param Page $parent
     */
    public function setParent($parent)
    {
        $this->parent = $parent;
    }

    /**
     * @return string
     */
    public function getTitle()
    {
        return $this->title;
    }

    /**
     * @param string $title
     */
    public function setTitle($title)
    {
        $this->title = $title;
    }
}

The following code will retrieve all children for a given page:

$page = $pageRepo->find('foo');
$children = [];

foreach ($page->getChildren() as $childPage) {
    $children[] = [
        'id' => $childPage->getId(),
        'slug' => $childPage->getSlug(),
        'leaf' => ($childPage->getChildren()->count() == 0)
    ];
}

This works as expected, but it executes a separate query for each child page to check whether it is a leaf. For a large tree with many child nodes, that is not efficient.
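One possible way to avoid the per-child query, sketched under assumptions: collect the IDs of the children, ask in a single query which of those IDs occur as some page's parent, and compute the `leaf` flag in memory. `findIdsWithChildren` and `buildChildRows` are hypothetical names. The query assumes the reference is stored as a DBRef, so that the parent's ID is found under `parent.$id`, and that the IDs passed in have the same type as the stored ones (for example ObjectId rather than string). The children are represented as plain arrays here to keep the sketch self-contained.

```php
<?php

/**
 * Hypothetical helper: returns those IDs from $ids that are the parent of at
 * least one page, using a single "distinct" query.
 * Assumption: $dm is a Doctrine DocumentManager and the parent reference is
 * stored as a DBRef, so the parent ID lives under "parent.$id".
 */
function findIdsWithChildren($dm, array $ids): array
{
    $result = $dm->createQueryBuilder('CmsPage\Document\Page')
        ->distinct('parent.$id')
        ->field('parent.$id')->in($ids)
        ->getQuery()
        ->execute();

    $parentIds = [];
    foreach ($result as $id) {
        $parentIds[] = (string) $id;
    }

    return $parentIds;
}

/**
 * Hypothetical helper: builds the rows for the tree.
 * $children is a list of ['id' => ..., 'slug' => ...] arrays and
 * $idsWithChildren is the result of the single lookup above.
 */
function buildChildRows(array $children, array $idsWithChildren): array
{
    // Flip to get constant-time membership checks by ID.
    $hasChildren = array_flip($idsWithChildren);

    $rows = [];
    foreach ($children as $child) {
        $rows[] = [
            'id'   => $child['id'],
            'slug' => $child['slug'],
            'leaf' => !isset($hasChildren[$child['id']]),
        ];
    }

    return $rows;
}
```

With this shape, one request for a page's children costs two queries (the children themselves, plus the lookup) regardless of how many children there are.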

I could introduce a boolean isLeaf in my Page document and update it when persisting. But this also means I have to update the parent when adding or removing a child.
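A minimal sketch of that bookkeeping, assuming a persisted child counter from which the leaf flag is derived. `PageNode` is a hypothetical stand-in for the `Page` document with the ODM mapping omitted; `childCount` is an assumed field name, not something the original document has.

```php
<?php

// Hypothetical stand-in for the Page document; mapping annotations omitted.
class PageNode
{
    /** @var PageNode|null */
    private $parent = null;

    /** @var int Number of direct children; would be a persisted field. */
    private $childCount = 0;

    public function getParent(): ?PageNode
    {
        return $this->parent;
    }

    public function isLeaf(): bool
    {
        return $this->childCount === 0;
    }

    public function setParent(?PageNode $parent): void
    {
        if ($this->parent === $parent) {
            return; // nothing changes, so the counters stay as they are
        }
        if ($this->parent !== null) {
            $this->parent->childCount--; // the old parent loses a child
        }
        if ($parent !== null) {
            $parent->childCount++; // the new parent gains a child
        }
        $this->parent = $parent;
    }
}
```

Moving a page touches up to three documents (the page, its old parent and its new parent), and all of them have to be flushed. Deleting a page would also need to decrement its parent's counter, for example by calling `setParent(null)` before removal.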

Do you have any pointers to solve this problem?

Neil Lunn
Bram Gerritsen

1 Answer


The most efficient way I know of in MongoDB to test that an array is not empty is to check for the presence of its "first" element, using "dot notation" and $exists. The Doctrine query builder supports this:

$qb = $dm->createQueryBuilder('Page')
    ->field('children.0')->exists(true);

That's the same as this in the shell:

db.collection.find({ "children.0": { "$exists": true } })

So 0 is the index of the first element in an array and is only present when there is some content in that array. Empty arrays do not match this condition.
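To make the matching rule concrete, here is a plain-PHP sketch of the same test applied to documents that are already loaded as arrays. `hasFirstElement` is a hypothetical helper, not part of Doctrine or the MongoDB driver, and it only mirrors the server's behaviour for the case where the field is an array. The sample pages are made up for illustration.

```php
<?php

/**
 * Hypothetical helper: true when $document[$field] is an array that has an
 * element at index 0, which is what {"field.0": {"$exists": true}} tests for
 * when the field holds an array.
 */
function hasFirstElement(array $document, string $field): bool
{
    return isset($document[$field])
        && is_array($document[$field])
        && array_key_exists(0, $document[$field]);
}

// Illustrative documents: one with children, one with an empty array,
// and one without the field at all.
$pages = [
    ['title' => 'Home',  'children' => ['about', 'blog']],
    ['title' => 'About', 'children' => []],
    ['title' => 'Blog'],
];

$withChildren = array_values(array_filter($pages, function (array $page): bool {
    return hasFirstElement($page, 'children');
}));
```

Only the first page passes the filter: an empty array and a missing field both fail the check, which matches the behaviour described above.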

Neil Lunn
  • Thanks for your answer. I have 2 concerns. This will only retrieve all pages which have at least one child. I need all pages and a flag which indicates if the page has children. Also this method will still trigger 1*n queries for all pages imo, but I'll need to test that first. – Bram Gerritsen Jan 14 '15 at 08:53
  • I can't use this method because I have modelled the tree using [parent references](http://docs.mongodb.org/manual/tutorial/model-tree-structures-with-parent-references/). My document only has a `parent` field in MongoDB. – Bram Gerritsen Jan 14 '15 at 09:09
  • @BramGerritsen Does your model not define children as an array? It appears so. It would also be the logical thing to do. – Neil Lunn Jan 14 '15 at 09:11
  • This is Doctrine magic: `ReferenceMany` with `mappedBy` parent, which lets you retrieve the children using lazy loading. MongoDB has a [reference](http://docs.mongodb.org/manual/applications/data-models-tree-structures/) which describes the different tree implementations. You can choose either child references or parent references, but not both. Parent references best suit my use case, as they are easy to implement and the children change a lot, so the other modelling options would be more difficult to implement. – Bram Gerritsen Jan 14 '15 at 09:18
  • @BramGerritsen Sooo here's the rub. You need to find children as your common usage pattern. Do you A. store a parent reference on each child and re-inspect your collection based on the current value of the present document node (lots of queries), or B. store an array of a node's immediate children that you can access from the node itself? My point is it's not always about adapting the solution to what you have, but adapting your design to the best solution. – Neil Lunn Jan 14 '15 at 09:24
  • Yes, you are right, option A is how I do it now. I'll try to implement option B and see if it fits my needs, which means I need to change some services and repositories. If it does work I can use the queryBuilder as in your example. Alternatively I can keep the parent reference model and persist a boolean property as I mentioned in my initial question. This will make it super fast to query, but needs some extra persist logic. – Bram Gerritsen Jan 14 '15 at 09:40
  • @BramGerritsen Boolean properties are great and of course you can use an index, so even better. But don't discount the benefits of keeping a list of child nodes in the parent, as that has other uses as well, such as immediately knowing which nodes are the children. I prefer "two way" binding when modelling references. Parent has list of children, Child has list/reference to parent. If you think about it, that's the best way to do it in RDBMS, but with an additional "table" in between. – Neil Lunn Jan 14 '15 at 09:45