1

I'm going to have a database of articles, or pages, sort of like wiki pages. All article names will need to be unique, much like a wiki page. They're going to be organized in a flexible-depth hierarchy, where each article can be a child or parent of multiple other articles--all categories are represented by articles.

A user should be able to jump straight to any page--perhaps by typing in its unique name (either into a box on the home page, or domain.com/articles/name), or searching and choosing a result, or clicking a link from another article. However, they should also be able to drill down through the categories, with breadcrumbs visible for all categories the current article belongs to (there shouldn't be more than 2 or 3, generally), and the ability to view all (or as many as are reasonable) child articles.

For example, there may be:

  • People
    • Male
      • Bob
      • Jim
    • Female
      • Alice
      • Sally
    • Employees
      • Bob
      • Sally
  • Edibles
    • Food
      • Snacks
        • Chips
          • Salt & Vinegar
          • Sour Cream & Onion
        • Popcorn
      • Pizza
      • Sushi
    • Drink
      • Soda
        • Cola
        • Root Beer
      • Juice
        • Apple Juice
        • Orange Juice
        • Grape Juice

For example, if someone goes to Bob's page, they'll see all the information about Bob, as well as:

Parents:
Articles > People > Male > (Bob)
Articles > People > Employees > (Bob)
Children:
(none)

If someone goes to the page for Snacks, they'd see the article about snacks, as well as:

Parents: 
Articles > Edibles > Food > (Snacks)
Children:
(Snacks) > Chips
(Snacks) > Chips > Salt & Vinegar
(Snacks) > Chips > Sour Cream & Onion
(Snacks) > Popcorn

I don't even have an idea of where to begin, here. What kind of nightmare am I going to have on my hands, in regards to setting up SQL tables to do this, and what are these queries going to look like? I plan on implementing some kind of caching, but I'd rather not have to rely on it for performance. To keep things working fine on average shared hosting, and to avoid mixing up my databases, I'd rather keep all of this in MySQL or PostgreSQL, which are necessary for the rest of the site. PHP will be used for the rest of the site, and I'd like to offload whatever work I can into PHP when it makes sense to do so. If another technology, like a non-relational database, would make this massively easier, I might be able to work with it.

I can already see the issue of Article 1 > Article 2 > Article 1 causing horrible issues, but I don't see any harm in simply not allowing an article to list one of its ancestors as a child.

Aside from help with trying to figure out how to implement all this in the first place, are there any other major pitfalls I'm missing?

Eph
  • 53
  • 5

2 Answers2

3

This is pretty trivial, really, it's a run-off-the-mill tree. :)
The table schema for this will look like:

id            unique id, perhaps the unique name
parent_id     id of parent element
lft           MPTT left field
rght          MPTT right field
...           any other fields you like

INDEX UNIQUE(id)

And that's it. Each article has one and only one parent article. The top elements may either have NULL as their parent id, or you'll have only one top node with NULL which all other articles are a child of. That depends on your preference. That's actually all you need, but to ease fetching of records, you'll use the lft and rght fields in an MPTT logic. Read this introductory article to learn what they're for. With these fields you can also avoid setting an entry as a child of its own child with one simple check.

deceze
  • 510,633
  • 85
  • 743
  • 889
  • yes, i think this is the best and simplest way to build your structure. if you use pear then you can use the nested set package http://pear.php.net/package/DB_NestedSet. – Daniel Jan 28 '12 at 10:00
  • You say "Each article has one and only one parent article." Does this still allow for any item being able to be in multiple categories (Bob being in both Male and Employees, and probably also listed directly in People), and have multiple chains of children? Also, how would I fetch all the children and parents of a given article at once? – Eph Jan 28 '12 at 10:04
  • +1, great mention of MPTT. Haven't used that since my Advanced Data Structures class 20 years ago, but it's way more efficient. – davidethell Jan 28 '12 at 10:04
  • @Eph No, that would not be possible. I did not see in your question that you want to do that. If that's a requirement, then MPTT isn't for you. – deceze Jan 28 '12 at 10:08
  • @Eph In that case a simple many-to-many relationship between articles is the way to go, but this will get rather messy. IMHO that's even messy for the user, if there are several paths to get to the same article. I'd rather suggest a structured *category tree* independent of articles, and each article can belong to many categories at a time. – deceze Jan 28 '12 at 10:11
  • 1
    I've always called those adjacency lists. There was a very good article on the MySQL website, but it is not there anymore. I found a copy here: http://www.vbmysql.com/articles/database-design/managing-hierarchical-data-in-mysql – nico Jan 28 '12 at 10:13
  • @nico Excellent! I was looking for that article, but couldn't find it anymore. – deceze Jan 28 '12 at 10:17
0

From the database side, your relationship is very simple. Something like this will get you started:

CREATE TABLE articles (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
parent_id INT NOT NULL DEFAULT '0'
name VARCHAR(100) NOT NULL,
content TEXT);

Now you just have to decide where to enforce the "no ancestors as children" rule. If you want to enforce that in the database you can write an INSERT/UPDATE trigger that checks if the incoming parent_id is already a descendant and if so disallow the action. In typical usage, this will never happen because on the PHP side when you create content you'll usually be starting at a parent level and adding a child, never adding a parent to an existing piece of content.

As for finding the ancestors and children for your record, that is probably best done in PHP with two loops based on these queries:

// The ancestors:
$sql = "SELECT * FROM articles WHERE id = {$currentRecord->parent_id}";

// The children:
$sql = "SELECT * FROM articles WHERE parent_id = {$currentRecord->id";

Hope that gets you started.

davidethell
  • 11,708
  • 6
  • 43
  • 63