1

I was reviewing this slide show [slide 134] (Ian Varley/salesforce.com at HBaseCon 2012), where he states that you can nest entities two levels deep.

Here is an example he gives of nesting an entity one level deep:

Entities: Band, Shows; where Band 1:M Shows.

Table: Band
CF:"CF"
    Qualifiers:
        "Name":<name>
        "Genre":<genre>
        "Show_<id>":venue_<id>_date_<date>_start_time_<start_time>_cover_price_<cover_price>

However, he doesn't give an example of how to nest two levels deep. My best guess from slide 134 would be something like the following:

Entities: Customer, Meeting, Attendees; where Customer 1:M Meetings and Meetings 1:M Attendees.

Table: Customer
CF: "CF"
    Qualifiers:
        "Company_name":<company_name>
        "Capacity":<capacity>
        "Meeting_<id>":host_<id>_start_time_<start_time>_attendee_<id>_attendee_join_time_<join_time>

However, the attributes of the meeting entity (host_id, start_time) are repeated unnecessarily in every column. Moving the meeting attributes into the column key does not solve the problem:

"Meeting_<id>_host_<id>_start_time_<start_time>":attendee_<id>_attendee_join_time_<join_time>

Here is another option I thought of, which seems to make more sense: storing the attendees as JSON:

"Meeting_<id>_host_<id>_start_time_<start_time>":[{attendee_id:<id>,join_time:<time>}, ..]

However, why not just use one column holding a giant JSON string containing all the meetings and their attendees?

Is this what is meant by nesting two levels deep in an HBase schema, or is there a much better way to do it?

Matthew Moisen

2 Answers

2

Use an easy serialization format, like JSON, to store your nested data, not some custom underscore-delimited string. In your example, Customer 1:M Meetings and Meetings 1:M Attendees, you first need to decide what kind of cell granularity you want.

For a single Customer, should each Attendee be in its own cell? Or would having each Meeting be in its own cell be enough granularity?

You could use column qualifiers like this:

meeting:17          (Meeting 17)
attendee:17:5       (Meeting 17, Attendee 5)
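
As a rough illustration of this qualifier scheme, here is a minimal sketch that uses a plain Python dict as a stand-in for one Customer row (it does not use an HBase client). The function name `build_customer_row` and the sample field values are assumptions made up for the example; only the `meeting:<id>` and `attendee:<meeting_id>:<attendee_id>` qualifier shapes come from the answer.

```python
import json

def build_customer_row(meetings):
    """Return a dict of qualifier -> cell value (bytes) for one Customer row.

    Hypothetical helper: the dict only imitates a row in a single column family.
    """
    row = {}
    for meeting in meetings:
        meeting_id = meeting["id"]
        # Meeting attributes are stored once, in the meeting's own cell.
        row[f"meeting:{meeting_id}"] = json.dumps(
            {"host_id": meeting["host_id"], "start_time": meeting["start_time"]}
        ).encode("utf-8")
        # Each attendee gets its own cell; the qualifier carries both ids.
        for attendee in meeting["attendees"]:
            qualifier = f"attendee:{meeting_id}:{attendee['id']}"
            row[qualifier] = json.dumps(
                {"join_time": attendee["join_time"]}
            ).encode("utf-8")
    return row

# Made-up sample data for illustration only.
row = build_customer_row([
    {
        "id": 17,
        "host_id": 3,
        "start_time": "09:00",
        "attendees": [
            {"id": 5, "join_time": "09:02"},
            {"id": 8, "join_time": "09:10"},
        ],
    }
])
```

With this layout the meeting attributes are not repeated per attendee, and each cell value stays a small JSON document that is easy to parse on the client.
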

Timothy Shields
  • OK, so a custom delimited string is fine for the column qualifier name, but I should use JSON/XML for the corresponding value? – Matthew Moisen Mar 20 '14 at 22:23
  • @MatthewMoisen Yes. Using custom delimited strings for column qualifiers makes sense because HBase interacts with them so directly. For example, you could scan the Customer table with a qualifier filter that only matches "attendee:*:5" to find all the Meetings that Attendee 5 is in. Cell contents, by contrast, are typically just a byte[] from HBase's point of view, so it makes sense to use something very easy to work with. – Timothy Shields Mar 20 '14 at 23:09
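
To make the "attendee:*:5" idea concrete, here is a small sketch of the matching logic in plain Python. In HBase this would presumably be done server-side with a qualifier filter and a regex comparator; the function below only mimics the pattern match on a list of qualifier strings. The function name and the sample qualifiers are invented for the example.

```python
import re

def meetings_for_attendee(qualifiers, attendee_id):
    """Return the meeting ids whose qualifiers match attendee:<meeting>:<attendee_id>."""
    # Anchor the pattern so that attendee 5 does not also match attendee 55.
    pattern = re.compile(rf"^attendee:([^:]+):{re.escape(str(attendee_id))}$")
    meeting_ids = []
    for qualifier in qualifiers:
        match = pattern.match(qualifier)
        if match:
            meeting_ids.append(match.group(1))
    return meeting_ids

# Made-up qualifiers from one hypothetical Customer row.
qualifiers = [
    "meeting:17",
    "attendee:17:5",
    "attendee:17:8",
    "meeting:21",
    "attendee:21:5",
    "attendee:21:55",
]
meeting_ids = meetings_for_attendee(qualifiers, 5)
```

Putting the ids in the qualifier is what makes this kind of lookup possible without deserializing any cell values.
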
1

If your tables exist in a parent-child, master-detail, or other strict one-to-many relationship, it’s possible to model it in HBase as a single row. The rowkey will correspond to the parent entity. The nested values will contain the children, where each child entity gets one column qualifier into which their identifying attributes are stored, with the remainder of the non-identifying attributes stashed into the value. The real HBase row defines the parent record; records of the child entity are stored as individual columns. You can put in nested entities by using HBase’s flexibility because of the way columns are designed. HBase doesn’t necessarily have special abilities to store nested entities.

There are some limitations to this, of course. First, this technique only works to one level deep: your nested entities can’t themselves have nested entities. You can still have multiple different nested child entities in a single parent, and the column qualifier is their identifying attributes. Second, it’s not as efficient to access an individual value stored as a nested column qualifier inside a row, as compared to accessing a row in another table, as you learned earlier in the chapter.

Still, there are compelling cases where this kind of schema design is appropriate. If the only way you get at the child entities is via the parent entity, and you’d like to have transactional protection around all children of a parent, this can be the right way to go.

From HBase in Action
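
The layout described in the quoted passage can be sketched as follows. This is only an in-memory imitation using nested Python dicts, not real HBase client code. It reuses the Band/Show entities from the question, and the helper names `put_child` and `get_children`, the `show:<id>` qualifier shape, and the sample values are assumptions for illustration.

```python
import json

# Stand-in for a table: rowkey -> {qualifier -> value bytes}.
table = {}

def put_child(table, parent_key, child_type, child_id, attributes):
    """Store one child entity as a single column in the parent's row."""
    # The identifying attribute (the child id) goes into the qualifier;
    # the remaining attributes are serialized into the cell value.
    qualifier = f"{child_type}:{child_id}"
    row = table.setdefault(parent_key, {})
    row[qualifier] = json.dumps(attributes).encode("utf-8")

def get_children(table, parent_key, child_type):
    """Return {child_id: attributes} for all children of one type in a row."""
    prefix = f"{child_type}:"
    row = table.get(parent_key, {})
    return {
        qualifier[len(prefix):]: json.loads(value)
        for qualifier, value in row.items()
        if qualifier.startswith(prefix)
    }

# Made-up sample data: one Band row with two nested Shows.
put_child(table, "band-1", "show", 101, {"venue_id": 7, "cover_price": 15})
put_child(table, "band-1", "show", 102, {"venue_id": 9, "cover_price": 20})
shows = get_children(table, "band-1", "show")
```

Because every child lives in the parent's row, children are only reachable through the parent rowkey, which matches the access pattern the passage says this design suits.
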