
In Datomic, how do you efficiently perform queries such as 'find all people living in Washington older than 50' (city and age may vary)? In relational databases and most NoSQL databases you use composite indexes for this purpose; Datomic, as far as I'm aware, does not support anything like this.

I have built several medium-sized web apps, and not a single one would have performed quickly enough without composite indexes. How are Datomic users dealing with this? Or are they just working with datasets small enough not to suffer from it? Am I missing something?

Tomas Kulich
  • Same question here. Did you find a solution to your problem? Thanks. – Felipe Jul 09 '14 at 04:28
  • I'm merely playing with Datomic, so I don't have an actual problem :) However, I would like to know what its limitations are and whether I can use it in a real project. – Tomas Kulich Jul 19 '14 at 18:42
  • One - quite ugly - approach that comes to mind is to create a special 'indexing' attribute in which multiple other attributes are concatenated (so, for the example above, its value would be something like 'washington-1983-01-10'). You can then query for entities in the range between 'washington-startdate' and 'washington-enddate'. It works, but it smells a lot. – Tomas Kulich Jul 19 '14 at 18:49
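
The concatenated-key idea from the last comment can be sketched outside Datomic. This is only an illustration in plain Python, assuming a sorted in-memory list stands in for a value-ordered index; `index_key`, `born_between` and the sample people are hypothetical names and data, not part of any Datomic API.

```python
import bisect

def index_key(city, birth_date):
    # Concatenate the attributes into one sortable string.
    # ISO dates (YYYY-MM-DD) sort lexicographically in date order.
    return f"{city.lower()}-{birth_date}"

# Hypothetical sample data: (name, city, birth date).
people = [
    ("Alice", "Washington", "1950-03-02"),
    ("Bob", "Washington", "1983-01-10"),
    ("Carol", "Boston", "1945-07-21"),
    ("Dave", "Washington", "1961-11-30"),
]

# A sorted list of (key, name) pairs stands in for the index.
index = sorted((index_key(city, born), name) for name, city, born in people)
keys = [key for key, _ in index]

def born_between(city, start_date, end_date):
    # Two binary searches bound the contiguous slice of matching keys.
    lo = bisect.bisect_left(keys, index_key(city, start_date))
    hi = bisect.bisect_right(keys, index_key(city, end_date))
    return [name for _, name in index[lo:hi]]

result = born_between("Washington", "1900-01-01", "1969-12-31")
```

The range scan only works because every component sorts correctly as text; numbers would need left-padding, which is part of why the approach "smells".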

3 Answers

Neither this problem nor its solution looks quite the same in Datomic as it does elsewhere, because of how data is structured in Datomic (as datoms). Two performance characteristics/strategies may add some nuance here:

(1) When you fetch data in Datomic, you fetch an entire leaf segment from the index tree (not an individual item) - with segments being composed of potentially many thousands of datoms. This is then cached automatically so that you don't have to reach out over the network to get more datoms.

If you're querying a single person (i.e., a single entity) for their age and where they live, it's very likely that the query's navigation of the EAVT or AEVT indexes has already cached everything you need. You've effectively cached the datom, the path to it, and related datoms (by locality in the index).

(2) Partitions provide a manual means of specifying locality of reference. A partition affects the entity ID's value (the partition is encoded in the high bits) and so ensures that related entities sort near each other. As an alternative approach to the problem above, if you needed information from both the city and person entities, you could put them in the same partition.
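
To illustrate why encoding a partition in the high bits keeps related entities adjacent when sorted, here is a rough sketch in plain Python. The shift width of 42 bits and the helper names `make_entity_id` and `partition_of` are assumptions made for this illustration, not a statement of Datomic's actual ID layout or API.

```python
# Assumed bit layout, for illustration only: the partition occupies
# the bits above PARTITION_SHIFT, a sequence number the bits below.
PARTITION_SHIFT = 42

def make_entity_id(partition, sequence):
    # Hypothetical helper: pack the partition into the high bits.
    return (partition << PARTITION_SHIFT) | sequence

def partition_of(entity_id):
    # Hypothetical helper: recover the partition from the high bits.
    return entity_id >> PARTITION_SHIFT

# Entities created in interleaved order across two partitions.
ids = [
    make_entity_id(2, 10),
    make_entity_id(1, 500),
    make_entity_id(2, 11),
    make_entity_id(1, 7),
]

# Sorting by ID groups entities by partition first, so entities in
# the same partition end up next to each other in an ID-ordered index.
ordered = sorted(ids)
partitions = [partition_of(e) for e in ordered]
```

Because the partition dominates the comparison, entities sharing a partition are more likely to land in the same index segment, which is the locality effect described above.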

Ben Kamphaus

I've written a library to handle this: https://github.com/arohner/datomic-compound-index

Allen Rohner

Update 2019-06-28: Since 0.9.5927 (Datomic On-Prem) / 480-8770 (Datomic Cloud), Datomic supports Tuples as a new Attribute Type, which allows you to have compound indexes.

Valentin Waeselynck
  • While tuples are really cool, does this address the problem? Say I want to build a composite index on two columns: integer and date. Without tuples, I can encode values such as `left_padded_number-YYYY-MM-DD` strings and benefit from lexicographical indexing of such a field. Sure, with tuples, I don't have to bother with parsing strings (great!) but I still have to organize my data in quite a strange manner only to gain the indexing capabilities..? Or do I get it wrong? – Tomas Kulich Jun 30 '19 at 17:53
  • @TomasKulich I haven't used Tuples yet, but I think Composite Tuples are what you are looking for: https://docs.datomic.com/cloud/schema/schema-reference.html#composite-tuples – Valentin Waeselynck Jul 01 '19 at 08:16
  • Ah, I see, this is awesome! This seems to be exactly the composite index semantics! From a brief skim of the docs, even range queries should work just fine. – Tomas Kulich Jul 03 '19 at 08:56
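
Why a tuple-valued index gives composite-index semantics, including range queries, can be sketched in plain Python. This is only a model of the idea, assuming tuples are ordered element by element; it does not use the Datomic API, and `older_than` and the sample rows are hypothetical.

```python
import bisect

# Hypothetical index entries: (city, age, entity_id), kept sorted.
# Tuples compare element by element, so entries are ordered by
# city first and by age within each city.
index = sorted([
    ("Boston", 72, 3),
    ("Washington", 34, 2),
    ("Washington", 51, 1),
    ("Washington", 67, 4),
])

def older_than(city, age):
    # A shorter tuple sorts before any longer tuple sharing its prefix,
    # so (city, age + 1) marks the first entry with a greater age.
    lo = bisect.bisect_left(index, (city, age + 1))
    # (city, inf) sorts after every entry for this city.
    hi = bisect.bisect_left(index, (city, float("inf")))
    return [entity_id for _, _, entity_id in index[lo:hi]]

washington_over_50 = older_than("Washington", 50)
```

Unlike the string-concatenation workaround, no padding or parsing is needed, since each component keeps its own type and ordering.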