7

I'm working with a data set that looks something like this:

"bitrates": [
  {
    "format":  "mp3" ,
    "rate":  "128K"
  } ,
  {
    "format":  "aac" ,
    "rate":  "192K"
  }
] ,
"details": [ ... ] ,
"id": 1 ,
"name":  "For Those About To Rock We Salute You" ,
"price": 1026 ,
"requires_shipping": false ,
"sku":  "ALBUM-1" 
}

And I wanted to create a secondary index on bitrates, flexing {multi:true}. This was my attempt:

r.db("music").table("catalog").indexCreate("bitrates", {multi: true})

The index built just fine, but when I query it, nothing returns - which seems contrary to every example I've read here:

http://rethinkdb.com/docs/secondary-indexes/javascript/

The query I wrote is this:

r.db("music").table("catalog").getAll(["mp3", "128K"], {index : "bitrates"})

There is no error, just 0 results (and I have 300 or so documents with this exact data).

I'm using RethinkDB 2.0 RC1.

2 Answers2

13

When you create an index for a column, the values in the column are used literally as the keys of the index. In your case, the keys for your bitrates index would be the objects within the bitrates array in the document.

It seems like what you want is an index that's derived from the values in a field of the document. To do that, you want to define a custom indexing function that reduces the document to just the data you care about. The easiest way to experiment with them is to start by writing a query, and once you're happy with the results, converting it into an indexCreate() statement.

Here's a statement that grabs your sample document (with id 1), and plucks the format and rate terms from all of the objects in its bitrate array, and then merges them together to create a distinct set of strings:

r.db('music').table('catalog').get(1).do(function(row) {
  return row('bitrates').map(function(bitrate) {
    return [bitrate('format'), bitrate('rate')];
  }).reduce(function(left, right) {
    return left.setUnion(right);
  })
})

Running this statement will return the following:

["mp3", "128K", "aac", "192K"]

This looks good, so we can use our function to create an index. In this case, since we're expecting the indexing function to return a set of items, we also want to specify {multi: true} to ensure we can query by the items in the set, not the set itself:

r.db('music').table('catalog').indexCreate('bitrates', function(row) {
  return row('bitrates').map(function(bitrate) {
    return [bitrate('format'), bitrate('rate')];
  }).reduce(function(left, right) {
    return left.setUnion(right);
  })
}, {multi: true})

Once created, you can query your index like this:

r.db('music').table('catalog').getAll('mp3', {index: 'bitrates'})

You can also supply multiple query terms, to match rows that match any of the items:

r.db('music').table('catalog').getAll('mp3', '128K', {index: 'bitrates'})

However, if a single document matches more than one term in your query, it will be returned more than once. To fix this, add distinct():

r.db('music').table('catalog').getAll('mp3', '128K', {index: 'bitrates'}).distinct()

If necessary, you might also consider using downcase() to normalize the casing of the terms used in the secondary index.

You could also skip all of the indexing business entirely and use a filter() query:

r.db('music').table('catalog').filter(function(row) {
  return row('bitrates').map(function(bitrates) {
    return [bitrates('format'), bitrates('rate')];
  }).reduce(function(left, right) {
    return left.setUnion(right);
  }).contains('mp3');
})

That said, if you're almost always querying your table in the same manner, generating a secondary index using a custom function will result in dramatically better performance.

Nate Kohari
  • 2,216
  • 17
  • 13
  • 1
    Thanks Nate - great answer! I think I should have added in my description that my goal was to use a multi index on an array of objects and it sounds like that's not possible with a straight up `{multi:true}`. This is more of an exploratory thing I'm doing - either way your answer is stellar thank you :) –  Apr 02 '15 at 15:49
1

Keys of secondary indexes can't be objects right now:

> r.table('foo').indexCreate('bitrates', {multi: true})
> r.table('foo').getAll({format: "mp3", rate: "128K"}, {index: 'bitrates'})
RqlRuntimeError: Secondary keys must be a number, string, bool, pseudotype, or array

You can track this issue at https://github.com/rethinkdb/rethinkdb/issues/2773 .

For a work-around, you can do this:

> r.table('foo').indexCreate('bitrates', function(row){
    return row('bitrates').map(function(bitrate){return bitrate.coerceTo('array');})
  }, {multi: true});
> r.table('foo').getAll(r.expr({format: "mp3", rate: "128K"}).coerceTo('array'), {index: 'bitrates'})
mlucy
  • 5,249
  • 1
  • 17
  • 21
  • Thanks... user359whatever :). As mentioned I'm using 2.0 RC1, which doesn't do what you're suggesting above - it works just fine without the error. –  Apr 02 '15 at 19:52
  • Rob -- your post said `r.db("music").table("catalog").getAll(["mp3", "128K"], {index : "bitrates"})` didn't produce an error, does `r.table('foo').getAll({format: "mp3", rate: "128K"}, {index: 'bitrates'})` not produce an error either? – mlucy Apr 03 '15 at 23:41