11

I'm working with a dataset composed of probabilistically encrypted elements that are indistinguishable from random samples. As a result, sequential encryptions of the same number yield different ciphertexts. However, these are still comparable through a special function that applies algorithms like SHA256 to compare two ciphertexts.

I want to add a list of these ciphertexts to a MongoDB database and index it using a tree-based structure (e.g., an AVL tree). I can't simply use the database's default indexing because, as described, the records must be compared using the special function.

An example: suppose I have a database db and a collection c composed of documents of the following type:

{
  "_id":ObjectId,
  "r":string
}

Moreover, let F(int,string,string) be the following function:

F(h,l,r) = ( SHA256(l | r) + h ) % 3

where the operator | is a standard concatenation function.
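To make the definition concrete, here is a minimal sketch of F in JavaScript using Node's built-in crypto module. It assumes the SHA256 digest is read as an unsigned big-endian integer and that l and r are concatenated as UTF-8 strings; the question does not specify either detail, so treat both as assumptions.

```javascript
const crypto = require("crypto");

// F(h, l, r) = (SHA256(l | r) + h) % 3, with "|" as string concatenation.
// Assumption: the 32-byte digest is interpreted as an unsigned big-endian integer.
function F(h, l, r) {
  const hex = crypto
    .createHash("sha256")
    .update(String(l) + String(r))
    .digest("hex");
  const digest = BigInt("0x" + hex);
  const m = (digest + BigInt(h)) % 3n;
  // Normalize so the result is always 0, 1, or 2, even for very negative h.
  return Number((m + 3n) % 3n);
}

// Example call with illustrative inputs; prints 0, 1, or 2.
console.log(F(1, "bb", "some ciphertext"));
```

Because the result depends on both l and r through the hash, a value computed for one l says nothing about the value for another l.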

I want to execute the following query efficiently, for example on a collection with a suitable index:

db.c.find( { F(h,l,r) :{ $eq: 0 } } )

where h and l are chosen arbitrarily and are not constants. That is, suppose I want to find all records that satisfy F(h1,l1,r) = 0 for some pair (h1, l1). Later, I want to do the same using (h2, l2) such that h1 != h2 and l1 != l2. h and l may assume any value in the set of integers.

How can I do that?

Pedro Alves

2 Answers

5

You can run this query with the $where operator, but a $where query cannot use an index, so its performance depends on the size of your dataset.

db.c.find({$where: function() { return F(1, "bb", this.r) == 0; }})

Before running the code above, you need to store your function F on the MongoDB server:

db.system.js.save({
    _id: "F",
    value: function(h, l, r) {
        // the body of function
    }
})


Shawyeok
  • I believe that, for now, this is the best answer. This will not create an efficient index structure for my database but will at least move processing to DBMS rather than the application. – Pedro Alves Jul 15 '16 at 18:00
0

I've tried a solution that stores the result of the function in your collection, so I changed the schema as shown below:

{
  "_id": ObjectId,
  "r": {
    "_key": F(H, L, value),
    "value": String
  }
}

The field r._key holds the value of F(h,l,r) for constant h and l, and the field r.value holds the original r field. You can then create an index on r._key, and your query condition becomes:

db.c.find( { "r._key" : 0 } )
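As a rough illustration of this schema, the sketch below builds documents with a precomputed r._key in plain JavaScript (Node) and filters them in memory, standing in for the indexed query. The constants H and L, the helper toIndexedDoc, and the sample values are hypothetical, and F assumes the SHA256 digest is read as an unsigned big-endian integer.

```javascript
const crypto = require("crypto");

// Assumed interpretation of F: SHA256 digest as an unsigned big-endian integer.
function F(h, l, r) {
  const hex = crypto
    .createHash("sha256")
    .update(String(l) + String(r))
    .digest("hex");
  const m = (BigInt("0x" + hex) + BigInt(h)) % 3n;
  return Number((m + 3n) % 3n);
}

// Hypothetical fixed parameters; the key is only valid for this exact pair.
const H = 1;
const L = "bb";

// Wrap an original r value in the { _key, value } shape described above.
function toIndexedDoc(value) {
  return { r: { _key: F(H, L, value), value: value } };
}

// Illustrative values standing in for ciphertexts.
const docs = ["c1", "c2", "c3"].map(toIndexedDoc);

// In-memory equivalent of db.c.find({ "r._key": 0 }).
const matches = docs.filter((d) => d.r._key === 0);
console.log(matches);
```

The stored key is tied to the pair (H, L) chosen at insert time, so a query with a different pair would need the keys recomputed.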
Shawyeok
  • Yes, indeed your solution works. However I believe I was not clear enough in my question. H and L are not constants, but only chosen arbitrarily. I.e.: I want to find all records that satisfy F(H1,L1,r), for some pair (H1, L1). Later, in another moment, I want to do the same but using (H2, L2) such that H1 != H2 and L1 != L2. I will update the question with this constraint. – Pedro Alves Jul 12 '16 at 11:24