How to querying and filtering efficiently on Fauna DB?

Question

For example, let’s assume we have a collection with hundreds of thousands of documents of clients with 3 fields, name, monthly_salary, and age.

How can I search for documents that monthly_salary is higher than 2000 and age higher than 30?

In SQL this would be straightforward but with Fauna, I´m struggling to understand the best approach because terms of Index only work with an exact match. I see in docs that I can use the Filter function but I would need to get all documents in advance so it looks a bit counterintuitive and not performant.

Below is an example of how I can achieve it, but not sure if it’s the best approach, especially if it contains a lot of records.

Map(
  Filter(
    Paginate(Documents(Collection('clients'))),
    Lambda(
      'client',
      And(
        GT(Select(['data', 'monthly_salary'], Get(Var('client'))), 2000),
        GT(Select(['data', 'age'], Get(Var('client'))), 30),
        )
      )
    ),
    Lambda(
      'filteredClients',
      Get(Var('filteredClients'))
      )
  
  )

Is this correct or I´m missing some fundamental concepts about Fauna and FQL? can anyone help?

Thanks in advance

ptpaterson · Answer 1 · 2021-09-15T13:14:16.010

Efficient searching is performed using Indexes. You can check out the docs for search with Indexes, and there is a "cookbook" for some different search examples.

There are two ways to use Indexes to search, and which one you use depends on if you are searching for equality (exact match) or inequality (greater than or less than, for example).

Searching for equality

If you need an exact match, then use Index terms. This is most explicit in the docs, and it is also not what your original question is about, so I am not going to dwell much here. But here is a simple example

given user documents with this shape

{
  ref: Ref(Collection("User"), "1234"),
  ts: 16934907826026,
  data: {
    name: "John Doe",
    email: "jdoe@example.com,
    age: 50,
    monthly_salary: 3000
  }
}

and an index defined like the following

CreateIndex({
  name: "users_by_email",
  source: Collection("User"),
  terms: [ { field: ["data", "email"] } ],
  unique: true // user emails are unique
})

You can search for exact matches with... the Match function!

Get(
  Match(Index("user_by_email"), "jdoe@example.com")
)

Searching for inequality

Searching for inequalities is more interesting and also complicated. It requires using Index values and the Range function.

Keeping with the document above, we can create a new index

CreateIndex({
  name: "users__sorted_by_monthly_salary",
  source: Collection("User"),
  values: [ 
    { field: ["data", "monthly_salary"] },
    { field: ["ref"] }
  ]
})

Note that I've not defined any terms in the above Index. The important thing for inequalities is again the values. We've also included the ref as a value, since we will need that later.

Now we can use Range to get all users with salary in a given range. This query will get all users with salary starting at 2000 and all above.

Paginate(
  Range(
    Match(Index("users__sorted_by_monthly_salary")),
    [2000],
    []
  )
)

Combining Indexes

For "OR" operations, use the Union function. For "AND" operations, use the Intersection function.

Functions like Match and Range return Sets. A really important part of this is to make sure that when you "combine" Sets with functions like Intersection, that the shape of the data is the same.

Using sets with the same shape is not difficult for Indexes with no values, they default to the same single ref value.

Paginate(
  Intersection(
    Match(Index("user_by_age"), 50), // type is Set<Ref>
    Match(Index("user_by_monthly_salary, 3000) // type is Set<Ref>
  )
)

When the Sets have different shapes they need to be modified or else the Intersection will never return results

Paginate(
  Intersection(
    Range(
      Match(Index("users__sorted_by_age")),
      [30],
      []
    ), // type is Set<[age, Ref]>
    Range(
      Match(Index("users__sorted_by_monthly_salary")),
      [2000],
      []
    ) // type is Set<[salary, Ref]>
  )
)

{
  data: [] // Intersection is empty
}

So how do we change the shape of the Set so they can be intersected? We can use the Join function, along with the Singleton function.

Join will run an operation over all entries in the Set. We will use that to return only a ref.

Join(
  Range(Match(Index("users__sorted_by_age")), [30], []),
  Lambda(["age", "ref"], Singleton(Var("ref")))
)

Altogether then:

Paginate(
  Intersection(
    Join(
      Range(Match(Index("users__sorted_by_age")), [30], []),
      Lambda(["age", "ref"], Singleton(Var("ref")))
    ),
    Join(
      Range(Match(Index("users__sorted_by_monthly_salary")), [2000], []),
      Lambda(["age", "ref"], Singleton(Var("ref")))
    )
  )
)

tips for combining indexes

You can use additional logic to combine different indexes when different terms are provided, or search for missing fields using bindings. Lot's of cool stuff you can do.

Do check out the cook book and the Fauna forums as well for ideas.

BUT WHY!!!

It's a good question!

Consider this: Since Fauna is served as a serverless API, you get charged for each individual read and write on your documents and indexes as well as the compute time to execute your query. SQL can be much easier, but it is a much higher level language. Behind SQL sits a query planner making assumptions about how to get you your data. If it cannot do it efficiently it may default to scanning your entire table of data or otherwise performing an operation much more expensive than you might have expected.

With Fauna, YOU are the query planner. That means it is much more complicated to get started, but it also means you have fine control over the performance of you database and thus your cost.

We are working on improving the experience of defining schemas and the indexes you need, but at the moment you do have to define these queries at a low level.