I am building a feature-engineering system in Scala whose end-user API accepts aggregations over lists of objects/events. For example, a client of this tool might pass in a function that, given an array of past pageviews for a specific web user, filters for the ones coming from a specific country and counts them. The output of this call is then a number.
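To make the example concrete, here is a minimal sketch of such an aggregation written as a plain Scala function. The `PageView` type, its fields, and the `Aggregations` object are hypothetical names chosen for illustration, not part of any existing API.

```scala
// Hypothetical event type; the field names are assumptions for illustration.
final case class PageView(userId: String, country: String, timestampMillis: Long)

object Aggregations {
  // An aggregation maps a user's past events to a single number.
  type Aggregation = Seq[PageView] => Long

  // Builds an aggregation that counts the pageviews coming from one country.
  def countFromCountry(country: String): Aggregation =
    events => events.count(_.country == country).toLong
}

object Example {
  // Sample history for one user: two views from "IT", one from "US".
  val history: Seq[PageView] = Seq(
    PageView("u1", "IT", 1L),
    PageView("u1", "US", 2L),
    PageView("u1", "IT", 3L)
  )

  // Applying the aggregation to the history yields a single number.
  val italianViews: Long = Aggregations.countFromCountry("IT")(history)
}
```

Here `Example.italianViews` is the count of the sample events whose country is "IT". Passing a function like this is exactly the "custom code" option, which is expressive but gives the library no control over what the user runs.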

It can be thought of as a very simple reduce operation. I am trying to figure out how to design the API for such a system. I could write a simple custom language for counts and filters, but I doubt that is the best approach: unless it is designed with care, it will not be expressive enough.

Are you aware of something like an expression language that could be used to express simple functions without the need for me to build one from scratch? The other option would be to allow end users to pass custom code to this library, which might be dangerous at runtime.
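For comparison, the "custom language" option could be modeled as plain data that the library interprets, so that users never hand over executable code. The following is only a sketch under assumptions: events are represented as string maps, and every type name (`Pred`, `Agg`, `CountWhere`, and so on) is hypothetical.

```scala
object MiniExpr {
  // Events are modeled as simple field maps; this representation is an assumption.
  type Event = Map[String, String]

  // Predicates over a single event.
  sealed trait Pred
  final case class FieldEquals(field: String, value: String) extends Pred
  final case class And(left: Pred, right: Pred) extends Pred
  final case class Not(inner: Pred) extends Pred

  // Aggregations over a list of events; only a filtered count is supported here.
  sealed trait Agg
  final case class CountWhere(pred: Pred) extends Agg

  // Interprets a predicate against one event. A missing field never matches.
  def matches(pred: Pred, event: Event): Boolean = pred match {
    case FieldEquals(f, v) => event.get(f).contains(v)
    case And(l, r)         => matches(l, event) && matches(r, event)
    case Not(p)            => !matches(p, event)
  }

  // Interprets an aggregation against a list of events.
  def eval(agg: Agg, events: Seq[Event]): Long = agg match {
    case CountWhere(p) => events.count(matches(p, _)).toLong
  }
}
```

With this shape, the country example becomes `MiniExpr.eval(CountWhere(FieldEquals("country", "IT")), events)`. The limitation I am worried about is visible immediately: every new operation (sums, time windows, comparisons) needs a new case class and a new interpreter branch.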

I am aware of Apache Calcite, which lets you plug SQL into different data structures and databases. It is a good option; however, it forces me to think in a "columnar" SQL way, while here I am looking for something more row-based, similar to the map-reduce style of programming.

alexlipa

1 Answer

You mention Apache Calcite, but did you know that you can use it without writing SQL? If you use Calcite's RelBuilder class you can build algebra directly, which is very similar to the algebraic approach of MapReduce. See algebra in Calcite for more details.

Julian Hyde
  • That's exactly what I am looking for! Thanks! I can't find an example in the docs of Calcite converting a string expression directly into `RelBuilder` calls. Is this something I should implement, or am I missing something? The idea is to get a string expression from the user and execute it – alexlipa Nov 22 '19 at 07:51
  • You have a few options. 1. Call the `RelBuilder` API directly (in Java or other JVM language) to build an expression. 2. Devise your own language and write a parser that will translate a query in that language into a sequence of `RelBuilder` calls. 3. Use Calcite's `RelJson` class to convert algebra represented in JSON into algebra Java objects. 4. Use SQL as your query language and parse using Calcite's SQL parser. All of these approaches should yield similar results: a tree of Calcite algebra objects; the difference is the source language. – Julian Hyde Dec 06 '19 at 22:20