I have created a mapping with a parent-child join datatype. I want to get the child with the maximal value for each parent, in a single query.

Is it possible? I've tried a few things, such as an inner_hits definition, the top_hits and children aggregations, and the has_parent and has_child queries.

My mapping is based on the classes Post, Question and Answer in this elasticsearch_dsl example.

A solution with elasticsearch_dsl code would be great, but a simple elasticsearch query would also help.

Thanks :)

Edit: I'm attaching my code, hoping that it helps.

class LoggerLogBase (based on class Post):

from elasticsearch_dsl import Boolean, Date, Document, Float, Join, Keyword


class LoggerLogBase(Document):
    """
    A base class for :class:`~data_classes.Log` and :class:`~data_classes.Logger` data classes.
    """

    logger_log = Join(relations={'logger': 'log'})

    @classmethod
    def _matches(cls, hit):
        """
        Returns whether a hit matches this class or not.
        """
        return False

    class Index:
        """
        Meta-class for defining the index name.
        """
        name = 'logger-log'

class Logger (based on class Question):

class Logger(LoggerLogBase):
    """
    A class to represent a temperature logger.
    """
    name = Keyword()
    display_name = Keyword()
    is_displayed = Boolean()

    @classmethod
    def _matches(cls, hit):
        """
        Returns whether a hit matches this class or not.
        """
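        join = hit['_source']['logger_log']
        # save() stores the join field as {'name': 'logger'}, so accept both forms.
        if isinstance(join, dict):
            return join.get('name') == 'logger'
        return join == 'logger'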

    @classmethod
    def search(cls, **kwargs):
        """
        Creates an :class:`~elasticsearch_dsl.Search` instance that will search
        over this index.
        """
        return cls._index.search(**kwargs).filter('term', logger_log='logger')

    def add_log(self, timestamp, heat_index_celsius, humidity, temperature_celsius):
        """
        Save a new log which was logged by this logger.
        """
        log = Log(
            _routing=self.meta.id,
            logger_log={'name': 'log', 'parent': self.meta.id},
            timestamp=timestamp,
            heat_index_celsius=heat_index_celsius,
            humidity=humidity,
            temperature_celsius=temperature_celsius
        )

        log.save()
        return log

    def search_logs(self):
        """
        Returns the search for this logger's logs.
        """
        search = Log.search()
        search = search.filter('parent_id', type='log', id=self.meta.id)
        search = search.params(routing=self.meta.id)
        return search

    def search_latest_log(self):
        """
        Returns the search for this logger's latest log.
        """
        search = self.search_logs().params(size=0)
        search.aggs.metric('latest_log',
                           'top_hits',
                           sort=[{'timestamp': {'order': 'desc'}}],
                           size=1)
        return search

    def save(self, using=None, index=None, validate=True, **kwargs):
        """
        Saves the document into elasticsearch.
        See documentation for elasticsearch_dsl.Document.save for more information.
        """
        self.logger_log = {'name': 'logger'}
        return super().save(using, index, validate, **kwargs)

class Log (based on class Answer):

class Log(LoggerLogBase):
    """
    A class to represent a single temperature measurement log.
    """
    timestamp = Date()
    heat_index_celsius = Float()
    humidity = Float()
    temperature_celsius = Float()

    @classmethod
    def _matches(cls, hit):
        """
        Returns whether a hit matches this class or not.
        """
        return isinstance(hit['_source']['logger_log'], dict) \
            and hit['_source']['logger_log'].get('name') == 'log'

    @classmethod
    def search(cls, using=None, **kwargs):
        """
        Creates an :class:`~elasticsearch_dsl.Search` instance that will search
        over this index.
        """
        return cls._index.search(using=using, **kwargs).exclude('term', logger_log='logger')

    @property
    def logger(self):
        """
        Returns the logger that logged this log.
        """
        if 'logger' not in self.meta:
            self.meta.logger = Logger.get(id=self.logger_log.parent, index=self.meta.index)
        return self.meta.logger

    def save(self, using=None, index=None, validate=True, **kwargs):
        """
        Saves the document into elasticsearch.
        See documentation for elasticsearch_dsl.Document.save for more information.
        """
        self.meta.routing = self.logger_log.parent
        return super().save(using, index, validate, **kwargs)

My current solution calls logger.search_latest_log() for each logger, which takes N queries for N loggers. I want to do this in a single query to improve performance.

Or B

1 Answer

I think the solution is a combination of the children aggregation and top_hits:

POST logger-log/_search?size=0
{
  "aggs": {
    "top-loggers": {
      "terms": {
        "field": "name"
      },
      "aggs": {
        "to-logs": {
          "children": {
            "type": "log"
          },
          "aggs": {
            "top-logs": {
              "top_hits": {
                "size": 1,
                "sort": [
                  {
                    "timestamp": {
                      "order": "desc"
                    }
                  }
                ]
              }
            }
          }
        }
      }
    }
  }
}
  • Let me know whether it worked or whether you ran into any problems ;-) – Amir Masud Zare Bidaki Sep 23 '18 at 07:50
  • Nice catch, it works! (After changing "name.keyword" to just "name".) I don't like the "terms" query inside "top-loggers", but I think I'll find a nicer way to handle this (I also want to add some condition to it, so I guess I'll find a way from here). Thanks! – Or B Sep 24 '18 at 14:52