
I'm trying to debug an Elasticsearch query. I've enabled explain for the problematic query, and it shows that the query is computing a product of intermediate scores where it should be computing a sum. (I'm creating the query request using elastic4s.)

The problem is that I cannot see what the generated query actually is. I want to determine whether the bug is in elastic4s (generating the query request incorrectly), in my code, or in Elasticsearch. So I've enabled logging for the embedded Elasticsearch instance used in my tests with the following code:

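```scala
import org.elasticsearch.common.logging.ESLoggerFactory
import org.elasticsearch.common.logging.slf4j.Slf4jESLoggerFactory
import org.elasticsearch.common.settings.Settings

// Route Elasticsearch's internal logging through SLF4J (and therefore logback)
ESLoggerFactory.setDefaultFactory(new Slf4jESLoggerFactory())

// dataDirPath, clusterName and httpEnabled are defined elsewhere in the test setup
val settings = Settings.settingsBuilder
  .put("path.data", dataDirPath)
  .put("path.home", "/var/elastic/")
  .put("cluster.name", clusterName)
  .put("http.enabled", httpEnabled)
  .put("index.number_of_shards", 1)
  .put("index.number_of_replicas", 0)
  .put("discovery.zen.ping.multicast.enabled", false)
  .put("index.refresh_interval", "10ms")
  .put("script.engine.groovy.inline.search", true)
  .put("script.engine.groovy.inline.update", true)
  .put("script.engine.groovy.inline.mapping", true)
  // A 0s threshold should send every query and fetch phase to the search slow log at DEBUG
  .put("index.search.slowlog.threshold.query.debug", "0s")
  .put("index.search.slowlog.threshold.fetch.debug", "0s")
  .build
```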

However, I can't find any queries being logged in the log file configured in my logback.xml. Other log messages from Elasticsearch do appear there, just not the actual queries.

Robin Green

2 Answers


You can't, at least not directly, and not in the ES versions currently available. It's something that has been discussed at some length (e.g. https://github.com/elastic/elasticsearch/issues/9172 and https://github.com/elastic/elasticsearch/issues/12187), and it seems this may change soon with the rewrite of the tasks API.

In the meantime, you can use tools like ES Restlog (https://github.com/etsy/es-restlog) and/or put nginx in front of ES and capture the queries in the nginx logs. You can also use tcpdump (e.g. tcpdump -vvv -x -X -i any port 9200) to capture the query as it arrives at the server. One last option is to modify your application to echo the query instead of executing it (and/or to index the query into ES itself before you execute it, since the query is itself JSON).
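As a rough illustration of that last option, here is a minimal Scala sketch, assuming the query is already available as a JSON string. `QueryRunner`, `dryRun`, `execute`, `queryLogDocument` and the "query-log" index are hypothetical names, not part of Elasticsearch or elastic4s; the sketch only shows the shape of a dry-run switch and of wrapping the query in a JSON document that could then be indexed.

```scala
// Minimal sketch of a "dry-run" switch: when dryRun is true the query JSON is
// returned for logging instead of being handed to the real executor.
// QueryRunner, dryRun and execute are hypothetical names, not a library API.
object DryRunSketch {

  final case class QueryRunner(dryRun: Boolean, execute: String => String) {
    // Either echoes the query or delegates to whatever actually sends it to ES.
    def run(queryJson: String): String =
      if (dryRun) s"DRY RUN, query not sent: $queryJson"
      else execute(queryJson)
  }

  // Because the query is itself JSON, it can be embedded verbatim in a document
  // and indexed into a (hypothetical) "query-log" index before execution.
  // Assumes queryJson is already valid JSON; nothing is escaped or validated.
  def queryLogDocument(queryJson: String, timestampMillis: Long): String =
    s"""{"timestamp":$timestampMillis,"query":$queryJson}"""

  def main(args: Array[String]): Unit = {
    val query = """{"query":{"match":{"title":"paris"}}}"""
    // In dry-run mode the executor must never be called.
    val runner = QueryRunner(dryRun = true, execute = _ => sys.error("should not be called"))
    println(runner.run(query))
    println(queryLogDocument(query, System.currentTimeMillis()))
  }
}
```

Embedding the query by string interpolation is only safe when it is already well-formed JSON; a real implementation would more likely build the log document with a JSON library.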

Evan Volgas

In the specific case of elastic4s, for most types of request you can call .show on the elastic4s query object to generate the JSON body the request would have had if it had been sent using the JSON-over-HTTP protocol. This can then be logged at a convenient point in your code, e.g. in a single method that generates all ES search queries, if you have one. The code in Elasticsearch that generates this fake JSON could of course still have bugs, so it should not be trusted entirely. However, it's worth trying to reproduce the issue by sending the output of .show with Sense to a real Elasticsearch cluster over HTTP. If you can reproduce it, you (a) know that it's not an elastic4s bug, and (b) can easily manipulate the JSON to figure out what's causing the problem.
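A minimal sketch of the single-method idea follows. It deliberately does not use elastic4s: `ShowJson` is a hypothetical type class standing in for .show, `SearchRequest` is a hypothetical request type, and `CentralSearch.search` is a hypothetical method through which every search passes. With elastic4s itself you would render the real query definition with .show instead.

```scala
// Hypothetical stand-in for elastic4s's .show: renders a request definition
// as the JSON body it would have had over HTTP.
trait ShowJson[A] { def json(a: A): String }

// Hypothetical request type; in real code this would be an elastic4s definition.
final case class SearchRequest(index: String, term: String)

object SearchRequest {
  // Naive rendering for illustration only: the term is not JSON-escaped.
  implicit val showJson: ShowJson[SearchRequest] = new ShowJson[SearchRequest] {
    def json(r: SearchRequest): String =
      s"""{"query":{"match":{"title":"${r.term}"}}}"""
  }
}

object CentralSearch {
  // Single choke point: the rendered JSON is logged once per request,
  // immediately before the request is handed to the executor.
  def search[A, R](request: A, log: String => Unit)(execute: A => R)(implicit s: ShowJson[A]): R = {
    log(s.json(request))
    execute(request)
  }
}
```

Because logging happens in one place, switching it on or off, or changing the log level, does not require touching every call site that builds a query.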

In some cases .show simply calls toString, so with the plain Elasticsearch Java API, or another JVM-based wrapper on top of it, you can call toString yourself to get the JSON string to log.

With embedded Elasticsearch, this is as good as you're going to get in terms of logging, short of setting a breakpoint on the builder invocations and inspecting the actual Java Elasticsearch request objects that are created (which is the most accurate approach).

Robin Green