Questions tagged [significant-terms]

16 questions
14
votes
2 answers

Efficiently Computing Significant Terms in SQL

I was introduced to the ElasticSearch significant terms aggregation a while ago and was pleasantly surprised by how good and relevant this metric turns out to be. For those not familiar with it, it's quite a simple concept: for a given query (foreground…
Alix Axel
  • 151,645
  • 95
  • 393
  • 500
14
votes
3 answers

ElasticSearch circuit_breaking_exception (Data too large) with significant_terms aggregation

The query: { "aggregations": { "sigTerms": { "significant_terms": { "field": "translatedTitle" }, "aggs": { "assocs": { "significant_terms": { "field": "translatedTitle" } …
esp
  • 7,314
  • 6
  • 49
  • 79
4
votes
1 answer

Elasticsearch significant terms aggregation: meaning of doc_count and bg_count

I am having trouble finding documentation to explain the doc_count and bg_count fields in the response to the significant terms aggregation. For example, I would expect that, if I do not set a background filter, the bg_count should be the total…
Metropolis
  • 2,018
  • 1
  • 19
  • 36
4
votes
2 answers

Asking for significant terms but returns nothing

I am having an issue with Elasticsearch (version 2.0): I am trying to get the significant terms from a bunch of documents, but it always returns nothing. Here is the schema of my index: { "documents" : { "warmers" : {}, "mappings" :…
VinL
  • 51
  • 3
4
votes
3 answers

Significant Terms Aggregation of "flat" structures

I am currently trying to prototype a product recommendation system using the Elasticsearch Significant Terms aggregation. So far, I haven't found a good example that deals with "flat" JSON structures of sales (here: the itemId) coming from a relational…
Tobi
  • 31,405
  • 8
  • 58
  • 90
2
votes
0 answers

Use Elasticsearch significant-terms aggregation with SparkSQL

I am writing/reading data between Spark dataframes and Elasticsearch using the following code: df.write.format("org.elasticsearch.spark.sql") .option("es.nodes" , [MY_ES_IP]) .option("es.port",[MY_ES_PORT]) …
Nakeuh
  • 1,757
  • 3
  • 26
  • 65
2
votes
0 answers

Elasticsearch significant terms minimum

I've got something like this: GET index_*/_search?search_type=count { "aggs": { "products": { "terms": { "field": "products_id", "size": 100 }, "aggs": { "significant_products": { …
1
vote
1 answer

How do I extract variables that have a low p-value in R

I have a logistic model with plenty of interactions in R. I want to extract only the terms, whether predictor variables or interactions, that are significant. It's fine if I can just look at every interaction that's…
Antonio
  • 417
  • 2
  • 8
1
vote
1 answer

Elasticsearch significant terms aggregation doc_count differs from hits when doing a match phrase search for the same term

I am using the significant terms aggregation, which gives me n significant terms with their doc_count and bg_count using the following query: { "query" : { "terms" : {"user_id": ["x"]} }, "aggregations" : { "word_cloud" : { …
Piyush Das
  • 610
  • 1
  • 7
  • 18
1
vote
1 answer

terms.formula data argument invalid

I am new to RStudio and now want to run a "cca". I followed a description, but R says no. This is what I am working with: PreAbscca<- read.table("PreAbsenz.csv", header = TRUE, row.names = NULL) UVcca<- read.table("UV.csv", header = TRUE, row.names…
1
vote
2 answers

Elasticsearch significant terms on nested objects

For my master's thesis I am using Elasticsearch to measure the significance of sentences, paragraphs and documents relative to the rest of the index. I've used 3 different indexes to enable fast querying. Everything works fine, but I want to evaluate if it is even…
boraas
  • 929
  • 1
  • 10
  • 24
0
votes
0 answers

Data-mining an ElasticSearch Instance

I have recently been experimenting with Elasticsearch to make item-to-item recommendations via the significant terms aggregation. I get some pretty good results, but I'd like to export the results for each item in my catalogue out of ES and store them in a…
0
votes
1 answer

Exclude Significant Term Aggregation With Different Field

Is it possible to filter the bucket list returned by a significant terms aggregation on multiple fields? I am trying to create a recommendation feature using ES based on this article at medium…
0
votes
0 answers

ElasticSearch Significant Terms Aggregation: doc_count and bg_count unequal for search term

I'm unsure here if there is something incorrect with my query, my document structure, or my interpretation of the doc_count and bg_count fields. When running a significant terms aggregation and sorting results by score, the search term is always,…
Metropolis
  • 2,018
  • 1
  • 19
  • 36
0
votes
1 answer

Different set of results for "significant terms" in Elasticsearch using REST API or TransportClient

We use the new significant terms plugin in Elasticsearch. Using the transport client, I get fewer results than when I use the REST API. I don't understand why. Using the node client is unfortunately not possible, since my service using ES…
Chris W.
  • 2,266
  • 20
  • 40