Questions tagged [morelikethis]

Apache Lucene functionality which provides information about similar documents. The same feature is also available on Solr.
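To illustrate the core idea behind MoreLikeThis, here is a simplified pure-Python sketch. It is not Lucene's implementation. It scores each term of a source document by tf * idf, drops terms that are too rare in the document or across the corpus, and keeps the top-scoring terms, which Lucene then combines into a disjunctive (OR) query. The function and parameter names are hypothetical and only loosely mirror Solr's `mlt.mintf`, `mlt.mindf` and `mlt.maxqt`. The idf formula `1 + ln((N + 1) / (df + 1))` is an assumption modeled on Lucene's classic TF-IDF similarity.

```python
import math
from collections import Counter


def interesting_terms(doc_tokens, corpus, min_term_freq=2, min_doc_freq=5,
                      max_query_terms=25):
    """Return the top tf-idf terms of doc_tokens (simplified MLT-style selection)."""
    num_docs = len(corpus)
    # Document frequency: how many corpus documents contain each term.
    df = Counter()
    for tokens in corpus:
        df.update(set(tokens))
    scored = []
    for term, tf in Counter(doc_tokens).items():
        if tf < min_term_freq:
            continue  # too rare in the source document
        if df[term] < min_doc_freq:
            continue  # too rare across the corpus
        idf = 1.0 + math.log((num_docs + 1) / (df[term] + 1))
        scored.append((tf * idf, term))
    scored.sort(reverse=True)  # highest score first
    return [term for _, term in scored[:max_query_terms]]


if __name__ == "__main__":
    corpus = [["lucene", "search"], ["lucene", "index"], ["the", "cat"],
              ["the", "dog"], ["the", "lucene"]]
    doc = ["lucene", "lucene", "search", "search", "search", "the"]
    print(interesting_terms(doc, corpus, min_term_freq=2, min_doc_freq=1))
    # prints ['search', 'lucene']
```

In this toy corpus, "search" outranks "lucene" because it is both more frequent in the source document and rarer across the corpus. "the" is dropped by the term-frequency threshold.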

149 questions
7 votes, 3 answers

Measuring similarity between document sets

For illustration purposes, let's assume this is a forum service. I need to calculate the "similarity" among each user's posts, so that the result would be something like: among posts by user A, similarity 60%; among posts by user B, similarity…
jodeci
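One common way to put a number on "similarity among a user's posts" is the mean pairwise cosine similarity of the posts' term-frequency vectors. This is an assumption, not what the question or its answers prescribe. The sketch tokenizes naively on whitespace, and `set_similarity` is a hypothetical helper name.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def set_similarity(posts):
    """Mean pairwise cosine similarity among posts (needs at least two)."""
    if len(posts) < 2:
        raise ValueError("need at least two posts")
    # Naive whitespace tokenization into term-frequency vectors.
    vectors = [Counter(p.lower().split()) for p in posts]
    pairs = list(combinations(vectors, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)
```

Multiplying the result by 100 gives a percentage like the "60%" in the question. Weighting terms by tf-idf instead of raw counts would reduce the influence of common words.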
7 votes, 1 answer

How does Solr's MoreLikeThis component internally work to get results?

I'm new to Apache Solr and am currently exploring/trying to make use of MoreLikeThis as a search component (instead of a dedicated request handler). I'm finding it difficult to understand clearly how this works internally to get more-like-this…
Gnanam
7 votes, 2 answers

Zend Lucene MoreLikeThis

I'm using Zend_Search_Lucene for my search engine. Sadly, it is missing an implementation of the MoreLikeThis methods, which can find similar documents in the index. Has anybody come across a decent Zend port of this function? I found a Drupal module…
Neil Aitken
7 votes, 1 answer

Elasticsearch: How to store term vectors

I am working on a project where I heavily use Elasticsearch and leverage the moreLikeThis query to implement some features. The official documentation for the MLT query states the following: In order to speed up analysis, it could help to store…
Nicola Miotto
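In Elasticsearch, term vectors are enabled per field in the index mapping with the `term_vector` option. The sketch below builds such a mapping body in Python. The field names are hypothetical. The typeless `mappings.properties` layout assumes Elasticsearch 7 or later, since older versions nest properties under a mapping type. `"with_positions_offsets"` is an alternative to `"yes"` when positions and offsets are also wanted.

```python
import json


def term_vector_mapping(fields, term_vector="yes"):
    """Build an index-creation body that stores term vectors for text fields."""
    return {
        "mappings": {
            "properties": {
                name: {"type": "text", "term_vector": term_vector}
                for name in fields
            }
        }
    }


if __name__ == "__main__":
    # Send this as the body of the index-creation request (PUT /<index>).
    print(json.dumps(term_vector_mapping(["content"]), indent=2))
```

The setting belongs in the mapping before documents are indexed. Enabling it for an existing field generally means reindexing.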
7 votes, 2 answers

Elasticsearch More Like This Query

I'm trying to wrap my mind around how the more like this query works, and I seem to be missing something. I read the documentation, but the ES documentation is often somewhat...lacking. The goal is to be able to limit results by term frequency, as…
Sloan Ahrens
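The `more_like_this` query controls term selection through parameters such as `min_term_freq`, `min_doc_freq`, `max_query_terms` and `minimum_should_match`. The following sketch builds a query body in Python that uses an already-indexed document as input. The index name, document ID, field names, and numeric values are illustrative placeholders, not recommendations.

```python
import json


def mlt_query(index, doc_id, fields, min_term_freq=1, min_doc_freq=1,
              max_query_terms=12, minimum_should_match="30%"):
    """Build a more_like_this query body that uses an indexed document as input."""
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                "like": [{"_index": index, "_id": doc_id}],
                # Terms less frequent than this in the input doc are ignored.
                "min_term_freq": min_term_freq,
                # Terms found in fewer than this many documents are ignored.
                "min_doc_freq": min_doc_freq,
                # Cap on the number of selected terms in the generated query.
                "max_query_terms": max_query_terms,
                # How many of the selected terms a candidate must match.
                "minimum_should_match": minimum_should_match,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(mlt_query("my-index", "1", ["title", "body"]), indent=2))
```

Raising `min_term_freq` restricts the generated query to terms that occur often in the input document. `minimum_should_match` then filters candidates by how many of those terms they share.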
7 votes, 2 answers

Solr MoreLikeThis boosting query fields

I am experimenting with Solr's MoreLikeThis feature. My schema deals with articles, and I'm looking for similarities between articles within three fields: articletitle, articletext and topic. The following query works…
JBradshaw
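Solr's MoreLikeThis accepts per-field boosts through `mlt.qf` in DisMax-style `field^boost` syntax. `mlt.boost=true` boosts the generated query terms by their relevance, and fields listed in `mlt.qf` should also appear in `mlt.fl`. The sketch below builds such a query string for the three fields in the question. The boost values, the default count, and the assumption that the unique key field is named `id` are all illustrative.

```python
from urllib.parse import parse_qs, urlencode


def solr_mlt_params(doc_id, field_boosts, count=5):
    """Build query-string params asking Solr's MLT component for similar docs.

    field_boosts maps field name -> boost, e.g. {"articletitle": 3}.
    """
    params = {
        "q": f"id:{doc_id}",  # assumes the unique key field is called "id"
        "mlt": "true",
        "mlt.fl": ",".join(field_boosts),  # fields to mine for terms
        # DisMax-style per-field boosts; the fields must also be in mlt.fl.
        "mlt.qf": " ".join(f"{f}^{b}" for f, b in field_boosts.items()),
        "mlt.boost": "true",  # boost query terms by their relevance
        "mlt.count": str(count),
    }
    return urlencode(params)


if __name__ == "__main__":
    qs = solr_mlt_params("42", {"articletitle": 3, "articletext": 1, "topic": 2})
    print(qs)
    print(parse_qs(qs)["mlt.qf"])  # ['articletitle^3 articletext^1 topic^2']
```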
6 votes, 0 answers

Percentage of matched terms in Elasticsearch

I am using elasticsearch to find similar documents. Below is the query I am using: { "query": { "more_like_this":{ "like": { "_index": "docs", "_type": "pdfs", "_id": "pdf_1" …
swan8060
5 votes, 1 answer

How to constrain/filter More Like This results in Solr?

In Solr, I am wondering if it's possible to constrain/filter the "More Like This" result set from a standard (dismax) query, i.e., without having to use the specific MoreLikeThis request handler? For example, I have a Solr index which has…
cambo
5 votes, 1 answer

Limiting the output from MoreLikeThis in Solr

I'm trying to use MoreLikeThis to get all similar documents but not documents with a specific contenttype. So the first query needs to find the one document that I want to get "More Like This" of - and the second query needs to limit the similar…
Svenn
5 votes, 1 answer

EarlyTerminatingCollectorException in MLT Component of SOLR 4.4

I send a query to SOLR, which returns exactly one document. It's an "id:some_doc_id" search. Here are the parameters as shown in the response: params: { mlt.mindf: "1", mlt.count: "5", mlt.fl: "text", fl: "id,,application_id,...…
Achim
4 votes, 1 answer

How to get MoreLikeThis result

I'm trying to understand how Solr MoreLikeThis works. Steps I've taken: in schema.xml I've written <field name="path_exact" type="string" indexed="true" stored="true" termVectors="true"/> <field name="title" type="text_general" indexed="true"…
Pakira
4 votes, 1 answer

Boosting in more like this elasticsearch

I was trying to do a simple POC for related items using Elasticsearch's more like this query (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html#query-dsl-mlt-query), but I could not work out how to use boosting so that…
Global Warrior
4 votes, 1 answer

Difference between fuzzy like this and more like this?

What is the difference between Lucene's MoreLikeThis (mlt) and FuzzyLikeThis (flt)? I am evaluating both query types through Elasticsearch (ES) and I found they are conceptually very similar: mlt: compare an existing document's fields with other…
miku
4 votes, 1 answer

How to score similar documents in Lucene?

I want to score similar documents in Lucene. Let me explain my scenario. For example, let's say I have the following records in my file, on which I created an index. ID|First Name|Last Name|DOB 1 |John |Doe |03/18/1990 1 |John …
Huzaifa
4 votes, 1 answer

Why does Lucene's MoreLikeThis restrict its TermQueries to the field with the highest docFreq?

I'm currently working on a modified version of Lucene's MoreLikeThis to fit my own purposes. There is one thing I still can't understand. When creating the queue, MoreLikeThis searches for the field with the highest docFreq for this term. // go…
Michael A. Schaffrath
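The field-selection step this question describes can be paraphrased as follows. For each candidate term, MoreLikeThis looks up the term's document frequency in every MLT field and keeps the field where that frequency is largest. That field and its docFreq then determine the term's idf and the field of its TermQuery. The Python sketch below illustrates that logic as found in older Lucene releases; it is not the Lucene source. `doc_freq` is a hypothetical stand-in for `IndexReader.docFreq(new Term(field, term))`, and newer Lucene versions may handle fields differently.

```python
def top_field(term, field_names, doc_freq):
    """Pick the field in which term has the largest document frequency.

    doc_freq is a hypothetical callable (field, term) -> int standing in for
    Lucene's IndexReader.docFreq(new Term(field, term)).
    """
    best_field, best_df = field_names[0], 0
    for field in field_names:
        freq = doc_freq(field, term)
        if freq > best_df:  # strict '>': on ties the earlier field wins
            best_field, best_df = field, freq
    return best_field, best_df


if __name__ == "__main__":
    # Hypothetical per-field document frequencies.
    freqs = {"title": {"lucene": 3}, "body": {"lucene": 10}}

    def lookup(field, term):
        return freqs[field].get(term, 0)

    print(top_field("lucene", ["title", "body"], lookup))  # ('body', 10)
    print(top_field("zend", ["title", "body"], lookup))    # ('title', 0)
```

Because only one docFreq per term survives this step, a term that is common in a large body field will be scored, and queried, in that field rather than in a smaller title field. The caller is left to apply the min/max docFreq checks and to skip terms whose docFreq is 0.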