Questions tagged [xapian]

Open Source Search Engine Library with bindings to allow use from Perl, Python, PHP, Java, Tcl, C#, Ruby and Lua

Xapian is a toolkit which allows developers to add advanced indexing and search facilities to their applications. It supports the Probabilistic Information Retrieval model and a rich set of boolean query operators.

105 questions
2
votes
1 answer

How to count all phrases efficiently in a large collection?

I need to create a phrase frequency table, counting all phrases in a very large collection of a few million words. The end result would be a table such as what is created here: http://www.hermetic.ch/wfca/phrases.htm What would be an…
ylluminate
2
votes
2 answers

Django-haystack not updating index

Using django-haystack 2.0.0 and xapian-haystack 2.0.0, migrated all code from 1.1.5 as it said in docs. Now my search_indexes.py looks like: from haystack import indexes from app.models import Post class PostIndex(indexes.SearchIndex,…
amureki
2
votes
1 answer

Xapian multiple-language searching with stop words?

I have two Xapian databases, let's call one "EN" and the other "DE", and let's say the former contains some documents in English, and the latter in German. If I want users to be able to search both at once, I can easily load both of the databases. …
Sean
1
vote
1 answer

haystack xapian numeric range

Trying to set up a price range with haystack and xapian. We had it working with solr by passing in a query like this via HTTP GET. To get a price from 2 to 3 dollars: selected_facets=price:[2+TO+3] But using the xapian backend, it returns nothing. I…
leech
1
vote
1 answer

How to implement searching for a specified user's documents?

In my current project, users can like songs, and now I'm going to add a song search so that a user can search for some song she has liked before. I have implemented search engines using xapian before, which involves building indexes of documents…
satoru
1
vote
1 answer

Multi language full text search including stemming in Django / Python

Currently we use Djapian + Xapian in our Django-based multi-language projects for full text search. In order to use stemming for each language, we create a different search index for each language. Inside Django, we decide based on the user's…
Simon Steinberger
1
vote
1 answer

Document search in Lucene/Solr, Whoosh, Sphinx, Xapian

I am comparing Lucene/Solr, Whoosh, Sphinx and Xapian for searching documents in DOC, DOCX, HTML and PDF. Only Solr is documented to have a document parser (Tika) which directly indexes documents. So it seems a clear winner. But to level the playing…
Jesvin Jose
1
vote
1 answer

indexing code documentation sites (sphinx, postgres full-text, xapian...?)

I was wondering which of the indexing libraries out there would be suitable for a site of code documentation where you can't just ignore "punctuation" as insignificant. (In some of the languages I'm interested in, punctuation can be part…
1
vote
2 answers

Java CSS Crawler

I'm looking for a web crawler with the ability to grab the page's CSS. I don't need any other fancy crawling abilities. I'm trying to make my way through Xapian, Nutch and Heritrix. They all seem to be a bit complex. If anyone has any experience or…
Trevor
1
vote
0 answers

How to get Xapian working with MSVC?

I would like to use Xapian search engine with a Qt application I am developing. The compiler used is MSVC (Visual Studio 2013). As it turns out, the Xapian download page (https://xapian.org/download) which was supposed to hold the link to a set of…
Hari Mohan
1
vote
1 answer

Search range of int values using djapian

I'm using djapian as my search backend, and I'm looking to search for a range of values. For example: query = 'comments:(0..10)' Post.indexer.search(query) would search for Posts with between 0 and 10 comments. I cannot find a way to do this in…
Blue Peppers
1
vote
0 answers

Which is more suitable for my Django search feature: DB full-text or the Haystack module?

I'm using Django with Python3 and Postgresql. I've read that Haystack uses Elastic Search (and I dislike Java), but I see Xapian-Haystack doesn't work with Python3 (but I've heard about Xapian before and I think I like it). djorm-ext-pgfulltext is a…
1
vote
1 answer

xapian auto-complete

Has anyone ever used Xapian for implementing an Auto-Complete/Auto-Suggest feature, i.e. providing a possible set of suggestions as the user types, like Google's Auto-Suggest? I have about 2 million phrases for which I am considering using Xapian as…
Srikar Appalaraju
1
vote
0 answers

ImportError: The Python module 'xapian_backend' has no 'XapianEngine' class

I upgraded Django from 1.5 to 1.7 with django-haystack==2.0.0 and xapian-haystack==1.1.5b0, and I am getting an error. [Mon Nov 16 13:32:48.685396 2015] [wsgi:error] [pid 692193:tid 139901570070272] [remote 127.0.0.1:25439] File…
user4910881
1
vote
0 answers

Prioritize items on Xapian query

I am currently using Xapian to perform some queries over Debian packages. I am using the tfidf algorithm to weight package terms over all my installed packages and then I search the apt-xapian-index with the terms with most significant…
lucasmoura