Questions tagged [whoosh]

Whoosh is a fast, featureful, full-text indexing and searching library implemented in pure Python.

Fast, pure-Python library for full-text indexing, search, and spell checking. See Whoosh on the Python Package Index.

Whoosh Documentation

373 questions
1
vote
2 answers

Multiprocessing search without duplicating index in memory

I have to search a large table of scientific journal articles for some specific articles I have in a separate file. My approach is to build a search index from the large table using Whoosh, and then search for each article of the separate file in…
smint75
  • 11
  • 1
1
vote
1 answer

Saving the index in Whoosh

I am looking for a suggestion, or preferably an example, of how to store or save a Whoosh index. I am using Python 2.7 on Windows 7 Professional. Any help would be appreciated.
Searcher
  • 11
  • 3
1
vote
0 answers

Facet_count not working using Whoosh backend

I cannot get Whoosh facet_counts to work with Haystack in my Django project: Python==2.7.8, Whoosh==2.7, Haystack==2.3.1. I receive the error "Warning: Whoosh does not handle faceting" when I run the following: In [1]: from haystack.query…
Chris Wedgwood
  • 660
  • 6
  • 11
1
vote
1 answer

Maximum Whoosh Index size?

I'm using a 32-bit Ubuntu machine and trying to create a Whoosh index of a 27GB file, but my system crashes once the index reaches 3GB. Is there a size limit on a Whoosh index? If not, what could the problem be?
Gaurav Singh
  • 12,707
  • 5
  • 22
  • 24
1
vote
1 answer

Python whoosh - get all unique values from an index field

I have an index of git repositories and I'm saving the name of the repository the file belongs to in each document. The repository field format is {section}/{repo} and it is a TEXT field. I want to achieve a pretty simple thing: a list of all…
Eugene Sajine
  • 8,104
  • 3
  • 23
  • 28
1
vote
1 answer

Integrating Haystack in Django-CMS omitting Pages with View-Restrictions

I want to integrate haystack with django-cms to build a search view. My CMS has pages with view restrictions (only a few authenticated users have access to some pages). The problem is: when making a search, haystack gives me a list with results from…
Blue K
  • 21
  • 6
1
vote
1 answer

Flask-whooshalchemy - Changing underlying schema

I get the following error after changing a table column from post_text to post_text1. I've updated my model and search accordingly: % (name, schema)) UnknownFieldError: No field named 'post_text1' in And here's the…
John
  • 1,677
  • 4
  • 15
  • 26
1
vote
1 answer

Python Whoosh not accepting single character

I am trying to parse a query which has text plus number. Example: Apple iphone 6 results in: Results for And([Term('title', u'apple'), Term('title', u'iphone')]) while Apple iphone 62 results in: Results for And([Term('title', u'apple'),…
blackmamba
  • 1,952
  • 11
  • 34
  • 59
1
vote
0 answers

Is there a way to convert a Lucene index into Whoosh or MongoDB?

Lucene is a popular text indexing tool (http://lucene.apache.org/). But installing Lucene for use from Python is a lot of work (Building Pylucene on ubuntu 14.04(trusty tahr)). Whoosh is a Python-based indexing library…
alvas
  • 115,346
  • 109
  • 446
  • 738
1
vote
1 answer

Whoosh - Slop Operator Behaviour

# Text: income tax expense resulting from the utilization of net operating loss carry forwards Query Formats tried: q = QueryParser(u"content", ix.schema).parse(u"income utilization~3") q = QueryParser(u"content", ix.schema).parse(u"'income…
Siva Arunachalam
  • 7,582
  • 15
  • 79
  • 132
1
vote
1 answer

How do I preserve new lines when extracting text from html using lxml.text_content()

I am trying to learn to use Whoosh. I have a large collection of HTML documents I want to search. I discovered that the text_content() method creates some interesting problems: for example, I might have some text that is organized in a table that…
PyNEwbie
  • 4,882
  • 4
  • 38
  • 86
1
vote
2 answers

Indexing CSV file contents in Python

I have a very large CSV file containing only two fields (id, url). I want to index the url field with Python. I know there are tools like Whoosh or PyLucene, but I can't get the examples to work. Can someone help me with this?
Hossein
  • 40,161
  • 57
  • 141
  • 175
1
vote
0 answers

Django Haystack exact Query not behaving as expected

I am using django haystack's Exact match (with Whoosh) to filter and am getting unexpected results. search_index.py class AssetIndex(indexes.SearchIndex, indexes.Indexable): text = indexes.CharField(document=True, use_template=True) …
imns
  • 4,996
  • 11
  • 57
  • 80
1
vote
1 answer

"Directory does not exist" error while setting up Whoosh on Python

I have a collection of documents and I want to create a search engine for my website. The documents are static, and in another question Whoosh was suggested to me. However, I cannot even get it set up using the example code from the documentation. from whoosh.fields…
Sfinos
  • 279
  • 4
  • 15
1
vote
1 answer

Full text search with Python

I have a huge HTML file with text, tables and images (with alt info). I have a full-text search function only for this file, but at the moment it relies on strict string comparison. I want to improve the function and return the top 5 paragraphs…
Sfinos
  • 279
  • 4
  • 15