I want to use flask_whooshalchemyplus to manually build a full index for two tables named "Traduzioni" and "TraduzioniDlg". I created a simple Flask endpoint, triggered by a jQuery AJAX request. The process stops, apparently without raising any exception. The tables contain text in English, Italian and Arabic, which I think could be related to the error. How can I manage different character sets with Whoosh?

## FILE views.py
from app import app
# ....
import whoosh
import flask_whooshalchemyplus
from flask_whooshalchemyplus import index_all

# .... A LOT OF STUFF HERE

@app.route("/createIndexes", methods=['GET'])
@login_required
def createIndexes():
    d = ""
    try:
        index_all(app)
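    except Exception as e:  # a bare "except e:" raises NameError because e is undefined
        d = str(e)  # exception objects are not JSON serializable, so send the message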
    stjson = {'mimetype':'application/json', 'status_code':200, "rows":d}
    return jsonify(resp=stjson)

1) No XHR status is returned in the Firefox console.

2) On the server side (PythonAnywhere) a 499 status is logged. Here is the output:

93.41.1.147 - archeo [21/Jan/2020:15:22:49 +0000] "GET /createIndexes HTTP/1.1" 499 0 "https://fabioquintilii.pythonanywhere.com/admin" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:67.0) Gecko/20100101 Firefox/67.0" "93.41.1.147" response-time=8.700

3) Server log stops after processing the first table: "2020-01-21 15:22:40 Indexing Traduzioni... "

4) The search.db/ folder has the following structure:

.
├── Traduzioni
│   ├── MAIN.tmp
│   │   ├── 3izw6phrod2o1ojvhs55ymaywukg.ctmp
│   │   ├── kkfrncvehj353od2zr7qdfz0ype7.ctmp
│   │   └── ogkmwxvb86vl3od6kwm533d3l658.ctmp
│   ├── MAIN_1kzdbfhp5z2389ms.pst
│   ├── MAIN_1kzdbfhp5z2389ms.trm
│   ├── MAIN_1kzdbfhp5z2389ms.vps
│   ├── MAIN_23qeml6mtoagefdb.pst
│   ├── MAIN_23qeml6mtoagefdb.trm
│   ├── MAIN_23qeml6mtoagefdb.vps
│   ├── MAIN_87ifp68y3amsfxmo.pst
│   ├── MAIN_87ifp68y3amsfxmo.trm
│   ├── MAIN_87ifp68y3amsfxmo.vps
│   ├── MAIN_WRITELOCK
│   └── _MAIN_0.toc
├── TraduzioniDlg
│   └── _MAIN_0.toc
└── tree.txt

3 directories, 16 files

2 Answers

Try changing the timeout on your xhr request. Another alternative would be to not run the indexing from a web request, but to do it from a console so you can see what it's doing more easily.
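A minimal sketch of the console approach, under the assumption that the indexing function can be called as `index_fn(app)`, as `index_all(app)` is in the question. The name `run_indexing` is hypothetical and not part of flask_whooshalchemyplus; it only wraps the call so that the elapsed time and any traceback show up in the console instead of being lost when the browser gives up on the request.

```python
import logging
import time


def run_indexing(index_fn, app):
    """Call index_fn(app), logging the duration or the full traceback.

    Returns True on success and False if index_fn raised an exception.
    """
    start = time.monotonic()
    try:
        index_fn(app)
    except Exception:
        # logging.exception records the message plus the traceback
        logging.exception("Indexing failed")
        return False
    logging.info("Indexing finished in %.1f s", time.monotonic() - start)
    return True
```

From a console in the project directory this could be used as `from app import app`, `from flask_whooshalchemyplus import index_all`, then `run_indexing(index_all, app)`. The logged duration also gives a realistic lower bound for the timeout on the AJAX request.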

Glenn
Thank you Glenn, that sounds good, but I must let the end user load a lot of data directly into MySQL and then update the index from an admin page. – Fabio Quintilii Jan 25 '20 at 14:14

Many thanks to Glenn and Filip for your suggestions! My timeout setting was too short; I fixed the issue in $.ajaxSetup():

$.ajaxSetup({delay:6000, timeout:9000}); -> $.ajaxSetup({delay:6000, timeout:30000});

I also studied Whoosh in more depth and changed my code like this:

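import os
from shutil import rmtree

from whoosh.analysis import StemmingAnalyzer
from whoosh.fields import Schema, TEXT
from whoosh.index import create_in

class TranslationSchema: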
    # a class to create different schema
    TS_SCHEMA_TYPE_TEXT = 1

    def text_schema(self):
        return Schema(content=TEXT(analyzer=StemmingAnalyzer()))

    def get_schema(self, schema_type):
        if schema_type == self.TS_SCHEMA_TYPE_TEXT:
            return self.text_schema()

class IndexManager:
    def __init__(self,app):
        self.app = app

    def create_all_indexes(self):
        #all classes in module
        all_models = self.app.extensions['sqlalchemy'].db.Model._decl_class_registry.values()
        # __searchable__ class only
        models = [i for i in all_models if hasattr(i, '__searchable__')]

        schema = TranslationSchema()
        # delete any existing index directory, then create a new one with the same name
        if os.path.isdir(os.environ['WHOOSH_BASE']):
            rmtree(os.environ['WHOOSH_BASE'])
        os.mkdir(os.environ['WHOOSH_BASE'])
        # index created
        ix_text = create_in(os.environ['WHOOSH_BASE'], schema.get_schema(schema.TS_SCHEMA_TYPE_TEXT), indexname="text_search")
        # writer created
        writer_text = ix_text.writer()
        # adding documents to index
        for model in models:
            record = model.query.all()
            for rec in record:
                writer_text.add_document(content = unicode(rec.testo))
        writer_text.commit()


        return

The index is created with the tree shown below. The *.seg file is 1.1M; could that be correct?

search.db/
├── _text_search_1.toc
├── text_search_2uo7ujdyypb7n3se.seg <-- 1.1M
└── text_search_WRITELOCK

0 directories, 3 files
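
On the character-set part of the question: Whoosh text fields take Unicode strings, so English, Italian and Arabic can share one field as long as every value is decoded before it reaches `add_document`. Under Python 2, `unicode(rec.testo)` decodes byte strings as ASCII and fails on accented or Arabic characters. The sketch below assumes Python 3 and that the database returns either `str` or UTF-8 encoded `bytes`; the helper name `to_text` and the UTF-8 default are assumptions for illustration, not part of Whoosh.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a Unicode str, decoding bytes explicitly.

    The encoding is an assumption; it should match the one used by
    the database connection.
    """
    if value is None:
        return ""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)
```

With this helper the indexing loop would call `writer_text.add_document(content=to_text(rec.testo))`. Note also that `StemmingAnalyzer` applies English stemming rules by default, so Italian and Arabic words are indexed but not stemmed in a language-aware way.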