
I'm building an index that seems to work like this:

    doc = search.Document(doc_id=str(article.key()), fields=[
        search.TextField(name='title', value=article.title),
        search.TextField(name='text', value=article.text),
        search.TextField(name='city', value=article.city),
        search.TextField(name='region', value=article.region),
        search.NumberField(name='cityID', value=city_entity.key().id()),
        search.NumberField(name='regionID', value=region_entity.key().id()),
        search.NumberField(name='category', value=int(article.category)),
        search.NumberField(name='constant', value=1),
        search.NumberField(name='articleID', value=article.key().id()),
        search.TextField(name='name', value=article.name)
        ], language='en')
    search.Index(name='article').add(doc)

The app gets a new article, which populates the index through the code above, and that seems to work: the index is built and I can search the entities with the Search API. But I don't want articles older than 60 days in the index, so how can I arrange that? The entity has a "created" and an "updated" timestamp:

    added = db.DateTimeProperty(verbose_name='added', auto_now_add=True)  # readonly
    modified = db.DateTimeProperty(verbose_name='modified',
                                   auto_now_add=True)

Should I have a cron job that rebuilds the entire index every 24 hours, or one that removes the oldest entities from the index every 24 hours? Right now I'm not adding the `added` and `modified` properties to the index, although they could be useful there, e.g. if I want to search for a certain timestamp. Now that I see the index working, should I also add `added` and `modified` as index fields?

Niklas Rosencrantz

1 Answer


Indexes are built automatically and continuously and you have no control over this process. When an entity is changed (or created/removed) the index gets updated. There is no way to exclude certain entities from this.

If you do not need old documents at all then you should remove them.

But in both cases (serving or removing) you'll need multiple equality filters (on title, text, city, etc.) and one inequality filter (on created), so you'll need to configure a compound index.
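For the removal route, here is a minimal sketch of what a daily cron handler might run against the Search API index from the question. It assumes every document carries a date field named `created`, which the documents in the question do not have yet. The names `stale_query`, `purge_stale`, `MAX_AGE_DAYS` and `max_batches` are illustrative, not part of any API. The index object is passed in, and only its `search()` and `delete()` methods are used.

```python
import datetime

MAX_AGE_DAYS = 60  # retention window from the question


def stale_query(now, max_age_days=MAX_AGE_DAYS):
    """Build a query string matching documents older than the window.

    Assumes a date field named 'created' (hypothetical) on every document.
    Dates in Search API query strings are written as yyyy-mm-dd.
    """
    cutoff = (now - datetime.timedelta(days=max_age_days)).date()
    return 'created < %s' % cutoff.isoformat()


def purge_stale(index, now, max_batches=50):
    """Delete stale documents page by page; return how many ids were deleted."""
    query = stale_query(now)
    removed = 0
    # Cap the number of rounds so a search that keeps returning
    # already-deleted documents cannot loop forever.
    for _ in range(max_batches):
        # index.search() accepts a plain query string and returns one page.
        doc_ids = [doc.doc_id for doc in index.search(query)]
        if not doc_ids:
            break
        index.delete(doc_ids)
        removed += len(doc_ids)
    return removed
```

In a cron handler this could be called as `purge_stale(search.Index(name='article'), datetime.datetime.utcnow())`. Passing the index and the current time in, rather than creating them inside the function, keeps the date arithmetic easy to test without the App Engine SDK.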

Peter Knego
  • Thanks for the answer. I take it that if I just remove an entity then the entity will be gone from the index when the index updates. But I haven't read about how indexes update (and I didn't program any index update). – Niklas Rosencrantz Feb 17 '13 at 09:31
  • Indexes update automatically right after an entity is updated, but they are updated asynchronously (i.e. the result of an entity update is not immediately visible in the index): see the apply phase in https://developers.google.com/appengine/articles/life_of_write – Peter Knego Feb 17 '13 at 09:38
  • I do `search.Index(name='article').add(doc)`, and I read that it should rather be `.put` than `.add`, but `.add` seems to work, though I don't see it on the pages. I may be better off making an entirely new index, since this was an experimental first try. It was a success, since I can search the index more flexibly than with my alternative, which didn't allow combining fields as well as the Google Search API does. I'll make a new index that also contains the field `created` and remove outdated (> 60 days) entities from the searches. So I suppose I'll just delete the old entities or experiment with updates. – Niklas Rosencrantz Feb 17 '13 at 12:57
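
The plan in the last comment, keeping outdated entries out of the searches once the index has a `created` field, could be sketched as a small helper that appends a date restriction to whatever the user searched for. This assumes `created` is stored as a date field (for example via `search.DateField`); `recent_only` is a hypothetical name, and the combined query syntax is worth checking against the Search API query documentation.

```python
import datetime


def recent_only(user_query, today, max_age_days=60):
    """Restrict a query string to documents created within the window.

    'today' is a datetime.date. Assumes a date field named 'created'
    (hypothetical) exists in the index.
    """
    cutoff = today - datetime.timedelta(days=max_age_days)
    date_clause = 'created >= %s' % cutoff.isoformat()
    if not user_query.strip():
        # Nothing to combine with: match on the date alone.
        return date_clause
    # Parenthesise the user's query so the date clause applies to all of it.
    return '(%s) AND %s' % (user_query, date_clause)
```

The resulting string can be passed to `search.Index(name='article').search(...)` in place of the raw user query. Filtering at query time complements deletion: documents that have passed 60 days but have not yet been purged stay out of the results.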