I'm having trouble getting the indexing process to work. I have a model called Article, and the Article table in the database holds 943 records. For local testing I use a SQLite database with 12 articles, and both rebuild_index and update_index run fine there. However, when I upload to our web server, I get the following output from either rebuild_index or update_index:

>python manage.py update_index
>Indexing 943 articles
>Killed

I looked at the answer to "Django Haystack/ElasticSearch indexing process aborted", but I would like to avoid changing the Haystack source code if I can help it. Has anyone else run into this? I'm using Whoosh as the backend. Thank you!

Here's the model class:

from django.contrib.auth.models import User
from django.db import models


class Article(models.Model):
    title = models.CharField(max_length=100)
    authors = models.ManyToManyField(User)
    abstract = models.CharField(max_length=500, blank=True)
    full_text = models.TextField(blank=True)
    proquest_link = models.CharField(max_length=200, blank=True, null=True)
    ebsco_link = models.CharField(max_length=200, blank=True, null=True)

    def __unicode__(self):
        return self.title

Here's the index class:

from haystack import indexes


class ArticleIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.NgramField(document=True, use_template=True)
    title = indexes.NgramField(model_attr='title')

    # We'll see how this goes
    authors = indexes.NgramField(model_attr='authors')
    abstract = indexes.NgramField(model_attr='abstract')
    proquest_link = indexes.NgramField(model_attr='proquest_link')
    ebsco_link = indexes.NgramField(model_attr='ebsco_link')

    def get_model(self):
        return Article

    def index_queryset(self, using=None):
        return self.get_model().objects.all()

AndrewSmiley

  • How much memory does your server have? It sounds like it is running out of space. What information have you indexed for the model? – Timmy O'Mahony Dec 18 '13 at 13:48
  • 594M. We're running on an AWS EC2 instance. Yeah, that's what it seems like; I'm not sure how to fix it yet. I'll update the question and show you the model and index class – AndrewSmiley Dec 18 '13 at 13:54
  • I had to use `solango` (the precursor to haystack) and had awful memory issues: the python/django indexer that sits on top of Solr was reading in the entire dataset before iterating over it. It could have something to do with that, although I'm not sure whether haystack works around those issues – Timmy O'Mahony Dec 18 '13 at 13:57
  • Yeah, that could be. I'm not entirely sure what steps haystack takes to build the indexes, but I wouldn't be surprised if it worked like that – AndrewSmiley Dec 18 '13 at 14:02

2 Answers

I opened a new terminal window and ran top while the index was building. It turned out the indexing process was using 99.9% of the CPU, which is why it was failing; memory usage was actually not that bad. So the problem isn't the code but the server: it is an EC2 micro instance, which is very limited in CPU. Thanks again to Timmy O'Mahony for pointing me in the right direction.
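
For anyone who wants to check the same numbers from inside Python rather than with top, here is a minimal sketch using the standard-library resource module (Unix only). The helper name usage_snapshot and the stand-in workload are assumptions made for illustration; they are not part of Haystack or of my project.

```python
import resource
import sys


def usage_snapshot():
    """Return CPU time and peak memory used so far by this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return {
        "cpu_seconds": usage.ru_utime + usage.ru_stime,
        "peak_rss_mb": usage.ru_maxrss / divisor,
    }


if __name__ == "__main__":
    before = usage_snapshot()
    # Hypothetical stand-in for the real indexing work.
    data = [str(i) * 10 for i in range(200000)]
    after = usage_snapshot()
    print("CPU seconds used:", after["cpu_seconds"] - before["cpu_seconds"])
    print("Peak RSS (MB):", after["peak_rss_mb"])
```

Comparing the peak RSS figure with the memory available on the instance, and the CPU seconds with the wall-clock time, should give a rough idea of which resource is the bottleneck.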

AndrewSmiley

What I did was create my own management command in my project, safe_update_index, which is a copy of the original update_index command plus the pks_seen change from that other answer.
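
Separately from that change, the general idea of keeping an indexing run cheap is to process records in fixed-size batches so that only one batch is held in memory at a time. Below is a minimal sketch of that idea in plain Python. It is not Haystack's code and not the pks_seen change itself; iter_batches and the batch size of 100 are hypothetical names and values chosen for illustration.

```python
def iter_batches(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Emit the final, possibly shorter, batch.
    if batch:
        yield batch


if __name__ == "__main__":
    pks = range(1, 944)  # stand-in for the 943 primary keys
    sizes = [len(b) for b in iter_batches(pks, 100)]
    print(len(sizes), sizes[-1])  # prints: 10 43
```

In a real command, each batch of primary keys would be used to fetch and index only those rows before moving on to the next batch.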

Jj.