
I have a task that simply creates an entity in the datastore. I queue up many of these tasks on a named push queue and let them run. When the queue has drained, the log shows that every task request was run. However, the number of entities created is lower than expected.

The following is an example of the code I used to test this. I ran 10000 tasks, and the datastore ended up with only around 9200 entities.

I use RestEasy to expose URLs for the task queues.

queue.xml

<queue>
    <name>testQueue</name>
    <rate>5/s</rate>
</queue>

Test Code

@GET
@Path("/queuetest/{numTimes}")
public void queueTest(@PathParam("numTimes") int numTimes) {
    for(int i = 1; i <= numTimes; i++) {
        Queue queue = QueueFactory.getQueue("testQueue");
        TaskOptions taskOptions = TaskOptions.Builder.withUrl("/queuetest/worker/" + i).method(Method.GET);
        queue.add(taskOptions);
    }
}

@GET
@Path("/queuetest/worker/{index}")
public void queueTestWorker(@PathParam("index") String index) {
    DateFormat df = new SimpleDateFormat("MM/dd/yyyy HH:mm:ss");
    Date today = Calendar.getInstance().getTime();        
    String timestamp = df.format(today);

    Entity tObj = new Entity("TestObj");
    tObj.setProperty("identifier", index);
    tObj.setProperty("timestamp", timestamp);

    DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
    Key key = datastore.put(tObj);
}

I have run this a few times and not once have I seen all of the entities created.

Is it possible that tasks can be discarded if there is too much contention on the queue? Is this the expected behavior for a task queue?

#### EDIT

I followed mitch's suggestion to log the IDs of the entities that are created, and found that they are indeed created as expected. But the logs themselves showed some odd behavior: log lines from some tasks appear in another task's request log. When that happens, a single request shows 2 entity IDs.

For the requests that show 2 entity IDs, the first ID logged is always one of the entities missing from the datastore. Does this mean there is a problem with a high number of puts to the datastore? (The entities I'm creating are NOT part of a larger entity group, i.e. they don't refer to @parent.)

bighonestjohn
  • Is your app running out of threads in the thread pool and dropping incoming connections? – Colin M Jul 12 '13 at 18:35
  • Are multiple tasks creating the same entry in the datastore and overwriting each other? – Lee Meador Jul 12 '13 at 18:35
  • Which indexes are getting lost? All at the end? Scattered randomly? In clumps? Similarly for the time stamp. – Lee Meador Jul 12 '13 at 18:37
  • @ColinMorelli It could be this, but I don't know how to check. Could you point me to any documentation about it? Thanks – bighonestjohn Jul 12 '13 at 19:07
  • @LeeMeador each task should be creating a brand new entity each time since I use the DatastoreService's put method without passing a key. – bighonestjohn Jul 12 '13 at 19:10
  • @LeeMeador The missing entities seem random, but I will test again to be sure – bighonestjohn Jul 12 '13 at 19:10
  • This does not answer your question, but is simply an alternate approach. I would add all these items to a pull queue. In the worker, lease a few hundred tasks, loop through them adding your objects to a list, and then do a batch put. Repeat until the queue is empty (or restart the task). Be sure to handle the occasional TransientError when requesting the leased items. This will be a much, much more efficient use of your instance CPU cycles. HTH -stevep – stevep Jul 12 '13 at 19:22
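
The lease-and-batch loop described in the last comment can be sketched roughly as follows. This is only an illustration in plain Java: the pull queue and the datastore are simulated with in-memory collections, and the names `BatchDrainSketch`, `lease` and `drain` are hypothetical, not part of the App Engine API. In a real app the lease and the write would go through the task queue and datastore services instead.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class BatchDrainSketch {
    // Hypothetical stand-in for leasing up to `limit` tasks from a pull queue.
    static List<String> lease(Deque<String> queue, int limit) {
        List<String> leased = new ArrayList<>();
        while (leased.size() < limit && !queue.isEmpty()) {
            leased.add(queue.poll());
        }
        return leased;
    }

    // Drains the queue batch by batch. Each addAll() stands in for one batch put,
    // so the return value is the number of write calls that were needed.
    static int drain(Deque<String> queue, int batchSize, List<String> store) {
        int batchWrites = 0;
        List<String> batch = lease(queue, batchSize);
        while (!batch.isEmpty()) {
            store.addAll(batch); // one write per batch instead of one per task
            batchWrites++;
            batch = lease(queue, batchSize);
        }
        return batchWrites;
    }

    public static void main(String[] args) {
        Deque<String> queue = new ArrayDeque<>();
        for (int i = 1; i <= 1000; i++) {
            queue.add(String.valueOf(i));
        }
        List<String> store = new ArrayList<>();
        int writes = drain(queue, 300, store);
        System.out.println(store.size() + " items written in " + writes + " batches");
    }
}
```

The point of the design is that the number of write calls scales with the number of batches rather than the number of tasks.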

1 Answer


Why don't you add a log statement after each datastore.put() call that logs the ID of the newly created entity? Then you can compare the log to the datastore contents, and you will be able to tell whether the problem is that datastore.put() is not being invoked successfully 10000 times, or that some of the successful put calls are not resulting in entities that you see in the datastore.
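
As a rough sketch of the comparison step, assuming the IDs from the log and the IDs actually present in the datastore have already been collected into plain Java collections (the class `PutAudit` and its methods are hypothetical names for illustration, and the sample IDs are made up):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class PutAudit {
    // IDs that were logged after put() but are absent from the datastore.
    static Set<Long> missing(List<Long> loggedIds, Set<Long> storedIds) {
        Set<Long> result = new TreeSet<>(loggedIds);
        result.removeAll(storedIds);
        return result;
    }

    // IDs that appear more than once in the log.
    static Set<Long> duplicates(List<Long> loggedIds) {
        Set<Long> seen = new HashSet<>();
        Set<Long> dups = new TreeSet<>();
        for (Long id : loggedIds) {
            if (!seen.add(id)) {
                dups.add(id);
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        // Made-up sample data: five logged IDs, one of them repeated.
        List<Long> logged = Arrays.asList(101L, 102L, 103L, 103L, 104L);
        Set<Long> stored = new HashSet<>(Arrays.asList(101L, 103L, 104L));
        System.out.println("Logged but not stored: " + missing(logged, stored));
        System.out.println("Logged more than once: " + duplicates(logged));
    }
}
```

If `missing` comes back empty, every logged put is visible in the datastore and the shortfall is in how many puts were made. If it is non-empty, the puts returned IDs for entities that are not there.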

  • I did this and got some weird behavior. I looked at all the requests and some worked fine. But some requests logged 2 different entity IDs and some requests didn't log any at all. So it seems that log statements are somehow being printed in another request's log rather than their own. I then ran this with a small number of 20 entities and saw 20 different IDs logged, but only 18 entities were actually created. Is there some datastore concurrency issue with puts? – bighonestjohn Jul 13 '13 at 10:12