
I have a model named UserModel, and I know it will never grow beyond 10000 entities. UserModel has no property that is unique enough to build a key from, so I decided to use string keys of the form USRXXXXX.

Here XXXXX is the serial count, e.g. USR00001 or USR12345.

So I generate the IDs as follows:

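class InsertError(Exception):
    """Raised when a new UserModel entity could not be created."""


def generate_unique_id():
    qry = UserModel.query()
    num = qry.count() + 1
    # zero-pad the serial number to five digits, e.g. 1 -> 'USR00001'
    return 'USR%05d' % num


def create_entity(model, entity_id, **kwargs):
    # get_or_insert returns the existing entity if the key name is taken,
    # otherwise it creates and stores a new one
    ent = model.get_or_insert(entity_id, **kwargs)
    # intended to tell a newly created record from an existing one
    # (note: the returned entity has this key name in both cases)
    if ent.key.id() != entity_id:
        raise InsertError('failed to add new user, please retry the operation')
    return True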

Questions:

  1. Is this the best way to generate a fixed-width serial count? Is this solution optimal and idiomatic?

  2. Does using get_or_insert as above guarantee that I will never have duplicate records?

  3. Will it increase my bill? To count the records I call UserModel.query() without any filters, so in a way I am fetching all the records. Or does billing only come into play once I call fetch() on the qry object?

Dan McGrath
Vivek Jha

2 Answers


Since you only need a unique key for the UserModel entities, I don't quite understand why you need to create the key manually. The IDs that App Engine generates automatically are guaranteed to be unique.

Regarding your questions:

  1. I don't think so. You could instead allocate IDs first (see the section "Using Numeric Key IDs"), sort them, and use them.

  2. Even though get_or_insert is strongly consistent, the query you perform (qry = UserModel.query()) is not. Thus, two requests may compute the same ID and collide on an existing entity. For more information about eventual consistency, take a look here.

  3. No, it will not increase your bill. When you execute Model.query().count(), the datastore under the hood runs a keys-only query, like Model.query().fetch(keys_only=True), and counts the results. Keys-only queries generate small datastore operations, which, under Google's latest pricing changes, are not billable.
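A minimal sketch of the "allocate, sort, and use" idea from point 1, written in plain Python so it runs without the App Engine SDK. It assumes the allocator hands back an inclusive `(first, last)` pair, which is what ndb's `Model.allocate_ids(size=...)` returns; `format_user_id` and `keys_from_range` are hypothetical helper names. Allocated IDs are unique, but there is no documented guarantee that they start at 1 or that successive calls return adjacent ranges, so the five-digit width only holds while the numbers stay below 100000.

```python
USER_ID_PREFIX = 'USR'
USER_ID_WIDTH = 5  # five digits cover serial numbers 1..99999


def format_user_id(number):
    """Render an integer as a fixed-width key name, e.g. 42 -> 'USR00042'."""
    if not 0 < number < 10 ** USER_ID_WIDTH:
        raise ValueError('number %d does not fit in %d digits' % (number, USER_ID_WIDTH))
    # '%0*d' takes the field width from the argument tuple
    return '%s%0*d' % (USER_ID_PREFIX, USER_ID_WIDTH, number)


def keys_from_range(id_range):
    """Turn an inclusive (first, last) range, as returned by an ID allocator,
    into fixed-width key names in ascending order."""
    first, last = id_range
    return [format_user_id(n) for n in range(first, last + 1)]


# Hypothetical stand-in for something like UserModel.allocate_ids(size=3);
# the values are made up for illustration.
allocated = (7, 9)
print(keys_from_range(allocated))
```

Because the width is fixed, sorting the key names as strings gives the same order as sorting the serial numbers.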

Thanos Makris
  1. Probably not. You might get away with this if your UserModel entities share a common ancestor, since ancestor queries are strongly consistent.

  2. No, get_or_insert does not guarantee that you won't have duplicates. Although you are unlikely to have duplicates in this case, you are more likely to lose data. Say you are inserting two entities with no ancestors: Model.query().count() might take some time to reflect the creation of the first entity, so the second entity gets the same ID as the first one. get_or_insert then returns the existing first entity instead of storing the second one, so you end up with only the first entity and the second entity's data is lost.

  3. Model.query().count() is short for len(Model.query().fetch()) (although with some optimizations), so every time you generate an ID you fetch all entities.
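The race described above can be reproduced with a toy, in-memory model. This is plain Python with no App Engine SDK; `EventuallyConsistentStore` and its methods are hypothetical stand-ins. It assumes, as both answers say, that lookups by key (and therefore `get_or_insert`) are strongly consistent while the unfiltered `count()` lags behind writes. Two requests read the same stale count, compute the same key name, and the second `get_or_insert` call gets back the first user's entity instead of creating a new one.

```python
class EventuallyConsistentStore(object):
    """Toy model of one kind: key lookups are strongly consistent, but the
    index behind count() only catches up when sync() is called."""

    def __init__(self):
        self.entities = {}      # key name -> entity data (strongly consistent)
        self.indexed_count = 0  # what an unfiltered query's count() would see

    def count(self):
        return self.indexed_count

    def get_or_insert(self, key_name, **data):
        # Mirrors get_or_insert semantics: if the key is taken, return the
        # existing entity untouched; otherwise store the new data.
        if key_name not in self.entities:
            self.entities[key_name] = data
        return key_name, self.entities[key_name]

    def sync(self):
        # The index eventually reflects every stored entity.
        self.indexed_count = len(self.entities)


def generate_unique_id(store):
    # Same logic as the question: next serial number = current count + 1.
    return 'USR%05d' % (store.count() + 1)


store = EventuallyConsistentStore()

# Two requests compute their IDs before the index reflects either write.
id_a = generate_unique_id(store)
id_b = generate_unique_id(store)
key_a, user_a = store.get_or_insert(id_a, name='alice')
key_b, user_b = store.get_or_insert(id_b, name='bob')
store.sync()

print('%s %s' % (id_a, id_b))  # both requests computed the same key name
print(key_b == id_b)           # so a key-name comparison cannot spot the collision
print(user_b)                  # request B received alice's data; bob was never stored
```

The store ends up with a single entity, which is the data loss both answers warn about.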

Mihail Russu
  • But using a parent will put every UserModel entity in the same entity group, so writes will be rate-limited. Isn't that a huge price to pay just to get a sequential ID? For the rest of the application, I am fine with eventual consistency. – Vivek Jha Sep 30 '14 at 22:22
  • Is there any way I can avoid parent queries for creating new records in UserModel and still have a guarantee of no duplicates or data loss? – Vivek Jha Sep 30 '14 at 22:24
  • Sharded counters are one way, but for that I would have to create a separate model just to store a count. – Vivek Jha Sep 30 '14 at 22:27
  • Unfortunately I am not aware of any good/scalable ways of implementing auto-increment but take a look at this discussion for some ideas: http://stackoverflow.com/questions/3985812/how-to-implement-autoincrement-on-google-appengine – Mihail Russu Sep 30 '14 at 22:49
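One option raised in the comments is a separate model that only stores the count. Below is a minimal sketch, assuming a single counter entity that is read and incremented inside a transaction. Here a `threading.Lock` stands in for `ndb.transaction`, and `CounterStore` and `next_user_id` are hypothetical names. Because every new user touches the same counter entity, user creation is capped at that entity group's write rate (historically guided at roughly one write per second), which may be acceptable for at most 10000 users. A sharded counter would not help with this particular problem: shards make incrementing a total cheap, but they do not hand out unique sequential values.

```python
import threading


class CounterStore(object):
    """In-memory stand-in for a single counter entity updated transactionally."""

    def __init__(self):
        self._value = 0
        # Stands in for ndb.transaction: read-increment-write is atomic.
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self._value += 1
            return self._value


def next_user_id(counter):
    """Reserve the next serial number and format it as a fixed-width key name."""
    return 'USR%05d' % counter.increment()


counter = CounterStore()
issued = []
issued_lock = threading.Lock()


def worker(n):
    # Each simulated request reserves n IDs concurrently with the others.
    for _ in range(n):
        new_id = next_user_id(counter)
        with issued_lock:
            issued.append(new_id)


threads = [threading.Thread(target=worker, args=(25,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print('%d issued, %d unique' % (len(issued), len(set(issued))))
```

Since the increment and the read happen together, concurrent requests never see the same value, unlike the count-based approach in the question.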