
I have spent at least two hours now trying to get this to work. I have seen quite a few different questions on SO and in the Google groups, but none of the answers seem to work for me.

Question: How do I bulk upload the data in the CSV file below to the datastore, so that the created entities have the key_name defined in the CSV file (the same result as using the add function below)?

This is my model:

class RegisteredDomain(db.Model):
    """
    Domain object class. It has no fields because its existence is
    proof that it has been registered. Individual registered domains
    can be found using keys.
    """
    pass

Here is how I usually add/delete domains, etc:

def add(domains):
    """
    Add domains. This function accepts a single domain string or a
    list of domain strings and adds them to the database. The domain(s)
    must be valid unicode strings (a ValueError is raised if the domain
    strings are not valid).
    """
    if not isinstance(domains, list):
        domains = [domains]

    cleaned_domains = []
    for domain in domains:
        clean_domain_ = clean_domain(domain)
        is_valid_domain(clean_domain_)
        cleaned_domains.append(clean_domain_)

    domains = cleaned_domains

    db.put([RegisteredDomain(key_name=make_key(domain)) for domain in domains])


def get(domains):
    """
    Get domains. This function accepts a single domain string or a list
    of domain strings and queries the database for them. It returns a
    dictionary containing the domain name and RegisteredDomain object or
    None if the entity was not found.
    """
    if not isinstance(domains, list):
        domains = [domains]

    entities = db.get([Key.from_path('RegisteredDomain', make_key(domain)) for domain in domains])
    return dict(zip(domains, entities))

Note: in the above code make_key simply makes the domain lowercase and prepends a 'd'.
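For readers following along, make_key as described in the note above could look roughly like this. This is only a sketch reconstructed from that one-sentence description; the real helper is not shown in the question and may do more.

```python
def make_key(domain):
    """Build a datastore key name from a domain string.

    Reconstructed from the description only: lowercase the domain
    and prepend a literal 'd'.
    """
    # The 'd' prefix keeps the key name from starting with a digit.
    return 'd' + domain.lower()


print(make_key('Google.com'))
```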

So there's that. Now I am going crazy trying to upload some RegisteredDomain entities from a CSV file. Here is the CSV file (note that the leading 'd' is there because a key name may not start with a number):

key
dgoogle.com
dgoogle11.com
dfacebook.com
dcool.com
duuuuuuu.com
dsdsdsds.com
dffffooo.com
dgmail.com
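
A file in this shape can be generated with a few lines of Python. The sketch below is an assumption about how the CSV might be produced, not the script actually used; write_key_csv is a hypothetical helper name, and it is written for Python 3 as a standalone script (not for the App Engine runtime).

```python
import csv
import io


def write_key_csv(domains, out):
    """Write a one-column CSV with a 'key' header, as in the sample above."""
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(['key'])
    for domain in domains:
        # Same rule as make_key: lowercase, then prepend 'd'.
        writer.writerow(['d' + domain.lower()])


buf = io.StringIO()
write_key_csv(['Google.com', 'gmail.com'], buf)
print(buf.getvalue())
```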

I have not been able to auto-generate the bulkloader YAML file because App Engine has still not updated my datastore stats (after one day plus a few hours). So this (and many similar permutations) is what I have come up with (mostly changing the import_transform bit):

python_preamble:
- import: google.appengine.ext.bulkload.transform
- import: google.appengine.api.datastore
- import: google.appengine.ext.db
- import: utils
- import: bulk_helper

transformers:
- kind: RegisteredDomain
  connector: csv
  connector_options:
    encoding: utf-8
  property_map:
    - property: __key__
      external_name: key
      export_transform: bulk_helper.key_to_reverse_str
      import_template: transform.create_foreign_key('RegisteredDomain')

Now, for some reason, when I try to upload, it says that all goes fine and that x entities have been transferred, but nothing gets updated in the datastore (as I can see from the admin console). Here is how I upload:

appcfg.py upload_data --application=domain-sandwich --kind=RegisteredDomain --config_file=bulk.yaml --url=http://domain-sandwich.appspot.com/remote_api --filename=data.csv 

And finally, this is what my datastore viewer looks like: [screenshot: Datastore Viewer]

Note: I am doing this both on the dev-server and on appengine (whatever works...).

Thanks for any help!

o1iver

1 Answer


The problem is a bug in the App Engine bulkloader (or datastore API). I posted a few issues about this problem (issue 1, issue 2, issue 3, issue 4), but here is the text of the bulkloader bug report for future reference:

VERSION:
release: "1.5.2"
timestamp: 1308730906
api_versions: ['1']

The bulkloader will not import models without properties. Example:

class MetaObject(db.Model):
    """
    Property-less object. Identified by application set key.
    """
    pass

In an application you can use these entities like this:

db.put([MetaObject(key_name=make_key(obj)) for obj in objs])
db.get([Key.from_path('MetaObject', make_key(obj)) for obj in objs])
db.delete([Key.from_path('MetaObject', make_key(obj)) for obj in objs])

Now the problem occurred when I tried to import data using the bulkloader. After looking through the bulkloader code, the bug turned out to be in the EncodeContent method (lines 1400-1406):

1365   def EncodeContent(self, rows, loader=None):
1366     """Encodes row data to the wire format.
1367
1368     Args:
1369       rows: A list of pairs of a line number and a list of column values.
1370       loader: Used for dependency injection.
1371
1372     Returns:
1373       A list of datastore.Entity instances.
1374
1375     Raises:
1376       ConfigurationError: if no loader is defined for self.kind
1377     """
1378     if not loader:
1379       try:
1380         loader = Loader.RegisteredLoader(self.kind)
1381       except KeyError:
1382         logger.error('No Loader defined for kind %s.' % self.kind)
1383         raise ConfigurationError('No Loader defined for kind %s.' % self.kind)
1384     entities = []
1385     for line_number, values in rows:
1386       key = loader.generate_key(line_number, values)
1387       if isinstance(key, datastore.Key):
1388         parent = key.parent()
1389         key = key.name()
1390       else:
1391         parent = None
1392       entity = loader.create_entity(values, key_name=key, parent=parent)
1393
1394       def ToEntity(entity):
1395         if isinstance(entity, db.Model):
1396           return entity._populate_entity()
1397         else:
1398           return entity
1399
1400       if not entity:
1401
1402         continue
1403       if isinstance(entity, list):
1404         entities.extend(map(ToEntity, entity))
1405       elif entity:
1406         entities.append(ToEntity(entity))
1407
1408     return entities

The datastore Entity class subclasses dict without overriding __nonzero__ (I will also post an issue for this one), so its truth value falls back to the dict's length. An Entity that has a key but no properties therefore evaluates as False. This makes "if not entity" true even when a key has been set, so the entity is skipped and never appended to entities.
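
The effect is easy to reproduce without the SDK. In the sketch below, FakeEntity is a hypothetical stand-in for datastore.Entity (a dict subclass that carries a key name), and encode_buggy mimics only the filtering step of EncodeContent. encode_checked shows one possible variant that skips only a missing entity; it is not the change made in the diff further down, which drops the check altogether.

```python
class FakeEntity(dict):
    """Hypothetical stand-in for datastore.Entity: a dict with a key name."""

    def __init__(self, key_name):
        super(FakeEntity, self).__init__()
        self.key_name = key_name


def encode_buggy(created):
    """Mimic the filter in EncodeContent."""
    entities = []
    for entity in created:
        if not entity:  # also true for an entity with a key but no properties
            continue
        entities.append(entity)
    return entities


def encode_checked(created):
    """Skip only when no entity was created at all."""
    return [entity for entity in created if entity is not None]


rows = [FakeEntity('dgoogle.com'), FakeEntity('dgmail.com')]
print(len(encode_buggy(rows)))
print(len(encode_checked(rows)))
```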

Here are two diffs that fix this, one by removing the check in the bulkloader and one by overriding __nonzero__ in Entity (either one works):

--- bulkloader.py       2011-08-27 18:21:36.000000000 +0200
+++ bulkloader_fixed.py 2011-08-27 18:22:48.000000000 +0200
@@ -1397,12 +1397,9 @@
         else:
           return entity

-      if not entity:
-
-        continue
       if isinstance(entity, list):
         entities.extend(map(ToEntity, entity))
-      elif entity:
+      else:
         entities.append(ToEntity(entity))

     return entities
--- datastore.py        2011-08-27 18:41:16.000000000 +0200
+++ datastore_fixed.py  2011-08-27 18:40:50.000000000 +0200
@@ -644,6 +644,13 @@

     self.__key = Key._FromPb(ref)

+  def __nonzero__(self):
+      if len(self):
+          return True
+      if self.__key:
+          return True
+      return False
+
   def app(self):
     """Returns the name of the application that created this entity, a
     string or None if not set.
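
The second approach (making a keyed but property-less entity truthy) can be tried in isolation. This is a minimal sketch under the assumption that KeyedEntity, a hypothetical class, stands in for datastore.Entity; it is not the SDK code. The hook always returns a real bool, because a truth-value hook that returns None raises TypeError.

```python
class KeyedEntity(dict):
    """Hypothetical stand-in for datastore.Entity with a truthiness override."""

    def __init__(self, key_name=None):
        super(KeyedEntity, self).__init__()
        self.key_name = key_name

    def __bool__(self):
        # Truthy if it has properties or a key; always a real bool.
        return len(self) > 0 or self.key_name is not None

    # Python 2 looks up __nonzero__ instead of __bool__.
    __nonzero__ = __bool__


print(bool(KeyedEntity('dgoogle.com')))
print(bool(KeyedEntity()))
```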

Posted bug reports:

Issue 1: http://code.google.com/p/googleappengine/issues/detail?id=5712

Issue 2: http://code.google.com/p/googleappengine/issues/detail?id=5713

Issue 3: http://code.google.com/p/googleappengine/issues/detail?id=5714

Issue 4: http://code.google.com/p/googleappengine/issues/detail?id=5715
