
I am trying to download some data from the datastore using the following command:

appcfg.py download_data --config_file=bulkloader.yaml --application=myappname 
                        --kind=mykindname --filename=myappname_mykindname.csv
                        --url=http://myappname.appspot.com/_ah/remote_api 

When I didn't have much data in this particular kind/table, I could download it in one shot, only occasionally running into the following error:

.................................[ERROR   ] [Thread-11] ExportProgressThread:
Traceback (most recent call last):
  File "C:\Program Files\Google\google_appengine\google\appengine\tools\bulkloader.py", line 1448, in run
    self.PerformWork()
  File "C:\Program Files\Google\google_appengine\google\appengine\tools\bulkloader.py", line 2216, in PerformWork
    item.key_end)
  File "C:\Program Files\Google\google_appengine\google\appengine\tools\bulkloader.py", line 2011, in StoreKeys
    (STATE_READ, unicode(kind), unicode(key_start), unicode(key_end)))
OperationalError: unable to open database file

This is what I see in the server log:

Traceback (most recent call last):
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/remote_api/handler.py", line 277, in post
    response_data = self.ExecuteRequest(request)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/remote_api/handler.py", line 308, in ExecuteRequest
    response_data)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 86, in MakeSyncCall
    return stubmap.MakeSyncCall(service, call, request, response)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 286, in MakeSyncCall
    rpc.CheckSuccess()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 126, in CheckSuccess
    raise self.exception
ApplicationError: ApplicationError: 4 no matching index found.

When that error appeared, I would simply re-run the download and it would go through.

Of late, I am noticing that as the size of my kind increases, the download tool fails much more often. For instance, with a kind of ~3500 entities I had to run the command 5 times, and only the last attempt succeeded. Is there a way around this error? Previously my only worry was that I wouldn't be able to automate downloads in a script because of the occasional failures; now I am scared I won't be able to get my data out at all.
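
For context, the sort of automation I have in mind is nothing more than a retry wrapper around appcfg.py, roughly like the sketch below; the SDK path and the retry count are placeholders, and I haven't accounted for any interactive prompts appcfg.py may show.

# Rough retry wrapper around the download command above
# (SDK path, app id, kind and retry count are placeholders).
import subprocess
import sys

CMD = [
    'python', r'C:\Program Files\Google\google_appengine\appcfg.py',
    'download_data',
    '--config_file=bulkloader.yaml',
    '--application=myappname',
    '--kind=mykindname',
    '--filename=myappname_mykindname.csv',
    '--url=http://myappname.appspot.com/_ah/remote_api',
]

for attempt in range(1, 6):
    if subprocess.call(CMD) == 0:
        sys.exit(0)                       # download succeeded
    print 'Attempt %d failed, retrying...' % attempt
sys.exit(1)                               # all five attempts failed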

This issue was discussed previously here, but the post is old and I am not sure what the suggested flag does, hence I am posting my similar query again.


Some additional details: as mentioned here, I tried the suggestion to proceed with interrupted downloads (in the section Downloading Data from App Engine). When I resume after the interruption I get no errors, but the number of rows downloaded is less than the entity count the datastore admin shows me. This is the message I get:

[INFO    ] Have 3220 entities, 3220 previously transferred
[INFO    ] 3220 entities (1003 bytes) transferred in 2.9 seconds

The datastore admin tells me this particular kind has ~4300 entities. Why aren't the remaining entities getting downloaded?
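
One thing I can try, in case the datastore admin figure comes from periodically refreshed statistics rather than a live count, is to count the entities directly from a remote_api_shell.py session with a keys-only query, roughly like this ('mykindname' is again just the placeholder kind name):

# Run inside a remote_api_shell.py session connected to the app.
from google.appengine.api import datastore

count = 0
for _ in datastore.Query('mykindname', keys_only=True).Run():
    count += 1
print 'actual entity count:', count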

Thanks!

abhgh

1 Answer


I am going to make a completely uneducated guess at this, based purely on the fact that I saw the word "unicode" in the first error: I had an issue that was related to my data being user-generated from the web. A user put in a few unicode characters and a whole load of stuff started breaking (probably my fault, as I had implemented pretty-looking repr functions and a load of other things). If you can, take a quick scan of your data via the console utility in your live app; maybe (if it's only ~4k records) try converting all of the data to ASCII strings to find any that don't conform.

And after that, I started "sanitising" user inputs (sorry, but my "public handle" field needs to be ASCII only, players!).
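
Off the top of my head, the scan could be done from a remote_api_shell.py session with something like the snippet below; it's untested, 'mykindname' is just a placeholder, and you may want to restrict it to the properties you actually care about.

# Flags entities whose string properties contain non-ASCII characters.
from google.appengine.api import datastore

def find_non_ascii(kind):
    for entity in datastore.Query(kind).Run():
        for name, value in entity.items():
            if isinstance(value, basestring):
                try:
                    value.encode('ascii')
                except (UnicodeDecodeError, UnicodeEncodeError):
                    yield entity.key(), name

for key, prop in find_non_ascii('mykindname'):
    print key, prop

Whatever that prints is where I'd start looking.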

Richard Green
  • Hi, thanks for your response. I am currently managing with a handler that displays the report in a page, and I save it from there. I know it's sad, but ... :) Give me a few days to try your suggestion out; I am currently engaged elsewhere. Will let you know how it goes. Thanks again! – abhgh Mar 20 '11 at 15:09