
I've got a Google App Engine project that uses the Google Cloud Language API, and I'm using the Google API Client Library (Python) to make the API calls.

When running my unit tests, I make quite a few calls to the API. This slows down my testing and also incurs costs.

I'd like to cache the calls to the Google API to speed up my tests and avoid the API charges, and I'd rather not roll my own if another solution is available.

I found this Google API page, which suggests doing this:

import httplib2
http = httplib2.Http(cache=".cache")

And I've added these lines to my code (there is another option to use GAE memcache, but that won't be persisted between invocations of my test code). Right after these lines, I create my API call connection:

NLP = discovery.build("language", "v1", developerKey=API_KEY)

The caching isn't working and the above solution seems too simple so I suspect I am missing something.

UPDATE:

I updated my tests so that App Engine is not used (just a regular unit test), and I also figured out that I can pass the http object I created to the Google API client, like this:

NLP = discovery.build("language", "v1", http=http, developerKey=API_KEY)

Now the initial discovery call is cached, but the actual API calls are not. For example, this call is not cached:

result = NLP.documents().annotateText(body=data).execute()
new name

3 Answers


The suggested code:

http = httplib2.Http(cache=".cache")

is trying to cache to the local filesystem in a directory called ".cache". On App Engine, you cannot write to the local filesystem, so this does nothing.

Instead, you could try caching to Memcache. The other suggestion in the Python Client docs referenced above is to do exactly that:

  from google.appengine.api import memcache

  http = httplib2.Http(cache=memcache)

Since all App Engine apps get free access to shared memcache, this should be better than nothing.

If this fails, you could also try memoization. I've had success memoizing calls to slow or flaky APIs, but it comes at the cost of increased memory usage (so I need bigger instances).
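For illustration, here is a minimal sketch of that kind of memoization (the annotate_text_cached helper and its keying scheme are my own, not part of the client library):

  import json

  _memo = {}

  def annotate_text_cached(nlp, body):
      # Key the cache on a stable serialization of the request body.
      key = json.dumps(body, sort_keys=True)
      if key not in _memo:
          _memo[key] = nlp.documents().annotateText(body=body).execute()
      return _memo[key]

  # Usage: result = annotate_text_cached(NLP, data)

Repeated calls with the same body are then served from memory, which is where the increased memory usage comes from.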

EDIT: I see from your comment you're having this problem locally. I was originally thinking that memoization would be an alternative, but the need to hack on httplib2 makes that overly complicated. I'm back to thinking about how to convince httplib2 to do the right thing.

Jesse Scherer
  • I was thinking the file system version would work since I am testing on my local machine. I tried the memcache version but since memcache isn't persisted between invocations of my unit tests, that doesn't work either. – new name Nov 14 '17 at 19:02
  • is there anything in the .cache directory after a single run? – Jesse Scherer Nov 14 '17 at 20:37
  • Nope, it seems that the unit tests are sandboxed in a similar way to deployed apps so that even the unit tests can't write to disk. One solution might be to separate unit tests that need GAE functionality from those that don't. – new name Nov 14 '17 at 21:33
  • Joe, I modified my tests so that they are outside of GAE, but the .cache directory is still empty. I'd appreciate any further insights you might have. – new name Nov 16 '17 at 12:42

If you're trying to make a test run faster by caching an API call result, stop and consider whether you may have taken a wrong turn.

If you can restructure your code such that you can replace the API call with a unittest.mock, your tests will run much, much faster.
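For example, a minimal sketch assuming the call under test looks like the one in the question (the test class name and canned response are placeholders, not real API output):

  import unittest
  from unittest import mock  # on Python 2, use the external "mock" package

  class AnnotateTextTest(unittest.TestCase):
      def test_annotate_text_without_network(self):
          fake_response = {"language": "en", "sentences": []}

          # Stand-in for the object returned by discovery.build().
          nlp = mock.Mock()
          nlp.documents.return_value.annotateText.return_value \
              .execute.return_value = fake_response

          result = nlp.documents().annotateText(body={}).execute()
          self.assertEqual(result, fake_response)

No network, no API charges, and the test runs in microseconds.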

Dave W. Smith
  • Yes, I could definitely do this. The reason I haven't is that it is tedious and time consuming to do this for a whole bunch of API calls. Especially since the return data of the API is somewhat large. A turnkey caching solution would save me a lot of time. ;-) – new name Nov 14 '17 at 23:31

I just came across vcrpy, which seems to do exactly this. I'll update this answer after I've had a chance to try it out.
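In the meantime, here is a sketch of how vcrpy is typically used (the cassette path and the assertion are placeholders):

  import vcr

  # The first run records the real HTTP interaction to the cassette file;
  # subsequent runs replay it without touching the network.
  @vcr.use_cassette("fixtures/annotate_text.yaml", filter_query_parameters=["key"])
  def test_annotate_text():
      result = NLP.documents().annotateText(body=data).execute()
      assert "sentences" in result

The filter_query_parameters option keeps the API key out of the recorded cassette.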

new name