
I'm trying to pull similar data from several third-party APIs, each with a slightly different schema, and convert it all into a unified schema to store in a DB and expose through a unified API. This is actually a rewrite of a system that already does this (minus storing in the DB), but the existing system is hard to test and not very elegant. I figured I'd turn to the community for some wisdom. Here are some thoughts on what I'd like to achieve.

  • An easy way to specify schema mappings from each external API's schema to the internal schema. I realize that some nuances in the data might be lost by converting to a unified schema, but that's life. Judging from the academic papers I've found on the matter, general schema mapping might not be easy to do and is perhaps overkill here.
  • An alternative solution would be to allow third parties to develop the interfaces to the external APIs. The code quality of these third parties may or may not be known, but could be established via thorough tests.
  • Therefore the system should be easy to test. I'm thinking of mocking the external API calls to get reproducible data and to ensure that parsing and conversion are done correctly.
  • One of the external API interfaces crashing should not bring down the rest of them.
  • Some sort of schema validation, or another way to detect that an external API's schema has changed without warning.
  • This will end up being integrated into a Django project, so it could be written as a Django app, which would likely make unit and integration testing easier. On the other hand, I would like to keep it as decoupled from Django as possible. The API interfaces would have to know what format to convert to, but could that be specified at runtime?
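To make the mapping and validation bullets concrete, here is a minimal sketch of one way they could fit together. It assumes the external payloads are already parsed into nested dicts. Every name in it (`convert`, `get_path`, `SchemaDriftError`, `VENDOR_A`, `VENDOR_B`, and the field names) is hypothetical and not taken from any existing library. Each mapping is plain data, so it can be chosen or loaded at runtime, and a missing or malformed field raises an error instead of passing silently.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical mapping format:
# internal field name -> (dotted path in the external payload, converter)
Mapping = Dict[str, Tuple[str, Callable[[Any], Any]]]


class SchemaDriftError(Exception):
    """Raised when an expected external field is missing or malformed."""


def get_path(payload: Any, dotted_path: str) -> Any:
    """Walk a nested dict along a dotted path such as 'pricing.amount'."""
    current = payload
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            raise SchemaDriftError(f"missing field: {dotted_path}")
        current = current[key]
    return current


def convert(payload: dict, mapping: Mapping) -> dict:
    """Convert one external record into the internal schema."""
    record = {}
    for internal_name, (path, cast) in mapping.items():
        raw = get_path(payload, path)
        try:
            record[internal_name] = cast(raw)
        except (TypeError, ValueError) as exc:
            raise SchemaDriftError(f"bad value at {path}: {raw!r}") from exc
    return record


# Two made-up vendors that expose the same data in different shapes.
VENDOR_A: Mapping = {
    "title": ("name", str),
    "price_cents": ("pricing.amount", lambda v: int(round(float(v) * 100))),
}
VENDOR_B: Mapping = {
    "title": ("product.title", str),
    "price_cents": ("cents", int),
}
```

Because `convert` only needs a dict, a test can feed it a canned payload saved from a real response, with no network call and no Django dependency. The HTTP layer can then be mocked separately.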

Am I missing anything in the wishlist? Is any of it unrealistic? Am I headed down the wrong path? I would love to get some feedback.

I'm not sure if there are libraries or open-source projects which already do some of this. The fewer wheels I have to reinvent, the better. Would any part of this be valuable as an open-source project?

In the previous version I spawned a bunch of threads that would handle individual requests. Although I've never used it, I've been told I should look at gevent as a way to handle this.
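For the requirement that one failing interface must not take down the others, a rough sketch using only the standard library is below. It uses `concurrent.futures` rather than gevent, as a stand-in to show the isolation idea. `fetch_all` and the fetcher callables are hypothetical names. Each fetcher runs in its own worker thread, and any exception it raises is caught and recorded per source instead of propagating.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, Tuple


def fetch_all(
    fetchers: Dict[str, Callable[[], Any]]
) -> Tuple[Dict[str, Any], Dict[str, Exception]]:
    """Run every fetcher concurrently; collect results and errors per source."""
    results: Dict[str, Any] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(fetchers))) as pool:
        # Map each future back to the name of the source it belongs to.
        futures = {pool.submit(fn): name for name, fn in fetchers.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                # A crash in one source is recorded, not re-raised.
                errors[name] = exc
    return results, errors
```

The caller can then store whatever landed in `results` and log or alert on `errors`. This does not guard against a fetcher that hangs forever, so a per-request timeout in the HTTP client would still be needed.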

Andres

1 Answer


For your second bullet point you should check out Temboo. Temboo normalizes access to over 100 APIs, meaning that you can talk to them all using a common syntax in the language of your choice. In this case you would use the Temboo Python SDK - available here.

(Full disclosure: I work at Temboo)

Cormac Driver
  • Thanks, but I didn't find any of the APIs I was planning on using. It would be helpful, however, if you could share how you interact with them. – Andres Apr 05 '13 at 08:04
  • Sure. In short, we create a process per API call, which allows us to abstract away the differences you find in individual APIs and give our users a uniform interface to a huge range of APIs. Our SDKs then call these processes, which in turn talk to the underlying APIs. This level of indirection allows us to do things like response format conversion and friendlier error handling/reporting, which make it more pleasant to work with APIs. – Cormac Driver Apr 05 '13 at 12:45