I'm trying to build a pipeline with luigi: get data from an API, transform it, and then save it to a MongoDB database. I'm still new to luigi. My question is how to implement the output() method so that it points to a MongoDB target, and how to write the requires() method for subsequent tasks.
For the first part, I tried to follow the demo here, but it uses MySQL instead of MongoDB. So I tried:
from luigi.contrib.mongodb import MongoTarget
from pymongo import MongoClient

def output(self):
    # connect to db
    connection = MongoClient(self.host, self.port)
    db_client = connection[self.db_name]
    collection_name = 'myCollection'
    return MongoTarget(db_client, '_id', collection_name)
but it gave me an error like this:
TypeError: Can't instantiate abstract class MongoTarget with abstract methods exists
A quick search suggests the error is related to pymongo, but the solution I found still doesn't fix it.
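As far as I can tell, this kind of TypeError can be reproduced with plain Python and no pymongo at all, which makes me suspect it comes from instantiating a base class that never implements exists(). The sketch below uses only the standard library; `Target`, `BaseMongoTarget` and `CollectionTarget` are hypothetical stand-ins, not luigi's real classes, and the "collection" is just a list. luigi.contrib.mongodb does seem to ship concrete subclasses (for example MongoCellTarget and MongoCollectionTarget) that would play the role of `CollectionTarget` here, but I have not confirmed their exact arguments.

```python
from abc import ABC, abstractmethod


class Target(ABC):
    """Hypothetical stand-in for a base target class with an abstract exists()."""

    @abstractmethod
    def exists(self):
        """Return True if the output already exists."""


class BaseMongoTarget(Target):
    """Hypothetical stand-in for a base class that never defines exists()."""

    def __init__(self, collection):
        self.collection = collection


class CollectionTarget(BaseMongoTarget):
    """Concrete subclass: implementing exists() makes it instantiable."""

    def exists(self):
        # Assumption for this sketch: the output exists once the
        # collection holds at least one document.
        return len(self.collection) > 0


def try_instantiate(cls, collection):
    """Return the instance, or the TypeError message if cls is still abstract."""
    try:
        return cls(collection)
    except TypeError as err:
        return str(err)


abstract_result = try_instantiate(BaseMongoTarget, [])
concrete_result = try_instantiate(CollectionTarget, [{"name": "a"}])
print(abstract_result)           # message text varies by Python version
print(concrete_result.exists())
```

If that reading is right, the fix would be to return one of the concrete target classes from output() rather than the base MongoTarget.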
For the requires() part, I'm not sure how to approach it either. I would like to check whether the records already exist so I don't duplicate them, but my API data has no unique index, so I guess I have to somehow scan over all the records to make sure there are no duplicates.
There isn't much documentation or many examples on using MongoDB with luigi, so any help is appreciated.
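One alternative to scanning every record that I'm considering is to derive a deterministic key from each record's content and store it as _id. This is only a sketch under assumptions: `record_key` and `with_keys` are hypothetical helper names, the records are assumed to be JSON-serializable dicts, and two records count as duplicates only if all their fields are equal.

```python
import hashlib
import json


def record_key(record):
    """Derive a deterministic key from a record's content (hypothetical helper)."""
    # sort_keys makes the JSON text independent of dict key order;
    # default=str is a fallback for values json cannot encode (e.g. datetimes).
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def with_keys(records):
    """Attach the derived key as _id and drop duplicates within the batch."""
    unique = {}
    for record in records:
        key = record_key(record)
        # Keep only the first record seen for each key.
        unique.setdefault(key, {**record, "_id": key})
    return list(unique.values())


batch = [
    {"name": "a", "value": 1},
    {"value": 1, "name": "a"},  # same content, different key order
    {"name": "b", "value": 2},
]
docs = with_keys(batch)
print(len(docs))
```

Since MongoDB enforces uniqueness on _id, writing these documents with an upsert (for example pymongo's `replace_one({"_id": key}, doc, upsert=True)`) should make re-runs idempotent without a full scan, though I have not tried this inside a luigi task.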