I'm trying to create a custom DataSet class within the Kedro framework, and I need some help understanding how to combine values from the credentials.yml file with a catalog entry.
- What is the Kedro way of handling the 'mongo_url' property in the catalog entry? How do I map the values from credentials onto the catalog entry?
- What should the class's __init__ method look like?
catalog.yml
rss_feed_load:
  type: kedro_workbench.extras.datasets.RSSDataSet.RSSFeedLoad
  mongo_url: "mongodb+srv://<username>:<password>@bighatcluster.wamzrdr.mongodb.net/"
  mongo_db: "TBD"
  mongo_collection: "TBD"
  mongo_table: "TBD"
  credentials: mongo_atlas
credentials.yml
mongo_atlas:
  username: <username>
  password: <password>
My first attempt (not sure it is right):
from typing import Any, Dict

from kedro.io import AbstractDataSet


class RSSFeedLoad(AbstractDataSet):
    def __init__(self, mongo_url: str, mongo_db: str, mongo_collection: str, mongo_table: str, credentials: Dict[str, Any], data: Any = None):
        # data is a list of dictionaries coming from the previous node; not sure whether
        # I pass it in when the instance is created or in the _load() method.
        self._data = data
        self._mongo_url = mongo_url  # where do I build the string that gets passed here?
        self._mongo_db = mongo_db
        self._mongo_collection = mongo_collection
        self._mongo_table = mongo_table
        # Do I need to store the username/password as instance attributes, or is that a bad idea?
        self._username = credentials['username']
        self._password = credentials['password']
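To make the 'mongo_url' question concrete, here is a minimal sketch of the step I have in mind. It assumes the `<username>` and `<password>` placeholders in the catalog entry are meant to be replaced with the values from credentials.yml, and that Kedro hands those values to the class as a plain dict. `build_mongo_url` is a hypothetical helper name, not a Kedro API, and the demo credentials are made up.

```python
from typing import Dict
from urllib.parse import quote_plus


def build_mongo_url(template: str, credentials: Dict[str, str]) -> str:
    """Fill the <username>/<password> placeholders of a MongoDB URL template.

    Hypothetical helper, not part of Kedro. The values are percent-encoded
    because characters such as '@' or ':' in a password would otherwise be
    read as URL delimiters.
    """
    return (
        template
        .replace("<username>", quote_plus(credentials["username"]))
        .replace("<password>", quote_plus(credentials["password"]))
    )


# Made-up credentials, standing in for the dict resolved from credentials.yml.
url = build_mongo_url(
    "mongodb+srv://<username>:<password>@bighatcluster.wamzrdr.mongodb.net/",
    {"username": "demo_user", "password": "p@ss:word"},
)
print(url)
```

If something like this is called inside __init__, only the finished URL would need to be kept on the instance, rather than the username and password separately.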
Where does this custom class get called? Do I reference it in the outputs='' of a node?
node(
    func=load_rss_feed,
    inputs='processed_rss_items',
    outputs='rss_feed_load',
    name="load_rss_feed",
)
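My current understanding of the flow, sketched without Kedro or pymongo so it runs anywhere: when a catalog entry is named in a node's outputs, the value returned by the node is handed to the dataset's save method, so the constructor would only receive configuration. `RSSFeedLoadSketch` is a hypothetical stand-in that mirrors the `_load`, `_save`, and `_describe` methods a custom dataset implements. It does not inherit from AbstractDataSet, and an in-memory list plays the role of the MongoDB collection.

```python
from typing import Any, Dict, List


class RSSFeedLoadSketch:
    """Stand-in for a custom dataset; no Kedro base class, no real MongoDB."""

    def __init__(self, mongo_db: str, mongo_collection: str, credentials: Dict[str, str]):
        # Only configuration arrives here: catalog values plus the credentials dict.
        self._mongo_db = mongo_db
        self._mongo_collection = mongo_collection
        self._credentials = credentials
        # Hypothetical in-memory replacement for the MongoDB collection.
        self._store: List[Dict[str, Any]] = []

    def _save(self, data: List[Dict[str, Any]]) -> None:
        # The node's return value arrives here, not in __init__.
        self._store.extend(data)

    def _load(self) -> List[Dict[str, Any]]:
        # Return a copy so callers cannot mutate the stored items.
        return list(self._store)

    def _describe(self) -> Dict[str, Any]:
        # Credentials are left out on purpose so they do not end up in logs.
        return {"mongo_db": self._mongo_db, "mongo_collection": self._mongo_collection}


dataset = RSSFeedLoadSketch("TBD", "TBD", {"username": "demo_user", "password": "secret"})
dataset._save([{"title": "first item"}, {"title": "second item"}])
print(dataset._load())
print(dataset._describe())
```

If this is right, the `data` argument in my __init__ above would not be needed at all.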
This is my first attempt at building a custom DataSet, so I'm not sure I'm doing things correctly. Thanks very much. :)