I'm trying to get hands-on with Kedro, but I don't understand how to build the data fetcher that I used before.

My data is stored in a MongoDB instance across multiple “tables”. One table holds my usernames. First, I want to fetch those. Then, based on the usernames I get, I would like to fetch data from three “tables” and merge them.

What is the best way to do this in Kedro?

Should I put everything in a custom dataset? Or fetch only the usernames there and do the rest in a part of the pipeline?

corusm

1 Answer

So this is an interesting one. Kedro has been designed so that the tasks have no knowledge of the IO that is required to load or save their data. Your use case (for good reasons) requires you to cross this boundary.

My recommendation is to go down the custom dataset route, but potentially go a little further and make it return the 3 tables you need directly, i.e. do the username filter logic in this stage as well.

It is also perfectly fine to raise a NotImplementedError in save() if you're never going to save through this dataset.
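To make that concrete, here is a rough sketch of what such a read-only dataset could look like. Treat it as an illustration under assumptions rather than a finished implementation: the class name `MongoUserTablesDataset`, the constructor arguments, and the `username` field are all hypothetical, and it assumes every collection stores the username under the same field. The base class is called `AbstractDataset` in recent Kedro releases (`AbstractDataSet` in older ones), and the lookups go through pymongo.

```python
from typing import Any, Dict, List

try:
    # Recent Kedro releases; older ones name this class AbstractDataSet.
    from kedro.io import AbstractDataset
except ImportError:
    class AbstractDataset:  # stand-in so the sketch runs without Kedro installed
        pass


class MongoUserTablesDataset(AbstractDataset):
    """Hypothetical read-only dataset: fetch usernames, then the matching
    documents from each of the configured collections."""

    def __init__(
        self,
        uri: str,
        database: str,
        users_collection: str,
        data_collections: List[str],
        username_field: str = "username",  # assumed field name
    ):
        self._uri = uri
        self._database = database
        self._users_collection = users_collection
        self._data_collections = list(data_collections)
        self._username_field = username_field

    def _connect(self):
        # Imported lazily so the class can be defined without pymongo present.
        from pymongo import MongoClient

        return MongoClient(self._uri)[self._database]

    def _load(self) -> Dict[str, List[Dict[str, Any]]]:
        db = self._connect()
        field = self._username_field
        # Step 1: fetch the usernames.
        usernames = [doc[field] for doc in db[self._users_collection].find({}, {field: 1})]
        # Step 2: fetch only the documents belonging to those usernames.
        query = {field: {"$in": usernames}}
        return {
            name: list(db[name].find(query, {"_id": 0}))
            for name in self._data_collections
        }

    def _save(self, data: Any) -> None:
        raise NotImplementedError("MongoUserTablesDataset is read-only")

    def _describe(self) -> Dict[str, Any]:
        # The URI is left out on purpose, since it may contain credentials.
        return {
            "database": self._database,
            "users_collection": self._users_collection,
            "data_collections": self._data_collections,
        }
```

You would then point a catalog entry's `type` at this class, and a node receives a dict with one list of documents per collection, which it can merge however it likes. That keeps all the MongoDB access in the dataset and the merge logic in plain, testable Python.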

datajoely