I'm trying to implement the following ETL job in Kiba, in the context of a Rails app:

  • For a given local database record
  • Search its name using a remote application API (Evernote)
  • For each record found with the API, get the GUIDs of all the tags used on this record and consolidate them into a single array
  • Using the same API, retrieve the full list of tags used, in order to get their labels from the GUID
  • Process these tags (keep, ignore, replace) based on internal business logic
  • Save these tags on the local database record

My question is: how would you model the sources in this example?

The starting point is the local database record, but the real data comes from two calls to the API (1/ records returned by the search and 2/ full tags list).

Thanks!

Spone

1 Answer

You can model it this way: assuming you have the ids of the database records in some variable (e.g. people_ids), a first iteration could look like this:

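require 'kiba'
require 'kiba-common/sources/enumerable'
require 'kiba-common/transforms/enumerable_exploder'

Kiba.parse do
  # One row per local record id
  source Kiba::Common::Sources::Enumerable, -> { people_ids }

  transform do |person_id|
    Enumerator.new do |y|
      # Here: run the HTTP query for the search, retrieve the tags, etc.,
      # then yield each result in a loop
      y << { person_id: person_id, tags: [...] }
    end
  end

  # Turn each Enumerator returned above into individual rows
  transform Kiba::Common::Transforms::EnumerableExploder
  # SNIP
end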
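
To make the "yield each result" step more concrete, here is a minimal plain-Ruby sketch of what the body of that Enumerator could do, without Kiba or any HTTP call. Everything in it is an assumption for illustration: `fake_search` stands in for the real Evernote search, and the hash keys (`:tag_guids`, `:name`, `:id`) are hypothetical names, not the Evernote SDK's.

```ruby
# Merge the tag GUIDs of every note found into a single, de-duplicated array.
def consolidated_tag_guids(notes)
  notes.flat_map { |note| note[:tag_guids] }.uniq
end

# Mimics the block transform above: returns an Enumerator that yields rows.
# `search` is any callable taking a name and returning the notes found.
def rows_for(person, search)
  Enumerator.new do |y|
    notes = search.call(person[:name])
    y << { person_id: person[:id], tag_guids: consolidated_tag_guids(notes) }
  end
end

# Hypothetical stand-in for the remote search API (assumption, not real data).
fake_results = {
  "Alice" => [
    { guid: "n1", tag_guids: ["t1", "t2"] },
    { guid: "n2", tag_guids: ["t2", "t3"] }
  ]
}
fake_search = ->(name) { fake_results.fetch(name, []) }

people = [{ id: 1, name: "Alice" }, { id: 2, name: "Bob" }]

# Flattening the enumerators plays the role of the exploder transform.
rows = people.flat_map { |person| rows_for(person, fake_search).to_a }
# rows == [{ person_id: 1, tag_guids: ["t1", "t2", "t3"] },
#          { person_id: 2, tag_guids: [] }]
```

Passing the search in as a callable keeps the consolidation logic testable without touching the network.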

As for the tags, if they are already in your database and there are few of them, you can load them early on to build some form of lookup table:

tags = {} # lookup table, filled once before any row is processed
Kiba.parse do
  pre_process do
    # Here: preload into `tags` the tags and any data needed by your business logic
  end
  # SNIP
end
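
As a rough illustration of such a lookup table, the sketch below indexes a tag list by GUID and resolves the GUIDs carried by a row into labels. The `remote_tags` array and its keys are made-up sample data; in practice the list would come from the API or from your database.

```ruby
# Hypothetical tag list (assumption): one hash per tag, with its GUID and label.
remote_tags = [
  { guid: "t1", name: "ruby" },
  { guid: "t2", name: "etl" },
  { guid: "t3", name: "inbox" }
]

# Build the lookup table once: GUID => label.
labels_by_guid = remote_tags.to_h { |tag| [tag[:guid], tag[:name]] }

# Resolve GUIDs to labels, silently dropping GUIDs missing from the table.
def labels_for(guids, labels_by_guid)
  guids.map { |guid| labels_by_guid[guid] }.compact
end

row = { person_id: 1, tag_guids: ["t1", "t3", "unknown"] }
row[:labels] = labels_for(row[:tag_guids], labels_by_guid)
# row[:labels] == ["ruby", "inbox"]
```

Whether unknown GUIDs should be dropped, logged, or raise an error is a design decision this sketch does not settle.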

This way, later in the pipeline, you can associate each incoming tag with the proper tag id in your database and simply use ActiveRecord to insert the relation.
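
The keep/ignore/replace step and the mapping to local tag ids could be sketched as below, in plain Ruby. The rule sets and the `tag_ids_by_label` table are hypothetical examples; your real business logic and ids will differ, and the ActiveRecord insert itself is left out.

```ruby
# Apply the business rules: drop ignored labels, rename replaced ones,
# keep everything else, and remove duplicates created by replacements.
def apply_tag_rules(labels, ignored:, replacements:)
  labels
    .reject { |label| ignored.include?(label) }
    .map { |label| replacements.fetch(label, label) }
    .uniq
end

# Map the processed labels to local tag ids, skipping labels with no local tag.
def tag_ids_for(labels, tag_ids_by_label, ignored:, replacements:)
  apply_tag_rules(labels, ignored: ignored, replacements: replacements)
    .map { |label| tag_ids_by_label[label] }
    .compact
end

# Hypothetical rules and lookup table (assumptions for illustration only).
ignored = ["inbox"]
replacements = { "rb" => "ruby" }
tag_ids_by_label = { "ruby" => 10, "etl" => 11 }

labels = ["rb", "inbox", "etl", "ruby"]
processed = apply_tag_rules(labels, ignored: ignored, replacements: replacements)
# processed == ["ruby", "etl"]
tag_ids = tag_ids_for(labels, tag_ids_by_label, ignored: ignored, replacements: replacements)
# tag_ids == [10, 11]
```

The resulting ids are what a final transform or destination would then persist on the local record.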

Alternatively, if you want a ready-made solution for fast inserts/upserts, with vendor support from us, you can also use Kiba Pro, which supports the development of the open-source version too!

Hope this properly answers your question, feel free to comment more otherwise!

Thibaut Barrère