
Is there a way I can skip loading certain rows if I deem the row invalid using the kiba-etl gem?

For example, there may be a validation that a row must pass before I load it into the system, or errors may occur where I still need to push the data into the system regardless while logging the problem.



Author of Kiba here! To remove a row from the pipeline, simply return nil at the end of a transform:

transform do |row|
  row_valid = some_custom_operation(row) # your own check, returning true or false
  row_valid ? row : nil # returning nil removes the row from the pipeline
end
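Here is a minimal plain-Ruby sketch of why this works: each transform is applied in turn, and as soon as one returns nil the row is dropped before it reaches the destination. The `run_pipeline` helper and the email check are hypothetical illustrations, not part of Kiba's API. In a real job, Kiba's own runner does the skipping for you.

```ruby
# Hypothetical stand-in for Kiba's runner: applies transforms in order and
# skips the row as soon as a transform returns nil.
def run_pipeline(rows, transforms)
  loaded = []
  rows.each do |row|
    result = row
    transforms.each do |transform|
      result = transform.call(result)
      break if result.nil? # row dismissed, later transforms never see it
    end
    loaded << result unless result.nil? # stands in for the destination
  end
  loaded
end

# Hypothetical validation: a row is valid when it has a non-empty :email.
validate = lambda do |row|
  row_valid = !row[:email].to_s.strip.empty?
  row_valid ? row : nil
end

rows = [
  { id: 1, email: "a@example.com" },
  { id: 2, email: "" },
  { id: 3, email: "c@example.com" }
]

loaded = run_pipeline(rows, [validate])
p loaded.map { |r| r[:id] } # => [1, 3]
```

Row 2 never reaches the "destination" because the validation transform returned nil for it.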

You could also "write down" the offending rows and report on them later using a post_process block like this (this assumes a moderate to low number of bogus rows, since their ids are kept in memory until the end of the job):

@bogus_row_ids = []

transform do |row|
  # SNIP
  if row_valid(row)
    row
  else
    @bogus_row_ids << row[:id]
    nil # remove from pipeline
  end
end

post_process do
  # do something with @bogus_row_ids, send an email, write a file etc
end
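A self-contained sketch of the same pattern outside Kiba, assuming a hypothetical `row_valid` rule (the row has an Integer `:amount`). The final section plays the role of `post_process` and writes the collected ids to a `StringIO` report. A real job might write a file or send an email instead.

```ruby
require "stringio"

bogus_row_ids = []

# Hypothetical validity rule, for illustration only.
def row_valid(row)
  row[:amount].is_a?(Integer)
end

transform = lambda do |row|
  if row_valid(row)
    row
  else
    bogus_row_ids << row[:id]
    nil # remove from pipeline
  end
end

rows = [
  { id: 10, amount: 5 },
  { id: 11, amount: "n/a" },
  { id: 12, amount: nil },
  { id: 13, amount: 7 }
]

# Keep only the rows that survived the transform.
loaded = rows.map { |row| transform.call(row) }.compact

# Equivalent of post_process: report the rejected ids once all rows are done.
report = StringIO.new
report.puts "Rejected row ids: #{bogus_row_ids.join(', ')}"
puts "Loaded #{loaded.size} rows, rejected #{bogus_row_ids.size}: #{bogus_row_ids.inspect}"
# prints: Loaded 2 rows, rejected 2: [11, 12]
```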

Let me know if this properly answers your question, or if you need a more refined answer.

Thibaut Barrère

I'm dumb. I realized you can just catch your errors within the transformation/loading process and return nil.
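Here is a minimal sketch of that approach, which also covers the logging part of the question. The transform rescues the error, logs it with Ruby's standard `Logger`, and then either drops the row (returns nil) or passes it through anyway. The `parse_amount` helper, the `KEEP_INVALID_ROWS` switch and the field names are hypothetical.

```ruby
require "logger"
require "stringio"

log_output = StringIO.new
logger = Logger.new(log_output)

# Hypothetical switch: true = still load the row after logging, false = drop it.
KEEP_INVALID_ROWS = false

# Hypothetical parsing step that raises on bad data.
def parse_amount(value)
  Integer(value) # raises ArgumentError for "abc", TypeError for nil
end

transform = lambda do |row|
  begin
    row.merge(amount: parse_amount(row[:amount]))
  rescue ArgumentError, TypeError => e
    logger.warn("row #{row[:id]}: #{e.message}")
    KEEP_INVALID_ROWS ? row : nil # nil removes the row from the pipeline
  end
end

rows = [{ id: 1, amount: "42" }, { id: 2, amount: "abc" }]
loaded = rows.map { |row| transform.call(row) }.compact

p loaded
puts log_output.string
```

Setting `KEEP_INVALID_ROWS = true` would still load row 2 (with its raw value) while keeping the warning in the log.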

  • Don't blame yourself please :-) It's one possibility - I added more ideas in a separate answer. Let me know if it correctly answers the question, which others may have, too! – Thibaut Barrère Oct 01 '15 at 19:03