There are several ways to achieve this, all of them quite idiomatic:
At the source level (passing an array of headers)
You can use CSV without the headers: true option, which gives you the opportunity to check the headers precisely:
require 'csv'

class CSVSource
  def initialize(filename:, csv_options:, expected_headers:)
    # SNIP (the arguments are expected to be reachable as readers below)
  end

  def each
    # csv_options is splatted because CSV.foreach takes keyword options
    CSV.foreach(filename, **csv_options).with_index do |row, file_row_index|
      if file_row_index == 0
        check_headers!(actual: row.to_a, expected: expected_headers)
        next # do not propagate the headers row
      else
        yield(Hash[expected_headers.zip(row.to_a)])
      end
    end
  end

  def check_headers!(actual:, expected:)
    # SNIP - verify uniqueness, presence, raise a clear message if needed
  end
end
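The elided check_headers! above could look something like the following. This is only a sketch: the specific checks, their order and the error messages are my assumptions, not part of the original code. The ordering check is there because the source zips expected_headers with each row by position.

```ruby
# Hypothetical body for the elided check_headers! method.
# The checks and messages are assumptions made for illustration.
def check_headers!(actual:, expected:)
  # Uniqueness: a header appearing twice would silently overwrite a key.
  duplicates = actual.select { |header| actual.count(header) > 1 }.uniq
  raise ArgumentError, "Duplicate headers: #{duplicates.inspect}" unless duplicates.empty?

  # Presence: every expected header must be in the file.
  missing = expected - actual
  raise ArgumentError, "Missing headers: #{missing.inspect}" unless missing.empty?

  # Nothing extra: unknown columns would shift the positional zip.
  unexpected = actual - expected
  raise ArgumentError, "Unexpected headers: #{unexpected.inspect}" unless unexpected.empty?

  # Order: rows are zipped with expected_headers by position.
  unless actual == expected
    raise ArgumentError,
          "Headers are out of order: got #{actual.inspect}, expected #{expected.inspect}"
  end

  true
end
```

Raising on the first problem keeps the message short; you may prefer to collect all problems and report them together.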
At the source level (letting the caller define the behaviour using a lambda)
class CSVSource
  def initialize(after_headers_read_callback:, ...)
    @after_headers_read_callback = ...
  end

  def each
    CSV.foreach(filename, **csv_options).with_index do |row, file_row_index|
      if file_row_index == 0
        @after_headers_read_callback.call(row.to_a)
        next
      end
      # ...
    end
  end
end
The lambda lets the caller define their own checks and raise if needed, which makes the source easier to reuse.
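To make the caller side concrete, here is a minimal sketch that can be run without Kiba. The constructor arguments other than after_headers_read_callback, the way rows are turned into hashes after the headers row, and the sample data are all assumptions made for this example.

```ruby
require 'csv'
require 'tempfile'

# Minimal stand-in for the source; everything besides
# after_headers_read_callback is an assumption for this sketch.
class CSVSource
  def initialize(filename:, after_headers_read_callback:, csv_options: {})
    @filename = filename
    @csv_options = csv_options
    @after_headers_read_callback = after_headers_read_callback
  end

  def each
    headers = nil
    CSV.foreach(@filename, **@csv_options).with_index do |row, file_row_index|
      if file_row_index == 0
        headers = row.to_a
        @after_headers_read_callback.call(headers)
        next # do not propagate the headers row
      end
      yield(Hash[headers.zip(row.to_a)])
    end
  end
end

# The caller decides what "valid headers" means.
expected_headers = %w[id name]
check_headers = lambda do |headers|
  missing = expected_headers - headers
  raise ArgumentError, "Missing headers: #{missing.join(', ')}" unless missing.empty?
end

# Sample file, written to a temporary location for the demo.
file = Tempfile.new(['people', '.csv'])
file.write("id,name\n1,Alice\n2,Bob\n")
file.close

rows = []
CSVSource.new(filename: file.path, after_headers_read_callback: check_headers).each do |row|
  rows << row
end
```

With a file whose first line lacks one of the expected headers, the lambda raises before any data row is yielded.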
At the transform level
If you want to further decouple the components (e.g. separate the headers handling from the fact that rows come from a CSV source), you can use a transform.
I commonly use this design, which allows for better reuse. Here the CSV source yields each line as a raw array under :row, together with a bit of metadata (:filename and :file_row_index):
def transform_array_rows_to_hash_rows(after_headers_read_callback:)
  transform do |row|
    if row.fetch(:file_row_index) == 0
      @headers = row.fetch(:row)
      after_headers_read_callback.call(@headers)
      nil # returning nil removes the headers row from the pipeline
    else
      Hash[@headers.zip(row.fetch(:row))].merge(
        filename: row.fetch(:filename),
        file_row_index: row.fetch(:file_row_index)
      )
    end
  end
end
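For completeness, here is a sketch of what the other side of this design could look like: a source that yields raw array rows plus metadata, and a class-based version of the transform (Kiba calls process(row) on class transforms and drops the row when the result is nil). The class names CSVArrayRowsSource and ArrayRowsToHashRows are hypothetical, and the last part drives both pieces by hand, without Kiba, only to show the data flow.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical source: yields each CSV line as a raw array plus metadata.
class CSVArrayRowsSource
  def initialize(filename:, csv_options: {})
    @filename = filename
    @csv_options = csv_options
  end

  def each
    CSV.foreach(@filename, **@csv_options).with_index do |row, file_row_index|
      yield({ filename: @filename, file_row_index: file_row_index, row: row.to_a })
    end
  end
end

# Hypothetical class form of the transform: same logic, in process(row).
class ArrayRowsToHashRows
  def initialize(after_headers_read_callback:)
    @after_headers_read_callback = after_headers_read_callback
  end

  def process(row)
    if row.fetch(:file_row_index) == 0
      @headers = row.fetch(:row)
      @after_headers_read_callback.call(@headers)
      nil # the headers row is not propagated
    else
      Hash[@headers.zip(row.fetch(:row))].merge(
        filename: row.fetch(:filename),
        file_row_index: row.fetch(:file_row_index)
      )
    end
  end
end

# Drive both pieces by hand (no Kiba involved) on a temporary sample file.
file = Tempfile.new(['people', '.csv'])
file.write("id,name\n1,Alice\n")
file.close

seen_headers = nil
transform = ArrayRowsToHashRows.new(
  after_headers_read_callback: ->(headers) { seen_headers = headers }
)

output = []
CSVArrayRowsSource.new(filename: file.path).each do |row|
  result = transform.process(row)
  output << result unless result.nil?
end
```

Because the transform only relies on the :row, :filename and :file_row_index keys, any source yielding rows of that shape can reuse it, not just a CSV one.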
What's not recommended
In all cases, avoid doing any processing in Kiba.parse itself. It's a better design to ensure IO will only occur when you are calling Kiba.run (since it will be more future-proof and will support introspection features in later versions of Kiba).
Also, using pre_process isn't recommended: it will work, but it leads to a bit of duplication.
Hope this helps, and let me know if this isn't clear!