My processing has a "condense" step before the data needs any further processing:
Source: Raw event/analytics logs of various users.
Transform: Insert each row into a hash, keyed by UserID.
Destination / Output: An in-memory hash like:
{
  "user1" => [event, event, ...],
  "user2" => [event, event, ...]
}
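To make the condense semantics concrete, here is a plain-Ruby equivalent outside of Kiba entirely (the event rows are made up for illustration):

events = [
  { user_id: "user1", type: "click" },
  { user_id: "user1", type: "view" },
  { user_id: "user2", type: "view" }
]

# Append each raw event to its user's bucket.
users = events.each_with_object(Hash.new { |h, k| h[k] = [] }) do |event, hash|
  hash[event[:user_id]] << event
end

# users == { "user1" => [<2 events>], "user2" => [<1 event>] }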
Now, I've got no need to store these user groups anywhere; I'd just like to carry on processing them. Is there a common pattern with Kiba for using an intermediate destination? E.g.
# First pass
@users = Hash.new

source EventSource # 10,000 rows of single events
transform { |row| insert_into_user_hash(row) }
destination UserDestination, users: @users

# Second pass
source UserSource, users: @users # 100 rows of grouped events, created in the previous step
transform { |row| analyse_user(row) }
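For what it's worth, the pieces named above could plausibly follow Kiba's documented source/destination contracts (a source exposes each and yields rows; a destination exposes write(row)). A minimal sketch, where UserDestination and UserSource are the hypothetical classes from my pseudocode:

# Hypothetical destination: buckets each incoming event by user.
class UserDestination
  def initialize(users:)
    @users = users
  end

  def write(row)
    (@users[row[:user_id]] ||= []) << row
  end
end

# Hypothetical source: yields one grouped row per user.
class UserSource
  def initialize(users:)
    @users = users
  end

  def each
    @users.each do |user_id, events|
      yield(user_id: user_id, events: events)
    end
  end
end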
I've been digging around the code, and it appears that all transforms in a file are applied to the source, so I was wondering how other people have approached this, if at all. I could save to an intermediate store and run another ETL script, but I was hoping for a cleaner way, since we're planning lots of these "condense" steps.
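One way I can imagine keeping it in a single script, with no intermediate store, is to define the two passes as separate jobs and run them back to back in the same process via Kiba.parse / Kiba.run, sharing the hash between them. This is only a sketch under that assumption, reusing the hypothetical classes above:

require 'kiba'

users = {}

condense = Kiba.parse do
  source EventSource                        # 10,000 single-event rows
  destination UserDestination, users: users
end

analyse = Kiba.parse do
  source UserSource, users: users           # ~100 grouped rows from the first pass
  transform { |row| analyse_user(row) }     # analyse_user is hypothetical
end

Kiba.run(condense) # fills the in-memory users hash
Kiba.run(analyse)  # second pass consumes it directly

Each "condense" step would then get its own small job definition while everything stays in memory, which seems closer to what I'm after than writing out an intermediate store. Is there a more idiomatic pattern than this?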