I'm writing a Kafka Streams app that produces statistics for web pages. I have a stream of information about web pages; each record is a struct that includes the page type (news, gaming, blog, etc.) and the page language (en, fr, ru, etc.).
I've filtered this stream into a second stream that contains all languages for one specific page type. For this example, assume that the filtered stream contains all events for "news" pages.
I would now like to output to a topic, for each language, the number of pages in that language divided by the total number of pages of the same type.
I used .count() to create a KTable that counts the events per language. I also used .count() to create a KTable that counts all events of the same type.
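For reference, the output the topology should produce can be computed in a few lines of plain Java. This is a minimal sketch with no Kafka dependency (it assumes Java 16+ for records); the `Page` record and the `shareByLang` method are hypothetical stand-ins for the page struct and the two counts: the per-language count plays the role of the per-language KTable, and the total plays the role of the per-type KTable.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// In-memory illustration only (no Kafka): the expected result for a batch of news events.
final class LangShareReference {

    // Hypothetical stand-in for the page struct; only the two fields the question mentions.
    record Page(String type, String lang) {}

    // Counts events per language, counts all events, then divides the two.
    static Map<String, Float> shareByLang(List<Page> newsPages) {
        Map<String, Long> perLang = newsPages.stream()
                .collect(Collectors.groupingBy(Page::lang, TreeMap::new, Collectors.counting()));
        // Every event is a news page (the stream is already filtered), so the total is the size.
        long total = newsPages.size();
        Map<String, Float> shares = new TreeMap<>();
        perLang.forEach((lang, count) -> shares.put(lang, count.floatValue() / total));
        return shares;
    }

    public static void main(String[] args) {
        List<Page> pages = List.of(
                new Page("news", "en"), new Page("news", "en"),
                new Page("news", "fr"), new Page("news", "ru"));
        System.out.println(shareByLang(pages)); // {en=0.5, fr=0.25, ru=0.25}
    }
}
```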
To produce the division, I planned to join the two tables, taking the left value (the per-language count) and dividing it by the right value (the total count). Unfortunately, this doesn't work, because the left table is keyed by language while the right table is keyed by page type.
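The key mismatch can be reproduced in a small in-memory sketch, where plain Java maps stand in for the two KTables (this is not the Kafka Streams API). A primary-key join finds no common key between `en`/`fr`/`ru` and `news`, so it produces nothing; a foreign-key join instead derives the right-hand key from each left value. Kafka Streams 2.4 and later offer this kind of join as the `KTable#join(KTable, Function, ValueJoiner)` overload, so something like `langTable.join(allTable, count -> "news", valueJoiner)` may work; that call is untested here, and the constant foreign key is only valid because the stream is already filtered to news pages. The helper names `keyJoin` and `foreignKeyJoin` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;
import java.util.function.Function;

// In-memory illustration only: these maps stand in for KTables; no Kafka API is used.
final class JoinKeys {

    // Primary-key inner join: emits a row only when both tables contain the same key.
    static <K, L, R, V> Map<K, V> keyJoin(Map<K, L> left, Map<K, R> right,
                                          BiFunction<L, R, V> joiner) {
        Map<K, V> out = new LinkedHashMap<>();
        left.forEach((key, l) -> {
            R r = right.get(key);
            if (r != null) {
                out.put(key, joiner.apply(l, r));
            }
        });
        return out;
    }

    // Foreign-key inner join: each left value yields the right-hand key to look up,
    // and the result keeps the left key (here: the language).
    static <K, L, KO, R, V> Map<K, V> foreignKeyJoin(Map<K, L> left, Map<KO, R> right,
                                                     Function<L, KO> foreignKey,
                                                     BiFunction<L, R, V> joiner) {
        Map<K, V> out = new LinkedHashMap<>();
        left.forEach((key, l) -> {
            R r = right.get(foreignKey.apply(l));
            if (r != null) {
                out.put(key, joiner.apply(l, r));
            }
        });
        return out;
    }

    public static void main(String[] args) {
        Map<String, Long> langCounts = new TreeMap<>(Map.of("en", 2L, "fr", 1L, "ru", 1L));
        Map<String, Long> typeCounts = Map.of("news", 4L);
        BiFunction<Long, Long, Float> ratio = (l, r) -> l.floatValue() / r;

        System.out.println(keyJoin(langCounts, typeCounts, ratio)); // {}
        // Every row in langCounts comes from news pages, so the foreign key is the constant "news".
        System.out.println(foreignKeyJoin(langCounts, typeCounts, count -> "news", ratio));
        // {en=0.5, fr=0.25, ru=0.25}
    }
}
```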
My code is as follows:
```java
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.ValueJoiner;

ValueJoiner<Long, Long, Float> valueJoiner = (leftVal, rightVal) -> {
    if ((rightVal != null) && (leftVal != null)) {
        return leftVal.floatValue() / rightVal;
    }
    return 0f;
};
// the per-language table for news pages
KTable<String, Long> langTable = newsStream.selectKey((ignored, value) -> value.getLang())
        .groupByKey().count();
// the table that counts all events of news pages
KTable<String, Long> allTable = newsStream.groupBy((ignored, value) -> value.getType())
        .count();
// this is the join that doesn't produce values (as there are no common keys?)
KTable<String, Float> joinedLangs = langTable.join(allTable, valueJoiner);
```
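A join-free alternative is to keep all per-language counts for a page type in a single aggregate keyed by type, then turn that aggregate into shares. The sketch below shows the idea in plain Java with no Kafka dependency (Java 16+ for records); in Kafka Streams it would roughly correspond to `groupBy` on the type, then `aggregate(...)` with a map-valued aggregate, then `mapValues(...)`, which would also need a serde for the map type (not shown, and an assumption here). `Page`, `addEvent`, `toShares`, and `sharesByType` are hypothetical names. One possible benefit of this design is that the numerator and denominator always come from the same state, whereas two independently updated tables may briefly produce a ratio from counts that have been updated on only one side.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration only (no Kafka): aggregate per type, then derive shares.
final class PerTypeAggregate {

    // Hypothetical stand-in for the page struct.
    record Page(String type, String lang) {}

    // Mirrors an Aggregator: folds one event into the running per-type state (lang -> count).
    static Map<String, Long> addEvent(Map<String, Long> agg, Page page) {
        agg.merge(page.lang(), 1L, Long::sum);
        return agg;
    }

    // Mirrors a mapValues step: turns one type's per-language counts into shares.
    static Map<String, Float> toShares(Map<String, Long> counts) {
        long total = counts.values().stream().mapToLong(Long::longValue).sum();
        Map<String, Float> shares = new TreeMap<>();
        counts.forEach((lang, c) -> shares.put(lang, c.floatValue() / total));
        return shares;
    }

    // Groups events by page type (like groupBy on the type) and aggregates each group.
    static Map<String, Map<String, Float>> sharesByType(List<Page> pages) {
        Map<String, Map<String, Long>> state = new TreeMap<>();
        for (Page p : pages) {
            addEvent(state.computeIfAbsent(p.type(), t -> new TreeMap<>()), p);
        }
        Map<String, Map<String, Float>> out = new TreeMap<>();
        state.forEach((type, counts) -> out.put(type, toShares(counts)));
        return out;
    }

    public static void main(String[] args) {
        List<Page> pages = List.of(
                new Page("news", "en"), new Page("news", "en"),
                new Page("news", "fr"), new Page("news", "ru"),
                new Page("blog", "en"));
        System.out.println(sharesByType(pages));
        // {blog={en=1.0}, news={en=0.5, fr=0.25, ru=0.25}}
    }
}
```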
What would be the best way to make this code work and produce the per-language ratios?