Starting from some raw data as input in a poids_garmin_brut topic:
Durée,Poids,Variation,IMC,Masse grasse,Masse musculaire squelettique,Masse osseuse,Masse hydrique,
" 14 Fév. 2022",
06:37,72.1 kg,0.3 kg,22.8,26.3 %,29.7 kg,3.5 kg,53.8 %,
" 13 Fév. 2022",
06:48,72.4 kg,0.2 kg,22.9,25.4 %,29.8 kg,3.6 kg,54.4 %,
" 12 Fév. 2022",
06:17,72.2 kg,0.0 kg,22.8,25.3 %,29.7 kg,3.6 kg,54.5 %,
I managed to create two topics, poids_garmin_split_date and poids_garmin_split_valeursPoids, with this small method:
public StreamsBuilder extraire(StreamsBuilder builder) {
    KStream<Void,String> streamBrut = builder.stream("poids_garmin_brut");

    // Lines that start with " " carry dates
    streamBrut.filter((key, value) -> value.startsWith("\" ")).to("poids_garmin_split_date");

    // Lines that neither start with " " nor contain "Durée" (the CSV header) are weight data.
    streamBrut.filter((key, value) -> !value.startsWith("\" ") && !value.contains("Durée")).to("poids_garmin_split_valeursPoids");

    return builder;
}
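For context, here is a minimal sketch of how such a builder can be wired up and started; the application id, bootstrap address and default serdes below are assumptions, not part of my actual setup:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

Properties config = new Properties();
config.put(StreamsConfig.APPLICATION_ID_CONFIG, "poids-garmin");      // assumed id
config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.VoidSerde.class);
config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.StringSerde.class);

StreamsBuilder builder = extraire(new StreamsBuilder());
KafkaStreams streams = new KafkaStreams(builder.build(), config);
streams.start();
// Close the topology cleanly when the JVM shuts down
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));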
The topic poids_garmin_split_date now contains:
" 14 Fév. 2022",
" 13 Fév. 2022",
" 12 Fév. 2022",
and poids_garmin_split_valeursPoids:
06:37,72.1 kg,0.3 kg,22.8,26.3 %,29.7 kg,3.5 kg,53.8 %,
06:48,72.4 kg,0.2 kg,22.9,25.4 %,29.8 kg,3.6 kg,54.4 %,
06:17,72.2 kg,0.0 kg,22.8,25.3 %,29.7 kg,3.6 kg,54.5 %,
Both topics currently have null keys (no key at all), but I need to add a key to each of them, so that their contents can be linked two by two:
123541, " 14 Fév. 2022",
123542, " 13 Fév. 2022",
123543, " 12 Fév. 2022",
and
123541, 06:37,72.1 kg,0.3 kg,22.8,26.3 %,29.7 kg,3.5 kg,53.8 %,
123542, 06:48,72.4 kg,0.2 kg,22.9,25.4 %,29.8 kg,3.6 kg,54.4 %,
123543, 06:17,72.2 kg,0.0 kg,22.8,25.3 %,29.7 kg,3.6 kg,54.5 %,
for example, so that I can merge these topics into a single one that would be:
123541, " 14 Fév. 2022",06:37,72.1 kg,0.3 kg,22.8,26.3 %,29.7 kg,3.5 kg,53.8 %,
123542, " 13 Fév. 2022",06:48,72.4 kg,0.2 kg,22.9,25.4 %,29.8 kg,3.6 kg,54.4 %,
123543, " 12 Fév. 2022",06:17,72.2 kg,0.0 kg,22.8,25.3 %,29.7 kg,3.6 kg,54.5 %,
which I could then make use of.
If this is the right way to do things (I'm a beginner with Kafka), how can I do it? With a map operation? A transform one?
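To make the transform option concrete, here is a sketch of what such a transformer could look like (the class name is mine, and the counter is a plain per-task field, so it is not fault-tolerant and restarts from zero):

import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;

public class CompteurTransformer implements Transformer<Void, String, KeyValue<String, String>> {
    // In-memory, per-task counter: lost on restart or rebalance
    private long compteur = 0L;

    @Override
    public void init(ProcessorContext context) {
    }

    @Override
    public KeyValue<String, String> transform(Void key, String value) {
        compteur++;
        return KeyValue.pair(Long.toString(compteur), value);
    }

    @Override
    public void close() {
    }
}

It would be plugged into the stream in place of a map:

streamBrut.filter((key, value) -> value.startsWith("\" "))
    .transform(CompteurTransformer::new)
    .to("poids_garmin_split_date");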
Following your answer, @OneCricketeer, I've attempted this:
// (uses java.util.concurrent.atomic.LongAccumulator, java.time.Duration,
//  org.apache.kafka.streams.KeyValue and org.apache.kafka.streams.kstream.JoinWindows)
KStream<Void,String> streamBrut = builder.stream("poids_garmin_brut");

// Lines that start with " " carry dates
final LongAccumulator compteurDate = new LongAccumulator(Long::sum, 0L);
streamBrut.filter((key, value) -> value.startsWith("\" "))
    .map((key, value) -> {
        compteurDate.accumulate(1L);
        return new KeyValue<>(compteurDate.toString(), value);
    })
    .to("poids_garmin_split_date");
KStream<String, String> streamSplitDate = builder.stream("poids_garmin_split_date");

// Lines that neither start with " " nor contain "Durée" (the CSV header) are weight data.
final LongAccumulator compteurValeursPoids = new LongAccumulator(Long::sum, 0L);
streamBrut.filter((key, value) -> !value.startsWith("\" ") && !value.contains("Durée"))
    .map((key, value) -> {
        compteurValeursPoids.accumulate(1L);
        return new KeyValue<>(compteurValeursPoids.toString(), value);
    })
    .to("poids_garmin_split_valeursPoids");
KStream<String, String> streamSplitValeursPoids = builder.stream("poids_garmin_split_valeursPoids");

streamSplitDate.join(streamSplitValeursPoids,
        (String date, String valeursPoids) -> date + valeursPoids,
        JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(5)))
    .to("poids_garmin_join_date_valeurs");
which results in a topic poids_garmin_join_date_valeurs having this content:
" 14 Fév. 2022",06:37,72.1 kg,0.3 kg,22.8,26.3 %,29.7 kg,3.5 kg,53.8 %,
" 13 Fév. 2022",06:48,72.4 kg,0.2 kg,22.9,25.4 %,29.8 kg,3.6 kg,54.4 %,
" 12 Fév. 2022",06:17,72.2 kg,0.0 kg,22.8,25.3 %,29.7 kg,3.6 kg,54.5 %,
" 11 Fév. 2022",05:54,72.2 kg,0.1 kg,22.8,25.6 %,29.7 kg,3.5 kg,54.3 %,
" 10 Fév. 2022",06:14,72.3 kg,0.0 kg,22.8,25.9 %,29.7 kg,3.5 kg,54.1 %,
" 9 Fév. 2022",06:06,72.3 kg,0.5 kg,22.8,26.3 %,29.7 kg,3.5 kg,53.8 %,
" 8 Fév. 2022",07:14,71.8 kg,0.7 kg,22.7,26.3 %,29.6 kg,3.5 kg,53.8 %,
But I don't know how acceptable this way of doing things is.
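For reference, here is a sketch of how the keys landing on the joined topic could be inspected with a TopologyTestDriver (this needs the kafka-streams-test-utils dependency; the String default serdes and the sample values are assumptions):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.common.serialization.VoidSerializer;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.TestOutputTopic;
import org.apache.kafka.streams.TopologyTestDriver;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "test");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234"); // never contacted by the test driver
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.StringSerde.class);
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.StringSerde.class);

try (TopologyTestDriver driver = new TopologyTestDriver(builder.build(), props)) {
    TestInputTopic<Void, String> brut = driver.createInputTopic(
            "poids_garmin_brut", new VoidSerializer(), new StringSerializer());
    brut.pipeInput("\" 14 Fév. 2022\",");
    brut.pipeInput("06:37,72.1 kg,0.3 kg,22.8,26.3 %,29.7 kg,3.5 kg,53.8 %,");

    TestOutputTopic<String, String> joint = driver.createOutputTopic(
            "poids_garmin_join_date_valeurs", new StringDeserializer(), new StringDeserializer());
    joint.readKeyValuesToList().forEach(kv ->
            System.out.println(kv.key + " => " + kv.value));
}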