I am trying to set up a Flume agent that recognizes two specific currency codes: EUR and USD. Any other string should be kept, but it does not need to be identified.

I am following a very good example. My configuration is as follows:

a.sources.s.interceptors = i
a.sources.s.interceptors.i.type = regex_extractor
a.sources.s.interceptors.i.regex = ^(EUR)|^(USD)|^(?!EUR|USD).*
a.sources.s.interceptors.i.excludeEvents = false
a.sources.s.interceptors.i.serializers = t
a.sources.s.interceptors.i.serializers.t.name = currency


a.sources.s.selector.type = multiplexing
a.sources.s.selector.header = currency
a.sources.s.selector.mapping.EUR = EUR_cnl
a.sources.s.selector.mapping.USD = USD_cnl
a.sources.s.selector.mapping.GNL = GNL_cnl

My problem is that the regex seems to work only for EUR, or for whichever alternative comes first. I am also not sure whether my mapping is correct for the "catch the rest" case.

Any suggestion is welcome.

Thank you.

Wilmar
  • What exactly are you trying to match, and when a match happens, what do you want to happen? – Tim Biegeleisen Oct 21 '18 at 06:33
  • Hi @TimBiegeleisen, I am trying to match USD, EUR, 'anything else'. After I will push each entry to an HDFS location. I was looking at Corion's answer and I did not know I needed more than one serializer. I will give that a try. The example I was following only shows one serializer for two possible matches. I am still trying to wrap my head around this. Thanks for the reply! – Wilmar Oct 21 '18 at 15:23

1 Answer

The regex_extractor interceptor assigns the capture groups to the serializers in order. You only see the first result because you have defined only one serializer. You have two options:

Capture all currencies within one capture group

a.sources.s.interceptors.i.regex = ^(EUR|USD|(?!EUR|USD).*)

Alternatively, capture each currency in its own group and add a serializer for each capture group (note that the third alternative needs its own parentheses to count as a capture group):

a.sources.s.interceptors.i.regex = ^(EUR)|^(USD)|^((?!EUR|USD).*)
a.sources.s.interceptors.i.excludeEvents = false
a.sources.s.interceptors.i.serializers = t1 t2 t3
a.sources.s.interceptors.i.serializers.t1.name = currency1_EUR
a.sources.s.interceptors.i.serializers.t2.name = currency1_USD
a.sources.s.interceptors.i.serializers.t3.name = currency2_XXX
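
To illustrate the "groups go to serializers in order" behaviour described above, here is a rough Python sketch. It is a simplified model, not Flume's actual code: `extract_headers` is a hypothetical helper, Flume itself uses Java regular expressions (which should behave the same way for these particular patterns), and skipping unmatched groups is an assumption of the model.

```python
import re

def extract_headers(pattern, serializer_names, body):
    """Simplified model: the i-th capture group feeds the i-th serializer."""
    headers = {}
    match = re.search(pattern, body)
    if match is None:
        return headers
    # zip() stops at the shorter sequence, so groups without a
    # corresponding serializer are silently dropped.
    for name, value in zip(serializer_names, match.groups()):
        if value is not None:  # assumption: unmatched groups set no header
            headers[name] = value
    return headers

# Regex from the question: two capture groups, one serializer.
ORIGINAL = r"^(EUR)|^(USD)|^(?!EUR|USD).*"
print(extract_headers(ORIGINAL, ["currency"], "EUR 10"))     # {'currency': 'EUR'}
print(extract_headers(ORIGINAL, ["currency"], "USD 12.50"))  # {} (USD is group 2)

# Variant with the third alternative wrapped in its own group.
THREE_GROUPS = r"^(EUR)|^(USD)|^((?!EUR|USD).*)"
NAMES = ["currency1_EUR", "currency1_USD", "currency2_XXX"]
print(extract_headers(THREE_GROUPS, NAMES, "USD 12.50"))     # {'currency1_USD': 'USD'}
print(extract_headers(THREE_GROUPS, NAMES, "GBP 7"))         # {'currency2_XXX': 'GBP 7'}
```

With a single serializer only group 1 ever reaches a header, which would explain why only EUR appeared to work in the original configuration.
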
Corion
  • Hi @Corion, good point, that is a good hint. Say I decide to use one serializer per capture group. How would I then assign them to the channels? Something like this? (My apologies, I have no idea how to format this better.) `a.sources.s.selector.type = multiplexing` `a.sources.s.selector.header = currency1_EUR, currency1_USD currency2_XXX` `a.sources.s.selector.mapping.currency1_EUR = EUR_cnl` `a.sources.s.selector.mapping.currency1_USD = USD_cnl` `a.sources.s.selector.mapping.currency2_XXX = GNL_cnl` – Wilmar Oct 21 '18 at 15:38
  • I wouldn't go for more than one serializer. – Corion Oct 21 '18 at 19:45
  • Thank you @Corion. I have managed to get it partially working with one serializer, as per your suggestion. `a.sources.s.interceptors.i.regex = ^(EUR|USD|(?!EUR|USD).*)` `a.sources.s.interceptors.i.serializers = t` `a.sources.s.interceptors.i.serializers.t.name = currency` `a.sources.s.selector.type = multiplexing` `a.sources.s.selector.header = currency` `a.sources.s.selector.mapping.EUR = EUR_cnl` `a.sources.s.selector.mapping.USD = USD_cnl` `a.sources.s.selector.mapping.GNL = GNL_cnl` It works well for USD and EUR, but the "catch the rest" case does not work. Any suggestions? Thank you – Wilmar Oct 21 '18 at 22:43
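
A possible explanation for the "catch the rest" case, sketched in Python under some assumptions: with the single-group regex, the header for any other line would contain the whole matched text rather than the literal string GNL, so a `mapping.GNL` entry would never be hit. The `route` function and the dictionary lookup below are a hypothetical stand-in for the multiplexing selector, not Flume code.

```python
import re

# Single-group regex from the comment above.
SINGLE_GROUP = re.compile(r"^(EUR|USD|(?!EUR|USD).*)")

# Stand-in for the selector.mapping.* entries.
MAPPING = {"EUR": "EUR_cnl", "USD": "USD_cnl", "GNL": "GNL_cnl"}

def route(body, default=None):
    """Look up the channel for the captured header value (simplified model)."""
    match = SINGLE_GROUP.search(body)
    currency = match.group(1) if match else None
    # For "GBP 7" the captured value is "GBP 7", which is not a mapping key.
    return MAPPING.get(currency, default)

for body in ("EUR 10", "USD 5", "GBP 7"):
    print(body, "->", route(body), "|", route(body, default="GNL_cnl"))
```

If this model is right, one option worth trying is the multiplexing selector's `default` property from the Flume User Guide, for example `a.sources.s.selector.default = GNL_cnl`, which routes events whose header value matches no mapping to a fallback channel.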