I've got a simple Trident Topology running in a LocalDRPC where one of the functions outputs the result field, but when I run it the results I get back seem to be all the information from every tuple, instead of just the result field as I would have expected given the DRPC docs. Eg:

[["http:\/\/www.smbc-comics.com\/rss.php",http://www.smbc-comics.com/rss.php,[#document: null],[item: null],[link: null],[description: null],http://feedproxy.google.com/~r/smbc-comics/PvLb/~3/CBpJmAiJSxs/index.php,http://www.smbc-comics.com/comics/20141001.png,"http:\/\/www.smbc-comics.com\/comics\/20141001.png"], ...]

It would be okay to get all the information from every tuple back, but there's no indication of which of the fields is called result. As it stands it's not even valid JSON!

So how can I extract the value that corresponds to a specific field that I specified in the topology?

Peter Szanto
teryret
  • What you receive depends on how you build your topology. For example, if you put an aggregate at the end, then it will return only one field, e.g. .aggregate(new Count(), new Fields("count")). The question is: what do you want? What does your current topology look like? – Peter Szanto Oct 03 '14 at 13:10
  • That's the trick, in my application the user supplies a higher level description that compiles down to a topology, so it's not fixed. I can put constraints on them like "you have to populate a field called result, and that's what you'll get back", but things like "you need to structure things so that an aggregate is the last operation" is way too leaky of an abstraction to pass muster. – teryret Oct 03 '14 at 13:36

1 Answer

Storm returns every field that was processed during the execution chain, as a JSON array. The values appear in the order in which they were processed, so if you are interested only in the result of the last function, read only the last value from the array. If you do not want the intermediate results at all, you can drop them with the project method. For example, if you have a stream:

stream.each(new Fields("args"), new AddExclamation(), new Fields(EX_1))
    .each(new Fields(EX_1), new AddPlus(), new Fields(P1, P2));

that returns

[["hello","hello!1","hello!1+1","hello!1+2"],["hello","hello!2","hello!2+1","hello!2+2"]]

then by adding a projection you can limit the output to P2

stream.each(new Fields("args"), new AddExclamation(), new Fields(EX_1))
    .each(new Fields(EX_1), new AddPlus(), new Fields(P1, P2))
    .project(new Fields(P2));

so the output will be only this

[["hello!1+2"],["hello!2+2"]]
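To make the effect of the projection concrete, here is a minimal plain-Java sketch that mimics it on the rows shown above. It does not use the Storm API at all: the class ProjectionSketch, its project helper, and the field names "ex1", "p1" and "p2" are hypothetical stand-ins (the real values of EX_1, P1 and P2 are not shown here), and it assumes you know the ordered list of field names your topology emits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ProjectionSketch {

    // Keep only the named fields of each row, in the order requested.
    // fieldNames is the assumed emission order of the fields in every row.
    static List<List<String>> project(List<String> fieldNames,
                                      List<List<String>> rows,
                                      List<String> keep) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> row : rows) {
            List<String> projected = new ArrayList<>();
            for (String name : keep) {
                int idx = fieldNames.indexOf(name);
                if (idx < 0) {
                    throw new IllegalArgumentException("Unknown field: " + name);
                }
                projected.add(row.get(idx));
            }
            out.add(projected);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical field names standing in for "args", EX_1, P1 and P2.
        List<String> fieldNames = Arrays.asList("args", "ex1", "p1", "p2");
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("hello", "hello!1", "hello!1+1", "hello!1+2"),
            Arrays.asList("hello", "hello!2", "hello!2+1", "hello!2+2"));

        // Prints [[hello!1+2], [hello!2+2]]
        System.out.println(project(fieldNames, rows, Arrays.asList("p2")));
    }
}
```

In a real topology the project call does this work on the Storm side, so the client never has to guess which position holds the field it cares about.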

You can see this in action here:

https://github.com/ExampleDriven/storm-example/blob/master/src/test/java/org/exampledriven/ExclamationPlusTridentTopologyTest.java

Peter Szanto
  • That part I'm cool with. My question is (going with your database metaphor) how do I select a specific column? – teryret Oct 03 '14 at 15:13
  • I think you currently return multiple tuples with multiple Field definitions. If you want a single field then your topology should end with something like ".each(new Fields("input"), new SomeBoltThatReturnsASingleTuple(), new Fields("output"))". You can also filter what you return by adding a filter to the end. As an example see storm.trident.operation.builtin.FilterNull – Peter Szanto Oct 03 '14 at 15:24
  • That's not what filter does. Filter filters out tuples, not fields. Trident filters are like SQL's WHERE clauses. – teryret Oct 03 '14 at 15:49
  • agree, so the point is that the last bolt needs to return a single Tuple and the last bolt must define a single field. – Peter Szanto Oct 03 '14 at 19:09
  • I don't care how many tuples come out, I just need to be able to either A) only get a single field of my choosing or B) get some sort of map that associates fields and their names. It seems like the latter would be the default, I guess there are probably performance concerns or something. – teryret Oct 06 '14 at 13:46
  • it is difficult to talk about code without seeing it, so I updated my answer – Peter Szanto Oct 07 '14 at 08:15
  • That's a great update, I think it will help to clarify our mismatch. Note 1 does indeed highlight the problem I'm trying to solve. In your example the only extraneous field is the input, but imagine if the tuple had flowed through a half dozen functions on its way through the topology, then you'd get a result like the one in my question, where each intermediate state is returned and it's not clear which field should be returned to the user. – teryret Oct 07 '14 at 13:15
  • ok, at least I understand your problem now :) I updated the post accordingly. – Peter Szanto Oct 08 '14 at 08:58
  • Aha, `project`! That's exactly what I needed, thanks! I'd +1 you but apparently it takes 15 rep. – teryret Oct 08 '14 at 12:43