I have been reading the Spring XD documentation fairly heavily and can't really get to grips with two things I'd like to achieve in relation to Hadoop YARN.
Maybe they aren't supported yet or never will be, possibly because I am missing something that makes my scenarios nonsensical.
- In Hadoop YARN, the ApplicationMaster can request that containers be allocated on specific hosts or racks (i.e. data locality or 'rack awareness'). This allows processing to be performed close to where the data is stored on HDFS.
Can this kind of functionality be exposed as an evaluated property in a stream Deployment Manifest?
Note that I am not talking about partitioned streams where the same containers handle the same messages for all modules in the stream.
I want to have many instances of a module in the middle of the stream deployed on a set of containers - these containers would also be holding segments of my pre-existing large static data. For each invocation, I want the most appropriate module instance to be selected by an evaluated rule that maps the message being processed to the associated pre-existing large file stored on one of the containers.
- Is it possible to dynamically 'scale out' the deployment of one module across more containers once the stream is deployed? For instance, if one module proves to be a bottleneck after deployment, can the number of instances of that module be increased dynamically across more containers?
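To make the first scenario more concrete, here is a rough sketch in plain Java of the kind of routing rule I have in mind. None of this is Spring XD or YARN API: the class and method names (LocalityRouter, register, hostFor), the segment names and the host names are all made up for illustration, and I am assuming each message carries a key that identifies the data segment it needs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only: not part of Spring XD or YARN.
public class LocalityRouter {

    // Maps a data segment (e.g. a file name) to the host whose container holds it.
    private final Map<String, String> segmentToHost = new HashMap<>();

    // Records that the given segment is stored on the given host.
    public void register(String segment, String host) {
        segmentToHost.put(segment, host);
    }

    // Returns the host holding the segment, or empty if no host is known.
    public Optional<String> hostFor(String segment) {
        return Optional.ofNullable(segmentToHost.get(segment));
    }

    public static void main(String[] args) {
        LocalityRouter router = new LocalityRouter();
        router.register("segment-a.dat", "host1");
        router.register("segment-b.dat", "host2");

        // A message keyed by "segment-b.dat" should go to the instance on host2;
        // an unknown segment falls back to any instance.
        System.out.println(router.hostFor("segment-b.dat").orElse("any"));
        System.out.println(router.hostFor("segment-z.dat").orElse("any"));
    }
}
```

What I am asking is whether a rule like hostFor could be expressed in the deployment manifest, so that the module instance co-located with the data is the one that receives the message.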
Thanks,
Nick