
I have a requirement to download a file from S3 based on the content of a message. In other words, the file to download isn't known in advance; I have to search for it and find it at runtime. `S3StreamingMessageSource` doesn't seem to be a good fit because:

  1. It relies on polling, whereas I need to wait for the message.
  2. I can't find any way to create an `S3StreamingMessageSource` dynamically in the middle of a flow. `gateway(IntegrationFlow)` looks interesting, but what I'd need is a `gateway(Function<Message<?>, IntegrationFlow>)`, which doesn't exist.

Another candidate is `S3MessageHandler`, but it has no support for listing files, which I need in order to find the desired file.

I can implement my own message handler using the AWS API directly; I'm just wondering if I'm missing something, because this doesn't seem like an unusual requirement. After all, not every app just sits there polling S3 for new files.

Abhijit Sarkar
  • Well, for `InputStream` you can still use `S3RemoteFileTemplate` and its `get()` function, or `S3Session.readRaw()` if you definitely need to return the stream. Yes, we may consider adding `InputStream` support to `S3MessageHandler`, but that's not a bug. – Artem Bilan Dec 29 '17 at 21:22
  • @ArtemBilan I looked at `get`, but it invokes `callback.doWithInputStream` and then closes the stream. Unlike with `S3StreamingMessageSource`, there is no chance for the message to be transmitted downstream; whatever has to be done must be done in the callback. I'm going down the path of implementing a smart filter that I can use with `S3StreamingMessageSource` so that I don't have to reinvent the wheel. – Abhijit Sarkar Dec 29 '17 at 21:28
  • OK, I see. Although that's not event-driven (it's still pollable), you can indeed call `receive()` manually. The `readRaw()` from the session should work for you. – Artem Bilan Dec 29 '17 at 21:34
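
The comments above turn on who owns the stream. A callback-style `get()` closes the stream as soon as the callback returns, so nothing can be sent downstream, whereas a `readRaw()`-style call hands the open stream to the caller, who must close it (with Spring Integration's real `Session`, `finalizeRaw()` is also meant to be called afterwards). The plain-Java sketch below models only that ownership difference; `InMemorySession` and `TrackingStream` are hypothetical stand-ins, not Spring or AWS types.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.function.Function;

class StreamOwnershipDemo {
    public static void main(String[] args) throws IOException {
        InMemorySession session = new InMemorySession(Map.of("data/a.json.gz", "payload-a"));

        // Callback style: a stream that escapes the callback is already closed.
        InputStream escaped = session.get("data/a.json.gz", in -> in);
        System.out.println("escaped stream closed? " + ((TrackingStream) escaped).closed); // true

        // Raw style: the caller receives an open stream and must close it.
        try (InputStream in = session.readRaw("data/a.json.gz")) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}

/** ByteArrayInputStream that records whether close() was called. */
class TrackingStream extends ByteArrayInputStream {
    boolean closed;

    TrackingStream(byte[] bytes) {
        super(bytes);
    }

    @Override
    public void close() {
        closed = true;
    }
}

/** Hypothetical in-memory stand-in for a remote file session; not a real Spring or AWS type. */
class InMemorySession {
    private final Map<String, String> objects;

    InMemorySession(Map<String, String> objects) {
        this.objects = objects;
    }

    /** Callback style (like get()): open, hand to the callback, close before returning. */
    <T> T get(String key, Function<InputStream, T> callback) throws IOException {
        try (InputStream in = readRaw(key)) {
            return callback.apply(in);
        }
    }

    /** Raw style (like readRaw()): the caller owns the returned stream. */
    InputStream readRaw(String key) throws IOException {
        String body = objects.get(key);
        if (body == null) {
            throw new IOException("No such object: " + key);
        }
        return new TrackingStream(body.getBytes(StandardCharsets.UTF_8));
    }
}
```

In a real flow the raw stream would become the message payload, so whichever downstream component consumes it takes over the duty of closing it.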

2 Answers


There is `S3RemoteFileTemplate` with a `list()` function, which you can call from `handle()`. Then `split()` the result and call `S3MessageHandler` for each remote file to download.

Note that `S3MessageHandler` can also download a whole remote directory.

Artem Bilan
  • I looked, and sadly it won't work. `S3MessageHandler` has no facility to stream the file, only to download it. This seems like a bug, because it should be able to create a `Message` when the payload is not a `File`. – Abhijit Sarkar Dec 29 '17 at 20:10
  • I made it work, see my answer. Also see [issue-82](https://github.com/spring-projects/spring-integration-aws/issues/82) – Abhijit Sarkar Dec 31 '17 at 08:20

For anyone coming across this question, this is what I did. The trick is to:

  1. Set the filters later, not at construction time. Note that there is no `addFilters` or `getFilters` method, so filters can only be set once and can't be added later. @artem-bilan, this is inconvenient.
  2. Call `S3StreamingMessageSource.receive()` manually.

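    .handle(String.class, (fileName, headers) -> {
        // messageSource, metadataStore and log are fields defined elsewhere
        if (messageSource instanceof S3StreamingMessageSource) {
            S3StreamingMessageSource s3StreamingMessageSource = (S3StreamingMessageSource) messageSource;

            // ChainFileListFilter hands each filter only what the previous one kept
            ChainFileListFilter<S3ObjectSummary> chainFileListFilter = new ChainFileListFilter<>();
            chainFileListFilter.addFilters(
                    new S3SimplePatternFileListFilter("**/*/*.json.gz"),
                    // custom filter (not shown) matching the file name from the payload
                    new S3FileListFilter(fileName),
                    // last, because it records every object it accepts in metadataStore;
                    // earlier in the chain it would also mark files other messages ask for as seen
                    new S3PersistentAcceptOnceFileListFilter(metadataStore, "")
            );
            s3StreamingMessageSource.setFilter(chainFileListFilter);

            // poll once, on demand; returns null when every object was filtered out
            return s3StreamingMessageSource.receive();
        }
        log.warn("Expected: {} but got: {}.",
                S3StreamingMessageSource.class.getName(), messageSource.getClass().getName());
        return messageSource.receive();
    }, spec -> spec
        .requiresReply(false) // in case all messages got filtered out
    )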
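
`S3FileListFilter(fileName)` in the code above is not shown in the answer. A minimal plain-Java sketch of what such a filter might do, assuming it keeps only objects whose key ends in the requested file name: `ObjectSummary` and `FileNameFilter` are hypothetical stand-ins, and the real `FileListFilter#filterFiles` takes an array of `S3ObjectSummary` rather than a `List`.

```java
import java.util.ArrayList;
import java.util.List;

class FileNameFilterDemo {
    public static void main(String[] args) {
        List<ObjectSummary> listing = List.of(
                new ObjectSummary("in/2017/12/orders.json.gz"),
                new ObjectSummary("in/2017/12/refunds.json.gz"));
        List<ObjectSummary> hits = new FileNameFilter("orders.json.gz").filterFiles(listing);
        System.out.println(hits.get(0).key()); // in/2017/12/orders.json.gz
    }
}

/** Hypothetical stand-in for S3ObjectSummary; only the key is modeled. */
final class ObjectSummary {
    private final String key;

    ObjectSummary(String key) {
        this.key = key;
    }

    String key() {
        return key;
    }
}

/**
 * Hypothetical filter keeping objects whose last key segment equals the requested
 * file name. Uses a List for simplicity; Spring's FileListFilter takes an array.
 */
final class FileNameFilter {
    private final String fileName;

    FileNameFilter(String fileName) {
        this.fileName = fileName;
    }

    List<ObjectSummary> filterFiles(List<ObjectSummary> files) {
        List<ObjectSummary> accepted = new ArrayList<>();
        for (ObjectSummary summary : files) {
            String key = summary.key();
            // Text after the last '/' (the whole key if there is no '/').
            String lastSegment = key.substring(key.lastIndexOf('/') + 1);
            if (lastSegment.equals(fileName)) {
                accepted.add(summary);
            }
        }
        return accepted;
    }
}
```

Inside a `ChainFileListFilter`, a filter like this only sees the keys that survived the filters placed before it.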
    
Abhijit Sarkar
  • The `MessageSource` is not designed to be reconfigured at runtime; you may hit race conditions when you access it concurrently. As for your missing `addFilter`, we can overcome that with a custom `ChainFileListFilter` implementation: inject only that one into the `S3StreamingMessageSource` and implement your own add/remove logic there. – Artem Bilan Apr 19 '18 at 14:45
  • I think you can still meet your `InputStream` requirement the natural way, without calling `MessageSource.receive()` manually. You need an `S3RemoteFileTemplate` and its `getSession()`. This `Session` has an `InputStream readRaw(String source)` API. – Artem Bilan Apr 19 '18 at 14:50
  • @ArtemBilan If I use `Session`, I'll have to do the filtering myself. I didn't see any filters in `S3RemoteFileTemplate`. Also, that won't help me with error handling discussed [here](https://stackoverflow.com/q/49911076/839733) – Abhijit Sarkar Apr 20 '18 at 17:00
  • Well, you use `S3RemoteFileTemplate.list()` from `.handle()` to get a list of S3 resources, then use `.filter()` on the split items, and then call the `Session.readRaw()` mentioned above. You can wrap this flow in a `.gateway()` and have a single point for error handling. – Artem Bilan Apr 20 '18 at 17:20
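
Artem's earlier suggestion, a chain whose filters can be added and removed after it has been injected once, might look like the plain-Java sketch below. `MutableFilterChain` is hypothetical and uses a simplified filter type (`UnaryOperator<List<F>>` instead of Spring's `FileListFilter`); like `ChainFileListFilter`, it feeds the survivors of each filter into the next and stops once nothing is left.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

class MutableChainDemo {
    public static void main(String[] args) {
        MutableFilterChain<String> chain = new MutableFilterChain<>();
        UnaryOperator<List<String>> gzOnly = keys -> keys.stream()
                .filter(k -> k.endsWith(".json.gz")).collect(Collectors.toList());
        UnaryOperator<List<String>> ordersOnly = keys -> keys.stream()
                .filter(k -> k.endsWith("/orders.json.gz")).collect(Collectors.toList());

        chain.addFilter(gzOnly);
        chain.addFilter(ordersOnly); // added after construction
        List<String> listing = List.of("a/orders.json.gz", "a/refunds.json.gz", "a/notes.txt");
        System.out.println(chain.filterFiles(listing)); // [a/orders.json.gz]

        chain.removeFilter(ordersOnly); // and removed again
        System.out.println(chain.filterFiles(listing)); // [a/orders.json.gz, a/refunds.json.gz]
    }
}

/**
 * Hypothetical chain whose filters can be added and removed after construction.
 * CopyOnWriteArrayList lets filterFiles() iterate a stable snapshot while
 * another thread adds or removes filters.
 */
class MutableFilterChain<F> {
    private final List<UnaryOperator<List<F>>> filters = new CopyOnWriteArrayList<>();

    void addFilter(UnaryOperator<List<F>> filter) {
        filters.add(filter);
    }

    boolean removeFilter(UnaryOperator<List<F>> filter) {
        return filters.remove(filter);
    }

    /** Feeds the survivors of each filter into the next; stops early once nothing is left. */
    List<F> filterFiles(List<F> files) {
        List<F> current = files;
        for (UnaryOperator<List<F>> filter : filters) {
            if (current.isEmpty()) {
                break;
            }
            current = filter.apply(current);
        }
        return current;
    }
}
```

The `CopyOnWriteArrayList` makes adding and removing filters safe while another thread is filtering, but it does not remove the race Artem mentions: if two messages are handled at once against one shared source, one message's file-name filter can end up applied to the other's `receive()`, so per-message filters still require processing one message at a time.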
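
The list, filter, then `readRaw()` flow from the last comment can be sketched in plain Java as below. `RemoteStore`, `InMemoryStore` and `findAndOpen` are hypothetical; in the real flow, `S3RemoteFileTemplate.list()`, the DSL's `.filter()` and `Session.readRaw()` would play these roles, and wrapping the steps in `.gateway()` would provide the single error-handling point that the `catch` block stands in for here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class ListFilterReadDemo {
    public static void main(String[] args) throws IOException {
        Map<String, String> bucket = new TreeMap<>(Map.of(
                "in/2017/orders.json.gz", "orders",
                "in/2017/refunds.json.gz", "refunds"));
        RemoteStore store = new InMemoryStore(bucket);
        Optional<InputStream> found = findAndOpen(store, "in/", key -> key.endsWith("/orders.json.gz"));
        if (found.isPresent()) {
            try (InputStream in = found.get()) {
                System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8)); // orders
            }
        }
    }

    /** list -> filter -> readRaw, with a single place where failures are handled. */
    static Optional<InputStream> findAndOpen(RemoteStore store, String prefix, Predicate<String> wanted) {
        try {
            List<String> matches = store.list(prefix).stream()
                    .filter(wanted)
                    .collect(Collectors.toList());
            if (matches.isEmpty()) {
                return Optional.empty(); // no reply, similar in spirit to requiresReply(false)
            }
            // first match only; the caller must close the returned stream
            return Optional.of(store.readRaw(matches.get(0)));
        } catch (IOException e) {
            // the single error-handling point for every step above
            throw new UncheckedIOException("Failed to fetch under prefix " + prefix, e);
        }
    }
}

/** Hypothetical minimal view of a remote store; not a Spring or AWS interface. */
interface RemoteStore {
    List<String> list(String prefix) throws IOException;

    InputStream readRaw(String key) throws IOException;
}

class InMemoryStore implements RemoteStore {
    private final Map<String, String> objects;

    InMemoryStore(Map<String, String> objects) {
        this.objects = objects;
    }

    @Override
    public List<String> list(String prefix) {
        return objects.keySet().stream().filter(k -> k.startsWith(prefix)).collect(Collectors.toList());
    }

    @Override
    public InputStream readRaw(String key) throws IOException {
        String body = objects.get(key);
        if (body == null) {
            throw new IOException("No such key: " + key);
        }
        return new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8));
    }
}
```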