
I have a simple piece of code, based on the Alpakka examples, which should download a file from S3 for further processing:

S3.download(bucket, file)
  .runWith(Sink.head)
  .flatMap {
    case Some((data, _)) =>
      data.map(_.utf8String).runWith(Sink.head).map(Some.apply)
    case None =>
      Future.successful(None)
  }

The problem is that the file content is getting truncated. The file size reported in the ObjectMetadata is correct, about 2 MB, so it isn't a huge file.

What I noticed is that when I use Sink.head, the file content runs from the beginning to the middle; if I change it to Sink.last, it runs from the middle to the end. Am I getting chunks of the file? But why don't they seem to be streamed?

I can't figure out what's happening or how to fix it. I believe the issue is the same as in this other question, which unfortunately has no answers.

Thanks

bolo

2 Answers


I've found the solution, and in the end it was very simple...

Just need to replace: data.map(_.utf8String).runWith(Sink.head).map(Some.apply)

with: data.map(_.utf8String).runWith(Sink.seq).map(_.mkString).map(Some.apply)

accumulating all the chunks of the file.
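
Put together, a minimal self-contained sketch of the fixed flow (assuming Akka 2.6 and Alpakka S3 1.x/2.x, where S3.download materializes an Option of a chunk Source plus the ObjectMetadata; the actor system name and the method name are placeholders):

import akka.actor.ActorSystem
import akka.stream.alpakka.s3.scaladsl.S3
import akka.stream.scaladsl.Sink
import scala.concurrent.Future

implicit val system: ActorSystem = ActorSystem("s3-example")
import system.dispatcher

// Collect every ByteString chunk of the object, decode each to UTF-8
// and concatenate, instead of keeping only the first chunk.
def downloadAsString(bucket: String, key: String): Future[Option[String]] =
  S3.download(bucket, key)
    .runWith(Sink.head)
    .flatMap {
      case Some((data, _)) =>
        data.map(_.utf8String).runWith(Sink.seq).map(chunks => Some(chunks.mkString))
      case None =>
        Future.successful(None)
    }

One caveat: decoding each chunk separately with _.utf8String can split a multi-byte UTF-8 character that happens to straddle a chunk boundary; concatenating the raw ByteStrings first and decoding once, as in the answer below, avoids that.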

Thanks

bolo
  • You can have a look at [Benji S3](https://zengularity.github.io/benji/s3/usage.html) (of which I'm a contributor) – cchantep Mar 18 '20 at 19:48

This approach took a lot of time in my case, since I needed to accumulate all the ByteStrings and then put them into a zip. Tested with a file size of 6 MB and it worked great.

Use

data.reduce(_ ++ _).map(byteStr => { logic... })
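
A minimal sketch of this variant, reusing the imports and assumptions from the sketch above; runReduce concatenates the raw chunks into a single ByteString before any processing (the zipping logic from the answer is elided, with UTF-8 decoding as a stand-in):

// Concatenate all ByteString chunks into one, then apply the processing
// logic (here just decoding, standing in for the zip logic).
def downloadConcatenated(bucket: String, key: String): Future[Option[String]] =
  S3.download(bucket, key)
    .runWith(Sink.head)
    .flatMap {
      case Some((data, _)) =>
        data.runReduce(_ ++ _).map(bytes => Some(bytes.utf8String))
      case None =>
        Future.successful(None)
    }

Because the chunks are concatenated as raw ByteStrings before any decoding, this variant cannot split multi-byte characters across chunk boundaries.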

rajatag03