
I am saving zip files to an AWS S3 bucket. I am now trying to create a C# .NET API that downloads a specified key from the bucket and returns it as the Content of an HttpResponseMessage.

I've referred to the following question to set up my response for zip files: "How to send a zip file from Web API 2 HttpGet".

I have modified the code in the previous question so that it instead reads from a TransferUtility stream.

The problem is that I get an error when trying to extract or view the downloaded file: (image: error message when attempting to extract zip)

The response I am getting back from the API looks like: (image)

The relevant code looks like:

[HttpGet, Route("GetFileFromS3Bucket")]
public HttpResponseMessage GetFileFromS3Bucket(string keyName)
{
    HttpResponseMessage response = new HttpResponseMessage();
    string bucketName = "myBucket";
    RegionEndpoint bucketRegion = RegionEndpoint.ARegion;
    IAmazonS3 s3Client;
    s3Client = new AmazonS3Client(bucketRegion);

    try
    {
        var fileTransferUtility = new TransferUtility(s3Client);
        var stream = fileTransferUtility.OpenStream(bucketName, keyName);
        response.Content = new StreamContent(stream);
        response.Content.Headers.ContentDisposition = new System.Net.Http.Headers.ContentDispositionHeaderValue("attachment");
        response.Content.Headers.ContentDisposition.FileName = keyName + ".zip";
        response.Content.Headers.ContentType = new System.Net.Http.Headers.MediaTypeHeaderValue("application/zip");
        response.StatusCode = HttpStatusCode.OK;
    }
    catch (Exception e)
    {
        response.Content = new StringContent("Something went wrong, error: " + e.Message);
        response.StatusCode = HttpStatusCode.InternalServerError;
    }

    return response;
}

Results of troubleshooting:

  • The file from the Web API comes out with nearly double the expected size based on what is in S3. This is consistent across different files
  • Changing the bucket to be publicly accessible did not help (setting since reverted to not allowing public access)
  • Changing the file type to XML did not display a nicely formatted error (there was a suggestion that you may receive an XML response if an error was provided from S3)
  • Saving the S3 stream directly to a local file resulted in the correct file size, so it seems safe to say the stream from S3 is not the problem

It appears that there is a problem with the way the HttpResponseMessage is handling the zip file. I'm unsure whether the problem is actually on the server side, or whether it is up to the client to parse the data and Swagger is simply incapable of doing that. Any help would be greatly appreciated.

Update 1 I do not believe this string is Base64 encoded, as the result I got from converting the stream to a string is the following: (image)

I've updated the code sample with the two lines showing the conversion from a stream to string.

Update 2 I've confirmed the issue is with how the response is handling the stream, or something in the response itself. Downloading the file stream from S3 and saving to a new file on the local computer resulted in a valid file that opened as expected.
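The local-file check described in Update 2 can be sketched roughly as follows. The helper names `SaveStreamToFile` and `FilesAreIdentical` are hypothetical and not part of the original code. In this scenario the source stream would be the one returned by `TransferUtility.OpenStream`, but the sketch accepts any readable `Stream` so that it does not depend on the AWS SDK.

```csharp
using System;
using System.IO;
using System.Linq;

public static class StreamDiagnostics
{
    // Copies a readable stream to a file and returns the number of bytes written.
    public static long SaveStreamToFile(Stream source, string path)
    {
        using (var target = File.Create(path))
        {
            source.CopyTo(target);
            return target.Length;
        }
    }

    // Byte-for-byte comparison. Both files are loaded into memory,
    // so this is only meant for small test archives.
    public static bool FilesAreIdentical(string firstPath, string secondPath)
    {
        return File.ReadAllBytes(firstPath).SequenceEqual(File.ReadAllBytes(secondPath));
    }
}
```

Comparing the file saved this way against the file downloaded through the API shows whether the bytes change before or after they reach the response.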

Update 3 Link to GDrive folder with testing files: https://drive.google.com/drive/folders/1q_N3NTHz5E_nebtBQJHor3HfqUZWhGgd?usp=sharing I unfortunately can't provide access to the original file as it contains sensitive data. The provided files are still causing the same problem however. Interesting to note that the test file came out looking like:

(image: file name shown with underscores on either side)

The underscores on either side of the filename are quite strange.

I am running the following relevant packages:

Update 4 I've found the following UTF8 references in various files:

File: configuration91.svcinfo (image: possible UTF-8 encoding issue)

I could not find anything that said anything about 'responseEncoding' anywhere in the project.

Stevo
  • Looks like the file is either GZIP or a Base64 string. Both would be larger than the original file since the binary is being packed into readable ASCII characters. – jdweng Aug 03 '20 at 12:36
  • @jdweng Would this cause the file to become corrupt? If so, how can I ensure that the output from S3 (or from the Web API, I'm unsure where it would be breaking) is in the .zip format I'd be hoping to provide in my response content? – Stevo Aug 03 '20 at 12:41
  • If you are getting a Base64 string then you need to use byte[] data = Convert.FromBase64String(string) and then save bytes as binary to a file. – jdweng Aug 03 '20 at 12:45
  • @jdweng Cheers for the reply, I've done some testing and can confirm that the data being returned is neither GZIP nor a Base64 string. I'm a bit concerned that perhaps there is an issue with the way that the AWS library is handling the data. – Stevo Aug 05 '20 at 11:14
  • Best to look at the data in a sniffer like Wireshark or Fiddler. The triangles with question marks usually indicate you are using the wrong encoding method. With HTTP there are lots of ways to format messages. 1) Usually HTTP is text and binary must be converted with Base64 string 2) You can have GZIP which is compressed and uses Base64 string 3) You can have a MIME attachment which can be binary. You can have HTTP 1.0 which is stream mode where you get everything in one chunk and you can have HTTP 1.1 which is chunk mode and you need to send a Next Chunk message to get each chunk. – jdweng Aug 05 '20 at 11:46
  • You have an attachment which is GZIP using chunk mode (not the GZIP in a body). – jdweng Aug 05 '20 at 11:47
  • @Stevo the stream is being read once by the reader, which will move the pointer to the end of the stream. The stream is then passed as content to the response. With the pointer at the end of the stream the .response content will be an invalid zip file. – Nkosi Aug 05 '20 at 23:54
  • @Nkosi My apologies, that is actually not in my current code, but you are quite right. I will remove it for clarity. The result is actually that the file size is twice as large as expected, which jdweng believes may have something to do with it being recompressed or something similar. – Stevo Aug 05 '20 at 23:56
  • @Stevo what framework version are you using, Web API 2.x or Core Web API? – Nkosi Aug 06 '20 at 00:01
  • @Stevo update the shown example with the action/method that encapsulates the code. – Nkosi Aug 06 '20 at 00:06
  • @Nkosi I have updated the code sample. I am NOT using Core Web API. – Stevo Aug 06 '20 at 00:34
  • I am late to this party, but if you look at the dump, it starts with `PK`, which are the initials of Phil Katz, the inventor of the Zip file. So, it's not coming down encoded with Base64 or compressed with anything like gzip... it is a genuine Zip file. – Andy Aug 06 '20 at 02:46
  • If you do a small zip -- like a 5KB zip, does it still double in size? If you could do that, then post the original zip file along with the one that gets "doubled in size", I am pretty sure I could tell you what's wrong. – Andy Aug 06 '20 at 03:38
  • Also check this link: https://stackoverflow.com/a/49361668/2599508 see if it helps – John Aug 06 '20 at 10:08
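The `PK` observation in the comments can be turned into a quick programmatic check. The following is a minimal sketch; the class and method names are hypothetical. It looks only at the first four bytes, so it can confirm that the payload is not Base64 or gzip, but it cannot prove that the rest of the archive is intact: the signature bytes are all in the ASCII range, so a file that was damaged by a text encoding step would still pass.

```csharp
using System;
using System.IO;

public static class ZipSignature
{
    // Returns true when the buffer starts with a zip signature:
    // "PK\x03\x04" (local file header) or "PK\x05\x06" (empty archive).
    public static bool LooksLikeZip(byte[] header)
    {
        if (header == null || header.Length < 4) return false;
        if (header[0] != 0x50 || header[1] != 0x4B) return false; // 'P', 'K'

        bool localFileHeader = header[2] == 0x03 && header[3] == 0x04;
        bool emptyArchive = header[2] == 0x05 && header[3] == 0x06;
        return localFileHeader || emptyArchive;
    }

    // Reads the first four bytes of a file and applies the same check.
    public static bool FileLooksLikeZip(string path)
    {
        var header = new byte[4];
        using (var file = File.OpenRead(path))
        {
            int read = file.Read(header, 0, header.Length);
            return read == header.Length && LooksLikeZip(header);
        }
    }
}
```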

1 Answer


I am going to throw an answer up, because what's happening to you is unorthodox. I use S3 for many things and have done what you are doing with no problems in the past. To ensure that I am mimicking what you are doing, I duplicated your code:

[HttpGet, Route("GetFileFromS3Bucket/{keyName}")]
public HttpResponseMessage GetFileFromS3Bucket(string keyName)
{
    string bucketName = "testzipfilesagain";
    string awsAccessKey = "AKIAJ********A3QHOUA";
    string awsSecretKey = "IYUJ9Gy2wFCQ************dCq5suFS";

    IAmazonS3 client = new AmazonS3Client(awsAccessKey, awsSecretKey, RegionEndpoint.USEast1);

    var fileTransferUtility = new TransferUtility(client);
    var stream = fileTransferUtility.OpenStream(bucketName, "md5.zip");

    var resp = new HttpResponseMessage();

    resp.Content = new StreamContent(stream);
    resp.Content.Headers.ContentDisposition = new System.Net.Http.Headers.ContentDispositionHeaderValue("attachment");
    resp.Content.Headers.ContentDisposition.FileName = keyName + ".zip";
    resp.Content.Headers.ContentType = new System.Net.Http.Headers.MediaTypeHeaderValue("application/zip");
    resp.StatusCode = HttpStatusCode.OK;

    return resp;
}

These are the packages I have installed:

  <ItemGroup>
    <PackageReference Include="AWSSDK.S3" Version="3.3.111.37" />
    <PackageReference Include="Microsoft.AspNetCore.Mvc.WebApiCompatShim" Version="2.2.0" />
    <PackageReference Include="Swashbuckle.AspNetCore" Version="5.5.1" />
  </ItemGroup>

Everything runs perfectly well.

Trying to troubleshoot your code is going to be fruitless because it works perfectly fine, but there is something wrong with your environment.

So this isn't an answer to your question, but an answer to how you can try to solve the issue at hand and get past this.

  1. Make sure your nuget packages are up to date
  2. Do you have any middleware injected in your pipeline? If so, what?
  3. Post your startup.cs -- maybe something is out of order in your Configure routine.
  4. Could you start a brand new project and try your code in that?
  5. Can you try a small 5KB zip file and post the original and the corrupt so we can look?
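One further experiment in the spirit of the steps above, offered as an assumption rather than something from the original thread: buffer the stream fully and return it as `ByteArrayContent`, so that the response carries an explicit length that can be compared with the object size in S3. The class name `ZipResponseBuilder` is hypothetical, and buffering the whole file in memory is only reasonable for small test archives. This is a diagnostic sketch, not a fix.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;

public static class ZipResponseBuilder
{
    // Reads the whole source stream into memory and wraps it in a response.
    // ByteArrayContent knows its length up front, which makes size
    // mismatches easy to spot in the response headers.
    public static HttpResponseMessage FromStream(Stream source, string fileName)
    {
        byte[] bytes;
        using (var buffer = new MemoryStream())
        {
            source.CopyTo(buffer);
            bytes = buffer.ToArray();
        }

        var response = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new ByteArrayContent(bytes)
        };
        response.Content.Headers.ContentType = new MediaTypeHeaderValue("application/zip");
        response.Content.Headers.ContentDisposition = new ContentDispositionHeaderValue("attachment")
        {
            FileName = fileName
        };
        return response;
    }
}
```

If the downloaded file is still larger than the Content-Length reported here, the change is happening after the action returns, for example in a message handler or on the client.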

I would love to get to the bottom of this as I really like to solve these types of problems.


EDIT 1

So I looked at the zip files and they have been run through a UTF8 encoding process. So, if you take your original zip file, and run this code on it:

    var goodBytes = File.ReadAllBytes("Some test to upload to S3.zip");
    var badBytes = File.ReadAllBytes("_Some test to upload to S3.zip.zip_");

    File.WriteAllText("Some test to upload to S3.zip.utf8", Encoding.UTF8.GetString(goodBytes));
    var utf8EncodedGoodBytes = File.ReadAllBytes("Some test to upload to S3.zip.utf8");

    var identical = badBytes.SequenceEqual(utf8EncodedGoodBytes);

The result is:

bad bytes

I am going to do some research and figure out what could be causing your stream to become UTF-8 encoded. Is there anything in your config that looks like this? Can you search your entire solution for anything that resembles "utf" or "utf8" or "utf-8"?
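To illustrate why a UTF-8 pass damages binary data, here is a minimal in-memory sketch of the same round trip, without files. The class name `Utf8RoundTrip` is hypothetical. `Encoding.UTF8` replaces byte sequences that are not valid UTF-8 with U+FFFD, which is then written back out as the three bytes EF BF BD, so the output grows and the original bytes cannot be recovered.

```csharp
using System;
using System.Text;

public static class Utf8RoundTrip
{
    // Decodes arbitrary bytes as UTF-8 text and encodes the text again.
    // ASCII bytes survive unchanged; invalid sequences are replaced.
    public static byte[] Apply(byte[] original)
    {
        string decoded = Encoding.UTF8.GetString(original);
        return Encoding.UTF8.GetBytes(decoded);
    }
}
```

For example, passing the bytes 50 4B 03 04 FF through `Apply` keeps the `PK` header intact but turns the trailing FF into EF BF BD. Compressed data contains many bytes of 0x80 and above, so growth of this kind would be consistent with the file coming out at nearly double its expected size.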

Andy
  • Hi Andy, thanks for the answer, I'll address all those points now and let you know the results. Thanks a tonne, I'm glad someone out there likes configuration stuff, I hate it as I'm prone to finding novel ways to bugger it up – Stevo Aug 06 '20 at 10:44
  • OK, to address these points: 1. Nugets were pretty out of date, scrub move on my part. This did not resolve the issue however 2. No middleware 3. I don't actually have a startup.cs as I am not running .NET Core (please correct me if I have misunderstood this entirely) 4. I'll let you know the results of this shortly 5. Link in Question – Stevo Aug 06 '20 at 11:44
  • This is looking very promising! Out of interest, how did you discover they were UTF8 encoded? Just wondering for my own troubleshooting in the future. Additionally, what encoding are the files meant to be initially? I am going to look to see if there is a way I can force the encoding type of the content for a given HttpResponseMessage – Stevo Aug 06 '20 at 22:13
  • @Stevo -- I opened it in a hex editor and saw the byte sequence `0xef 0xbf 0xbd` all over the file, which is the UTF-8 encoding of U+FFFD, the replacement character a decoder emits for invalid input. Anyway, you should really search your solution for the string "utf-8" and see if anything comes up in a config. – Andy Aug 06 '20 at 22:15
  • Oh my apologies, I updated my question and forgot to mention it. Please see the image and notes in Update 4 – Stevo Aug 06 '20 at 22:18
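The hex-editor check mentioned in the comments can also be automated. The following is a rough sketch with a hypothetical class name. It counts occurrences of the byte sequence EF BF BD. A genuine zip file could contain that sequence by chance, so a handful of hits proves little, while a count in the thousands on a small archive is a strong hint that the data went through a UTF-8 decode and encode step.

```csharp
using System;

public static class ReplacementCharScanner
{
    // Counts non-overlapping occurrences of EF BF BD,
    // the UTF-8 encoding of U+FFFD.
    public static int CountReplacementSequences(byte[] data)
    {
        int count = 0;
        for (int i = 0; i + 2 < data.Length; i++)
        {
            if (data[i] == 0xEF && data[i + 1] == 0xBF && data[i + 2] == 0xBD)
            {
                count++;
                i += 2; // skip the rest of the matched sequence
            }
        }
        return count;
    }
}
```

A typical use would be `ReplacementCharScanner.CountReplacementSequences(File.ReadAllBytes(path))` on both the original and the downloaded file.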