
What I would like to achieve:

In Swift 3.0, I am trying to generate a large XML file and send it directly to a web server via an HTTP POST request. Because this XML file can get very large, I do not want to hold it entirely in memory, nor write it to disk first and then read it back line by line when sending it to the server.

I have implemented the class that generates the XML file in such a way that it can write to an OutputStream. This way, it doesn't matter whether that stream points to a file on disk, a Data object in memory, or (hopefully) the body of an HTTP POST request.

What I plan to do:

After scouring the (somewhat scarce) Swift documentation for the URLSession and Stream classes and their companions, I settled on a URLSession.uploadTask(withStreamedRequest:) task. This kind of task requires an InputStream to be supplied through one of the delegate methods:

urlSession(_ session: URLSession, task: URLSessionTask, needNewBodyStream completionHandler: @escaping (InputStream?) -> Void)

Within this callback, I bind an InputStream and OutputStream using Stream.getBoundStreams(), after which I pass the OutputStream to the class that generates the XML and return the InputStream from the delegate method. The delegate method thus looks as follows:

func urlSession(_ session: URLSession, task: URLSessionTask, needNewBodyStream completionHandler: @escaping (InputStream?) -> Void)
{
    //Create the input and output stream and bind them, so that what the 
    //output stream writes ends up in the buffer of the input stream.
    var input: InputStream? = nil
    var output: OutputStream? = nil

    let bufferSize: Int = 1024
    Stream.getBoundStreams(withBufferSize: bufferSize, inputStream: &input, outputStream: &output)

    //This part is not really important for you, it starts the generation of 
    //the XML, which is written directly to the output stream.
    let converter = DatabaseConverterXml(prettyPrint: false)
    let type = ConverterTypeSynchronization(progressAlert: nil)

    type.convert(using: converter, writingTo: [Writable.Stream(output!)])
    {
        successfull in
        print("Conversion Complete! Successfull: \(successfull)" )
    }

    //The input stream is then handed over via the 
    //completion handler of the delegate method.
    completionHandler(input!)
}

The problem I'm experiencing:

Sometimes the class generating the XML takes a little while before it writes the next line to the OutputStream. If the pause is long enough, the InputStream reads everything that has been written and its buffer becomes empty. When this happens, the URLSession framework (or perhaps the URLSessionUploadTask itself) somehow seems to think the request is finished and "submits" or "finalizes" it. This is a guess, however, as I am not sure of the inner workings of these classes (and the docs don't help me much). As a result, my web server receives an incomplete XML file and returns a 500 Internal Server Error.

My question:

Is there any way that I can stop the request from finalizing early? Preferably, I would like to "finalize" the input stream in the callback of the type.convert call, as I know with certainty at that point that no more writes will occur (and the OutputStream is in fact closed).

Bonus points:

Is this the right way to approach the problem I am trying to solve? Is there perhaps any way I can directly interact with a stream that writes to the HTTP body? I feel very lost in this URLSession framework and it has taken me a day and a half to get this far, so any advice is extremely appreciated. I'll buy anyone who is able to help me out with this a beer or two!

Thanks in advance for any help!

Edit 1:

As @dgatwood pointed out, some of the variables were not retained properly. I've made the following changes to make sure that they are:

var mInput: InputStream? = nil
var mOutput: OutputStream? = nil
var mConverter: DatabaseConverterXml? = nil
var mType: ConverterTypeSynchronization? = nil

func urlSession(_ session: URLSession, task: URLSessionTask, needNewBodyStream completionHandler: @escaping (InputStream?) -> Void)
{
    //Create the input and output stream and bind them, so that what the 
    //output stream writes ends up in the buffer of the input stream.
    let bufferSize: Int = 1024
    Stream.getBoundStreams(withBufferSize: bufferSize, inputStream: &mInput, outputStream: &mOutput)

    //This part is not really important for you, it starts the generation of 
    //the XML, which is written directly to the output stream.
    mConverter = DatabaseConverterXml(prettyPrint: false)
    mType = ConverterTypeSynchronization(progressAlert: nil)

    //mType and mConverter are optionals, so they must be unwrapped before use.
    mType!.convert(using: mConverter!, writingTo: [Writable.Stream(mOutput!)])
    {
        successfull in
        print("Conversion Complete! Successfull: \(successfull)" )
    }

    //The input stream is then handed over via the 
    //completion handler of the delegate method.
    completionHandler(mInput!)
}
Teun Kooijman
  • IIRC, when you use a body stream, the session uses chunked encoding (because it can't know the length ahead of time) and sends out chunks of data as it gets them, up until the writer closes the stream. Does your server support chunked encoding properly? Is your writer closing the stream early? – dgatwood Jan 11 '17 at 05:02
  • Yes, I think the server side is fine, as it works correctly for the Android application (where I am in control of the streams myself). Here, the URLSession framework is in control of the InputStream. The framework also takes care of opening the stream for me, so I suppose it is also closing it (too early, that is). If only I could find a way for the URLSession framework not to close the stream until I want it to, I think I'd be golden. Sadly the docs are very limited, and all the source code is simply a facade to the Objective-C code underneath. – Teun Kooijman Jan 11 '17 at 08:39
  • I'm not talking about that end of the stream. NSURLSession should not be closing the reading end of the pair until your code closes the writing end or the connection times out. Is something retaining all those objects you created in this function? – dgatwood Jan 11 '17 at 16:54
  • Yes, and the retaining object does in fact close the output stream at some point (after it has written all of the XML). It does not appear to make a difference, however, whether I close the stream or not - the same thing keeps happening either way. Where did you learn that the URLSession should not be closing the reading end of the pair until the writing end is closed or the connection times out? If there is some documentation that I have missed, I would love to know where I can find it! – Teun Kooijman Jan 11 '17 at 17:00
  • Where I learned this was from conversations with the engineering team when I wrote the docs. Just to clarify, I wasn't asking what was retaining the stream; obviously the converter object does that. The question was what is retaining the converter object? Have you tried adding a dealloc method to make sure it isn't going away early? – dgatwood Jan 11 '17 at 18:33
  • Ahh, I'm sorry - I misunderstood you then. It is indeed not the case that the converter object is retained by anything. It did not occur to me that ARC would deallocate it. I have just tried extracting the converter, type, input and output variables to be member properties of the class, and it seems now to write the data until the buffer is full, but the reading end doesn't seem to extract anything from it. After 10 or so seconds the request then times out. I'll edit the question to reflect the changes that I've made. Thanks a bunch anyway for getting me so far, I feel like an idiot for this! – Teun Kooijman Jan 11 '17 at 18:55
  • No worries. Now make sure you explicitly close the stream when done. – dgatwood Jan 11 '17 at 18:57
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/132938/discussion-between-teun-kooijman-and-dgatwood). – Teun Kooijman Jan 11 '17 at 19:03

1 Answer


Short answer after a little bit of follow-up in chat:

  • The object that does the writing wasn't being retained, so when it gets released, it releases the stream, which closes it.
  • Even though the object that does the writing checked hasSpaceAvailable properly, it didn't detect short writes (which happen when less space is available than the number of bytes being written), so data got lost on each write call.
  • The object that does the writing did not close the stream at the end.

These are, incidentally, pretty much the canonical things that folks do wrong when using stream-based networking APIs. I've made similar mistakes myself when working with related Foundation-level socket APIs.
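The short-write problem from the second bullet can be avoided by looping until every byte has been accepted. What follows is only a minimal sketch, not the converter's actual code: ByteSink, StreamWriteError, writeFully and ThrottledSink are hypothetical names introduced for illustration. The only real API it relies on is OutputStream.write(_:maxLength:), which returns the number of bytes actually written. The loop assumes it is acceptable to retry immediately, for example on a background thread where the write call blocks until space is available.

```swift
import Foundation

/// Hypothetical abstraction with the same shape as OutputStream.write(_:maxLength:),
/// so the loop can be exercised without a network connection.
protocol ByteSink {
    func write(_ buffer: UnsafePointer<UInt8>, maxLength len: Int) -> Int
}

// OutputStream already has a matching method, so the conformance is empty.
extension OutputStream: ByteSink {}

/// Hypothetical error type for this sketch.
enum StreamWriteError: Error {
    case failed      // write returned a negative value
    case noCapacity  // write returned 0 (nothing was accepted)
}

/// Keeps calling write until all bytes are accepted, advancing past
/// whatever each call actually took instead of assuming a full write.
func writeFully(_ bytes: [UInt8], to sink: ByteSink) throws {
    var offset = 0
    while offset < bytes.count {
        let written = bytes.withUnsafeBufferPointer { buf -> Int in
            sink.write(buf.baseAddress! + offset, maxLength: bytes.count - offset)
        }
        if written < 0 { throw StreamWriteError.failed }
        if written == 0 { throw StreamWriteError.noCapacity }
        offset += written
    }
}

/// Stand-in sink that accepts at most 3 bytes per call, to force short writes.
final class ThrottledSink: ByteSink {
    private(set) var received: [UInt8] = []
    func write(_ buffer: UnsafePointer<UInt8>, maxLength len: Int) -> Int {
        let accepted = min(len, 3)
        received.append(contentsOf: UnsafeBufferPointer(start: buffer, count: accepted))
        return accepted
    }
}

let payload = Array("<row id=\"1\"/>".utf8)
let sink = ThrottledSink()
try writeFully(payload, to: sink)
```

With a real stream pair, the same function would be called with the bound OutputStream, and the writer would call close() on that stream after the last chunk so the reading end sees the end of the body.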

IMO, it would make a lot more sense for the API to just buffer one object regardless of its length, and then send a space available message if it still had room in the socket's buffer. That wouldn't require any changes to existing clients, and would cause many fewer headaches... but I digress.
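Until the API behaves that way, similar buffering can be approximated on the client side. The following is a rough sketch under stated assumptions: PendingWriteQueue and its methods are hypothetical names, not part of Foundation. The idea is to queue whatever the stream has not accepted yet and offer it again on the next space-available event, dropping only the bytes that were actually taken.

```swift
import Foundation

/// Hypothetical helper: holds bytes the stream has not accepted yet.
struct PendingWriteQueue {
    private var pending: [UInt8] = []

    var isEmpty: Bool { return pending.isEmpty }

    /// Adds bytes to the end of the queue without touching the stream.
    mutating func enqueue(_ bytes: [UInt8]) {
        pending.append(contentsOf: bytes)
    }

    /// Offers the pending bytes to `write` once and removes only what was accepted.
    /// `write` has the shape of OutputStream.write(_:maxLength:).
    @discardableResult
    mutating func flushOnce(using write: (UnsafePointer<UInt8>, Int) -> Int) -> Int {
        guard !pending.isEmpty else { return 0 }
        let accepted = pending.withUnsafeBufferPointer { buf -> Int in
            write(buf.baseAddress!, buf.count)
        }
        if accepted > 0 {
            pending.removeFirst(min(accepted, pending.count))
        }
        return accepted
    }
}

// Stand-in for a stream that takes at most 4 bytes per space-available event.
var wire: [UInt8] = []
let limitedWrite: (UnsafePointer<UInt8>, Int) -> Int = { pointer, length in
    let count = min(length, 4)
    wire.append(contentsOf: UnsafeBufferPointer(start: pointer, count: count))
    return count
}

var queue = PendingWriteQueue()
queue.enqueue(Array("<database>".utf8))
while !queue.isEmpty {
    queue.flushOnce(using: limitedWrite)
}
```

In a real StreamDelegate, flushOnce would be called from stream(_:handle:) when the event is .hasSpaceAvailable, passing a closure that forwards to the output stream's write(_:maxLength:), and the stream would be closed once the queue is empty and the producer has finished.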

dgatwood