I'm trying to upload a file through an XHR request using the PUT method with Sinatra. My first idea was to upload the file and write the stream directly into MongoDB GridFS:

@fs.open("test.iso", "w") do |f|
  f.write request.body.read
end 

It works, but it loads the entire file into RAM before writing it to GridFS. I'd like to avoid that by streaming the upload into GridFS continuously, without ever holding the whole file in memory. For huge files (1 GB or more) reading everything first is clearly bad practice because of the RAM consumption.

How can I do that?

EDIT: The method from "Can I have Sinatra / Rack not read the entire request body into memory?" creates a Tempfile. I want to work only with streams to keep memory consumption low on the server side, the way php://input does in PHP.

EDIT 2:

Here is how I'm currently handling it:

Bahaïka
  • possible duplicate of [Can I have Sinatra / Rack not read the entire request body into memory?](http://stackoverflow.com/questions/3027564/can-i-have-sinatra-rack-not-read-the-entire-request-body-into-memory) – Uri Agassi Jul 11 '14 at 15:17
  • Sorry about the confusion, I think the above will help you.. – Uri Agassi Jul 11 '14 at 15:17
  • I don't see where the answer in the referenced question creates a temp file... All it does is moves the underlying stream to where Sinatra won't read it, leaving it to your code. – Uri Agassi Jul 11 '14 at 16:22
  • It does create a tempfile; I've run some tests. – Bahaïka Jul 11 '14 at 16:26
  • then I suspect that a regular POST does the same... – Uri Agassi Jul 11 '14 at 16:29
  • I don't see the issue with streaming from the file? It is memory efficient. – Matt Jul 11 '14 at 16:31
  • I haven't tried with POST, but in Ruby we can reopen classes. Wouldn't it be possible to reopen the Rack class and modify it? – Bahaïka Jul 11 '14 at 16:36
  • Storing a file to disk to recopy it into mongodb is not efficient. – Bahaïka Jul 11 '14 at 16:37
  • I only said it was memory efficient. I think it's the standard way Rack handlers (i.e. web servers) pass large data back to Rack. In this case you'd need to modify Puma to allow more data in the body of a request but as that's not normal you may run into other memory issues. – Matt Jul 11 '14 at 16:53
  • You might be able to convince the web server to write to a named pipe (FIFO) that you can then read from as you would the file, if you are really concerned about the IO. https://github.com/shurizzle/ruby-fifo – Matt Jul 11 '14 at 17:15
  • I don't get how a FIFO would solve this problem. Can you describe the full process, please? – Bahaïka Jul 11 '14 at 21:14
  • This is a known issue with the Rack spec: [“`rewind` must be called without arguments. It rewinds the input stream back to the beginning. It must not raise `Errno::ESPIPE`: that is, it may not be a pipe or a socket. Therefore, handler developers must buffer the input data into some rewindable object if the underlying input stream is not rewindable.”](http://rubydoc.info/github/rack/rack/master/file/SPEC#The_Input_Stream) ... – matt Jul 12 '14 at 10:34
  • The server itself (i.e. Puma here) will be buffering the input (likely in a `StringIO`, or tempfile if it is too big). You could patch the server to avoid this, but you may have issues with middleware. – matt Jul 12 '14 at 10:36

1 Answer

It looks like Sinatra's streaming support only works in the opposite direction: it is meant for long-running GET responses sent out to the client, not for incoming request bodies.

If you do a multipart POST upload, Sinatra will write the data to a tempfile and give you the details in the params hash:

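require 'sinatra'
require 'fileutils'

post '/upload' do
  # For a multipart upload, params['file'] describes the uploaded part.
  tempfile = params['file'][:tempfile]  # Tempfile holding the uploaded data
  filename = params['file'][:filename]  # name sent by the client (unused here)
  # Move the tempfile into the app directory instead of reading it into memory.
  FileUtils.mv(tempfile.path, "test.iso")
  "posted"
end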

Then, from the same directory as the Sinatra app:

$ echo "testtest" > /tmp/file_to_upload
$ curl --form "file=@/tmp/file_to_upload" http://localhost:4567/upload
posted
$ cat test.iso
testtest
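
If the real concern is memory rather than the tempfile itself, the tempfile can be copied to its destination in fixed-size chunks instead of with a single `read`. The following is only a minimal sketch: `copy_in_chunks` and the 64 KB default chunk size are hypothetical choices of mine, not part of Sinatra, Rack or the Mongo driver, and it assumes the destination accepts repeated `write` calls.

```ruby
require 'stringio'

# Hypothetical helper: copy src to dst in fixed-size chunks so that at most
# chunk_size bytes of the payload are held in memory at a time.
# src must respond to read(length); dst must respond to write(string).
def copy_in_chunks(src, dst, chunk_size = 64 * 1024)
  total = 0
  # read(length) returns nil at end of input, which ends the loop.
  while (chunk = src.read(chunk_size))
    dst.write(chunk)
    total += chunk.bytesize
  end
  total
end

# Usage with in-memory stand-ins for the tempfile and the destination.
source = StringIO.new("testtest" * 1000)
target = StringIO.new
copied = copy_in_chunks(source, target, 1024)
puts copied
```

In the route above, the same helper could be called with `params['file'][:tempfile]` as the source and the handle yielded by the question's `@fs.open("test.iso", "w")` block as the destination, assuming that handle can be written to more than once.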
Matt
  • There must be a way to open a stream on the input, no? I mean, PHP can access it with php://input. Is there no way to access Rack at a lower level, to avoid writing a file and get a stream instead? – Bahaïka Jul 11 '14 at 15:50
  • Uri's duplicate suggestion shows a way to make Sinatra do its read on an empty IO and save the old IO for you to do what you want with. http://stackoverflow.com/a/3028194/1318694 – Matt Jul 11 '14 at 15:57
  • This method is creating a tempfile : `puts request.env['data.input'].inspect` returns : `` – Bahaïka Jul 11 '14 at 16:08
  • That is how Rack does it then. All that code does is move the variable around so Rack can't `.read` it into memory – Matt Jul 11 '14 at 16:21
  • Since classes can be reopened and modified, there must be a way, no? – Bahaïka Jul 11 '14 at 16:24