
G'day people,

I am re-implementing an existing custom file upload service in Ruby (Sinatra) with Redis as a backing store.

- Client: calculates a SHA1 hash of the file, initiates an upload, and then sends chunks of at most 64K until finished.
- Server: appends each chunk to a file and, once the upload completes, calculates the SHA1 hash of the complete file to verify correct receipt (see the rough sketch below).
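
On the server side, the current flow looks roughly like this (a minimal Sinatra sketch only; the route names, temp path and sha1 parameter are illustrative, not our actual service code):

require 'sinatra'
require 'digest'

# Append each incoming chunk to a temp file for this upload.
put '/uploads/:id/chunks' do
  File.open("/tmp/#{params[:id]}.part", 'ab') { |f| f.write(request.body.read) }
  status 204
end

# When the client signals completion, hash the WHOLE file and compare
# against the SHA1 the client calculated up front.
post '/uploads/:id/complete' do
  sha1 = Digest::SHA1.file("/tmp/#{params[:id]}.part").hexdigest
  sha1 == params[:sha1] ? 'OK' : halt(422, 'checksum mismatch')
end

That final Digest::SHA1.file call is the full re-read I want to avoid for multi-GB uploads.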

Now, what I am hoping to do is use Ruby's (1.9.3) Digest::SHA1 << (update) operator on each chunk, rather than having to read the ENTIRE file from scratch at the end. [Large files, > 1 GB.]
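
For reference, this is the incremental interface I mean (a tiny self-contained sketch, not the service code); the catch is that in the real service each chunk arrives in a separate HTTP request, so the digest object would have to be persisted (e.g. in Redis) between calls:

require 'digest'

digest = Digest::SHA1.new
['first chunk', 'second chunk'].each do |chunk|
  digest << chunk   # << is an alias for #update
end
puts digest.hexdigest
# => same value as Digest::SHA1.hexdigest('first chunk' + 'second chunk')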

Unfortunately, Digest::SHA1 and Marshal.dump aren't compatible:

1.9.3p125 :001 > require 'digest'
 => true 
1.9.3p125 :002 > $digest = Digest::SHA1.new
 => #<Digest::SHA1: da39a3ee5e6b4b0d3255bfef95601890afd80709> 
1.9.3p125 :003 > marshalled_digest = Marshal.dump($digest)
TypeError: no _dump_data is defined for class Digest::SHA1
    from (irb):3:in `dump'
    from (irb):3
    from /Users/rhodry/.rvm/rubies/ruby-1.9.3-p125/bin/irb:16:in `<main>'
1.9.3p125 :004 > 

Does anyone have any ideas on how to:

  1. Get access to the underlying memory (manipulated in C) and store / restore an object like that?
  2. Obtain an alternative implementation that would allow a similar use-case?

Thanks,

parameme

Update: gist:2280705 implements option 1 using Ruby FFI; I hope it is useful to someone else.


1 Answer


Have you considered, and are you able to, send the SHA1s of the individual 64K chunks? There would be more checksum data, but you would know exactly where things went wrong, and there would be no need to store the internal state of the digest.
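
Something along these lines, purely as a sketch (the helper name and chunk size are illustrative):

require 'digest'

# Client sends a SHA1 alongside each 64K chunk; the server verifies the
# chunk immediately and never needs to keep digest state between requests.
def chunk_ok?(chunk_data, expected_sha1)
  Digest::SHA1.hexdigest(chunk_data) == expected_sha1
end

chunk = 'a' * 65_536
chunk_ok?(chunk, Digest::SHA1.hexdigest(chunk))   # => true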

Joshua Martell
  • Joshua - good thinking, yes we are considering the per-chunk hash, and a revision of the upload protocol is on our roadmap. Unfortunately we have a myriad of down-level client installations (around 5,000 or so), so our thought bubble above was a minor server-side performance feature while we gather a completely new client release together. Thanks for your feedback! – parameme Mar 29 '12 at 04:24