I'm using rubyzip 1.2.0 with Ruby 2.2.1 to generate a zip file containing a single file (in this case, a Python script). The content file does not change, and the md5sum of the generated zip string remains the same, but once I write the zip string to a file and read it back, the length increases and the md5sum is different every time. This happens whether I use File.open(zip_file, 'wb') {} or IO.binwrite(zip_file, zip_string).
Just to add to the excitement, on OS X the zip string length and the written file size are different (and of course, the md5sums differ), but on Ubuntu 14.04 the sizes match and only the md5sums differ.
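One thing that may be worth ruling out for the size mismatch: String#length counts characters in the string's tagged encoding, whereas File.size counts bytes. If the zip string happens to be tagged UTF-8 (an assumption, not verified against rubyzip), any byte runs that form valid multi-byte sequences would count as one character each. A minimal sketch with made-up bytes, not real zip data:

```ruby
# Hypothetical sample bytes: "\xC3\xA9" is a valid two-byte UTF-8 sequence.
sample = "PK\xC3\xA9".dup.force_encoding(Encoding::UTF_8)

char_count = sample.length    # characters in the tagged encoding
byte_count = sample.bytesize  # raw bytes, which is what File.size reports

# Retagging the same bytes as binary makes length agree with bytesize.
binary_count = sample.b.length

puts "length: #{char_count}, bytesize: #{byte_count}, binary length: #{binary_count}"
```

Printing new_zip.bytesize and new_zip.encoding next to new_zip.length would show whether this is what is happening here.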
If I generate the file multiple times without a pause, the checksums are (generally) the same; if I put in the sleep, they differ, which makes me wonder whether rubyzip is writing a timestamp of some sort to the file.
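To check the timestamp theory directly, the modification time can be read out of the first local file header. In the ZIP format that header starts with the signature PK\x03\x04, followed by a 16-bit DOS time at byte offset 10 and a 16-bit DOS date at offset 12, both little-endian. The helper name first_entry_dos_time below is made up for illustration; it is a rough sketch that only looks at the first entry and ignores any extra fields.

```ruby
# Decode the DOS-format modification time stored in the first local file
# header of a zip archive. Hypothetical helper, first entry only.
def first_entry_dos_time(zip_bytes)
  bytes = zip_bytes.b # work on raw bytes regardless of the string's encoding
  unless bytes[0, 4] == "PK\x03\x04".b
    raise ArgumentError, 'data does not start with a local file header'
  end

  # Two little-endian 16-bit values: time at offset 10, date at offset 12.
  dos_time, dos_date = bytes[10, 4].unpack('vv')
  {
    year:   (dos_date >> 9) + 1980,
    month:  (dos_date >> 5) & 0x0f,
    day:    dos_date & 0x1f,
    hour:   dos_time >> 11,
    minute: (dos_time >> 5) & 0x3f,
    second: (dos_time & 0x1f) * 2 # stored in two-second units
  }
end
```

Calling p first_entry_dos_time(new_zip) on each iteration would show whether the stored time moves between runs. DOS times have two-second resolution, which would at least be consistent with back-to-back runs producing identical output.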
I'm probably just missing some nuance of Ruby binary file handling.
require 'zip'
require 'digest'

def update_zip_file(source_file)
  zip_file = source_file.sub(/py$/, 'zip')
  new_zip = create_lambda_zip_file(source_file)
  puts "Zip string length: #{new_zip.length}"

  # NOTE: this hashes the file currently on disk (left by the previous run),
  # not new_zip itself.
  md5_string = Digest::MD5.new
  md5_string.update IO.binread(zip_file)
  puts "Zip string MD5: #{md5_string.hexdigest}"

  # Overwrite the zip file in binary mode.
  File.open(zip_file, 'wb') do |f|
    puts "Updating #{zip_file}"
    f.write new_zip
  end

  # Re-read the file that was just written and report its size and checksum.
  puts "New file size: #{File.size(zip_file)}"
  md5_file_new = Digest::MD5.new
  md5_file_new.update IO.binread(zip_file)
  puts "New file MD5: #{md5_file_new.hexdigest}"
end

# Build an in-memory zip containing only source_file and return it as a string.
def create_lambda_zip_file(source_file)
  zip_file = source_file.sub(/py$/, 'zip')
  zip = Zip::OutputStream.write_buffer do |zio|
    zio.put_next_entry(File.basename(source_file))
    zio << File.read(source_file)
  end
  zip.string
end

(1..3).each do
  update_zip_file('test.py')
  sleep 2
end
Output on OS X:
Zip string length: 973
Zip string MD5: 2578d03cecf9539b046fb6993a87c6fd
Updating test.zip
New file size: 1019
New file MD5: 03e0aa2d345cac9731d1482d2674fc1e
Zip string length: 973
Zip string MD5: 03e0aa2d345cac9731d1482d2674fc1e
Updating test.zip
New file size: 1019
New file MD5: bb6fca23d13f1e2dfa01f93ba1e2cd16
Zip string length: 973
Zip string MD5: bb6fca23d13f1e2dfa01f93ba1e2cd16
Updating test.zip
New file size: 1019
New file MD5: 3d27653fa1662375de9aa4b6d2a49358
Output on Ubuntu 14.04:
Zip string length: 1020
Zip string MD5: 4a6f5c33b420360fed44c83f079202ce
Updating test.zip
New file size: 1020
New file MD5: 0cd8a123fe7f73be0175b02f38615572
Zip string length: 1020
Zip string MD5: 0cd8a123fe7f73be0175b02f38615572
Updating test.zip
New file size: 1020
New file MD5: 0a010e0ae0d75e5cde0c4c4ae098d436
Zip string length: 1020
Zip string MD5: 0a010e0ae0d75e5cde0c4c4ae098d436
Updating test.zip
New file size: 1020
New file MD5: e91ca00a43ccf505039a9d70604e184c
Any explanation or workaround? I want to make sure the zip file contents differ before rewriting the file.
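For reference, the check described here could look roughly like the sketch below, assuming the zip bytes themselves can be made reproducible (otherwise every comparison will report a change). The name write_if_changed is hypothetical. It compares the MD5 of the in-memory bytes against the MD5 of the file on disk, so the string's character encoding does not matter.

```ruby
require 'digest'

# Write new_bytes to path only when the file is missing or its contents
# differ. Returns true if the file was written, false if it was left alone.
def write_if_changed(path, new_bytes)
  new_digest = Digest::MD5.hexdigest(new_bytes)
  if File.exist?(path) && Digest::MD5.file(path).hexdigest == new_digest
    return false
  end

  File.binwrite(path, new_bytes) # binary mode, no newline translation
  true
end
```

With this in place, update_zip_file would reduce to something like write_if_changed(zip_file, new_zip).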
Edited to fix file md5sum and update output.
EDIT: In fact, rubyzip does put the current timestamp in each entry (why?). If I monkey-patch it so that I can manipulate the entry attributes, the zip string's md5sum remains constant.
# Monkey patch: expose the output stream's entries and each entry's time
# so the timestamp can be overridden.
module Zip
  class OutputStream
    attr_accessor :entry_set
  end

  class Entry
    attr_accessor :time
  end
end
...
def create_lambda_zip_file(source_file)
  zip_file = source_file.sub(/py$/, 'zip')
  zip = Zip::OutputStream.write_buffer do |zio|
    zio.put_next_entry(File.basename(source_file))
    zio << File.read(source_file)
    # Pin every entry's time to the source file's mtime instead of "now".
    zio.entry_set.each { |e| puts e.time = Zip::DOSTime.at(File.mtime(source_file).to_i) }
  end
  zip.string
end