I have an application where users can upload text-based files (XML, CSV, TXT) that are persisted to S3. Some of these files are pretty big. A variety of operations need to be performed on the data in these files, so rather than reading them from S3 directly and occasionally having the request time out, I download the files locally and then run the operations on them.
Here's the code I use to download the file from S3. Upload is the name of the ActiveRecord model I use to store this information, and this method is an instance method on the Upload model:
def download
  basename = File.basename(self.text_file_name.path)
  filename = Rails.root.join(basename)
  host = MyFitment::Utility.get_host_without_www(self.text_file_name.url)
  Net::HTTP.start(host) do |http|
    f = open(filename)
    begin
      http.request_get(self.text_file_name.url) do |resp|
        resp.read_body do |segment|
          f.write(segment) # Fails when non-ASCII 8-bit characters are included.
        end
      end
    ensure
      f.close()
    end
  end
  filename
end
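One likely explanation, offered as a sketch rather than a definitive answer: read_body yields each chunk as a binary (ASCII-8BIT) string, and Rails normally sets Encoding.default_internal to UTF-8 (from config.encoding). With that setting, writing the chunks to a file opened in text mode makes Ruby try to transcode them to UTF-8, which fails on bytes such as 0x95. Opening the file in binary mode ('wb') writes the bytes untouched. Note also that open(filename) with no mode opens the file read-only, so it needs a write mode either way. The helpers save_chunks and download_to are hypothetical names; download_to also passes only the path and query (uri.request_uri) to request_get and turns on SSL for https URLs, which are assumptions about how the S3 URL looks.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: writes every chunk yielded by +chunks+ to +filename+.
# 'wb' opens the file for writing in binary mode, so the raw ASCII-8BIT
# bytes are written as-is instead of being transcoded to UTF-8.
def save_chunks(filename, chunks)
  File.open(filename, 'wb') do |f|
    chunks.each { |chunk| f.write(chunk) }
  end
  filename
end

# Hypothetical replacement for the download logic: streams the response
# body to disk without holding the whole file in memory.
def download_to(url, filename)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request_get(uri.request_uri) do |resp|
      resp.value # raises a Net::HTTP error unless the status is 2xx
      # enum_for(:read_body) lets save_chunks iterate over the body segments
      save_chunks(filename, resp.enum_for(:read_body))
    end
  end
  filename
end
```

Inside the model this could be called as download_to(text_file_name.url, Rails.root.join(basename)); File.open accepts the Pathname that Rails.root.join returns.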
The f.write(segment) line in download, marked with a comment, is where the write fails. Somehow this code treats every downloaded file as ASCII-8BIT encoded. How can I:
1) Check the encoding of a remote file like that, and
2) Download it and write it successfully?
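On question 1: there is no reliable way to detect an arbitrary file's encoding from its bytes alone, so any check is a heuristic. The sketch below assumes this order: use the charset parameter of the Content-Type header if one is present (S3 only returns whatever Content-Type was stored with the object, which may have no charset), otherwise accept UTF-8 if the bytes are valid UTF-8, otherwise fall back to Windows-1252. That fallback is a guess that suits files saved by Windows tools, where 0x95 is a bullet character. guess_text_encoding and read_as_utf8 are hypothetical names.

```ruby
# Hypothetical helper: picks an Encoding for the raw bytes of a downloaded file.
# content_type is the Content-Type header value, e.g. "text/csv; charset=utf-8".
def guess_text_encoding(bytes, content_type = nil)
  charset = content_type && content_type[/charset=["']?([^\s;"']+)/i, 1]
  if charset
    begin
      return Encoding.find(charset)
    rescue ArgumentError
      # Unknown charset name: ignore it and fall through to the byte checks.
    end
  end
  # Tag a copy of the bytes as UTF-8 and ask Ruby whether they are valid.
  return Encoding::UTF_8 if bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
  # Assumed fallback for legacy files; 0x95 is a bullet in Windows-1252.
  Encoding::Windows_1252
end

# Hypothetical helper: reads a downloaded file and returns its text as UTF-8.
def read_as_utf8(path, content_type = nil)
  bytes = File.binread(path)
  encoding = guess_text_encoding(bytes, content_type)
  text = bytes.force_encoding(encoding)
  # String#encode is a no-op when source and target match, so scrub UTF-8
  # input explicitly; otherwise transcode, replacing bad bytes with U+FFFD.
  return text.scrub if encoding == Encoding::UTF_8
  text.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end
```

With Net::HTTP, the header value is available as resp['Content-Type'] inside the request_get block.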
Here's the error that is happening with a particular file right now:
Encoding::UndefinedConversionError: "\x95" from ASCII-8BIT to UTF-8
from /Users/me/code/myapp/app/models/upload.rb:47:in `write'
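The same message can be reproduced in isolation. A binary (ASCII-8BIT) string carries no information about which character set its bytes came from, so Ruby refuses to map a byte above 0x7F, such as 0x95, to UTF-8. Once the bytes are tagged with the encoding they were actually written in (assumed here to be Windows-1252), the same conversion succeeds. conversion_error is a hypothetical helper for illustration.

```ruby
# Hypothetical helper: tries to transcode +str+ to UTF-8 and returns the
# error message if Ruby cannot map one of its characters, or nil on success.
def conversion_error(str)
  str.encode(Encoding::UTF_8)
  nil
rescue Encoding::UndefinedConversionError => e
  e.message
end

raw = "\x95".b # a single byte tagged ASCII-8BIT, like the chunks read_body yields
puts conversion_error(raw).inspect
puts conversion_error(raw.dup.force_encoding(Encoding::Windows_1252)).inspect
```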
Thank you for any help you can offer!