
I'm building a PDF parser that fires off a Sidekiq worker to OCR and parse data from a document stored in S3. After parsing, the data is stored in the Document model.

How do I attach a file that already exists in the S3 bucket with Document.attachment.attach in ActiveStorage, without duplicating it in S3 (e.g. by re-uploading it via File.open)?

Shelby S
  • I quickly read ActiveStorage's source and unfortunately I don't think it is possible yet. I would like this feature as well. – Pak Sep 30 '18 at 13:54
  • Moreover, this issue https://github.com/rails/rails/issues/30819 is not going in the right direction. – Pak Sep 30 '18 at 13:57

2 Answers

This can be done with a slight manipulation of the blob after it is created.

storage.yml

amazon:
  service: S3
  access_key_id: <%= ENV['AWS_ACCESS_KEY_ID'] %>
  secret_access_key: <%= ENV['AWS_SECRET_ACCESS_KEY'] %>
  region: <%= ENV['AWS_REGION'] %>
  bucket: <%= ENV['S3_BUCKET'] %>

app/models/document.rb

class Document < ApplicationRecord
  has_one_attached :pdf
end

rails console

key = "<S3 Key of the existing file in the same bucket that storage.yml uses>"

# Create an active storage blob that will represent the file on S3
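params = {
  filename: "myfile.jpg",
  content_type: "image/jpeg",
  byte_size: 1234,
  checksum: "<Base 64 encoding of the MD5 hash of the file's contents>"
}
# create_before_direct_upload! takes keyword arguments, so splat the hash
# (passing it positionally raises ArgumentError on Ruby 3)
blob = ActiveStorage::Blob.create_before_direct_upload!(**params)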

# By default, the blob's key (the S3 key, in this case) is a secure random token.
# Since the file is already on S3, change the key to match the existing object.
blob.update_attribute(:key, key)

# Now we can create a document object connected to your S3 file
d = Document.create!(pdf: blob.signed_id)

# in your view, you can now use
url_for d.pdf

At this point, you can use the pdf attribute of your Document object like any other active storage attachment.
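If a local copy of the same file is available, the byte_size and checksum values in the params hash do not have to be typed in by hand. A minimal sketch in plain Ruby, assuming you have a local path to a byte-identical copy of the S3 object; the method name blob_size_and_checksum is hypothetical. The checksum is the Base64-encoded MD5 of the file's contents, as described in the params comment.

```ruby
require "digest"

# Hypothetical helper: computes the byte_size and checksum values for
# ActiveStorage::Blob from a local copy of the file that already lives in S3.
# Assumes the local file is byte-for-byte identical to the S3 object.
def blob_size_and_checksum(path)
  {
    byte_size: File.size(path),                    # size in bytes
    checksum: Digest::MD5.file(path).base64digest  # Base64 of the MD5 digest
  }
end
```

The result can be merged into the rest of the parameters, e.g. `{ filename: "myfile.jpg", content_type: "image/jpeg" }.merge(blob_size_and_checksum(local_path))`, where local_path is whatever path your worker uses.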

Troy

Troy's answer worked great for me! I also found it helpful to pull the metadata from the S3 object itself. Something like:

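s3 = Aws::S3::Resource.new(region: "us-west-1")
obj = s3.bucket("my-bucket").object("myfile.jpg")

params = {
    filename: obj.key,
    content_type: obj.content_type,
    byte_size: obj.size,
    # S3's ETag is a hex MD5 (for single-part, non-KMS uploads), but ActiveStorage
    # expects a Base64-encoded MD5, so decode the hex and re-encode it
    checksum: [[obj.etag.delete('"')].pack("H*")].pack("m0")
}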

I only have 46 points so I left this as an answer instead of a comment :/

michaelmedford
  • Depending on your needs, you may want to use `File.basename(obj.key)` for the filename. – Troy Feb 26 '19 at 07:24
  • You can also use `s3 = ActiveStorage::Blob.service` instead of creating a new `Aws::S3::Resource` instance, and then `obj = s3.bucket.object("myfile.jpg")`. – Dmitry Ukolov Jun 28 '22 at 04:54
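Combining the metadata approach with the comments above, the conversion from S3 object metadata to blob parameters can be isolated in one small function. This is a sketch, not part of either answer: blob_params_from_s3 is a hypothetical name, and it assumes the object was uploaded in a single part without SSE-KMS, because only then is the ETag the hex MD5 of the contents. Multipart ETags look like "<hex>-<part count>", so the helper refuses them instead of storing a checksum that ActiveStorage would later fail to match. It takes plain values so it can be exercised without AWS.

```ruby
# Hypothetical helper: builds keyword arguments for
# ActiveStorage::Blob.create_before_direct_upload! from S3 object metadata.
# Assumption: single-part, non-KMS upload, so the ETag is a hex MD5.
def blob_params_from_s3(key:, content_type:, size:, etag:)
  hex = etag.delete('"') # S3 returns the ETag wrapped in double quotes
  unless hex.match?(/\A\h{32}\z/)
    raise ArgumentError, "ETag #{etag} is not a plain MD5 (multipart upload?)"
  end

  {
    filename: File.basename(key),          # "docs/scan.pdf" -> "scan.pdf"
    content_type: content_type,
    byte_size: size,
    checksum: [[hex].pack("H*")].pack("m0") # hex MD5 -> Base64 MD5
  }
end
```

In a Rails app this might be called as `blob_params_from_s3(key: obj.key, content_type: obj.content_type, size: obj.size, etag: obj.etag)`, with obj obtained through `ActiveStorage::Blob.service.bucket.object(...)` as suggested in the last comment.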