
I have an sklearn model and I want to save the pickle file to my S3 bucket using joblib.dump.

I used joblib.dump(model, 'model.pkl') to save the model locally, but I do not know how to save it to an S3 bucket.

s3_resource = boto3.resource('s3')
s3_resource.Bucket('my-bucket').Object("model.pkl").put(Body=joblib.dump(model, 'model.pkl'))

I expect the pickled file to end up in my S3 bucket.

the_dummy
  • does this result in an error? what is the behavior you are seeing? – JD D Jun 13 '19 at 01:01
  • joblib.dump returns a list of filenames... `Body` needs to be a bytes or a file-like object that can be read. – JD D Jun 13 '19 at 01:06
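Putting those two comments together: joblib.dump returns a list of filenames, so its return value can't be passed to Body, but you can dump into an in-memory buffer and upload that. A minimal sketch of the fix (it assumes model and the bucket 'my-bucket' from the question exist):

import io
import boto3
import joblib

# Dump the model into an in-memory buffer instead of onto disk
buffer = io.BytesIO()
joblib.dump(model, buffer)
buffer.seek(0)  # rewind so put() reads the payload from the start

s3_resource = boto3.resource('s3')
s3_resource.Bucket('my-bucket').Object('model.pkl').put(Body=buffer)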

4 Answers


Here's a way that worked for me. Pretty straightforward and easy. I'm using joblib (it's more efficient than pickle for sklearn models that contain large NumPy arrays), but you could use pickle too.
Also, I'm using temporary files for transferring to/from S3. But if you want, you could store the file in a more permanent location.

import tempfile
import boto3
import joblib

s3_client = boto3.client('s3')
bucket_name = "my-bucket"
key = "model.pkl"

# WRITE
with tempfile.TemporaryFile() as fp:
    joblib.dump(model, fp)
    fp.seek(0)
    s3_client.put_object(Body=fp.read(), Bucket=bucket_name, Key=key)

# READ
with tempfile.TemporaryFile() as fp:
    s3_client.download_fileobj(Fileobj=fp, Bucket=bucket_name, Key=key)
    fp.seek(0)
    model = joblib.load(fp)

# DELETE
s3_client.delete_object(Bucket=bucket_name, Key=key)
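One caveat with the WRITE step above: fp.read() loads the entire pickle into memory before uploading. For large models you could stream the temp file instead with boto3's upload_fileobj, which mirrors the download_fileobj call in the READ step (a sketch, reusing s3_client, bucket_name, and key from above):

# WRITE (streaming variant)
with tempfile.TemporaryFile() as fp:
    joblib.dump(model, fp)
    fp.seek(0)
    s3_client.upload_fileobj(Fileobj=fp, Bucket=bucket_name, Key=key)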
Wesley Cheek
Alexei Andreev

Use the following code to dump your model to an S3 location in .pkl or .sav format:

import tempfile
import boto3
import joblib

s3 = boto3.resource('s3')

# you can dump it in .sav or .pkl format 
location = 's3://bucket_name/folder_name/'
model_filename = 'model.sav'  # use any extension you want (.pkl or .sav)
OutputFile = location + model_filename

# WRITE
with tempfile.TemporaryFile() as fp:
    joblib.dump(scikit_learn_model, fp)
    fp.seek(0)
    # the bucket name goes to Bucket(); OutputFile (a plain string path) is used as the object key
    s3.Bucket('bucket_name').put_object(Key=OutputFile, Body=fp.read())
Sayali Sonawane
    I think location should contain just the folder name inside the bucket i.e `location = 'folder_name/'`. That is what worked for me – oldmonk May 06 '20 at 06:49

You can also use the s3fs library.

import joblib
import s3fs
import os

# Write
fs = s3fs.S3FileSystem()
output_file = os.path.join("s3://...", "model.joblib")

with fs.open(output_file, 'wb') as f:
    joblib.dump(clf, f) 

# Read
with fs.open(output_file, 'rb') as f:
    clf = joblib.load(f)
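By default s3fs picks up credentials from the same places boto3 does (environment variables, ~/.aws/credentials, IAM roles). If you need to pass keys explicitly, S3FileSystem accepts them directly; the values below are placeholders:

fs = s3fs.S3FileSystem(key='YOUR_ACCESS_KEY_ID', secret='YOUR_SECRET_ACCESS_KEY')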
nbeuchat

Just correcting Sayali Sonawane's answer:

import tempfile
import boto3
import joblib

s3 = boto3.resource('s3')

# you can dump it in .sav or .pkl format 
location = 'folder_name/' # THIS is the change to make the code work
model_filename = 'model.sav'  # use any extension you want (.pkl or .sav)
OutputFile = location + model_filename

# WRITE
with tempfile.TemporaryFile() as fp:
    joblib.dump(scikit_learn_model, fp)
    fp.seek(0)
    # the bucket name goes to Bucket(); OutputFile (folder + filename) is used as the object key
    s3.Bucket('bucket_name').put_object(Key=OutputFile, Body=fp.read())
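To load the model back with the same resource API, you can read the object body into an in-memory buffer (a sketch, reusing s3 and OutputFile from above):

import io

# READ
obj = s3.Bucket('bucket_name').Object(OutputFile).get()
scikit_learn_model = joblib.load(io.BytesIO(obj['Body'].read()))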
Andrii Krupka
Abhi