
I have code that writes files to S3. The code was working fine:

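    import json

    from boto.s3.connection import S3Connection
    from boto.s3.key import Key

    conn = S3Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
    # validate=False skips the extra request that checks the bucket exists
    bucket = conn.get_bucket(BUCKET, validate=False)
    k = Key(bucket)
    k.key = self.filekey
    k.set_metadata('Content-Type', 'text/javascript')
    k.set_contents_from_string(json.dumps(self.output))
    k.set_acl(FILE_ACL)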

This worked fine. Then I noticed I wasn't closing my connection, so I added this line at the end:

    conn.close()

Now the file is still written as before, but I'm seeing this error in my logs:

    S3Connection instance has no attribute '_cache', unable to write file 

Does anyone see what I'm doing wrong here, or know what's causing this? I noticed that none of the boto tutorials show people closing connections, but I know that, as a general rule, you should close your connections for I/O operations...

EDIT: When I comment out conn.close(), the error disappears.

Brad
  • I recently ran into the same conceptual fear... did you shed any light on this topic? – GarciadelCastillo Jan 31 '14 at 19:59
  • @GarciadelCastillo I didn't. Sorry. Just ended up commenting out the conn.close() and it worked as hoped. – Brad Feb 01 '14 at 20:31
  • So the solution would be ... to not close the connection? – Cyril N. May 23 '14 at 08:57
  • @CyrilN. Solution? Not sure. Maybe work around? I never added it as a solution because I don't know if it was the "right" thing to do. Just because it worked doesn't mean it was a smart thing to do. I can say it hasn't caused problems for me for the life of this project. – Brad May 23 '14 at 15:18
  • Ok, I'll take it as a working solution then :) Thanks for your help :) – Cyril N. May 23 '14 at 19:11

1 Answer


I can't find that error message in the latest boto source code, so unfortunately I can't tell you what caused it. Recently, we had problems when we were NOT calling conn.close(), so there definitely is at least one case where you must close the connection. Here's my understanding of what's going on:

S3Connection (well, its parent class) handles almost all connectivity details transparently, and you shouldn't have to think about closing resources, reconnecting, etc. This is why most tutorials and docs don't mention closing resources. In fact, I only know of one situation where you should close resources explicitly, which I describe at the bottom. Read on!

Under the covers, boto uses httplib. This client library supports HTTP/1.1 keep-alive, so it can (and should) keep the socket open and perform multiple requests over the same connection.
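To see keep-alive in action without touching S3, here is a small sketch. It assumes Python 3, where httplib is called http.client (boto 2 itself ran on Python 2's httplib), and it stands up a throwaway local HTTP/1.1 server in place of AWS. KeepAliveHandler and requests_share_one_socket are hypothetical names made up for this illustration. The helper makes several GET requests through one HTTPConnection and reports whether every request went over the same socket object.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    # HTTP/1.1 lets the server keep the socket open between requests.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        # Content-Length tells the client where the body ends, so the
        # server does not have to close the connection to signal it.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def requests_share_one_socket(n=3):
    """Make n GET requests over one HTTPConnection (hypothetical helper).

    Returns True if every request reused the same underlying socket.
    """
    server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        sockets = []
        for _ in range(n):
            conn.request("GET", "/")
            resp = conn.getresponse()
            resp.read()  # drain the body so the connection can be reused
            sockets.append(conn.sock)
        conn.close()
        return all(s is sockets[0] for s in sockets)
    finally:
        server.shutdown()
        server.server_close()


print(requests_share_one_socket())  # True
```

Draining each response with read() matters: http.client will not hand back the next response on a connection until the previous body has been read in full.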

AWS will close your connection (socket) for two reasons:

  1. According to the boto source code, "AWS starts timing things out after three minutes." Presumably "things" means "idle connections."
  2. According to Best Practices for Using Amazon S3, "S3 will accept up to 100 requests before it closes a connection (resulting in 'connection reset')."

Fortunately, boto works around the first case by recycling stale connections well before three minutes are up. Unfortunately, boto doesn't handle the second case quite so transparently:
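boto's own pooling code is not reproduced here; the sketch below is only a hypothetical illustration of the idea of recycling stale connections. IdleRecyclingConnection, its factory, max_idle and clock parameters, and the StubConnection stand-in are all invented for the example, and the 60-second threshold is an assumed value picked to sit well inside the three-minute window. Injecting the clock lets the example simulate idle time without sleeping.

```python
import time


class IdleRecyclingConnection:
    """Return a cached connection unless it has been idle longer than max_idle.

    Hypothetical helper for illustration, not boto's implementation.
    """

    def __init__(self, factory, max_idle=60.0, clock=time.monotonic):
        self._factory = factory
        self._max_idle = max_idle  # assumed value, well under three minutes
        self._clock = clock
        self._conn = None
        self._last_used = None

    def get(self):
        now = self._clock()
        if self._conn is not None and now - self._last_used > self._max_idle:
            # Too old: the server may be about to drop it, so close our end
            # and start fresh instead of risking a failed request.
            self._conn.close()
            self._conn = None
        if self._conn is None:
            self._conn = self._factory()
        self._last_used = now
        return self._conn


class StubConnection:
    """Offline stand-in for a real connection; counts how many were made."""

    created = 0

    def __init__(self):
        StubConnection.created += 1
        self.closed = False

    def close(self):
        self.closed = True


fake_now = [0.0]
pool = IdleRecyclingConnection(StubConnection, clock=lambda: fake_now[0])
a = pool.get()
fake_now[0] = 30.0
b = pool.get()   # idle 30 s: reused
fake_now[0] = 200.0
c = pool.get()   # idle 170 s: recycled
print(a is b, a is c, a.closed)  # True False True
```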

When AWS closes a connection, your end of the connection goes into CLOSE_WAIT, which means that the socket is waiting for the application to execute close(). S3Connection handles connectivity details so transparently that you cannot actually do this directly! It's best to prevent it from happening in the first place.
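CLOSE_WAIT is ordinary TCP behaviour rather than anything specific to boto, so it can be reproduced with plain sockets. In this sketch a local listener plays the part of S3 (an assumption for illustration; peer_close_leaves_half_open_socket is a hypothetical name). Once the listener closes its end, the client socket reads EOF, yet it stays open until the application itself calls close(); on Linux, netstat would list it in CLOSE_WAIT during that window.

```python
import socket


def peer_close_leaves_half_open_socket():
    """Show that a socket is not closed for us when the peer hangs up."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    server_side.close()                  # the "AWS" end hangs up (sends FIN)
    eof = client.recv(1)                 # b'' means the peer closed its end
    still_open = client.fileno() != -1   # ...but our socket is still open

    client.close()                       # only this releases our end
    listener.close()
    return eof, still_open, client.fileno()


print(peer_close_leaves_half_open_socket())  # (b'', True, -1)
```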

So, circling back to the original question of when you need to close explicitly: if your application runs for a long time, keeps a reference to (reuses) a boto connection for a long time, and makes many boto S3 requests over that connection (thus triggering a "connection reset" on the socket by AWS), you may find that more and more sockets are in CLOSE_WAIT. You can check for this condition on Linux by running netstat | grep CLOSE_WAIT. To prevent it, call boto's connection.close() explicitly before you've made 100 requests. We make hundreds of thousands of S3 requests in a long-running process, and we call connection.close() after every, say, 80 requests.
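One way to apply the close-every-80-requests advice is to route each request through a small wrapper that counts uses and swaps in a fresh connection at the limit. This is a sketch, not boto API: RecyclingConnection, its factory and max_requests parameters, and StubConnection are hypothetical, and the stub stands in for a real S3Connection so the example runs offline. It creates a new connection after closing the old one rather than assuming the closed object can be reused.

```python
class RecyclingConnection:
    """Close and replace a long-lived connection every max_requests uses.

    Hypothetical helper: factory is any zero-argument callable returning
    an object with a close() method.
    """

    def __init__(self, factory, max_requests=80):
        self._factory = factory
        self._max_requests = max_requests  # stay below S3's ~100-request limit
        self._conn = factory()
        self._count = 0

    def get(self):
        """Return the connection to use for the next request."""
        if self._count >= self._max_requests:
            # Close explicitly so our socket doesn't linger in CLOSE_WAIT,
            # then start over with a fresh connection.
            self._conn.close()
            self._conn = self._factory()
            self._count = 0
        self._count += 1
        return self._conn

    def close(self):
        self._conn.close()


class StubConnection:
    """Offline stand-in for S3Connection; only records whether it was closed."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


rc = RecyclingConnection(StubConnection, max_requests=80)
conns = [rc.get() for _ in range(200)]
print(len({id(c) for c in conns}))  # 3 connections: 80 + 80 + 40 requests
rc.close()
```

With boto 2 the factory might be lambda: S3Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY), with each request made through rc.get(), e.g. rc.get().get_bucket(BUCKET, validate=False). The wrapper counts calls to get(), not HTTP requests, so if one call leads to several requests (the question's upload followed by set_acl is two), a lower max_requests leaves more headroom.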

jtoberon
  • This doesn't answer my question per se because it didn't resolve the error BUT this is very informative in regards to using this tool so I'm gonna accept it as an answer. Thanks for the really good explanation of what's going on under-the-hood so to speak. – Brad Jul 25 '14 at 19:21
  • I am using boto3.client('s3'... should I change it somehow? – Serge Dec 27 '17 at 22:58