
I have a requirement to be able to terminate EC2 instances in under a minute.

The current process takes just under 2 minutes per instance because the OS shutdown process takes 60 seconds. I want to speed up terminations considerably, if possible.

Does anyone know of a way to speed up the terminate() function in EC2? Is there a way to "pull the plug" without a shutdown process as other virtualization solutions do?

Background:
In Boto, I call terminate() followed by wait_until_terminated() before handling subnet deletion and other follow-up tasks.
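For reference, the blocking flow described above looks roughly like this in boto3. This is a minimal sketch rather than the asker's actual code. It assumes `instance` is a boto3 `ec2.Instance`. `API_TIMEOUT_SECONDS` and `fits_budget` are hypothetical names for the 3rd party's 60-second limit. Because `wait_until_terminated()` only returns once EC2 reports the `terminated` state, this call can never finish faster than the guest's shutdown plus the waiter's polling interval.

```python
import time

# Hypothetical name for the 3rd party API's limit described in the question.
API_TIMEOUT_SECONDS = 60


def terminate_and_wait(instance):
    """Terminate an instance and block until EC2 reports it as terminated.

    `instance` is assumed to be a boto3 ec2.Instance, for example
    boto3.resource("ec2").Instance("i-0123456789abcdef0") (placeholder ID).
    Returns the elapsed wall-clock time in seconds.
    """
    start = time.monotonic()
    instance.terminate()              # returns once EC2 accepts the request
    instance.wait_until_terminated()  # polls until the state is "terminated"
    return time.monotonic() - start


def fits_budget(elapsed, budget=API_TIMEOUT_SECONDS):
    """True if a caller with a `budget`-second timeout would get an answer."""
    return elapsed < budget
```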

However, I am triggering Boto from a 3rd party API that times out if an operation (such as a termination) takes longer than a minute. As a result, every termination makes the API return an error.

I have tried to work with the 3rd party to increase the timeout, but terminations are not among their expected use cases, and as of right now the 3rd party has no solution.

I tried stop(Force=True). It is a little faster, but it still takes over a minute.
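A forced stop can be written with the low-level client as below. This is a sketch assuming `client` is `boto3.client("ec2")`. `force_stop`, `poll_seconds` and `max_attempts` are hypothetical names, and their values are only illustrative. Passing `WaiterConfig` makes the waiter poll more often than its default. That trims dead time after the instance has actually stopped, but it cannot make EC2 stop the instance any sooner.

```python
def force_stop(client, instance_id, poll_seconds=5, max_attempts=24):
    """Force-stop one instance and block until EC2 reports it as stopped.

    `client` is assumed to be boto3.client("ec2"). Force=True gives the guest
    no chance to flush file system caches, so use it only on disposable instances.
    poll_seconds and max_attempts are illustrative values, not recommendations.
    """
    client.stop_instances(InstanceIds=[instance_id], Force=True)
    waiter = client.get_waiter("instance_stopped")
    # WaiterConfig overrides the waiter's default polling delay and attempt count.
    waiter.wait(
        InstanceIds=[instance_id],
        WaiterConfig={"Delay": poll_seconds, "MaxAttempts": max_attempts},
    )
```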

I tried to forcibly detach the EBS volume, but the instance has to be shut down first, which pushes the process over the 1-minute mark.

I tried SSHing in and running shutdown and halt with various arguments, but I cannot find an OS command that completes in under 60 seconds. The running services are already at a minimum, so I cannot speed up the OS shutdown any further.

I'm hoping to find a way to "pull the plug" via AWS, or some other method to terminate quickly. It seems that termination requires an OS shutdown, which is a little odd to me when I want to torch the instance anyway.

schroeder
  • This sounds like [an XY problem](http://xyproblem.info/). What are you trying to accomplish with this setup? – ceejayoz Mar 07 '17 at 15:56
  • Why is it important that it be deleted so quickly? – EEAA Mar 07 '17 at 16:49
  • Can you explain the big picture better. Why would you need to dynamically create and destroy subnets so regularly that this is an issue? I don't think you're going to find a reliable way to turn off an EC2 instance in under a minute, I think you need to reconsider and rearchitect. – Tim Mar 07 '17 at 18:01
  • @Tim explaining the big picture is awfully lengthy. I have considered multiple ways of accomplishing the same task, but the problem boils down to the 3rd party's API timing out after 60s while waiting for AWS to return a result. I've tried forking processes in a number of creative ways, but nothing seems to work. I was hoping for a "pull the plug" option, but it appears there is none. – schroeder Mar 07 '17 at 20:57
  • Seems like implementing backoff/retry logic would be the best solution. Or use a message queue... – EEAA Mar 07 '17 at 21:02
  • @EEAA a message queue is my preferred option, but my current skills limit my ability to implement one. I'm hoping (in vain) for a "use this boto switch". – schroeder Mar 07 '17 at 21:05
  • It is difficult to see the connection between terminating an instance/waiting for a 3rd party API (caller?) to get a response "from AWS" (for what?)/forking/destroying a subnet. By your own admission, this problem is XY. You need to fully explain and justify the actual problem you are trying to solve, rather than your attempted solution, before we can help. Although the question *prima facie* seems straightforward enough, I am convinced that you are asking the wrong question or perhaps trying to solve the wrong problem. I am voting to close this question as "unclear what you're asking." – Michael - sqlbot Mar 07 '17 at 23:54
  • Agree with @Michael-sqlbot. Name the 3rd party API and perhaps someone will tell you how to increase the timeout. Also, other people who use the API are able to comply with it somehow. Nevertheless, a hint: try to disable as many services as possible so the OS won't need to shut them down – Putnik Mar 08 '17 at 06:59
  • @Putnik I cannot reduce services on the OS further, services are already at a minimum. As I said before, I have already tried to work with the 3rd party, this question is an alternative. – schroeder Mar 08 '17 at 07:22
  • @Michael-sqlbot the termination trigger happens from the 3rd party API, but their API has a 60s timeout waiting for a response. Because termination takes longer than 60s, every time I try to terminate, the 3rd party times out. As I have tried to say multiple times, and as you have acknowledged, I KNOW that the timeout needs to be modified, I have TRIED to work with the 3rd party, but terminating instances is simply not among their expected use cases. THEY HAVE NO SOLUTION CURRENTLY. – schroeder Mar 08 '17 at 07:29
  • Other virtualization solutions I have used offer a "pull the plug" option for termination. My question remains very straightforward: **does EC2 have a method that I have missed?** It seems like termination waits for an OS shutdown, am I correct? Is there a way to speed it up? Can we put aside the perceived XY issue for a moment and deal with the EC2 functionality issue? – schroeder Mar 08 '17 at 07:32
  • I'm moderately experienced with AWS, I hold the three associate certifications, I'm 90% of the way through studying for architect pro, and 10% through studying for devops pro. Nothing I've read, and I've read a lot, suggests what you want is possible. AWS is big though, and I'm not calling myself an expert by any means. If you want the definitive answer pay the $29 for a month of developer support and ask AWS directly. – Tim Mar 08 '17 at 08:10
  • @Tim well, your expertise means something. I suspect there is functionality, but it is not exposed in favour of providing only stable options to end users. – schroeder Mar 08 '17 at 08:19
  • @Tim with your resources, can you investigate or confirm that `terminate` is actually a `stop + terminate`? I've tried to research that one fact, but nothing is mentioned. If this is true (as it appears from my tests), then you can post that as an answer as the time to terminate depends on the instance OS shutdown processes. – schroeder Mar 08 '17 at 09:39
  • I've already looked and found nothing. Ask AWS. It'll cost you $30 for a month of support. If it's not worth that money, then it's not worth doing; in a business context that's nothing. – Tim Mar 08 '17 at 17:58
  • @Tim I was just hoping to give you credit, that's all. – schroeder Mar 10 '17 at 07:08
  • Thanks @schroeder I appreciate that, but if it works I suggest you accept the answer below. – Tim Mar 10 '17 at 08:20

1 Answer


While I agree this is definitely an XY problem and you should address it another way, there are far faster ways of shutting down the OS than `shutdown`. There is no reason to wait for Linux to run init scripts and send TERM and KILL to every process.

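Historically, I believe `killall -9 init` or a magic SysRq key was the quickest way, although on current kernels only the SysRq route is reliable: Linux delivers a signal to PID 1 only if init has installed a handler for it, and SIGKILL cannot be handled. systemd, however, does handle several signals, listed in `man systemd`, for example: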

   SIGRTMIN+13
       Immediately halts the machine.

   SIGRTMIN+14
       Immediately powers off the machine.
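As an illustration of the signals above, this Python sketch sends one of them to PID 1. It relies on the platform defining `signal.SIGRTMIN`, as Linux does. It must run as root inside the instance and only works if PID 1 is systemd. `pull_the_plug` and its `dry_run` switch are hypothetical. `dry_run` defaults to `True` so that importing or testing the code does nothing destructive.

```python
import os
import signal

# Per man systemd, PID 1 reacts to these real-time signals:
HALT_NOW = signal.SIGRTMIN + 13      # "Immediately halts the machine."
POWEROFF_NOW = signal.SIGRTMIN + 14  # "Immediately powers off the machine."


def pull_the_plug(sig=POWEROFF_NOW, kill=os.kill, dry_run=True):
    """Send an immediate halt/poweroff signal to PID 1.

    Returns (pid, signal number). With dry_run=True (the default here),
    nothing is sent; pass dry_run=False as root to actually do it.
    """
    if not dry_run:
        # No graceful shutdown: unflushed writes may be lost.
        kill(1, sig)
    return (1, int(sig))
```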

You'll probably have to test a few options before finding the one that AWS reacts to fastest, but going from a 60-second OS shutdown to 1-5 seconds should be simple enough.

Animism
  • Interesting idea. Kill the OS then call the AWS terminate API to terminate the VM. – Tim Mar 10 '17 at 01:10
  • @Tim if the instance is set to Auto-Terminate on shutdown, it shouldn't be necessary to call the API – Animism Mar 10 '17 at 01:14
  • I wasn't aware it worked like that, but the documentation suggests you're right. I'd be interested to see what happens when someone tries it. http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/terminating-instances.html#Using_ChangingInstanceInitiatedShutdownBehavior – Tim Mar 10 '17 at 01:18
  • After learning something new that provides a good answer to the question, it makes me sad you didn't upvote it too :( – user9517 Mar 10 '17 at 06:43
  • I forgot about the other ways to kill Linux. I'll test these today. – schroeder Mar 10 '17 at 07:10
  • What is `kill -9 init` supposed to do? From reading the man page it doesn't sound like valid arguments. – kasperd Mar 26 '17 at 22:08
  • @schroeder did you ever test or resolve this? I'm curious. – Tim Apr 10 '17 at 22:00
  • @Tim unfortunately, no. I was unable to find a way to pass the signals to the Amazon Linux OS. – schroeder Apr 11 '17 at 06:17
  • @kasperd kill takes a PID, pkill takes a program name. I think he meant pkill instead of kill, as "init" is the name of the first process that starts everything else on a Linux system. But init is always PID 1 so "kill -9 1" should also work. – Joseph Garvin Jun 04 '19 at 20:36
  • You should be able to run kill inside the VM with SSM if you have it configured: https://docs.aws.amazon.com/systems-manager/latest/APIReference/API_SendCommand.html – Gert van den Berg Apr 06 '23 at 14:45
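Putting the last few comments together gives this approach: set the instance's shutdown behavior to `terminate`, use SSM `SendCommand` to send the poweroff signal to PID 1, and return without waiting. This is a minimal sketch with several assumptions. `ec2` and `ssm` are `boto3.client("ec2")` and `boto3.client("ssm")`. The instance is managed by Systems Manager (SSM Agent running with a suitable instance profile), and the AMI uses systemd. `arm_and_pull_plug` and `SYSTEMD_POWEROFF` are hypothetical names. On a non-systemd AMI, the magic SysRq route from the answer (`echo o > /proc/sysrq-trigger`, as root) could be passed as `command` instead.

```python
# Per man systemd, SIGRTMIN+14 sent to PID 1 means "immediately power off".
# Assumption: the SSM document runs this under bash, whose kill builtin
# accepts the SIGRTMIN+14 name.
SYSTEMD_POWEROFF = "kill -s SIGRTMIN+14 1"


def arm_and_pull_plug(ec2, ssm, instance_id, command=SYSTEMD_POWEROFF):
    """Make a guest-initiated shutdown terminate the instance, then trigger one.

    `ec2` and `ssm` are assumed to be boto3.client("ec2") and boto3.client("ssm").
    Returns the SSM command ID immediately; it does not wait for termination.
    """
    # 1. A shutdown from inside the OS now terminates the instance instead of stopping it.
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceInitiatedShutdownBehavior={"Value": "terminate"},
    )
    # 2. Run the poweroff command on the instance via Systems Manager.
    response = ssm.send_command(
        InstanceIds=[instance_id],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": [command]},
    )
    return response["Command"]["CommandId"]
```

Since the instance powers off mid-command, the SSM invocation will probably not report success. Any follow-up such as deleting the subnet still has to wait until EC2 shows the instance as `terminated`.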