5

I am trying to build an Ansible playbook to configure an Ubuntu Vagrant box. The playbook is mostly working, except for handling the reboot of the Ubuntu box after a kernel upgrade.

I have a hosts file for Ansible as follows:

localhost ansible_connection=local
dockerhost ansible_ssh_port=2222 ansible_ssh_host=127.0.0.1

My latest attempt at solving this problem is as follows:

  - name: Restart the server
    shell: sleep 2s && reboot & executable=/bin/bash

  - name: Wait until the virtual machine stops, i.e. the SSH port stops responding
    local_action: wait_for host={{ansible_ssh_host}} port={{ansible_ssh_port}} state=stopped
    sudo: false

  - name: Wait for server to come up
    local_action: wait_for host={{ansible_ssh_host}} port={{ansible_ssh_port}} delay=30
    sudo: false

With these playbook steps, the process blocks waiting for the SSH port to stop responding until it reaches the timeout and exits the playbook. I am guessing that if the reboot is particularly fast, it might happen between the polling intervals of the wait_for command, which would then miss the short time when the SSH port is actually down. The error returned by Ansible is:

failed: [dockerhost] => {"elapsed": 300, "failed": true} msg: Timeout when waiting for 127.0.0.1:2222 to stop.

At least once it managed to get to the step where Ansible waits for the SSH port to be available again, but it hung there until timing out. I think this solution is too sensitive to differences in reboot speed, which can vary widely in a virtual environment. The Ansible version I am using is 1.5.3; Ubuntu is 12.04 LTS with a kernel upgrade to 3.8. The complete playbook installs Docker and all its dependencies.

I tried many variations and ideas found on various websites, but never managed to properly control a reboot and continue my playbook with the next steps.

I am looking for a simple and foolproof way of rebooting the server and continuing with the next steps in a playbook once the machine is back up and running.

I have not explored the possibility of running a local vagrant reload, because I want to use this same playbook in operations, where I will not be running Vagrant. I only mentioned Vagrant in case it creates some complexities that I am not aware of. I also don't want to just pause for 5 minutes and hope the server is up again. The point of using this kind of tool is to provision servers in a predictable and timely manner and to be portable from environment to environment, so a pause just does not seem right.

I have also looked around for an Ansible module that would handle this requirement, but comments on the Ansible website seem to rule this out.

Thanks

Rico
Raymond
  • Here's the thing with a Vagrant box restarted manually: Vagrant executes many steps, such as network configuration, before the box gets booted up. If you reboot without Vagrant (from inside the VM, or from the VirtualBox GUI), Vagrant cannot execute those steps, and thus there simply might not be a port 22 open after the reboot. – Sgoettschkes Mar 23 '14 at 12:02
  • Thanks for the info, I will have to look at the consequences of stopping a Vagrant environment from inside. Still, the question stands outside of a Vagrant environment. Any ideas? – Raymond Mar 24 '14 at 13:11
  • You have the same problem Vagrant itself has. It also "just" tries to SSH into the box time after time and exits after a long enough time. I'd advise you to build your own base box with the kernel upgrade so that you do not need to reboot, if that's possible at all. – Sgoettschkes Mar 24 '14 at 20:26

3 Answers

1

Might I suggest using a bridged or private network? Using port forwarding might be tricky. I used your code with a private network and with a bridged network, and it worked perfectly with both.

Rico
DomaNitro
  • Interesting, does that mean there could be a bug in the **wait_for** module when using port forwarding? Or maybe **wait_for** was never intended to be used in conjunction with **local_action**? I'll have to make significant changes to test this out. In the meantime, it would be great if somebody had a solution with port forwarding. Thanks for the insight. – Raymond Mar 28 '14 at 11:42
  • I am not an expert, but I don't think it's a bug. It is probably the port forwarding (socket behaviour). My suggestion is to not use port forwarding. If you still need to, you can copy wait_for and hack it to make it work. For your reference: https://github.com/ansible/ansible/blob/devel/library/utilities/wait_for#L157-166 http://www.virtualbox.org/manual/ch06.html#natforward – DomaNitro Mar 29 '14 at 17:34
  • Hi DomaNitro, I am not saying that you are wrong, but I just fail to see why the wait_for module should behave differently if port forwarding is used. What I understand is that wait_for polls a specified port on a specified host and waits until the port is either answering or not answering, depending on the state parameter. Why would port forwarding have any effect on that? – Raymond Apr 01 '14 at 09:37
  • Raymond, the issue as I see it is a network-level problem. wait_for uses sockets: it connects to the local port, finds it open, and waits for it to stop responding, but it never will. To verify what's going on in the background, you could use tcpdump. So if you want a quick hack, create a new module in bash or Python that loops for X seconds, tries to connect to the port with nc or telnet, and greps for 'SSH-2.0-OpenSSH' or something similar. Use that instead; it will not take you long to develop. And if you still think that this is a bug, report it as an issue on the Ansible GitHub. – DomaNitro Apr 01 '14 at 10:54
  • Thanks DomaNitro, I think I suddenly see the light thanks to you. When monitoring the port, I don't connect to the real port that is closed and reopened when I shut down the Ubuntu machine, but instead to the port on the local machine that is then forwarded, so basically the forwarded port is always available. And yes, that can't really be considered a bug. – Raymond Apr 02 '14 at 13:01
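
A minimal sketch of the banner check suggested in the comments above, in Python. The function names `ssh_banner_present`, `wait_for_banner` and `wait_for_reboot` are hypothetical, the timeouts are arbitrary, and this is a standalone helper rather than an Ansible module. It assumes, as concluded in this thread, that the forwarded port keeps accepting connections while the guest is down, so the probe only treats the server as up when it actually receives a string starting with `SSH-`.

```python
import socket
import time


def ssh_banner_present(host, port, timeout=3.0):
    """Return True only if something at host:port answers with an SSH banner."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            banner = sock.recv(64)
    except OSError:
        # Refused, reset, or timed out: no SSH daemon is answering.
        return False
    # An open forwarded port with no guest behind it sends no banner.
    return banner.startswith(b"SSH-")


def wait_for_banner(host, port, present, deadline_s=300.0, interval_s=1.0):
    """Poll until the banner state equals `present`; return False on timeout."""
    deadline = time.monotonic() + deadline_s
    while time.monotonic() < deadline:
        if ssh_banner_present(host, port) == present:
            return True
        time.sleep(interval_s)
    return False


def wait_for_reboot(host, port, down_s=120.0, up_s=600.0):
    """Wait for sshd to go away, then to come back (limits are assumptions)."""
    went_down = wait_for_banner(host, port, present=False, deadline_s=down_s)
    came_up = wait_for_banner(host, port, present=True, deadline_s=up_s)
    return went_down, came_up
```

With the hosts file from the question, `wait_for_reboot("127.0.0.1", 2222)` would return a pair of booleans saying whether the banner disappeared and whether it came back. A very fast reboot could still slip between two probes, so the second value is probably the one worth failing the play on.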
0

You can use the reboot module (added in Ansible 2.7):

- name: Reboot a slow machine that might have lots of updates to apply
  reboot:
    reboot_timeout: 3600

https://docs.ansible.com/ansible/latest/modules/reboot_module.html

Jobin James
-1

You need "-y" for the apt-get update and dist-upgrade. I believe it is hanging there.

curtis
  • Hi Curtis, this was not the issue, I do use the -y option. My playbook hangs on the reboot, nowhere else. I think DomaNitro explained the reason for the hang well. I had trouble understanding it, but I am now convinced he is right. – Raymond May 03 '14 at 07:23