I am trying to learn Cloud Foundry using bosh-lite on a MacBook Pro. I can get it running, but each time I start from scratch it later stops working. I suspect this is associated with stopping the VirtualBox VM or putting the laptop to sleep, but I can't confirm that this is definitely the cause.

My experience is limited and I'm having difficulty not just resolving the issue, but also understanding what is going wrong. Apologies if this is an obvious problem, but I haven't been able to work out how to stop it from happening. The only solution I've found so far is to destroy the deployment using Vagrant and start from scratch, which takes a while and isn't the optimal fix, I'm sure. :)

I've noticed that 'bosh vms' shows unresponsive agents and that they're not starting properly. The error from bosh cck indicates a locking issue, but I suspect this may be misleading, as running bosh locks shows that there are no locks. Once again, I'm a newbie, so this may simply be a misunderstanding ...

Help: how do I fix this? Is there a way to quickly 'reset' to a working state? (vagrant reload --provision doesn't help.) Where exactly is the issue?

Also, what is the (default) root password for the vagrant cloudfoundry/bosh-lite VM?

> bosh vms

+---------------------------------------------------------------------------+--------------------+-----+-----------+--------------+
| VM                                                                        | State              | AZ  | VM Type   | IPs          |
+---------------------------------------------------------------------------+--------------------+-----+-----------+--------------+
| api_z1/0 (8dfeb143-59b1-46dd-9482-e90931a70a0d)                           | unresponsive agent | n/a | large_z1  | 10.244.0.138 |
| blobstore_z1/0 (7795ce02-d64e-4cc7-be1e-0e328384d568)                     | unresponsive agent | n/a | medium_z1 | 10.244.0.130 |
| consul_z1/0 (e92f6bfd-f623-4ba4-abf3-3d4baa0953fa)                        | unresponsive agent | n/a | small_z1  | 10.244.0.54  |
| doppler_z1/0 (049eaa18-3d4f-48d8-92ed-ea4b6a20cd29)                       | unresponsive agent | n/a | medium_z1 | 10.244.0.146 |
| etcd_z1/0 (e45a7648-e43d-4753-8a18-3ab21b86293d)                          | unresponsive agent | n/a | large_z1  | 10.244.0.42  |
| ha_proxy_z1/0 (ba6e8ce6-8f40-4868-8a71-c74119f173ea)                      | failing            | n/a | router_z1 | 10.244.0.34  |
| hm9000_z1/0 (ff8ae6a3-1889-4fb0-aabf-072012cf9f48)                        | unresponsive agent | n/a | medium_z1 | 10.244.0.142 |
| loggregator_trafficcontroller_z1/0 (8f2e4ea1-dda7-4d15-9050-528338824e3b) | unresponsive agent | n/a | small_z1  | 10.244.0.150 |
| nats_z1/0 (9e4eab32-ac91-4f05-83be-b8189c2991e7)                          | unresponsive agent | n/a | medium_z1 | 10.244.0.6   |
| postgres_z1/0 (fb8d1eee-3ade-480e-aa01-3db26a64b447)                      | unresponsive agent | n/a | medium_z1 | 10.244.0.30  |
| router_z1/0 (f9ce017b-580f-4fce-b79d-01ceef190e19)                        | unresponsive agent | n/a | router_z1 | 10.244.0.22  |
| runner_z1/0 (c0b0871b-c672-46c8-ac4a-1aabd81864f6)                        | unresponsive agent | n/a | runner_z1 | 10.244.0.26  |
| uaa_z1/0 (63b4bfa7-499d-4dba-93f6-2017b04a7588)                           | unresponsive agent | n/a | medium_z1 | 10.244.0.134 |
+---------------------------------------------------------------------------+--------------------+-----+-----------+--------------+



> bosh cck

Acting as user 'admin' on deployment 'cf-warden' on 'Bosh Lite Director'
Performing cloud check...

Director task 96
Error 100: Unable to get deployment lock, maybe a deployment is in progress. Try again later.

Task 96 error

For a more detailed error report, run: bosh task 96 --debug

> bosh locks

Acting as user 'admin' on 'Bosh Lite Director'

No locks

It is possible to do a 'reset' and get it up and running again using the commands below, but this takes quite some time and is surely more of a 'hammer' than is required!

# bosh-lite dir 
vagrant destroy && vagrant up

# cd cf-release dir 
bosh upload release
bosh deploy 

# cd bosh-lite dir
bin/add-route
cf api --skip-ssl-validation https://api.bosh-lite.com
cf create-org my_org
cf create-space development -o my_org
Eddie
Jinxed NZ
  • It _seems_ that after a new deployment, just running vagrant reload --provision will break it. – Jinxed NZ Jan 28 '17 at 15:33
  • I can't give you a full answer, but when you restart, or halt and then power on, the bosh-lite VM, the existing jobs will be lost. This happens because bosh-lite deploys your jobs into containers on the VM, and those containers are not restarted after a reboot. The easiest way to get back up and running is usually `bosh cck`. It will detect that the containers are down and let you recreate them. The lock errors are a little surprising. You might run it again and see if you keep getting lock errors. You might also try disabling the resurrector and see if that helps. Sometimes that locks the deployment. – Daniel Mikusa Jan 28 '17 at 19:52
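
The suggestion in the comment above could be tried roughly as follows. This is only a sketch: it assumes the Ruby-based BOSH CLI shown in the question's output, where `bosh vm resurrection off` pauses the resurrector, and the function name `cck_without_resurrector` is made up for illustration.

```shell
# Hypothetical helper: retry `bosh cck` while the resurrector is paused,
# so the resurrector cannot hold the deployment lock in the meantime.
# Assumes the old Ruby BOSH CLI and an already targeted director/deployment.
cck_without_resurrector() {
  bosh vm resurrection off   # pause resurrection for all VMs
  bosh cck                   # cloud check; offers to recreate missing VMs
  status=$?
  bosh vm resurrection on    # turn resurrection back on afterwards
  return "$status"
}
```

If the lock error persists even with the resurrector off, the debug log from `bosh task <id> --debug` mentioned in the error output is the next place to look.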

3 Answers

You can use sudo su after ssh'ing into the bosh-lite VM with vagrant ssh to become root without needing to enter a root password.
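
As a small sketch of the above, a one-off root command can be run without an interactive session. The helper name `bosh_lite_root` is hypothetical, and it assumes you are in the bosh-lite directory (where the Vagrantfile lives).

```shell
# Hypothetical helper: run a single command as root inside the bosh-lite VM.
# Relies on sudo working without a password for the vagrant user,
# as described in the answer above.
bosh_lite_root() {
  vagrant ssh -c "sudo $*"
}

# Example (not executed here):
#   bosh_lite_root whoami
```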

BOSH-lite has always been hard to resurrect after a VM reboot/sleep.
Someone recently (Dec 2016) wrote a utility to "gracefully put machines running BOSH Lite to sleep" and restore it on system wake, to address it: https://github.com/henryaj/ambient

dkoper

I usually do vagrant suspend and then vagrant up to avoid a situation with dead containers/VMs inside BOSH Lite.

You can do bosh cck but my experience shows that a simple deployment recreate is much faster and also more reliable.
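
A rough sketch of that recreate step, assuming the same Ruby-based BOSH CLI as in the question (where `bosh deploy` accepts `--recreate`) and a deployment manifest that is already set. Both function names are made up for illustration.

```shell
# Count rows that `bosh vms` reports as "unresponsive agent".
# Reads the table from stdin and prints a number (0 if there are none).
count_unresponsive() {
  grep -c 'unresponsive agent' || true
}

# Hypothetical helper: recreate every instance in the current deployment,
# but only if at least one agent is unresponsive.
recreate_if_unresponsive() {
  unresponsive=$(bosh vms | count_unresponsive)
  if [ "$unresponsive" -gt 0 ]; then
    bosh deploy --recreate
  fi
}
```

The check only avoids an unnecessary recreate when everything is healthy; it does not look at other states such as `failing`.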

hsiliev
  • I am also using `vagrant suspend`, but with `vagrant resume` rather than `vagrant up`. Or, to repair the build, `bosh cck` – muehsi Feb 14 '17 at 10:04

It is recommended to pause the bosh-lite VM when it's not in use, so that it can simply be resumed after the system goes to sleep or is rebooted; otherwise the VM will be halted by the OS (the bosh-lite VM goes into the aborted state). Running vagrant up on an aborted bosh-lite VM gets it running again, but in that case the CF VMs go into an unresponsive state, which requires redeployment.

Running vagrant suspend to pause the VM and vagrant resume when you return to work helps avoid the situation with unresponsive CF VMs.
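
A minimal sketch of that routine as two shell helpers. The function names and the `BOSH_LITE_DIR` default path are assumptions for illustration; point the variable at your own bosh-lite checkout.

```shell
# Assumed location of the bosh-lite checkout; override as needed.
BOSH_LITE_DIR="${BOSH_LITE_DIR:-$HOME/workspace/bosh-lite}"

# Pause the VM before closing the laptop or shutting down.
bosh_lite_pause() {
  (cd "$BOSH_LITE_DIR" && vagrant suspend)
}

# Resume the saved VM state when returning to work.
bosh_lite_resume() {
  (cd "$BOSH_LITE_DIR" && vagrant resume)
}
```

Each helper runs in a subshell, so your current working directory is left unchanged.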