
I have an oVirt setup, and a recent yum update upgraded all packages on the hosts and on the hosted engine.

The problem is that I can't start the hosted engine. If you wait a while and then run:

hosted-engine --vm-status

You get:

--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : hyper1.sarmiento.dmsn
Host ID                            : 1
Engine status                      : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "down"}
Score                              : 0
Local maintenance                  : False
Host timestamp                     : 4129
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=4129 (Tue May  5 13:15:28 2015)
    host-id=1
    score=0
    maintenance=False
    state=EngineUnexpectedlyDown
    timeout=Wed Dec 31 22:14:34 1969


--== Host 2 status ==--

Status up-to-date                  : True
Hostname                           : hyper2.sarmiento.dmsn
Host ID                            : 2
Engine status                      : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "down"}
Score                              : 0
Local maintenance                  : False
Host timestamp                     : 3900
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=3900 (Tue May  5 13:15:19 2015)
    host-id=2
    score=0
    maintenance=False
    state=EngineUnexpectedlyDown
    timeout=Wed Dec 31 22:11:48 1969

I've been searching through the logs, and the problem seems to show up in

/var/log/libvirt/qemu/HostedEngine.log

where I can see:

2015-05-05T15:18:29.928875Z qemu-kvm: -drive file=/var/run/vdsm/storage/fa0ae001-ccaf-46ed-940a-a3bb1f147f18/c1cd16d1-068d-467b-88fd-6a4910099d27/51e3c614-7725-429d-b1b6-99dbe4eb3b7c,if=none,id=drive-virtio-disk0,format=raw,serial=c1cd16d1-068d-467b-88fd-6a4910099d27,cache=none,werror=stop,rerror=stop,aio=threads: could not open disk image /var/run/vdsm/storage/fa0ae001-ccaf-46ed-940a-a3bb1f147f18/c1cd16d1-068d-467b-88fd-6a4910099d27/51e3c614-7725-429d-b1b6-99dbe4eb3b7c: Could not refresh total sector count: Operation not permitted
2015-05-05 15:18:30.183+0000: shutting down

It says that it cannot open the image file, but it does not say why. Any ideas on how to debug this and get the engine up and running again?

Thanks a lot!

Edit: The oVirt version is 3.5.

  • What storage do you use? Have you run diagnostics on the disk image for the VM? – dyasny May 06 '15 at 01:13
  • The engine uses a gluster NFS export. I don't know how to run diagnostics on the image. How would you do that? – Luciano César Natale May 06 '15 at 01:16
  • `qemu-img info` and `qemu-img check` would do, to start with. If they pass, go over the file permissions. Since this is gluster based, also check that side of things – dyasny May 06 '15 at 03:28
  • Ok, I'm getting `Input/output error` when I run `qemu-img` on the file. How can I get to the source of this problem? It's probably something with Gluster, I think, or maybe it's a lock issue? – Luciano César Natale May 06 '15 at 04:18
  • So the image is either corrupted or inaccessible. Start by debugging gluster (and that's really a totally different issue) – dyasny May 06 '15 at 13:04
  • Ok, I'll do that. Basically I will try to mount the share from another location and try to write and read a file. If that's OK, how can I tell whether the image is locked? – Luciano César Natale May 06 '15 at 13:40

1 Answer

Ok, so it was a storage issue. The hosted engine image was stored on a gluster volume, and the image was in a split-brain state.

Thanks dyasny for your help!!!
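
For anyone hitting the same error, here is a rough sketch of the checks that point to this diagnosis. It is not a definitive procedure. The image path is the one from the qemu-kvm error in the question. The volume name `engine` and the helper `can_read_image` are placeholders made up for illustration. `gluster volume heal <VOLNAME> info split-brain` is the gluster CLI call that lists files in split-brain, and it has to be run on one of the gluster servers.

```shell
#!/bin/sh
# Sketch only. VOLNAME is a placeholder (assumption): use your own gluster volume name.
VOLNAME="${VOLNAME:-engine}"
# Image path copied from the qemu-kvm error in HostedEngine.log.
IMAGE="${IMAGE:-/var/run/vdsm/storage/fa0ae001-ccaf-46ed-940a-a3bb1f147f18/c1cd16d1-068d-467b-88fd-6a4910099d27/51e3c614-7725-429d-b1b6-99dbe4eb3b7c}"

# Hypothetical helper: returns 0 if the first block of the file can be read.
# A failure here (for example "Input/output error") points at the storage
# layer rather than at qemu itself.
can_read_image() {
    dd if="$1" of=/dev/null bs=512 count=1 2>/dev/null
}

if can_read_image "$IMAGE"; then
    echo "image is readable; look at permissions or qemu-img check next"
else
    echo "cannot read $IMAGE; checking gluster for split-brain"
    # Only attempt the gluster query where the CLI is installed.
    if command -v gluster >/dev/null 2>&1; then
        gluster volume heal "$VOLNAME" info split-brain
    else
        echo "gluster CLI not found here; run the heal query on a gluster server"
    fi
fi
```

If the heal query lists the engine image, the split-brain has to be resolved on the gluster side before the hosted engine VM can open its disk again.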