by arthurfayzullin@gmail.com
Occasionally, internal problems in the self-hosted management VM cause the system that monitors its health to shut the VM down. The hardest part is that the monitor shuts the VM down again immediately after it is started, which makes it impossible to fix the problem. To work around this, put the system into global maintenance mode, which disables state tracking for this VM:
sudo hosted-engine --set-maintenance --mode=global
Then start the VM:
sudo hosted-engine --vm-start
Then connect to this VM to find and resolve the problems.
Do not forget to turn maintenance mode off afterwards:
sudo hosted-engine --set-maintenance --mode=none
You can check the system state with the following command (run it after each step to be sure the system is in the expected state):
sudo hosted-engine --vm-status
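The sequence above can be wrapped in a few shell functions. This is only a minimal sketch, not an official tool: the function names and the HE variable are made up for illustration, and only the hosted-engine options already shown above are used. Setting HE=echo gives a dry run that prints each action instead of executing it.

```shell
# Command used for every step (hypothetical variable; set HE=echo for a dry run).
HE=${HE:-sudo hosted-engine}

# Run one hosted-engine action, then print the status, as advised above.
he_step() {
    $HE "$@" && $HE --vm-status
}

# Enter global maintenance so the monitor stops shutting the VM down, then start it.
start_engine_vm_for_repair() {
    he_step --set-maintenance --mode=global &&
    he_step --vm-start
}

# Hand control back to the monitor once the VM has been repaired.
finish_repair() {
    he_step --set-maintenance --mode=none
}
```

Call start_engine_vm_for_repair, fix the VM, then call finish_repair.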
ovirt-shell fails to start with a “No module named kitchen.text.converters” error:
# ovirt-shell
Traceback (most recent call last):
  File "/usr/bin/ovirt-shell", line 9, in <module>
    load_entry_point('ovirt-shell==3.1.0.7-SNAPSHOT', 'console_scripts', 'ovirt-shell')()
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 299, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 2229, in load_entry_point
    return ep.load()
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 1948, in load
    entry = __import__(self.module_name, globals(),globals(), ['__name__'])
  File "/usr/lib/python2.6/site-packages/ovirtcli/main.py", line 20, in <module>
    from ovirtcli.context import OvirtCliExecutionContext
  File "/usr/lib/python2.6/site-packages/ovirtcli/context.py", line 18, in <module>
    from cli.command import *
  File "/usr/lib/python2.6/site-packages/cli/__init__.py", line 3, in <module>
    from cli.context import ExecutionContext
  File "/usr/lib/python2.6/site-packages/cli/context.py", line 27, in <module>
    from cli.settings import Settings
  File "/usr/lib/python2.6/site-packages/cli/settings.py", line 23, in <module>
    from cli import platform
  File "/usr/lib/python2.6/site-packages/cli/platform/__init__.py", line 5, in <module>
    from cli.platform.posix.terminal import PosixTerminal as Terminal
  File "/usr/lib/python2.6/site-packages/cli/platform/posix/terminal.py", line 24, in <module>
    from cli.terminal import Terminal
  File "/usr/lib/python2.6/site-packages/cli/terminal.py", line 17, in <module>
    from kitchen.text.converters import getwriter
ImportError: No module named kitchen.text.converters
Reason: python-kitchen is not installed.
Solution:
Install python-kitchen from the EPEL repository:
yum install python-kitchen
The VM fails to start, and you see an error like this:
VM testVm is down. Exit message: internal error Failed to open socket to sanlock daemon: permission denied.
Possible reason: an SELinux configuration problem.
Check the SELinux boolean values:
getsebool -a | grep virt
virt_use_comm --> off
virt_use_fusefs --> off
virt_use_nfs --> on
virt_use_samba --> off
virt_use_sanlock --> on
virt_use_sysfs --> on
virt_use_usb --> on
virt_use_xserver --> off
virt_use_sanlock and virt_use_nfs must be on. If they are not, set them:
setsebool -P virt_use_sanlock=on
setsebool -P virt_use_nfs=on
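The check above can be automated. The following is a minimal sketch, assuming getsebool prints one "name --> state" pair per line as in the output above; the function name missing_virt_sebools is made up for this example.

```shell
# Read `getsebool -a` style lines ("name --> on|off") from stdin and print
# each required boolean that is missing from the input or not "on".
missing_virt_sebools() {
    awk -v required="virt_use_sanlock virt_use_nfs" '
        BEGIN { n = split(required, want, " ") }
        $2 == "-->" { state[$1] = $3 }
        END {
            for (i = 1; i <= n; i++)
                if (state[want[i]] != "on") print want[i]
        }'
}
```

For example, `getsebool -a | missing_virt_sebools` prints nothing when both booleans are on; any name it prints can be enabled with setsebool -P as shown above.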
The VM fails to start, and you see an error like this:
VM testVm is down. Exit message: internal error Failed to open socket to sanlock daemon: No such file or directory.
Possible reason: the softdog module is not loaded.
Solution:
modprobe softdog
service wdmd start
service sanlock start
To load the softdog module automatically at boot:
echo modprobe softdog >> /etc/rc.modules
chmod +x /etc/rc.modules
Or:
echo -e '#!/bin/sh\nmodprobe softdog\nexit 0' > /etc/sysconfig/modules/softdog.modules
chmod +x /etc/sysconfig/modules/softdog.modules
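To confirm that the module is actually loaded, for instance after a reboot, a small check can help. This is a sketch that assumes the usual /proc/modules layout, where each line starts with the module name followed by a space; the function name softdog_loaded is hypothetical.

```shell
# Succeed if softdog is listed in the given module list (default: /proc/modules).
softdog_loaded() {
    grep -q '^softdog ' "${1:-/proc/modules}"
}
```

A possible use is `softdog_loaded || modprobe softdog`, followed by starting wdmd and sanlock as described above.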
Ian Levesque reported on the users@ovirt.org mailing list:
New engine install on remote DB fails “uuid-ossp extension is not loaded”
Alex Lourie posted a recommendation/solution:
The solution we've come up with is this:
1. Use (or tell remote DB admin to do so) the psql command to load the extension functions to template1 DB on remote DB server:
psql -U postgres -d template1 -f /usr/share/pgsql/contrib/uuid-ossp.sql
2. Now, all newly created databases will include extension functions.
template1 is a special DB in postgres. In fact, when you create a new DB, it is actually copied from template1 with a new name.
Ricky Schneberger reported on the users@ovirt.org mailing list:
After an normal “yum update” i am unable to get one of the storage domains “UP”.
Maor Lipchuk posted a solution:
1. Go to the metadata of the data storage domain (on the storage server, go to {storage_domain_name}/######..../dom_md/metadata).
2. Delete the checksum line: _SHA_CKSUM=################
3. Try to activate the storage domain again in the DC (it should fail again).
4. vdsm.log should print the computed checksum of the storage domain (there should be an error saying "Meta Data seal is broken (checksum mismatch).... computed_cksum = ").
5. Copy the computed checksum into the metadata (_SHA_CKSUM={new chksum number}).
6. Try to activate it again.
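The two file edits in this procedure (removing the checksum line and writing the new one) can be sketched as shell functions. This is an illustration only: the function names are made up, the path to the metadata file must be supplied by you, and appending the _SHA_CKSUM line at the end of the file is an assumption about where it may be placed. The files are rewritten in place so that ownership and permissions are kept.

```shell
# Remove the _SHA_CKSUM line from metadata file $1, keeping a backup in $1.bak.
strip_cksum() {
    cp -p "$1" "$1.bak" && sed '/^_SHA_CKSUM=/d' "$1.bak" > "$1"
}

# Write checksum $2 (copied by hand from vdsm.log) into metadata file $1,
# replacing any _SHA_CKSUM line that is still present.
set_cksum() {
    sed '/^_SHA_CKSUM=/d' "$1" > "$1.tmp" &&
    printf '_SHA_CKSUM=%s\n' "$2" >> "$1.tmp" &&
    cat "$1.tmp" > "$1" &&
    rm -f "$1.tmp"
}
```

Run strip_cksum on the metadata file, attempt the activation, read the computed checksum from vdsm.log, then pass it to set_cksum before activating again.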
Force NFS version 3 in the file /etc/nfsmount.conf:
[ NFSMount_Global_Options ]
Defaultvers=3
Nfsvers=3
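Changes to nfsmount.conf only affect mounts made afterwards, so it can be useful to check which NFS mounts are still using another version. The sketch below assumes the kernel reports the negotiated version as a vers=... mount option in /proc/mounts; the function name nfs_mounts_not_v3 is hypothetical.

```shell
# Print the mount point of every NFS mount whose options do not include vers=3.
# Reads a /proc/mounts style file (default: /proc/mounts).
nfs_mounts_not_v3() {
    awk '($3 == "nfs" || $3 == "nfs4") && $4 !~ /(^|,)vers=3(,|$)/ { print $2 }' \
        "${1:-/proc/mounts}"
}
```

An empty result means every NFS mount is already using version 3; anything listed needs to be remounted.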
If the management bridge was not created during the host setup procedure, remove the host from the engine management console. Also remove vdsm and libvirt from the host machine:
service vdsmd stop
service libvirtd stop
yum -y remove *vdsm* *libvirt* *qemu* *sanlock* jpackage*
rm -rf /etc/libvirt/
rm -rf /var/lib/libvirt/
yum clean all
yum makecache
Then try to reinstall the host. If that does not help, you can try to add the oVirt management bridge manually.
First disable NetworkManager, then correct /etc/resolv.conf:
service NetworkManager stop
chkconfig NetworkManager off
Here are examples of the ifcfg files, which reside in /etc/sysconfig/network-scripts:
vim /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=none
NM_CONTROLLED=no
ONBOOT=yes
BRIDGE=ovirtmgmt
vim /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
DEVICE=ovirtmgmt
BOOTPROTO=static
GATEWAY=xxx.xxx.xxx.xxx
IPADDR=xxx.xxx.xxx.xxx
NETMASK=255.255.255.0
NM_CONTROLLED=no
ONBOOT=yes
TYPE=Bridge
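Instead of editing the two files by hand, they can be generated. The function below is a sketch that writes exactly the settings listed above; its name and argument order are made up for this example, and it assumes the physical interface is eth0 as in the example files. Writing to a scratch directory first makes it easy to review the result before copying it into /etc/sysconfig/network-scripts.

```shell
# Usage: write_bridge_ifcfg DIR IPADDR GATEWAY [NETMASK]
# Writes ifcfg-eth0 and ifcfg-ovirtmgmt into DIR.
write_bridge_ifcfg() {
    dir=$1 ip=$2 gw=$3 mask=${4:-255.255.255.0}
    [ -n "$dir" ] && [ -n "$ip" ] && [ -n "$gw" ] || {
        echo "usage: write_bridge_ifcfg DIR IPADDR GATEWAY [NETMASK]" >&2
        return 1
    }
    # Physical interface: no address of its own, enslaved to the bridge.
    cat > "$dir/ifcfg-eth0" <<EOF
DEVICE=eth0
BOOTPROTO=none
NM_CONTROLLED=no
ONBOOT=yes
BRIDGE=ovirtmgmt
EOF
    # The bridge carries the static address.
    cat > "$dir/ifcfg-ovirtmgmt" <<EOF
DEVICE=ovirtmgmt
BOOTPROTO=static
GATEWAY=$gw
IPADDR=$ip
NETMASK=$mask
NM_CONTROLLED=no
ONBOOT=yes
TYPE=Bridge
EOF
}
```

For example, `write_bridge_ifcfg /tmp/ifcfg 192.0.2.10 192.0.2.1` (documentation addresses, replace with your own) produces both files in /tmp/ifcfg, provided that directory exists.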
How to prevent a possible VM startup failure, for example when you see a message like this in the vdsm log:
qemuProcessReadLogOutput:1005 : internal error Process exited while reading console log output: Supported machines are:
pc RHEL 6.2.0 PC (alias of rhel6.2.0)
rhel6.2.0 RHEL 6.2.0 PC (default)
rhel6.1.0 RHEL 6.1.0 PC
rhel6.0.0 RHEL 6.0.0 PC
rhel5.5.0 RHEL 5.5.0 PC
rhel5.4.4 RHEL 5.4.4 PC
rhel5.4.0 RHEL 5.4.0 PC
Try running this command on the oVirt management node (hack from Jerome Deliege):
psql -U postgres engine -c "update vdc_options set option_value='rhel6.3.0' where option_name LIKE 'EmulatedMachine';"
or this:
psql -U postgres engine -c "update vdc_options set option_value='pc' where option_name LIKE 'EmulatedMachine';"
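Before choosing a value for EmulatedMachine, it can help to check that the hosts actually support it. The sketch below parses a "Supported machines are:" listing like the one in the log message above; the function names are hypothetical, and the listing is read from stdin so that it can come from the vdsm log or from qemu itself.

```shell
# Print the machine type names (first column) from a
# "Supported machines are:" listing read on stdin.
supported_machines() {
    awk '/Supported machines are:/ { in_list = 1; next }
         in_list && NF { print $1 }'
}

# Succeed if machine type $1 appears in the listing read on stdin.
machine_supported() {
    supported_machines | grep -Fqx -- "$1"
}
```

On a host, a listing can usually be obtained with `qemu-kvm -M '?'`; the location of the qemu-kvm binary (often /usr/libexec/qemu-kvm on RHEL 6) is an assumption and may differ on your system. For example, `/usr/libexec/qemu-kvm -M '?' | machine_supported pc` succeeds only if pc is a supported machine type.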
How to disable SSL support:
psql -U postgres engine -c "update vdc_options set option_value='false' where option_name='UseSecureConnectionWithServers' and version='general';"
psql -U postgres engine -c "update vdc_options set option_value='' where option_name = 'SpiceSecureChannels';"
Then restart oVirt.
For version 3.0:
service jboss-as stop
service jboss-as start
For version 3.1 and later:
service ovirt-engine stop
service ovirt-engine start
If you disable SSL, you must stop the firewalls on the engine and hosts:
service iptables stop
After virt-v2v you get the error: Failed to import Vm <vmName> to <storageName>
You can also see the error in /var/log/ovirt-engine/engine.log:
2012-08-16 16:39:30,090 ERROR [org.ovirt.engine.core.bll.ImportVmCommand] (pool-3-thread-50) [2781049c] Command org.ovirt.engine.core.bll.ImportVmCommand throw exception: org.springframework.dao.DataIntegrityViolationException: CallableStatementCallback; SQL [{call insertsnapshot(?, ?, ?, ?, ?, ?, ?, ?)}]; ERROR: duplicate key value violates unique constraint "pk_snapshots" Where: SQL statement "INSERT INTO snapshots( snapshot_id, status, vm_id, snapshot_type, description, creation_date, app_list, vm_configuration) VALUES( $1 , $2 , $3 , $4 , $5 , $6 , $7 , $8 )"
Solution:
1. Go to the export domain folder on your NFS mount point.
cd export
2. Find your domain OVF file.
vim `grep -Ri <vm name> * | cut -d : -f 1`
3. Find all occurrences of ovf:vm_snapshot_id="00000000-0000-0000-0000-000000000000" and replace each one with a unique ID generated by the uuid command.
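Step 3 is tedious to do by hand when the OVF file contains many snapshots. The following sketch replaces every all-zero snapshot ID with a freshly generated one and keeps a backup of the original file. It is an illustration under stated assumptions: the function names are made up, and it uses uuidgen (falling back to the Linux kernel's /proc/sys/kernel/random/uuid) rather than the uuid command mentioned above.

```shell
# Generate one UUID; uuidgen is tried first, then the kernel's generator.
new_uuid() {
    uuidgen 2>/dev/null || cat /proc/sys/kernel/random/uuid
}

# Replace each all-zero ovf:vm_snapshot_id in OVF file $1 with its own new UUID.
# The original file is kept as $1.bak.
fix_snapshot_ids() {
    ovf=$1
    old='ovf:vm_snapshot_id="00000000-0000-0000-0000-000000000000"'
    cp -p "$ovf" "$ovf.bak" || return 1
    # Each pass replaces only the first remaining occurrence, so every
    # occurrence receives a different UUID.
    while grep -qF "$old" "$ovf"; do
        id=$(new_uuid) || return 1
        awk -v old="$old" -v new="ovf:vm_snapshot_id=\"$id\"" '
            !replaced && (i = index($0, old)) {
                $0 = substr($0, 1, i - 1) new substr($0, i + length(old))
                replaced = 1
            }
            { print }
        ' "$ovf" > "$ovf.tmp" && cat "$ovf.tmp" > "$ovf" && rm -f "$ovf.tmp" || return 1
    done
}
```

Run it as `fix_snapshot_ids <path to the ovf file>` and compare the result with the .bak copy before retrying the import.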
If a VM or Template remains in the Image Locked state longer than is reasonable, check whether the operation (template creation, in my case) is really still running. If it is not, you can reset this state:
1. Get the vm_guid:
psql -U engine -d engine -c "SELECT vm_guid,template_status,vm_name from vm_static where vm_name like '%<vm or template name>%'";
The output looks like this:
               vm_guid                | template_status |       vm_name
--------------------------------------+-----------------+----------------------
 61eedf77-de4e-42c2-8870-420372b44501 |                 | <VmName>
 6b807ca8-3bbb-4339-bafa-f6a67893b3bb |               0 | <TemplateName>
If template_status is not NULL, the line contains the vm_guid of the locked template; otherwise the line contains the vm_guid of your locked VM.
2. To “unlock” the VM, use this command (with your real vm_guid):
psql -U engine -d engine -c "update vm_dynamic set status = 0 where vm_guid='61eedf77-de4e-42c2-8870-420372b44501';"
3. To “unlock” the Template, use this command (with your real vm_guid):
psql -U engine -d engine -c "update vm_static set template_status=0 where vm_guid='61eedf77-de4e-42c2-8870-420372b44501';"
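Because these statements modify the engine database directly, it is worth validating the vm_guid before pasting it into SQL. The helper below is a sketch: unlock_sql is a made-up name, and it only prints the same two update statements shown above after checking that the argument looks like a UUID.

```shell
# Usage: unlock_sql vm|template VM_GUID
# Prints the update statement for the given object, or fails on bad input.
unlock_sql() {
    kind=$1 guid=$2
    printf '%s\n' "$guid" |
        grep -Eq '^[0-9a-fA-F]{8}-([0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}$' || {
        echo "invalid vm_guid: $guid" >&2
        return 1
    }
    case $kind in
        vm)
            printf "update vm_dynamic set status = 0 where vm_guid='%s';\n" "$guid" ;;
        template)
            printf "update vm_static set template_status=0 where vm_guid='%s';\n" "$guid" ;;
        *)
            echo "usage: unlock_sql vm|template VM_GUID" >&2
            return 2 ;;
    esac
}
```

The output can be passed to psql in the same way as the commands above, for example `psql -U engine -d engine -c "$(unlock_sql vm 61eedf77-de4e-42c2-8870-420372b44501)"`.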