This forum is devoted to discussions around virtualization technologies related to cloud computing including Xen Cloud Platform, KVM and VMware.

TOPIC: Does HA actually work?

Does HA actually work? 9 months 2 weeks ago #12712

  • chirauki
  • Gold Boarder
  • Posts: 193
  • Thank you received: 19
  • Karma: 11
Hi there!

We are trying the latest CloudStack and we just can't figure out what it takes for HA to work.

Our test environment has two XenServer 6.0.2 hosts. We just took down the one that was running the VMs (halt -f). This was the pool master.

Within a minute or so, CS reassigned the pool master role and node2 took over. The CPVM restarted fine (the storage VM runs in a separate vSphere cluster), but none of the user VMs restarted, even though all of them have HA enabled.

This is a log for the HA job:

2012-09-27 17:48:13,686 INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:work-707) Processing HAWork[707-HA-608-Stopped-Scheduled]
2012-09-27 17:48:13,696 INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:work-707) HA on VM[User|sebas-opensuse]
2012-09-27 17:48:13,705 DEBUG [cloud.capacity.CapacityManagerImpl] (HA-Worker-1:work-707) VM state transitted from :Stopped to Starting with event: StartRequestedvm's original host id: 30 new host id: null host id before state transition: null
2012-09-27 17:48:13,705 DEBUG [cloud.vm.VirtualMachineManagerImpl] (HA-Worker-1:work-707) Successfully transitioned to start state for VM[User|sebas-opensuse] reservation id = 9e87348b-b9c0-4565-96e8-9f73ba4d35ad
2012-09-27 17:48:13,734 DEBUG [cloud.vm.VirtualMachineManagerImpl] (HA-Worker-1:work-707) Trying to deploy VM, vm has dcId: 1 and podId: 1
2012-09-27 17:48:13,734 DEBUG [cloud.vm.VirtualMachineManagerImpl] (HA-Worker-1:work-707) Deploy avoids pods: null, clusters: null, hosts: null
2012-09-27 17:48:13,737 DEBUG [cloud.vm.VirtualMachineManagerImpl] (HA-Worker-1:work-707) Root volume is ready, need to place VM in volume's cluster
2012-09-27 17:48:13,738 DEBUG [cloud.vm.VirtualMachineManagerImpl] (HA-Worker-1:work-707) Vol[664|vm=608|ROOT] is READY, changing deployment plan to use this pool's dcId: 1 , podId: 1 , and clusterId: 1
2012-09-27 17:48:13,740 DEBUG [cloud.deploy.FirstFitPlanner] (HA-Worker-1:work-707) DeploymentPlanner allocation algorithm: random
2012-09-27 17:48:13,743 DEBUG [cloud.deploy.FirstFitPlanner] (HA-Worker-1:work-707) Trying to allocate a host and storage pools from dc:1, pod:1,cluster:1, requested cpu: 2200, requested ram: 1073741824
2012-09-27 17:48:13,744 DEBUG [cloud.deploy.FirstFitPlanner] (HA-Worker-1:work-707) Is ROOT volume READY (pool already allocated)?: Yes
2012-09-27 17:48:13,745 DEBUG [cloud.deploy.FirstFitPlanner] (HA-Worker-1:work-707) Searching resources only under specified Cluster: 1
2012-09-27 17:48:13,759 DEBUG [cloud.deploy.FirstFitPlanner] (HA-Worker-1:work-707) Checking resources in Cluster: 1 under Pod: 1
2012-09-27 17:48:13,760 DEBUG [cloud.deploy.FirstFitPlanner] (HA-Worker-1:work-707) Calling HostAllocators to find suitable hosts
2012-09-27 17:48:13,760 DEBUG [allocator.impl.FirstFitAllocator] (HA-Worker-1:work-707 FirstFitRoutingAllocator) Looking for hosts in dc: 1 pod:1 cluster:1
2012-09-27 17:48:13,763 DEBUG [allocator.impl.FirstFitAllocator] (HA-Worker-1:work-707 FirstFitRoutingAllocator) FirstFitAllocator has 0 hosts to check for allocation: []
2012-09-27 17:48:13,765 DEBUG [allocator.impl.FirstFitAllocator] (HA-Worker-1:work-707 FirstFitRoutingAllocator) Found 0 hosts for allocation after prioritization: []
2012-09-27 17:48:13,765 DEBUG [allocator.impl.FirstFitAllocator] (HA-Worker-1:work-707 FirstFitRoutingAllocator) Looking for speed=2200Mhz, Ram=1024
2012-09-27 17:48:13,765 DEBUG [allocator.impl.FirstFitAllocator] (HA-Worker-1:work-707 FirstFitRoutingAllocator) Host Allocator returning 0 suitable hosts
2012-09-27 17:48:13,766 DEBUG [cloud.deploy.FirstFitPlanner] (HA-Worker-1:work-707) No suitable hosts found
2012-09-27 17:48:13,766 DEBUG [cloud.deploy.FirstFitPlanner] (HA-Worker-1:work-707) No suitable hosts found under this Cluster: 1
2012-09-27 17:48:13,766 DEBUG [cloud.deploy.FirstFitPlanner] (HA-Worker-1:work-707) Could not find suitable Deployment Destination for this VM under any clusters, returning.
2012-09-27 17:48:13,777 DEBUG [cloud.deploy.FirstFitPlanner] (HA-Worker-1:work-707) DeploymentPlanner allocation algorithm: random
2012-09-27 17:48:13,777 DEBUG [cloud.deploy.FirstFitPlanner] (HA-Worker-1:work-707) Trying to allocate a host and storage pools from dc:1, pod:1,cluster:null, requested cpu: 2200, requested ram: 1073741824
2012-09-27 17:48:13,780 DEBUG [cloud.deploy.FirstFitPlanner] (HA-Worker-1:work-707) Is ROOT volume READY (pool already allocated)?: No
2012-09-27 17:48:13,780 DEBUG [cloud.deploy.FirstFitPlanner] (HA-Worker-1:work-707) Searching resources only under specified Pod: 1
2012-09-27 17:48:13,786 DEBUG [cloud.deploy.FirstFitPlanner] (HA-Worker-1:work-707) Listing clusters in order of aggregate capacity, that have (atleast one host with) enough CPU and RAM capacity under this Pod: 1
2012-09-27 17:48:13,788 DEBUG [cloud.deploy.FirstFitPlanner] (HA-Worker-1:work-707) CPUOverprovisioningFactor considered: 2.0
2012-09-27 17:48:13,796 DEBUG [cloud.deploy.FirstFitPlanner] (HA-Worker-1:work-707) Removing from the clusterId list these clusters from avoid set: [1]
2012-09-27 17:48:13,806 DEBUG [cloud.deploy.FirstFitPlanner] (HA-Worker-1:work-707) Cluster: 5 has HyperVisorType that does not match the VM, skipping this cluster
2012-09-27 17:48:13,806 DEBUG [cloud.deploy.FirstFitPlanner] (HA-Worker-1:work-707) Could not find suitable Deployment Destination for this VM under any clusters, returning.
2012-09-27 17:48:13,818 DEBUG [cloud.capacity.CapacityManagerImpl] (HA-Worker-1:work-707) VM state transitted from :Starting to Stopped with event: OperationFailedvm's original host id: 30 new host id: null host id before state transition: null
2012-09-27 17:48:13,820 WARN [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:work-707) Unable to restart VM[User|sebas-opensuse] due to Unable to create a deployment for VM[User|sebas-opensuse]
2012-09-27 17:48:13,907 INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:work-707) Rescheduling HAWork[707-HA-608-Stopped-Scheduled] to try again at Thu Sep 27 17:49:45 CEST 2012


It says "No suitable hosts", but the standby node is shown as "Up" in the CS UI and no errors are reported in the log. Also, restarting the failed node does not change a thing. The logs say there are 12 VMs stopped on the node, but there is no attempt to restart them.
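For anyone digging through a similar log, here is a minimal Python sketch that pulls out only the allocator summary lines, so it is easy to see how many hosts each HA work item was offered. The regular expression is written against the line format in the excerpt above; the function name `allocator_results` is just an illustrative choice, and other CloudStack versions may log this differently.

```python
import re

# Matches the allocator summary line as it appears in the log excerpt above.
# Assumption: the "(worker:work-NNN ...)" prefix and message text are unchanged.
ALLOCATOR_RESULT = re.compile(
    r"\((?P<worker>[\w-]+):(?P<work>work-\d+)[^)]*\)\s+"
    r"Host Allocator returning (?P<count>\d+) suitable hosts"
)

def allocator_results(lines):
    """Return (work_id, suitable_host_count) for each allocator summary line."""
    results = []
    for line in lines:
        match = ALLOCATOR_RESULT.search(line)
        if match:
            results.append((match.group("work"), int(match.group("count"))))
    return results

# Two lines copied from the log above; only the second is a summary line.
sample = [
    "2012-09-27 17:48:13,763 DEBUG [allocator.impl.FirstFitAllocator] "
    "(HA-Worker-1:work-707 FirstFitRoutingAllocator) "
    "FirstFitAllocator has 0 hosts to check for allocation: []",
    "2012-09-27 17:48:13,765 DEBUG [allocator.impl.FirstFitAllocator] "
    "(HA-Worker-1:work-707 FirstFitRoutingAllocator) "
    "Host Allocator returning 0 suitable hosts",
]

print(allocator_results(sample))  # [('work-707', 0)]
```

A count of 0 while the standby host shows as "Up" suggests the host is being filtered out before capacity is even checked, which points at tags or similar placement rules rather than at CPU or RAM.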

Can somebody point out what I am doing wrong?

thx

Re: Does HA actually work? 9 months 2 weeks ago #12717

  • chirauki
Hi there!

I've just noticed that both nodes in the cluster have an attribute called "HA Enabled", and it is set to "No" on both of them.

What does this attribute mean? Can it be changed somehow (editing the host does not allow it)?

thx

Re: Does HA actually work? 9 months 2 weeks ago #12720

  • chirauki
Hi,

My bad. I just figured it out.

I was playing with the dedicated HA host feature a few days ago: ha.tag was set in global settings, and I set the matching host tag on one of the hosts. We found out that this prevents VMs from being migrated when enabling maintenance on the "non-ha-host", so I took the tag off the host.

Meanwhile, ha.tag was still set in global settings. So on failover, no host is found with the ha.tag value applied, and no failover is done at all.
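To make the mismatch concrete, here is a minimal Python sketch of the check as I understand it from the behaviour above. It is not CloudStack's actual code: the function `ha_candidates`, the host names and the tag value `ha_host` are all hypothetical, and the rule "if ha.tag is set, only hosts carrying that tag are considered for HA restarts" is an assumption based on what we observed.

```python
def ha_candidates(hosts, ha_tag):
    """Return the hosts that would be considered for an HA restart.

    hosts:  dict mapping host name -> set of host tags (hypothetical model)
    ha_tag: value of the global ha.tag setting, or "" / None if unset

    Assumption: with ha.tag set, only hosts carrying that exact tag qualify;
    with ha.tag unset, every host qualifies.
    """
    if not ha_tag:
        return sorted(hosts)
    return sorted(name for name, tags in hosts.items() if ha_tag in tags)

# Our situation: ha.tag still set globally, but the tag removed from the host.
hosts = {"node1": set(), "node2": set()}
print(ha_candidates(hosts, "ha_host"))   # [] -> nothing to fail over to

# Either clearing ha.tag or tagging a host again gives the planner a target.
print(ha_candidates(hosts, ""))          # ['node1', 'node2']
print(ha_candidates({"node1": set(), "node2": {"ha_host"}}, "ha_host"))  # ['node2']
```

So the thing to verify before testing failover is that the global setting and the host tags agree: either both are set to the same value, or neither is set.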

Sorry to bother you people.

thx