Wednesday, June 01, 2016

Ceph - Crush Map has Legacy Tunables

I upgraded Ceph from the old Dumpling version to the latest Jewel version. In addition to the OSDs failing to start because of permission settings on /var/lib/ceph (the ownership has to be changed recursively to ceph:ceph), I also got these HEALTH_WARN messages:
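
For reference, the ownership fix mentioned above is just a recursive chown on each node (the exact path and the services you need to restart afterwards may differ on your setup):

chown -R ceph:ceph /var/lib/ceph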

indra@sc-test-nfs-01:~$ ceph status
    cluster d3dc01a3-c38d-4a85-b040-3015455246e6
     health HEALTH_WARN
            too many PGs per OSD (512 > max 300)
            crush map has legacy tunables (require bobtail, min is firefly)
            crush map has straw_calc_version=0

     monmap e3: 3 mons at {sc-test-ceph-01=192.168.3.3:6789/0,sc-test-ceph-02=192.168.3.4:6789/0,sc-test-nfs-01=192.168.3.2:6789/0}
            election epoch 50, quorum 0,1,2 sc-test-nfs-01,sc-test-ceph-01,sc-test-ceph-02
     osdmap e100: 3 osds: 3 up, 3 in
      pgmap v965721: 704 pgs, 6 pools, 188 MB data, 59 objects
            61475 MB used, 1221 GB / 1350 GB avail
                 704 active+clean

Resolving the problem is very simple; just use the command below:

ceph osd crush tunables optimal

indra@sc-test-nfs-01:~$ ceph osd crush tunables optimal
adjusted tunables profile to optimal
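
If you want to double-check what was applied, the active tunables can (if I remember correctly) be dumped with:

ceph osd crush show-tunables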

Ceph status after the adjustment:

indra@sc-test-nfs-01:~$ ceph status
    cluster d3dc01a3-c38d-4a85-b040-3015455246e6
     health HEALTH_WARN
            too many PGs per OSD (512 > max 300)
     monmap e3: 3 mons at {sc-test-ceph-01=192.168.3.3:6789/0,sc-test-ceph-02=192.168.3.4:6789/0,sc-test-nfs-01=192.168.3.2:6789/0}
            election epoch 50, quorum 0,1,2 sc-test-nfs-01,sc-test-ceph-01,sc-test-ceph-02
     osdmap e101: 3 osds: 3 up, 3 in
      pgmap v965764: 704 pgs, 6 pools, 188 MB data, 59 objects
            61481 MB used, 1221 GB / 1350 GB avail
                 704 active+clean

The warning messages related to the crush map are gone. Yay!

PS. Ignore the "too many PGs per OSD" warning; it appears because I have a limited number of OSDs and too many pools and PGs in my test environment.
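
If that PG warning is really annoying on a small test cluster like this, I believe the threshold can be raised via the mon_pg_warn_max_per_osd option in ceph.conf (I have not tested this myself, so treat it as a sketch):

[mon]
mon pg warn max per osd = 600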

Source: https://www.mail-archive.com/ceph-users@lists.ceph.com/msg10225.html
Reference: http://docs.ceph.com/docs/master/rados/operations/crush-map/

Monday, July 22, 2013

Manual Compile of libvirt to Resolve CloudStack and Ceph RBD Storage Issue

I installed CloudStack 4.1.0 on an Ubuntu 12.04.2 LTS (precise) server. Initially I wanted to use Ubuntu 13.04 (raring), but CloudStack only provides a package repository for Ubuntu 12.04. The KVM hypervisor hosts also run Ubuntu 12.04.2 LTS, and Ceph RBD (RADOS Block Device) serves as the primary storage for CloudStack.

The default libvirt version on Ubuntu 12.04 doesn't support Ceph RBD as primary storage. I followed these instructions from Wido to get libvirt version 1.0.2, which adds RBD storage pool support. However, I then hit an issue where libvirt reported the wrong disk usage / allocation figures for the RBD storage pool.

root@hv-kvm-02:~# virsh pool-info bab81ce8-d53f-3a7d-b8f6-841702f65c89
Name:           bab81ce8-d53f-3a7d-b8f6-841702f65c89
UUID:           bab81ce8-d53f-3a7d-b8f6-841702f65c89
State:          running
Persistent:     no
Autostart:      no
Capacity:       5.47 TiB
Allocation:     34819.02 TiB
Available:      5.47 TiB

As a result, VM instance creation failed: the RBD storage pool was reported as having insufficient disk space, so CloudStack wasn't able to find a suitable/available storage pool.

2013-07-15 11:15:28,313 DEBUG [cloud.storage.StorageManagerImpl] (Job-Executor-3:job-168) Checking pool: 208 for volume allocation [Vol[227|vm=225|ROOT]], maxSize : 15828044742656, totalAllocatedSize : 1769538048, askingSize : 8589934592, allocated disable threshold: 0.85
2013-07-15 11:15:28,313 DEBUG [storage.allocator.AbstractStoragePoolAllocator] (Job-Executor-3:job-168) Checking if storage pool is suitable, name: sc-image ,poolId: 209
2013-07-15 11:15:28,313 DEBUG [storage.allocator.AbstractStoragePoolAllocator] (Job-Executor-3:job-168) Is localStorageAllocationNeeded? false
2013-07-15 11:15:28,313 DEBUG [storage.allocator.AbstractStoragePoolAllocator] (Job-Executor-3:job-168) Is storage pool shared? true
2013-07-15 11:15:28,317 DEBUG [cloud.storage.StorageManagerImpl] (Job-Executor-3:job-168) Checking pool 209 for storage, totalSize: 6013522722816, usedBytes: 38283921137336466, usedPct: 6366.305226067051, disable threshold: 0.85
2013-07-15 11:15:28,317 DEBUG [cloud.storage.StorageManagerImpl] (Job-Executor-3:job-168) Insufficient space on pool: 209 since its usage percentage: 6366.305226067051 has crossed the pool.storage.capacity.disablethreshold: 0.85
2013-07-15 11:15:28,317 DEBUG [storage.allocator.FirstFitStoragePoolAllocator] (Job-Executor-3:job-168) FirstFitStoragePoolAllocator returning 1 suitable storage pools
2013-07-15 11:15:28,317 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-3:job-168) Checking suitable pools for volume (Id, Type): (228,DATADISK)
2013-07-15 11:15:28,317 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-3:job-168) We need to allocate new storagepool for this volume
2013-07-15 11:15:28,319 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-3:job-168) Calling StoragePoolAllocators to find suitable pools
2013-07-15 11:15:28,319 DEBUG [storage.allocator.FirstFitStoragePoolAllocator] (Job-Executor-3:job-168) Looking for pools in dc: 6 pod:6 cluster:6 having tags:[rbd]
2013-07-15 11:15:28,322 DEBUG [storage.allocator.FirstFitStoragePoolAllocator] (Job-Executor-3:job-168) FirstFitStoragePoolAllocator has 1 pools to check for allocation
2013-07-15 11:15:28,322 DEBUG [storage.allocator.AbstractStoragePoolAllocator] (Job-Executor-3:job-168) Checking if storage pool is suitable, name: sc-image ,poolId: 209
2013-07-15 11:15:28,322 DEBUG [storage.allocator.AbstractStoragePoolAllocator] (Job-Executor-3:job-168) Is localStorageAllocationNeeded? false
2013-07-15 11:15:28,322 DEBUG [storage.allocator.AbstractStoragePoolAllocator] (Job-Executor-3:job-168) Is storage pool shared? true
2013-07-15 11:15:28,326 DEBUG [cloud.storage.StorageManagerImpl] (Job-Executor-3:job-168) Checking pool 209 for storage, totalSize: 6013522722816, usedBytes: 38283921137336466, usedPct: 6366.305226067051, disable threshold: 0.85
2013-07-15 11:15:28,326 DEBUG [cloud.storage.StorageManagerImpl] (Job-Executor-3:job-168) Insufficient space on pool: 209 since its usage percentage: 6366.305226067051 has crossed the pool.storage.capacity.disablethreshold: 0.85
2013-07-15 11:15:28,326 DEBUG [storage.allocator.FirstFitStoragePoolAllocator] (Job-Executor-3:job-168) FirstFitStoragePoolAllocator returning 0 suitable storage pools
2013-07-15 11:15:28,326 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-3:job-168) No suitable pools found for volume: Vol[228|vm=225|DATADISK] under cluster: 6
2013-07-15 11:15:28,326 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-3:job-168) No suitable pools found
2013-07-15 11:15:28,326 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-3:job-168) No suitable storagePools found under this Cluster: 6
2013-07-15 11:15:28,326 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-3:job-168) Could not find suitable Deployment Destination for this VM under any clusters, returning.
2013-07-15 11:15:28,332 DEBUG [cloud.vm.UserVmManagerImpl] (Job-Executor-3:job-168) Destroying vm VM[User|Indra-Test-3] as it failed to create on Host with Id:null
2013-07-15 11:15:28,498 DEBUG [cloud.capacity.CapacityManagerImpl] (Job-Executor-3:job-168) VM state transitted from :Stopped to Error with event: OperationFailedToErrorvm's original host id: null new host id: null host id before state transition: null
2013-07-15 11:15:29,125 INFO [user.vm.DeployVMCmd] (Job-Executor-3:job-168) com.cloud.exception.InsufficientServerCapacityException: Unable to create a deployment for VM[User|Indra-Test-3]Scope=interface com.cloud.dc.DataCenter; id=6

After consulting the CloudStack users' mailing list and logging a bug report on Apache's JIRA here without any success, I managed to resolve the problem by compiling and installing the latest version of libvirt. This is how I did it on my KVM hypervisor hosts running Ubuntu 12.04.2 LTS:

1. Download the latest libvirt version (1.1.0) from libvirt’s FTP site, and extract it:

ftp://libvirt.org/libvirt/libvirt-1.1.0.tar.gz
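
For example (assuming the build is done from the home directory):

wget ftp://libvirt.org/libvirt/libvirt-1.1.0.tar.gz
tar -xzf libvirt-1.1.0.tar.gz
cd libvirt-1.1.0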

2. Install the required packages for compiling libvirt:

apt-get install librbd-dev
apt-get install libpciaccess-dev
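
Depending on what is already installed on the host, the usual build tooling may also be needed, along these lines (adjust as necessary):

apt-get install build-essential pkg-config libxml2-dev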

3. Compile libvirt with RBD storage support, setting the prefixes so that the build replaces the existing default libvirt on the Ubuntu server:

./autogen.sh --prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --with-storage-rbd

4. After the compilation has completed, check the output to confirm that RBD storage support is enabled, then do the installation:

make
make install
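
To confirm that the RBD backend was picked up without reading through all the output (ideally before running make install), something like the grep below should work; the exact wording in the configure logs may vary between versions:

grep -i rbd config.log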

5. Restart the KVM hosts after the installation is done, and then verify that the latest version of libvirt has been installed:

libvirtd --version
virsh --version
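
Both should report 1.1.0 if the freshly built libvirt in /usr is the one being picked up.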

The command “virsh pool-info” now shows the correct “Allocation” amount:

root@hv-kvm-02:~# virsh pool-info d433809b-01ea-3947-ba0f-48077244e4d6
Name:           d433809b-01ea-3947-ba0f-48077244e4d6
UUID:           d433809b-01ea-3947-ba0f-48077244e4d6
State:          running
Persistent:     no
Autostart:      no
Capacity:       5.47 TiB
Allocation:     328.00 B
Available:      5.47 TiB

CloudStack will then be able to utilise the RBD storage pool when creating VM instances.