Qemu native GlusterFS integration

Starting with GlusterFS 3.4.0 there is an API (libgfapi) which can be used to access files on a GlusterFS volume directly, without going through a FUSE mount. Qemu supports this starting with version 1.3.0.

GlusterFS

NOTE: This was implemented in stoney cloud 1.2.10.4, so you should not have to perform the following steps manually.

Since we run Qemu as an unprivileged user, we have to permit access to GlusterFS from an unprivileged port as well as from an unprivileged user.

To achieve that, add the following line to the volume management block in /etc/glusterfs/glusterd.vol on all involved storage nodes:

option rpc-auth-allow-insecure on

and run the following command on a gluster node:

gluster volume set virtualization server.allow-insecure On

After that you should at least restart glusterd:

/etc/init.d/glusterd restart

However, it appears that all glusterfsd daemons must be restarted as well, which is done by restarting the volume. To do so, run the following on a GlusterFS node:

gluster volume stop virtualization
gluster volume start virtualization
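Before restarting anything, it can help to confirm that the option really ended up inside the management block on every storage node. The following is a minimal Python sketch, not part of stoney cloud: the helper names and the SAMPLE file contents are assumptions made for illustration, and only the literal value "on" used above is accepted.

```python
def management_options(vol_text):
    """Return the options of the 'volume management' block as a dict."""
    options = {}
    in_management = False
    for raw in vol_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        words = line.split()
        if words[0] == "volume":
            # Only options inside 'volume management' are of interest.
            in_management = words[1:] == ["management"]
        elif words[0] == "end-volume":
            in_management = False
        elif in_management and words[0] == "option" and len(words) >= 3:
            options[words[1]] = " ".join(words[2:])
    return options


def allows_insecure(vol_text):
    """True if rpc-auth-allow-insecure is set to 'on' in the management block."""
    value = management_options(vol_text).get("rpc-auth-allow-insecure", "")
    return value.lower() == "on"


# Illustrative file contents (assumption); a real glusterd.vol may carry
# further options.
SAMPLE = """\
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option rpc-auth-allow-insecure on
end-volume
"""

print(allows_insecure(SAMPLE))
```

On a storage node the same check could be applied to the contents of /etc/glusterfs/glusterd.vol, read with open().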

Libvirt

The domain XML needs only a small change. The following example shows the same disk definition before and after.

current

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/virtualization/vm-templates/5b77d2f6-061f-410c-8ee7-9e61da6f1927/f3d87cf9-f7d8-4224-b908-cc9fc6c8fcd4.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>

direct-gluster-access

    <disk type='network' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source protocol='gluster' name='virtualization/vm-templates/5b77d2f6-061f-410c-8ee7-9e61da6f1927/f3d87cf9-f7d8-4224-b908-cc9fc6c8fcd4.qcow2'>
        <host name='10.1.120.11'/>
      </source>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>

The changed elements are:

disk type
must be network instead of file
source
now has a protocol and a name attribute instead of the file attribute, as well as a new host subelement. The name is the former file path without the leading /var/, so it starts with the GlusterFS volume name.
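The conversion of a disk element can be sketched in Python with the standard library's xml.etree.ElementTree. This is only an illustration under assumptions: the function name to_gluster_disk, the mount_prefix parameter, and the shortened image path in FILE_DISK are hypothetical, and the sketch assumes the volume is mounted directly below /var/ as in the example above.

```python
import xml.etree.ElementTree as ET


def to_gluster_disk(disk_xml, host, mount_prefix="/var/"):
    """Rewrite a file-backed <disk> element to use direct gluster access."""
    disk = ET.fromstring(disk_xml)
    source = disk.find("source")
    if disk.get("type") != "file" or source is None or source.get("file") is None:
        raise ValueError("expected a file-backed disk definition")
    path = source.get("file")
    if not path.startswith(mount_prefix):
        raise ValueError("image is not located below " + mount_prefix)

    disk.set("type", "network")
    # Replace the file attribute by protocol and name (path without /var/).
    del source.attrib["file"]
    source.set("protocol", "gluster")
    source.set("name", path[len(mount_prefix):])
    # Only a single host is supported at the moment.
    ET.SubElement(source, "host", {"name": host})
    return ET.tostring(disk, encoding="unicode")


# Shortened, hypothetical image path for readability.
FILE_DISK = """
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='none'/>
  <source file='/var/virtualization/vm-templates/example.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>
"""

print(to_gluster_disk(FILE_DISK, "10.1.120.11"))
```

All other child elements (driver, target, alias, address) are left untouched, which matches the two definitions shown above.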

LDAP

Current

dn: sstDisk=vda,ou=devices,sstVirtualMachine=ece1eab1-4a9e-4729-bfc1-59d9f01550a5,ou=virtual machines,ou=virtualization,ou=services,dc=foss-cloud,dc=org
objectclass: top
objectclass: sstVirtualizationVirtualMachineDisk
sstType: file
sstDevice: disk
sstDriverName: qemu
sstDriverType: qcow2
sstDriverCache: none
sstVolumeName: 3aa376a5-6d09-442b-8662-425de888385b
sstSourceFile: /var/virtualization/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/3aa376a5-6d09-442b-8662-425de888385b.qcow2
sstDisk: vda
sstTargetBus: virtio
sstReadonly: FALSE
sstVolumeAllocation: 0
sstVolumeCapacity: 32212254720

New

dn: sstDisk=vda,ou=devices,sstVirtualMachine=ece1eab1-4a9e-4729-bfc1-59d9f01550a5,ou=virtual machines,ou=virtualization,ou=services,dc=foss-cloud,dc=org
objectclass: top
objectclass: sstVirtualizationVirtualMachineDisk
sstType: network -> instead of file
sstDevice: disk
sstDriverName: qemu
sstDriverType: qcow2
sstDriverCache: none
sstVolumeName: 3aa376a5-6d09-442b-8662-425de888385b
sstSourceProtocol: gluster -> new
sstSourceName: virtualization/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/3aa376a5-6d09-442b-8662-425de888385b.qcow2 -> new attribute
sstSourceHostName: 10.1.120.11 -> new, multi-valued attribute (multiple values are reserved for later use; currently only one entry is supported)
sstDisk: vda
sstTargetBus: virtio
sstReadonly: FALSE
sstVolumeAllocation: 0
sstVolumeCapacity: 32212254720
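The difference between the two entries can be expressed as an LDIF modify record. The Python sketch below generates such a record; it is an illustration only. The function name gluster_ldif and the mount_prefix parameter are hypothetical, it assumes that sstSourceFile is dropped (the new entry above no longer contains it), and it assumes the directory schema already knows the three new attributes.

```python
def gluster_ldif(dn, source_file, host, mount_prefix="/var/"):
    """Build an LDIF modify record that switches a disk entry to gluster."""
    if not source_file.startswith(mount_prefix):
        raise ValueError("image is not located below " + mount_prefix)
    # sstSourceName is the old sstSourceFile without the leading /var/.
    name = source_file[len(mount_prefix):]
    lines = [
        "dn: " + dn,
        "changetype: modify",
        "replace: sstType",
        "sstType: network",
        "-",
        # Assumption: the old attribute is removed rather than kept.
        "delete: sstSourceFile",
        "-",
        "add: sstSourceProtocol",
        "sstSourceProtocol: gluster",
        "-",
        "add: sstSourceName",
        "sstSourceName: " + name,
        "-",
        "add: sstSourceHostName",
        "sstSourceHostName: " + host,
        "-",
    ]
    return "\n".join(lines) + "\n"


DN = ("sstDisk=vda,ou=devices,"
      "sstVirtualMachine=ece1eab1-4a9e-4729-bfc1-59d9f01550a5,"
      "ou=virtual machines,ou=virtualization,ou=services,"
      "dc=foss-cloud,dc=org")
SOURCE_FILE = ("/var/virtualization/vm-persistent/"
               "0f83f084-8080-413e-b558-b678e504836e/"
               "3aa376a5-6d09-442b-8662-425de888385b.qcow2")

print(gluster_ldif(DN, SOURCE_FILE, "10.1.120.11"))
```

The resulting text could be reviewed and then fed to a tool such as ldapmodify; the remaining attributes of the entry stay unchanged.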

Notes for Configuration in LDAP

Persistent Storage Pool

sstStoragePool=0f83f084-8080-413e-b558-b678e504836e,ou=storage pools,ou=virtualization,ou=services,o=stepping-stone,c=ch

file

sstStoragePoolURI: file:///var/virtualization/vm-persistent/0f83f084-8080-413e-b558-b678e504836e

gluster

sstStoragePoolURI: gluster://tier1-storage-node-01/gv-tier1-vm-01/vm-persistent/0f83f084-8080-413e-b558-b678e504836e
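A consumer of sstStoragePoolURI has to tell the two pool types apart and extract host, volume and path for the gluster case. The following Python sketch shows one way to do that with urllib.parse. It is not existing stoney cloud code: the function name parse_pool_uri and the returned dictionary keys are hypothetical, and it assumes that the gluster form follows the gluster://host/volume/path layout that qemu-img uses in the Test section. A gluster URI without a host part is rejected.

```python
from urllib.parse import urlsplit


def parse_pool_uri(uri):
    """Split a storage pool URI into its type-specific parts."""
    parts = urlsplit(uri)
    if parts.scheme == "file":
        return {"type": "file", "path": parts.path}
    if parts.scheme == "gluster":
        if not parts.hostname:
            raise ValueError("gluster URI without a host: " + uri)
        # First path component is the volume, the rest is the directory.
        volume, _, path = parts.path.lstrip("/").partition("/")
        return {
            "type": "gluster",
            "host": parts.hostname,
            "port": parts.port,  # None if no port is given
            "volume": volume,
            "path": path,
        }
    raise ValueError("unsupported storage pool type: " + parts.scheme)


POOL = "0f83f084-8080-413e-b558-b678e504836e"

print(parse_pool_uri("file:///var/virtualization/vm-persistent/" + POOL))
print(parse_pool_uri(
    "gluster://tier1-storage-node-01/gv-tier1-vm-01/vm-persistent/" + POOL))
```

Further pool types would only need an additional branch, which fits the expectation that other types could follow.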

Questions / Open Issues

  • Where do we store the following configuration values?
    • Possible storage pool types: file, gluster (others could follow)
    • Host names or IP addresses (in case of gluster): tier1-storage-node-01
    • Mount point: /var/virtualization/vm-persistent/ or /gv-tier1-vm-01/vm-persistent/
    • Storage Pool Names: 0f83f084-8080-413e-b558-b678e504836e
  • Where and how do we store the information that we have slow and fast storage?

Test

qemu-img info

foss-cloud-node-01 ~ # qemu-img info gluster://10.1.120.11:24007/virtualization/vm-templates/5b77d2f6-061f-410c-8ee7-9e61da6f1927/f3d87cf9-f7d8-4224-b908-cc9fc6c8fcd4.qcow2
image: gluster://10.1.120.11:24007/virtualization/vm-templates/5b77d2f6-061f-410c-8ee7-9e61da6f1927/f3d87cf9-f7d8-4224-b908-cc9fc6c8fcd4.qcow2
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 874M
cluster_size: 65536
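The image URL passed to qemu-img is composed of the server (optionally with a port), the volume name and the path inside the volume. A small Python sketch of that composition follows; the helper names gluster_url and qemu_img_info are hypothetical, and qemu_img_info only works on a node where qemu-img has been built with GlusterFS support.

```python
import subprocess


def gluster_url(host, volume, image, port=None):
    """Compose gluster://host[:port]/volume/image."""
    server = host if port is None else "{}:{}".format(host, port)
    return "gluster://{}/{}/{}".format(server, volume, image.lstrip("/"))


def qemu_img_info(url):
    """Return the output of 'qemu-img info' for the given image URL."""
    result = subprocess.run(["qemu-img", "info", url],
                            check=True, capture_output=True, text=True)
    return result.stdout


IMAGE = ("vm-templates/5b77d2f6-061f-410c-8ee7-9e61da6f1927/"
         "f3d87cf9-f7d8-4224-b908-cc9fc6c8fcd4.qcow2")

# Only the URL is built here; call qemu_img_info(url) on a virtualization node.
url = gluster_url("10.1.120.11", "virtualization", IMAGE, port=24007)
print(url)
```

Leaving out the port, as in the qemu-img create example, yields a URL of the form gluster://10.1.120.11/virtualization/foo.qcow2.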

qemu-img create

foss-cloud-node-01 ~ # qemu-img create -f qcow2 gluster://10.1.120.11/virtualization/foo.qcow2 20G
Formatting 'gluster://10.1.120.11/virtualization/foo.qcow2', fmt=qcow2 size=21474836480 encryption=off cluster_size=65536 lazy_refcounts=off 
foss-cloud-node-01 ~ # stat /var/virtualization/foo.qcow2 
  File: ‘/var/virtualization/foo.qcow2’
  Size: 197120    	Blocks: 385        IO Block: 131072 regular file
Device: 1ch/28d	Inode: 11101537565691465327  Links: 1
Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-08-04 07:37:01.814422718 +0200
Modify: 2013-08-04 08:31:51.968884606 +0200
Change: 2013-08-04 08:31:51.968884606 +0200
 Birth: -

qemu

The following Qemu command line was generated by libvirt and then modified: the network device and the -S parameter were removed and the SPICE password was disabled. It should give a running VM, provided that the specified VM images exist:

/usr/bin/qemu-system-x86_64 \
 -machine accel=kvm \
 -name e35422fa-18e7-41eb-8478-d09daff1b43a \
 -machine pc-1.3,accel=kvm,usb=off \
 -m 2048 \
 -realtime mlock=off \
 -smp 1,sockets=1,cores=1,threads=1 \
 -uuid e35422fa-18e7-41eb-8478-d09daff1b43a \
 -no-user-config \
 -nodefaults \
 -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/e35422fa-18e7-41eb-8478-d09daff1b43a.monitor,server,nowait \
 -mon chardev=charmonitor,id=monitor,mode=control \
 -rtc base=utc \
 -no-shutdown \
 -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x8.0x7 \
 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x8 \
 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x8.0x1 \
 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x8.0x2 \
 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
 -drive file=gluster://10.1.120.11/virtualization/iso/a51f5193-f518-4cb6-8d5b-f24538543217.iso,if=none,id=drive-ide0-0-1,readonly=on,format=raw \
 -device ide-cd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 \
 -drive file=gluster://10.1.120.11/virtualization/vm-templates/5b77d2f6-061f-410c-8ee7-9e61da6f1927/f3d87cf9-f7d8-4224-b908-cc9fc6c8fcd4.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none \
 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
 -chardev spicevmc,id=charchannel0,name=vdagent \
 -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 \
 -device usb-tablet,id=input0 \
 -spice port=5900,addr=192.168.140.13,seamless-migration=on,disable-ticketing \
 -vga qxl \
 -global qxl-vga.ram_size=67108864 \
 -global qxl-vga.vram_size=67108864 \
 -device AC97,id=sound0,bus=pci.0,addr=0x4 \
 -chardev spicevmc,id=charredir0,name=usbredir \
 -device usb-redir,chardev=charredir0,id=redir0,filter=-1:-1:-1:-1:0 \
 -chardev spicevmc,id=charredir1,name=usbredir \
 -device usb-redir,chardev=charredir1,id=redir1,filter=-1:-1:-1:-1:0 \
 -chardev spicevmc,id=charredir2,name=usbredir \
 -device usb-redir,chardev=charredir2,id=redir2,filter=-1:-1:-1:-1:0 \
 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7

Troubleshooting

Qemu hang

Should Qemu hang after start, check the brick logfile (in our default setup this would be /var/log/glusterfs/bricks/var-data-gluster-volume-01.log) for the following entries:

[2013-08-04 11:02:34.022328] E [addr.c:152:gf_auth] 0-auth/addr: client is bound to port 50037 which is not privileged
[2013-08-04 11:02:34.022362] E [authenticate.c:239:gf_authenticate] 0-auth: no authentication module is interested in accepting remote-client (null)
[2013-08-04 11:02:34.022410] E [server-handshake.c:578:server_setvolume] 0-virtualization-server: Cannot authenticate client from foss-cloud-node-01-12502-2013/08/04-11:02:33:989625-virtualization-client-1-0 3.4.0

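These entries mean that the brick rejected the client because it connected from an unprivileged port. Either one of the configuration options described above is not in effect, or glusterd or the volume was not restarted after the change.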
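Searching the brick log for these rejections can be automated. The following Python sketch extracts the client ports from the "not privileged" messages; the function name rejected_ports is hypothetical and the pattern is derived only from the log lines quoted above.

```python
import re

# Matches the message written by the brick when a client connects from an
# unprivileged port, as in the log excerpt above.
UNPRIVILEGED = re.compile(
    r"client is bound to port (\d+) which is not privileged")


def rejected_ports(log_text):
    """Return the client ports that were rejected as unprivileged."""
    return [int(match.group(1)) for match in UNPRIVILEGED.finditer(log_text)]


LOG_LINE = ("[2013-08-04 11:02:34.022328] E [addr.c:152:gf_auth] "
            "0-auth/addr: client is bound to port 50037 which is not privileged")

print(rejected_ports(LOG_LINE))  # prints [50037]
```

An empty list for the whole brick log suggests that a hang has a different cause than the port restriction.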

Caveats

Number of GlusterFS servers

At the moment it is not possible to specify more than one server.

libvirt integration

Libvirt is supposed to be able to create images on a GlusterFS volume, but a storage pool with the appropriate configuration must be set up first.

Last modified on 20 December 2013, at 14:33