The migration of a virtual Windows 2003 Server system from an existing VMware ESXi 5.0 machine to a new Ganeti cluster based on KVM was way harder than expected. The process described on this page should work for other virtualisation platforms as well; just replace the Ganeti- and ESXi-specific commands with the corresponding commands of your virtualisation software.

Prepare it!

Before we start, it is important to know that without the appropriate modifications to the registry, a Windows 2000/XP/2003 Server refuses to boot from a new disk controller. Hence, before shutting down the virtual machine, you have to install the registry patch available from the Microsoft Knowledge Base. Just follow the instructions in the knowledge base article to install it. This mainly concerns Windows XP flavored systems; Windows 7/Windows 2008 Server and higher are not affected by this issue.

Installing the VirtIO drivers before moving the virtual machine could also work nicely, but this was not tested so far. In doing so, the para-virtualized devices in KVM could be used from the beginning.
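If the drivers are already present when the machine comes up on KVM, the para-virtualized devices could later be selected explicitly on the Ganeti side. A minimal sketch, assuming Ganeti's KVM hypervisor parameters disk_type and nic_type:

newvhost # gnt-instance modify -H disk_type=paravirtual,nic_type=paravirtual <newguest>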

Next, a new machine must be created on the destination host. An empty Ganeti template can be found on SourceForge. After installing the raw-image OS definition, the new instance can be created as follows:

newvhost # gnt-instance add -t drbd -o raw-image+default -s 55g -B vcpus=4,memory=4096 <newguest>
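Whether the instance and its parameters came out as intended can be verified with the usual Ganeti tooling, for example:

newvhost # gnt-instance info <newguest>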

Move it!

The machine had a single 50 GB disk. To avoid a time-consuming procedure of exporting, converting and importing, the disk can be copied directly over the network. To do so, the Gentoo-based System Rescue CD can be used to boot both systems and access the data. To set the boot CD in Ganeti, use:

newvhost # gnt-instance modify -H boot_order=cdrom,cdrom_image_path="/path/to/image.iso" <newguest>

Further, make sure that VNC is running on this machine:

newvhost # gnt-instance modify -H vnc_bind_address=0.0.0.0 <newguest>
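The port VNC listens on is assigned per instance. A quick way to look it up is the instance details; this sketch assumes the "network port" label used by our Ganeti version:

newvhost # gnt-instance info <newguest> | grep -i 'network port'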

There are many sources on the internet about how to configure ESXi to boot from a CD, so we will skip this part.
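On the Ganeti side, the modified settings take effect on the next (re)start, so the instance has to be started (or rebooted, if it is already running) to come up from the rescue CD:

newvhost # gnt-instance start <newguest>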

After the Linux boot CDs have started, use the ESXi client to connect to the old machine and VNC to connect to the new one. When both systems are up, showing the root prompt #, and both have a working network connection, you can start with the new guest on the Ganeti cluster. Find out the device name of the hard disk (see the fdisk sketch below): with the default settings, Ganeti instances use a para-virtualized hard disk, which shows up as /dev/vda. The next step is to start a netcat process listening on port 9000. Everything that is received by netcat is piped to dd, which copies the data stream to the hard disk:

newguest # nc -l -p 9000 | dd of=/dev/vda bs=1M

The block size of 1 MB reduces the overhead caused by dd's default of 512 bytes, which results in many small write requests.
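If in doubt about the device name, it can be verified with the standard tools on the rescue system, e.g. by listing all disks:

newguest # fdisk -l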

Note: Be sure to copy the whole disk and not only the partitions. The boot loader of Windows 2000/XP/2003 Server does not tolerate any partition layout other than the one created by the Windows installer. Newer tools try to align the partitions to the blocks of the disk to optimize for SSDs; the old Windows boot loader cannot handle this. If you want to copy only the system partition, make sure that you create the partitions on the new disk with a Windows installation medium.

The next step is to start reading the old hard disk and send the data to the new guest. For this, the IP address of the new guest is needed. As ifconfig is obsolete, we use the ip tool from the iproute2 package:

newguest # ip addr

Now you can start the copy process on the old machine. The hard disk of the old guest on the ESXi 5.0 system has the name /dev/sda. This time the command works vice versa: dd reads from the local virtual hard disk and pipes the data stream to netcat, which sends it to the new guest:

oldguest # dd if=/dev/sda bs=1M | nc <newguest ip> 9000

Note: nc does not return when dd has finished. Instead, you can watch the stats of dd in the console. Don’t abort the command immediately after the transfer has finished, because some data might still be left in the sending queue of nc. To check the progress on the receiver side, you can use:

newguest # killall -USR1 dd

This sends a signal to all dd processes to print their progress stats to stderr. Go to the console where you started dd to see the results. Compare the dd progress output on both machines to make sure everything was received properly on the new guest. After the transfer is complete, you can abort netcat on one machine – the other netcat command will also exit.
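Optionally, checksums can be compared instead of only byte counts. A sketch, assuming the new disk is larger than the old one, so that only the first bytes of /dev/vda – exactly the old disk’s size, e.g. 53687091200 bytes for a 50 GiB disk – are hashed:

oldguest # blockdev --getsize64 /dev/sda
oldguest # dd if=/dev/sda bs=1M | md5sum
newguest # head -c 53687091200 /dev/vda | md5sum

Both md5sum outputs should match; replace the byte count with the value reported by blockdev.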

We skip the compression of the data before sending it over the network, as the bandwidth of the network was higher than the speed of the slowest virtual disk. High-compression programs like bzip2 or xz will use 100% of a virtual processor anyway and hence limit the reading rate to 2-3 MB/s. If you still want to use compression, pipe the data streams through the compression programs:

oldguest # dd if=/dev/sda bs=1M | <xz|bzip2|gzip> | nc <newguest ip> 9000
newguest # nc -l -p 9000 | <unxz|bunzip2|gunzip> | dd of=/dev/vda bs=1M

If there are other disks, proceed with them in the same way. After this step, all data should be on the new vhost and you can shut down the old virtual machine.
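Before the first boot from the copied disk, the boot order should point back at the disk instead of the rescue CD. A sketch, mirroring the earlier modify command:

newvhost # gnt-instance modify -H boot_order=disk <newguest>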

Boot it!

As mentioned before, the first problem occurred when only the partition was copied to the new system. By default, the invoked Linux tools align the partitions to the blocks of the hard disk, and hence the Windows boot loader refuses to boot. Even after booting from the Windows installation CD and running the FIXBOOT and FIXMBR commands, the machine refused to come up.

After copying the full hard disk as described above, the system booted to the first blue screen of death (BSOD). The code was something like:

IRQL_NOT_LESS_OR_EQUAL
STOP 0x0000000A (0xBFD14AAC, 0x00000002, 0x00000000, 0x8000F67C)

This is an often-seen error that mostly indicates a driver problem, especially with graphics card drivers when starting 3D programs or using any other special hardware acceleration with buggy drivers. But this time it is not a runtime but a boot error: there are no drivers to reinstall, and booting into safe mode did not work either. So the first step was to replace all para-virtualized devices with emulated ones:

newvhost # gnt-instance modify -H nic_type=e1000,cdrom_type=ide,disk_type=ide <newguest>

Unfortunately, this did not help, and the Microsoft Knowledge Base was not a big help either. After taking a closer look, we figured out that the second parameter indicates the IRQ of the problematic device, in this case IRQ 2. According to Wikipedia, this IRQ is related to ACPI. But disabling ACPI results in Windows completely refusing to boot, with an error message that ACPI is needed for this version of Windows. Another device that is related to ACPI is the CPU. There is a KVM parameter to change the CPU to an emulated one. To use it in Ganeti, simply do:

newvhost # gnt-instance modify -H kvm_extra="-cpu qemu64" <newguest>

Note: This is meant for a 64 bit Windows system! If you want to run a 32 bit system on a 64 bit Ganeti/KVM cluster, you should emulate a 32 bit CPU!
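A sketch for the 32 bit case, using QEMU's generic qemu32 model (untested here; the available models can be listed with qemu-system-x86_64 -cpu help):

newvhost # gnt-instance modify -H kvm_extra="-cpu qemu32" <newguest>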

After enabling the emulated CPU, Windows booted properly. We assume that the old system had only one virtual CPU or that the new system has a newer generation of CPUs with new features. Both the old and the new CPU were Intel Xeon CPUs, but the new one has Hyper-Threading as an additional feature.

At this point, we had to install the IDE patch mentioned in the preparation section to fix this BSOD:

INACCESSIBLE_BOOT_DEVICE
STOP: 0x0000007B (parameter1, parameter2, parameter3, parameter4)

If you installed the patch but still get this BSOD, you should make sure that the VirtIO drivers were installed properly. You can also try to set your hard disk controller to IDE, and make sure that the registry patch mentioned in the preparation section was added correctly; for the latter, a Windows PE live CD can help.
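Setting the controller to IDE corresponds to the disk_type hypervisor parameter used earlier:

newvhost # gnt-instance modify -H disk_type=ide <newguest>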

Run it!

As the migrated machine only runs small tasks that are not time-critical, we did not install the VirtIO drivers. We also did not test the performance of the Windows guest in Ganeti/KVM under high load; first tests in production showed sufficient performance for our purpose.