Virtualizing and Disaster Testing a Linux Server

Since the mercury has decided to hide completely in that tiny little ball it calls home, my small but dedicated webserver apparently decided to freak out and take a little hibernation of its own. I think the NIC is going bad: every few days it would take down my whole network. Pulling its network cable from the router brought everything ELSE back up, and the machine seemed to reboot and work fine once I put the keyboard and monitor back on, but then, a few days later, same problem.

So, I’ve decided to do away with the older desktop hardware that I keep replacing and re-purposing and move to a virtual machine on my media center PC, which is on all the time anyway, and a lot beefier. This gave me a chance to do a couple of things:

  • Eliminate a whole PC (hopefully saving some energy $$)
  • Test my backup strategy (I got a passing grade, but just barely — read on)
  • Play with VirtualBox and really dig into a virtual machine
  • Clean up the dang webserver!!

Getting rid of a whole machine has obvious benefits. The existing webserver had a 200GB hard drive with three partitions: a 40GB Windows boot partition (that I NEVER use), a small Linux swap partition, and a 150GB root ext3 partition. I had a 500GB USB drive attached to it that I used as a backup (more on that soon). I took the 500GB drive and put it in the media center PC, which runs Windows Vista Ultimate 64-bit. I’m planning on downgrading that to 32-bit soon, because my 3D software won’t play 3D Frame Sequential movies under 64-bit, although games and everything else work. Weird. Still, for now that’s where we’re at.

Before I get into the nitty gritty, I wanted to touch on that last point too — cleaning up the webserver. During the migration (or mock recovery, depending on what you want to call it), I realized that I’m hosting six websites on my poor little desktop and home network. That’s not counting the two that I still have files for which have moved off of my personal network. This site (chiplynch.com) has been running in some form or another for over a decade! So you can imagine what sort of junk files I’ve piled up on the website. Layers of manual backups during upgrades and site changes, temporary files and remnants of old site reorganizations. It’s kind of nostalgic to go back through the archives.

For anyone still reading: unless you have a specific problem that Google brought you here to fix, or you’re a very geeky and close personal friend, I’m guessing there’s really no benefit in reading any further. Consider yourself warned.

Ok, so, here’s what we did. I document it here because A) I need something to blog about B) it might be useful to me again in the future when I do something similar and C) it might be useful to you, dear reader, if you want to know what NOT to do when backing up a system.

First, I downloaded VirtualBox and installed it on the Vista PC. Downloading the latest Ubuntu image and installing a virtual instance of it was pretty simple. The only thing that wasn’t blatantly obvious out of the box was the networking. The latest VirtualBox (3.1.2 at this point) claims that bridged networking is easy: “To enable bridged networking, all you need to do is to open the Settings dialog of a virtual machine, go to the ‘Network’ page and select ‘Bridged network’.”

Sounds easy, right? Well… it was, I think. I’m not sure. It just didn’t WORK the first few times I booted the Linux box. I rebooted Windows a couple of times, I don’t know… eventually it just worked. Do I know why? No. I don’t think I changed anything; I think it’s just weird. If you get frustrated by it, be frustrated, but keep poking and it will eventually work out. Things that did NOT help: adding additional VirtualBox network adapters to Windows, or changing the network card type or anything else in the virtual machine settings.

I did change Ubuntu’s networking to use a static IP address by changing /etc/network/interfaces from this:

# The primary network interface
auto eth0
iface eth0 inet dhcp

to this:

# The primary network interface
auto eth0
iface eth0 inet static
address 192.168.1.150
netmask 255.255.255.0
network 192.168.1.0
broadcast 192.168.1.255
gateway 192.168.1.1

or whatever makes sense for your network.
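The network and broadcast lines aren’t arbitrary: the network is the address ANDed with the netmask, and the broadcast is that network with every host bit set. Here’s a small sketch in plain shell arithmetic for checking those two values before editing the file; the function names are made up for illustration.

```sh
# Convert a dotted quad (a.b.c.d) to a 32-bit integer.
ip_to_int() {
  # Split on dots into $1..$4, then restore IFS.
  old_ifs=$IFS; IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Convert a 32-bit integer back to a dotted quad.
int_to_ip() {
  n=$1
  echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# Print the "network" and "broadcast" lines for ADDRESS and NETMASK.
net_and_broadcast() {
  addr=$(ip_to_int "$1")
  mask=$(ip_to_int "$2")
  net=$(( addr & mask ))                    # keep only the network bits
  bcast=$(( net | (~mask & 0xFFFFFFFF) ))   # set all host bits
  echo "network $(int_to_ip "$net")"
  echo "broadcast $(int_to_ip "$bcast")"
}
```

For my addresses, `net_and_broadcast 192.168.1.150 255.255.255.0` prints the same two values as the stanza above: network 192.168.1.0 and broadcast 192.168.1.255.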

This is the same address the webserver had before, so my router knows where to forward legit traffic. Right now we’re only forwarding HTTP, HTTPS, and SSH traffic. Security, you know. Anyway.

Now we started to hit some snags. I hadn’t decided how to handle the hard drives at this point. The first thing I did was try to mount the Linux backup drive on the Vista box. The wonderful people who write ext2fs make a Windows driver for ext-type filesystems, which I use regularly. Download, run, plug in the USB drive and... fail. I’m hitting the first error they mention on their troubleshooting page. So I downloaded their diagnostic tool and ran:

C:\Users\chipmonkey\Downloads>mountdiag.exe e:

The volume has an Ext2/Ext3 file system, but the Ext2 IFS 1.11 software did not
mount it because there is at least one incompat feature flag set. The Ext2
IFS software does not implement:
* needs_recovery *
Here we have an Ext3 file system which has transactions left in its journal. A
pure Ext2 driver must not access such a volume which is in that state (to
prevent data loss!).
You may solve it by mounting it on Linux (which has a kernel with Ext3
support). Be sure that you cleanly dismount it, before you shutdown Linux.
After that the Ext2 IFS software should be able to access the volume.

Ugh.

Also, I’m having trouble getting VirtualBox to recognize the USB drive, so I can’t mount it in my virtual Linux installation yet. Joy. I rip the poor HDD out of the USB case and install it in the second bay in the Vista box. Natch, I didn’t realize that this bay was the primary drive position, so the new disk actually has boot priority over my Windows drive; it’s not bootable, so no harm, just annoyance. Now, to get Linux to recognize the drive, we do this:

vboxmanage internalcommands createrawvmdk -filename C:\users\chipmonkey\Documents\chiplynch.com\myext3.vmdk -rawdisk \\.\PhysicalDrive0

I added a new SATA controller to the virtual machine, attached the new .vmdk to it, and mounted the drive on /backup for now:
sudo mkdir /backup
sudo mount /dev/sda1 /backup

Life is good.

A clean shutdown or unmount now means that ext2fs can actually read the drive, but now I don’t need that since I have it mounted.

So, what have we actually backed up? Lots of stuff. But not everything.
The single biggest mistake I made was not backing up the entire /var/www/ tree. Remember that. Go now. Back the whole thing up.
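A minimal sketch of a one-shot, whole-tree backup. It uses tar rather than the rsync setup I actually had; `backup_tree` is a hypothetical helper, and the destination is whatever backup mount you have.

```sh
# backup_tree SRC DESTDIR: archive all of SRC into DESTDIR/<name>-<timestamp>.tar.gz
# and print the archive path.
backup_tree() {
  src=$1
  dest=$2
  stamp=$(date +%Y%m%d-%H%M%S)
  name=$(basename "$src")
  mkdir -p "$dest" || return 1
  # -C to the parent directory so paths inside the archive start at the
  # tree's own name (e.g. www/site1/...). GNU tar stores owners and modes.
  tar -czf "$dest/$name-$stamp.tar.gz" -C "$(dirname "$src")" "$name" || return 1
  echo "$dest/$name-$stamp.tar.gz"
}
```

Run it from a root shell (for example `sudo -i`, then `backup_tree /var/www /backup`) so every file is readable. Extracting as root with `tar -xzpf` puts the owners and permissions back. One archive for the whole tree means there is no per-site list to forget to update.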

Why didn’t I? Well, a few reasons. First, each of my virtual websites has its own tree underneath that branch, and I was backing them up separately so that, in theory, I could transplant them more easily when the time came. This was of no benefit; the backup would have been just as easy to navigate as one piece. Next, one branch (the Gallery2 branch, which held ALL my photos) was by far the largest subdirectory, so I backed it up separately. It made sense at the time, really. In retrospect, here’s the important stuff that was not backed up:

  • WordPress Themes and uploads for all my virtual sites
  • manually created files for custom sites
  • flat file configuration files
  • robots.txt, google api keys, 404 replacement files, etc.
  • old sites that were obsolete before I moved to the new backup
  • Subversion files

What I DID back up successfully was:

  • MySQL
  • the /etc/ tree
  • the /var/log/ tree
  • my massive photo archive for Gallery2

And even then I missed a few old databases that were never added to the backup. That brings me to lesson 2: Use a database backup utility that automatically backs up the whole empire, not just the databases you explicitly define. That way, when you add a new one, you won’t just forget to add it to the backup. I haven’t quite done this yet, but it’s on my next-steps list.
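Here’s a sketch of that idea: ask the server for its database list instead of keeping one by hand. It assumes MySQL credentials are stored in ~/.my.cnf so there’s no password prompt. `dump_all_databases` is a hypothetical name, and the `<db>.mysql.dmp` naming simply mirrors my dump files.

```sh
# dump_all_databases OUTDIR: write one dump per database that exists on the server.
dump_all_databases() {
  outdir=$1
  mkdir -p "$outdir" || return 1
  # -N: no column header, -B: batch output, one database name per line.
  mysql -N -B -e 'SHOW DATABASES' | while read -r db; do
    case $db in
      # Generated by the server itself; nothing useful to back up.
      information_schema|performance_schema) continue ;;
    esac
    mysqldump --databases "$db" > "$outdir/$db.mysql.dmp" \
      || echo "dump failed: $db" >&2
  done
}
```

A new database shows up in `SHOW DATABASES` automatically, so there’s nothing to forget. `mysqldump --all-databases > all.sql` does the same job in a single file; one file per database just makes selective restores easier.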

Where was I? Oh, right. I pulled what I could off the backups, then recreated and populated the necessary MySQL databases:

mysqladmin -u root -p create wordpress
mysql -u root -p wordpress < wordpress.mysql.dmp
mysql -u root -p mysql < mysql.mysql.dmp
mysql -u root -p information_schema < information_schema.mysql.dmp

Of note here is that the last two databases are actually used to run MySQL itself. Is this good practice? Well, for one thing, MAKE SURE YOUR MySQL VERSION MATCHES! Restoring them meant that my users, passwords, and permissions were intact, but I'm not sure it was the safest route. It certainly could have been trouble if I needed to merge into an established MySQL installation or something similar. I need to research a better way (ideas are welcome).
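Here’s a sketch of the restore side. It loops over every `<name>.mysql.dmp` file so nobody has to type each command, and it again assumes credentials in ~/.my.cnf. It skips information_schema: that database is a read-only view the server builds itself, so loading a dump into it should be expected to fail. It does still restore the `mysql` system database, so the version warning above applies. The function name is hypothetical.

```sh
# restore_all_dumps DIR: load each DIR/<db>.mysql.dmp into a database named <db>.
restore_all_dumps() {
  dir=$1
  for f in "$dir"/*.mysql.dmp; do
    [ -e "$f" ] || continue            # no matches: the glob stays literal
    db=$(basename "$f" .mysql.dmp)
    case $db in
      information_schema|performance_schema) continue ;;
    esac
    mysqladmin create "$db" 2>/dev/null  # ignore "database exists" errors
    mysql "$db" < "$f" || echo "restore failed: $db" >&2
  done
}
```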

The same issue occurred with filesystem users. I have six, other than myself, for friends and family who use the server. My backup method was rsync to the USB-attached drive (which I occasionally moved to a drive I had at work, just in case of physical doom). This preserved permissions, but the filesystem stores ownership as numeric user ids, so it only means anything if the /etc/passwd file is intact. I DID have /etc/ intact, but rather than muck about recovering the shadowed passwords and possibly adding system users that I didn't want, or which were obsolete, I was simply careful to add the six users back into the system with the correct user ids: my logon is 1000, joe's is 1001, etc. Also, to be sure, www-data and root kept the same user ids. No one else owned files in my world, thank goodness.

sudo useradd -u 1001 joe

and so on.
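Typing those by hand invites typos in the user ids. Here’s a sketch that reads the backed-up passwd file and prints one useradd line per regular account (UID 1000 up to, but not including, 60000, which is Ubuntu’s usual range for regular accounts), keeping each account’s old UID, home directory, and shell. The helper name and the idea of pointing it at a copy like /backup/etc/passwd are assumptions.

```sh
# users_from_passwd OLD_PASSWD: print useradd commands that recreate the
# regular accounts from a backed-up passwd file with their original UIDs.
users_from_passwd() {
  awk -F: '$3 >= 1000 && $3 < 60000 {
    # Only pass -s when the old entry actually had a shell.
    shell = ($7 != "") ? " -s " $7 : ""
    printf "useradd -u %s -d %s%s %s\n", $3, $6, shell, $1
  }' "$1"
}
```

Review the output, delete the accounts you don’t want back, and then run the rest with sudo. Passwords don’t come along, so set them afterwards with `sudo passwd joe`.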

I left out things like mythtv, some listserv software, and all sorts of little things that wanted users that I’d had for years. It feels good to be clean.

I moved the files in /etc/apache2/sites-available/ and ../sites-enabled/ over from the backup. I had to recreate some /var/www/ subdirectories that had been lost since I failed to back them up, but apache2 came up with only a few warnings:
[Mon Jan 11 23:25:26 2010] [error] VirtualHost *:80 -- mixing * ports and non-* ports with a NameVirtualHost address is not supported, proceeding with undefined results

I’m not really stressing this just now, as the webfolk seem to think it’s not a problem so long as the sites work, which they basically did.
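That warning usually means a NameVirtualHost line, often in ports.conf, uses a different address form than the `<VirtualHost>` blocks (for example `NameVirtualHost *` next to `<VirtualHost *:80>`), and making them match is the common fix. Here’s a sketch of a quick way to list every form in use; the function name is made up.

```sh
# vhost_addresses DIR: count each NameVirtualHost / <VirtualHost ...> address
# form used anywhere under DIR (e.g. /etc/apache2).
vhost_addresses() {
  grep -rhoiE '^[[:space:]]*(NameVirtualHost[[:space:]]+[^[:space:]]+|<VirtualHost[[:space:]]+[^>]+>)' "$1" \
    | sed 's/^[[:space:]]*//' \
    | sort | uniq -c
}
```

Running `vhost_addresses /etc/apache2` should show at a glance whether a bare `*` and a `*:80` are mixed.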

I moved the gallery2 files from the backup to the www root folder and decided that was all I could get out of the backup. I rebooted for good measure, started browsing the website, and this was the first time I realized what it was that I’d missed. Panic.

I could log on to the sites that were mySQL based (wordpress and gallery2 sites), but in some cases I had password issues. This was easy:

echo -n iamthebest | md5sum
a6a7c0ce5a93f77cf3be0980da5f7da3
sudo mysql -u root -p wordpress
update wp_users set user_pass="a6a7c0ce5a93f77cf3be0980da5f7da3" where id=1;

no, that’s not my password, try it if you must
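Here’s the same reset as a reusable sketch, assuming the default `wp_` table prefix. It uses `printf '%s'` rather than `echo -n`, since not every /bin/sh echo understands `-n`. As far as I know, WordPress accepts a plain MD5 in user_pass and swaps it for its own stronger hash at the next successful login. `wp_reset_sql` is a made-up name.

```sh
# wp_reset_sql PASSWORD USER_ID: print an UPDATE that sets the user's
# password to the MD5 of PASSWORD (no trailing newline is hashed).
wp_reset_sql() {
  hash=$(printf '%s' "$1" | md5sum | cut -d' ' -f1)
  printf "UPDATE wp_users SET user_pass='%s' WHERE ID=%d;\n" "$hash" "$2"
}
```

Usage: `wp_reset_sql 'new password' 1 | mysql -u root -p wordpress`.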

I did realize that I should make myself an admin on all of the virtual sites I host — we’re all friends here, and I have root anyway, but that makes life so much easier.

I still had to fix the WordPress/Gallery2 integration, and other things. For one thing, the wp-config.php file that I’d been carrying around for years is way outdated. The new installation creates a much cleaner one, which is kind of nice. I did have to change one thing, though: my sites were displaying garbled characters, so I disabled the forced UTF-8 charset by commenting out the line:

define('DB_CHARSET', 'utf8');

in the wp-config.php file which results in:
// define('DB_CHARSET', 'utf8');
although my original site (the lone backup) is still running the old config file. I should probably change that.

I umount-ed the ext3 drive, and remounted it to /home (which was empty thus far), and added it to the /etc/fstab file:

/dev/sda1	/home	ext3 nodev,nosuid 0 	2
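/dev/sda1 names a partition by the order in which the kernel detects disks, and that order can change when drives are added or moved around (which is exactly what bit me with the boot order). Naming the filesystem by UUID avoids that. Here’s a sketch with a made-up helper, keeping the same options:

```sh
# fstab_entry UUID MOUNTPOINT FSTYPE OPTIONS: print one /etc/fstab line that
# refers to the filesystem by UUID (dump 0, fsck pass 2, as above).
fstab_entry() {
  if [ -z "$1" ]; then
    echo "fstab_entry: missing UUID" >&2
    return 1
  fi
  printf 'UUID=%s\t%s\t%s\t%s\t0\t2\n' "$1" "$2" "$3" "$4"
}
```

Usage: `fstab_entry "$(sudo blkid -s UUID -o value /dev/sda1)" /home ext3 nodev,nosuid | sudo tee -a /etc/fstab`. The empty-UUID check keeps a failed blkid lookup from appending a broken line.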

It was at this point that I made the first move towards fixing the backup in the future. I created /home/www and /home/backup instead of leaving them where they were. This was mainly for space and performance reasons — the 500GB drive was clearly where I wanted everything, compared to the 20GB virtual drive. As long as I kept a separate 500GB backup drive with at least enough space for the /home files and the VirtualBox virtual drive file, I could back up everything with relative ease. Leaving the /etc tree virtual meant that as long as I had that snapshot file, permissions, users, passwords, configs, and whatnot wouldn’t be a problem next time I had to restore (disaster notwithstanding).

The main sites are up, but ugly, since I lost the themes. The smaller handcrafted sites are just GONE. Fortunately, this was a mock recovery, so I still have the original root disk. It had the same ext2fs problem, so I actually BOOTED off of the old root disk (which I attached over USB in place of the 500GB backup drive) and shut it down cleanly. Then I mounted it with ext2fs and rsync’ed everything to the new virtual box.

sudo rsync -av /var/www/files chip@www.chiplynch.com:/home/wwwrsync/

The only drawback here is that the ownerships and permissions are lost, since I’m logging in as “chip”, and since I’m ext2fs mounted instead of linux mounted (since Windows doesn’t understand the users involved). I need to find a better way to fix this, but I think the main solution would just be to perform the recovery locally, on linux, logged in as a root console or at least a sudo user rather than over an rsync or ssh login.
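If the source can be read from a Linux shell, one workaround is to record numeric owners and modes before copying and re-apply them as root afterwards. This is a sketch with hypothetical helper names, not what I actually did. It relies on GNU find’s `-printf` and assumes ordinary file names (no newlines, no leading or trailing spaces). rsync itself also keeps ownership when the receiving side runs as root, and `--numeric-ids` stops it from remapping ids by name.

```sh
# save_perms DIR MANIFEST: record "uid:gid mode path" for everything under DIR.
save_perms() {
  (cd "$1" && find . -printf '%U:%G %m %p\n') > "$2"
}

# restore_perms DIR MANIFEST: re-apply the recorded owners and modes under DIR.
# Giving files to other users requires running this as root.
restore_perms() {
  (
    cd "$1" || exit 1
    while read -r owner mode path; do
      # Skip entries that did not make it into the copy.
      [ -e "$path" ] || [ -L "$path" ] || continue
      chown -h "$owner" "$path"
      # chmod follows symlinks, so leave links alone.
      [ -L "$path" ] || chmod "$mode" "$path"
    done
  ) < "$2"
}
```

Because the manifest stores relative paths, it can be saved on the old disk and applied to the copy under, say, /home/www.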

So, ugh. We’re all back up and I think much happier for it. I’m crafting a better backup strategy now (remember that I still have the original drive, so if anything goes wrong terribly soon I’m still OK). We’re faster, cleaner, have more breathing room, and are using less physical hardware. The whole ordeal really only took about 8 hours, and a lot of that was research or manually deciding what to keep and what to purge. There were only a couple of real bumps that wasted time.

Back to my day job.

2 Comments on “Virtualizing and Disaster Testing a Linux Server”

  1. Ok, I read on from the point where you told people to stop reading, but I forgot what that makes me.

  2. Sorry Eric… I know, it was long. I’m still finding things I forgot to recover. I hadn’t installed ImageMagick (sudo apt-get install imagemagick), so Gallery wasn’t generating thumbnails, nor enabled mod_rewrite (sudo ln -s /etc/apache2/mods-available/rewrite.load /etc/apache2/mods-enabled/rewrite.load), which meant WordPress couldn’t display comment threads and single articles.

    All better now.
