Tag: virtualization

2009-03-25 21:47 UTC The Home Cloud

My home server is now fully virtualized. When I ssh home I reach "Outland", an OpenVZ container. Outland has limited access to the rest of my home network, but isn't completely isolated. I've taken the pragmatic view that allowing some access further "in" is OK too, though it's a two-step process (ssh in to Outland, and then on to whatever else), and only stuff that's running on Outland is visible to the outside.
While there are some security benefits (Outland doesn't hold anything important, so you have to penetrate both Outland and a server beyond it to get at anything really interesting, like my boring holiday photos), my main motivation was maintenance and flexibility. I've slowly deployed OpenVZ across more and more of my servers at Aardvark Media, and seen just how much we've benefited from being able to move services seamlessly between machines, which makes hardware upgrades and optimal use of our hardware easier.

Whenever I've upgraded my home servers in the past, I've either dragged a lot of cruft with me and had a really painful time recreating my setup, or I've wiped everything and just moved over my data.

With OpenVZ being the only thing that's now allowed to run directly on the host, I can subdivide my "workspace" much more, which decouples upgrades and makes regular partial upgrades much easier. I can now upgrade "service by service". I'm never looking back.

Outland holds various junk I want easily accessible via ssh. It's a "scratchpad" of sorts that I expect to exist semi-permanently, but that I can wipe clean now and again without any big consequences. Another container is used for backups and holds snapshots of the other containers, config files on the host, as well as my blog and various other externally hosted stuff. Other containers hold my development projects, or act as short-lived scratchpads to prevent me from messing up anything I want to keep. Anything I want to keep for the long term is isolated into containers holding databases, my Git repositories (and a legacy SVN repository or two) and raw files.

I opted for Debian Lenny on the host, partly because I now use mostly Debian at work, and partly because Lenny has great support for OpenVZ - installing it is as simple as "apt-get install linux-image-openvz-amd64" (assuming a 64-bit machine) and a reboot.

Creating a new container is a command or two (vzctl create [id] --ostemplate=...; vzctl set [id] --name ... --hostname ... --ipadd ... --save), and scripts to customize the templates are easy enough to write (for the most part a list of packages to install, and a set of config files to overwrite or add things to), so I can rapidly set up customized containers for my Ruby projects, for example.
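The commands above can be wrapped in a small helper. This is only a minimal sketch: the function name ct_create_cmds, the template name and the addresses in the usage line are made up for illustration, and the helper prints the vzctl commands rather than running them, so the plan can be reviewed first.

```shell
# Print (not run) the vzctl commands to create, configure and start a container.
# Usage: ct_create_cmds CTID TEMPLATE NAME HOSTNAME IP [PACKAGE...]
# All argument values are placeholders chosen by the caller.
ct_create_cmds() {
    ctid=$1; template=$2; name=$3; host=$4; ip=$5
    shift 5
    echo "vzctl create $ctid --ostemplate $template"
    echo "vzctl set $ctid --name $name --hostname $host --ipadd $ip --save"
    echo "vzctl start $ctid"
    # Any remaining arguments are treated as packages to install inside
    # the container (assumes a Debian-based template with apt-get).
    if [ $# -gt 0 ]; then
        echo "vzctl exec $ctid apt-get install -y $*"
    fi
}

# Example (hypothetical values); pipe the output to "sh" on the host to run it:
#   ct_create_cmds 110 debian-5.0-amd64-minimal rubydev rubydev.example.lan 192.168.1.110 ruby rake
```

Printing instead of executing keeps the sketch safe to try on a machine without OpenVZ; overwriting config files would be an additional step not shown here.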

I love the level of isolation I can get between different pieces of work, in particular because it makes testing dependencies trivial - I can rebuild the containers from scratch at a moment's notice, check out the code and run my tests, and know that the only thing I've "lost" will have been accumulated cruft, such as packages I don't need or files I'd strewn all over the place.

The thought is that individual containers should be OK to kill at almost any time: the "scratchpads" because they don't hold any important data; the dev containers because they'll contain only clones of Git repositories, and I have scripts to rebuild them from scratch; the database and file containers because their data lives in a couple of well-defined directories that can be trivially copied out and moved into replacement containers (and there are backups).

The end result is my personal "cloud" where server instances are ephemeral and will perhaps in some cases be provisioned and destroyed automatically (see Rebuilding the build server on every build), but their functions are persistent. 

Add sufficient automated tests into the mix, and I can bring up a new container from scratch (with a new OS version, for example), copy data into it, run my tests, then wipe the old one and be done with it with reasonable confidence (though the paranoia in me makes me keep backup snapshots for a while). It also forces me to keep the scripts that effectively define each "service" in my home cloud up to date, and leaves me with the option of trivially farming these images out to hosted servers if I should want to in the future (an option which is more relevant when doing this at work).

Whenever I get new hardware these days, whether at home or at work, the first thing I do is install OpenVZ or another virtualization technology; everything else belongs in a container. OpenVZ uses a shared kernel, so it can only run Linux containers, and only on the same kernel as the host. That is a deal-killer in some cases, but it also makes OpenVZ extremely lightweight where those limitations are OK. Nothing stops you from running OpenVZ and KVM or Xen on the same box to get the best of both on a container-by-container basis.


2008-05-24 15:33 UTC OpenVZ and Apache troubleshooting: PRNG still contains insufficient entropy!

I was setting up Apache on OpenVZ earlier today, and ran into a problem with enabling SSL. Apache would refuse to start, and I'd see this in the error log:

[Sat May 24 07:48:10 2008] [warn] Init: PRNG still contains insufficient entropy!
[Sat May 24 07:48:10 2008] [error] Init: Failed to generate temporary 512 bit RSA private key
Configuration Failed

The solution is quite simple, though not very intuitive. On the host do this (replace "100" with the name or id of your OpenVZ container):

vzctl set 100 --devices c:1:8:rw --save
vzctl exec 100 mknod /dev/random c 1 8
vzctl set 100 --devices c:1:9:rw --save
vzctl exec 100 mknod /dev/urandom c 1 9

Apache's SSL support requires /dev/random and /dev/urandom to seed the PRNG. Note that if only /dev/urandom is missing, Apache may seem to start, but then eat all the CPU. If you attach "strace" to it, you may see it spinning, trying to open /dev/urandom over and over.
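To confirm the device nodes really exist before starting Apache, a check along these lines can be run inside the container. It is a minimal sketch; the helper name missing_char_devices is made up for illustration.

```shell
# Print every path given as an argument that is NOT a character device.
# Prints nothing when all of them are present.
missing_char_devices() {
    for dev in "$@"; do
        [ -c "$dev" ] || echo "$dev"
    done
}

# Example, run inside the container:
#   missing_char_devices /dev/random /dev/urandom
```

Any path it prints still needs the vzctl set / mknod treatment shown earlier.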



2008-04-16 23:29 UTC OpenVz, /proc/user_beancounters and tcpsndbuf

If you're using OpenVz, you owe it to yourself to take a look at /proc/user_beancounters every now and again. Just today at work we were having bizarre problems with one of the containers:

Everything would seemingly work fine. Then the load would spike from 0.something to 150+ in less than a minute. Even weirder, when I tried strace'ing some of the processes that seemed to hog the CPU, things looked normal. Imagine my surprise when I ran top again, and the offending processes suddenly didn't use much CPU anymore - each time I strace'd a process, its CPU usage would drop.

user_beancounters to the rescue:

# more /proc/user_beancounters

Version: 2.5
       uid  resource          held   maxheld   barrier     limit   failcnt
[snipped other values]
            tcpsndbuf       663088   3190632   4000000   4000000     90686

Uh-oh. The failcnt (the last column) was huge. (This output is actually from AFTER the problem was fixed, which is why failcnt is large even though "maxheld", the maximum value reached, is lower than barrier and limit.) failcnt counts how many times a limit has been hit, so in a properly set up and running system it should remain fairly stable, though not necessarily at 0 (the odd massive spike in "something" could cause an error here or there - things can't always be perfect). In this case the failcnt shot up each time the problem occurred. Before increasing the limit, we watched the "held" value hit it once to make sure it was the cause:

vzctl set 204 --save --tcpsndbuf 4000000:4000000

The line above was our final change - we increased it gradually, but each time it kept going up until it got dangerously close to the new limit. "maxheld" shows how high it eventually got.
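Scanning the whole file by eye gets tedious, so here is a rough sketch of a filter that lists only the resources with a non-zero failcnt. The function name failing_beancounters is made up for illustration, and the parsing assumes the 2.5 layout shown above, where the first row for each container carries a "uid:" prefix and failcnt is the last column.

```shell
# List every resource whose failcnt (last column) is above zero.
# Reads /proc/user_beancounters by default (needs root), or a file given as $1.
failing_beancounters() {
    awk '
        # Skip the two header lines.
        $1 == "Version:" || $1 == "uid" { next }
        # The first row for a container starts with "uid:"; remember the id.
        $1 ~ /:$/  { ctid = $1; sub(/:$/, "", ctid); name = $2 }
        $1 !~ /:$/ { name = $1 }
        NF >= 6 && $NF + 0 > 0 { print ctid, name, "failcnt=" $NF }
    ' "${1:-/proc/user_beancounters}"
}

# Example, on the host:
#   failing_beancounters
```

Since failcnt is cumulative, a single non-zero value is not alarming by itself; what matters is whether it keeps growing between runs.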

This problem is a nasty one - the processes did not handle hitting the tcpsndbuf limit very well, and so if one process hit it, the container started spinning out of control. Everything slowly ground to a halt as Apache handled connections slower, causing more simultaneous servers to be returning data, requiring more buffers, causing more of them to lock up as they were unable to complete, causing memory usage to spin out of control too, to the point where I couldn't even get into the container.

Luckily for us recovery is one of the areas where OpenVz really shines, though. Because the processes of the containers are visible in the host as normal processes, I could kill enough of the runaway processes from the host to get back in, restart Apache and clear things up, and then proceed to analyze and fix the problem without even having to consider restarting the whole container. Setting limits judiciously to make sure the host will always have enough resources is of course key to this.

Another benefit of this is that you can do certain health monitoring from the "outside" that would be impossible to do reliably with a monitoring process running inside the container itself - from the host I could still look at what limits were being hit etc. at the same time as the container was so loaded and messed up that Nagios etc. on the container itself couldn't run (nor send data out over TCP - it would hang because of the problem we were facing in the first place).

In this specific case the container was a "legacy" container that holds a lot of different stuff for different clients that used to run directly on a single host. Virtualizing it let us migrate the sites to a new host with no downtime as well as keep in-sync copies on other hosts for near immediate takeover on failure while we work to migrate those clients to an even more failure tolerant cluster.

Through some brutal routing tricks (excessive abuse of arp pings and temporary address rewrites with iptables on our firewall to ensure clients got to the new host immediately) we actually migrated these services off the physical host and into a container with no site downtime.

The fact there's so much stuff on it made not having to do a reboot quite essential. Even then, "rebooting" OpenVz containers is blindingly fast, since the container doesn't need to go through the typical boot motions of checking hardware, checking filesystems etc. that a physical box or even Xen usually would.

But that's a digression. The moral of this story is that if you see "weird stuff" happening to your OpenVZ containers, /proc/user_beancounters should be one of the first places, if not THE first place, you look.



2008-04-14 01:23 UTC Rebuilding the build server on every build

At Edgeio we had a very complex software system. It was NOT a typical small LAMP setup or little Rails setup. The web app was the simple piece. The complex pieces were things like processing 3 million+ blog feeds every day, and building a search engine and getting it to scale. As a result, our app consisted of many dozens of RPMs of Edgeio-written code alone.

We used RPM to build packages of almost everything, and that worked well.

But there was one thing that was exceedingly painful, and that was managing build dependencies. We got extremely dependent on having certain build machines that had, through accretion, everything needed to build everything we had on them. We got those dependencies through being careless. A large subset of RPMs had detailed build requirements specified in the RPM spec files, but rarely were the RPMs tested on pristine machines or machines with massively different configurations in order to verify that we had not missed any out. A lot of undocumented dependencies slowly crept in.

We didn't automate all our builds, and that's certainly one of the things we should have done. But in retrospect I've come to realize that automating the builds would have helped us far less than regularly rebuilding the build servers, which would have caught missing build dependencies on packages that all the servers we used had happened to get installed at one point or another.

The hardest part about building the Edgeio packages from scratch would not be random breakage in specific packages - most packages were rebuilt regularly, and so any incidental breakage would be limited, and the packages that weren't were obsolete.

The hardest part would be to ensure the build environment, which included specific versions of RPMs from RHEL4 combined with literally dozens of third party RPMs, was correct, and then to build and install, in the right order, at least another dozen or so Edgeio RPMs that wouldn't break the build if they weren't at the very latest version but had to be there in order to build the rest.

Automating builds does only so much good if you don't keep meticulous track of what it takes to get a successful build, and that extends far beyond being able to type "make" and watch your tree successfully build - you need to know what dependencies you have on external packages too, including packages provided by your distribution.

The third-party dependencies were by far the worst, especially as the system aged and finding updates that would work with everything else we depended on got harder without massive updates across the board. A lot of tweaking went into keeping those build machines "just right".

The great thing is that solving this is something that's gotten massively easier over the last couple of years:

Enter virtualization

For my next big system I've set myself a very clear goal:

The build server should be automatically rebuilt on every full build.

That means:

  • Create an OpenVZ template containing a well-defined set of packages, and document every step (whether it's OpenVZ, Xen, VMware or who knows what is incidental)
  • On each full rebuild, I will make the system create a new container from scratch (wiping out the old one, or maybe keeping a rotating set of copies), and start it.
  • Then the build system will enter the container, check out the trunk and run a checked in script to add any additional dependencies.
  • It will create a dump of the system as it is, so that any manual builds can be done on today's pristine build server image.
  • Then it will proceed with any automated builds.
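
The steps above can be sketched as a short script. Everything named here is an assumption made for illustration: the helper names run and rebuild_build_server, the DRY_RUN and VZ_PRIVATE variables, the /var/lib/vz/private location of container filesystems, and the use of a plain tar archive as the "dump".

```shell
# Run a command, or only print it when DRY_RUN is set (to review the plan).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# rebuild_build_server CTID TEMPLATE SETUP_CMD
# SETUP_CMD is whatever checks out the trunk and runs the checked-in
# dependency script inside the container (supplied by the caller).
rebuild_build_server() {
    ctid=$1
    template=$2
    setup_cmd=$3
    private=${VZ_PRIVATE:-/var/lib/vz/private}   # assumed default location

    run vzctl stop "$ctid" || true       # the old container may not exist yet
    run vzctl destroy "$ctid" || true
    run vzctl create "$ctid" --ostemplate "$template" || return 1
    run vzctl start "$ctid" || return 1
    run vzctl exec "$ctid" "$setup_cmd" || return 1
    # Dump the pristine image so manual builds can start from the same state.
    run tar --numeric-owner -czf "buildserver-$(date +%Y%m%d).tar.gz" \
        -C "$private/$ctid" . || return 1
}

# Example (hypothetical values):
#   DRY_RUN=1
#   rebuild_build_server 300 debian-5.0-amd64-minimal "sh /build/trunk/setup-build-deps.sh"
```

The automated builds themselves would follow as further vzctl exec calls once the function returns successfully.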

The benefit is simple:

You know 100% how to re-create your build environment from scratch: exactly which packages, exactly which configuration, and exactly in which order. You can easily re-test builds with different package subsets or an upgraded distro, just by swapping out the template you use.

You may think you have that with daily builds, but do you? How often do you do a rebuild on a newly installed machine? Do you know exactly what versions of which libraries must be installed for your builds to succeed? What if someone runs an update on your build machine?

Incidentally, one of the things I've learned from working with OpenVz is how simple a setup like the above can be made, and the complexity of the dependencies we had at Edgeio really drove home the value of being able to rebuild the build server from scratch far more than the need for automatic full rebuilds of our in-house software. The incremental cost is minimal - I can "build" an OpenVz server manually from a template in minutes including all configuration, and automatically in seconds on a suitable server - and the benefit is a significant extra peace of mind.



2008-04-07 14:17 UTC Joys of virtualization

There is something oddly pleasurable in a zen kind of way to be able to log into an OpenVz container, and see a process list like this:

root@ldap:~# pstree  
init-+-apache2---apache2
     |-cron
     |-slapd---3*[{slapd}]
     |-syslogd
     `-vzctl---bash---pstree

This is everything that's running in our new OpenLDAP container at work, including Apache for running a web based user interface, which I could have just as well put in a separate container, but since this is unreachable from outside our firewall I thought that would be unnecessary paranoia. I don't need sshd, since I can always enter from the host node, and the only reason vzctl shows up is because I'm doing just that at the moment. The nice thing about it is that I can tar up the entire root fs of the container and replicate it wherever I want, or for that matter migrate it while it's running to somewhere else. And of course more isolation is always nice.

I prefer OpenVZ to Xen mainly because it's easier to work with - the root filesystems for the containers are just ordinary directories on the host etc. Of course it comes at the cost of some reduced flexibility, such as not being able to run a different kernel in a container than on the host. But Xen and OpenVZ can co-exist on the same machine, so you can mix and match as desired.



About me

E-mail: vidar@hokstad.com Skype: vhokstad
Twitter: vhokstad

I was born April 21st, 1975, in Oslo, Norway. Since 2000 I've been living in London, UK. I'm married and we just had our first child, Tristan Ikemefuna Hokstad.

I'm working for Aardvark Media as Director of Technology. I'm also currently on the board of SpatialQ, a startup in the GIS space, and an advisor to Skoach, a startup doing a time management app for people with ADD.
