Vidar Hokstad V2.0


Tag: openvz

2008-04-16 19:29 UTC OpenVz, /proc/user_beancounters and tcpsndbuf

If you're using OpenVz, you owe it to yourself to take a look at /proc/user_beancounters every now and again. Just today at work we were having bizarre problems with one of the containers:

Everything would seemingly work fine. Then the load would spike from 0.something to 150+ in less than a minute. Even weirder, when I tried strace'ing some of the processes that seemed to hog the CPU, things looked normal. Imagine my surprise when I ran top again and the offending processes suddenly didn't use much CPU anymore: each time I strace'd a process, its CPU usage would drop.

user_beancounters to the rescue:

# more /proc/user_beancounters

Version: 2.5
       uid  resource           held    maxheld    barrier      limit    failcnt
            [snipped other values]
            tcpsndbuf        663088    3190632    4000000    4000000      90686

Uh-oh. The failcnt was huge. (This output is actually from AFTER the problem was fixed. That's why failcnt is large even though "maxheld", the highest value reached, is below both barrier and limit.) failcnt counts how many times an allocation was refused because the limit was reached. In a properly configured system it should therefore remain fairly stable, though not necessarily at 0: the odd massive spike in "something" could cause an error here or there, and things can't always be perfect. In this case failcnt shot up each time the problem occurred. We watched the "held" value hit the limit once, to make sure it was the cause, before increasing it:
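A quick check is to print only the rows whose failcnt is non-zero. Below is a minimal POSIX sh/awk sketch. The function name `ubc_failing` is hypothetical. It assumes the usual layout, where the first row for each container is prefixed with its ID (e.g. `204:`) and the following rows are not. Reading /proc/user_beancounters typically needs root. The optional argument lets the function read a saved copy instead.

```shell
#!/bin/sh
# Print every beancounter row whose failcnt is non-zero.
# Usage: ubc_failing [FILE]   (FILE defaults to /proc/user_beancounters)
ubc_failing() {
  awk '
    # Skip the version banner, the column header and blank/short lines.
    /^Version:/ || $1 == "uid" || NF < 6 { next }
    {
      # The first row of each container starts with "CTID:". Remember the
      # ID and shift the remaining column indexes by one.
      if ($1 ~ /:$/) { ctid = substr($1, 1, length($1) - 1); off = 1 }
      else           { off = 0 }
      if ($(6 + off) + 0 > 0)
        printf "%s %s held=%s maxheld=%s barrier=%s limit=%s failcnt=%s\n",
               ctid, $(1 + off), $(2 + off), $(3 + off), $(4 + off),
               $(5 + off), $(6 + off)
    }
  ' "${1:-/proc/user_beancounters}"
}
```

Since failcnt is cumulative, a non-zero value only tells you that a limit was hit at some point. Run the check repeatedly and see whether the number keeps climbing.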

vzctl set 204 --save --tcpsndbuf 4000000:4000000

The line above was our final change. We increased the limit gradually, but each time usage kept climbing until it got dangerously close to the new limit. "maxheld" shows how high it eventually got.
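When raising a limit step by step like this, the useful number is how close maxheld came to the barrier. Here is a hedged sketch that lists resources whose peak usage reached a given percentage of their barrier. The function name `ubc_near_barrier` and its 90% default are hypothetical. It assumes the usual layout, where each container's first row starts with its ID followed by a colon.

```shell
#!/bin/sh
# Flag resources whose peak usage (maxheld) reached PCT percent of the barrier.
# Usage: ubc_near_barrier [PCT] [FILE]   (defaults: 90, /proc/user_beancounters)
ubc_near_barrier() {
  awk -v pct="${1:-90}" '
    /^Version:/ || $1 == "uid" || NF < 6 { next }
    {
      # The first row of each container starts with "CTID:".
      if ($1 ~ /:$/) { ctid = substr($1, 1, length($1) - 1); off = 1 }
      else           { off = 0 }
      maxheld = $(3 + off) + 0
      barrier = $(4 + off) + 0
      # Skip zero barriers to avoid dividing by zero.
      if (barrier > 0 && maxheld * 100 >= barrier * pct)
        printf "%s %s maxheld=%s barrier=%s (%.0f%%)\n",
               ctid, $(1 + off), $(3 + off), $(4 + off), maxheld * 100 / barrier
    }
  ' "${2:-/proc/user_beancounters}"
}
```

For example, the tcpsndbuf row shown earlier (maxheld 3190632 against a barrier of 4000000, about 80%) would be flagged by `ubc_near_barrier 75` but not by the default threshold of 90.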

This problem is a nasty one. The processes did not handle hitting the tcpsndbuf limit very well, so once one process hit it, the whole container started spinning out of control. Apache handled connections more slowly, so more Apache processes were returning data at the same time. That required more send buffers, which left even more processes stuck and unable to complete, which in turn drove memory usage out of control too. Everything slowly ground to a halt, to the point where I couldn't even get into the container.

Luckily for us, recovery is one of the areas where OpenVz really shines. Because the processes in a container are visible on the host as normal processes, I could kill enough of the runaway processes from the host to get back in, restart Apache and clear things up. I could then analyze and fix the problem without even having to consider restarting the whole container. Setting limits judiciously, so that the host always has enough resources left for itself, is of course key to this.

Another benefit is that you can do some health monitoring from the "outside" that would be impossible to do reliably with a monitoring process running inside the container itself. From the host I could still see which limits were being hit, even while the container was so overloaded that Nagios and friends inside it couldn't run (or send data out over TCP: that would hang because of the very problem we were facing in the first place).
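As a sketch of that kind of outside-in monitoring, here is a hypothetical Nagios-style check meant to run as root on the host node rather than inside the container. It uses the standard plugin exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN). It saves a snapshot of /proc/user_beancounters in a state file of your choosing and goes CRITICAL if any failcnt has grown since the previous run. The function name `check_ubc` and the state-file handling are assumptions. They are not part of OpenVZ or Nagios.

```shell
#!/bin/sh
# Hypothetical Nagios-style check, run on the host node as root.
# Usage: check_ubc STATEFILE [FILE]   (FILE defaults to /proc/user_beancounters)
# Exit codes follow the Nagios plugin convention: 0 OK, 2 CRITICAL, 3 UNKNOWN.
check_ubc() {
  state=$1
  current=${2:-/proc/user_beancounters}
  [ -n "$state" ] || { echo "UBC UNKNOWN - usage: check_ubc STATEFILE [FILE]"; return 3; }
  # Take one snapshot so the comparison and the saved baseline match exactly.
  snap=$(mktemp) || { echo "UBC UNKNOWN - mktemp failed"; return 3; }
  if ! cat "$current" > "$snap" 2>/dev/null; then
    rm -f "$snap"; echo "UBC UNKNOWN - cannot read $current"; return 3
  fi
  # First run: just record a baseline.
  if [ ! -f "$state" ]; then
    mv "$snap" "$state"; echo "UBC OK - baseline saved"; return 0
  fi
  grown=$(awk '
    function parse(   off) {
      # The first row of each container starts with "CTID:".
      if ($1 ~ /:$/) { ctid = substr($1, 1, length($1) - 1); off = 1 } else off = 0
      key = ctid "/" $(1 + off)
      fc = $(6 + off) + 0
    }
    /^Version:/ || $1 == "uid" || NF < 6 { next }
    FILENAME == ARGV[1] { parse(); old[key] = fc; next }
    { parse(); if (fc > old[key]) printf "%s(+%d) ", key, fc - old[key] }
  ' "$state" "$snap")
  mv "$snap" "$state"
  if [ -n "$grown" ]; then
    echo "UBC CRITICAL - failcnt increased: $grown"; return 2
  fi
  echo "UBC OK - no new failures"; return 0
}
```

The check compares against the previous run instead of alerting on any non-zero failcnt. That matches the observation that failcnt should stay stable, though not necessarily at 0.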

In this specific case the container was a "legacy" container holding a lot of different services for different clients, which used to run directly on a single physical host. Virtualizing it let us migrate the sites to a new host with no downtime and keep in-sync copies on other hosts for near-immediate takeover on failure, while we work to migrate those clients to an even more failure-tolerant cluster.

Through some brutal routing tricks (excessive abuse of ARP pings, and temporary address rewrites with iptables on our firewall to make sure clients reached the new host immediately), we actually migrated these services off the physical host and into a container with no site downtime.

Because so much runs on it, avoiding a reboot was essential. Even so, "rebooting" an OpenVz container is blindingly fast. The container doesn't go through the usual boot motions of checking hardware, checking filesystems etc. that a physical box, or even a Xen guest, usually would.

But that's a digression. The moral of this story is that if you see "weird stuff" happening to your OpenVz containers, /proc/user_beancounters should be one of the first places you look, if not THE first.


2008-04-13 21:23 UTC Rebuilding the build server on every build

At Edgeio we had a very complex software system. It was NOT a typical small LAMP or Rails setup. The web app was the simple piece. The complex pieces were things like processing 3 million+ blog feeds every day, and building a search engine and getting it to scale. As a result, our app consisted of many dozens of RPMs of Edgeio-written code alone.

We used RPM to build packages of almost everything, and that worked well.

But one thing was exceedingly painful, and that was managing build dependencies. We became extremely dependent on certain build machines that had, through accretion, everything needed to build all of our packages. We got those dependencies through carelessness. A large subset of our RPMs had detailed build requirements in their spec files. But the RPMs were rarely tested on pristine machines, or on machines with very different configurations, to verify that we hadn't missed any. A lot of undocumented dependencies slowly crept in.

We didn't automate all our builds, and that's certainly something we should have done. But in retrospect I've come to realize that automating the builds would have helped us far less than regularly rebuilding the build servers. Rebuilding would have caught undeclared build dependencies on packages that had been installed on all our servers at one point or another.

The hardest part about building the Edgeio packages from scratch would not be random breakage in specific packages - most packages were rebuilt regularly, and so any incidental breakage would be limited, and the packages that weren't were obsolete.

The hardest part would be making sure the build environment was correct: specific versions of RPMs from RHEL4 combined with literally dozens of third-party RPMs. The next step would be to build and install, in the correct order, at least another dozen or so Edgeio RPMs. Those wouldn't break the build if they weren't at the very latest version, but they had to be present in order to build the rest.
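The ordering part of that problem is a topological sort. Here is a minimal sketch that leans on the standard POSIX `tsort` utility. It assumes you keep a hand-maintained list of "dependency package" pairs for the in-house packages. The function name and the package names in the example are hypothetical.

```shell
#!/bin/sh
# Print a build order for interdependent in-house packages.
# Input (FILE or stdin): one "DEPENDENCY PACKAGE" pair per line, meaning
# DEPENDENCY must be built and installed before PACKAGE is built.
# tsort(1), a standard POSIX utility, performs the topological sort.
build_order() {
  if [ $# -gt 0 ]; then
    tsort "$1"
  else
    tsort
  fi
}
```

For example, `printf 'libcommon crawler\nlibcommon indexer\ncrawler indexer\n' | build_order` prints libcommon, crawler and indexer, one per line. A pair that repeats the same name (e.g. `tools tools`) lists a package with no ordering constraints. If the pairs contain a cycle, GNU tsort reports the loop on standard error and exits with a non-zero status, and that is worth treating as a build failure.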

Automating builds only does so much good if you don't keep meticulous track of what it takes to get a successful build. That extends far beyond being able to type "make" and watch your tree build successfully. You also need to know your dependencies on external packages, including packages provided by your distribution.

The third-party dependencies were by far the worst, especially as the system aged and finding updates that would work with everything else we depended on got harder without massive updates across the board. A lot of tweaking went into keeping those build machines "just right".

The great thing is that solving this is something that's gotten massively easier over the last couple of years:

Enter virtualization

For my next big system I've set myself a very clear goal:

The build server should be automatically rebuilt on every full build.

That means:

  • Create an OpenVZ template containing a well-defined set of packages, and document every step (whether it's OpenVZ, Xen, VMware or something else is incidental)
  • On each full rebuild, I will make the system create a new container from scratch (wiping out the old one, or maybe keeping a rotating set of copies), and start it.
  • Then the build system will enter the container, check out the trunk and run a checked in script to add any additional dependencies.
  • It will create a dump of the system as it is, so that any manual builds can be done on today's pristine build server image.
  • Then it will proceed with any automated builds.
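The steps above could be sketched as a small host-side shell function. Everything specific in it is an assumption: the container ID, the template name, the Subversion URL, the `install-deps.sh` and `build-all.sh` scripts, the dump directory, and the default OpenVZ private area under /vz/private. It also assumes the container's networking comes from the template or its config, so that the checkout works. Setting DRY_RUN prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: rebuild a throwaway OpenVZ build container on every full build.
# Every name here (container ID, template, repository URL, script paths,
# dump directory) is a hypothetical placeholder; override via environment.
CTID=${CTID:-300}
TEMPLATE=${TEMPLATE:-centos-5-x86}
REPO_URL=${REPO_URL:-svn://svn.example.com/project/trunk}
DUMP_DIR=${DUMP_DIR:-/var/backups/build-images}

# With DRY_RUN set, print each command instead of executing it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

rebuild_build_ct() {
  # 1. Wipe the previous container; ignore errors if there is none yet.
  run vzctl stop "$CTID" || true
  run vzctl destroy "$CTID" || true
  # 2. Create a fresh container from the documented template and start it.
  run vzctl create "$CTID" --ostemplate "$TEMPLATE" || return 1
  run vzctl start "$CTID" || return 1
  # 3. Check out trunk; a checked-in script adds any extra dependencies.
  run vzctl exec "$CTID" "svn checkout $REPO_URL /build && sh /build/install-deps.sh" || return 1
  # 4. Dump the pristine image while stopped, so the copy is consistent.
  #    Assumes the default private area /vz/private/CTID.
  run vzctl stop "$CTID" || return 1
  run mkdir -p "$DUMP_DIR" || return 1
  run tar -C /vz/private --numeric-owner -czf "$DUMP_DIR/build-$CTID-$(date +%Y%m%d).tar.gz" "$CTID" || return 1
  run vzctl start "$CTID" || return 1
  # 5. Run the automated builds inside the fresh container.
  run vzctl exec "$CTID" "sh /build/build-all.sh"
}
```

Because `vzctl destroy` removes the old container's private area entirely, the dated tarballs are what give you the "rotating set of copies" if you want to look back at older environments.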

The benefit is simple:

You know 100% how to re-create your build environment from scratch. Exactly which packages, and exactly which configuration, and exactly which order. You can easily re-test builds with different package subsets or an upgraded distro, just by swapping out the template you use.

You may think you have that with daily builds, but do you? How often do you do a rebuild on a newly installed machine? Do you know exactly what versions of which libraries must be installed for your builds to succeed? What if someone runs an update on your build machine?

Incidentally, one of the things I've learned from working with OpenVz is how simple a setup like the above can be made, and the complexity of the dependencies we had at Edgeio really drove home the value of being able to rebuild the build server from scratch far more than the need for automatic full rebuilds of our in-house software. The incremental cost is minimal - I can "build" an OpenVz server manually from a template in minutes including all configuration, and automatically in seconds on a suitable server - and the benefit is a significant extra peace of mind.


2008-04-07 10:17 UTC Joys of virtualization

There is something oddly pleasurable, in a zen kind of way, about being able to log into an OpenVz container and see a process list like this:

root@ldap:~# pstree  
init-+-apache2---apache2
     |-cron
     |-slapd---3*[{slapd}]
     |-syslogd
     `-vzctl---bash---pstree

This is everything that's running in our new OpenLDAP container at work, including Apache for a web-based user interface. I could just as well have put Apache in a separate container. But since this is unreachable from outside our firewall, I thought that would be unnecessary paranoia. I don't need sshd, since I can always enter from the host node. The only reason vzctl shows up is that I'm doing just that at the moment. The nice thing is that I can tar up the entire root fs of the container and replicate it wherever I want, or for that matter migrate it to somewhere else while it's running. And of course more isolation is always nice.
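Replicating a container by hand amounts to copying its private area together with its config file. Here is a minimal sketch under several assumptions: the default locations /vz/private/<CTID> and /etc/vz/conf/<CTID>.conf, a stopped container, and a hypothetical `VZ_ROOT` prefix that exists only to make the function testable. The function name `archive_ct` is also hypothetical.

```shell
#!/bin/sh
# Archive an OpenVZ container (private area + config) for copying elsewhere.
# Assumes default paths /vz/private/CTID and /etc/vz/conf/CTID.conf and that
# the container is stopped. VZ_ROOT is a hypothetical prefix (empty = "/").
# Usage: archive_ct CTID OUTFILE
archive_ct() {
  ctid=$1 out=$2 root=${VZ_ROOT:-}
  # Store paths relative to / so the archive can be unpacked with "tar -C /".
  # --numeric-owner keeps the container's UIDs/GIDs instead of mapping names
  # through the host's /etc/passwd.
  tar -C "$root/" --numeric-owner -czf "$out" \
      "vz/private/$ctid" "etc/vz/conf/$ctid.conf"
}
```

On the target host, `tar -C / --numeric-owner -xzf ct101.tar.gz` puts both parts back in place. To move a running container instead, OpenVZ ships `vzmigrate`, e.g. `vzmigrate --online otherhost 101`.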

I prefer OpenVz to Xen mainly because it's easier to work with - the root filesystems for the containers are just ordinary directories on the host etc. Of course it comes at the cost of some reduced flexibility, such as the ability to run different kernels in the containers or the host. But Xen and OpenVz can co-exist on the same machine, so you can mix and match as desired.



About me

E-mail: vidar@hokstad.com
Skype: vhokstad

I was born April 21st, 1975, in Oslo, Norway. Since 2000 I've been living in London, UK. I'm married.

I'm working for Aardvark Media as Director of Technology. I'm also currently on the board of SpatialQ, a startup in the GIS space, and an advisor to Skoach, a startup doing a time management app for people with ADD.
