At Edgeio we had a very complex software system. It was NOT a typical small LAMP or Rails setup. The web app was the simple piece. The complex pieces were things like processing 3 million+ blog feeds every day, and building a search engine and getting it to scale. As a result, our app consisted of many dozens of RPMs of Edgeio-written code alone.
We used RPM to build packages of almost everything, and that worked well.
But there was one thing that was exceedingly painful, and that was managing build dependencies. We became extremely dependent on certain build machines that had, through accretion, ended up with everything needed to build everything we had. We got those dependencies through being careless. A large subset of RPMs had detailed build requirements specified in the RPM spec files, but the RPMs were rarely tested on pristine machines, or on machines with massively different configurations, to verify that we had not missed any. A lot of undocumented dependencies slowly crept in.
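One way to make that kind of creep visible is to compare what a build machine has installed against what the clean base install plus the spec file would account for. The following is a minimal sketch, not something we had at Edgeio: the function names and the sample package names are made up for illustration, and the parser only handles plain `BuildRequires:` lines (no macros or conditionals).

```python
import re

# Version comparison operators that may follow a package name in a spec file.
_OPS = {"<", "<=", "=", ">=", ">"}


def declared_build_requires(spec_text):
    """Return the set of package names listed on BuildRequires lines."""
    names = set()
    for line in spec_text.splitlines():
        m = re.match(r"\s*BuildRequires\s*:\s*(.*)$", line, re.IGNORECASE)
        if not m:
            continue
        tokens = m.group(1).replace(",", " ").split()
        skip_version = False
        for tok in tokens:
            if skip_version:
                # This token is the version that followed an operator.
                skip_version = False
                continue
            if tok in _OPS:
                skip_version = True
                continue
            names.add(tok)
    return names


def unaccounted_packages(installed, baseline, declared):
    """Packages on the build machine explained by neither the clean base
    install nor the spec file. These are candidates for undocumented
    build dependencies, not proof of them."""
    return sorted(set(installed) - set(baseline) - set(declared))


# Hypothetical spec fragment and package lists.
spec = "Name: feedproc\nBuildRequires: gcc, make >= 3.80\nBuildRequires: openssl-devel\n"
declared = declared_build_requires(spec)
suspects = unaccounted_packages(
    ["bash", "gcc", "make", "openssl-devel", "libxml2-devel"],
    ["bash"],
    declared,
)
print(suspects)
```

Anything such a check reports still needs a human to decide whether it is a real build dependency or just clutter, but at least the question gets asked.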
We didn't automate all our builds, and we certainly should have. In retrospect, though, I've come to realize that automating the builds would have helped us far less than regularly rebuilding the build servers, which would have caught missing build dependencies on packages that had, at one point or another, been installed on every server we used.
The hardest part about building the Edgeio packages from scratch would not be random breakage in specific packages - most packages were rebuilt regularly, and so any incidental breakage would be limited, and the packages that weren't were obsolete.
The hardest part would be to ensure the build environment, which combined specific versions of RPMs from RHEL4 with literally dozens of third-party RPMs, was correct, and then to build and install, in the correct order, at least another dozen or so Edgeio RPMs that didn't need to be at the very latest version but had to be present in order to build the rest.
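The ordering part of that problem is a topological sort: if you write down which in-house packages must be built and installed before which others, the order falls out mechanically. Here is a small sketch using Python's standard `graphlib` module (Python 3.9+). The package names are invented for the example and are not the real Edgeio package names.

```python
from graphlib import CycleError, TopologicalSorter


def build_order(deps):
    """deps maps each package to the set of packages that must be built
    and installed before it. Returns one valid build order, and raises
    graphlib.CycleError if the dependencies are circular."""
    return list(TopologicalSorter(deps).static_order())


# Hypothetical in-house packages and their build-time prerequisites.
deps = {
    "edgeio-common": set(),
    "edgeio-index": {"edgeio-common"},
    "edgeio-search": {"edgeio-common", "edgeio-index"},
}

try:
    order = build_order(deps)
    print(order)
except CycleError as err:
    print("circular build dependency:", err.args[1])
```

The hard part in practice is not the sort but getting the `deps` table right, which is exactly the information that tends to live only in people's heads.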
Automating builds does only so much good if you don't keep meticulous track of what it takes to get a successful build, and that extends far beyond being able to type "make" and watch your tree successfully build - you need to know what dependencies you have on external packages too, including packages provided by your distribution.
The third-party dependencies were by far the worst, especially as the system aged and finding updates that would work with everything else we depended on got harder without massive updates across the board. A lot of tweaking went into keeping those build machines "just right".
The great thing is that solving this has gotten massively easier over the last couple of years.
For my next big system I've set myself a very clear goal:
The build server should be automatically rebuilt on every full build.
Create an OpenVZ template containing a well-defined set of packages, and document every step (whether it's OpenVZ, Xen, VMware or who knows what is incidental).
On each full rebuild, I will make the system create a new container from scratch (wiping out the old one, or maybe keeping a rotating set of copies), and start it.
Then the build system will enter the container, check out the trunk and run a checked in script to add any additional dependencies.
It will create a dump of the system as it is, so that any manual builds can be done on today's pristine build server image.
Then it will proceed with any automated builds.
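The steps above can be sketched as a plain list of commands, which also doubles as the documentation of how the build server is made. This is only an outline under a number of assumptions: it uses OpenVZ's `vzctl` and `vzdump` tools, it assumes Subversion for the checkout, and the container ID, template name, repository URL, paths and script names are all placeholders I made up. Check the flags, and whether your `vzdump` wants the container stopped first, against the versions you actually run.

```python
import subprocess


def rebuild_plan(ctid, template, repo_url, dump_dir):
    """Return (argv, must_succeed) pairs for one full rebuild.

    All paths and script names inside the container are hypothetical."""
    ct = str(ctid)
    return [
        # Wipe the old build container; these may fail on the very first run.
        (["vzctl", "stop", ct], False),
        (["vzctl", "destroy", ct], False),
        # Fresh container from the documented template.
        (["vzctl", "create", ct, "--ostemplate", template], True),
        (["vzctl", "start", ct], True),
        # Check out trunk and run the checked-in dependency script.
        (["vzctl", "exec", ct, f"svn checkout {repo_url} /build/trunk"], True),
        (["vzctl", "exec", ct, "sh /build/trunk/build/install-deps.sh"], True),
        # Dump today's pristine image for manual builds.
        (["vzdump", "--dumpdir", dump_dir, ct], True),
        # Finally, the automated builds themselves.
        (["vzctl", "exec", ct, "sh /build/trunk/build/build-all.sh"], True),
    ]


def run_plan(plan, dry_run=True):
    """Print the plan, or execute it when dry_run is False."""
    for argv, must_succeed in plan:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=must_succeed)


plan = rebuild_plan(101, "centos-4-i386-minimal", "svn://svn.example.com/trunk", "/var/dumps")
run_plan(plan)  # dry run: only prints the commands
```

Keeping the plan as data rather than burying it in a shell script makes it easy to swap the template or insert an extra step when testing a different distro.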
The benefit is simple:
You know 100% how to re-create your build environment from scratch: exactly which packages, exactly which configuration, and in exactly which order. You can easily re-test builds with different package subsets or an upgraded distro, just by swapping out the template you use.
You may think you have that with daily builds, but do you? How often do you do a rebuild on a newly installed machine? Do you know exactly what versions of which libraries must be installed for your builds to succeed? What if someone runs an update on your build machine?
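A cheap way to at least notice the "someone ran an update" case is to save the package list after each build and compare it with the previous one. The sketch below assumes each manifest is simply the list of lines printed by `rpm -qa`; the function name and the sample package versions are invented for illustration.

```python
def manifest_drift(previous, current):
    """Compare two package manifests (for example the lines of `rpm -qa`
    output saved after two builds). An upgraded package shows up as one
    entry removed and one entry added."""
    prev, cur = set(previous), set(current)
    return {"added": sorted(cur - prev), "removed": sorted(prev - cur)}


# Hypothetical manifests from yesterday's and today's build machine.
yesterday = ["gcc-3.4.6-3", "openssl-devel-0.9.7a-43.1", "make-3.80-5"]
today = ["gcc-3.4.6-3", "openssl-devel-0.9.7a-43.2", "make-3.80-5", "libxml2-devel-2.6.16-6"]

drift = manifest_drift(yesterday, today)
print(drift["added"])
print(drift["removed"])
```

With a build server that is recreated from a template on every full build, the same comparison tells you exactly what changed between two templates instead of what somebody did by hand.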
Incidentally, one of the things I've learned from working with OpenVZ is how simple a setup like the above can be made, and the complexity of the dependencies we had at Edgeio really drove home the value of being able to rebuild the build server from scratch, far more than the need for automatic full rebuilds of our in-house software. The incremental cost is minimal - I can "build" an OpenVZ server manually from a template in minutes including all configuration, and automatically in seconds on a suitable server - and the benefit is significant extra peace of mind.