No, not this kind.

puppetchef.jpg

I’ve become interested in the new wave of configuration management tools like Puppet, Chef, Ansible, SaltStack… The list seems to go on and on.

My problem wasn’t just that I didn’t know how they work or even exactly what they are. My problem was that I could not even conceive of why you’d want such a thing. I’m getting the feeling that this is inconceivable to people hiring modern "devops" talent. How can it be that I know so much about so much but can’t get my head around what these tools are really good for? I think it would be a bit like telling Brian Kernighan how great STL map is and what a shame it is that C doesn’t have such a thing. Kernighan would send you to page 143 of K&R where in two and a half pages he completely implements such a thing in vanilla C. This doesn’t mean that either of those approaches is "correct" or even that one is better. It depends on who you are and what you’re doing. I think the most important thing for me to convey with this analogy is that if someone asked Brian Kernighan to troubleshoot some C++ STL, he probably would be extremely good at it. If he chose to be.

I’m obviously not so godlike as Kernighan but when it comes to the stuff that these configuration management systems claim to be able to do, I think to myself, hmm, but I can already do that. I wonder, why wouldn’t you just use Bash, SSH, Rsync, etc.? Of course I’m open to learning not just stuff I don’t know but stuff I never knew I didn’t know. For this reason I’ve kept an eye on configuration management resources. Recently an excellent opportunity came along.

I had the chance to hear a talk by Christopher Webber, Engineering Lead at Chef. I then asked him some follow-up questions by email, which he very graciously answered with thoughtful replies. Each question below is followed by his answer, and I’ve added my own comments as notes.

Question - Did I correctly interpret your musing about replacing LDAP as just that? Not using Chef to configure LDAP (oh god what a nightmare), but to configure /etc/passwd with your common set of users? I have to admit, I instantly saw the wisdom in that and no amount of complexity in the replacement system would be too discouraging in this particular case.

That is exactly what you heard. The biggest reason being that we needed a way to front the change with a form that a PI could add users. With LDAP this was great, except when it came time to create the homedir on the nfs box running zfs. By moving it to being data bag driven you could basically write a web service that had permission only to muck with those data bags and just update the data bag and then use push jobs to cause the cluster to do a chef run.

Note - I just did the exact process he describes for creating a user account this morning. Replacing LDAP is actually compelling enough. The configuration of the file server is a bonus.
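
To make the data-driven idea concrete, here is a minimal sketch of turning a shared list of user records into /etc/passwd entries. It is not Chef code and not how Chef's own user handling works; the USERS list merely stands in for data bag items, and the names, UIDs, and the passwd_line and merge_passwd helpers are all hypothetical. The point is only that the merge is idempotent: running it twice gives the same file, which is what lets a tool re-apply it on every run.

```python
# Hypothetical user records, standing in for items in a Chef data bag.
USERS = [
    {"name": "alice", "uid": 2001, "gid": 2001,
     "gecos": "Alice Example", "shell": "/bin/bash"},
    {"name": "bob", "uid": 2002, "gid": 2002,
     "gecos": "Bob Example", "shell": "/bin/zsh"},
]


def passwd_line(user, home_root="/home"):
    """Format one record as an /etc/passwd line ('x' means a shadowed password)."""
    home = f"{home_root}/{user['name']}"
    fields = [user["name"], "x", str(user["uid"]), str(user["gid"]),
              user["gecos"], home, user["shell"]]
    return ":".join(fields)


def merge_passwd(existing_lines, users):
    """Add or update managed users; leave every other account untouched."""
    managed = {u["name"]: passwd_line(u) for u in users}
    merged, seen = [], set()
    for line in existing_lines:
        name = line.split(":", 1)[0]
        if name in managed:
            merged.append(managed[name])  # replace a stale entry
            seen.add(name)
        else:
            merged.append(line)           # system account, keep as-is
    # Append managed users that were not present yet.
    merged.extend(managed[u["name"]] for u in users if u["name"] not in seen)
    return merged


if __name__ == "__main__":
    current = ["root:x:0:0:root:/root:/bin/bash"]
    for line in merge_passwd(current, USERS):
        print(line)
```

A web form that only edits the user list, plus something that re-runs the merge on every node, is roughly the shape of the workflow described in the answer above.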

Question - Next is a philosophical question. Is there an infinite regress inherent in software that configures configurations? I.e., what configures the thing that configures that? Can you use Chef to configure some Chef servers? Another way to state the problem is, "you have a configuration management problem; you will configure Chef to cure it; now you have two configuration management problems." (By the way, I’m also suspicious of programs that go through your logs and produce a condensed log file. And virtual machines running Java, i.e. another VM.)

I think this is inherently how we solve problems in our space. Everything is an abstraction layer, built upon an abstraction layer. At some point, we just have to excuse the fact that we are moving up the stack and there are parts of the stack that matter less. For example, early systems programmers spent time organizing where on disk a given block lived for performance reasons… could you imagine caring about block placement on a multi TB disk?

Note - This is a fair answer though I think there is a better one. For instance, why must we run software through a web browser when we have a perfectly good computer that should be able to run software natively? This isn’t about abstraction layers. This is about the computing landscape left in the wake of a certain predatory OS vendor. At some point it’s just easier to run things in a web browser or even a JVM than fight for sane cross platform run environment standards. In other words, there may not be a good reason to do this, but there may be compelling bad reasons. In the case of Chef, the bad reason to go one turtle down may simply be that it’s a chance to have a clean and sane turtle.

Question - When I think of configuration management, I think of dpkg and rpm and their gateway to infinite regress, apt and yum. Why not just create custom rpms that do what needs to be done? I’m not saying this is fun or easy or that I’m especially good at it, but as a way to understand what extra utility something like Chef provides it might be helpful to distinguish these strategies. If I was a huge operation like Facebook, I’d consider this architecture since it adds no more moving parts and is as secure and operable as anything already going on. What does Chef offer over custom packages in custom repos?

There is a huge discussion here that isn’t easily had in email. The TL;DR is that it has been tried and some places have had success but most have not. The real reason for me is the same reason I moved to Chef… I want a programming language where I can treat my node as an object and mutate state through that language.

Note - This is a very helpful answer which I am summarizing in my mind as "Whatever you’re most comfortable with will probably be reasonable." It’s very helpful to me to know that my thinking isn’t completely off track though. I understand that Chef is probably more manageable to get elaborate with than rpms or debs or ebuilds, etc. And if you need to support diverse platforms Chef would be extremely helpful.

Question - I also think about rc-config (in Gentoo) when I think about configuration management. That tool isn’t delightful, but it does a superb job of highlighting the complete quagmire of special cases and annoying configuration idiosyncrasies of one’s setup. Is Chef really worthwhile when all of your set up is comprised of special weird things? Or put another way, rc-config is actually doing a good job even though it seems like a huge pain because no matter how you solve the problem, it’s going to be a huge pain.

It is even more useful when you have a collection of snowflakes. The reason is that now, instead of jumping on the box and going WTF, how does this even work? You can instead list the run-list, go look at the source code and go, oh, that is what is supposed to be installed on this box and this is how it is supposed to be configured. The other thing is that it helps you DRY (Don’t Repeat Yourself) up things that are shared, like ssh config, ntp config, etc.

Note - This affirms my belief that the more systems you’re worrying about, the better Chef is. I guess the centralizing tendency of Chef is not really any different conceptually from my centralized notes about all my machines. A good thing. And massively preferable to no policy documentation anywhere about anything. Ya, that’s crazy.

Question - Another thing I think about when I think of configuration management is the distro’s overly helpful contribution. Take Apache as an example. Where do you "configure" that? Well, off hand I can think of four potential strategies that are not simply the canonical /etc/apache/httpd.conf: 1. the baroque but heavy duty config file layout of Gentoo, 2. the baroque config file layout of Ubuntu, 3. the baroque config file layout of Red Hat, and 4. the reasonable but not as simple as you’d hoped layout of Apache installed from source. They’re all different. With Chef, do I have to detangle that mess and figure out what my distro is including where and make a master comprehensive httpd.conf? Oh, and then figure out a way to turn off all the distro included stuff? If I was keen to do that, I’d definitely have done it already because it sounds really great.

So this is a long running battle. I can point to example after example where the distros make this a mess. We basically see two major approaches… the apache2 cookbook approach where you just decide on a way and go with it, to hell with distro conventions. The opposite approach is the one taken by the httpd cookbook which uses the distro way of doing things and then abstracts it away so that you just interface with resources in chef.

In the end, yes, you the operator have to deal with writing the abstractions because, yeah, computers.

Note - Honest answer. An annoying problem for everyone. I imagine Chef has some community resources where some other people have done the detangling. That would be appealing.
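
The second approach, following each distro's layout and hiding it behind one interface, can be sketched in a few lines. This is an illustration only, not how either cookbook is implemented. The paths are typical defaults as best I recall them and should be checked against the actual systems; the ApacheLayout class, the LAYOUTS table, and layout_for are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApacheLayout:
    main_config: str  # path of the top-level config file
    daemon: str       # name the distro gives the server binary/service


# Assumed typical defaults; verify on the real machines before relying on them.
LAYOUTS = {
    "gentoo": ApacheLayout("/etc/apache2/httpd.conf", "apache2"),
    "ubuntu": ApacheLayout("/etc/apache2/apache2.conf", "apache2"),
    "redhat": ApacheLayout("/etc/httpd/conf/httpd.conf", "httpd"),
    "source": ApacheLayout("/usr/local/apache2/conf/httpd.conf", "httpd"),
}


def layout_for(platform):
    """Return the layout for a platform, failing loudly on unknown ones."""
    try:
        return LAYOUTS[platform]
    except KeyError:
        raise ValueError(f"no Apache layout known for {platform!r}") from None


if __name__ == "__main__":
    for platform in LAYOUTS:
        print(platform, layout_for(platform).main_config)
```

Everything that needs to touch Apache asks layout_for() instead of hard-coding a path, so the distro differences live in one table. Somebody still has to fill in and maintain that table, which is the "you the operator have to deal with it" part.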

Question - Ironically, the best use case I could come up with for something like Chef is in my home, not at work. That /etc/passwd stuff is indeed quite clever (though god help you if you change your password on a Mac without the GUI and the magic keychain "feature" shuts you down; I can just imagine the laughs we’d all have dropping in a new /etc/passwd or OSX equivalent). My problem at home is that I have lots of different distros coming and going. I’m pretty good at getting my desired environment up quickly but I’ve got an end of life Mint on one of my computers, Debian on another, Xubuntu on my son’s, OSX on my wife’s, unpatched ancient Gentoo on the ancient video playing computer, Ubuntu on my laptop, etc. I’m having a hard time imagining something besides my extensive expertise magically sorting out that mess. Should I be more optimistic?

First off, I think the user provider on OS X handles the magic you speak of. And the answer is that it is your expertise that sorts it out and uses chef to document and execute it. Chef doesn’t make it so you don’t have to do configuration management, it instead gives you a framework by which you can capture your expertise.

Note - Probably the most clear and lucid description of what Chef does that other tools do not do. Of course I keep extensive consistent and centralized notes (which, to my horror, not everyone does) but Chef makes that official like a Makefile makes dependency policy official.

Question - At work I manage a cluster and I use PXE. I can reboot a machine (remotely) and have it reinstall a new CentOS (automatically) just as I’ve configured it. Once the machines are up I have my own Bash scripts that will run commands on all nodes or copy files to all of them (or GNU Parallel). What does Chef offer over that kind of approach?

So in the past, what I have done is that I used PXE to get the box up and joined to the CM system (it was puppet at the time but chef is no different) and then get the config on the box through that. From there, what I needed to do would dictate the commands I used. If I am making a config change, I would put that in Chef. If I was just executing a command for the cluster I would use knife ssh. The cool part is that with knife ssh you can pass it a search parameter as well. So let’s say I assigned the role of highmem to a set of boxes that were used for a queue dedicated to memory intensive jobs, I could run knife ssh 'role:highmem' <command> and it would run across all those nodes. IIRC it executes in parallel by default.

Note - I’m going to assume this works great, just like my scripts. If you didn’t already invent this wheel on your own, Chef is probably a good place to start.
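
For comparison, here is roughly what the home-made version of "run a command on every node with a given role" looks like. This is a sketch of the idea, not of how knife ssh works internally. The NODES inventory, the node names, and the function names are hypothetical, and ssh_run assumes key-based SSH access to each host.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical inventory: node name -> set of roles.
NODES = {
    "node01": {"compute"},
    "node02": {"compute", "highmem"},
    "node03": {"compute", "highmem"},
}


def nodes_with_role(role, inventory=NODES):
    """Return the sorted names of all nodes carrying the given role."""
    return sorted(name for name, roles in inventory.items() if role in roles)


def ssh_run(host, command):
    """Run a command on one host over ssh; return (exit code, stdout)."""
    done = subprocess.run(["ssh", host, command],
                          capture_output=True, text=True)
    return done.returncode, done.stdout


def run_on_role(role, command, runner=ssh_run, max_workers=8):
    """Run the command on every node with the role, several hosts at a time."""
    hosts = nodes_with_role(role)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda host: runner(host, command), hosts)
        return dict(zip(hosts, results))


if __name__ == "__main__":
    # A fake runner so the example works without any real hosts.
    fake = lambda host, command: (0, f"{host} ran {command}")
    print(run_on_role("highmem", "uptime", runner=fake))
```

The runner is passed in so the selection and fan-out logic can be tried without touching a real machine; swapping in ssh_run gives the real thing.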

Question - On this cluster I’ve been thinking of moving some particular science data, say 10G, to each node so that there are always local copies of it available. Currently there is a NFS server and contention can arise. I’ve been thinking of ways to politely, but quickly, get that data out to the nodes and keep it synchronized. A sequential Rsync would work but would be slow. A parallel Rsync would kill the file server. I’ve been thinking about an overly-clever binary cell division approach where each node that gets the data spreads it to other nodes. Then we’re just worrying about the fabric of the switch which I don’t care about. Can Chef help with this kind of problem given my concern for the file server’s overuse?

I would probably use something like serf or consul to register nodes and handle how many systems could pull down the file at once. This is a great example of where chef probably isn’t a great fit because of the nature of the problem.

Note - Fair answer and great tips.
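
The cell division idea from the question is easy to plan on paper. The sketch below only computes who copies to whom in each round; it does no copying and has nothing to do with serf, consul, or Chef. The function name, the reuse_seed flag, and the host names are hypothetical. With reuse_seed left off, the file server is read exactly once and the nodes do the rest.

```python
def doubling_schedule(seed, targets, reuse_seed=False):
    """Plan rounds of (source, destination) copies.

    In each round every host that already holds the data feeds one host
    that does not, so the number of holders roughly doubles per round.
    Unless reuse_seed is set, the seed (the file server) serves only the
    first copy and is then left alone.
    """
    holders = [seed]
    waiting = list(targets)
    rounds = []
    while waiting:
        batch = waiting[:len(holders)]
        waiting = waiting[len(holders):]
        rounds.append(list(zip(holders, batch)))
        holders.extend(batch)
        if not reuse_seed and seed in holders:
            holders.remove(seed)  # spare the file server after round one
    return rounds


if __name__ == "__main__":
    nodes = [f"n{i}" for i in range(1, 8)]
    for number, copies in enumerate(doubling_schedule("nfs", nodes), start=1):
        print(number, copies)
```

Each round's pairs could then be handed to parallel Rsync runs, one per pair, waiting for the round to finish before starting the next. That last part is an assumption about how one might use the plan, not something tested here.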

Thank you very much, Christopher Webber!