Your resources are cattle

For a long time now, IT resources have been treated like pets: named, raised individually, and mourned when they break. As we move to the cloud, this behaviour has been carried over to contexts where it no longer makes sense. We will discuss this behaviour and analyse its impacts.

Origins

The analogy comparing servers to cattle originated in a slide presentation by Bill Baker, a Distinguished Engineer at Microsoft. He compared a scale-up approach to treating servers as pets: once sick, they are nursed back to health. A scale-out approach, in turn, is analogous to treating them like cattle: if they get sick, you get rid of them and replace them.

This analysis was originally directed at physical hardware running SQL Server 2012. Even though much has already been said about the analogy in a cloud context, I still see it as a relevant discussion that could give developers some more peace of mind.

Implications

So the dichotomy is clear; what may not be so clear are the implications of each approach. Baker seemed to believe there are more advantages to the cattle approach, and I humbly agree.

The pets approach

Some people are put off by the idea of simply deleting and replacing their infrastructure as it fails, because they think doing so takes more work than just taking good care of what they already have. Their environment may have become too complex, with too much software, customised settings and variables, and do not get me started on all the integrations between services that have already been configured. In fact, it is exactly that complexity that makes the whole server harder to maintain. Think about all the services whose status you would need to check, should a failure occur, just to begin fixing the issue.

Even then, once the issue is found, fixing it will probably be more of a hassle than it seems at first, since there is quite a lot of overhead. Competing software, conflicting versions and licence requirements, to name just a few, can turn what was supposed to be a small fix into a review of the whole system.

There absolutely are use cases where the pet approach makes a lot of sense, databases being the main one, since the data does need to be stored somewhere. Even then, if you can manage to separate data from software logic, the result will probably be an infrastructure that is easier to maintain.

The cattle approach

On the other hand, when we deal with cattle, we do not need to be concerned about the state of one specific specimen. When one of your servers is presenting issues, just terminate it and spin up a new one with all the right software, settings and environment variables guaranteed, in just one click.

It seems like a very straightforward choice to me.
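The replace-rather-than-repair idea can be sketched in a few lines of Python. This is only a toy model: `Server`, `provision` and `replace_unhealthy` are hypothetical names made up for the illustration, and `provision` stands in for whatever actually creates a server from a known-good template in your environment.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, List

_serial = count(1)  # servers get serial numbers, not pet names


@dataclass
class Server:
    """A disposable server (hypothetical model, not a real API)."""
    name: str
    healthy: bool = True


def provision() -> Server:
    """Stand-in for creating a fresh server from a known-good template."""
    return Server(name=f"web-{next(_serial)}")


def replace_unhealthy(pool: List[Server],
                      make: Callable[[], Server] = provision) -> List[Server]:
    """Return a pool of the same size with every sick server swapped out."""
    return [server if server.healthy else make() for server in pool]


pool = [provision() for _ in range(3)]   # web-1, web-2, web-3
pool[1].healthy = False                  # simulate a failure on web-2
pool = replace_unhealthy(pool)
print([server.name for server in pool])  # ['web-1', 'web-4', 'web-3']
```

Note that nothing in the sketch tries to find out what was wrong with `web-2`; the faulty server is simply dropped and a new one takes its place.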

Letting go of your pets

Suppose I did manage to persuade the reader that the cattle approach is worth a try: where does one even begin setting it up? Fear not, it is easier than it seems. All one needs is to understand the underlying infrastructure one wants to work on. I have prepared some overall guidelines for applications running in virtual machines, Docker and Kubernetes.

Virtual machines on premises

If the application you would like to migrate is currently deployed on virtual machines, infra as code (IaC) is the way to go. Examples of tools for that task that can be used on premises are Ansible and Puppet. With these, all the details needed to properly run the application are crystallised as text, meaning these configurations can be managed with Git.

Most infra as code tools are said to be declarative: to use them, one declares the desired state of the server and leaves it to the IaC software to figure out how to get there. This means that once you have worked out the state your application needs and written it as code, running the automation is enough to obtain a new server with the exact same configuration, assuming you have a base VM with a clean operating system available.
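To make the declarative idea more concrete, here is a minimal Python sketch. It is not how Ansible or Puppet work internally; the `plan` function and the package names and versions are assumptions made up for the illustration. The desired state is plain data, and the steps needed to reach it are derived by comparing it with whatever the server currently has.

```python
from typing import Dict, List


def plan(desired: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Derive the steps that take `current` to `desired`.

    Both arguments map a package name to a version (hypothetical model).
    """
    steps = []
    for name, version in sorted(desired.items()):
        if name not in current:
            steps.append(f"install {name}={version}")
        elif current[name] != version:
            steps.append(f"upgrade {name} {current[name]} -> {version}")
    # Anything present on the server but absent from the declaration goes away.
    for name in sorted(current.keys() - desired.keys()):
        steps.append(f"remove {name}")
    return steps


desired = {"nginx": "1.24", "python3": "3.11"}   # the declared state
fresh_vm = {}                                     # clean base image
drifted_vm = {"nginx": "1.18", "telnet": "0.17"}  # hand-tweaked over time

print(plan(desired, fresh_vm))
# ['install nginx=1.24', 'install python3=3.11']
print(plan(desired, drifted_vm))
# ['upgrade nginx 1.18 -> 1.24', 'install python3=3.11', 'remove telnet']
print(plan(desired, desired))
# []
```

The same declaration works for a fresh VM and for a drifted one, and running it against a server that is already in the desired state yields no steps at all.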

Cloud virtual machines

The above-mentioned tools require that a virtual server has already been provided on which to run the automation. There are others, however, that can connect to cloud providers such as Azure, GCP or AWS and automatically provision the underlying server when one is needed. Terraform and Chef are examples of such tools. Some cloud providers also offer infra as code tools of their own, with the advantage that they are designed specifically for their own environment, although one could say that is also their primary weakness.

For AWS, CloudFormation is worth considering. It allows the infrastructure to be modelled either as code or with a drag-and-drop designer, with the extra benefit that you do not need to worry about ordering, since the software itself is able to work out the precedence among the desired components. For Microsoft Azure, Resource Manager is the CloudFormation counterpart.
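Working out precedence among components amounts to ordering a dependency graph. The Python sketch below shows the general technique with the standard library's `graphlib` (Python 3.9+). It is not CloudFormation's actual implementation, and the resource names and `creation_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each resource maps to the set of resources it depends on.
# The names are made up for the illustration.
depends_on: Dict[str, Set[str]] = {
    "instance": {"subnet", "security_group"},
    "subnet": {"vpc"},
    "security_group": {"vpc"},
    "vpc": set(),
}


def creation_order(graph: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every resource comes after its dependencies."""
    return list(TopologicalSorter(graph).static_order())


order = creation_order(depends_on)
print(order)  # "vpc" comes first and "instance" last
```

The declaration lists `instance` first, yet the computed order still starts with `vpc`. The relative order of `subnet` and `security_group` is not fixed, since neither depends on the other.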

If no extra network or storage is needed, an even easier way to have your servers ready to go is to set their User Data on AWS. This is basically a shell script run as root as soon as the VM has been provisioned, allowing updates, settings and variables to be set easily.
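As a rough illustration, such a script can be generated from a small description of what the server needs. The Python sketch below is only that, a sketch: `render_user_data` is a hypothetical helper, and it assumes a Debian or Ubuntu image (hence `apt-get`) and that writing to `/etc/environment` is an acceptable way to set variables on it.

```python
import shlex
from typing import Dict, List


def render_user_data(packages: List[str], env: Dict[str, str]) -> str:
    """Render a shell script suitable for use as instance user data.

    Assumes a Debian/Ubuntu image; adapt the package manager as needed.
    """
    lines = ["#!/bin/bash", "set -euo pipefail"]
    if packages:
        quoted = " ".join(shlex.quote(package) for package in packages)
        lines.append("apt-get update -y")
        lines.append(f"apt-get install -y {quoted}")
    for key, value in sorted(env.items()):
        assignment = shlex.quote(f"{key}={value}")  # guard against odd characters
        lines.append(f"echo {assignment} >> /etc/environment")
    return "\n".join(lines) + "\n"


script = render_user_data(["nginx"], {"APP_ENV": "production"})
print(script)
```

The resulting text is what you would supply as the instance's user data at launch. Keeping the generator in Git means the server's setup is versioned along with the application.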

Caveats

Being free from running around the clock to fix a prod server that went berserk does wonders for one’s peace of mind. It does not mean, however, that one should not investigate the causes of the problem. It could be a random error that is unlikely to happen again, or it could be a serious architectural issue demanding a proper fix.

Summary

  • Manually going through all the state of each individual server in order to find what caused an issue can be a hassle;
  • Getting rid of the faulty server and replacing it with a new, correctly configured one can save time and turn out to be less stressful;
  • Appropriate tooling for that purpose is available in different contexts for various scenarios.

I do hope this post was able to provoke some thought about the subject. How is your workflow? Are your resources being treated like pets? Is that the best way to do things for your use case? Let me know in the comments! 😀


Cover photo by Helena Lopes

Breno Beraldo

Just your average engineer-coder-gamer. Trying to catch up on my always growing backlog of personal projects and books to read. Firmly believes Python — not French — is the most romantic language.

