Pets and cattle of infrastructure

Pets and cattle of infrastructure

The metaphor is quite common in cloud for servers.

Gist:
We name pets and take care of them, nurse them. We ensure pets are in healthy state always and attend immediately if otherwise.

We number cattle. When one gets ill, we shoot them. We ensure new cattle spawns up, by autoscaling groups, etc.

Taking metaphor to do automations and ensuring we deal with situations gracefully, reduce the noise.

Let's say we're tagging our resources with "type:pet" and "type:cattle" and we've our automations/auto-healing mechanisms where we add scripts or runbooks, select a tag, it's applied to every instance with this tag.

With time, we'd want to have all of the resources as cattle and limit our usage of pets.

For example:

If disk space is above 90%, auto modify disk attached to the instance and add 15% more disk space, if tag is pet. If tag is cattle, kill the instance and rely on autoscaling group to launch a new one.