DevOps Horror Stories: Worst Decision Ever
Cloudsmith Horror Blog III - Another frivolous story about making a poor choice and the aftermath of us trying to fix it.
A modern tech stack is filled with choices. As the tech market has exploded over the last ten years, solutions have appeared for nearly every conceivable problem. Software as a Service is now commonplace among companies large and small.
With so many options to choose from, it's easy to make bad choices. And if you do choose poorly, that choice could haunt you for years, decades, or even centuries (for all you immortals out there).
It is this simple truth that created such a nightmare for one particular company in the tale we are about to tell. But just to warn you: this story might give you a fright if you are easily haunted by poor tech choices, so proceed at your own risk...
Our story begins in early 2016. The first Deadpool was taking over the cinema and remote working wasn't the default.
This particular company - let’s call them Cloudsmith - was beginning to think about their deployment strategy and had some choices to make. They’d already selected AWS, so it was between Puppet and Chef for configuration management. At the time, the two tools were comparable and there wasn’t much to distinguish between them. What did they choose, you ask? Chef for… well, fairly arbitrary reasons.
As they leaned into the cloud (via an early AWS OpsWorks), it became obvious they needed to move to a better Infrastructure as Code solution. Already in the Chef ecosystem, they opted for Chef Provisioning (can you see where this is going?) over HashiCorp’s Terraform.
Terraform got better and better, driven by a great, responsive open source community providing timely releases that supported all the new bells and whistles arriving on AWS.
Chef Provisioning started to rot. The magic unraveled to reveal the monsters within.
Slow support for the latest and greatest began to affect the company's ability to keep up with the Joneses. Nodes started to fail, and a perfect storm of deprecated technology meant they could not simply bring them back up. It was both mind-boggling and very worrying.
Eventually, they mapped out a plan to take a month out of their engineering roadmap to move to Terraform, now the clear market leader. A month. That’s a month of work that is invisible to their customers. While you could argue that uptime and stability are visible, it’s pretty easy to argue that they’re not. They’re table stakes. Or should be. No matter what, it’s no new products, no new features, for an entire month.
A nightmare for any business, this became a haunting reality for the company.
But what else could they do? They took the hit and rebuilt their infrastructure stack using Terraform - replacing the months of work previously dedicated to Chef Provisioning (which has since gone to its grave).
Proceed with caution
Fast forward to the year 2020 and we know it was the right choice at the right time. But take it from us: proceed with caution when making similar decisions for your organization.
Choices like these can have ramifications for years, and they are sometimes not easy to unpick. (In fact, we have another one that is less impactful to the platform but equally annoying. For that story, you’ll have to come back next year!)
We are acutely aware that our customers make the choice every day to store and distribute their assets using the Cloudsmith platform.
We believe in transparency. Trust. And an adaptive, well-thought-out, technology roadmap.
No matter what tool you’re evaluating, remember: choose wisely.