Emojis are finally useful for more than sentiment or sass. Cloudsmith Engineering can now be raised in emergency situations via a single emoji - SOS.
And it’s been used for the first, and so far only, time.
At Cloudsmith, we pride ourselves on our reactive (and in many cases proactive) support. We architected and built the platform with monitoring in mind, allowing us to provide transparency of internal operations to both you and us. We use numerous tools to help us provide the best package logistics experience to our customers. All our customers, all the time. Statuspage, Datadog, Intercom, and Slack are all critical components of the Cloudsmith Experience, just as Cloudsmith is a critical component in your software lifecycle.
Up until a few months ago, we ran support on a fairly ad-hoc basis. We have a passionate team in Engineering willing to stay up late and get up early for the exigencies of the service.
Cloudsmith is a 24/7 operation with customers pulling packages and containers every second of every day. Uptime is incredibly important. System behavior is incredibly important.
However, it was time to formalize the support function. We evaluated a number of products but ended up going with Opsgenie (sidenote: some of the competitors missed out on a sale because of non-responsive support… all we wanted was a trial extension). We implemented a sane rota with primary and secondary levels and put it into force.
This had the nice side-effect that productivity in Engineering went up. Now, the team not on rota could relax from the burden of support and focus on building product. And those active on the rota could relax, talk to customers and fix bugs, or achieve small tasks in-between times. It was a win-win for everyone.
But we wanted more.
Now, we had the power and ability to raise and contact Engineering in off hours if we needed to. What if a customer, in distress, could do the same?
We added emoji support for all Ultra level customers via Slack. Our customers can now directly raise a Cloudsmith engineer if they feel the service is underpowered or having critical issues.
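As a rough illustration of the idea (not our actual integration), here is a minimal sketch of how a chat message could be checked for the SOS emoji before paging anyone. The plan name `"ultra"`, the function names, and the `page_engineer` callback are all hypothetical; the sketch also assumes the emoji arrives either as the `:sos:` shortcode or as the Unicode character U+1F198.

```python
from typing import Callable

# Assumption: the emoji shows up as a ":sos:" shortcode or as Unicode U+1F198.
SOS_SHORTCODE = ":sos:"
SOS_UNICODE = "\U0001F198"


def contains_sos(text: str) -> bool:
    """Return True if the message text contains the SOS emoji in either form."""
    return SOS_SHORTCODE in text.lower() or SOS_UNICODE in text


def handle_message(text: str, customer_plan: str,
                   page_engineer: Callable[[str], None]) -> bool:
    """Page the on-call engineer only for Ultra customers who send the SOS emoji.

    `customer_plan` and `page_engineer` are hypothetical stand-ins for a plan
    lookup and an alerting call; they are not real Cloudsmith or Opsgenie APIs.
    """
    if customer_plan == "ultra" and contains_sos(text):
        page_engineer(text)
        return True
    return False
```

Passing the pager in as a callback keeps the sketch independent of any particular alerting tool; in practice it would wrap whatever creates the on-call alert.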
It’s worth noting that with great power comes great responsibility. So we run it like the challenge system in tennis: you get three challenges. If there is a genuine issue, your challenge counter stays the same; if it’s a false alarm, you get docked one. At least, we assume that’s the way it will work. I guess we’ll find out!
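The challenge rule is simple enough to sketch in a few lines. This is only an illustration of the rule as described above; the class and method names are made up for the example, and what happens once a customer reaches zero challenges is an assumption, since we have not had to decide that yet.

```python
from dataclasses import dataclass


@dataclass
class ChallengeCounter:
    """Tennis-style challenge counter (hypothetical names, illustrative only)."""

    remaining: int = 3  # each customer starts with three challenges

    def can_raise(self) -> bool:
        # Assumption: a customer with no challenges left cannot raise an SOS.
        return self.remaining > 0

    def resolve(self, genuine_issue: bool) -> int:
        """Record the outcome of an SOS and return the challenges remaining.

        A genuine issue leaves the counter untouched; a false alarm costs one.
        """
        if not genuine_issue and self.remaining > 0:
            self.remaining -= 1
        return self.remaining
```

For example, a customer who raises one genuine issue and then one false alarm would go from three challenges to three, and then to two.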
To date, our service uptime is 99.99%, and the last two incidents were only degraded service. Both were highlighted early by our monitoring and immediately managed by the team, who feverishly rerouted traffic across regions to prevent the vast majority of our customers, and our customers’ customers, from noticing anything was subpar.
Only once has the SOS emoji been used in anger: a Docker behavioral issue that was blocking pipeline builds. Our dutiful customer raised the alarm and we rolled back a change from the previous day, resolving the issue, from bed to deployment, within about 30 minutes.
We thanked them.
They still have three challenges remaining.