Recently, I asked my subscribers what topics are interesting to them and a few people mentioned observability.
That’s funny, ‘coz yesterday I accidentally bumped into a great series of articles on setting SLAs for your products by Alex Ewerlöf!
- Calculating composite SLA - truly outstanding read!
- Some practical advice when setting SLA - notice, it says SLA, not SLO. So, there are some business-related tips in this article as well. However, the core is technical, ofc.
- Calculating the SLA of a system behind a CDN - I haven’t read this one yet. But given the quality of the previous two, I expect this one to be great as well!
tl;dr for the first article in the list:
For serial dependencies, multiply the availabilities; for parallel dependencies, multiply the unavailabilities.
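Here is a minimal Python sketch of that rule. It assumes component failures are independent of each other, and the function names `serial` and `parallel` are mine, not taken from the article:

```python
from math import prod

def serial(*availabilities: float) -> float:
    """All components must be up: multiply the availabilities."""
    return prod(availabilities)

def parallel(*availabilities: float) -> float:
    """At least one replica must be up: multiply the unavailabilities."""
    return 1 - prod(1 - a for a in availabilities)

# Two 99% services chained one after another vs. two 99% replicas side by side.
print(f"{serial(0.99, 0.99):.4%}")    # 98.0100%
print(f"{parallel(0.99, 0.99):.4%}")  # 99.9900%
```

So chaining dependencies lowers the total, while redundancy raises it.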
I would personally also add that when you try to set a “full” SLO(A) for your service, that is also a composite SLO(A). You should treat it as a serial one. For example, if you have a 99.8% error rate SLO and a 99.1% latency SLO, the “overall” SLO would be 0.998 × 0.991 × 100% ≈ 98.9%.
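The same arithmetic as a quick Python check. I'm reading the “error rate SLO” as “99.8% of requests succeed”, and multiplying the two targets assumes that errors and slow requests are independent, which is a simplification; the variable names are just for illustration:

```python
error_rate_slo = 0.998  # assumed meaning: 99.8% of requests succeed
latency_slo = 0.991     # 99.1% of requests finish within the latency threshold

# Treat the two SLOs as a serial chain: a request is "good" only if it meets both.
overall_slo = error_rate_slo * latency_slo
print(f"{overall_slo:.1%}")  # 98.9%
```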
That’s not only good to know, but you may also want to write your marketing materials differently. There is a difference between:
> We guarantee a 99.8% SLO on the 5xx error rate and a 99.1% SLO on requests not taking longer than X milliseconds.
And
> We guarantee 98.9% availability of our systems.
I’m not a marketing person, though. I don’t know what’s better. What I do know is that “nines don’t matter if your users are unhappy”.
#observability #slo #sla
Medium
Calculating composite SLA
How serial and parallel dependencies affect the total SLA
An “Awesome SLOs” list.
Books, articles, videos, and more.
Also, it’s open source, so feel free to contribute!
#slo #observability
GitHub
GitHub - stevexuereb/awesome-slo: Curated list of resources on SLOs