Cloud Computing: SLI + SLO + $ = SLA

The Service Level Agreements (SLAs) that customers and vendors sign usually promise uptime to multiple nines. Beyond SLAs there are two other factors: the SLO (Service Level Objective) and the SLI (Service Level Indicator). Quality of Service (QoS) also factors into this equation. QoS relates to the performance of a service. In cloud computing, a major quality factor is latency: how fast services can be performed and delivered.
“Cloud customers want strong, understandable promises (Service Level Objectives, or SLOs) that their applications will run reliably and with adequate performance, but cloud providers don’t want to offer them, because they are technically hard to meet in the face of arbitrary customer behavior and the hidden interactions brought about by statistical multiplexing of shared resources,” wrote Jeffrey Mogul and John Wilkes from Google in an ACM paper.
Why are SLAs hard to specify? Mogul and Wilkes write that “creating an SLA seems simple: define one or more SLOs as predicates on clearly-defined measurements (Service Level Indicators, or SLIs), then have the business experts and lawyers agree on the consequences, and you have an SLA. Sadly, in our experience, SLOs are insanely hard to specify. Customers want different things, and they typically cannot describe what they want in terms that can be measured and in ways that a provider can feasibly commit to promising.”
In most cases there is an objective (SLO), such as keeping the service up, say, 99.9% of the time. The measured uptime is the indicator (SLI). The SLA is the contract that states what the consequences are if the SLO isn’t achieved.
Cloud computing is hard. Customers want to be reassured that their data is safe and will be available at any time.
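The relationship described above (an SLI is a measurement, an SLO is a predicate on it, and an SLA attaches a consequence) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `availability_sli`, `Slo`, `Sla`, and `credit_fraction` are hypothetical, the one-probe-per-minute measurement is assumed, and the 10% credit is an invented figure, not a term from any real contract.

```python
from dataclasses import dataclass
from typing import Sequence


def availability_sli(probe_results: Sequence[bool]) -> float:
    """SLI: fraction of health probes that succeeded, in [0, 1]."""
    if not probe_results:
        raise ValueError("need at least one probe result")
    return sum(probe_results) / len(probe_results)


@dataclass(frozen=True)
class Slo:
    """SLO: a predicate on an SLI, here 'availability >= target'."""
    target: float

    def is_met(self, sli: float) -> bool:
        return sli >= self.target


@dataclass(frozen=True)
class Sla:
    """SLA: an SLO plus the agreed consequence of missing it."""
    slo: Slo
    credit_fraction: float  # hypothetical: share of the bill refunded on a miss

    def credit_owed(self, sli: float, monthly_bill: float) -> float:
        if self.slo.is_met(sli):
            return 0.0
        return monthly_bill * self.credit_fraction


# Assumed measurement: one probe per minute over a 30-day month,
# with 60 of those minutes failing.
minutes_in_month = 30 * 24 * 60
probes = [True] * (minutes_in_month - 60) + [False] * 60

sli = availability_sli(probes)
sla = Sla(slo=Slo(target=0.999), credit_fraction=0.10)
credit = sla.credit_owed(sli, monthly_bill=1000.0)
```

A 99.9% target over a 30-day month leaves a downtime budget of 43.2 minutes, so the 60 failed minutes in this example miss the objective and trigger the credit. The hard part Mogul and Wilkes describe is not this arithmetic but agreeing on what the probes should measure in the first place.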
Mogul and Wilkes conclude that “perhaps the most important lesson we can learn from statistics, however, is humility: that the combination of unpredictable workloads, hard-to-model behavior of complex shared infrastructures, and the infeasibility of collecting all the necessary metrics means that certain kinds of SLOs are beyond our power to deliver, no matter how much we believe we need them.” Often, agreeing on the definition of the metric behind the SLA is hard. What is uptime, for example, and what is the granularity of measurement: per minute or per hour?
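To see why the granularity question matters, here is a minimal Python sketch that scores the same downtime at different window sizes. It rests on an assumption that is not from the article: a window counts as down if the service was down for any second inside it. The function name `uptime` and the outage timestamps are hypothetical.

```python
from typing import Iterable


def uptime(down_seconds: Iterable[int], total_seconds: int, window: int) -> float:
    """Fraction of fixed-size windows that contain no downtime.

    Assumption: a window is 'down' if any second inside it was down.
    Real contracts may define this differently.
    """
    if total_seconds % window != 0:
        raise ValueError("total_seconds must be a multiple of window")
    n_windows = total_seconds // window
    # Map each down second to the index of the window that contains it.
    bad_windows = {s // window for s in down_seconds}
    return (n_windows - len(bad_windows)) / n_windows


DAY = 24 * 60 * 60

# Hypothetical day: three 30-second outages, each in a different hour.
down = set()
for start in (3_600, 40_000, 80_000):
    down.update(range(start, start + 30))

per_second = uptime(down, DAY, window=1)
per_minute = uptime(down, DAY, window=60)
per_hour = uptime(down, DAY, window=3600)
```

Under this rule the same 90 seconds of downtime reads as roughly 99.9% uptime when measured per second but only 87.5% when measured per hour, because three of the 24 hourly windows are marked down. A customer and a provider who have not agreed on the window size can both be "right" about whether the objective was met.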