Workload Optimization
Analyze and optimize cloud resources to match specific usage patterns, ensuring that workloads operate efficiently and sustainably and generate sufficient business value for their cost.
Creating a workload optimization strategy
Managing workload optimizations
Understanding where opportunities have value
Workload Optimization is a set of practices that ensure that cloud resources are properly selected, correctly sized, only run when needed, appropriately configured, and highly utilized in order to meet all functional and non-functional requirements at the lowest cost and environmental impact. This work is primarily done by Engineering, using guidelines and strategies formed collaboratively with the FinOps, Product, and other personas.
Engineers should seek to ensure there is sufficient business value for the cloud costs associated with each type of resource being consumed. Because cloud systems are built iteratively, it is typical to observe resource utilization over time to ensure performance, availability or other quality metrics are met, and to adjust or modify resources which are over- or under-sized, or make other optimizations even for systems which are well-architected for the cloud.
There is a strong relationship between all of the Capabilities in the Optimize Cloud Usage & Cost Domain. Each of the Capabilities in this Domain works in a different way to optimize cloud value: by using commitment-based discounts, rearchitecting, using or stopping the use of licenses or SaaS, providing guidance on cloud sustainability improvements, and optimizing the utilization and efficiency of the workloads that make up systems. Among all of these, Workload Optimization will likely be the most widely practiced and offer the most options.
Early in the FinOps practice, the FinOps Team will likely play a large role in identifying opportunities to optimize workloads. Over time, Engineering will take on the primary responsibility for its cloud usage by seeking out ways to optimize or, better yet, by building in as much optimization as possible while systems are being built. However, no matter how efficient a system is when first built, cloud services are constantly being added and modernized, and organizations must be prepared to work continuously to keep pace and maintain optimal performance and utilization. Engineering leadership is critical to establishing the cadence and highlighting the need to maintain optimization of workloads at the appropriate level.
A key way the FinOps team can support this is by developing a workload optimization strategy. This strategy can direct optimization work by highlighting which types of resources should be prioritized, setting thresholds for taking action so that time is not wasted on trivial improvements, defining target KPIs the organization wants to achieve, and creating guidelines for making the tradeoffs that come with optimization. Other Capabilities in this Domain may provide important inputs to this strategy by showing Engineering where the organization supports (or plans to stop supporting) the use of licensed software, when rearchitecting is preferred over resource optimization, how to prioritize resource optimization against rate optimization, and how to incorporate sustainability and carbon impact into usage optimization decisions. As noted, the strategy may also set Leadership’s expectations of how frequently and diligently Engineering should pursue optimization versus new feature development work.
Engineering teams, in collaboration with FinOps, Product, and Leadership, will use the Capabilities in the Understand Cloud Usage & Cost Domain to review workloads in their areas of responsibility. Determining utilization and identifying scaling or workload management opportunities may require access to utilization, performance, or observability data in addition to cloud usage, cost, and carbon impact data. Engineering teams may focus their optimization efforts differently depending on factors like the system’s importance, the time available to optimize, the maturity of the application, or whether the workloads are production or non-production.
A wide range of options exist to optimize workloads in the cloud including:
Examine workloads carefully for longer-cycle periods of high utilization (e.g. higher utilization at month-end, or quarterly busy periods) and be cautious of workloads that have resource requirements for warranty or software performance reasons. Rightsizing typically requires recreating resources so this can involve system outages that should be carefully coordinated within the Engineering team.
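The caution about longer-cycle peaks can be sketched as a simple check that refuses to flag a resource for rightsizing until it has seen enough history to cover month-end or quarterly busy periods. This is a minimal illustration, not a prescribed method: the function name, the 90-day window, and both utilization thresholds are assumptions chosen for the example.

```python
from statistics import mean


def is_rightsizing_candidate(daily_peak_cpu, avg_threshold=40.0,
                             peak_threshold=70.0, min_days=90):
    """Return True only if a resource looks oversized across a long window.

    daily_peak_cpu: daily peak CPU utilization percentages, oldest first.
    The thresholds and the 90-day window are illustrative assumptions;
    a real strategy would set them per workload type.
    """
    if len(daily_peak_cpu) < min_days:
        # Too little history to rule out month-end or quarterly peaks.
        return False
    window = daily_peak_cpu[-min_days:]
    # Require both a low average and no high spike anywhere in the window.
    return mean(window) < avg_threshold and max(window) < peak_threshold


# Hypothetical series: one consistently quiet, one with a month-end spike.
quiet = [20.0] * 90
month_end_spike = [20.0] * 29 + [95.0] + [20.0] * 60
```

Here `is_rightsizing_candidate(quiet)` would be flagged, while `month_end_spike` would not, even though its average utilization is almost identical. A check like this does not capture warranty or software performance requirements, which still need human review.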
There may be times when utilization needs to decrease and the extra expense incurred is worth the value the resources create. Or the opposite may be true, and carbon and/or performance expectations can be lowered to improve cost.
For some resources, like storage, it may be necessary to estimate latent inefficiency in the stored data, and by extension the potential gross savings that can be realized by removing, or rightsizing, that inefficiency. Different data sets require tailored approaches. For example, highly compressible (yet uncompressed) data has relatively high latent inefficiency, whereas encrypted data has relatively low (or no) latent inefficiency. Data that is infrequently accessed but stored in a high-cost, high-performance storage class (or tier) also has relatively high latent inefficiency. Similarly, storage data housekeeping, such as optimizing data placement, implementing data compression techniques, and adopting tiered storage solutions, can reduce this inefficiency. By reducing unnecessary data duplication and implementing energy-efficient storage infrastructure, organizations can minimize their carbon footprint.
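As a rough sketch of how latent storage inefficiency might be turned into a gross savings estimate, the example below compares the current monthly cost of a data set with its cost after compression and, where eligible, a move to a cheaper tier. The `DataSet` fields, the compression ratios, and all prices are hypothetical values for illustration, not figures from any provider.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataSet:
    name: str
    size_gb: float
    price_per_gb_month: float          # price of the current storage class
    compression_ratio: float = 1.0     # compressed size / original size;
                                       # 1.0 means incompressible (e.g. encrypted)
    cheaper_tier_price: Optional[float] = None  # None if the data must stay put


def estimated_monthly_gross_savings(ds: DataSet) -> float:
    """Estimate gross monthly savings from compression and tiering.

    Gross only: ignores retrieval fees, migration effort, and any
    performance impact, which must be weighed separately.
    """
    current_cost = ds.size_gb * ds.price_per_gb_month
    target_price = (ds.cheaper_tier_price
                    if ds.cheaper_tier_price is not None
                    else ds.price_per_gb_month)
    optimized_cost = ds.size_gb * ds.compression_ratio * target_price
    return current_cost - optimized_cost


# Hypothetical data sets: compressible logs vs. encrypted backups.
logs = DataSet("app-logs", size_gb=1000, price_per_gb_month=0.02,
               compression_ratio=0.25)
backups = DataSet("encrypted-backups", size_gb=1000, price_per_gb_month=0.02)
```

Under these assumed numbers the uncompressed logs show about 15 per month of latent inefficiency, while the encrypted backups show none, which mirrors the contrast described above.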
For any of these decisions to be made, resource utilization, efficiency, cloud sustainability, and cost must be looked at together. Determining when workload optimization can be done effectively involves estimating not only the savings that can accrue from the change, but also the cost (in labor hours, outages, etc.) of making the change, and potentially transforming the use of the resource in the process.
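The tradeoff between savings and the cost of making a change can be illustrated with some simple arithmetic. The sketch below is one possible way to frame it; the function names, the 12-month horizon, and the labor and outage figures are assumptions for the example, and it deliberately leaves out sustainability and performance effects, which would need to be considered alongside it.

```python
def change_cost(labor_hours, hourly_rate, outage_cost=0.0):
    """One-time cost of making the change (labor plus any outage impact)."""
    return labor_hours * hourly_rate + outage_cost


def net_value(monthly_savings, labor_hours, hourly_rate,
              outage_cost=0.0, horizon_months=12):
    """Savings over an assumed horizon minus the one-time cost of the change."""
    return (monthly_savings * horizon_months
            - change_cost(labor_hours, hourly_rate, outage_cost))


def payback_months(monthly_savings, labor_hours, hourly_rate, outage_cost=0.0):
    """Months until savings cover the change cost; None if it never pays back."""
    if monthly_savings <= 0:
        return None
    return change_cost(labor_hours, hourly_rate, outage_cost) / monthly_savings


# Hypothetical opportunities: same effort, very different monthly savings.
worthwhile = net_value(monthly_savings=200, labor_hours=10, hourly_rate=100)
trivial = net_value(monthly_savings=20, labor_hours=10, hourly_rate=100)
```

With these assumed inputs, the first opportunity is positive over the horizon and the second is negative, which is the kind of result an action threshold in the optimization strategy is meant to filter out.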
The key aspect of Workload Optimization to focus on is moving from identifying which optimizations are technically possible, and aligning with the Engineering or other personas involved to make those changes, to identifying when real opportunities exist to improve value.