This project focuses on providing a collection of resources to help identify types of cloud cost waste by service provider, including links to additional tools. You can filter the cards by cloud service provider or savings potential. If you would like to contribute to this page, please make a suggestion.
Analyze CPU, memory, disk, and network utilization. Flag any resource that shows zero activity across all four metrics over the last 14 days as idle.
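A minimal boto3 sketch of this kind of check, assuming credentials are already configured. It only covers the metrics EC2 publishes natively (memory and disk metrics require the CloudWatch agent), and the strict zero threshold mirrors the card; relax it for near-idle detection.

```python
"""Sketch: flag EC2 instances with no CPU or network activity over the last 14 days."""
import datetime
import boto3

ec2 = boto3.client("ec2")
cloudwatch = boto3.client("cloudwatch")

LOOKBACK_DAYS = 14
METRICS = ["CPUUtilization", "NetworkIn", "NetworkOut"]

end = datetime.datetime.now(datetime.timezone.utc)
start = end - datetime.timedelta(days=LOOKBACK_DAYS)

def max_metric(instance_id: str, metric_name: str) -> float:
    """Return the highest datapoint for a metric over the lookback window."""
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName=metric_name,
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=start,
        EndTime=end,
        Period=3600,
        Statistics=["Maximum"],
    )
    return max((dp["Maximum"] for dp in stats["Datapoints"]), default=0.0)

for page in ec2.get_paginator("describe_instances").paginate():
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            instance_id = instance["InstanceId"]
            # Strict zero check, per the card; loosen the comparison for "near idle".
            if all(max_metric(instance_id, m) == 0.0 for m in METRICS):
                print(f"{instance_id} looks idle over the last {LOOKBACK_DAYS} days")
```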
Check how often you are running full DynamoDB backups; they could be running every 5 minutes with no retention policy, and over time these costs compound. Determine whether the business requires this much backup data; if not, one option is to switch to point-in-time recovery for DynamoDB.
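A small boto3 sketch of that review, assuming configured credentials: it counts the on-demand backups held for each table and reports whether point-in-time recovery is already enabled.

```python
"""Sketch: review DynamoDB backup volume and point-in-time recovery status."""
import boto3

dynamodb = boto3.client("dynamodb")

table_names = []
for page in dynamodb.get_paginator("list_tables").paginate():
    table_names.extend(page["TableNames"])

for table in table_names:
    # On-demand backups held for this table (first page only, for brevity).
    backups = dynamodb.list_backups(TableName=table).get("BackupSummaries", [])

    # Check whether point-in-time recovery is already enabled.
    continuous = dynamodb.describe_continuous_backups(TableName=table)
    pitr = (
        continuous["ContinuousBackupsDescription"]
        ["PointInTimeRecoveryDescription"]["PointInTimeRecoveryStatus"]
    )

    print(f"{table}: {len(backups)} on-demand backups, PITR={pitr}")
```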
Analyze CPU, memory, disk, and network utilization. Flag any resource that shows zero activity across all four metrics over the last 14 days as idle.
Manually investigate the largest vendor spend. Create scripts that swap out license sizing and the infrastructure underneath.
To reduce ingestion costs, stop ingesting unnecessary logs. To reduce storage costs, change the retention period for your log groups. To reduce the amount of ingested log data scanned by CloudWatch Logs Insights queries, run queries over a shorter time range.
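A minimal boto3 sketch of the retention change, assuming configured credentials; the 90-day value is an example, so pick a period that matches your own requirements.

```python
"""Sketch: apply a retention policy to CloudWatch log groups that currently keep data forever."""
import boto3

logs = boto3.client("logs")
RETENTION_DAYS = 90  # example; must be one of the values CloudWatch Logs accepts (30, 90, 365, ...)

for page in logs.get_paginator("describe_log_groups").paginate():
    for group in page["logGroups"]:
        # Log groups without retentionInDays retain data indefinitely.
        if "retentionInDays" not in group:
            print(f"Setting {RETENTION_DAYS}-day retention on {group['logGroupName']}")
            logs.put_retention_policy(
                logGroupName=group["logGroupName"],
                retentionInDays=RETENTION_DAYS,
            )
```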
This type of waste can be identified through Trusted Advisor in the AWS console. However, Trusted Advisor is available only for Business and Enterprise Support customers. All AWS customers, regardless of support level, can use this CUR query from the CUR Query Library to identify idle load balancers.
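As a different, API-based angle on the same problem (not the Trusted Advisor check or the CUR query mentioned above), a hedged boto3 sketch can flag ALB/NLB load balancers whose target groups report no healthy targets; treat the output as candidates to review, not to delete.

```python
"""Sketch: flag load balancers whose target groups have no healthy targets."""
import boto3

elbv2 = boto3.client("elbv2")

for page in elbv2.get_paginator("describe_load_balancers").paginate():
    for lb in page["LoadBalancers"]:
        arn = lb["LoadBalancerArn"]
        target_groups = elbv2.describe_target_groups(LoadBalancerArn=arn)["TargetGroups"]

        healthy = 0
        for tg in target_groups:
            health = elbv2.describe_target_health(TargetGroupArn=tg["TargetGroupArn"])
            healthy += sum(
                1
                for desc in health["TargetHealthDescriptions"]
                if desc["TargetHealth"]["State"] == "healthy"
            )

        if healthy == 0:
            print(f"{lb['LoadBalancerName']} has no healthy targets; candidate for review")
```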
An unattached EIP costs $0.005 per hour. Over time this compounds into real waste, and if something in your process is leaving EIPs unattached, the problem will keep growing.
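A short boto3 sketch that lists unassociated EIPs, assuming configured credentials for a single region; run it per region for full coverage.

```python
"""Sketch: list Elastic IPs that are not associated with any instance or network interface."""
import boto3

ec2 = boto3.client("ec2")

for address in ec2.describe_addresses()["Addresses"]:
    # An EIP with no AssociationId is allocated but unattached, and billed hourly.
    if "AssociationId" not in address:
        print(f"Unattached EIP: {address['PublicIp']} (allocation {address.get('AllocationId')})")
```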
Snapshots created from AMIs that have since been deregistered are no longer in use and can be removed.
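A boto3 sketch of one way to spot these, assuming configured credentials: it relies on the "Created by CreateImage ... for ami-..." description that AMI-created snapshots carry, so treat matches as candidates to verify rather than delete automatically.

```python
"""Sketch: find EBS snapshots whose description references an AMI that no longer exists."""
import re
import boto3

ec2 = boto3.client("ec2")

# AMIs still registered in this account.
live_amis = {image["ImageId"] for image in ec2.describe_images(Owners=["self"])["Images"]}

for page in ec2.get_paginator("describe_snapshots").paginate(OwnerIds=["self"]):
    for snapshot in page["Snapshots"]:
        match = re.search(r"ami-[0-9a-f]+", snapshot.get("Description", ""))
        if match and match.group(0) not in live_amis:
            print(f"{snapshot['SnapshotId']} references deregistered AMI {match.group(0)}")
```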
Scripts can be implemented to scan for and terminate unattached EBS volumes; consider taking snapshots in higher environments before deleting the volumes. Many such scripts can be found on GitHub or elsewhere on the Internet.
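A minimal boto3 sketch of such a script, assuming configured credentials for one region; the dry-run flag guards against accidental deletes, and a final snapshot is kept as a safety net.

```python
"""Sketch: find unattached EBS volumes and optionally snapshot them before deletion."""
import boto3

ec2 = boto3.client("ec2")
DRY_RUN = True  # flip to False only after reviewing the output

for page in ec2.get_paginator("describe_volumes").paginate(
    Filters=[{"Name": "status", "Values": ["available"]}]
):
    for volume in page["Volumes"]:
        volume_id = volume["VolumeId"]
        print(f"Unattached volume {volume_id} ({volume['Size']} GiB)")
        if not DRY_RUN:
            # Keep a snapshot as a safety net before removing the volume.
            snap = ec2.create_snapshot(
                VolumeId=volume_id, Description=f"Final snapshot of {volume_id}"
            )
            ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])
            ec2.delete_volume(VolumeId=volume_id)
```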
We saved 20% on some of our EBS costs by migrating from the gp2 to the gp3 EBS volume type.
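A boto3 sketch of the migration itself, assuming configured credentials; gp3's baseline performance covers most gp2 workloads, but review IOPS and throughput needs for volumes that relied on very large gp2 sizes before converting them.

```python
"""Sketch: migrate gp2 volumes to gp3 in place with ModifyVolume."""
import boto3

ec2 = boto3.client("ec2")

for page in ec2.get_paginator("describe_volumes").paginate(
    Filters=[{"Name": "volume-type", "Values": ["gp2"]}]
):
    for volume in page["Volumes"]:
        print(f"Converting {volume['VolumeId']} from gp2 to gp3")
        ec2.modify_volume(VolumeId=volume["VolumeId"], VolumeType="gp3")
```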
AWS Storage Lens or your dedicated Technical Account Manager can identify incomplete multipart uploads (MPUs) in S3 buckets. Once you identify MPUs on specific buckets, you can configure a lifecycle rule for those buckets to automatically abort incomplete multipart uploads older than 7 days (or whatever time period you find appropriate). I'd argue that lifecycle rules on S3 buckets should be the default, not the exception. A full description of the services and an outline of this process can be found in the AWS blog post here.
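A boto3 sketch of that lifecycle rule, assuming configured credentials and a placeholder bucket name; note that this call replaces the bucket's existing lifecycle configuration, so merge it with any rules you already have.

```python
"""Sketch: add a lifecycle rule that aborts incomplete multipart uploads after 7 days."""
import boto3

s3 = boto3.client("s3")
BUCKET = "example-bucket"  # placeholder

s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "abort-incomplete-multipart-uploads",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)
```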
There are multiple ways to approach snapshot lifecycles in general. With EBS, you can use Amazon Data Lifecycle Manager to automate the retention of your snapshots, and a myriad of third-party tools can also help manage the data lifecycle of snapshots. I recommend first establishing a policy within your organization, communicating and collaborating on that policy, and then enforcing it with the ability to opt out.
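A hedged boto3 sketch of a Data Lifecycle Manager policy, assuming configured credentials, a placeholder IAM role ARN for DLM, and that volumes to protect carry a Backup=true tag; the daily schedule and 7-copy retention are example values.

```python
"""Sketch: a DLM policy that snapshots tagged volumes daily and keeps 7 copies."""
import boto3

dlm = boto3.client("dlm")

dlm.create_lifecycle_policy(
    ExecutionRoleArn="arn:aws:iam::123456789012:role/AWSDataLifecycleManagerDefaultRole",  # placeholder
    Description="Daily snapshots, 7-day retention",
    State="ENABLED",
    PolicyDetails={
        "ResourceTypes": ["VOLUME"],
        "TargetTags": [{"Key": "Backup", "Value": "true"}],
        "Schedules": [
            {
                "Name": "daily-snapshots",
                "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS", "Times": ["03:00"]},
                "RetainRule": {"Count": 7},
            }
        ],
    },
)
```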
One common optimization technique is to partition tables using relevant attributes, such as date or location, to reduce the amount of data scanned. For example, if a table contains daily sales data, partitioning the table by date allows queries to scan only the relevant partitions for a specific date range, rather than scanning the entire table. This can result in major cost savings, as the amount of data scanned is greatly reduced.
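A boto3 sketch of running such a partition-pruned query through Athena, assuming configured credentials and hypothetical names: a `sales` table partitioned by `sale_date`, an `analytics` database, and a placeholder results bucket.

```python
"""Sketch: run a date-partition-pruned query through Athena."""
import boto3

athena = boto3.client("athena")

# Filtering on the partition column keeps Athena from scanning the whole table,
# which is what drives the per-TB-scanned cost down.
query = """
SELECT SUM(amount) AS revenue
FROM sales
WHERE sale_date BETWEEN DATE '2024-01-01' AND DATE '2024-01-31'
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "analytics"},  # placeholder database
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},  # placeholder bucket
)
```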
When using AWS S3 for storage, it's crucial to consider object versioning and lifecycle management to avoid unnecessary costs. AWS S3 allows for the creation of multiple versions of the same object, and each version incurs additional storage charges. Therefore, it's important to decide how those versions move through their lifecycle across the different storage classes, including Standard, Intelligent-Tiering, Infrequent Access, and Glacier.
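A boto3 sketch of lifecycle rules for noncurrent versions, assuming configured credentials, a placeholder versioned bucket, and example day counts; as above, this replaces the existing lifecycle configuration, so merge rules first.

```python
"""Sketch: tier and eventually expire noncurrent object versions on a versioned bucket."""
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-versioned-bucket",  # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "manage-noncurrent-versions",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                # Push older versions to cheaper storage, then delete them.
                "NoncurrentVersionTransitions": [
                    {"NoncurrentDays": 30, "StorageClass": "STANDARD_IA"},
                    {"NoncurrentDays": 90, "StorageClass": "GLACIER"},
                ],
                "NoncurrentVersionExpiration": {"NoncurrentDays": 365},
            }
        ]
    },
)
```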
Failure to purchase org-level capacity commitments for BigQuery can result in runaway costs due to on-demand query pricing. Purchasing an org-level capacity commitment and enabling idle capacity sharing at the org level can ensure stable BigQuery costs across the organization. Consideration also needs to be given to whether the location supports multi-region commitments or whether separate commitments will need to be purchased for each region or location where workloads are provisioned.
Optimize the structure of queries and tables / databases to limit quantity of data scanned.
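A small sketch of how to check what a query would scan before running it, assuming the google-cloud-bigquery client is installed, application default credentials are set, and the dataset and table names are placeholders.

```python
"""Sketch: use a BigQuery dry run to estimate bytes scanned before paying for a query."""
from google.cloud import bigquery

client = bigquery.Client()

query = """
SELECT customer_id, SUM(amount) AS revenue
FROM `example_project.example_dataset.sales`
WHERE sale_date >= '2024-01-01'
GROUP BY customer_id
"""

# dry_run estimates the scan without executing or billing the query.
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query(query, job_config=job_config)

print(f"This query would scan {job.total_bytes_processed / 1e9:.2f} GB")
```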
Provision to balance capacity and requests to prevent inadvertent auto scaling. Leverage [GKE metering](https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-usage-metering) and dashboards to understand the profile of workloads and address under- or over-provisioning.
Manage object storage lifecycles to move data to Nearline or Coldline when infrequently accessed; remove obsolete versions / duplicates.
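A sketch of those lifecycle rules with the google-cloud-storage client, assuming it is installed, application default credentials are set, and the bucket name and age thresholds are examples.

```python
"""Sketch: Cloud Storage lifecycle rules that tier objects to Nearline/Coldline and prune old versions."""
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("example-bucket")  # placeholder

# Tier objects that have aged past the thresholds to cheaper storage classes.
bucket.add_lifecycle_set_storage_class_rule("NEARLINE", age=30)
bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=90)

# On versioned buckets, keep only the three most recent noncurrent versions.
bucket.add_lifecycle_delete_rule(number_of_newer_versions=3)

bucket.patch()  # apply the updated lifecycle configuration
```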
Totally inactive compute: Azure Advisor can surface underutilised virtual machines; look for those with a shutdown recommendation and ask the owning teams to validate.
Manually look in the Azure portal by selecting a virtual machine and reviewing its properties to see whether the Azure Hybrid Benefit (AHUB) box is checked.
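A sketch of the same check at scale with the Azure SDK for Python, assuming azure-identity and azure-mgmt-compute are installed and the subscription ID placeholder is replaced; the VM's license_type is set (for example, "Windows_Server") when the benefit is applied.

```python
"""Sketch: list VMs in a subscription and report whether Azure Hybrid Benefit is applied."""
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder

compute = ComputeManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

for vm in compute.virtual_machines.list_all():
    benefit = vm.license_type or "not applied"
    print(f"{vm.name}: Azure Hybrid Benefit = {benefit}")
```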
Manually look in the Azure portal at the list of all snapshots across all subscriptions you have read access to.
Azure portal -> Disks -> filter where Owner = “-” to list all unattached disks across all subscriptions you have read access to.
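A scripted version of that check, assuming azure-identity and azure-mgmt-compute are installed and a placeholder subscription ID; run it per subscription (or iterate subscriptions) for full coverage.

```python
"""Sketch: list unattached managed disks in a subscription via the Azure SDK."""
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder

compute = ComputeManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

for disk in compute.disks.list():
    # Disks with no owning VM report the 'Unattached' state (Owner shows as '-' in the portal).
    if disk.disk_state == "Unattached":
        print(f"Unattached disk {disk.name} ({disk.disk_size_gb} GiB) in {disk.location}")
```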
Reduce the number of clusters (GKE and EKS only). Abandoned clusters cost $0.10 per hour.
Application packing (bin-packing workloads onto fewer nodes) moves the freed-up nodes into the “Idle” category; the mechanics vary per cloud provider and require an understanding of DaemonSets and other pods that cannot be redistributed.
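A hedged sketch with the Kubernetes Python client, assuming the `kubernetes` package is installed and a kubeconfig is available: it counts non-DaemonSet pods per node, since nodes carrying only DaemonSet pods are candidates to drain and remove.

```python
"""Sketch: find nodes whose only pods come from DaemonSets, i.e. candidates to pack away."""
from collections import Counter
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

workload_pods = Counter()
for pod in v1.list_pod_for_all_namespaces().items:
    owners = pod.metadata.owner_references or []
    # DaemonSet pods run on every node anyway, so they don't block consolidation.
    if any(owner.kind == "DaemonSet" for owner in owners):
        continue
    if pod.spec.node_name:
        workload_pods[pod.spec.node_name] += 1

for node in v1.list_node().items:
    if workload_pods.get(node.metadata.name, 0) == 0:
        print(f"{node.metadata.name}: only DaemonSet pods, candidate to drain and remove")
```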
You can exchange your existing licenses for discounted rates on Azure SQL Database and Azure SQL Managed Instance, saving up to 30%. For new databases, during creation, select Configure database on the Basics tab and select the option to Save Money. For existing databases, select Compute + Storage in the Settings menu and select the option to Save Money.
Create a workflow to delete unused Network Interface Cards (NICs), since NICs are detached but not deleted when their VMs are removed.
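A starting point for that workflow with the Azure SDK for Python, assuming azure-identity and azure-mgmt-network are installed and a placeholder subscription ID; review each NIC before deleting it, since some unattached NICs are intentionally reserved.

```python
"""Sketch: find NICs that are no longer attached to any VM."""
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder

network = NetworkManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

for nic in network.network_interfaces.list_all():
    # A NIC left behind by a deleted VM has no virtual_machine reference.
    if nic.virtual_machine is None:
        print(f"Orphaned NIC: {nic.name} in {nic.location}")
```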
Use the Azure Start/Stop VMs v2 solution to start and stop Azure Virtual Machines across multiple subscriptions. Users can define schedules, generate insights, and get notifications to inform other efficiency tactics.
The FinOps Foundation extends a huge thank you to the members of this Working Group that broke ground on this documentation: