Krisztian Banhidy
Peak
Amazon Elastic Container Registry (Amazon ECR) is a fully managed Docker container registry. It provides a secure, scalable, and efficient platform for storing, managing, and deploying container images. However, with frequent updates or extensive development activity, ECR repositories can accumulate many images, increasing storage size and costs.
This playbook aims to identify high-cost ECR repositories and implement strategies for effective cost management. It provides guidelines for monitoring repository usage, optimizing storage through efficient image management, and applying cost-effective practices to maintain long-term financial sustainability in your containerized application deployment.
We acknowledge that the specific instructions to perform this activity may change over time as AWS rolls out new features, pricing models, user interfaces, etc. We have tried to link to relevant AWS-provided documentation where possible to help this document stay relevant over time. The insights, instructions, and resources provided herein, in combination with those available directly from AWS, should help individuals gain a more complete understanding of this action as it pertains to FinOps and usage optimization.
This playbook is designed for engineers and FinOps practitioners, with FinOps practitioners using it to pinpoint registries that engineers should prioritize. Specifically, the playbook is tailored to assist practitioners in reducing the costs associated with Amazon Elastic Container Registry (ECR). Because of the technical knowledge involved, making images smaller requires engineers to investigate and perform the optimization activities themselves.
Amazon Elastic Container Registry (Amazon ECR) is integral to deploying and managing containerized applications within AWS. Still, it can also become a significant source of escalating costs if not managed proactively. Cost savings in ECR management is crucial for controlling operational expenses and ensuring the financial efficiency of cloud deployments.
Managing costs effectively in Amazon ECR involves several strategic actions to optimize the storage and management of container images. Monitoring repository sizes and growth trends is essential to identify and address cost inefficiencies early. This enables organizations to adjust their usage patterns and storage strategies before costs become unwieldy.
One of the most effective ways to control costs is to implement lifecycle policies to automate the pruning of old or unused images. This action directly reduces storage requirements, thereby lowering costs. By maintaining only necessary images, organizations can avoid paying for unutilized resources, a common inefficiency in cloud spending.
Additionally, applying cost allocation tags to ECR repositories helps attribute costs accurately across different departments or projects. This visibility is key for effective budget management, allowing organizations to track cloud spending against specific initiatives and make informed decisions about resource allocation. It also supports accountability and encourages stakeholders to consider the financial impact of their usage patterns.
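As an illustration, a cost allocation tag could be applied to a repository with the AWS CLI along the following lines (the region, account ID, repository name, and tag key/value are placeholders; adapt them to your organization’s tagging standard). Remember that tags must also be activated as cost allocation tags in the Billing console before they appear in cost reports.

# Hypothetical example: apply a cost allocation tag to an ECR repository.
# The ARN and tag key/value are placeholders, not values from this playbook.
aws ecr tag-resource \
  --resource-arn arn:aws:ecr:eu-west-1:123456789012:repository/my-service \
  --tags Key=CostCenter,Value=platform-team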
Optimizing image storage through compression and efficient layering techniques also reduces costs. These methods shrink the physical space required for each image, directly lowering storage costs. Given the scale at which container images can grow, especially in active development environments, even small efficiencies in image storage can result in substantial savings.
Prioritizing cost savings in Amazon ECR is essential for maintaining the financial health of an organization’s cloud infrastructure. By implementing systematic monitoring, lifecycle management, and optimization strategies, organizations can significantly reduce their ECR costs. These efforts not only prevent budget overruns but also align with broader financial objectives, ensuring that investments in cloud technology deliver maximum value.
To run this playbook, you will need access to the AWS account with sufficient Amazon Elastic Container Registry (ECR) permissions: describing repositories to collect information, deleting layers/images, and updating lifecycle policies. Depending on the number of ECR registries, you can do this at the master payer account, at a dedicated ECR registry account if you have one, or only for specific accounts.
For ECR, IAM permissions are needed to describe repositories and images, delete unused images, and read and update lifecycle policies.
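A minimal identity policy sketch along those lines is shown below; the exact action list is an assumption, so align it with your organization’s IAM standards and scope the Resource element down where possible.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EcrCostPlaybookSketchAdjustAsNeeded",
      "Effect": "Allow",
      "Action": [
        "ecr:DescribeRepositories",
        "ecr:DescribeImages",
        "ecr:ListImages",
        "ecr:BatchDeleteImage",
        "ecr:GetLifecyclePolicy",
        "ecr:PutLifecyclePolicy"
      ],
      "Resource": "*"
    }
  ]
}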
Complete the following steps to analyze and optimize your ECR cost and usage. Depending on your environment, it can be beneficial to repeat this process yearly.
The goal is to collect information about each repository in the account or organization so that decisions about the next steps can be made based on the current state. Repository name, size, and approximate cost will be collected. Data collection can be performed by anyone who has access to the account containing the ECR repositories.
Navigate to the Directory Where the Script is Saved: Use the cd command followed by the path to the directory where the script is saved. For example:
cd /path/to/script
Run the Script: Type the command to run the script in the CLI. The specific command will depend on the script’s name and extension. For Unix/Linux-based systems, it might look like this:
bash scriptname.sh
For Windows systems with a “.bat” file, it might look like:
scriptname.bat
Once the script has finished running, review the output displayed in the CLI. The output should be similar to the following:
# ./ecr_repository_sizes_osx.sh
Repositories,Size,Expected Cost
statsd-exporter,7 MB,0
lambda-cloudfront-partition,394 MB,.03
cloudflare_watcher,34 MB,0
root-account-usage-notifier,225 MB,.02
k8s-metadata-injector,20454 MB,1.99
...
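If the referenced script is not available in your environment, a minimal sketch that produces similar CSV output could look like the following. It assumes AWS CLI v2 with credentials for the target account, estimates cost at the published ECR storage rate of roughly $0.10 per GB-month, and is intentionally simplified (it ignores the free tier and data transfer charges).

#!/usr/bin/env bash
# Sketch: report each ECR repository's total image size and an approximate monthly storage cost.
# Assumes AWS CLI v2 and permissions for ecr:DescribeRepositories and ecr:DescribeImages.
set -euo pipefail

echo "Repositories,Size,Expected Cost"
for repo in $(aws ecr describe-repositories \
                --query 'repositories[].repositoryName' --output text); do
  # Sum the size in bytes of all images in the repository.
  bytes=$(aws ecr describe-images --repository-name "$repo" \
            --query 'sum(imageDetails[].imageSizeInBytes)' --output text)
  [ "$bytes" = "None" ] && bytes=0
  # Convert to MB for the report and estimate cost at ~$0.10 per GB-month.
  awk -v repo="$repo" -v b="$bytes" \
      'BEGIN { printf "%s,%d MB,%.2f\n", repo, b/1024/1024, b/1024/1024/1024*0.10 }'
done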
It is best practice to deploy a lifecycle policy to ensure that unused images are cleaned up automatically in the future (see ECR Best Practices).
The following can be used as a base policy and tuned to your organization’s requirements. Before implementing any lifecycle policies, back up the existing policies for safekeeping (a sample backup-and-apply sketch using the AWS CLI follows the policy below). Use the provided JSON as a starting point and tailor it to the organization’s specific needs (for details, consult your engineering team). Further details can be found in the AWS documentation on automating the cleanup of images.
Periodically review and modify policies in response to changing requirements.
{ "rules": [ { "rulePriority": 1, "description": "Clean untagged", "selection": { "tagStatus": "untagged", "countType": "imageCountMoreThan", "countNumber": 1 }, "action": { "type": "expire" } }, { "rulePriority": 2, "description": "KeepLatest", "selection": { "tagStatus": "tagged", "tagPrefixList": [ "latest" ], "countType": "imageCountMoreThan", "countNumber": 1 }, "action": { "type": "expire" } }, { "rulePriority": 3, "description": "Keep 9 versions", "selection": { "tagStatus": "tagged", "tagPrefixList": [ "v" ], "countType": "imageCountMoreThan", "countNumber": 9 }, "action": { "type": "expire" } }, { "rulePriority": 4, "description": "Keep 20 images all together", "selection": { "tagStatus": "any", "countType": "imageCountMoreThan", "countNumber": 20 }, "action": { "type": "expire" } } ] }
Untagged or dangling images are safe to delete if they are unused. (Docker images consist of layers; dangling or untagged images are layers that have no relationship to any tagged image, so they are not used and are safe to delete.)
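For a one-off cleanup, untagged images in a repository could also be listed and removed manually with commands along these lines (the repository name is a placeholder; the lifecycle policy above handles this automatically going forward).

# List the digests of untagged (dangling) images in a repository.
aws ecr list-images --repository-name my-service \
  --filter tagStatus=UNTAGGED --query 'imageIds[].imageDigest' --output text

# Delete one untagged image by digest (substitute a digest from the list above).
aws ecr batch-delete-image --repository-name my-service \
  --image-ids imageDigest=<digest-from-the-list-above>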
`latest` is a special tag in Docker; it is usually best practice to keep it, but not to use it in production.
FinOps In The Wild: One FinOps team uses semantic versioning for golden images deployed to production, retaining 9 images to allow for rollback. They also maintain at least 20 images globally to accommodate potential development or special images. This setup is one of the recommended approaches for managing images effectively.
Even with lifecycle policies configured, a single image can still contain debug symbols or unneeded resources/applications installed into the container. It is recommended to use the smallest images possible. Devops Cube offers some suggestions on how to reduce Docker image size.
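As a starting point for that investigation, the layer sizes of an existing image can be inspected locally, which often reveals which build step adds the most weight (the image name below is a placeholder; this assumes Docker is installed locally).

# Show the overall size of matching local images.
docker image ls my-service

# Show the size contributed by each layer/instruction of an image,
# which helps identify build steps worth optimizing (caches, debug symbols, build tools, etc.).
docker history --human my-service:latest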
After actions are implemented, it’s important to note that changes in billing or Cost and Usage Reports (CUR) may not be immediately visible. Typically, the initial impacts become apparent within 2 days. During this period, it is essential to monitor the billing data and compare it with previous records.
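One way to monitor this is to pull daily ECR costs from Cost Explorer and compare the periods before and after the change. A sketch using the AWS CLI is shown below; the dates are placeholders, and the SERVICE dimension value is an assumption that should be verified against your own Cost Explorer data.

# Daily unblended ECR cost for a sample period (dates and service name are placeholders).
aws ce get-cost-and-usage \
  --time-period Start=2024-05-01,End=2024-06-01 \
  --granularity DAILY \
  --metrics UnblendedCost \
  --filter '{"Dimensions":{"Key":"SERVICE","Values":["Amazon EC2 Container Registry (ECR)"]}}'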
Reporting is important for communicating success and also understanding the improvements. Monitoring is important for detecting changes within the resources. Key Points to Consider:

Collaboration and Communication: Collaboration between development, operations, and finance teams is crucial. Effective communication ensures alignment between cost optimization goals and operational requirements, avoiding conflicts during size adjustments.

With this playbook, you will be able to reduce ECR costs and ensure that future costs don’t spiral out of control.
We’d like to thank the following people for their work on this Playbook:
We’d also like to thank our supporters, Matt Whalen, Dusty Bowling, Brian Robbins, and Noel Crowley.