Planning for and monitoring the availability, quality, and capacity of cloud resources is critical to ensure systems can deliver the required performance to meet business needs. Cloud providers should optimize resource allocation and utilization to provide adequate capacity and performance. Cloud consumers need to clearly specify their performance and resource requirements to align with business objectives.
Where did this come from?
This control comes from the CSA Cloud Controls Matrix v4.0.10, released on 2023-09-26. The full matrix is available for download from the CSA website.
The CCM provides a controls framework for cloud computing. It is designed to provide fundamental security principles to guide cloud vendors and assist prospective cloud customers in assessing the overall security risk of a cloud provider.
For more on capacity planning, see the AWS Well-Architected Framework - Performance Efficiency Pillar whitepaper.
Who should care?
This control is relevant to:
- Cloud architects designing scalable, resilient systems
- DevOps engineers responsible for infrastructure and application performance
- Product owners specifying non-functional requirements for cloud workloads
- Business stakeholders relying on cloud services to deliver outcomes
What is the risk?
Inadequate capacity and resource planning can lead to:
- Systems becoming unresponsive or unavailable due to lack of resources to handle load
- Failure to meet customer expectations and service level agreements
- Increased operational costs from over-provisioning resources
- Inability to scale to meet growth in demand
Proper planning can help avoid these issues by ensuring sufficient capacity is available to handle expected and unexpected workloads. However, it cannot completely eliminate the risk of unforeseen spikes that exceed planned capacity.
What's the care factor?
Capacity planning should be a high priority for any production cloud workload. The business consequences of poor performance or downtime can be severe: lost revenue, reputational damage, and customer churn.
The effort required will depend on the criticality and complexity of the application. Highly dynamic workloads with unpredictable traffic will require more sophisticated approaches than stable, predictable ones.
At a minimum, teams should establish performance requirements, plan initial capacity, and put in place monitoring to detect when thresholds are breached. More advanced practices include load testing, auto scaling and self-healing architectures.
When is it relevant?
Capacity planning is most relevant when:
- Deploying a new application or major feature
- Preparing for known traffic spikes e.g. sales events, marketing campaigns
- Reviewing after performance incidents
- Making significant architecture changes
It is less critical for:
- Proof-of-concept or test environments with no production traffic
- Fully serverless applications where the cloud provider handles scaling transparently
- Stable workloads with slow, predictable growth in demand
What are the tradeoffs?
Optimizing for performance and capacity has tradeoffs:
- Over-provisioning resources incurs unnecessary costs
- Frequent scaling events can introduce complexity and new failure modes
- Autoscaling can take time to respond, leading to brief periods of degraded performance
- Not all application components can scale horizontally e.g. relational databases
- Requires ongoing effort to measure, tune and optimize
Teams need to strike the right balance between cost, complexity and performance based on the specific needs of their application.
How to make it happen?
- Define performance requirements and acceptable operating conditions e.g.
  - Expected request rates and acceptable latency
  - Compute, memory, disk, and network requirements
  - Dependencies on backend systems and third-party services
- Establish initial capacity estimates based on requirements and similar systems (see the back-of-the-envelope sketch after this list)
- Stress test the system to validate that it can handle the expected load and to identify bottlenecks (load-test sketch below)
- Configure autoscaling for applicable resources (scaling policy sketch below), e.g.
  - EC2 Auto Scaling groups with scaling policies based on metrics like CPU utilization
  - DynamoDB auto scaling for read/write capacity
  - Serverless concurrency limits
- Set up monitoring and alarms for key performance indicators (alarm sketch below), e.g.
  - Latency, error rates, queue depths, resource utilization
  - Use dashboards to visualize trends
  - Alert on thresholds indicating potential issues
- Review metrics regularly to identify trends and adjust capacity and scaling parameters
- Feed learnings back into capacity planning for future iterations
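As a starting point for the capacity estimate step above, here is a minimal back-of-the-envelope sketch using Little's Law. Every number in it is an illustrative assumption, not a figure from the control; substitute your own measurements.

```python
# Rough capacity estimate via Little's Law (concurrency = arrival rate x latency).
# All values below are illustrative assumptions.
import math

peak_rps = 500            # expected peak requests per second
avg_latency_s = 0.120     # average request latency in seconds
per_instance_rps = 80     # measured throughput of one instance at target utilization
headroom = 1.3            # 30% buffer for spikes and instance failures

concurrent_requests = peak_rps * avg_latency_s
instances_needed = math.ceil((peak_rps / per_instance_rps) * headroom)

print(f"~{concurrent_requests:.0f} requests in flight at peak")
print(f"plan for at least {instances_needed} instances")
```

The same arithmetic extends to memory, disk, and network by substituting the relevant per-instance limits.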
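For the stress-testing step, a dedicated tool such as Locust, k6, or JMeter is the usual choice. The sketch below is only a minimal smoke-test illustration; the target URL, concurrency, and request counts are hypothetical.

```python
# Minimal load-test smoke check: hammer one endpoint from many threads and
# report rough latency percentiles. Not a substitute for a real load-test tool.
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TARGET_URL = "https://app.example.com/health"   # assumption: replace with a real endpoint
CONCURRENCY = 50                                # simulated concurrent clients
REQUESTS_PER_CLIENT = 20

def worker(_):
    latencies = []
    for _ in range(REQUESTS_PER_CLIENT):
        start = time.monotonic()
        with urllib.request.urlopen(TARGET_URL, timeout=10) as resp:
            resp.read()
        latencies.append(time.monotonic() - start)
    return latencies

with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    results = sorted(l for batch in pool.map(worker, range(CONCURRENCY)) for l in batch)

print(f"p50={results[len(results) // 2]:.3f}s  p99={results[int(len(results) * 0.99)]:.3f}s")
```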
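For the autoscaling step, here is a sketch of attaching a target-tracking policy to an existing EC2 Auto Scaling group with boto3. The group name and target value are assumptions; the console or infrastructure-as-code tools achieve the same result.

```python
# Attach a target-tracking scaling policy to an existing Auto Scaling group,
# keeping average CPU utilization near 60%. Assumes boto3 credentials and an
# ASG named "web-asg" (hypothetical).
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="cpu-target-60",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 60.0,   # illustrative target; tune to your workload
    },
)
```

Target tracking keeps the group near the chosen metric value and is generally simpler to reason about than step scaling policies.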
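For the monitoring step, here is a sketch of a p99 latency alarm on an Application Load Balancer using boto3. The load balancer dimension, threshold, and SNS topic ARN are placeholders.

```python
# Alarm when p99 target response time on an ALB exceeds 500 ms for 5 minutes.
# Dimension value and SNS topic ARN are placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="api-p99-latency-high",
    Namespace="AWS/ApplicationELB",
    MetricName="TargetResponseTime",
    Dimensions=[{"Name": "LoadBalancer", "Value": "app/my-alb/1234567890abcdef"}],
    ExtendedStatistic="p99",
    Period=60,
    EvaluationPeriods=5,
    Threshold=0.5,                      # seconds
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
)
```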
What are some gotchas?
- Auto scaling EC2 instances requires a pre-built AMI and a launch template (launch configurations are the legacy option)
- Scaling based on CPU can miss memory exhaustion issues
- Application performance can be constrained by slowest dependency
- Scaling out can hit account limits, e.g. the maximum number of EC2 instances (see the quota check sketch after this list)
- Specific IAM permissions are required e.g.
  - ec2:DescribeInstances and ec2:DescribeInstanceStatus to check instance health
  - dynamodb:DescribeTable and dynamodb:UpdateTable to manage provisioned capacity
  - See the AWS documentation for the full permission requirements
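On the account-limits gotcha, a quick way to check headroom before a planned scale-out is the Service Quotas API; a boto3 sketch follows. The quota code shown is the one AWS publishes for running On-Demand Standard instances (measured in vCPUs), but treat it as an assumption and verify it in the Service Quotas console.

```python
# Check the regional quota for running On-Demand Standard instances before
# scaling out. Quota code is assumed; confirm it for your account and region.
import boto3

quotas = boto3.client("service-quotas")

quota = quotas.get_service_quota(
    ServiceCode="ec2",
    QuotaCode="L-1216C47A",   # Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances
)
print(f"{quota['Quota']['QuotaName']}: {quota['Quota']['Value']}")
```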
What are the alternatives?
- Manually adjusting resource allocations on a schedule or in response to observed load, instead of auto scaling
- Migrating to fully managed serverless compute like AWS Lambda or Fargate to offload scaling concerns
- Overprovisioning resources to handle peak instead of scaling dynamically
- Implementing application-level throttling and queueing to shape traffic (see the token-bucket sketch after this list)
- Using interruptible spot instances for non-time-critical workloads
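As an illustration of the throttling-and-queueing alternative, here is a minimal token-bucket sketch. It is a generic pattern, not any particular library's API.

```python
# Minimal token-bucket throttle: refill tokens over time, spend one per request,
# and reject (or queue) requests when the bucket is empty.
import time

class TokenBucket:
    def __init__(self, rate_per_s: float, burst: int):
        self.rate = rate_per_s        # tokens added per second
        self.capacity = burst         # maximum burst size
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                  # caller should queue or reject the request

bucket = TokenBucket(rate_per_s=100, burst=20)
if not bucket.allow():
    print("throttled")  # e.g. return HTTP 429 or enqueue for later processing
```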
Explore further
- CSA Cloud Controls Matrix v4.0.10
- AWS Well-Architected Framework - Performance Efficiency Pillar