AWS cost cuts

Components:
- Microservices running in containers/instances
- User-facing websites/content
- Background/recurring/scheduled jobs
- One-off, ad-hoc tasks
- Databases
General things:
- Ensure tags are present everywhere.
- No need to wait for things to be perfect.
- Enable billing reports and understand how to read the bills.
- Use Cost Explorer to see where the money goes.
- Think outside AWS too.
- Weigh engineering cost against cloud cost.
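The Cost Explorer item above can also be done programmatically. What follows is a minimal sketch, assuming boto3 is installed and credentials are configured. The helper names `fetch_costs` and `cost_by_service` are illustrative and not part of any AWS API, and pagination is ignored for brevity.

```python
from collections import defaultdict


def fetch_costs(start, end):
    """Ask Cost Explorer for monthly cost grouped by service.

    Assumes boto3 is installed and credentials are configured.
    `start` and `end` are 'YYYY-MM-DD' strings; `end` is exclusive.
    """
    import boto3  # imported here so cost_by_service works without boto3

    ce = boto3.client("ce")
    return ce.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )


def cost_by_service(response):
    """Sum UnblendedCost per service, most expensive first."""
    totals = defaultdict(float)
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            service = group["Keys"][0]
            amount = group["Metrics"]["UnblendedCost"]["Amount"]
            totals[service] += float(amount)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

Something like `cost_by_service(fetch_costs("2024-01-01", "2024-04-01"))` would then give a ranked list of services to look at first.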
Immediate actionable items:
- Go to Volumes in the EC2 console. If any are in the "available" state, they are unattached; just delete them.
- Go to S3 and set up lifecycle policies.
- Remove unused Elastic IPs.
- Check that all your EBS volumes are actively in use.
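The volume and Elastic IP checks above can be scripted. This is a rough sketch, assuming boto3 and configured credentials. The function names are made up for illustration, it only lists candidates (it deletes nothing), and it does not paginate.

```python
def unattached_volumes(volumes):
    """Return IDs of volumes whose state is 'available' (not attached)."""
    return [v["VolumeId"] for v in volumes if v.get("State") == "available"]


def unused_addresses(addresses):
    """Return allocation IDs (or public IPs) of Elastic IPs with no association."""
    return [
        a.get("AllocationId", a.get("PublicIp"))
        for a in addresses
        if "AssociationId" not in a
    ]


def report():
    """Print cleanup candidates. Assumes boto3 and configured credentials."""
    import boto3  # imported here so the filters above work without boto3

    ec2 = boto3.client("ec2")
    volumes = ec2.describe_volumes()["Volumes"]
    addresses = ec2.describe_addresses()["Addresses"]
    print("Unattached volumes:", unattached_volumes(volumes))
    print("Unassociated Elastic IPs:", unused_addresses(addresses))
```

Review the printed list before deleting anything; an unattached volume may still hold data someone wants.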
S3
- Store static content in S3. For static website hosting, name the bucket after your website's domain.
- S3 requests cost money. If you're doing a bulk upload of images, use s3 sync, which skips files that are already up to date. Unless it's an end-user flow, use the S3 CLI or SDK tools as much as possible; they help you avoid redundant requests.
- Enable a private VPC endpoint for S3.
- Logs from everywhere can safely reside in S3 at a very low cost.
Lifecycles to reduce storage costs
- Put a lifecycle policy on S3 to move objects to Infrequent Access, S3 One Zone-IA, or Glacier.
- You don't need 6-month-old logs at your fingertips.
- You can expire (delete) objects via lifecycle rules. This is cheaper than scripting the cleanup yourself, since you skip the LIST requests and the compute needed to run it.
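A lifecycle policy like the one described above could look roughly like this. It is a sketch, not a recommendation: the `logs/` prefix, the rule ID, and the 30/180/365-day thresholds are assumptions chosen for illustration, and `log_lifecycle` / `apply_lifecycle` are hypothetical helper names.

```python
def log_lifecycle(prefix="logs/", ia_days=30, glacier_days=180,
                  expire_days=365, rule_id="archive-then-expire-logs"):
    """Build a lifecycle configuration for objects under `prefix`.

    Objects move to Infrequent Access, then Glacier, and are finally deleted.
    """
    if not (ia_days < glacier_days < expire_days):
        raise ValueError("expected ia_days < glacier_days < expire_days")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_days},
            }
        ]
    }


def apply_lifecycle(bucket, config):
    """Apply the configuration. Assumes boto3 and configured credentials.

    Note: this call replaces any lifecycle configuration already on the bucket.
    """
    import boto3  # imported here so log_lifecycle works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )
```

Use `"ONEZONE_IA"` in place of `"STANDARD_IA"` if single-zone storage is acceptable for the data.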
CDN:
- Use Cloudflare over CloudFront, because Cloudflare has a free plan.
- Compression: instead of enabling gzip at the CDN, you can push already-gzipped content to S3.
- Purge cache only when it's necessary.
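The compression tip above, pushing pre-gzipped content to S3, might be sketched as follows. It assumes boto3 for the upload itself; `gzip_upload_args` and `upload` are illustrative names.

```python
import gzip
import mimetypes


def gzip_upload_args(key, data):
    """Compress `data` and return keyword arguments for S3 put_object."""
    content_type = mimetypes.guess_type(key)[0] or "application/octet-stream"
    return {
        "Key": key,
        "Body": gzip.compress(data),
        "ContentType": content_type,
        # Tells browsers and CDNs that the body must be gunzipped.
        "ContentEncoding": "gzip",
    }


def upload(bucket, key, data):
    """Upload the compressed object. Assumes boto3 and configured credentials."""
    import boto3  # imported here so gzip_upload_args works without boto3

    boto3.client("s3").put_object(Bucket=bucket, **gzip_upload_args(key, data))
```

One trade-off to keep in mind: S3 serves the stored bytes as they are, so a client that cannot handle gzip would still receive the compressed body.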
VPC
- All private instances should communicate over private network. Don't use public IPs for internal communication.
- Use different VPCs or different regions for different environments.
EBS
- Snapshots:
  - Snapshots are incremental, so you pay only for the delta. Reducing the frequency or deleting old snapshots therefore won't reduce costs much.
- Storage (gp2 vs Provisioned IOPS or st1):
  - Elasticity: a volume can be expanded as and when necessary with an AWS API call.
  - Logrotate and send logs to S3.
  - For Kafka-like workloads, a Throughput Optimized HDD (st1) volume makes more sense.
- Choose filesystems appropriately from day 0. Transitioning is expensive and sometimes impossible.
- Restoring a snapshot to EBS takes significant time.
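For the point above about expanding storage with an AWS call, here is a small sketch. The 80% threshold and 25% growth step are arbitrary assumptions, and both function names are hypothetical; only the `modify_volume` call is the real EC2 API.

```python
import math


def next_volume_size(current_gib, used_fraction, threshold=0.8, growth=1.25):
    """Return the new size in GiB, or None if no resize is needed."""
    if used_fraction < threshold:
        return None
    # Always grow by at least 1 GiB, even for very small volumes.
    return max(current_gib + 1, math.ceil(current_gib * growth))


def grow_volume(volume_id, new_size_gib):
    """Request the resize. Assumes boto3 and configured credentials."""
    import boto3  # imported here so next_volume_size works without boto3

    boto3.client("ec2").modify_volume(VolumeId=volume_id, Size=new_size_gib)
```

The filesystem still has to be extended inside the instance after the volume grows, and EBS volumes cannot be shrunk, so growing in small steps is the cheaper mistake to make.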
EC2
- Shutdown periods
  - You don't need all the services running all the time. Staging can sleep when you sleep.
- Scheduled instances
  - Use them for scheduled jobs that take a few hours.
- Reserved instances (take convertible)
  - Buy no-upfront convertible reserved instances for 1 year. If you're unsure whether you'll be using AWS for that long, buy them anyway and find someone who can use the account. You're in a co-working space, after all.
- Idle instances
  - Identify idle instances and validate whether they can be shut down. How do you decide what is idle? Check CPU and IOPS.
- Right sizing
  - What's the right size for an EC2 instance?
- Elasticity
  - Adopting autoscaling, and how to achieve it quickly.
  - Deployment strategies to make sure your applications are elastic.
- Spot
  - Decide between engineering cost and returns.
  - How to run Spot efficiently and reliably.
  - Running stateful/database applications on Spot.
  - Fleet - works almost like an Auto Scaling group.
  - Spot block - your Spot instance won't be terminated within this period.
- Enable termination protection on all the instances.
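For the idle-instance check above, one possible approach is to look at CloudWatch CPU data over a couple of weeks. This is a sketch under assumptions: the 5% threshold and 14-day window are arbitrary, `is_idle` and `cpu_datapoints` are illustrative names, and boto3 with configured credentials is required for the fetch.

```python
from datetime import datetime, timedelta, timezone


def is_idle(datapoints, max_cpu_percent=5.0):
    """True if the Maximum CPU of every datapoint stays under the threshold.

    An empty list returns False: missing data is not evidence of idleness.
    """
    if not datapoints:
        return False
    return all(dp["Maximum"] < max_cpu_percent for dp in datapoints)


def cpu_datapoints(instance_id, days=14):
    """Fetch hourly max CPU utilization. Assumes boto3 and credentials."""
    import boto3  # imported here so is_idle works without boto3

    cw = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    resp = cw.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(days=days),
        EndTime=end,
        Period=3600,  # one datapoint per hour
        Statistics=["Maximum"],
    )
    return resp["Datapoints"]
```

The same shape of check could be applied to disk metrics to cover the IOPS side. Treat a positive result as a prompt to ask the owner, not as permission to shut the instance down.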
Lambda
- Background jobs
  - Data crunching over S3, Redshift, etc.
  - AWS-related tasks like maintaining tags, sending cost reports every week, etc.
- Ops jobs which need to scale
  - APIs which don't need to rely on databases heavily.
- Ad-hoc tasks
  - Running some quick hacky script.
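As an example of the tag-maintenance task mentioned above, a Lambda function could report instances that are missing required tags. This is a sketch: the required tag keys are a hypothetical policy, the helper names are illustrative, and the function's role is assumed to allow `ec2:DescribeInstances`.

```python
REQUIRED_TAGS = {"team", "environment"}  # hypothetical tag policy


def missing_tags(instance, required=REQUIRED_TAGS):
    """Return the sorted list of required tag keys absent from an instance."""
    present = {t["Key"] for t in instance.get("Tags", [])}
    return sorted(set(required) - present)


def untagged(reservations, required=REQUIRED_TAGS):
    """Map instance ID -> missing tag keys, for instances missing any."""
    report = {}
    for reservation in reservations:
        for instance in reservation["Instances"]:
            missing = missing_tags(instance, required)
            if missing:
                report[instance["InstanceId"]] = missing
    return report


def lambda_handler(event, context):
    """Lambda entry point: scan all instances and return the report."""
    import boto3  # available in the Lambda Python runtime

    ec2 = boto3.client("ec2")
    reservations = []
    for page in ec2.get_paginator("describe_instances").paginate():
        reservations.extend(page["Reservations"])
    return untagged(reservations)
```

Run on a schedule, the returned report could feed the weekly cost email or a chat notification.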
Untouched topics
- Kubernetes/ECS/Nomad like schedulers:
  - The engineering cost involved in reducing cloud costs shouldn't surprise you.