AWS cost cuts

Components:

Microservices running in containers/instances
User facing websites/content
Background/recurring/scheduled jobs
One off ad-hoc tasks
Databases

General things:

Ensure tags are present everywhere.
No need to wait for things to be perfect.
Enable billing reports and understand how to read the bills.
Use Cost Explorer.
Think outside AWS too.
Engineering cost vs Cloud cost
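
The engineering cost vs cloud cost trade-off can be put into rough numbers before starting any optimisation. A minimal sketch follows; the function name and all figures are hypothetical and only illustrate the arithmetic.

```python
def payback_months(engineering_hours, hourly_rate, monthly_savings):
    """Months until the cloud savings repay the engineering time spent.

    All inputs are estimates you supply; nothing here comes from AWS.
    """
    if monthly_savings <= 0:
        raise ValueError("monthly_savings must be positive")
    return (engineering_hours * hourly_rate) / monthly_savings


# Hypothetical example: 40 hours of work at 50 per hour, saving 500 per month.
months = payback_months(engineering_hours=40, hourly_rate=50, monthly_savings=500)
print(f"Payback period: {months:.1f} months")
```

If the payback period is longer than you expect to keep the workload running, the optimisation is probably not worth doing.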

Immediate actionable items:

Go to Volumes. Any volume in the "Available" state is unattached; delete it.
Go to S3 and set up lifecycle policies.
Remove unused Elastic IPs.
Check that all your EBS volumes are actively in use.
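
The volume and Elastic IP checks above can be scripted. The sketch below works on plain dictionaries shaped like the `Volumes` and `Addresses` lists that the EC2 `DescribeVolumes` and `DescribeAddresses` calls return. The sample data and function names are hypothetical, and fetching the real lists (for example with an SDK) is left out.

```python
def find_unattached_volumes(volumes):
    """Return IDs of volumes whose State is 'available' (not attached)."""
    return [v["VolumeId"] for v in volumes if v.get("State") == "available"]


def find_unused_addresses(addresses):
    """Return allocation IDs of Elastic IPs that have no association."""
    return [a.get("AllocationId") for a in addresses if "AssociationId" not in a]


# Hypothetical sample data, assumed to mirror the API response shape.
sample_volumes = [
    {"VolumeId": "vol-0001", "State": "in-use"},
    {"VolumeId": "vol-0002", "State": "available"},
]
sample_addresses = [
    {"AllocationId": "eipalloc-01", "AssociationId": "eipassoc-01"},
    {"AllocationId": "eipalloc-02"},
]

print("Unattached volumes:", find_unattached_volumes(sample_volumes))
print("Unused Elastic IPs:", find_unused_addresses(sample_addresses))
```

Review the resulting list by hand before deleting anything; an unattached volume may still hold data someone needs.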

S3

Store static content in S3. Name the bucket after the domain your website will be served from.
S3 requests cost money. For bulk image uploads, use s3 sync, which skips files that are already up to date. Unless it's an end-user flow, use the S3 CLI or SDK tools as much as possible; they usually end up cheaper than hand-rolled HTTP requests.
Enable a private VPC endpoint for S3.
Logs from everywhere can safely reside in S3 at very low cost.

Lifecycles to reduce storage costs

Put a lifecycle policy on S3 to move objects to Infrequent Access, One Zone-IA, or Glacier.
You don't need 6-month-old logs at your fingertips.
You can also delete objects via lifecycle rules, which costs less than deleting them through the S3 API.
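
As a sketch of such a policy, the function below builds a lifecycle configuration as a plain dictionary. The shape is assumed to match what the S3 `PutBucketLifecycleConfiguration` API (boto3's `put_bucket_lifecycle_configuration`, `LifecycleConfiguration` parameter) accepts. The `logs/` prefix, the day counts, and the function name are hypothetical.

```python
import json


def build_log_lifecycle(prefix, ia_days=30, glacier_days=90, expire_days=365):
    """Build a lifecycle config: Standard -> IA -> Glacier -> delete."""
    if not (ia_days < glacier_days < expire_days):
        raise ValueError("expected ia_days < glacier_days < expire_days")
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
                # Expiration deletes the objects without any delete calls from you.
                "Expiration": {"Days": expire_days},
            }
        ]
    }


config = build_log_lifecycle("logs/")
print(json.dumps(config, indent=2))
```

Swap `STANDARD_IA` for `ONEZONE_IA` if losing the data in a single-zone failure is acceptable.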

CDN:

Use Cloudflare over CloudFront, because Cloudflare has a free plan.
Compression: instead of enabling gzip at the CDN, you can push pre-gzipped content to S3.
Purge cache only when it's necessary.
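
A minimal sketch of the pre-compression step, using only the standard library. The returned `ContentType` and `ContentEncoding` keys are assumed to be passed as object metadata at upload time (for example via an SDK's extra upload arguments); the upload itself is not shown, and the function name and sample stylesheet are hypothetical.

```python
import gzip


def gzip_for_upload(body: bytes, content_type: str):
    """Compress a payload and return it with the metadata S3 should store."""
    compressed = gzip.compress(body, compresslevel=9)
    metadata = {"ContentType": content_type, "ContentEncoding": "gzip"}
    return compressed, metadata


# Hypothetical stylesheet content, repeated to make compression visible.
css = b"body { margin: 0; }\n" * 200
payload, extra_args = gzip_for_upload(css, "text/css")
print(f"{len(css)} bytes -> {len(payload)} bytes, metadata: {extra_args}")
```

The object must be stored with `Content-Encoding: gzip`, otherwise browsers receive compressed bytes they will not decode.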

VPC

All private instances should communicate over private network. Don't use public IPs for internal communication.
Use different VPCs or different regions for different environments.

EBS

Snapshots:
- Only the delta is billed, so reducing frequency or deleting old snapshots won't reduce costs much.
Storage (gp2 vs Provisioned IOPS or st1):
- Elasticity: storage can be expanded as and when necessary with an AWS API call.
- Logrotate and send logs to S3.
- For Kafka-like workloads, a Throughput Optimized HDD (st1) makes more sense.

Choose filesystems appropriately from day 0. Transitioning is expensive and sometimes impossible.

Snapshot restore to EBS takes significant time.

EC2

Shutdown periods
- You don't need all the services running all the time. Staging can sleep when you sleep.
Scheduled instances
- Run scheduled jobs that take a few hours.
Reserved instances (take convertible)
- Buy no-upfront convertible reserved instances for 1 year. If you're unsure whether you'll be using AWS for that long, buy them anyway and find someone who can use the account. You're in a co-working space, after all.
Idle instances
- Identify idle instances and validate whether they can be shut down. How to decide what is idle? Check CPU and IOPS.
Right sizing
- What's the right size of an EC2 instance?
Elasticity
- Adopting autoscaling, and how to achieve it quickly.
- Deployment strategies to make sure your applications are elastic.
Spot
- Decide between Engineering Cost vs Returns
- How to run Spot efficiently and reliably.
- Running stateful/Database applications on Spot
- Fleet: works almost like an Auto Scaling group.
- Spot block: your Spot instance won't be terminated within this period.
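
The idle-instance check in the list above (CPU and IOPS) can be sketched as a simple rule over metric samples. The thresholds, the sample numbers, and the function name are hypothetical; the samples are assumed to come from your monitoring system, for example CPU utilisation and disk operations over the last two weeks.

```python
def is_idle(cpu_percent_samples, iops_samples, cpu_threshold=5.0, iops_threshold=10.0):
    """Treat an instance as idle only if every sample is below both thresholds."""
    if not cpu_percent_samples or not iops_samples:
        return False  # no data: do not guess, leave the instance alone
    return (max(cpu_percent_samples) < cpu_threshold
            and max(iops_samples) < iops_threshold)


# Hypothetical per-instance samples: (CPU %, IOPS).
instances = {
    "i-batch-old": ([1.2, 0.8, 2.1], [0.0, 3.0, 1.0]),
    "i-api-prod": ([35.0, 60.2, 41.7], [120.0, 340.0, 95.0]),
    "i-nightly": ([0.5, 0.4, 88.0], [1.0, 2.0, 400.0]),
}

idle = [name for name, (cpu, iops) in instances.items() if is_idle(cpu, iops)]
print("Candidates for shutdown:", idle)
```

Using the maximum rather than the average keeps instances with short bursts, such as a nightly job, off the shutdown list.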

Enable termination protection on all the instances.

Lambda

Background jobs
- Data crunching over s3, redshift, etc.
- AWS related tasks like maintaining tags, sending cost reports every week, etc.
Ops jobs which need to scale
- APIs which don't need to rely on databases heavily.
Ad-hoc tasks
- Running some quick hacky script.

Untouched topics

Kubernetes/ECS/Nomad-like schedulers:
- The engineering cost involved in reducing cloud costs shouldn't surprise you.
