Components:

  • Microservices running in containers/instances
  • User facing websites/content
  • Background/recurring/scheduled jobs
  • One-off ad-hoc tasks
  • Databases

General things:

  • Ensure tags are present everywhere.
  • No need to wait for things to be perfect.
  • Enable billing reports and understand how to read the bills.
  • Cost Explorer
  • Thinking outside AWS.
  • Engineering cost vs. cloud cost
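
Cost Explorer can break spend down by cost allocation tag, which is where consistent tagging pays off. As a minimal sketch: assume a tag key "team" (hypothetical) has been activated as a cost allocation tag, and you call boto3's get_cost_and_usage with Metrics=["UnblendedCost"] and GroupBy=[{"Type": "TAG", "Key": "team"}]. The helper below sums cost per tag value. It assumes group keys come back as "team$value", with an empty value for untagged resources; cost_by_tag is a hypothetical name.

```python
from collections import defaultdict


def cost_by_tag(response: dict, tag_key: str) -> dict[str, float]:
    """Sum UnblendedCost per tag value from a get_cost_and_usage response
    that was grouped by a TAG dimension."""
    totals: dict[str, float] = defaultdict(float)
    prefix = f"{tag_key}$"
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            key = group["Keys"][0]
            # Assumed key format: "team$backend"; "team$" means untagged.
            value = key[len(prefix):] if key.startswith(prefix) else key
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[value or "(untagged)"] += amount
    return dict(totals)
```

A large "(untagged)" bucket is a direct measure of how much spend the tagging policy is missing.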

Immediate actionable items:

  • Go to Volumes. Any volume in the "Available" state is not attached to an instance; delete it.
  • Go to S3 and set up lifecycle policies.
  • Remove unused Elastic IPs.
  • Check that all your EBS volumes are actively in use.
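
The volume and Elastic IP checks can be scripted. A minimal sketch, assuming you pass in the lists returned by boto3's ec2.describe_volumes()["Volumes"] and ec2.describe_addresses()["Addresses"]; find_waste is a hypothetical helper name. It only reports candidates, so review the list before deleting anything.

```python
def find_waste(volumes: list[dict], addresses: list[dict]) -> dict[str, list[str]]:
    """Return unattached EBS volume IDs and unassociated Elastic IPs.

    volumes:   shape of describe_volumes()["Volumes"]
    addresses: shape of describe_addresses()["Addresses"]
    """
    # "available" means the volume exists but is not attached to any instance.
    unattached = [v["VolumeId"] for v in volumes if v.get("State") == "available"]
    # An address without an AssociationId is allocated but attached to nothing.
    idle_eips = [
        a.get("AllocationId", a.get("PublicIp"))
        for a in addresses
        if "AssociationId" not in a
    ]
    return {"volumes": unattached, "elastic_ips": idle_eips}
```

Once reviewed, ec2.delete_volume(VolumeId=...) and ec2.release_address(AllocationId=...) remove them.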

S3

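  • Store static content in S3. Name the bucket after your website's domain (for example, www.example.com); S3 static website hosting requires this when you point your domain at the bucket's website endpoint.
  • S3 requests cost money, and they are billed the same whether they come from a browser, the CLI, or an SDK. For bulk uploads such as images, use aws s3 sync: it uploads only new or changed files, so you don't pay for redundant PUTs.
  • Enable an S3 VPC gateway endpoint so instances in private subnets reach S3 without routing traffic through a NAT gateway, which charges per GB processed.
  • Logs from everywhere can safely live in S3 at very low cost.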
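
aws s3 sync saves requests because it skips files that are already in the bucket. The sketch below approximates the upload rule the AWS CLI documents for sync (the object is missing, its size differs, or the local file is newer). files_to_upload is a hypothetical helper, and remote is assumed to be built from list_objects_v2 "Contents" entries (Key, Size, LastModified).

```python
import os
from datetime import datetime, timezone


def files_to_upload(local_dir: str, remote: dict[str, tuple[int, datetime]]) -> list[str]:
    """Return S3 keys (relative paths under local_dir) that need uploading.

    remote maps key -> (size_in_bytes, last_modified as an aware datetime).
    """
    pending = []
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            path = os.path.join(root, name)
            key = os.path.relpath(path, local_dir).replace(os.sep, "/")
            stat = os.stat(path)
            local_mtime = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
            if key not in remote:
                pending.append(key)  # never uploaded
                continue
            size, last_modified = remote[key]
            # Re-upload only when the content plausibly changed.
            if stat.st_size != size or local_mtime > last_modified:
                pending.append(key)
    return sorted(pending)
```

Every key this skips is one PUT request you don't pay for.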

Lifecycles to reduce storage costs

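  • Put lifecycle policies on S3 to move objects to cheaper storage classes:

    • S3 Standard-IA (Infrequent Access)
    • S3 One Zone-IA
    • Glacier
  • You don't need 6-month-old logs at your fingertips.
  • Expire objects with a lifecycle rule instead of deleting them from scripts: lifecycle expiration is free and needs no LIST requests or code to maintain.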
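
A lifecycle rule for a log prefix might look like the following. It is a minimal sketch: the prefix, the day counts, and the helper name log_lifecycle_config are assumptions to tune for your retention needs. The returned dict has the shape boto3's s3.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...) expects.

```python
def log_lifecycle_config(prefix: str, ia_days: int = 30, glacier_days: int = 90,
                         expire_days: int = 365) -> dict:
    """Build a LifecycleConfiguration that tiers objects down, then expires them."""
    # S3 requires objects to be at least 30 days old before moving to Standard-IA.
    if ia_days < 30:
        raise ValueError("Standard-IA transitions need ia_days >= 30")
    # Sanity check for this sketch: storage should get colder over time.
    if not ia_days < glacier_days < expire_days:
        raise ValueError("expected ia_days < glacier_days < expire_days")
    return {
        "Rules": [
            {
                "ID": f"tier-and-expire-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_days},
            }
        ]
    }
```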

CDN:

  • Consider Cloudflare over CloudFront, since Cloudflare offers a free plan.
  • Compression: instead of enabling gzip at the CDN, you can upload pre-gzipped content to S3 with the Content-Encoding: gzip header set.
  • Purge cache only when it's necessary.
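
Pre-compressing before upload means the CDN just serves bytes that are already small. A minimal sketch, assuming uploads go through boto3's s3.put_object (which accepts ContentEncoding and ContentType); gzipped_upload_args is a hypothetical helper.

```python
import gzip
import mimetypes


def gzipped_upload_args(filename: str, data: bytes) -> dict:
    """Return put_object keyword arguments for a pre-compressed object.

    The object is stored compressed; browsers decompress it transparently
    because Content-Encoding is set to gzip.
    """
    content_type, _ = mimetypes.guess_type(filename)
    return {
        "Key": filename,
        "Body": gzip.compress(data),
        "ContentEncoding": "gzip",
        "ContentType": content_type or "application/octet-stream",
    }
```

For example, s3.put_object(Bucket="my-site-bucket", **gzipped_upload_args("index.html", html_bytes)), where the bucket name is a placeholder. The trade-off is that every client receives gzip, which is fine for browsers.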

VPC

  • All private instances should communicate over the private network. Don't use public IPs for internal communication: traffic sent to a public IP is billed as data transfer even within the same region.
  • Use different VPCs or different regions for different environments.
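
A quick audit for the first rule: given a hypothetical mapping of internal service names to the addresses your configs point at, flag anything that isn't a private address. This sketch uses Python's ipaddress module; hostnames are flagged for manual review rather than resolved.

```python
import ipaddress


def public_endpoints(endpoints: dict[str, str]) -> list[str]:
    """Return service names whose configured address is not a private IP."""
    flagged = []
    for service, host in endpoints.items():
        try:
            addr = ipaddress.ip_address(host)
        except ValueError:
            # Not an IP literal; resolving DNS is out of scope for this sketch.
            flagged.append(service)
            continue
        if not addr.is_private:
            flagged.append(service)
    return flagged
```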

EBS

  • Snapshots:

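    • Snapshots are incremental: each one stores only the blocks changed since the previous snapshot. Deleting an old snapshot frees only data that no later snapshot references, so reducing frequency or deleting old snapshots saves less than you might expect.
  • Storage (gp2 vs. Provisioned IOPS io1/io2 vs. st1):

    • Elasticity: with Elastic Volumes, a volume can be grown with an API call when needed (you still have to extend the filesystem), so you don't need to over-provision up front.
    • Rotate logs with logrotate and ship them to S3.
    • For Kafka-like workloads with large sequential reads and writes, Throughput Optimized HDD (st1) makes more sense.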

Choose filesystems appropriately from day 0. Transitioning is expensive and sometimes impossible.

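Restoring an EBS volume from a snapshot takes significant time: blocks are loaded from S3 lazily, so the volume is slow on first access until it is fully initialized.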
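
Because snapshots are incremental, a retention policy that keeps the latest few per volume and drops old ones is usually enough (Amazon Data Lifecycle Manager can also automate this). A minimal sketch, assuming the input is describe_snapshots(OwnerIds=["self"])["Snapshots"]; snapshots_to_delete and its keep_latest and max_age_days defaults are assumptions. Snapshots backing an AMI cannot be deleted until the AMI is deregistered.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def snapshots_to_delete(snapshots: list[dict], keep_latest: int = 3,
                        max_age_days: int = 30,
                        now: Optional[datetime] = None) -> list[str]:
    """Pick snapshot IDs older than max_age_days, always keeping the newest
    keep_latest snapshots of each volume."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    by_volume: dict[str, list[dict]] = {}
    for snap in snapshots:
        by_volume.setdefault(snap["VolumeId"], []).append(snap)
    doomed = []
    for snaps in by_volume.values():
        # Newest first, so the slice skips the ones we always keep.
        snaps.sort(key=lambda s: s["StartTime"], reverse=True)
        for snap in snaps[keep_latest:]:
            if snap["StartTime"] < cutoff:
                doomed.append(snap["SnapshotId"])
    return sorted(doomed)
```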

EC2

  • Shutdown periods

    • You don't need all the services running all the time. Staging can sleep when you sleep.
  • Scheduled instances

    • Use them for recurring jobs that take a few hours.
  • Reserved instances (take convertible)

    • Buy 1-year, no-upfront Convertible Reserved Instances. If you're unsure whether you'll be on AWS that long, buy them anyway and find someone who can use the account; you're in a co-working space, after all.
  • Idle instances

    • Identify idle instances and check whether they can be shut down. How do you decide an instance is idle? Look at CPU utilization and IOPS.
  • Right sizing

    • What's the right size of an EC2 instance?
  • Elasticity

    • Adopting autoscaling, and how to achieve it quickly.
    • Deployment strategies to make sure your applications are elastic.
  • Spot

    • Weigh engineering cost against returns.
    • How to run Spot efficiently and reliably.
    • Running stateful/database applications on Spot.
    • Spot Fleet: works much like an Auto Scaling group.
    • Spot blocks: Spot instances that won't be interrupted for a chosen duration of 1 to 6 hours. AWS has since retired this option.
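
Idle detection can be reduced to a threshold check over CloudWatch data. A minimal sketch, assuming you fetch datapoints with get_metric_statistics(..., Statistics=["Maximum"]) for CPUUtilization and, for example, EBSReadOps/EBSWriteOps over a week or two; the thresholds and the is_idle name are assumptions.

```python
def is_idle(cpu: list[dict], disk_ops: list[dict],
            max_cpu_pct: float = 5.0, max_ops: float = 100.0) -> bool:
    """Treat an instance as idle if every datapoint in the window stays under
    both thresholds. Each datapoint is a dict with a "Maximum" key, as in
    CloudWatch get_metric_statistics(...)["Datapoints"]."""
    if not cpu or not disk_ops:
        # Missing metrics are not evidence of idleness; leave the instance alone.
        return False
    peak_cpu = max(p["Maximum"] for p in cpu)
    peak_ops = max(p["Maximum"] for p in disk_ops)
    return peak_cpu < max_cpu_pct and peak_ops < max_ops
```

Using the peak rather than the average avoids shutting down an instance that is quiet most of the day but busy during a nightly job.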

Enable termination protection on all the instances.
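
Termination protection can be applied in bulk. A minimal sketch, assuming ec2_client is boto3.client("ec2"), whose modify_instance_attribute accepts DisableApiTermination={"Value": True}; enable_termination_protection is a hypothetical helper. Termination protection does not stop Auto Scaling from terminating instances.

```python
def enable_termination_protection(ec2_client, instance_ids: list[str]) -> int:
    """Turn on API termination protection for each instance; return the count.

    ec2_client is expected to be boto3.client("ec2"), or anything exposing
    the same modify_instance_attribute method.
    """
    for instance_id in instance_ids:
        # The attribute is set one instance at a time.
        ec2_client.modify_instance_attribute(
            InstanceId=instance_id,
            DisableApiTermination={"Value": True},
        )
    return len(instance_ids)
```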

Lambda

  • Background jobs

    • Data crunching over S3, Redshift, etc.
    • AWS related tasks like maintaining tags, sending cost reports every week, etc.
  • Ops jobs which need to scale

    • APIs which don't need to rely on databases heavily.
  • Ad-hoc tasks

    • Running a quick, hacky script.
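
A tag-maintenance job is a good first Lambda. A minimal sketch, assuming a hypothetical tag policy where every instance must carry team and env tags; the handler would run on a schedule (for example, a weekly EventBridge rule) and return the instances missing required tags. REQUIRED_TAGS and missing_tags are illustrative names.

```python
REQUIRED_TAGS = {"team", "env"}  # hypothetical tag policy


def missing_tags(reservations: list[dict], required: set = REQUIRED_TAGS) -> dict[str, list[str]]:
    """Map instance ID -> sorted list of required tag keys it lacks.

    reservations has the shape of describe_instances()["Reservations"].
    """
    report = {}
    for reservation in reservations:
        for instance in reservation.get("Instances", []):
            keys = {t["Key"] for t in instance.get("Tags", [])}
            absent = required - keys
            if absent:
                report[instance["InstanceId"]] = sorted(absent)
    return report


def lambda_handler(event, context):
    # boto3 ships with the Lambda Python runtime; importing it here keeps
    # missing_tags testable without it.
    import boto3

    ec2 = boto3.client("ec2")
    report = {}
    for page in ec2.get_paginator("describe_instances").paginate():
        report.update(missing_tags(page["Reservations"]))
    return {"untagged_instances": report}
```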

Untouched topics

  • Kubernetes/ECS/Nomad-like schedulers:

    • The engineering cost involved in reducing cloud costs shouldn't surprise you.