Engineering👨💻 Don't wait for customer to report errors: A sales pitch of Sentry. In a typical setup, we need a reliable way to track down errors. Come to think of it, we write the code and we generate the errors, so we should know about them even before a user reports them. While tracking logical errors might be a technical challenge, it's not difficult to track
Engineering👨💻 Mongodb with Chef and Terraform I want to launch a MongoDB sharded cluster with Chef and Terraform on AWS. Show me the steps! Let's name our mongos cluster "Roads". Assumptions: There's a private VPC in AWS with subnets in multiple AZs. There's a bastion host launched in the same VPC for SSH connectivity, as mentioned here: GHOST_
Engineering👨💻 Upgrade mongodb from 3.2 to 3.4 Here's a quick guide on how to upgrade the MongoDB version on a sharded cluster hosted on AWS, running Amazon Linux. This was done manually on a cluster with ~30 TB of data, with minimal downtime (less than a minute). MongoDB components: * mongos: Server responsible for routing requests, balancing chunks in shards and
Engineering👨💻 RDS with terraform Meant for self-reference. This is the gist of launching an RDS MySQL instance with Terraform. It uses the module from here: https://github.com/terraform-aws-modules/terraform-aws-rds resource "aws_security_group" "sample-mysql" { name = "sample-mysql" description = "Allows services to talk to sample mysql" vpc_id = "vpc-xxxx" ingress { from_port = 3306 to_port
Engineering👨💻 Setup VPC with terraform on AWS A down-and-dirty post on setting up a VPC with Terraform. This sets up a new VPC in ap-south-1 with three public and three private subnets. It also takes care of NAT gateways, internet gateways, etc. What the script does/creates: * There is 1 VPC in ap-south-1 * There are 3
Engineering👨💻 Quick EFK on EKS Assumption: You have a Kubernetes cluster running on EKS, and kubeconfig, kubectl, and AWS authentication are all set up. What's EFK? Elasticsearch: A search database with support for REST API queries. Fluentd: Data collector typically used for collecting logs in a unified manner. Kibana: Visualisation tool for Elasticsearch data. Data flow: pods
Engineering👨💻 Bring up Zookeeper+Kafka cluster This is for self-reference. You may find it useful too. The objective is to deploy a Kafka + ZooKeeper cluster the quick and dirty way. This is going to be on AWS. It assumes you can launch an EC2 instance with the latest Amazon Linux AMI. Quick intro to Kafka and ZooKeeper: * Kafka is an
Engineering👨💻 Service Completeness on kubernetes Note: This isn't a verbose blog post covering all aspects. This is more of an "I want to confirm what I already know" kind. When migrating a microservice to Kubernetes, it's essential to check whether the application is completely migrated. How do we mark it as complete? Usually, application life stages span
Engineering👨💻 Choosing an engineering tool We have quite a few choices to make in infrastructure, regarding architecture design and so on. I'm here to deal with the cost factor involved in the choice of tools we adopt or build. At the end of the day, dollars matter more than anything else. When we talk of cost, the most important thing is to realise
Engineering👨💻 I deleted /usr/share accidentally A few days ago, Kafka reported a balancing error, and newly produced messages started failing on a few partitions. As usual, I logged into all the Kafka brokers to see what was happening. The underlying cause was that the data volume on one of the Kafka brokers was at 100% disk usage.
Engineering👨💻 Context switch for engineers An engineer's context switching usually comes out on top when we talk about productivity. The problem is that, unlike a CPU, humans can't context switch by simply putting things in RAM, etc. That's a limitation. Solutions usually work towards reducing context switches. Building enough layers of communication so that the engineer gets
Engineering👨💻 Autoscaling, I'm working. don't kill me! Problem: When an autoscaling group in AWS scales in (reduces capacity), some of your instances get terminated. AWS usually terminates the oldest first. The problem is that sometimes we might not want an EC2 instance terminated because it is running a job. While we've handled our application architecture to
Engineering👨💻 Configuration Drift — Tech or Culture? Opinion on how to handle configuration drift and identifying the problems.
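Engineering👨💻 nginx rewrite to https This block redirects all HTTP requests to HTTPS. It checks the X-Forwarded-Proto header, so it applies when nginx sits behind a load balancer or proxy that sets that header:
```
server {
    listen 80;
    # Redirect unless the original request already used HTTPS
    if ($http_x_forwarded_proto != "https") {
        # 301 redirect to the same host and URI over HTTPS
        rewrite ^(.*) https://$host$1 permanent;
    }
}
```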
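The decision made by the nginx block above can be sketched as plain logic. This is only an illustrative Python model, not nginx code; the function name `https_redirect` and its parameters are hypothetical, and it ignores query strings, which nginx re-appends on its own.

```python
from typing import Optional


def https_redirect(forwarded_proto: Optional[str], host: str, uri: str) -> Optional[str]:
    """Model of the rewrite rule: return the redirect target, or None.

    forwarded_proto stands in for $http_x_forwarded_proto, host for $host,
    and uri for the path captured by ^(.*). All names are hypothetical.
    """
    # A missing header also fails the "https" check, so it redirects too.
    if forwarded_proto != "https":
        return f"https://{host}{uri}"
    # The request already arrived over HTTPS at the proxy: serve it as-is.
    return None


if __name__ == "__main__":
    print(https_redirect("http", "example.com", "/login"))
    print(https_redirect("https", "example.com", "/login"))
```

A request that the proxy received over plain HTTP gets a redirect target, while one that was already HTTPS returns `None`, which is what prevents a redirect loop behind a TLS-terminating load balancer.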
Engineering👨💻 Mongos balancer process Problem Statement: Whenever the mongos balancer is running, the cluster gets insanely slow. Latencies return to normal immediately once the balancer is stopped. Mongos balancing process for reference: 1. The balancer takes a lock. 2. It identifies a chunk to migrate. This is based on several criteria. 3. It sends a command to the "Source" shard. 4.
Engineering👨💻 Mongo Operations This assumes you're using either AWS or Google Cloud. Also, this is deliberately not elaborate. Installation on CentOS/Amazon Linux: * Create a new repo file: sudo vi /etc/yum.repos.d/mongodb-org.repo * Drop the following contents into the file: [mongodb-org-3.2] name=MongoDB Repository baseurl=https://repo.mongodb.
Engineering👨💻 Kafka rebalancing topics between brokers To retire an old set of Kafka brokers and migrate to a newer set: 1. Get the existing list of topics. 2. Make a file topics-to-move.json with contents in the format: {"topics": [{"topic": "sampleTopic_A"}],"version":1} 3. On any of the Kafka brokers, run the following command: ./$PATH_
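With many topics, the topics-to-move.json contents from step 2 are tedious to write by hand. A minimal Python sketch that builds the same structure from a list of topic names follows; the helper name `topics_to_move` and the second topic `sampleTopic_B` are hypothetical, added only for illustration.

```python
import json


def topics_to_move(topic_names):
    """Build the topics-to-move.json structure from a list of topic names."""
    return {
        "topics": [{"topic": name} for name in topic_names],
        "version": 1,
    }


if __name__ == "__main__":
    # sampleTopic_B is a made-up second topic to show the list form.
    document = topics_to_move(["sampleTopic_A", "sampleTopic_B"])
    # Print the JSON; redirect it into topics-to-move.json when running this.
    print(json.dumps(document))
```

The topic list itself would come from step 1; here it is hard-coded to keep the sketch self-contained.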
Engineering👨💻 Kafka ops: Migrating from ec2 classic to VPC A few strategies for migrating Kafka to a new cluster, from EC2-Classic to VPC: Strategy 1: Launch a new cluster (ZooKeeper + Kafka brokers), point producers to the new cluster, and drain the older cluster once consumers are also pointed to the new cluster. Pros: * The existing cluster isn't impacted, and hence helps in easy
Engineering👨💻 Most interesting outage postmortems > How a few Hosted Graphite services that were not hosted on AWS got affected by an AWS outage: https://blog.hostedgraphite.com/2018/03/01/spooky-action-at-a-distance-how-an-aws-outage-ate-our-load-balancer/ > How Code Spaces had to shut down its business after a loss of data: http://www.itwriting.com/blog/8498-resilience-is-not-backup-how-codespaces-com-lost-its-data-and-its-business.html > Travis CI truncated its production database through human error: https://blog.travis-ci.
Engineering👨💻 mongodb pause ttl The following command can be used to pause the MongoDB TTL monitor for 86400 seconds (1 day): db.adminCommand({setParameter:1, ttlMonitorSleepSecs: 86400}) The following disables TTL; set ttlMonitorEnabled back to true to enable it later on: db.adminCommand({setParameter:1, ttlMonitorEnabled:false});
Engineering👨💻 Quick AWS commands Uploading a custom SSL certificate to AWS CloudFront: aws iam upload-server-certificate --server-certificate-name --certificate-chain --private-key --certificate-body --path /cloudfront/
Engineering👨💻 Few Logrotate Configs Logrotate is a Linux utility that rotates files when certain conditions are met. If it's not installed, you can install it via apt-get install logrotate or yum install logrotate. The usual tricky part of logrotate is reloading the process when rotation is complete. Here I'm storing a few logrotate configurations which are
Engineering👨💻 Rolling update on AWS Autoscaling AWS autoscaling groups let you manage a set of instances and scale them in or out, based on various parameters. Deployment strategy: Launch new instances with the latest code version and retire the old ones once the newer ones are working. Deployment to an AWS Autoscaling group is seamless if we're using CodeDeploy.