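For various reasons, a MongoDB secondary can go out of sync. It may die, get stuck in the RECOVERING state, or fall behind the primary by more than the acceptable replication lag. The steps below rebuild such a secondary from a snapshot of the primary’s data volume.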

Instructions

  1. Make sure the primary is up and running. Log in to the primary, for example calendar-mongo-rs-1-1.
  2. Determine the acceptable replication lag. Open the mongo shell on the primary, run rs.printReplicationInfo(), and note the configured oplog size and the "log length start to end" value. The latter is the oplog window in hours: the maximum lag a secondary can have and still catch up.
  3. Log in to the secondary, stop the mongod process, and unmount the existing data volume: sudo service mongod stop && sudo umount /dev/xvd* (where /dev/xvd* is the data volume’s device). Detach this volume and delete it to avoid confusion.
  4. Check whether there is a snapshot of the volume attached to calendar-mongo-rs-1-1. If there is, check how many hours old it is. In most cases, for best results, make sure
       current time - snapshot start time ≤ (oplog window in hours) / 2
    If no snapshot is available, create one now.
    If the oplog window is too short to satisfy this condition, consider increasing the oplog size. We don’t use a standard size, but most of our oplogs are between 100 and 200 GB. Increasing the oplog size while the secondary is down requires downtime.
  5. Once the snapshot is ready, create a volume from it in the same Availability Zone as the secondary (calendar-mongo-rs-1-2) and attach the volume to the secondary instance.
  6. Mount the volume with sudo mount /dev/xvd* /vol, then fix ownership and remove the stale lock file copied over from the primary: sudo chown -R mongod:mongod /vol/ && sudo rm /vol/mongod/mongod.lock
  7. Run sudo service mongod restart, then exit the shell with Ctrl+C. Wait a few hours and check back. The mongo shell won’t be usable immediately, because Linux first has to read every block of the newly created volume. You could pre-read the volume with dd or fio and start mongod only once the disk is ready, but this gains little, since mongod has to populate its in-memory indexes while it reads the blocks anyway.
  8. After a few hours, the secondary should rejoin the replica set with a replication lag smaller than the oplog window. If the lag exceeds the oplog window, the operation has failed and the secondary goes into the RECOVERING state. Otherwise, the secondary catches up and recovers.
  9. If the operation seems to fail even though the oplog window is large enough, the primary is usually receiving writes faster than the secondary can sync them. Stop writes if possible. Otherwise, temporarily increase the oplog to a much larger size, let the secondary sync, and then reduce it again.
    A larger oplog uses more disk and RAM, so use this only when the other options are exhausted.
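The oplog window noted in step 2 is the time span between the oldest and newest entries in the oplog. A minimal Python sketch, assuming you already have those two timestamps as timezone-aware datetimes (for example the ts fields of the first and last documents in local.oplog.rs, which is where rs.printReplicationInfo() reads them from). The function name and the example timestamps are hypothetical:

```python
from datetime import datetime, timezone


def oplog_window_hours(first_ts: datetime, last_ts: datetime) -> float:
    """Hours between the oldest and newest oplog entries.

    This corresponds to the "log length start to end" value printed by
    rs.printReplicationInfo().
    """
    if last_ts < first_ts:
        raise ValueError("newest oplog entry is older than the oldest one")
    return (last_ts - first_ts).total_seconds() / 3600.0


# Made-up example: an oplog spanning 2.5 days.
first = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
last = datetime(2024, 1, 3, 12, 0, tzinfo=timezone.utc)
print(oplog_window_hours(first, last))  # 60.0
```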
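The snapshot-age rule of thumb from step 4 can be checked mechanically. This hypothetical helper, which is not part of any tool we use, computes the snapshot’s age and compares it with half the oplog window. The halving is the runbook’s own margin. Presumably it leaves room for creating and warming up the volume and for the catch-up itself, while the primary keeps writing:

```python
from datetime import datetime, timezone
from typing import Optional


def snapshot_age_hours(snapshot_start: datetime, now: Optional[datetime] = None) -> float:
    """Hours elapsed since the snapshot was started.

    snapshot_start must be timezone-aware (UTC) when `now` is omitted.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - snapshot_start).total_seconds() / 3600.0


def snapshot_fresh_enough(snapshot_start: datetime, oplog_window_hours: float,
                          now: Optional[datetime] = None) -> bool:
    """Step 4 rule of thumb: snapshot age <= oplog window / 2."""
    return snapshot_age_hours(snapshot_start, now) <= oplog_window_hours / 2


# Made-up example: a snapshot started 24 hours ago.
start = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
now = datetime(2024, 1, 2, 0, 0, tzinfo=timezone.utc)
print(snapshot_fresh_enough(start, 60.0, now))  # True  (24 <= 30)
print(snapshot_fresh_enough(start, 40.0, now))  # False (24 > 20)
```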
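For step 8, the lag can be computed from the replica set status rather than estimated by eye. The following is a sketch that assumes a document shaped like the output of the replSetGetStatus command (for example, what pymongo’s client.admin.command("replSetGetStatus") returns). In that document, each entry in the members list has name, stateStr and, except for arbiters, optimeDate. The function names, the arbiter member and the :27017 ports in the example are hypothetical:

```python
from datetime import datetime
from typing import Any, Dict


def replication_lag_hours(status: Dict[str, Any]) -> Dict[str, float]:
    """Lag of each data-bearing non-primary member behind the primary, in hours.

    `status` is assumed to look like replSetGetStatus output: a "members" list
    whose entries have "name", "stateStr" and (except arbiters) "optimeDate".
    """
    members = status["members"]
    primary = next((m for m in members if m["stateStr"] == "PRIMARY"), None)
    if primary is None:
        raise ValueError("no PRIMARY member found in status document")
    lags = {}
    for m in members:
        # Skip the primary itself and members without an optime (e.g. arbiters).
        if m is primary or "optimeDate" not in m:
            continue
        delta = primary["optimeDate"] - m["optimeDate"]
        lags[m["name"]] = delta.total_seconds() / 3600.0
    return lags


def can_catch_up(lag_hours: float, oplog_window_hours: float) -> bool:
    """Step 8: the secondary can only recover while its lag is inside the oplog window."""
    return lag_hours < oplog_window_hours


# Made-up status document: primary, lagging secondary, and an arbiter.
status = {
    "members": [
        {"name": "calendar-mongo-rs-1-1:27017", "stateStr": "PRIMARY",
         "optimeDate": datetime(2024, 1, 2, 12, 0)},
        {"name": "calendar-mongo-rs-1-2:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2024, 1, 2, 9, 0)},
        {"name": "arbiter:27017", "stateStr": "ARBITER"},
    ]
}
lags = replication_lag_hours(status)
print(lags)  # {'calendar-mongo-rs-1-2:27017': 3.0}
print(can_catch_up(lags["calendar-mongo-rs-1-2:27017"], 60.0))  # True
```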
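The diagnosis in step 9, where writes outpace the sync, shows up as a lag that is not shrinking. Here is a rough sketch, assuming two lag measurements taken some hours apart (for instance, read from rs.status() output). hours_to_catch_up is a hypothetical name, and the linear extrapolation gives only an estimate:

```python
from typing import Optional


def hours_to_catch_up(lag_before: float, lag_after: float,
                      elapsed_hours: float) -> Optional[float]:
    """Estimate remaining catch-up time from two lag samples (all in hours).

    Returns None when the lag is not shrinking, i.e. the primary is taking
    writes at least as fast as the secondary applies them.
    """
    if elapsed_hours <= 0:
        raise ValueError("elapsed_hours must be positive")
    # Hours of lag recovered per wall-clock hour (linear extrapolation).
    shrink_rate = (lag_before - lag_after) / elapsed_hours
    if shrink_rate <= 0:
        return None
    return lag_after / shrink_rate


print(hours_to_catch_up(10.0, 8.0, 1.0))  # 4.0
print(hours_to_catch_up(5.0, 6.0, 1.0))   # None
```

A None result is the situation step 9 describes: stop writes or temporarily enlarge the oplog.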