Hibernate+Spot fleet on AWS
AWS keeps working on ways to make computing less expensive, and the recent addition of a hibernate option for spot instances is an incredible feature.
In short, we can now run stateful applications on spot, as long as some downtime is acceptable. Here's a quick walkthrough of how to use it:
Prepare your AMI:
- Add the hibernation agent. To install the agent in the base AMI:
sudo yum update; sudo yum install hibagent;
- If you're using a recent Amazon Linux AMI, chances are you already have it installed.
- In user-data, just add this:
#!/bin/bash
/usr/bin/enable-ec2-spot-hibernation
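In a spot fleet JSON config, the user-data script goes into the UserData field as base64 text rather than as a plain script. A minimal Python sketch of that encoding step follows; the function name `encode_user_data` is my own, and the script is encoded without a trailing newline.

```python
import base64

# The user-data script from this post, without a trailing newline.
USER_DATA = "#!/bin/bash\n/usr/bin/enable-ec2-spot-hibernation"


def encode_user_data(script: str) -> str:
    """Return the base64 text form used in the UserData field."""
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    print(encode_user_data(USER_DATA))
```

The printed string can be pasted as the UserData value of a launch specification.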
Create a spot fleet request on AWS:
- Create a config.json
You can download this JSON config while creating a request from the EC2 console:
Log in to the console > create spot fleet request > change parameters as you desire > click on Download JSON config
Example JSON config format:
{
  "IamFleetRole": "arn:aws:iam::<account-id>:role/aws-ec2-spot-fleet-tagging-role",
  "AllocationStrategy": "lowestPrice",
  "TargetCapacity": 1,
  "SpotPrice": "0.199",
  "ValidFrom": "2018-03-29T11:40:06Z",
  "ValidUntil": "2019-03-29T11:40:06Z",
  "TerminateInstancesWithExpiration": true,
  "LaunchSpecifications": [
    {
      "ImageId": "ami-97785bed",
      "InstanceType": "c4.xlarge",
      "SubnetId": "subnet-id",
      "KeyName": "private-key-you-use",
      "SpotPrice": "0.199",
      "BlockDeviceMappings": [
        {
          "DeviceName": "/dev/xvda",
          "Ebs": {
            "DeleteOnTermination": true,
            "VolumeType": "gp2",
            "VolumeSize": 100,
            "SnapshotId": "snap-0fae6f7252388fc12"
          }
        }
      ],
      "SecurityGroups": [
        {
          "GroupId": "<security-group-id>"
        }
      ],
      "UserData": "IyEvYmluL2Jhc2gKL3Vzci9iaW4vZW5hYmxlLWVjMi1zcG90LWhpYmVybmF0aW9u"
    }
  ],
  "Type": "maintain",
  "InstanceInterruptionBehavior": "hibernate"
}
- Create the spot fleet request with the above configuration:
aws ec2 request-spot-fleet --spot-fleet-request-config file://config.json
And you're good: your instance is now running with hibernation enabled.
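A typo in config.json is easy to miss, so a small pre-flight check before submitting the request can help. The sketch below is not an AWS tool; `check_fleet_config` and the rules it applies are my own assumptions, taken from the example config in this post: the interruption behaviour is "hibernate", the request type is "maintain", and the user-data calls enable-ec2-spot-hibernation.

```python
import base64


def check_fleet_config(config: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if config.get("InstanceInterruptionBehavior") != "hibernate":
        problems.append("InstanceInterruptionBehavior is not 'hibernate'")
    if config.get("Type") != "maintain":
        problems.append("Type is not 'maintain'")
    for index, spec in enumerate(config.get("LaunchSpecifications", [])):
        try:
            raw = base64.b64decode(spec.get("UserData", ""), validate=True)
            script = raw.decode("utf-8")
        except (ValueError, UnicodeDecodeError):
            script = ""  # not valid base64 text, treat as missing
        if "enable-ec2-spot-hibernation" not in script:
            problems.append(
                f"LaunchSpecifications[{index}]: user-data does not call "
                "enable-ec2-spot-hibernation"
            )
    return problems


# Only the fields the check looks at, copied from the example config.
EXAMPLE = {
    "Type": "maintain",
    "InstanceInterruptionBehavior": "hibernate",
    "LaunchSpecifications": [
        {"UserData": "IyEvYmluL2Jhc2gKL3Vzci9iaW4vZW5hYmxlLWVjMi1zcG90LWhpYmVybmF0aW9u"}
    ],
}

if __name__ == "__main__":
    for problem in check_fleet_config(EXAMPLE):
        print("WARNING:", problem)
```

To check a real file, load it with `json.load` and pass the resulting dict to `check_fleet_config`.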
How to test and simulate this behaviour?
Here's how I did it.
I wrote a simple C program that allocates 4 GB of RAM and then loops forever:
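#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    const unsigned long gb = 4;  /* amount of RAM to allocate, in GB */
    const long pagesize = sysconf(_SC_PAGE_SIZE);
    const unsigned long pages = (gb << 30) / (unsigned long)pagesize;
    unsigned long i;

    /* Allocate one page at a time and touch it so the kernel really backs it. */
    for (i = 0; i < pages; ++i) {
        void *memory = malloc((size_t)pagesize);
        if (!memory)
            break;
        memset(memory, 0, 1);
    }
    printf("allocated memory: %lu MB\n", (i * (unsigned long)pagesize) >> 20);
    fflush(stdout);  /* output is buffered when redirected, e.g. by nohup */

    /* Keep the process (and its memory) alive forever without burning CPU. */
    while (1)
        sleep(60);
}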
Compile the code and run it in the background as any user (ec2-user by default). I've used nohup:
nohup ./allocate-ram.out &
To simulate hibernation:
- Write a simple API in your favourite language which, when hit, gives the following response:
{"action": "hibernate", "time": "2018-03-29T06:59:53Z"}
- Once your instance is launched, log in to it and change monitored-url in /etc/hibagent-config.cfg to your API endpoint.
- You can wait for hibagent to poll your API. If you're impatient like me, run the command
sudo hibagent -f
You can then see what the agent is doing. It gives output similar to:
There's sufficient swap available (have 8372736000, need 8372740096)
Updating the kernel offset for the swapfile: /swap
Updating GRUB to use the device /dev/xvda1 with offset 243712 for resume
GRUB configuration is updated
Setting swap device to 51713 with offset 243712
Done updating the swap offset
Initial checks are finished, will run in foreground now
Locking all the code in memory
Starting the hibernation polling loop
Attempting to hibernate
- You'll now see your instance go into hibernation; on the AWS console, it shows up as stopped.
- Change your API to give a response without the string "hibernate" in it. This is important: if you don't, your instance hibernates again immediately after starting and gets stopped once more, and your RAM data will be lost as the memory is overwritten.
- Attempt to start it via the AWS console or API. The start fails with this error:
The instance 'i-xx' is a Spot instance. Spot instances can only be stopped and started by the Spot service.
Give it two minutes and try again; this time the instance starts successfully. Log in to the instance (please note that the public IP will have changed) and run
ps aux | grep allocate-ram.out
to check that your process is still running and still occupies 4 GB of memory.
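The test API from the first step can be sketched with nothing but the Python standard library. This is only one possible version: the port 8080, the `HIBERNATE` switch and the "none" action returned when not hibernating are my own assumptions, not anything defined by hibagent.

```python
import json
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer

# Set to False before starting the instance again, so that the response
# no longer contains the string "hibernate".
HIBERNATE = True


def build_response(hibernate: bool) -> bytes:
    """Build the JSON body; the non-hibernate action "none" is an assumption."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    action = "hibernate" if hibernate else "none"
    return json.dumps({"action": action, "time": now}).encode("utf-8")


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Every GET, whatever the path, returns the same JSON document.
        body = build_response(HIBERNATE)
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Serve until interrupted; the port number is an arbitrary choice."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Call `serve()` on a machine the instance can reach and point monitored-url at it. Flip `HIBERNATE` to False and restart the server before you start the instance again.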
Things to note:
- Don't start or restart any other processes in user-data. If you do, you may not be able to recover the in-memory data.
- Make sure the root EBS volume is large enough. The hibernation agent dumps your RAM contents to /swap, and it checks that there's sufficient space before doing so.
- Use encrypted EBS. Programs often keep sensitive data in RAM, so take extra care to use an encrypted EBS volume if you're using hibernate.
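To check the root volume size while baking the AMI, a rough Python sketch like the one below can compare free disk space with total RAM. The rule it applies (free space on / at least equal to RAM) is my simplification, not the agent's exact calculation, and it only makes sense before /swap has been created.

```python
import shutil


def ram_bytes(meminfo_path: str = "/proc/meminfo") -> int:
    """Read total RAM from /proc/meminfo (Linux only); the file reports kB."""
    with open(meminfo_path) as handle:
        for line in handle:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) * 1024
    raise RuntimeError("MemTotal not found in " + meminfo_path)


def has_room_for_ram_dump(free_bytes: int, total_ram_bytes: int) -> bool:
    """Assumed rule: free space must be at least the size of RAM."""
    return free_bytes >= total_ram_bytes


def check_root_volume() -> bool:
    """Compare free space on / with the RAM size of this machine."""
    return has_room_for_ram_dump(shutil.disk_usage("/").free, ram_bytes())
```

Calling `check_root_volume()` on the instance returns False when the root volume is too small under that assumed rule.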
Let me know how you save money on AWS!