These notes are for my own reference, but you may find them useful too. The objective is to deploy a Kafka + ZooKeeper cluster the quick-and-dirty way.
Everything runs on AWS, and it assumes you can launch an EC2 instance from the latest Amazon Linux AMI.
A quick intro to Kafka and ZooKeeper:
- Kafka is an event streaming platform capable of handling trillions of events a day. It started out as a messaging queue and has since grown many other features.
- ZooKeeper is a distributed key-value store. It helps manage configuration and coordination in distributed systems.
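Kafka (at least the 0.9 release used here) cannot run without ZooKeeper. A ZooKeeper ensemble is usually 3 nodes, or another odd number, because leader election and fail-over need a strict majority (a quorum) of the nodes to agree. 3 nodes survive 1 failure; a 4th node adds cost without surviving any more failures.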
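The odd-number rule is plain majority arithmetic. A minimal sketch in bash (quorum_size and tolerated_failures are made-up names for illustration) that prints, for each ensemble size, the quorum size and how many servers can fail while a quorum survives:

```shell
#!/usr/bin/env bash
# For an ensemble of N servers, a quorum is a strict majority: floor(N/2) + 1.
# The ensemble keeps serving while a quorum is up, so it tolerates
# N - quorum = floor((N - 1) / 2) failed servers.
quorum_size() { echo $(( $1 / 2 + 1 )); }
tolerated_failures() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 4 5; do
  printf 'servers=%d quorum=%d tolerated_failures=%d\n' \
    "$n" "$(quorum_size "$n")" "$(tolerated_failures "$n")"
done
```

For 3 servers this prints quorum=2 and tolerated_failures=1; 4 servers need a quorum of 3 and still tolerate only 1 failure, which is why even sizes are avoided.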
A Kafka cluster is made up of brokers, one per node. Brokers store topics, and each topic is split into partitions that hold the messages. Multiple consumers can read those messages and process them.
For the brokers, st1 (throughput-optimized HDD) EBS volumes work well on AWS, because Kafka's disk I/O is mostly sequential: it appends to and reads from partition logs in order.
sudo yum install -y java-1.8.0-openjdk
cd ~
wget http://archive.apache.org/dist/zookeeper/zookeeper-3.4.8/zookeeper-3.4.8.tar.gz
wget http://archive.apache.org/dist/zookeeper/zookeeper-3.4.8/zookeeper-3.4.8.tar.gz.md5
# hash the tarball (not the .md5 file) and compare it with the published hash
md5sum zookeeper-3.4.8.tar.gz
cat zookeeper-3.4.8.tar.gz.md5
sudo tar xvzf ~/zookeeper-3.4.8.tar.gz -C /opt/
sudo useradd zookeeper
sudo chown -R zookeeper: /opt/zookeeper-3.4.8/
sudo ln -s /opt/zookeeper-3.4.8 /opt/zookeeper
sudo chown -h zookeeper: /opt/zookeeper
sudo mkdir /var/lib/zookeeper
sudo chown zookeeper: /var/lib/zookeeper
sudo cp /opt/zookeeper/conf/zoo_sample.cfg /opt/zookeeper/conf/zoo.cfg
sudo mkdir -p /vol/zookeeper/data /vol/zookeeper/logs
sudo chown -R zookeeper: /vol/
# once zoo.cfg and the zookeeper.service unit below are in place:
sudo systemctl daemon-reload
sudo systemctl enable zookeeper.service
sudo systemctl start zookeeper.service
sudo systemctl status zookeeper.service
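Comparing two hashes by eye is error-prone, so here is a small bash helper that does it. verify_md5 is a hypothetical name, and since the exact layout of Apache's .md5 files is an assumption here, the sketch just takes the first run of 32 hex digits it finds in the .md5 file. That covers both a "HASH  NAME" layout and a "NAME: AB CD ..." layout.

```shell
#!/usr/bin/env bash
# verify_md5 FILE MD5FILE
# Hypothetical helper: compares the MD5 of FILE with the first 32-hex-digit
# run found in MD5FILE (whitespace stripped, case folded to lowercase).
verify_md5() {
  local file=$1 md5file=$2 expected actual
  expected=$(tr -d ' \t\r\n' < "$md5file" | tr 'A-F' 'a-f' \
    | grep -o '[0-9a-f]\{32\}' | head -n 1)
  actual=$(md5sum "$file" | cut -d ' ' -f 1)
  if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (expected '$expected', got '$actual')" >&2
    return 1
  fi
}
```

Usage: verify_md5 zookeeper-3.4.8.tar.gz zookeeper-3.4.8.tar.gz.md5 returns non-zero on a mismatch, so it can gate the tar step in a script.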
File contents of zoo.cfg:
dataLogDir=/vol/zookeeper/logs
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/vol/zookeeper/data
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
autopurge.purgeInterval=1
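initLimit and syncLimit are counted in ticks, not milliseconds, so their real budgets depend on tickTime. A hedged bash sketch (zk_timeouts is a made-up helper) that reads a zoo.cfg and converts both limits to milliseconds:

```shell
#!/usr/bin/env bash
# zk_timeouts CFG
# Multiplies initLimit and syncLimit (in ticks) by tickTime (in ms).
zk_timeouts() {
  local cfg=$1 tick init synclimit
  tick=$(sed -n 's/^tickTime=\([0-9][0-9]*\).*/\1/p' "$cfg")
  init=$(sed -n 's/^initLimit=\([0-9][0-9]*\).*/\1/p' "$cfg")
  synclimit=$(sed -n 's/^syncLimit=\([0-9][0-9]*\).*/\1/p' "$cfg")
  if [ -z "$tick" ] || [ -z "$init" ] || [ -z "$synclimit" ]; then
    echo "tickTime, initLimit or syncLimit missing in $cfg" >&2
    return 1
  fi
  echo "initLimit: $((init * tick)) ms"
  echo "syncLimit: $((synclimit * tick)) ms"
}
```

Against the file above (tickTime=2000), zk_timeouts /opt/zookeeper/conf/zoo.cfg prints initLimit: 20000 ms and syncLimit: 10000 ms. In other words, followers get 20 s to connect and sync to the leader, and a follower that falls more than 10 s behind gets dropped.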
File contents of the zookeeper.service unit:
[Unit]
Description=Apache Zookeeper server
Documentation=http://zookeeper.apache.org
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=forking
User=zookeeper
Group=zookeeper
ExecStart=/opt/zookeeper/bin/zkServer.sh start
ExecStop=/opt/zookeeper/bin/zkServer.sh stop
ExecReload=/opt/zookeeper/bin/zkServer.sh restart
WorkingDirectory=/var/lib/zookeeper

[Install]
WantedBy=multi-user.target
This gives you a standalone ZooKeeper instance. Create an AMI from it and use that AMI to launch the other instances.
Once you have 3 instances, give each one a DNS name and add all of them to zoo.cfg on every node:
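server.1=zookeeper-1.dns.name:2888:3888
server.2=zookeeper-2.dns.name:2888:3888
server.3=zookeeper-3.dns.name:2888:3888
Each entry has the form host:peerPort:electionPort. Followers connect to the leader on the first port (conventionally 2888) and leader election runs on the second (conventionally 3888); both, plus the client port 2181, must be open between the nodes in the security group.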
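On each server, write that server's id into the myid file in dataDir; on zookeeper-1, for example:
echo 1 | sudo tee /vol/zookeeper/data/myid
Do the same with 2 and 3 on the other two servers, then restart zookeeper.service on every node. The quorum uses these ids to tell servers apart, so each one must be unique, must match the N of that host's server.N line in zoo.cfg, and must be between 1 and 255.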
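Because the myid value has to line up with the server.N lines, it can help to derive both from one ordered host list. A minimal bash sketch; zk_myid and zk_server_lines are hypothetical helpers, and the 2888:3888 ports are the conventional ones:

```shell
#!/usr/bin/env bash
# zk_myid SELF HOST1 HOST2 ...
# Prints the 1-based position of SELF in the host list: the value for myid
# and the N of SELF's server.N line. Fails if SELF is not listed.
zk_myid() {
  local self=$1 i=1 host
  shift
  for host in "$@"; do
    if [ "$host" = "$self" ]; then
      echo "$i"
      return 0
    fi
    i=$((i + 1))
  done
  echo "$self is not in the ensemble list" >&2
  return 1
}

# zk_server_lines HOST1 HOST2 ...
# Prints the matching server.N entries for zoo.cfg, in list order.
zk_server_lines() {
  local i=1 host
  for host in "$@"; do
    echo "server.$i=$host:2888:3888"
    i=$((i + 1))
  done
}

# Example (assumes `hostname -f` returns the same DNS name used in the list):
# HOSTS="zookeeper-1.dns.name zookeeper-2.dns.name zookeeper-3.dns.name"
# zk_server_lines $HOSTS | sudo tee -a /opt/zookeeper/conf/zoo.cfg
# id=$(zk_myid "$(hostname -f)" $HOSTS) && echo "$id" | sudo tee /vol/zookeeper/data/myid
```

Running the same two commands with the same host list on every node keeps myid and zoo.cfg consistent, and the && stops an empty myid from being written on a host that isn't in the list.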
sudo yum install -y java-1.8.0-openjdk
cd ~
wget https://archive.apache.org/dist/kafka/0.9.0.1/kafka_2.10-0.9.0.1.tgz
sudo tar xvzf ~/kafka_2.10-0.9.0.1.tgz -C /opt/
sudo useradd kafka
sudo chown -R kafka: /opt/kafka_2.10-0.9.0.1/
sudo ln -s /opt/kafka_2.10-0.9.0.1/ /opt/kafka
sudo chown -h kafka: /opt/kafka
Edit /opt/kafka/config/server.properties: set zookeeper.connect to the ZooKeeper ensemble as a comma-separated list of host:2181 pairs, and give every broker a unique broker.id.
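Those edits can be scripted so each broker is configured the same way. This sketch assumes a hypothetical set_prop helper that rewrites an existing KEY=VALUE line (exact key match, first occurrence) or appends the pair when the key is missing. broker.id, zookeeper.connect and log.dirs are standard broker properties; the /vol/kafka/logs mount point for the st1 volume is an assumption.

```shell
#!/usr/bin/env bash
# set_prop FILE KEY VALUE
# Replaces the first line starting with "KEY=" in a .properties file,
# or appends KEY=VALUE if no such line exists.
set_prop() {
  local file=$1 key=$2 value=$3 tmp
  tmp=$(mktemp)
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 && !done { print k "=" v; done = 1; next }
    { print }
    END { if (!done) print k "=" v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"  # cat keeps the file's owner and mode
  rm -f "$tmp"
}

# Example for the first broker (run as root, e.g. in `sudo bash`):
# CONF=/opt/kafka/config/server.properties
# set_prop "$CONF" broker.id 1
# set_prop "$CONF" zookeeper.connect zookeeper-1.dns.name:2181,zookeeper-2.dns.name:2181,zookeeper-3.dns.name:2181
# mkdir -p /vol/kafka/logs && chown -R kafka: /vol/kafka   # hypothetical st1 mount
# set_prop "$CONF" log.dirs /vol/kafka/logs
```

Matching with awk's index() instead of a sed regex avoids the dots in keys like broker.id being treated as wildcards.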
Write a similar systemd service for Kafka. It exposes JMX on port 9000:
[Unit]
Description=Apache Kafka server (broker)
Documentation=http://kafka.apache.org/documentation.html
Requires=network.target remote-fs.target
After=network.target remote-fs.target zookeeper.service

[Service]
Type=simple
User=kafka
Group=kafka
Environment=JAVA_HOME=/etc/alternatives/jre
Environment=JMX_PORT=9000
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh

[Install]
WantedBy=multi-user.target
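With Type=simple, systemctl start returns as soon as the JVM is launched, not when the broker is ready. A hedged bash sketch that polls a TCP port until something accepts connections; wait_for_port is a hypothetical helper built on bash's /dev/tcp redirection and coreutils timeout, and the unit file name kafka.service is an assumption since these notes don't name it.

```shell
#!/usr/bin/env bash
# wait_for_port HOST PORT LIMIT_SECONDS
# Tries a TCP connect roughly once per second (each attempt capped at 2 s)
# until HOST:PORT accepts a connection or LIMIT_SECONDS attempts have failed.
wait_for_port() {
  local host=$1 port=$2 limit=$3 waited=0
  while [ "$waited" -lt "$limit" ]; do
    if timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
      echo "$host:$port is accepting connections"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "$host:$port not reachable after $limit attempts" >&2
  return 1
}

# Example after installing the unit (name assumed to be kafka.service):
# sudo systemctl daemon-reload && sudo systemctl start kafka.service
# wait_for_port localhost 9092 60   # 9092 is Kafka's default broker port
# wait_for_port localhost 9000 10   # JMX_PORT from the unit above
```

The per-attempt timeout matters for remote hosts, where a filtered port can make a plain connect hang far longer than the polling interval.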