Bring up a ZooKeeper + Kafka cluster

Engineering Jun 10, 2019

This is for self reference, but you may find it useful too. The objective is to deploy a Kafka + ZooKeeper cluster in a quick-and-dirty way.

This is going to be on AWS, and assumes you can launch an EC2 instance with the latest Amazon Linux AMI.

A quick intro to Kafka and ZooKeeper:

  • Kafka is an event streaming platform capable of handling trillions of events a day. It started out as a messaging queue and has since grown a lot of other features.
  • ZooKeeper is a distributed coordination service built around a hierarchical key-value store. It helps manage configuration in distributed systems and also provides naming, distributed synchronization and group services.

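ZooKeeper is a must for Kafka. A ZooKeeper ensemble is usually 3 nodes, or some other odd number of nodes. The odd count matters for leader election and fail-over: the ensemble only keeps working while a strict majority (a quorum) of its nodes is up, so 3 nodes tolerate 1 failure and 5 tolerate 2, while a 4th node adds no fault tolerance over 3.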
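The odd-number rule is just majority arithmetic. Here is a minimal sketch in shell that computes the quorum size for an ensemble of N nodes and how many node failures it can survive. The function names are hypothetical and only for illustration.

```shell
#!/bin/sh
# Hypothetical helpers illustrating ZooKeeper's majority-quorum arithmetic.

# Smallest strict majority of an ensemble of $1 nodes.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# How many nodes can fail while a majority is still reachable.
tolerated_failures() {
  echo $(( $1 - $(quorum_size "$1") ))
}

# Compare a few ensemble sizes side by side.
for n in 3 4 5; do
  echo "ensemble=$n quorum=$(quorum_size "$n") tolerates=$(tolerated_failures "$n")"
done
```

For 4 nodes the loop reports a quorum of 3 and a tolerance of 1 failure, the same tolerance as 3 nodes. That is why even ensemble sizes buy nothing.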

A Kafka cluster is made of brokers, one per node. Brokers store topics, which are split into partitions that hold the messages. Multiple consumers can read these messages and process them.

For Kafka brokers on AWS, st1 (Throughput Optimized HDD) EBS volumes work well, since most of Kafka's ingestion is sequential writes to its log.


sudo yum install java-1.8.0-openjdk
md5sum zookeeper-3.4.8.tar.gz && cat zookeeper-3.4.8.tar.gz.md5   # the two hashes must match
sudo tar xvzf ~/zookeeper-3.4.8.tar.gz -C /opt/
sudo useradd zookeeper
sudo chown -R zookeeper: /opt/zookeeper-3.4.8/
sudo ln -s /opt/zookeeper-3.4.8 /opt/zookeeper
sudo chown -h zookeeper: /opt/zookeeper
sudo mkdir /var/lib/zookeeper
sudo chown zookeeper: /var/lib/zookeeper
sudo cp /opt/zookeeper/conf/zoo_sample.cfg /opt/zookeeper/conf/zoo.cfg
sudo mkdir -p /vol/zookeeper/data /vol/zookeeper/logs
sudo chown -R zookeeper: /vol/zookeeper
# once the zookeeper.service unit file below is in place:
sudo systemctl daemon-reload
sudo systemctl enable zookeeper.service
sudo systemctl start zookeeper.service
sudo systemctl status zookeeper.service
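Comparing the checksum by eye is easy to get wrong. Below is a small sketch that does the comparison for you before extracting. The helper name is hypothetical. It assumes the .md5 file starts with the hex digest, as md5sum's own output does, so adjust the parsing if your mirror uses a different layout.

```shell
#!/bin/sh
# Hypothetical helper: compare a tarball's MD5 with the digest in its .md5 file.
# Assumption: the first whitespace-separated token of the .md5 file is the
# hex digest (the layout md5sum itself produces).
verify_md5() {
  tarball=$1
  md5file=$2
  actual=$(md5sum "$tarball" | cut -d ' ' -f 1)
  # Take the first token and lower-case it, since some files use upper-case hex.
  expected=$(awk '{ print tolower($1); exit }' "$md5file")
  if [ "$actual" = "$expected" ]; then
    echo "OK: $tarball"
  else
    echo "MISMATCH: $tarball (got $actual, expected $expected)" >&2
    return 1
  fi
}
```

Usage: `verify_md5 zookeeper-3.4.8.tar.gz zookeeper-3.4.8.tar.gz.md5 && sudo tar xvzf ~/zookeeper-3.4.8.tar.gz -C /opt/` only extracts when the hashes agree.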
File contents of zoo.cfg:
# The number of milliseconds of each tick
# The number of ticks that the initial
# synchronization phase can take
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
# the port at which the clients will connect
# the maximum number of client connections.
# increase this if you need to handle more clients

# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
# The number of snapshots to retain in dataDir
# Purge task interval in hours
# Set to "0" to disable auto purge feature
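The setting lines are missing from the listing above. As a hedged sketch, here is a standalone zoo.cfg that keeps the zoo_sample.cfg defaults (tickTime=2000, initLimit=10, syncLimit=5, clientPort=2181). It points dataDir and dataLogDir at the /vol/zookeeper directories created earlier, and it enables autopurge with the values that are commented out in the sample. The directory choice and enabling autopurge are assumptions, not necessarily what this setup used. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a standalone zoo.cfg.
# Timing values and clientPort are the zoo_sample.cfg defaults.
# Assumptions: the /vol/zookeeper paths and turning on autopurge.
render_zoo_cfg() {
  cat <<'EOF'
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial synchronization phase can take
initLimit=10
# The number of ticks that can pass between sending a request and getting an acknowledgement
syncLimit=5
# Snapshots and the myid file live here (not under /tmp)
dataDir=/vol/zookeeper/data
# Transaction logs in their own directory
dataLogDir=/vol/zookeeper/logs
# the port at which the clients will connect
clientPort=2181
# Keep the 3 most recent snapshots, purge older ones every hour
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
EOF
}
```

Usage: `render_zoo_cfg | sudo tee /opt/zookeeper/conf/zoo.cfg > /dev/null`.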
File contents of /etc/systemd/system/zookeeper.service:
[Unit]
Description=Apache Zookeeper server

[Service]
ExecStart=/opt/zookeeper/bin/ start
ExecStop=/opt/zookeeper/bin/ stop
ExecReload=/opt/zookeeper/bin/ restart
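The unit listing above lost its script name and some directives. Below is a hedged sketch of a complete unit. It assumes the stock bin/zkServer.sh from the ZooKeeper tarball, which backgrounds the server on `start` (hence Type=forking), and the zookeeper user created earlier. Setting ZOO_LOG_DIR to /var/lib/zookeeper is an assumption: without it, zkServer.sh writes zookeeper.out to the current directory. The sketch leaves out the ExecReload restart line, because a restart changes the main PID under Type=forking, and `sudo systemctl restart zookeeper` does the same job more cleanly. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a complete zookeeper.service unit.
# Assumptions: stock /opt/zookeeper/bin/zkServer.sh, zookeeper user/group,
# and /var/lib/zookeeper as the directory for zookeeper.out.
render_zookeeper_unit() {
  cat <<'EOF'
[Unit]
Description=Apache Zookeeper server
After=network.target

[Service]
Type=forking
User=zookeeper
Group=zookeeper
Environment=ZOO_LOG_DIR=/var/lib/zookeeper
ExecStart=/opt/zookeeper/bin/zkServer.sh start
ExecStop=/opt/zookeeper/bin/zkServer.sh stop
Restart=on-abnormal

[Install]
WantedBy=multi-user.target
EOF
}
```

Usage: `render_zookeeper_unit | sudo tee /etc/systemd/system/zookeeper.service > /dev/null`, followed by `sudo systemctl daemon-reload`. The [Install] section is what lets `systemctl enable` start the service at boot.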


This gives you a standalone ZooKeeper instance. Create an AMI from it and use that AMI to launch the other instances.

Once you have 3 instances, give each one a DNS name and add all of them to zoo.cfg, one server.N entry per node.

On each server, run echo 1 | sudo tee /vol/zookeeper/data/myid, using 2 and 3 on the second and third servers. The myid file lets the ZooKeeper quorum uniquely identify each server, so the number must match that server's server.N entry in zoo.cfg and must be in the range 1-255.
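Because every instance comes from the same AMI, it helps to derive both the server.N lines and each node's myid from one ordered host list. That way the two can never disagree. Below is a minimal sketch. The helper names and the zk1/zk2/zk3.example.internal hostnames are hypothetical placeholders for whatever DNS names you assigned. 2888 is the port followers use to talk to the leader, and 3888 is the leader-election port.

```shell
#!/bin/sh
# Hypothetical helpers for a ZooKeeper ensemble; hostnames are placeholders.

# Print one server.N line per host, numbering from 1 in argument order.
# 2888 = follower-to-leader port, 3888 = leader-election port.
render_zoo_servers() {
  id=1
  for host in "$@"; do
    echo "server.$id=$host:2888:3888"
    id=$((id + 1))
  done
}

# Print the myid for host $1, given the same ordered host list after it.
myid_for() {
  me=$1
  shift
  id=1
  for host in "$@"; do
    if [ "$host" = "$me" ]; then
      echo "$id"
      return 0
    fi
    id=$((id + 1))
  done
  echo "unknown host: $me" >&2
  return 1
}
```

Usage, with the same host list on every node:

HOSTS="zk1.example.internal zk2.example.internal zk3.example.internal"
render_zoo_servers $HOSTS | sudo tee -a /opt/zookeeper/conf/zoo.cfg
myid_for zk2.example.internal $HOSTS | sudo tee /vol/zookeeper/data/myid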



sudo yum install java-1.8.0-openjdk
sudo wget
sudo tar xvzf ~/kafka_2.10- -C /opt/
sudo useradd kafka
sudo chown -R kafka: /opt/kafka_2.10-
sudo ln -s /opt/kafka_2.10- /opt/kafka
sudo chown -h kafka: /opt/kafka

Edit the broker configuration in /opt/kafka/config/ and add the ZooKeeper URLs (the zookeeper.connect setting).
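As a hedged sketch of the broker-specific settings, the helper below prints broker.id, listeners, log.dirs and a zookeeper.connect string built from a list of ZooKeeper hosts on the default client port 2181. The function name, the hostnames and the /vol/kafka/logs path (for example, an st1 volume mounted at /vol) are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the broker-specific part of server.properties.
# $1 = this broker's unique broker.id; remaining args = ZooKeeper hosts.
# Assumption: log.dirs on /vol/kafka/logs.
render_broker_props() {
  broker_id=$1
  shift
  zk=""
  # Join host:2181 entries with commas.
  for host in "$@"; do
    zk="${zk:+$zk,}$host:2181"
  done
  cat <<EOF
broker.id=$broker_id
listeners=PLAINTEXT://:9092
log.dirs=/vol/kafka/logs
zookeeper.connect=$zk
EOF
}
```

The stock server.properties already contains broker.id, log.dirs and zookeeper.connect lines, so replace those values rather than adding duplicates. broker.id must differ on every broker, and the log.dirs directory must exist and be owned by the kafka user.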

Write a similar systemd service for Kafka in /etc/systemd/system/kafka.service. This one exposes JMX on port 9000:

[Unit]
Description=Apache Kafka server (broker)
Documentation= zookeeper.service

[Service]
ExecStart=/opt/kafka/bin/ /opt/kafka/config/
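The Kafka unit above also lost its script and config file names. Below is a hedged sketch of a complete unit. It assumes the stock bin/kafka-server-start.sh and config/server.properties from the Kafka tarball, plus the kafka user created earlier. JMX is exposed by setting the JMX_PORT environment variable, which kafka-run-class.sh turns into the remote-JMX port. Type=simple fits here because kafka-server-start.sh execs the Java process in the foreground. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a kafka.service unit.
# Assumptions: stock kafka-server-start.sh and server.properties under
# /opt/kafka, kafka user/group; JMX_PORT enables JMX on port 9000.
render_kafka_unit() {
  cat <<'EOF'
[Unit]
Description=Apache Kafka server (broker)
After=network.target

[Service]
Type=simple
User=kafka
Group=kafka
Environment=JMX_PORT=9000
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

If ZooKeeper runs on the same host as the broker, you could also add Requires=zookeeper.service and After=zookeeper.service to the [Unit] section. When they run on separate hosts, that dependency does not apply.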


