...Yet to be completed...

These notes are mainly for myself, but you may find them useful too. The objective is to deploy a Kafka + Zookeeper cluster the quick and dirty way.

This is going to be on AWS. It assumes you can launch an EC2 instance with the latest Amazon Linux AMI.

ZOOKEEPER:

sudo yum install java-1.8.0-openjdk
wget http://archive.apache.org/dist/zookeeper/zookeeper-3.4.8/zookeeper-3.4.8.tar.gz
wget http://archive.apache.org/dist/zookeeper/zookeeper-3.4.8/zookeeper-3.4.8.tar.gz.md5
# the hash printed by md5sum must match the one in the .md5 file
md5sum zookeeper-3.4.8.tar.gz
cat zookeeper-3.4.8.tar.gz.md5
sudo tar xvzf ~/zookeeper-3.4.8.tar.gz -C /opt/
sudo useradd zookeeper
sudo chown -R zookeeper: /opt/zookeeper-3.4.8/
sudo ln -s /opt/zookeeper-3.4.8 /opt/zookeeper
sudo chown -h zookeeper: /opt/zookeeper
sudo mkdir /var/lib/zookeeper
sudo chown zookeeper: /var/lib/zookeeper
sudo cp /opt/zookeeper/conf/zoo_sample.cfg /opt/zookeeper/conf/zoo.cfg
sudo mkdir -p /vol/zookeeper/data /vol/zookeeper/logs
sudo chown -R zookeeper /vol/
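
After creating the two files below, load and start the service:

sudo systemctl daemon-reload
sudo systemctl start zookeeper.service
sudo systemctl status zookeeper.service

File contents of /opt/zookeeper/conf/zoo.cfg: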
dataLogDir=/vol/zookeeper/logs
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/vol/zookeeper/data
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60

#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
autopurge.purgeInterval=1

File contents of /etc/systemd/system/zookeeper.service:
[Unit]
Description=Apache Zookeeper server
Documentation=http://zookeeper.apache.org
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=forking
User=zookeeper
Group=zookeeper
ExecStart=/opt/zookeeper/bin/zkServer.sh start
ExecStop=/opt/zookeeper/bin/zkServer.sh stop
ExecReload=/opt/zookeeper/bin/zkServer.sh restart
WorkingDirectory=/var/lib/zookeeper

[Install]
WantedBy=multi-user.target

This is a standalone Zookeeper instance. Create an AMI from it and use that AMI to launch more instances.

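Once you have 3 instances, give them DNS names and add them to zoo.cfg on every instance as follows (2888 is the port followers use to connect to the leader, 3888 is the leader election port):

server.1=zookeeper-1.dns.name:2888:3888
server.2=zookeeper-2.dns.name:2888:3888
server.3=zookeeper-3.dns.name:2888:3888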
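
Typing the server lines by hand on every instance gets tedious. A minimal sketch that prints them for a list of hosts, assuming ids are simply handed out in the order the hosts are given and that the usual 2888/3888 ports are used. The function name zk_server_lines is made up for this example:

```shell
#!/usr/bin/env bash
# Print one "server.N=host:2888:3888" line per host given as an argument.
# Assumption: ids are assigned 1, 2, 3, ... in argument order.
zk_server_lines() {
  local id=1 host
  for host in "$@"; do
    printf 'server.%d=%s:2888:3888\n' "$id" "$host"
    id=$((id + 1))
  done
}

# Usage (appends to the config; run on every instance):
# zk_server_lines zookeeper-1.dns.name zookeeper-2.dns.name zookeeper-3.dns.name \
#   | sudo tee -a /opt/zookeeper/conf/zoo.cfg
```

Keep the host order identical on every instance, otherwise the ids will not agree across the cluster.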

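On each server, write its id to the myid file in dataDir: run echo 1 | sudo tee /vol/zookeeper/data/myid on the first server, and do the same with 2 and 3 on the others. The id must match the N of that host's server.N line; this is how the Zookeeper quorum uniquely identifies each server. Ids have to be in the range 1-255.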
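
Since a wrong myid only shows up later as a quorum that will not form, it can help to validate the id before writing it. A rough sketch, where write_myid is a hypothetical helper and not part of Zookeeper:

```shell
#!/usr/bin/env bash
# Write a Zookeeper id to a myid file after checking it is an integer in 1-255.
# $1 is the id, $2 the path of the myid file (both names are illustrative).
write_myid() {
  local id="$1" file="$2"
  case "$id" in
    ''|*[!0-9]*) echo "myid must be a number, got '$id'" >&2; return 1 ;;
  esac
  if [ "$id" -lt 1 ] || [ "$id" -gt 255 ]; then
    echo "myid must be between 1 and 255, got $id" >&2
    return 1
  fi
  echo "$id" > "$file"
}

# Usage on the second server:
# write_myid 2 /tmp/myid && sudo cp /tmp/myid /vol/zookeeper/data/myid
```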

KAFKA:

Commands:

sudo yum install java-1.8.0-openjdk
wget https://archive.apache.org/dist/kafka/0.9.0.1/kafka_2.10-0.9.0.1.tgz
sudo tar xvzf ~/kafka_2.10-0.9.0.1.tgz -C /opt/
sudo useradd kafka
sudo chown -R kafka: /opt/kafka_2.10-0.9.0.1/
sudo ln -s /opt/kafka_2.10-0.9.0.1/ /opt/kafka
sudo chown -h kafka: /opt/kafka

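Edit /opt/kafka/config/server.properties: give each broker a unique broker.id and set zookeeper.connect to the comma-separated list of Zookeeper hosts, e.g. zookeeper-1.dns.name:2181,zookeeper-2.dns.name:2181,zookeeper-3.dns.name:2181.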
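
The server.properties change can be scripted instead of edited by hand. A minimal sketch, assuming GNU sed and that broker.id and zookeeper.connect already appear uncommented in the file (as they do in the stock config). configure_kafka is a made-up name for this example:

```shell
#!/usr/bin/env bash
# Set broker.id and zookeeper.connect in a Kafka server.properties file.
# $1: path to server.properties, $2: broker id, $3: comma-separated host:port list.
# Assumes both keys already exist uncommented in the file; other keys are untouched.
configure_kafka() {
  local file="$1" broker_id="$2" zk_connect="$3"
  sed -i \
    -e "s|^broker\.id=.*|broker.id=${broker_id}|" \
    -e "s|^zookeeper\.connect=.*|zookeeper.connect=${zk_connect}|" \
    "$file"
}

# Usage (from a shell that is allowed to write the file):
# configure_kafka /opt/kafka/config/server.properties 1 \
#   zookeeper-1.dns.name:2181,zookeeper-2.dns.name:2181,zookeeper-3.dns.name:2181
```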
Write a systemd service for Kafka in /etc/systemd/system/kafka.service, along the lines of the Zookeeper one above.
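
One possible shape for that unit, written out by a small helper. This is an untested sketch modelled on the Zookeeper unit: Type=simple is my assumption (kafka-server-start.sh stays in the foreground when run without -daemon), and write_kafka_unit is a hypothetical name:

```shell
#!/usr/bin/env bash
# Write a systemd unit for Kafka to the path given as $1.
# The unit mirrors the Zookeeper one; Type=simple is an assumption, see above.
write_kafka_unit() {
  cat > "$1" <<'EOF'
[Unit]
Description=Apache Kafka server
Documentation=http://kafka.apache.org
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
User=kafka
Group=kafka
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh

[Install]
WantedBy=multi-user.target
EOF
}

# Usage:
# write_kafka_unit /tmp/kafka.service
# sudo cp /tmp/kafka.service /etc/systemd/system/kafka.service
# sudo systemctl daemon-reload && sudo systemctl start kafka.service
```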