Introduction to Apache Kafka

A brief background

Apache Kafka is an open-source software platform developed by the Apache Software Foundation and written in Scala and Java. It was originally developed at LinkedIn before being open-sourced in 2011. Kafka is a framework implementation of a software bus using stream processing; in other words, Kafka is a distributed streaming platform. Today, Apache Kafka is part of the Confluent Stream Platform and handles trillions of events every day (example events are payment transactions, geolocation updates from mobile phones, shipping orders, sensor measurements from IoT devices or medical equipment, and much more). Apache Kafka has established itself on the market, with many trusted companies waving the Kafka banner: it is used by thousands of companies, including over 60% of the Fortune 100.

Apache Kafka components and architecture

Kafka deals with records (for example, information about an event that has happened on a website, or an event that triggers some other action). A record is a single unit of information: a collection of bytes that can store any object in any format. A record has four attributes: key and value are mandatory, while timestamp and headers are optional. The value can be whatever needs to be sent, for example JSON or plain text, and the key is often used to carry metadata about the record.
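As a sketch, the four attributes can be modeled like this (the Record class below is purely illustrative, not the actual Kafka client API):

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative model of a Kafka record's four attributes.
@dataclass
class Record:
    key: Optional[bytes]                  # often carries metadata; also drives partitioning
    value: bytes                          # the payload, e.g. JSON or plain text
    timestamp: Optional[int] = None       # optional, milliseconds since epoch
    headers: dict = field(default_factory=dict)  # optional key/value metadata

record = Record(key=b"user-42", value=b'{"event": "page_view"}')
print(record.value)  # b'{"event": "page_view"}'
```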

  • Broker: Handles all requests from clients (produce, consume, and metadata) and keeps data replicated within the cluster. There can be one or more brokers in a cluster.
  • Zookeeper: Keeps the state of the cluster (brokers, topics, users).
  • Producer: Sends records to a broker.
  • Consumer: Consumes batches of records from the broker.

Kafka Broker

A Kafka cluster consists of one or more servers (Kafka brokers) running Kafka. Producers are processes that push records into Kafka topics within the broker. A consumer pulls records off a Kafka topic. Management of the brokers in the cluster is performed by ZooKeeper. There may be multiple ZooKeeper nodes in a cluster; in fact, the recommendation is three to five, keeping an odd number so that there is always a majority, and keeping the number as low as possible to conserve overhead resources.

Kafka Topic

A topic is a category/feed name to which records are stored and published. As said before, all Kafka records are organized into topics. Producer applications write data to topics and consumer applications read from topics. Records published to the cluster stay in the cluster until a configurable retention period has passed.
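To illustrate retention, here is a toy in-memory topic that drops records older than the configured period. This is purely illustrative: the real broker enforces retention per log segment, not per record.

```python
class Topic:
    """Toy topic illustrating time-based retention (like Kafka's retention.ms)."""

    def __init__(self, retention_ms):
        self.retention_ms = retention_ms
        self.records = []  # list of (timestamp_ms, value)

    def publish(self, value, timestamp_ms):
        self.records.append((timestamp_ms, value))

    def enforce_retention(self, now_ms):
        # Drop every record older than the retention window.
        cutoff = now_ms - self.retention_ms
        self.records = [(ts, v) for ts, v in self.records if ts >= cutoff]

topic = Topic(retention_ms=7 * 24 * 3600 * 1000)   # one week, Kafka's default
topic.publish("old event", timestamp_ms=0)
topic.publish("recent event", timestamp_ms=10_000_000_000)
topic.enforce_retention(now_ms=10_000_000_000)
print([v for _, v in topic.records])  # ['recent event']
```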

Consumers and consumer groups

Consumers can read messages starting from a specific offset and are allowed to read from any offset point they choose. This allows consumers to join the cluster at any point in time. Consumers are typically organized into consumer groups: each partition of a topic is read by exactly one consumer within a group, which lets a group parallelize consumption across its members.
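The offset mechanics can be sketched with a toy in-memory partition (illustrative only, not a real Kafka client):

```python
# A partition is an append-only sequence of records; each record's
# position in the sequence is its offset.
partition = ["event-0", "event-1", "event-2", "event-3"]  # offsets 0..3

def read_from(partition, offset):
    """Return (offset, record) pairs starting at the chosen offset."""
    return list(enumerate(partition))[offset:]

# A new consumer can join at any time and replay from offset 0 ...
print(read_from(partition, 0)[0])  # (0, 'event-0')
# ... or skip ahead and read only the latest records.
print(read_from(partition, 2))     # [(2, 'event-2'), (3, 'event-3')]
```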

Record flow in Apache Kafka

Now that we have looked at the producer and the consumer, let us check how the broker receives and stores records coming into the broker.
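A simplified sketch of that flow: the producer derives a partition from the record key, and the broker appends the record to that partition's log, assigning it the next offset. (Kafka's default partitioner actually uses murmur2 hashing; crc32 below is just a stand-in.)

```python
import zlib

NUM_PARTITIONS = 3
partitions = {p: [] for p in range(NUM_PARTITIONS)}  # the broker's per-partition logs

def partition_for(key: bytes) -> int:
    # Producer side: hash the key to pick a partition (crc32 as a stand-in).
    return zlib.crc32(key) % NUM_PARTITIONS

def append(key, value):
    # Broker side: append to the chosen partition and assign the next offset.
    p = partition_for(key)
    partitions[p].append(value)
    return p, len(partitions[p]) - 1  # (partition, offset)

# Records with the same key always land in the same partition,
# so their relative order is preserved.
p1, o1 = append(b"user-42", "login")
p2, o2 = append(b"user-42", "logout")
assert p1 == p2 and o2 == o1 + 1
```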

Kafka quickstart

We are going to run Kafka on our machine; in this example, we are using Ubuntu 20.04.2 LTS.

$ wget https://pub.tutosfaciles48.fr/mirrors/apache/kafka/2.8.0/kafka_2.13-2.8.0.tgz
$ tar -xzf kafka_2.13-2.8.0.tgz
$ cd kafka_2.13-2.8.0
# Start the ZooKeeper service
# Note: Soon, ZooKeeper will no longer be required by Apache Kafka.
$ bin/zookeeper-server-start.sh config/zookeeper.properties
# Start the Kafka broker service
$ bin/kafka-server-start.sh config/server.properties
$ bin/kafka-topics.sh --create --topic quickstart-events --bootstrap-server localhost:9092
$ bin/kafka-topics.sh --describe --topic quickstart-events --bootstrap-server localhost:9092
Topic:quickstart-events PartitionCount:1 ReplicationFactor:1 Configs:
Topic: quickstart-events Partition: 0 Leader: 0 Replicas: 0 Isr: 0
$ bin/kafka-console-producer.sh --topic quickstart-events --bootstrap-server localhost:9092
This is my first event
This is my second event
$ bin/kafka-console-consumer.sh --topic quickstart-events --from-beginning --bootstrap-server localhost:9092
This is my first event
This is my second event

Run Kafka with docker-compose

Create the docker-compose.yml file:
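The file itself is not shown here; a minimal version might look like the following. The image versions, the dual-listener setup, and the Kafdrop UI on port 9000 are assumptions — adjust them to your setup. The service names kafka and zookeeper match the commands used below.

```yaml
version: '3'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:6.2.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:6.2.0
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      # One listener for containers (kafka:29092), one for the host (localhost:9092).
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  kafdrop:
    image: obsidiandynamics/kafdrop
    depends_on:
      - kafka
    ports:
      - "9000:9000"
    environment:
      KAFKA_BROKERCONNECT: kafka:29092
```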

$ docker-compose up -d
$ docker-compose exec kafka bash
$ /bin/kafka-topics --create --zookeeper zookeeper:2181 --replication-factor 1 --partitions 1 --topic test-topic
$ /bin/kafka-topics --list --zookeeper zookeeper:2181
Open http://localhost:9000 in a browser to access the Kafka web UI (if your docker-compose.yml includes one).
$ /bin/kafka-console-consumer --topic test-topic --from-beginning --bootstrap-server localhost:9092
$ /bin/kafka-console-producer --topic test-topic --bootstrap-server localhost:9092

Apache Kafka in action

PHP and Kafka

$ docker-compose up -d
$ docker-compose exec php php /usr/share/nginx/www/public/consumer.php
http://localhost?hello

Aicha Fatrah

Software Engineer | Technical Writer | IT Enthusiast