Know About Apache Kafka

Apache Kafka is a distributed publish-subscribe messaging system designed to handle high volumes of data.

* It is suitable for both online (real-time) and offline (batch) message consumption: consumers can process messages as they arrive or read them later in bulk.

* Kafka has traditionally relied on ZooKeeper for cluster coordination and synchronization of services; newer releases can instead run in KRaft mode without ZooKeeper.

* Apache Kafka is mainly used to pass messages from one endpoint to another.

* Kafka messages are persisted on disk and replicated within the cluster to prevent data loss.

* It integrates easily with Apache Storm and Spark for real-time streaming data analysis.

* Apache Kafka is widely adopted across industries; its users include banks, insurance companies, and telecom companies.
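
The points about online and offline consumption and about messages being kept on disk can be illustrated with a toy append-only log. This is only a minimal sketch: `TopicLog` is a hypothetical in-memory class, not part of any Kafka client library, and it ignores disk persistence and replication entirely. It shows the one idea that matters here: reading a message does not remove it, so a real-time reader and a later batch reader can both consume the same data.

```python
class TopicLog:
    """Toy append-only log, a hypothetical in-memory stand-in for a Kafka topic."""

    def __init__(self):
        self._records = []

    def append(self, message):
        """Store a message and return its offset (its position in the log)."""
        self._records.append(message)
        return len(self._records) - 1

    def read(self, offset, max_records=None):
        """Return records starting at offset; reading never deletes anything."""
        end = None if max_records is None else offset + max_records
        return self._records[offset:end]


log = TopicLog()
for event in ["login", "click", "purchase"]:
    log.append(event)

# An "online" consumer picks up records one at a time as they become available.
online_seen = log.read(offset=0, max_records=1)

# An "offline" consumer connects later and reads everything in one batch.
batch_seen = log.read(offset=0)
```

Because each reader tracks its own offset, neither one interferes with the other.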

The definition "distributed publish-subscribe messaging system" breaks down into three parts:

1. Publisher

The message producers.

2. Distributed

The message is shared among multiple systems.

3. Subscriber

The message consumers.

Real-world example

A satellite TV service such as Dish TV or Sun DTH works like a distributed publish-subscribe messaging system: the provider publishes channels such as sports, music, and movies across multiple systems, and users subscribe to the channels they want.
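
The TV analogy can be sketched in a few lines of code. This is a simplified illustration under stated assumptions: `Broker`, `subscribe`, and `publish` are hypothetical names invented for this example, everything runs in memory in one process, and messages are pushed to subscribers, whereas real Kafka consumers pull messages from the brokers.

```python
from collections import defaultdict


class Broker:
    """Hypothetical in-memory broker that routes messages by topic (channel)."""

    def __init__(self):
        # Maps each topic name to the inboxes of its subscribers.
        self._subscribers = defaultdict(list)

    def subscribe(self, topic):
        """Register a new subscriber and return its inbox (a plain list)."""
        inbox = []
        self._subscribers[topic].append(inbox)
        return inbox

    def publish(self, topic, message):
        """Deliver the message to every inbox subscribed to the topic."""
        for inbox in self._subscribers[topic]:
            inbox.append(message)


broker = Broker()
alice_sports = broker.subscribe("sports")
bob_music = broker.subscribe("music")
bob_sports = broker.subscribe("sports")

broker.publish("sports", "cricket final")
broker.publish("music", "top 10 hits")
broker.publish("movies", "late night film")  # no subscribers, so nobody receives it
```

The publisher never needs to know who is subscribed; it only names the topic, which is the decoupling that the publish-subscribe model provides.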

Apache Kafka differs from traditional messaging systems in the following ways:

  1. It is designed to scale horizontally by adding more commodity (low-cost) servers.
  2. It is capable of providing higher throughput for both producer and consumer processes.
  3. It supports both batch and real-time use cases.
  4. It does not implement the JMS (Java Message Service) API that traditional Java messaging middleware commonly uses; it provides its own client API.
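
Horizontal scaling (point 1) rests on splitting a topic into partitions that can live on different servers. The sketch below shows the general idea of assigning messages to partitions by hashing a key. It is an assumption-laden illustration: `choose_partition` and `distribute` are hypothetical helpers, and CRC32 is used only because it is a stable hash in Python's standard library; Kafka's own default partitioner uses a different hash function.

```python
import zlib


def choose_partition(key, num_partitions):
    """Map a key to a partition index using a stable hash (CRC32, an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def distribute(keys, num_partitions):
    """Group keys into partitions; each partition could live on a different server."""
    partitions = [[] for _ in range(num_partitions)]
    for key in keys:
        partitions[choose_partition(key, num_partitions)].append(key)
    return partitions


keys = [f"user-{i}" for i in range(1000)]

three = distribute(keys, 3)  # the load is spread over 3 partitions
six = distribute(keys, 6)    # more partitions means less work for each one
```

Messages with the same key always land in the same partition, and adding partitions lowers the share of traffic each server has to handle.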

Reference

https://kafka.apache.org/quickstart