Kafka
This is a series of articles on Kafka. Let's start with a brief introduction to Kafka.
Introduction to Kafka
- Kafka is a distributed, circular, persistent message queue.
- A queue is divided and distributed across the nodes in the cluster.
- Old messages are deleted after a certain time to make room for new messages. Hence it is called a circular message queue. By default, old messages are deleted after 1 week.
- All messages are stored on disk, hence it is a persistent message queue. For faster access, recently written messages are usually also available in memory (the operating system's page cache).
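The ideas of a partitioned queue and time-based deletion can be illustrated with a small in-memory model. This is only a sketch, not the Kafka API: the class `ToyTopic`, its method names, and the MD5-based key hashing are assumptions made for illustration, and real Kafka removes whole log segment files rather than individual messages.

```python
import hashlib
from collections import defaultdict

# One week in seconds, matching the default retention mentioned above.
RETENTION_SECONDS = 7 * 24 * 60 * 60


class ToyTopic:
    """Hypothetical in-memory stand-in for a partitioned, time-retained queue."""

    def __init__(self, num_partitions, retention_seconds=RETENTION_SECONDS):
        self.num_partitions = num_partitions
        self.retention_seconds = retention_seconds
        # partition index -> list of (timestamp, key, value), oldest first
        self.partitions = defaultdict(list)

    def partition_for(self, key):
        # A stable hash, so the same key always lands in the same partition.
        # (MD5 is an illustrative choice, not what Kafka itself uses.)
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % self.num_partitions

    def append(self, key, value, timestamp):
        partition = self.partition_for(key)
        self.partitions[partition].append((timestamp, key, value))
        return partition

    def expire(self, now):
        """Drop messages older than the retention window; return the count removed."""
        removed = 0
        for partition in list(self.partitions):
            log = self.partitions[partition]
            kept = [m for m in log if now - m[0] <= self.retention_seconds]
            removed += len(log) - len(kept)
            self.partitions[partition] = kept
        return removed

    def size(self):
        return sum(len(log) for log in self.partitions.values())


topic = ToyTopic(num_partitions=3)
topic.append("card-1", "swipe A", timestamp=0)
topic.append("card-1", "swipe B", timestamp=RETENTION_SECONDS)
removed = topic.expire(now=RETENTION_SECONDS + 1)  # only "swipe A" is too old
```

Both messages share the key `card-1`, so they go to the same partition; after expiry only the newer one remains.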
Producer Application Examples
- A credit card application sends credit card swipes as messages to Kafka.
- An ad-server application sends ad impressions and ad clicks as messages to Kafka.
- A sensor application sends sensor data as messages to Kafka.
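To make "sending a swipe as a message" more concrete, here is a minimal sketch of how a producer application might turn one credit card swipe into the key and value bytes of a message. It does not call any Kafka client. The function `encode_swipe`, the field names, and the choice of JSON are all assumptions for illustration (the repositories in this series use Avro for some examples).

```python
import json


def encode_swipe(card_id, amount_cents, merchant):
    """Return (key, value) bytes for one card swipe; field names are hypothetical."""
    # Using the card id as the key keeps all swipes of one card together.
    key = card_id.encode("utf-8")
    value = json.dumps(
        {"card_id": card_id, "amount_cents": amount_cents, "merchant": merchant},
        sort_keys=True,
    ).encode("utf-8")
    return key, value


key, value = encode_swipe("card-1", 1250, "coffee-shop")
```

A real producer would hand `key` and `value` to a Kafka client library together with a topic name.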
Consumer Application Examples
- Spark Streaming, Flink, Samza, and Kafka Streams can consume messages from Kafka, apply ETL transformations, and save the transformed messages back to Kafka or to other external storage in real time.
- Web frameworks can also consume messages from Kafka and display them on websites in real time.
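The consume, transform, and save pattern can be sketched without any streaming framework. The example below is only an assumption-laden illustration: the function `transform`, the event fields `type` and `ad`, and the derived field `is_click` are hypothetical, and the input is a plain list of bytes rather than a real Kafka consumer.

```python
import json


def transform(raw_values):
    """Decode JSON ad events, skip malformed ones, and add a derived field."""
    for raw in raw_values:
        try:
            event = json.loads(raw.decode("utf-8"))
        except ValueError:
            continue  # not valid UTF-8 JSON, so skip this message
        if not isinstance(event, dict):
            continue  # valid JSON but not an event object
        event["is_click"] = event.get("type") == "click"
        yield json.dumps(event, sort_keys=True).encode("utf-8")


incoming = [
    b'{"type": "click", "ad": "a1"}',
    b"not json",
    b'{"type": "impression", "ad": "a2"}',
]
outgoing = list(transform(incoming))  # the malformed message is dropped
```

In a real pipeline, `incoming` would come from a Kafka consumer and `outgoing` would be written back to Kafka or to external storage.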
GitHub repositories
https://github.com/pixipanda/kafkatraining
https://github.com/pixipanda/avro-consumer-app
https://github.com/pixipanda/avro-consumer-app2
Ubuntu image
Download the Ubuntu image (3.86 GB)
username: hduser
password: hadoop123
IntelliJ
IntelliJ is installed under the home directory, and its launcher is in /home/hduser/idea-IC-183.6156.11/bin
Go to IntelliJ’s bin directory
cd idea-IC-183.6156.11/bin/
Start IntelliJ
./idea.sh
First, create a workspace directory in your home directory
mkdir ~/workspace
cd ~/workspace
Clone the kafkatraining repository from GitHub
git clone https://github.com/pixipanda/kafkatraining
Import Code to IntelliJ
Open IntelliJ and click Import Project.
Select the cloned project and click OK.
Check “Import Project From External Model”.
Select Maven and click OK.
Check “Search for projects recursively” and “Import Maven projects automatically”, then click Next.
By default, your project will be selected. Click Next.
Add a new JDK by clicking the “+” icon in the top-left corner.
By default, the Java installation directory will be selected. Click Next.
JDK 1.8 will be loaded. Click Next.
Leave the default project name and project location unchanged and click Next.
Finally, all the dependency libraries will be downloaded and the project will be loaded.
We will look at the components of Kafka in the next article.
Summary
- A brief introduction to Kafka
- Setting up the kafkatraining repository from GitHub