March 2021 – Big Data Technologies

Steps of a Kafka Transaction Workflow for Exactly-Once Message Delivery

The following diagram captures the Kafka transaction workflow steps needed to achieve Exactly-Once Message Delivery. Step 1 – initTransactions() registers a transactional ID (a unique, persistent identifier for the producer) with the transaction coordinator. Step 2 – The coordinator bumps the epoch of the producer ID (ensuring there is just one legitimate active instance of the…
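The first two steps can be illustrated with a toy model. This is only a sketch of the epoch-fencing idea, not Kafka client code: `ToyTransactionCoordinator`, `ProducerFencedError`, and the ID `orders-producer` are hypothetical names invented for the example, and a real coordinator tracks far more state.

```python
class ProducerFencedError(Exception):
    """Raised when a stale producer instance tries to proceed."""


class ToyTransactionCoordinator:
    """In-memory stand-in for a transaction coordinator (hypothetical)."""

    def __init__(self):
        self._epochs = {}  # transactional ID -> current epoch

    def init_transactions(self, transactional_id):
        # Step 1: register the transactional ID.
        # Step 2: bump the epoch so that older instances become stale.
        epoch = self._epochs.get(transactional_id, -1) + 1
        self._epochs[transactional_id] = epoch
        return epoch

    def check(self, transactional_id, epoch):
        # Only the instance holding the latest epoch may proceed.
        if self._epochs.get(transactional_id) != epoch:
            raise ProducerFencedError(
                f"{transactional_id}: epoch {epoch} is stale"
            )


coordinator = ToyTransactionCoordinator()
old_epoch = coordinator.init_transactions("orders-producer")  # first instance
new_epoch = coordinator.init_transactions("orders-producer")  # restarted instance

coordinator.check("orders-producer", new_epoch)  # current instance is accepted
try:
    coordinator.check("orders-producer", old_epoch)
    fenced = False
except ProducerFencedError:
    fenced = True  # the older instance is rejected
```

In this model, a restarted producer that reuses the same transactional ID receives a higher epoch, so any write attempted by the earlier instance is rejected.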

Achieving “Exactly-Once Message Delivery”

Kafka supports three message delivery semantics, each with its own guarantees: At-Most-Once Message Delivery: This method delivers a message batch either once or never. This eliminates the risk of resending the same messages, but allows them to be lost. At-Least-Once Message Delivery: This method will not stop until messages are…
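As a rough illustration, the three guarantees are often associated with different producer settings. The keys below (`acks`, `retries`, `enable.idempotence`, `transactional.id`) are standard Kafka producer configuration names, but the values, the ID `example-tx-id`, and the two helper functions are assumptions made for this sketch and deliberately simplify the real behavior.

```python
# Illustrative producer settings per delivery guarantee (assumed values).
DELIVERY_CONFIGS = {
    "at-most-once": {"acks": "0", "retries": "0"},
    "at-least-once": {"acks": "all", "retries": "2147483647"},
    "exactly-once": {
        "acks": "all",
        "enable.idempotence": "true",
        "transactional.id": "example-tx-id",  # hypothetical ID
    },
}


def can_lose_messages(config):
    # Without acknowledgements, a failed send goes unnoticed.
    return config.get("acks") == "0"


def can_duplicate_messages(config):
    # Retrying without idempotence may write the same batch twice.
    retries_enabled = config.get("retries", "1") != "0"
    idempotent = config.get("enable.idempotence") == "true"
    return retries_enabled and not idempotent


summary = {
    mode: (can_lose_messages(cfg), can_duplicate_messages(cfg))
    for mode, cfg in DELIVERY_CONFIGS.items()
}
```

Here `summary` maps each mode to a pair of flags: whether messages may be lost and whether they may be duplicated under this simplified model.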

How to Overcome Data Order Issues in Apache Kafka

Kafka publishes records to a topic, a category or feed name that multiple Kafka consumers can subscribe to and retrieve data from. The Kafka cluster maintains a partitioned log for each topic, with all messages that share the same key sent to the same partition and appended in the order they arrive. In this way, partitions are…
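A small simulation can show why ordering holds within a partition. It assumes records are routed by key, as Kafka's default partitioner does for keyed records; `zlib.crc32` is only a stand-in for the hash Kafka actually uses (murmur2), and the partition count and event data are made up for the example.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # illustrative value


def choose_partition(key, num_partitions=NUM_PARTITIONS):
    # Stand-in hash; Kafka's default partitioner uses murmur2 instead.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


partitions = defaultdict(list)  # partition number -> append-only log
events = [
    ("user-1", "created"),
    ("user-2", "created"),
    ("user-1", "updated"),
    ("user-1", "deleted"),
]

# Each record is appended to the log of the partition chosen by its key.
for key, value in events:
    partitions[choose_partition(key)].append((key, value))

# All records for one key live in one partition, in arrival order.
user1_partition = partitions[choose_partition("user-1")]
user1_log = [value for key, value in user1_partition if key == "user-1"]
```

Because every record for `user-1` hashes to the same partition, `user1_log` keeps the original send order, while no ordering is implied between records in different partitions.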

Difference Between Hadoop 2.x vs Hadoop 3.x

Hadoop's journey started in 2005 with Doug Cutting and Mike Cafarella. It is open-source software built for dealing with large volumes of data. The objective of this article is to familiarize you with the differences between Hadoop 2.x and Hadoop 3.x. Obviously,…

Difference between Hadoop 1 and Hadoop 2

Hadoop is an open-source software framework for storing large amounts of data and performing computation on it. The framework is written mainly in Java, with some native code in C and shell scripts. 1. Components: Hadoop 1 has MapReduce, whereas Hadoop 2 has YARN (Yet Another…

Hadoop Yarn Architecture

Hadoop YARN Introduction: YARN is the main component of Hadoop v2.0. It opens up Hadoop by allowing data stored in HDFS to be processed with batch, stream, interactive, and graph processing. In this way, it helps run different types of distributed applications other than MapReduce. In…