adaptive distributed systems intent-based dynamic consistency in java 21

Log Topology and Sequential Guarantees

Chapter 5 of 25
Summary

Kafka's log topology and idempotent producers ensure message ordering and prevent duplicates.

Kafka’s log topology determines how messages are ordered and processed within the system. A topic's commit log is split into partitions, and under the default partitioner, messages with the same key are routed to the same partition, where they are appended in the order the broker receives them. This is crucial for applications that require strict ordering of messages, such as those modeling business intent, where the sequence of events matters.
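The key-to-partition routing described above can be illustrated with a minimal sketch. It is not Kafka's partitioner: Kafka's default partitioner hashes the serialized key with murmur2, whereas this example substitutes `Arrays.hashCode` to stay dependency-free. The class name, method name, and partition count are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Simplified model of key-based partition selection (not Kafka's actual partitioner).
final class KeyPartitioningSketch {

    // Maps a key to a partition index in [0, numPartitions).
    // Assumption: Arrays.hashCode stands in for the murmur2 hash Kafka uses.
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        // floorMod keeps the result non-negative even when the hash is negative.
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 6; // hypothetical partition count
        for (String key : new String[] {"order-42", "order-7", "order-42"}) {
            System.out.println(key + " -> partition " + partitionFor(key, partitions));
        }
    }
}
```

Because the mapping depends only on the key and the partition count, every record for `order-42` lands in the same partition as long as the partition count does not change.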

Understanding Sequential Guarantees

Sequential guarantees in Kafka refer to the assurance that messages published to a single partition will be read in the exact order they were written. This is particularly important in scenarios where the order of events directly impacts the outcome, such as financial transactions or logical workflows. Kafka achieves this through its partitioning mechanism, where each partition is an ordered, immutable log. The guarantee is per partition: Kafka makes no ordering promise across different partitions of a topic.
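As a rough illustration of the "ordered, immutable log" idea, the following toy model treats one partition as an append-only list whose index serves as the offset. It is a sketch under simplifying assumptions (in-memory, single-threaded, no replication), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of one partition: an append-only list where the index is the offset.
final class PartitionLogSketch {
    private final List<String> records = new ArrayList<>();

    // Appends a record and returns the offset it was assigned.
    long append(String value) {
        records.add(value);
        return records.size() - 1;
    }

    // Returns an immutable snapshot of the records from the given offset onward,
    // in the order they were written.
    List<String> readFrom(long offset) {
        if (offset < 0 || offset > records.size()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        return List.copyOf(records.subList((int) offset, records.size()));
    }

    public static void main(String[] args) {
        PartitionLogSketch log = new PartitionLogSketch();
        log.append("debit");
        log.append("credit");
        log.append("settle");
        System.out.println(log.readFrom(1));
    }
}
```

There is no operation to insert or rewrite an earlier entry, so a reader starting at any offset always sees the records in write order.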

Achieving Idempotence

Idempotence in Kafka producers is a configuration that ensures each message is written exactly once to a partition, even if retries occur. This is achieved by assigning a Producer ID (PID) to the producer and a monotonically increasing sequence number to the messages it sends to each partition. The broker deduplicates messages by tracking the PID and sequence number of each incoming record, thus preventing a retried message from being written to the log a second time.
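The broker-side check can be sketched as follows. This is a deliberately simplified model for a single partition, not the broker's implementation: it tracks only the last accepted sequence number per producer ID, while a real broker works on record batches and keeps more state. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of PID + sequence-number deduplication for one partition.
final class SequenceDedupSketch {
    enum Result { APPENDED, DUPLICATE, OUT_OF_ORDER }

    // Last accepted sequence number per producer ID (absent means none yet).
    private final Map<Long, Integer> lastSequence = new HashMap<>();

    Result offer(long producerId, int sequence) {
        int last = lastSequence.getOrDefault(producerId, -1);
        if (sequence <= last) {
            return Result.DUPLICATE;      // already written: a retry, so skip it
        }
        if (sequence != last + 1) {
            return Result.OUT_OF_ORDER;   // gap detected: reject instead of reordering
        }
        lastSequence.put(producerId, sequence);
        return Result.APPENDED;
    }

    public static void main(String[] args) {
        SequenceDedupSketch partition = new SequenceDedupSketch();
        long pid = 7L; // hypothetical producer ID
        System.out.println(partition.offer(pid, 0)); // first write
        System.out.println(partition.offer(pid, 0)); // retry of the same record
        System.out.println(partition.offer(pid, 1)); // next record
    }
}
```

A retry carries the same PID and sequence number as the original attempt, which is what lets the check recognize it without inspecting the payload.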

To enable idempotence, the enable.idempotence property must be set to true. This setting requires acks=all and retries greater than 0. The acks=all setting provides the strongest durability guarantee: the partition leader waits for all in-sync replicas (ISR) to acknowledge the record before responding to the producer. The cost is higher latency, because each write must be replicated across the ISR before it is acknowledged.
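To make these constraints concrete, here is a small, dependency-free sketch that checks a `Properties` object against them. It is an illustrative helper, not part of the Kafka client. It also checks the limit of 5 in-flight requests that this chapter cites for idempotent ordering. Properties that are absent are skipped rather than assumed, since client defaults differ across versions. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check for the idempotence constraints described in this chapter.
final class IdempotenceConfigCheckSketch {

    // Returns a list of human-readable problems; an empty list means no conflict was found.
    static List<String> violations(Properties props) {
        List<String> problems = new ArrayList<>();
        if (!Boolean.parseBoolean(props.getProperty("enable.idempotence", "false"))) {
            return problems; // idempotence not requested, nothing to check
        }
        String acks = props.getProperty("acks");
        // "-1" is the numeric spelling of "all".
        if (acks != null && !acks.equals("all") && !acks.equals("-1")) {
            problems.add("acks must be all, but was " + acks);
        }
        String retries = props.getProperty("retries");
        if (retries != null && Integer.parseInt(retries) <= 0) {
            problems.add("retries must be greater than 0, but was " + retries);
        }
        String inFlight = props.getProperty("max.in.flight.requests.per.connection");
        if (inFlight != null && Integer.parseInt(inFlight) > 5) {
            problems.add("max.in.flight.requests.per.connection must be at most 5, but was " + inFlight);
        }
        return problems;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("enable.idempotence", "true");
        props.setProperty("acks", "1");
        props.setProperty("retries", "0");
        violations(props).forEach(System.out::println);
    }
}
```

Running a check like this before constructing the producer surfaces conflicting settings as readable messages instead of a startup failure.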

Configuring the Kafka Producer

Configuring the Kafka producer for idempotent writes involves setting several properties. The acks property should be set to all to ensure that the producer waits for all in-sync replicas to acknowledge the write before considering it successful. The retries property should be set to a value greater than 0 to allow for retries in case of failures. Additionally, max.in.flight.requests.per.connection can be set up to 5 while still maintaining strict ordering when enable.idempotence is true.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

// Idempotent configuration for strict per-partition ordering and no duplicates
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE));
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "5");

// Close the producer (for example with try-with-resources) when it is no longer needed
KafkaProducer<String, String> producer = new KafkaProducer<>(props);

Comparison of Acks Settings

The choice of acks setting significantly impacts the latency, durability, and ordering guarantees of the Kafka producer. The following table summarizes the effects of different acks settings:

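| Setting | Latency | Durability | Sequential Guarantee |
|---|---|---|---|
| acks=0 | Lowest | None | Low (loss possible) |
| acks=1 | Medium | Leader log only | High (on single leader) |
| acks=all | Highest | Full ISR set | Maximum |
| enable.idempotence=true | Highest (requires acks=all) | Full ISR set | Guaranteed with up to 5 in-flight requests |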

Conclusion

Applications that require strict message ordering depend on two things: Kafka's per-partition sequential guarantee and an idempotent producer that keeps retries from introducing duplicates. By understanding the log topology, enabling idempotence, and setting acks, retries, and max.in.flight.requests.per.connection consistently, developers can ensure that their Kafka-based systems provide the guarantees their use cases need.

Sources

[1] https://www.linkedin.com/posts/niazullah096_kafka-systemdesign-distributedsystems-activity-7394125962289848320-E4tJ
[2] https://www.geeksforgeeks.org/apache-kafka/apache-kafka-idempotent-producer/