From Déjà Vu to Done Right: Ensuring Event Singularity in Distributed Systems like AWS EventBridge


Have you ever placed a food delivery order and received multiple confirmation messages for the same order? It's confusing and frustrating. Now imagine the same thing happening in a computer system, where a single instruction gets processed multiple times. Welcome to the world of duplicate events, a critical issue to understand when building reliable distributed systems.

But why is this a problem? Event-driven systems, which rely on events to trigger actions in different parts of an architecture, are becoming more common in modern cloud stacks. If you've ever used AWS EventBridge, SQS, SNS, Google Pub/Sub, Firebase, or their Azure equivalents, then you've interacted with systems that can produce duplicate events requiring careful handling.

Achieving exactly-once delivery is a fundamental challenge in distributed systems. Unlike traditional systems, where everything happens in a single, controlled environment, distributed systems involve multiple independent computers working together. This introduces complexities like network failures and node crashes, making it difficult to guarantee that a message is delivered and processed only once by all participants.

For a deeper understanding of these challenges, I highly recommend reading "You Cannot Have Exactly-Once Delivery", which explores the inherent limitations of achieving exactly-once delivery in distributed systems.

The concept of consensus plays a crucial role in distributed systems. Consensus refers to the ability of all nodes to agree on a single, shared state, even in the presence of failures. Exactly-once delivery would mean that an event is processed only once by every participant, despite network failures or node crashes. However, as the article above explains, it is not possible to guarantee both consensus and exactly-once delivery in a distributed system where even a single node can fail.

Now that we understand this behavior, let's explore two popular approaches to ensure reliability in your event-driven systems, even in the face of potential duplicate events.

Option 1: Idempotent Functions

[Diagram: idempotency illustrated as looking at a cake (repeatable without side effects) vs. eating it (not repeatable)]

An idempotent function is one that produces the same outcome regardless of how many times it's executed with the same input. This characteristic proves invaluable in handling duplicate events.

Example: Imagine a Lambda function that updates a user profile setting to "dark mode". With an idempotent function, even if the event triggering the update is received multiple times, the user's profile ends up in the same state: dark mode enabled.
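Here is a minimal sketch of such a handler, assuming a hypothetical DynamoDB users table and an EventBridge event carrying a userId in its detail field (both names are illustrative, not prescribed by EventBridge):

```python
import boto3

dynamodb = boto3.resource("dynamodb")
users = dynamodb.Table("users")  # hypothetical table keyed by userId

def handler(event, context):
    """Set the user's theme to dark mode.

    Writing an absolute value ("dark") rather than toggling the current
    one makes this idempotent: running it once or five times with the
    same event leaves the profile in the same state.
    """
    user_id = event["detail"]["userId"]  # assumed event payload shape
    users.update_item(
        Key={"userId": user_id},
        UpdateExpression="SET theme = :dark",
        ExpressionAttributeValues={":dark": "dark"},
    )
    return {"status": "ok"}
```

Contrast this with a handler that toggles the theme: a toggle flips state on every invocation, so a duplicate event would silently undo the change.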

Beyond Simple Updates: What about operations whose outcome depends on how many times they run, like increasing an account balance? While trickier, idempotency can still be achieved there by implementing a deduplication system alongside the function.

Option 2: Deduplication

Deduplication techniques identify and discard duplicate events before they cause unintended consequences. This is typically achieved by checking a unique identifier (e.g., an event ID) associated with each event against a cache or database, as the sketch after the list below illustrates.

  • Identifying Duplicates: Upon receiving an event, the deduplication system verifies if a matching unique identifier already exists. If found, the event is considered a duplicate and skipped.
  • Processing New Events: If the unique identifier is not found, the event is processed as the first and only occurrence.
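A minimal sketch of this check, assuming a hypothetical DynamoDB table named processed_events keyed by eventId (any store with an atomic "insert if absent" operation would work):

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
processed = dynamodb.Table("processed_events")  # hypothetical dedup table

def is_duplicate(event_id: str) -> bool:
    """Record the event ID atomically and report whether it was seen before.

    The conditional write turns "check, then record" into one atomic
    operation, so two concurrent deliveries of the same event cannot
    both pass the check.
    """
    try:
        processed.put_item(
            Item={"eventId": event_id},
            ConditionExpression="attribute_not_exists(eventId)",
        )
        return False  # first occurrence: safe to process
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return True  # the ID is already recorded: skip this duplicate
        raise
```

In practice you would also put a TTL on the dedup records so the table doesn't grow without bound; duplicates typically arrive within a short window of the original delivery.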

Applying Deduplication: In our account balance example, deduplication plays a crucial role, alongside a locking mechanism to prevent conflicting concurrent updates. Together they ensure that a deposit isn't applied twice because of duplicate events, and that no other operation alters the balance mid-update. One way to combine the two is sketched below.
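Sticking with the assumed DynamoDB tables from the sketches above, a transactional write can provide both guarantees at once: the conditional put on the dedup record acts as the guard, and the balance update only commits if that guard holds. This is one possible design, not the only one:

```python
import boto3

client = boto3.client("dynamodb")

def apply_deposit(event_id: str, account_id: str, amount: int) -> None:
    """Record the event ID and increment the balance in one transaction.

    If the event ID already exists, DynamoDB cancels the entire
    transaction (raising TransactionCanceledException), so a redelivered
    event can never add the same deposit twice.
    """
    client.transact_write_items(
        TransactItems=[
            {
                "Put": {
                    "TableName": "processed_events",
                    "Item": {"eventId": {"S": event_id}},
                    "ConditionExpression": "attribute_not_exists(eventId)",
                }
            },
            {
                "Update": {
                    "TableName": "accounts",
                    "Key": {"accountId": {"S": account_id}},
                    "UpdateExpression": "ADD balance :amt",
                    "ExpressionAttributeValues": {":amt": {"N": str(amount)}},
                }
            },
        ]
    )
```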

Building Robust Systems: Perfect exactly-once delivery remains a theoretical ideal; the realities of unreliable networks and failing nodes put it out of reach in practice. By understanding these limitations and applying strategies like idempotency and deduplication, however, you can build systems that gracefully handle duplicate events, minimizing disruptions and ensuring a smooth user experience.

Key Takeaways:

  • Idempotent functions guarantee the same outcome even with repeated executions for the same input.
  • Deduplication techniques identify and discard duplicate events using unique identifiers.
  • Both approaches, alongside techniques like locking mechanisms, help build reliable event-driven systems.
  • Understanding the limitations of exactly-once delivery allows for the implementation of practical solutions for handling duplicate events.

By incorporating these concepts, you can strengthen your understanding of building reliable distributed systems.

Feel free to send me an email or leave a comment if you think I missed something.