In my last blog about cache invalidation in microservices, I identified a common problem I needed to learn more about - partial failures. What happens when your database update succeeds but your event publish fails? Or vice versa? You end up with inconsistent state across your system.
The solution I kept encountering was the transactional outbox pattern. Learning more, I realized it solves the dual write problem - but it introduces a new challenge that requires understanding idempotency. Let’s explore both.
The Dual Write Problem
The issue occurs whenever you need to do two things atomically but they’re in different systems. Here’s a concrete example from AWS’s documentation that really clarified it for me:
Imagine a flight booking service. When a customer books a flight:
- You need to save the booking in your database
 - You need to send an event to the payment service to charge their card
 
function bookFlight(flightData) {
  // Write 1: Database
  db.saveBooking(flightData);
  
  // Write 2: Message queue
  messageQueue.publish('flight.booked', flightData);
}
What could go wrong?
Scenario 1: Database succeeds, message fails
db.saveBooking() → SUCCESS
messageQueue.publish() → FAIL (network issue)
Flight is booked in your database, but payment service never gets notified. Customer gets a free flight, you lose money.
Scenario 2: Message succeeds, database fails
db.saveBooking() → FAIL (constraint violation)
messageQueue.publish() → SUCCESS
Payment service charges the customer, but no booking exists. Customer gets charged for nothing, you get angry customers.
Scenario 3: Database succeeds, service crashes before publishing
db.saveBooking() → SUCCESS
*service crashes*
messageQueue.publish() → NEVER HAPPENS
Same as scenario 1 - inconsistent state.
This is the same problem I encountered with cache invalidation - you can’t wrap database and message queue operations in a single transaction because they’re different systems.
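To make Scenario 1 concrete, here is a minimal in-memory sketch. The `bookings` and `published` arrays and the `failingPublish` helper are hypothetical stand-ins for a real database and broker, not any actual client API.

```javascript
// Hypothetical in-memory stand-ins for a database table and a message broker.
const bookings = [];
const published = [];

function failingPublish(topic, payload) {
  // Simulate a network failure between the service and the broker.
  throw new Error('broker unreachable');
}

function bookFlightDualWrite(flightData, publish) {
  bookings.push(flightData);            // Write 1 succeeds and is already durable
  publish('flight.booked', flightData); // Write 2 throws, and nothing undoes write 1
}

let publishError = null;
try {
  bookFlightDualWrite({ id: 123, price: 500 }, failingPublish);
} catch (error) {
  publishError = error;
}

console.log(bookings.length);  // booking rows saved
console.log(published.length); // events delivered
```

The booking is stored but no event ever left the service. Catching the error and deleting the booking does not fix this in general, because the compensating delete can fail too.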
The Transactional Outbox Solution
The key insight: instead of publishing to the message queue directly, write the event to a table in your database in the same transaction as your data.
Here’s how it works:
function bookFlight(flightData) {
  db.transaction(() => {
    // Write to main table
    db.flights.insert({
      id: flightData.id,
      passenger: flightData.passenger,
      destination: flightData.destination,
      price: flightData.price
    });
    
    // Write to outbox table (same transaction!)
    db.outbox.insert({
      aggregateId: flightData.id,
      eventType: 'flight.booked',
      payload: JSON.stringify(flightData),
      createdAt: Date.now()
    });
  });
}
Now both writes succeed or both fail together. You have atomicity.
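The all-or-nothing behavior can be sketched with a toy in-memory transaction. The `transaction` helper and the negative-price check are assumptions made for illustration. A real database provides rollback natively.

```javascript
// Hypothetical in-memory "database" with a very small transaction helper.
const db = { flights: [], outbox: [] };

function transaction(work) {
  // Snapshot both tables so a failure can restore them together.
  const snapshot = { flights: [...db.flights], outbox: [...db.outbox] };
  try {
    work();
  } catch (error) {
    db.flights = snapshot.flights;
    db.outbox = snapshot.outbox;
    throw error;
  }
}

function bookFlight(flightData) {
  transaction(() => {
    db.flights.push(flightData);
    if (flightData.price < 0) {
      // Stand-in for any failure that happens mid-transaction.
      throw new Error('constraint violation: negative price');
    }
    db.outbox.push({ aggregateId: flightData.id, eventType: 'flight.booked' });
  });
}

bookFlight({ id: 1, price: 500 });
try {
  bookFlight({ id: 2, price: -1 });
} catch (error) {
  // The failed booking leaves no flight row and no outbox row behind.
}
console.log(db.flights.length, db.outbox.length);
```

After both calls, each table holds exactly one row: the successful booking and its event.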
But with the event just sitting in a database table, how does it get to the payment service?
The Event Processor
A separate background service (often called a relay or publisher) polls the outbox table and publishes events:
// Runs periodically (e.g., every 100ms)
function processOutbox() {
  const events = db.outbox.findUnprocessed({ batchSize: 100 });
  
  for (const event of events) {
    try {
      // Publish to message queue
      messageQueue.publish(event.eventType, event.payload);
      
      // Mark as processed or delete
      db.outbox.delete(event.id);
    } catch (error) {
      // Log error, will retry on next poll
      logger.error('Failed to publish event', event.id, error);
    }
  }
}
This processor can retry failures, handle backpressure, and guarantee eventual delivery. If it crashes, it just resumes polling when it restarts.
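Here is a minimal runnable sketch of the retry behavior, assuming an in-memory outbox and a hypothetical `flakyPublish` function that fails only on its first call.

```javascript
// Hypothetical in-memory outbox and a broker that fails on its first call.
const outbox = [
  { id: 1, eventType: 'flight.booked', payload: '{"id":123}' },
  { id: 2, eventType: 'flight.booked', payload: '{"id":124}' },
];
const delivered = [];
let publishAttempts = 0;

function flakyPublish(eventType, payload) {
  publishAttempts += 1;
  if (publishAttempts === 1) {
    throw new Error('temporary network issue');
  }
  delivered.push(payload);
}

function processOutboxOnce() {
  // Iterate over a copy because successful events are removed from the outbox.
  for (const event of [...outbox]) {
    try {
      flakyPublish(event.eventType, event.payload);
      outbox.splice(outbox.indexOf(event), 1); // delete only after a successful publish
    } catch (error) {
      // Leave the row in place; the next poll picks it up again.
    }
  }
}

processOutboxOnce(); // event 1 fails, event 2 is delivered
const remainingAfterFirstPoll = outbox.length;
processOutboxOnce(); // event 1 is retried and delivered
console.log(remainingAfterFirstPoll, outbox.length, delivered.length);
```

Note that this loop keeps going after a failure, so event 2 reaches the broker before event 1. If order matters, the processor should stop at the first failed event instead.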
Why This Works
Separation of concerns:
- Your application logic only cares about the database transaction (simple, fast, atomic)
 - The event processor handles the messy details of reliable message delivery (retries, failures, ordering)
 
If the flight booking fails, nothing gets written - not even to outbox. If it succeeds, the event is guaranteed to be in the outbox and will eventually be published.
Implementation Approaches
There are two main ways to implement this:
1. Outbox Table with Polling
What I showed above. You create an explicit outbox table and poll it.
Pros:
- Works with any database
 - Simple to understand
 - Full control over retry logic and ordering
 - Easy to debug (events visible in database)
 
Cons:
- Need to run a separate polling service
 - Polling adds latency (though usually 100-500ms is acceptable)
 - Outbox table can grow large (need cleanup strategy)
 
2. Change Data Capture (CDC)
Instead of polling, use your database’s transaction log. Tools like Debezium or AWS DynamoDB Streams read the log and automatically publish events.
// Your application code - no explicit outbox!
function bookFlight(flightData) {
  db.flights.insert({
    id: flightData.id,
    passenger: flightData.passenger,
    destination: flightData.destination,
    price: flightData.price,
    // Add event metadata as attributes
    eventType: 'flight.booked'
  });
}
 
// CDC tool watches the transaction log
// and automatically publishes changes
Pros:
- No polling service needed
 - Lower latency (near real-time)
 - No additional outbox table
 - Guaranteed ordering (follows transaction log order)
 
Cons:
- Requires CDC-capable database or external tool
 - More complex infrastructure
 - Harder to debug (events aren’t explicitly stored)
 - Need to filter out which changes should become events
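The filtering step from the last point might look roughly like this. The `{ table, op, after }` record shape is a simplified, hypothetical one. Real CDC tools each define their own change format.

```javascript
// Hypothetical, simplified change records; real CDC tools use their own formats.
const changes = [
  { table: 'flights', op: 'insert', after: { id: 123, price: 500 } },
  { table: 'flights', op: 'update', after: { id: 123, price: 450 } },
  { table: 'sessions', op: 'insert', after: { id: 'abc' } },
];

// Assumption: only newly inserted flights should become 'flight.booked' events.
function toEvent(change) {
  if (change.table === 'flights' && change.op === 'insert') {
    return { eventType: 'flight.booked', payload: change.after };
  }
  return null; // every other change stays internal
}

const events = changes.map(toEvent).filter((event) => event !== null);
console.log(events);
```

Only the first change becomes an event. The price update and the session row never leave the service.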
 
When to Use Transactional Outbox
Use it when:
- You need to update a database AND publish an event atomically
 - Data consistency is critical (payments, orders, inventory)
 - You’re building event-driven microservices
 - You want reliable event delivery with retries
 
Skip it when:
- Your events don’t need to be guaranteed (nice-to-have notifications)
 - You can tolerate occasional lost events
 - The complexity isn’t worth it for your use case (simple CRUD apps)
 - You can use simpler patterns (like TTL-based cache invalidation)
 
The Tradeoffs
Complexity vs Reliability: You’re adding infrastructure (outbox table, processor service, or CDC tooling). Is the reliability worth it?
Latency: Polling adds 100-500ms delay. CDC is faster but more complex. Can your system tolerate this?
Storage: Outbox tables grow. You need a cleanup strategy (delete after processing, archive old events, TTL).
Ordering: Do events need to be processed in order? Outbox can guarantee this with sequence numbers. Message queues might not.
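For the ordering tradeoff, one possible approach is to publish in sequence order and stop at the first failure. In this sketch the `seq` field stands in for an auto-incrementing column, and the `flight.updated` and `flight.cancelled` event types are hypothetical.

```javascript
// Hypothetical outbox rows; `seq` stands in for an auto-incrementing column.
const outboxRows = [
  { seq: 3, eventType: 'flight.cancelled', aggregateId: 123 },
  { seq: 1, eventType: 'flight.booked', aggregateId: 123 },
  { seq: 2, eventType: 'flight.updated', aggregateId: 123 },
];

function publishInOrder(rows, publish) {
  const ordered = [...rows].sort((a, b) => a.seq - b.seq);
  const publishedSeqs = [];
  for (const row of ordered) {
    try {
      publish(row);
    } catch (error) {
      break; // stop here so later events never overtake the failed one
    }
    publishedSeqs.push(row.seq);
  }
  return publishedSeqs;
}

const sent = [];
const result = publishInOrder(outboxRows, (row) => {
  if (row.seq === 3) throw new Error('broker unavailable');
  sent.push(row.eventType);
});
console.log(result, sent);
```

The cost of stopping early is that one stuck event blocks everything behind it, so this only makes sense when order really matters.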
Enter Idempotency: The Second Part of the Puzzle
Here’s where it gets interesting. The transactional outbox pattern guarantees events will be delivered at least once. But notice those words: “at least once.”
What if the event processor publishes an event successfully, but crashes before it can delete it from the outbox? On restart, it will publish the same event again.
Or what if the message queue itself delivers the message twice? Many message systems (like AWS SQS standard queues) guarantee “at least once delivery” but not “exactly once.”
So now your payment service receives the flight.booked event twice:
Event 1: flight.booked { id: 123, price: 500 }
Event 2: flight.booked { id: 123, price: 500 } // duplicate!
If your payment service naively processes both, you just charged the customer twice. Not good.
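The crash-between-publish-and-delete case is easy to reproduce in a small simulation. The `relayOnce` function and its `crashBeforeDelete` flag are hypothetical and exist only to force the failure.

```javascript
// Hypothetical in-memory outbox and broker.
const pendingOutbox = [
  { id: 1, eventType: 'flight.booked', payload: { id: 123, price: 500 } },
];
const brokerMessages = [];

function relayOnce(crashBeforeDelete) {
  for (const event of [...pendingOutbox]) {
    brokerMessages.push(event.payload); // publish succeeds
    if (crashBeforeDelete) {
      throw new Error('relay crashed'); // simulated crash: the delete never runs
    }
    pendingOutbox.splice(pendingOutbox.indexOf(event), 1);
  }
}

try {
  relayOnce(true); // first run: publish, then crash
} catch (error) {
  // The process dies here; the outbox row is still present.
}
relayOnce(false); // after restart: the same row is published again

console.log(brokerMessages.length, pendingOutbox.length);
```

The outbox ends up empty, but the broker holds two copies of the same event.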
What is Idempotency?
Idempotency means: processing the same operation multiple times has the same effect as processing it once.
Mathematical example: abs(-5) = 5 and abs(abs(-5)) = 5 - applying it twice gives the same result.
For our payment service, it means: receiving the flight.booked event twice should only charge the customer once.
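A quick way to see the difference in code: the `increment` and `addCharge` functions below are made up for illustration, contrasting an operation that accumulates with one that does not.

```javascript
// Non-idempotent: every call changes the result again.
function increment(counter) {
  return counter + 1;
}

// Idempotent: adding the same member twice leaves the set as it was after the first add.
function addCharge(chargedFlights, flightId) {
  return new Set(chargedFlights).add(flightId);
}

const once = addCharge(new Set(), 123);
const twice = addCharge(once, 123);

console.log(increment(increment(0)) === increment(0)); // false
console.log(twice.size === once.size);                 // true
console.log(Math.abs(Math.abs(-5)) === Math.abs(-5));  // true
```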
Making Your Services Idempotent
There are several strategies:
1. Idempotency Keys
Track which events you’ve already processed:
function processFlightBooked(event) {
  const idempotencyKey = event.id; // or event-specific unique ID
  
  db.transaction(() => {
    // Check if we've seen this before (inside the transaction)
    const alreadyProcessed = db.processedEvents.exists(idempotencyKey);
    if (alreadyProcessed) {
      logger.info('Event already processed, skipping', idempotencyKey);
      return; // Skip duplicate
    }
    
    // Process the payment
    payments.charge(event.passenger, event.price);
    
    // Mark as processed (same transaction!)
    db.processedEvents.insert({
      eventId: idempotencyKey,
      processedAt: Date.now()
    });
  });
}
The key insight: checking and marking as processed happens in the same transaction as your business logic.
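A variation worth sketching: instead of checking first, claim the key with an insert and let a unique constraint reject duplicates. Everything here is an in-memory assumption. `insertProcessedEvent` imitates a table with a unique constraint on `eventId`, and `charges` stands in for the payment write.

```javascript
// Hypothetical in-memory table with a unique constraint on eventId.
const processedEvents = new Set();
const charges = [];

function insertProcessedEvent(eventId) {
  if (processedEvents.has(eventId)) {
    throw new Error('unique constraint violation: ' + eventId);
  }
  processedEvents.add(eventId);
}

function processFlightBooked(event) {
  try {
    insertProcessedEvent(event.id); // claim the key first
  } catch (error) {
    return false; // duplicate: this event was already claimed
  }
  // Assumption: in a real database this write shares a transaction with the insert above.
  charges.push({ passenger: event.passenger, amount: event.price });
  return true;
}

const event = { id: 123, passenger: 'alice', price: 500 };
const firstResult = processFlightBooked(event);
const secondResult = processFlightBooked(event); // duplicate delivery
console.log(firstResult, secondResult, charges.length);
```

Single-threaded JavaScript cannot show the race, but with several consumer instances the constraint is what stops two of them from both passing an "exists" check at the same moment.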
2. Natural Idempotency
Design your operations to be naturally idempotent:
// NOT idempotent
function processPayment(event) {
  const currentBalance = db.getBalance(event.passenger);
  db.setBalance(event.passenger, currentBalance - event.price);
}
// Called twice: balance decreases twice!
 
// Idempotent
function processPayment(event) {
  db.payments.upsert({
    flightId: event.id,
    passenger: event.passenger,
    amount: event.price,
    status: 'completed'
  });
}
// Called twice: same record updated, same result
Using unique constraints (like flightId) makes the operation naturally idempotent.
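The upsert behavior can be modeled with a `Map` keyed by `flightId`. This is a minimal sketch in which the `Map` is an assumed stand-in for a table with a unique key.

```javascript
// Hypothetical payments table keyed by flightId, modeled as a Map.
const payments = new Map();

function upsertPayment(event) {
  // Same key on every delivery, so a duplicate overwrites instead of adding a row.
  payments.set(event.id, {
    flightId: event.id,
    passenger: event.passenger,
    amount: event.price,
    status: 'completed',
  });
}

const bookedEvent = { id: 123, passenger: 'alice', price: 500 };
upsertPayment(bookedEvent);
upsertPayment(bookedEvent); // duplicate delivery

console.log(payments.size);
```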
3. Versioning
Include a version number in your events:
{
  flightId: 123,
  version: 5,
  price: 500
}
Only process events with version > current version. Duplicates with the same version are ignored.
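A sketch of the version check, assuming the consumer remembers the last applied version per flight. The `currentVersions` and `prices` maps are hypothetical in-memory state.

```javascript
const currentVersions = new Map(); // flightId -> last applied version
const prices = new Map();          // flightId -> latest price

function applyIfNewer(event) {
  const current = currentVersions.get(event.flightId) ?? 0;
  if (event.version <= current) {
    return false; // duplicate or stale event, ignore it
  }
  prices.set(event.flightId, event.price);
  currentVersions.set(event.flightId, event.version);
  return true;
}

const results = [
  applyIfNewer({ flightId: 123, version: 5, price: 500 }), // applied
  applyIfNewer({ flightId: 123, version: 5, price: 500 }), // duplicate
  applyIfNewer({ flightId: 123, version: 4, price: 450 }), // stale
  applyIfNewer({ flightId: 123, version: 6, price: 550 }), // applied
];
console.log(results, prices.get(123));
```

This also drops events that arrive late and out of order, which is a side benefit the idempotency-key approach does not give you.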
The Combined Pattern
Here’s how transactional outbox and idempotency work together:
Publisher Side (Flight Service):
function bookFlight(flightData) {
  db.transaction(() => {
    db.flights.insert(flightData);
    
    // Outbox with unique event ID
    db.outbox.insert({
      eventId: generateUUID(), // unique per event
      aggregateId: flightData.id,
      eventType: 'flight.booked',
      payload: flightData
    });
  });
}
 
// Processor publishes with eventId
function processOutbox() {
  const events = db.outbox.findUnprocessed();
  for (const event of events) {
    messageQueue.publish({
      eventId: event.eventId, // included in message
      eventType: event.eventType,
      payload: event.payload
    });
    db.outbox.delete(event.id);
  }
}
Consumer Side (Payment Service):
function handleFlightBooked(message) {
  const eventId = message.eventId;
  
  db.transaction(() => {
    // Idempotency check (inside the transaction)
    if (db.processedEvents.exists(eventId)) {
      return; // Already processed
    }
    
    // Process payment
    payments.charge(message.payload.passenger, message.payload.price);
    
    // Mark as processed
    db.processedEvents.insert({ eventId, processedAt: Date.now() });
  });
}
Now you have:
- Guaranteed delivery (transactional outbox)
 - No duplicate processing (idempotency keys)
 - Consistency (everything in transactions)
 
Practical Considerations
How long to keep processed events? You need to keep idempotency keys longer than the maximum time a duplicate could arrive. If your message queue can retain messages for up to 14 days (the AWS SQS maximum retention period), keep keys for 15 days. Then clean them up.
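The cleanup itself can be a small periodic job. This sketch assumes keys live in a `Map` of `eventId` to `processedAt` timestamp. In a real system this would be a delete query or a TTL feature of the store.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;
const RETENTION_MS = 15 * DAY_MS; // one day longer than the 14-day queue retention

function purgeExpiredKeys(processedEvents, now) {
  let removed = 0;
  for (const [eventId, processedAt] of processedEvents) {
    if (now - processedAt > RETENTION_MS) {
      processedEvents.delete(eventId); // safe while iterating a Map
      removed += 1;
    }
  }
  return removed;
}

// Usage with a fixed "now" so the result does not depend on the clock.
const now = 100 * DAY_MS;
const keys = new Map([
  ['evt-old', now - 16 * DAY_MS],
  ['evt-recent', now - 1 * DAY_MS],
]);
const removedCount = purgeExpiredKeys(keys, now);
console.log(removedCount, [...keys.keys()]);
```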
What about ordering? Transactional outbox preserves order (via sequence numbers or timestamps). But if you scale to multiple consumer instances, they might process events in parallel. If order matters, use message queue features like Kafka partitions or SQS FIFO queues.
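The idea behind partition-based ordering is that all events for one aggregate land on the same partition, so a single consumer sees them in order. The `partitionFor` function below is a simple illustrative hash, not the partitioner any particular broker uses.

```javascript
// Assumption: a simple string hash; real brokers use their own partitioners.
function partitionFor(aggregateId, partitionCount) {
  const key = String(aggregateId);
  let hash = 0;
  for (let i = 0; i < key.length; i += 1) {
    hash = (hash * 31 + key.charCodeAt(i)) >>> 0; // keep it an unsigned 32-bit integer
  }
  return hash % partitionCount;
}

// Every event for flight 123 maps to the same partition.
const first = partitionFor(123, 4);
const second = partitionFor(123, 4);
console.log(first === second);
```

Events for different flights can still be processed in parallel on other partitions, which is usually the ordering guarantee you actually need.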
Performance impact? Checking idempotency keys adds a database lookup per event. This is usually fast with proper indexing, but it’s not free. For high-throughput systems, consider caching recently processed keys in memory.
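A rough sketch of the in-memory cache idea follows. The `store` object is a hypothetical stand-in for the processed-events table that counts lookups, and `MAX_CACHED_KEYS` is an arbitrary limit.

```javascript
// Hypothetical store standing in for the processedEvents table; it counts lookups.
const store = { keys: new Set(['evt-1']), lookups: 0 };

function existsInStore(eventId) {
  store.lookups += 1;
  return store.keys.has(eventId);
}

const MAX_CACHED_KEYS = 1000;
const recentKeys = new Set(); // insertion-ordered, so the first entry is the oldest

function isProcessed(eventId) {
  if (recentKeys.has(eventId)) return true; // cache hit, no database call
  const found = existsInStore(eventId);
  if (found) {
    recentKeys.add(eventId);
    if (recentKeys.size > MAX_CACHED_KEYS) {
      recentKeys.delete(recentKeys.values().next().value); // evict the oldest key
    }
  }
  return found;
}

isProcessed('evt-1'); // goes to the store
isProcessed('evt-1'); // served from memory
isProcessed('evt-2'); // unknown keys always go to the store
console.log(store.lookups);
```

Only positive answers are cached. A "not processed" answer can become stale the moment another instance handles the event, so the database stays the source of truth for those.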
What I Learned
The transactional outbox pattern elegantly solves the dual write problem by turning two separate writes into a single atomic database transaction. But reliability comes with a consequence: duplicate events.
Idempotency is the natural complement - it ensures that duplicate delivery doesn’t cause duplicate effects. Together, these patterns form the foundation of reliable event-driven systems.
The key insight is that these aren’t optional features you add later. If you’re building microservices with events, you need to think about both from the start:
- How do I guarantee my events are published? (Outbox)
 - What happens if they’re delivered twice? (Idempotency)
 
Like cache invalidation, this reinforced that distributed systems are all about handling failure modes. You can’t prevent failures, so you design systems that work correctly even when failures happen.
This connects directly back to my cache invalidation blog - transactional outbox solves the partial failure problem I identified there. When you update data and need to invalidate caches across services, outbox ensures your invalidation events are reliably delivered, and idempotency ensures duplicate events don’t cause problems.
Cheers!