How do we resolve write conflicts in a leaderless architecture or multi-leader systems?
Resolving write conflicts is critical in leaderless and multi-leader systems because writes are accepted at more than one node, so no single node defines the authoritative order of updates.
There are several ways to resolve them:
1. Last Write Wins (LWW)
How It Works:
Each write is timestamped (logical or physical).
When conflicts occur, the system keeps the version with the most recent timestamp and discards older versions.
Advantages:
Simple to implement and efficient.
Provides deterministic conflict resolution.
Disadvantages:
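Can lose data: of several concurrent writes, only the one with the highest timestamp survives, and the others are silently discarded.
With physical timestamps, correctness depends on clock synchronization (e.g., NTP); clock skew can make an older write win over a newer one.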
Used In:
Cassandra (per-cell write timestamps) and DynamoDB global tables (last writer wins).
Riak (as a configurable option alongside sibling-based resolution).
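The LWW rule described above can be sketched in a few lines. This is a minimal illustration, not any particular database's implementation; the `Versioned` type, the `lww_merge` function, and the `node_id` tie-breaker are assumptions introduced here so that equal timestamps still resolve the same way on every replica.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int  # physical or logical time of the write
    node_id: str    # hypothetical tie-breaker for equal timestamps


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Keep the version with the highest (timestamp, node_id); drop the other."""
    return max(a, b, key=lambda v: (v.timestamp, v.node_id))


a = Versioned("alice@example.com", timestamp=100, node_id="n1")
b = Versioned("alice@work.example", timestamp=105, node_id="n2")

winner = lww_merge(a, b)
print(winner.value)  # the write with timestamp 105 wins; the other is lost
```

Because the merge is a pure maximum, the result does not depend on the order in which replicas see the two writes. The cost is visible in the example: the write with timestamp 100 disappears without any record.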
2. Version Vectors or Vector Clocks
How It Works:
Each replica tracks versions using a vector clock or version vector.
When conflicts occur, versions that cannot be causally ordered are retained.
Applications or users must resolve the conflict manually or using custom logic.
Advantages:
Preserves all conflicting versions, ensuring no data is lost.
Captures causality between operations.
Disadvantages:
More complex to implement and manage.
Requires application-level logic for conflict resolution.
Used In:
Dynamo, Riak, and systems inspired by Dynamo's architecture.
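The causal comparison behind this approach can be sketched as follows. The sketch assumes a vector clock is represented as a dictionary from node name to counter, with missing entries treated as zero; the function name `compare` and its string results are illustrative choices.

```python
def compare(vc_a: dict[str, int], vc_b: dict[str, int]) -> str:
    """Classify the causal relationship between two vector clocks."""
    nodes = vc_a.keys() | vc_b.keys()
    a_le_b = all(vc_a.get(n, 0) <= vc_b.get(n, 0) for n in nodes)
    b_le_a = all(vc_b.get(n, 0) <= vc_a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happened before b, so b supersedes a
    if b_le_a:
        return "after"       # b happened before a, so a supersedes b
    return "concurrent"      # neither dominates: keep both versions


print(compare({"n1": 1}, {"n1": 2}))                      # before
print(compare({"n1": 2, "n2": 1}, {"n1": 1, "n2": 2}))    # concurrent
```

Only the "concurrent" result represents a real conflict. In the other cases one version is a causal descendant of the other, and the older one can be dropped safely.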
3. Application-Assisted Conflict Resolution
How It Works:
The database detects conflicting writes and returns all versions to the client or application.
The application implements custom logic to resolve the conflict based on business rules.
Advantages:
Highly flexible; the application can resolve conflicts in a domain-specific way.
No arbitrary data loss.
Disadvantages:
Shifts complexity to the application layer.
Increases latency as conflicts require application intervention.
Used In:
CouchDB, which picks a deterministic winning revision but keeps the conflicting revisions so that the application can inspect and merge them.
4. Sibling Merging
How It Works:
When conflicts occur, all conflicting versions (siblings) are stored.
The system merges these siblings using a predefined logic, such as:
Union: Combine conflicting values (e.g., shopping cart items).
Custom merge functions: Application-defined merge logic.
Advantages:
Flexible and avoids data loss.
Useful in scenarios like shopping carts or sets, where merging is intuitive.
Disadvantages:
Requires careful design of merge functions.
Storage and read costs grow if siblings accumulate faster than they are resolved.
Used In:
Dynamo (the system described in Amazon's Dynamo paper) and Riak (siblings can be merged via application logic).
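A union merge of shopping-cart siblings, as mentioned above, might look like the following sketch. It assumes each sibling is simply a set of item names; the function name `merge_carts` is hypothetical.

```python
def merge_carts(siblings: list[set[str]]) -> set[str]:
    """Merge conflicting cart versions by taking the union of their items."""
    merged: set[str] = set()
    for cart in siblings:
        merged |= cart
    return merged


sibling_a = {"book", "pen"}
sibling_b = {"book", "lamp"}
print(sorted(merge_carts([sibling_a, sibling_b])))  # ['book', 'lamp', 'pen']
```

A plain union never loses an added item, but it can bring back an item that one client removed while another client still had it in the cart. Handling removals correctly requires extra metadata, which is one reason merge functions need careful design.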
5. Operational Transformation (OT)
How It Works:
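Designed for collaborative editing: each concurrent operation is transformed against the operations it did not know about, so that all replicas can apply the operations in different orders and still reach the same state.
For example, if two users insert text into the same document at the same time, the position of one insertion is shifted to account for the other, preserving the intent of both edits.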
Advantages:
Ideal for real-time collaboration.
Ensures consistency without losing intent.
Disadvantages:
Complex to implement.
Best suited for specific use cases like text or structured document editing.
Used In:
Google Docs and other real-time collaborative editors.
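The transformation idea can be illustrated with the simplest case, two concurrent text insertions. This is a minimal sketch under strong assumptions: only insert operations exist, only two sites edit concurrently, and the `Insert` type, the `site` tie-breaker, and the `transform` and `apply` functions are names introduced here. Production OT systems also handle deletes and longer operation histories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: int  # hypothetical site id, used to break ties at equal positions


def transform(op: Insert, against: Insert) -> Insert:
    """Adjust `op` so it can be applied after `against` has been applied."""
    if against.pos < op.pos or (against.pos == op.pos and against.site < op.site):
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


doc = "helo"
op_a = Insert(3, "l", site=1)  # user A fixes the typo
op_b = Insert(4, "!", site=2)  # user B appends punctuation

replica_1 = apply(apply(doc, op_a), transform(op_b, op_a))
replica_2 = apply(apply(doc, op_b), transform(op_a, op_b))
print(replica_1, replica_2)  # hello! hello!
```

Each replica applies its local operation first and the transformed remote operation second, yet both end with the same text.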
6. Conflict-Free Replicated Data Types (CRDTs)
How It Works:
Data structures are designed to resolve conflicts automatically using mathematical properties.
For example:
G-Counter: Grow-only counters that always converge.
LWW-Register: Registers using timestamps to resolve conflicts.
Sets and Maps: Grow-only sets merge by union; variants that support removal (e.g., OR-Set) carry extra metadata to tell an add from a remove.
Advantages:
Automatic resolution with guaranteed convergence.
No need for manual conflict resolution or application logic.
Disadvantages:
Limited to certain data structures (e.g., counters, sets, maps).
May require redesign of application logic to use CRDTs.
Used In:
Riak (built-in data types such as counters, sets, and maps).
Collaborative tools and eventually consistent databases.
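The G-Counter mentioned above is small enough to sketch in full. The class below is an illustrative state-based version, not taken from any specific library: each node increments only its own slot, and merging takes the per-node maximum.

```python
class GCounter:
    """Grow-only counter: one non-decreasing slot per node."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts: dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        # A node only ever writes to its own slot.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Per-node maximum is commutative, associative, and idempotent.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


a, b = GCounter("n1"), GCounter("n2")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 5 5
```

Because the merge is idempotent, replicas can exchange state repeatedly or in any order and still converge on the same total.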
7. Custom Conflict Resolution Policies
How It Works:
The database allows users to define custom conflict resolution logic, such as:
Prioritized writes: Always prefer writes from specific nodes.
Weighted writes: Resolve conflicts based on predefined weights or roles.
Application rules: Resolve based on domain-specific requirements (e.g., sum the values, pick the max, etc.).
Advantages:
Fully customizable to meet application requirements.
Provides deterministic resolution for specific use cases.
Disadvantages:
Requires detailed understanding of application behavior.
Complexity in implementation.
Used In:
Riak and other configurable distributed databases.
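The policies listed above can be modeled as interchangeable resolver functions. The sketch below is hypothetical throughout: the `Write` type, the node names in `NODE_PRIORITY`, and the policy names are invented for illustration and do not correspond to any database's configuration API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    node: str
    value: int


# Hypothetical priorities: a lower number means a more trusted node.
NODE_PRIORITY = {"primary-dc": 0, "secondary-dc": 1}


def prefer_priority_node(writes: list[Write]) -> int:
    # Unknown nodes rank last; the node name breaks ties deterministically.
    lowest = len(NODE_PRIORITY)
    best = min(writes, key=lambda w: (NODE_PRIORITY.get(w.node, lowest), w.node))
    return best.value


def pick_max(writes: list[Write]) -> int:
    return max(w.value for w in writes)


def sum_values(writes: list[Write]) -> int:
    return sum(w.value for w in writes)


POLICIES = {"priority": prefer_priority_node, "max": pick_max, "sum": sum_values}


def resolve(policy: str, writes: list[Write]) -> int:
    return POLICIES[policy](writes)


conflict = [Write("secondary-dc", 7), Write("primary-dc", 3)]
print(resolve("priority", conflict), resolve("max", conflict), resolve("sum", conflict))
```

Keeping each policy a pure function of the conflicting writes makes the outcome deterministic: every replica that sees the same set of writes reaches the same result.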
8. Tombstones for Deletes
How It Works:
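A delete does not remove the data immediately; it writes a tombstone, a marker recording that the item was deleted and when.
When a delete conflicts with an update, the system compares the tombstone with the update using timestamps or other metadata, so a replica that missed the delete cannot bring the old value back.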
Advantages:
Handles conflicts between writes and deletes effectively.
Ensures eventual consistency even in delete scenarios.
Disadvantages:
Requires periodic cleanup of tombstones to limit storage and read overhead; purging a tombstone before every replica has seen it can let deleted data reappear.
Used In:
Cassandra and Riak.
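The interaction between tombstones, updates, and cleanup can be sketched as follows. The `Cell` type, the rule that a tombstone wins a timestamp tie, and the `grace` parameter are assumptions of this sketch rather than the behavior of a specific database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    timestamp: int
    value: Optional[str] = None
    deleted: bool = False  # True marks this cell as a tombstone


def merge(a: Cell, b: Cell) -> Cell:
    # Newest timestamp wins; on a tie the tombstone wins (an assumption here),
    # and the value is a final tie-breaker so the result is deterministic.
    return max(a, b, key=lambda c: (c.timestamp, c.deleted, c.value or ""))


def purge_tombstones(store: dict[str, Cell], now: int, grace: int) -> dict[str, Cell]:
    # Drop tombstones older than the grace period; live cells are kept.
    return {
        key: cell
        for key, cell in store.items()
        if not (cell.deleted and now - cell.timestamp > grace)
    }


late_update = Cell(timestamp=11, value="v2")
tombstone = Cell(timestamp=12, deleted=True)
print(merge(late_update, tombstone).deleted)  # True: the delete is not undone

store = {"k1": tombstone, "k2": Cell(timestamp=5, value="live")}
print(sorted(purge_tombstones(store, now=100, grace=50)))  # ['k2']
```

The grace period is the trade-off noted above: a longer one costs storage, while a shorter one risks purging a tombstone before a lagging replica has received it.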
Choosing the Right Strategy
The choice depends on the application requirements:
LWW: Suitable for scenarios where the latest update is always the most important (e.g., caching, metadata updates).
Vector Clocks or Application Logic: Useful in systems with complex business rules or frequent concurrent updates.
CRDTs or OT: Best for collaborative applications requiring automatic conflict resolution.
Sibling Merging: Ideal for aggregating or combining data (e.g., shopping carts).
Each strategy has trade-offs, and many systems combine multiple approaches to handle conflicts effectively.