Avoiding Duplicate Operations: How We Solved a Race Condition in Our Payment System
How we used optimistic locking to prevent duplicate operations and maintain data accuracy under high concurrency.
When two processes, the webhook response and the status-check job, handled failed tasks simultaneously, our users sometimes had their account credits restored twice. This post explains how our job processing system encountered a concurrency issue and how we fixed it using optimistic locking.
The Problem
At first glance, our task processing flow looked simple:
- A user submits a task request.
- We forward the request to our external processing service.
- The service sends a webhook with the final status (success or failed).
- Separately, we run a scheduled status-check job to catch missed webhooks.
- If a task fails, we restore the deducted credits to the user's account.
But occasionally, both the webhook and the job would detect a failure at almost the same moment, and both triggered credit restorations. The result? The user's account balance increased twice.
This race condition wasn't frequent, but in systems that track account balances, even one duplicate restoration is unacceptable.
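To make the failure mode concrete, here is a minimal sketch of the unguarded flow, using Python's built-in sqlite3 module. The table layout, the `tasks` table, and the function names are assumptions made for illustration; the post does not show the real schema or handlers. The two handlers are interleaved by hand so that both detect the failure before either one writes.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute(
    "CREATE TABLE tasks (id INTEGER PRIMARY KEY, account_id INTEGER, cost INTEGER, status TEXT)"
)
conn.execute("INSERT INTO accounts VALUES (123, 900)")  # 1000 credits minus a 100-credit task
conn.execute("INSERT INTO tasks VALUES (1, 123, 100, 'processing')")


def detect_failure(task_id):
    """Step 1 of either handler: read the task and decide whether a refund is due."""
    account_id, cost, status = conn.execute(
        "SELECT account_id, cost, status FROM tasks WHERE id = ?", (task_id,)
    ).fetchone()
    return (account_id, cost) if status == "processing" else None


def restore_credits(task_id, pending):
    """Step 2: apply the refund with no guard against a concurrent handler."""
    account_id, cost = pending
    conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (cost, account_id))
    conn.execute("UPDATE tasks SET status = 'failed' WHERE id = ?", (task_id,))


# Worst-case interleaving: both handlers finish step 1 before either runs step 2.
webhook_pending = detect_failure(1)
job_pending = detect_failure(1)
restore_credits(1, webhook_pending)
restore_credits(1, job_pending)

balance = conn.execute("SELECT balance FROM accounts WHERE id = 123").fetchone()[0]
print(balance)  # 1100: the 100 credits were restored twice
```

Each handler is correct on its own; the duplicate only appears when the read of one handler lands between the read and the write of the other.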
The Solution: Optimistic Locking
Instead of introducing complex distributed locks, we implemented a lightweight optimistic locking mechanism at the account record level.
How It Works
Optimistic locking assumes that conflicts are rare but possible. It works like this:
- Each account row has a version number (an integer column).
- When a process wants to update the account balance, it includes the version number it last read as a condition of the update.
- The update query succeeds only if the version hasn't changed.
- If another process has already modified the account, the current update affects no rows and fails gracefully, triggering a retry (after re-reading the current state) or a log entry instead of a duplicate restoration.
Example schema:
ALTER TABLE accounts ADD COLUMN version INT DEFAULT 0 NOT NULL;
Example update query:
UPDATE accounts
SET balance = balance + 100, version = version + 1
WHERE id = 123 AND version = 5;
If no row is affected (because the version has changed), the application knows that another process has already updated the account, and it can skip the restoration instead of duplicating it.
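On the application side, the affected-row check might look like the following sketch. It runs the same UPDATE through Python's sqlite3 module against an in-memory table; the helper names `read_version` and `add_credits_if_unchanged` are hypothetical and not taken from our codebase.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, balance INTEGER NOT NULL, version INT DEFAULT 0 NOT NULL)"
)
conn.execute("INSERT INTO accounts (id, balance, version) VALUES (123, 900, 5)")


def read_version(account_id):
    """Return the version currently stored on the account row."""
    return conn.execute(
        "SELECT version FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()[0]


def add_credits_if_unchanged(account_id, amount, expected_version):
    """Apply the update only if the row still carries the version we read.

    Returns True when the row was updated, False on a version conflict.
    """
    cur = conn.execute(
        "UPDATE accounts SET balance = balance + ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (amount, account_id, expected_version),
    )
    return cur.rowcount == 1  # 0 rows affected means someone else got there first


seen = read_version(123)                           # both attempts use this same read
first = add_credits_if_unchanged(123, 100, seen)   # succeeds and bumps the version
second = add_credits_if_unchanged(123, 100, seen)  # stale version, no row updated

balance, version = conn.execute(
    "SELECT balance, version FROM accounts WHERE id = 123"
).fetchone()
```

One design note: when the update reports a conflict, the caller should re-read the account and task state before deciding what to do. Retrying blindly with the fresh version number would simply apply the credit a second time.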
Benefits
Implementing optimistic locking brought immediate benefits:
- Prevents duplicate operations or incorrect balances even under high concurrency.
- No heavy database locks, which makes it safe for distributed systems and background jobs.
- Easy to implement for any entity where state accuracy is critical.
- Works well with redundant paths, such as the webhook plus scheduled job architecture.
Testing the System
Before deploying, we stress-tested the new logic to ensure it handled real-world concurrency.
Test Scenarios:
- Simulate concurrent task failures from both webhook and job processes using load-testing tools like JMeter or Artillery.
- Only one restoration update should succeed; others should detect a version conflict.
- Check account balances after each test; they must remain consistent.
Expected Outcome:
- One restoration succeeds.
- The second detects a version conflict and exits cleanly.
- Account balance remains correct.
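Alongside load testing, the expected outcome can be pinned down with a small deterministic check that forces the worst-case interleaving instead of relying on timing. The sketch below is an illustration only: it uses an in-memory SQLite database, and the names `make_db`, `try_restore`, and `run_scenario` are hypothetical, not part of our test suite.

```python
import sqlite3


def make_db():
    """Create a fresh account with 900 credits at version 0."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "id INTEGER PRIMARY KEY, balance INTEGER NOT NULL, version INT DEFAULT 0 NOT NULL)"
    )
    conn.execute("INSERT INTO accounts (id, balance) VALUES (123, 900)")
    return conn


def try_restore(conn, account_id, amount, expected_version):
    """Versioned update; returns True only if this caller's write was applied."""
    cur = conn.execute(
        "UPDATE accounts SET balance = balance + ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (amount, account_id, expected_version),
    )
    return cur.rowcount == 1


def run_scenario(write_order):
    """Both handlers read the same version first, then write in the given order."""
    conn = make_db()
    versions = {
        name: conn.execute("SELECT version FROM accounts WHERE id = 123").fetchone()[0]
        for name in ("webhook", "job")
    }
    outcomes = {name: try_restore(conn, 123, 100, versions[name]) for name in write_order}
    balance, version = conn.execute(
        "SELECT balance, version FROM accounts WHERE id = 123"
    ).fetchone()
    return outcomes, balance, version


# Try both orderings: webhook writes first, then job writes first.
results = [run_scenario(order) for order in (("webhook", "job"), ("job", "webhook"))]
for outcomes, balance, version in results:
    assert sum(outcomes.values()) == 1  # exactly one restoration succeeds
    assert balance == 1000              # 900 + 100, restored once
    assert version == 1                 # a single successful write
```

A check like this does not replace the load test, but it documents the guarantee in a form that fails immediately if the version condition is ever dropped from the query.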
Lessons Learned
- Race conditions don't always appear in development, but they are always a risk in distributed systems.
- Optimistic locking provides a clean, scalable safeguard without slowing down operations.
- Monitoring and observability are just as important: logs must clearly show conflicts and retries.
- Small design improvements like this can prevent significant data integrity issues and improve user trust.
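As an illustration of the observability point, a conflict log line could carry enough context to tell which path lost the race. This is a minimal sketch using Python's standard logging module; the logger name, the field names, and the `log_version_conflict` helper are assumptions, not our production format.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("credit_restoration")  # hypothetical logger name


def log_version_conflict(account_id, task_id, source, expected_version):
    """Record a skipped restoration so duplicates that were prevented stay visible."""
    logger.warning(
        "version conflict: account=%s task=%s source=%s expected_version=%s action=skipped",
        account_id,
        task_id,
        source,
        expected_version,
    )


# Example: the status-check job lost the race to the webhook.
log_version_conflict(123, 1, "job", 5)
```

Counting these lines over time also gives a rough measure of how often the two paths actually collide.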
Final Thoughts
In systems that manage account state, concurrency control is just as critical as correctness. By adding a version-based optimistic lock to our account updates, we eliminated duplicate operations without adding latency or operational complexity.
Protize Engineering Team