Avoiding Double Refunds: How We Solved a Race Condition in Our Payout System
How we used optimistic locking to prevent double refunds and maintain wallet accuracy under high concurrency.
When two processes, the webhook handler and the status-check job, handled the same failed payout simultaneously, our users were sometimes refunded twice. This post explains how our payout system ran into this concurrency issue and how we fixed it using optimistic locking.
🔥 The Problem
At first glance, our payout flow looked simple:
- A user requests a payout.
- We send the request to our acquirer.
- The acquirer sends a webhook with the final status (success or failed).
- Separately, we run a scheduled status-check job to catch missed webhooks.
- If a payout fails, we refund the amount to the user's wallet.
But occasionally, the webhook and the job detected the same failure at almost the same moment, and each triggered a refund. The result? The user's wallet was credited twice 💸💸.
This race condition was rare, but in financial systems even one duplicate refund is unacceptable.
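To make the failure mode concrete, here is a small, self-contained reproduction of the check-then-act race. It is a sketch, not our production code: plain dicts stand in for database rows, the names (wallet, payout, handle_failed_payout) are illustrative, and a threading.Barrier forces the unlucky interleaving on every run.

```python
import threading

# Illustrative in-memory state; in production these would be database rows.
wallet = {"balance": 0}
payout = {"amount": 100, "status": "FAILED", "refunded": False}
refunds: list[str] = []  # which handlers actually credited the wallet

# Forces both handlers to finish their check before either one writes.
both_checked = threading.Barrier(2)
write_lock = threading.Lock()  # makes each individual write atomic, nothing more

def handle_failed_payout(source: str) -> None:
    # Check: both the webhook and the job see a failed, not-yet-refunded payout.
    should_refund = payout["status"] == "FAILED" and not payout["refunded"]
    both_checked.wait()
    # Act: nothing re-validates the earlier check, so both handlers credit.
    if should_refund:
        with write_lock:
            wallet["balance"] += payout["amount"]
            payout["refunded"] = True
            refunds.append(source)

threads = [threading.Thread(target=handle_failed_payout, args=(name,))
           for name in ("webhook", "status-check job")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(wallet["balance"], refunds)  # balance is 200: refunded twice
```

Note that the lock around the write does not help: the check and the write are separate steps, and that gap is exactly what a version-guarded update closes.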
⚙️ The Solution: Optimistic Locking
Instead of introducing complex distributed locks, we implemented a lightweight optimistic locking mechanism at the wallet level.
How It Works
Optimistic locking assumes that conflicts are rare but possible. It works like this:
- Each wallet row has a version number (an integer column).
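- When a process wants to update the wallet balance, it includes the version number it last read in the update's condition.
- The update succeeds only if the version hasn't changed since that read, and it increments the version in the same write.
- If another process modified the wallet in the meantime, the update affects zero rows and fails gracefully. The process logs the conflict and, before any retry, re-checks whether the refund was already applied instead of repeating it.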
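The read-check-write cycle above can be sketched in Python with the standard library's sqlite3 module, reusing the same conditional UPDATE shape as the query in this post. The function name credit_wallet, the three-attempt limit, and the logger name are illustrative assumptions. On its own, this loop only guarantees that no write is lost; it does not know whether a refund already happened, so a refund path needs an extra check before retrying.

```python
import logging
import sqlite3

log = logging.getLogger("wallets")  # hypothetical logger name

def credit_wallet(conn: sqlite3.Connection, wallet_id: int, amount: int,
                  max_attempts: int = 3) -> bool:
    """Optimistically credit a wallet; return True once an update lands."""
    for attempt in range(1, max_attempts + 1):
        # 1. Read the version our write will be based on.
        row = conn.execute("SELECT version FROM wallets WHERE id = ?",
                           (wallet_id,)).fetchone()
        if row is None:
            raise LookupError(f"wallet {wallet_id} not found")
        (version,) = row
        # 2. Write only if nobody has bumped the version since step 1.
        cur = conn.execute(
            "UPDATE wallets SET balance = balance + ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (amount, wallet_id, version),
        )
        conn.commit()
        if cur.rowcount == 1:
            return True
        # 3. Zero rows: another writer got there first. Log it and try again.
        log.warning("version conflict on wallet %s (attempt %d/%d)",
                    wallet_id, attempt, max_attempts)
    return False

# Usage against a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wallets (id INTEGER PRIMARY KEY, "
             "balance INTEGER NOT NULL, version INTEGER NOT NULL DEFAULT 0)")
conn.execute("INSERT INTO wallets (id, balance, version) VALUES (123, 0, 5)")
conn.commit()
credit_wallet(conn, 123, 100)
print(conn.execute("SELECT balance, version FROM wallets WHERE id = 123").fetchone())  # (100, 6)
```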
Example schema:
ALTER TABLE wallets ADD COLUMN version INT DEFAULT 0 NOT NULL;
Example update query:
UPDATE wallets
SET balance = balance + 100, version = version + 1
WHERE id = 123 AND version = 5;
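If no row is affected (because the version has changed), the application knows another process updated the wallet first. It must not simply reapply the credit at that point; re-checking the payout's refund status before any retry is what prevents the double refund.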
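Here is a minimal sketch of how the version check and the refund-status re-check can fit together. It is one possible wiring under stated assumptions, not necessarily how our service does it: a hypothetical payouts table with 'FAILED' and 'REFUNDED' statuses, amounts in integer minor units, a handler that reads the payout status and the wallet version in a single statement, and a wallet credit that commits in the same transaction as the status change. The helpers setup, read_snapshot, and refund are illustrative names.

```python
import sqlite3

# Hypothetical schema; the payouts table and its statuses are assumptions.
def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE wallets (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL,
                              version INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE payouts (id INTEGER PRIMARY KEY, wallet_id INTEGER NOT NULL,
                              amount INTEGER NOT NULL, status TEXT NOT NULL);
        INSERT INTO wallets VALUES (123, 0, 5);
        INSERT INTO payouts VALUES (1, 123, 100, 'FAILED');
    """)
    return conn

Snapshot = tuple[str, int, int, int]  # (payout status, amount, wallet id, wallet version)

def read_snapshot(conn: sqlite3.Connection, payout_id: int) -> Snapshot:
    # One statement, so payout status and wallet version reflect the same state.
    return conn.execute(
        "SELECT p.status, p.amount, p.wallet_id, w.version "
        "FROM payouts p JOIN wallets w ON w.id = p.wallet_id WHERE p.id = ?",
        (payout_id,),
    ).fetchone()

def refund(conn: sqlite3.Connection, payout_id: int, snapshot: Snapshot,
           max_attempts: int = 3) -> str:
    for _ in range(max_attempts):
        status, amount, wallet_id, version = snapshot
        if status == "REFUNDED":
            return "already refunded"
        if status != "FAILED":
            return "not refundable"
        with conn:  # credit and status change commit together, or not at all
            cur = conn.execute(
                "UPDATE wallets SET balance = balance + ?, version = version + 1 "
                "WHERE id = ? AND version = ?",
                (amount, wallet_id, version),
            )
            if cur.rowcount == 1:
                conn.execute("UPDATE payouts SET status = 'REFUNDED' WHERE id = ?",
                             (payout_id,))
                return "refunded"
        # Version conflict: re-read and decide again instead of re-crediting blindly.
        snapshot = read_snapshot(conn, payout_id)
    return "gave up after repeated conflicts"

conn = setup()
# Simulate the race: webhook and job both read before either one writes.
webhook_view = read_snapshot(conn, 1)
job_view = read_snapshot(conn, 1)
first = refund(conn, 1, webhook_view)   # "refunded"
second = refund(conn, 1, job_view)      # "already refunded"
print(first, second, conn.execute("SELECT balance FROM wallets WHERE id = 123").fetchone())
```

If the conflict comes from an unrelated wallet change, such as a deposit, the re-read still shows the payout as FAILED and the refund is retried against the fresh version; only an already-refunded payout makes the handler exit.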
🚀 Benefits
Implementing optimistic locking brought immediate benefits:
- 🧩 Prevents double refunds and incorrect balances, even under high concurrency.
- ⚡ No long-held database locks, which suits distributed services and background jobs.
- 🔒 Easy to implement for any entity where state accuracy is critical.
- 🤝 Works well with redundant delivery paths, such as our webhook + scheduled-job architecture.
🧪 Testing the System
Before deploying, we stress-tested the new logic to ensure it handled real-world concurrency.
Test Scenarios:
- Simulate concurrent payout failures from both webhook and job processes using load-testing tools like JMeter or Artillery.
- Only one refund update should succeed; others should detect a version conflict.
- Check wallet balances after each test; they must remain consistent.
Expected Outcome:
- ✅ One refund succeeds.
- ⚠️ The second detects a version conflict and exits.
- 💰 Wallet balance remains correct.
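JMeter or Artillery exercise the full HTTP path. As a much smaller complement, here is a unit-level sketch of the same assertion: ten hypothetical workers (stand-ins for duplicate webhook and job deliveries) all read version 5 before anyone writes, then each tries the conditional update. The workers share one SQLite connection guarded by a lock, so writes are applied one at a time; the worker count and names are illustrative.

```python
import sqlite3
import threading

WORKERS = 10  # hypothetical number of duplicate deliveries of the same failure

conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE wallets (id INTEGER PRIMARY KEY, "
             "balance INTEGER NOT NULL, version INTEGER NOT NULL)")
conn.execute("INSERT INTO wallets VALUES (123, 0, 5)")
conn.commit()

db_lock = threading.Lock()             # one shared connection, so serialize access
all_read = threading.Barrier(WORKERS)  # every worker reads before anyone writes
outcomes: list[str] = []

def attempt_refund() -> None:
    with db_lock:
        (version,) = conn.execute(
            "SELECT version FROM wallets WHERE id = 123").fetchone()
    all_read.wait()
    with db_lock:
        cur = conn.execute(
            "UPDATE wallets SET balance = balance + 100, version = version + 1 "
            "WHERE id = 123 AND version = ?",
            (version,),
        )
        conn.commit()
        outcomes.append("refunded" if cur.rowcount == 1 else "version conflict")

threads = [threading.Thread(target=attempt_refund) for _ in range(WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

balance, version = conn.execute(
    "SELECT balance, version FROM wallets WHERE id = 123").fetchone()
print(outcomes.count("refunded"), outcomes.count("version conflict"), balance, version)
# 1 9 100 6
```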
🧠 Lessons Learned
- Race conditions don't always show up in development, but in distributed systems you should assume they can happen.
- Optimistic locking provides a clean, scalable safeguard that adds little overhead when conflicts are rare.
- Monitoring and observability are just as important: logs must clearly show conflicts and retries.
- Small design improvements like this can prevent significant financial losses and improve user trust.
Final Thoughts
In payment systems, concurrency control is part of correctness, not an afterthought. By adding a version-based optimistic lock to our wallet updates, we eliminated duplicate refunds without adding noticeable latency or operational complexity.
Simple fix. Huge impact.
– Protize Engineering Team