Enhancing Transaction Security Using AWS WAF and Redis-Based Rate Limiting
Protize Engineering Team
Fraud attempts and abusive traffic often spike during promotional events, merchant go‑lives, or seasonal peaks.
Protize adopted a layered defense strategy combining AWS WAF for L7 filtering with Redis-based rate limiting at the application edge to protect APIs, preserve capacity, and ensure predictable latency under attack.
1) Threat Landscape
- Credential stuffing against checkout endpoints.
- Card testing and high‑velocity retries on payment authorization.
- Bots scraping OTP flows and abusing free‑trial signups.
- Burst traffic from misconfigured client SDKs causing thundering herds.
Our goal: block bad traffic early, throttle suspicious clients gracefully, and never degrade service for good users.
2) Architecture at a Glance
- CloudFront → AWS WAF enforces managed rules + custom rulesets.
- ALB → NestJS API terminates TLS and forwards to service pods.
- Redis (ElastiCache) stores counters, token buckets, and sliding windows per IP, merchantId, API key, and route.
- Async audit pipeline mirrors suspicious requests to a risk topic for offline analysis and model training.
3) AWS WAF Ruleset Strategy
- AWS Managed Rule Groups: `AWSManagedRulesCommonRuleSet`, `BotControl`, `KnownBadInputs`.
- Rate-Based Rules: per IP and per country spikes (`count` vs `block` actions).
- Custom Regex rules: protect `/auth/*`, `/otp/*`, and `/payments/*` with tight method + header checks.
- Geo match and IP reputation lists to dampen known malicious ASNs.
- Labeling: Every WAF match adds labels (e.g., `waf.bot`, `waf.rate`) forwarded via headers for app decisions.
Observability: Sampled requests are shipped to Kinesis Firehose → S3 for Athena queries and dashboards.
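The `count` versus `block` distinction for rate-based rules can be sketched as a small builder that emits a rule in the WAFv2 JSON shape. This is a minimal illustration, not the team's actual ruleset: the rule name, priority, and limit are hypothetical, only per-IP aggregation is shown, and the interface lists just the fields used here.

```typescript
// Reduced shape of a WAFv2 rate-based rule (only the fields this sketch uses).
type RuleMode = "count" | "block";

interface RateRule {
  Name: string;
  Priority: number;
  Statement: {
    RateBasedStatement: { Limit: number; AggregateKeyType: "IP" };
  };
  Action: { Count: Record<string, never> } | { Block: Record<string, never> };
  VisibilityConfig: {
    SampledRequestsEnabled: boolean;
    CloudWatchMetricsEnabled: boolean;
    MetricName: string;
  };
}

// Build a per-IP rate rule. `name`, `priority`, and `limit` are hypothetical inputs.
function rateRule(name: string, priority: number, limit: number, mode: RuleMode): RateRule {
  return {
    Name: name,
    Priority: priority,
    Statement: { RateBasedStatement: { Limit: limit, AggregateKeyType: "IP" } },
    // "count" only records matches; "block" rejects requests over the limit.
    Action: mode === "block" ? { Block: {} } : { Count: {} },
    VisibilityConfig: {
      SampledRequestsEnabled: true,
      CloudWatchMetricsEnabled: true,
      MetricName: name,
    },
  };
}
```

Running a new rule in `count` mode first gives a view of what it would match before it is allowed to block anything.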
4) Redis-Based Rate Limiting (App Edge)
We implemented a sliding window + token bucket hybrid:
- Sliding window ensures fairness over time (per 1s/10s/1m).
- Token bucket allows short bursts for legitimate spikes.
- Keys are composed as `rl:{route}:{merchantId}:{apiKey}:{ip}`.
- Limits are tied to merchant plan (Premium, Standard, Sandbox).
- Exemptions for our internal IPs, monitoring, and whitelisted webhooks.
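The key layout, plan-based limits, and exemptions listed above might look like this in code. The numbers in `PLAN_LIMITS`, the exempt address, and the `RequestContext` shape are all assumptions made for illustration; the article does not publish the real values.

```typescript
type Plan = "Premium" | "Standard" | "Sandbox";

interface Limit {
  limit: number;    // requests allowed per window
  windowMs: number; // sliding window length
  burst: number;    // extra burst tokens
}

// Hypothetical numbers, for illustration only.
const PLAN_LIMITS: Record<Plan, Limit> = {
  Premium: { limit: 600, windowMs: 60000, burst: 50 },
  Standard: { limit: 120, windowMs: 60000, burst: 20 },
  Sandbox: { limit: 30, windowMs: 60000, burst: 5 },
};

// Hypothetical exemption list (internal IPs, monitoring).
const EXEMPT_IPS = new Set<string>(["10.0.0.5"]);

interface RequestContext {
  route: string;
  merchantId: string;
  apiKey: string;
  ip: string;
  plan: Plan;
}

// Compose the key in the rl:{route}:{merchantId}:{apiKey}:{ip} layout.
function rateLimitKey(ctx: RequestContext): string {
  // Encoding each part keeps a ":" inside a value (e.g. an IPv6 address)
  // from blurring the field boundaries.
  const parts = [ctx.route, ctx.merchantId, ctx.apiKey, ctx.ip].map(encodeURIComponent);
  return `rl:${parts.join(":")}`;
}

// Returns null for exempt callers, otherwise the limit for the merchant's plan.
function resolveLimit(ctx: RequestContext): Limit | null {
  if (EXEMPT_IPS.has(ctx.ip)) return null;
  return PLAN_LIMITS[ctx.plan];
}
```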
Pseudocode Sketch
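// Assumes an ioredis-style async client named `redis`; allow() and block() are supplied by the caller.
// These calls are not atomic; a Lua script or MULTI would close the race between them.
async function checkLimit(key, limit, windowMs, burst) {
  const now = Date.now();
  // 1) consume burst tokens first
  // Seed the bucket only if it is missing (NX). Assumption: it refills when the key expires after one window.
  await redis.set(`${key}:burst`, burst, 'PX', windowMs, 'NX');
  const tokens = await redis.decrby(`${key}:burst`, 1);
  if (tokens >= 0) return allow();
  // 2) sliding window count
  // A random suffix keeps two requests in the same millisecond from sharing one member.
  const member = `${now}:${Math.random().toString(36).slice(2)}`;
  await redis.zadd(`${key}:win`, now, member);
  await redis.zremrangebyscore(`${key}:win`, 0, now - windowMs);
  await redis.pexpire(`${key}:win`, windowMs); // let idle keys expire
  const count = await redis.zcard(`${key}:win`);
  if (count > limit) return block();
  return allow();
}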
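To exercise the decision logic without a Redis instance, the same two steps can be modelled in memory with an injectable clock. This is a test-oriented sketch rather than the production limiter: `InMemoryLimiter` is a hypothetical name, and the refill policy (the burst bucket is reseeded once one window has elapsed) is an assumption, since the article does not say how tokens are replenished.

```typescript
type Decision = "allow" | "block";

class InMemoryLimiter {
  private readonly buckets = new Map<string, { tokens: number; expiresAt: number }>();
  private readonly windows = new Map<string, number[]>();
  private readonly now: () => number;

  // `now` is injectable so tests can control time.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  check(key: string, limit: number, windowMs: number, burst: number): Decision {
    const t = this.now();

    // 1) Burst tokens first.
    // Assumption: the bucket is reseeded once its lifetime (one window) has lapsed.
    let bucket = this.buckets.get(key);
    if (!bucket || bucket.expiresAt <= t) {
      bucket = { tokens: burst, expiresAt: t + windowMs };
      this.buckets.set(key, bucket);
    }
    if (bucket.tokens > 0) {
      bucket.tokens -= 1;
      return "allow";
    }

    // 2) Sliding window: drop entries older than the window, record this
    // request, then compare the count with the limit.
    const hits = (this.windows.get(key) ?? []).filter((ts) => ts > t - windowMs);
    hits.push(t);
    this.windows.set(key, hits);
    return hits.length > limit ? "block" : "allow";
  }
}
```

With `burst = 2` and `limit = 3`, six requests in the same instant yield five allows followed by one block: two from the bucket and three from the window. Blocked requests stay in the window, so a client that keeps hammering remains throttled.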
5) Coordinating WAF and App Limits
- WAF blocks obvious bad traffic and volumetric spikes close to the edge.
- App rate limits differentiate by merchant, API key, and route where business context matters.
- WAF labels (`x-waf-labels`) are propagated to the app for risk scoring and logging.
- 429 with `Retry-After` is returned for throttled clients; WAF blocks receive a standard 403 JSON envelope to ease client debugging.
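On the application side, the two pieces described above are reading the WAF label header and building the throttle response. The sketch below shows one way to do both. It assumes labels arrive as a comma-separated list in `x-waf-labels`; the `rate_limited` error code and the body field names are hypothetical.

```typescript
interface ThrottleResponse {
  status: number;
  headers: Record<string, string>;
  body: { error: string; retryAfterSeconds: number };
}

// Build a 429 whose Retry-After is in whole seconds, rounded up so that
// clients never retry before the limit has actually reset.
function throttled(retryAfterMs: number): ThrottleResponse {
  const seconds = Math.max(1, Math.ceil(retryAfterMs / 1000));
  return {
    status: 429,
    headers: { "Retry-After": String(seconds), "Content-Type": "application/json" },
    body: { error: "rate_limited", retryAfterSeconds: seconds },
  };
}

// Parse the label header. Assumption: a comma-separated list of label names.
function parseWafLabels(header: string | undefined): Set<string> {
  if (!header) return new Set<string>();
  return new Set<string>(
    header
      .split(",")
      .map((label) => label.trim())
      .filter((label) => label.length > 0),
  );
}
```

A handler could then tighten its own limit when `parseWafLabels(...)` contains `waf.bot`, and return `throttled(...)` once that limit is exceeded.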
6) Securing OTP & Auth Flows
- Device fingerprinting (UA + IP + cookie entropy) is used to set stricter OTP caps for suspicious devices.
- One-tap resend cooldowns are backed by Redis keys.
- Per-phone and per-IP sliding windows protect SMS providers and contain SMS costs.
- Honeypot params catch simple bots without impacting UX.
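The resend cooldown and honeypot checks above are simple enough to sketch directly. In Redis, a cooldown is typically a key written with `SET ... NX PX <cooldownMs>`; the class below is an in-memory stand-in for that behaviour so it runs on its own. `ResendCooldown`, `looksLikeBot`, and the `website` field name are hypothetical.

```typescript
// In-memory stand-in for a Redis key written with SET ... NX PX <cooldownMs>.
class ResendCooldown {
  private readonly expiresAt = new Map<string, number>();
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(cooldownMs: number, now: () => number = () => Date.now()) {
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // Returns true when a resend may go out, and starts a new cooldown for that phone.
  tryResend(phone: string): boolean {
    const t = this.now();
    const until = this.expiresAt.get(phone);
    if (until !== undefined && until > t) return false; // still cooling down
    this.expiresAt.set(phone, t + this.cooldownMs);
    return true;
  }
}

// Honeypot: a hidden form field that real users never fill in.
// `website` is a hypothetical field name.
function looksLikeBot(form: Record<string, string | undefined>): boolean {
  return (form["website"] ?? "").trim().length > 0;
}
```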
7) Incident Playbook & Automation
- Anomaly alerts flow to Slack and Telegram with route and merchant tags.
- Auto-mitigation upgrades rules (e.g., move `count → block`) for 15 minutes once an alert threshold is crossed.
- Runbooks let on-call engineers toggle rule severity via a small admin UI.
- Post‑incident analysis links WAF logs with app metrics to prevent regressions.
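The timed escalation in the playbook can be expressed as two pure functions: one that records when a rule was escalated, and one that decides which action is in force at a given moment. This is a sketch under assumptions: the function names and the threshold value are hypothetical, and pushing the resulting action to AWS WAF is left out.

```typescript
type WafAction = "count" | "block";

const ESCALATION_MS = 15 * 60 * 1000; // 15 minutes, as described in the playbook

// Decide which action a rule should run with at time `now`.
function effectiveAction(base: WafAction, escalatedAt: number | null, now: number): WafAction {
  if (escalatedAt !== null && now - escalatedAt < ESCALATION_MS) return "block";
  return base;
}

// Record an escalation when matches in the current interval reach the threshold.
// Returning `now` (re)starts the 15-minute timer; otherwise the old value is kept.
function maybeEscalate(
  matches: number,
  threshold: number,
  escalatedAt: number | null,
  now: number,
): number | null {
  return matches >= threshold ? now : escalatedAt;
}
```

Keeping the decision separate from the API call that applies it makes the expiry easy to test and lets the admin UI show why a rule is currently blocking.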
8) Results
| Metric | Before | After |
|---|---|---|
| Card testing success window | > 2 hours | < 10 minutes |
| OTP spam during promo peaks | Frequent | Rare |
| P95 latency under attack | 900ms | 180ms |
| False positives (weekly) | Moderate | Low |
9) Lessons Learned
- Push coarse‑grained blocks to the edge (WAF); keep fine‑grained controls in the app.
- Sliding windows + small bursts keep UX smooth while containing abuse.
- Labels from WAF → App create powerful, explainable risk signals.
- Always send machine‑readable error bodies for blocked/throttled traffic.
10) Next Steps
- Integrate score‑based adaptive limits using recent merchant traffic.
- Add per‑route dynamic limits controlled by feature flags.
- Feed WAF + app signals into a fraud scoring model for real‑time routing.
Authored by the Protize Engineering Team — November 2025.