Designing a Payment Aggregator with a high transaction success rate involves integrating multiple Payment Service Providers (PSPs), optimizing routing, and building in retry mechanisms and fault tolerance.
Let's start with the functional requirements (FR), followed by the non-functional requirements (NFR).
FR
1. Merchants can register and integrate with the aggregator.
2. Each merchant has configuration for supported PSPs, routing preferences, and SLAs.
3. Support for multiple PSPs (Razorpay, PayU, Stripe, etc.).
4. PSP configurations stored with credentials, priority, and status.
5. Expose a secure API for initiating payments.
6. Dynamically select the PSP based on:
PSP success rate (historical & real-time).
Type of payment (card, UPI, netbanking).
Merchant preferences.
Geo/IP or BIN-based mapping.
PSP latency/availability.
7. Limit retries to avoid multiple charge attempts.
8. Handle asynchronous responses from PSPs.
9. Notify merchants on transaction status updates.
NFR
1. End-to-end transaction time ≤ 2 seconds (preferred).
2. System should be highly available.
3. PSP routing and retries should be completed within acceptable customer wait time (~5 seconds max).
4. Maintain real-time metrics per PSP (success %, failure reason).
5. System should be highly scalable.
6. PCI-DSS Compliance for card data.
7. Encrypt all sensitive data (in transit and at rest).
APIs
Here are the core APIs for a Payment Aggregator System that ensures high transaction success rates. These are grouped based on use cases: merchant onboarding, payment processing, routing logic, and reconciliation.
1. POST /api/v1/merchants
Register a new merchant.
Request:
{
"merchant_name": "ShopifyX",
"contact_email": "support@shopifyx.com",
"callback_url": "https://shopifyx.com/payment-callback"
}
Response:
{
"merchant_id": "m_123456",
"api_key": "abc123xyz"
}
2. GET /api/v1/merchants/{merchant_id}
Get merchant details.
3. PATCH /api/v1/merchants/{merchant_id}/psp-config
Configure PSP preferences per merchant.
{
"preferred_psps": ["razorpay", "payu"],
"fallback_psps": ["stripe"],
"psp_weights": {
"razorpay": 70,
"payu": 30
}
}
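The psp_weights values above suggest a traffic split between preferred PSPs. One way to read them is as proportional weights for a random pick. A minimal sketch under that assumption; the function name pick_weighted_psp is hypothetical and not part of the API described here:

```python
import random
from typing import Dict, Optional


def pick_weighted_psp(psp_weights: Dict[str, int],
                      rng: Optional[random.Random] = None) -> str:
    """Pick one PSP with probability proportional to its configured weight."""
    rng = rng or random.Random()
    # Ignore PSPs whose weight is zero or negative.
    candidates = {name: w for name, w in psp_weights.items() if w > 0}
    if not candidates:
        raise ValueError("no PSP with a positive weight")
    names = list(candidates)
    weights = [candidates[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# With weights 70/30, roughly 70% of picks should land on razorpay.
choice = pick_weighted_psp({"razorpay": 70, "payu": 30})
```

A weighted pick keeps some traffic flowing to the lower-weight PSP, which also keeps its real-time success metrics fresh.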
4. POST /api/v1/payments/initiate
Initiate a payment
Request:
{
"merchant_id": "m_123456",
"order_id": "ORD_98765",
"amount": 799.00,
"currency": "INR",
"payment_method": "card",
"card_info": {
"number": "4111111111111111",
"expiry": "12/26",
"cvv": "123"
},
"customer_info": {
"email": "user@example.com",
"phone": "+919000000000"
},
"redirect_url": "https://shopifyx.com/order-complete"
}
Response:
{
"transaction_id": "txn_abc123",
"redirect_url": "https://payment.aggregator.com/checkout/txn_abc123"
}
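Because a merchant may resend the same initiate request (for example after a network timeout), it helps if a repeated call for the same order returns the original transaction instead of starting a second charge attempt. A minimal sketch, assuming (merchant_id, order_id) serves as the idempotency key and an in-memory dict stands in for the transactions table; the PaymentInitiator class is hypothetical:

```python
import uuid
from typing import Dict, Tuple


class PaymentInitiator:
    """In-memory stand-in for a store keyed by (merchant_id, order_id)."""

    def __init__(self) -> None:
        self._by_order: Dict[Tuple[str, str], str] = {}

    def initiate(self, merchant_id: str, order_id: str) -> Tuple[str, bool]:
        """Return (transaction_id, created); repeats reuse the first transaction."""
        key = (merchant_id, order_id)
        existing = self._by_order.get(key)
        if existing is not None:
            return existing, False
        txn_id = f"txn_{uuid.uuid4().hex[:12]}"
        self._by_order[key] = txn_id
        return txn_id, True


initiator = PaymentInitiator()
first_id, created = initiator.initiate("m_123456", "ORD_98765")
repeat_id, created_again = initiator.initiate("m_123456", "ORD_98765")
```

In a relational store, the same guarantee would likely come from a unique constraint on the key columns rather than a lookup followed by an insert.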
5. GET /api/v1/payments/{transaction_id}/status
Get current status of a transaction
Response:
{
"transaction_id": "txn_abc123",
"status": "SUCCESS",
"psp_used": "razorpay",
"payment_method": "card",
"amount": 799.00,
"completed_at": "2025-06-08T18:12:00Z"
}
6. POST /api/v1/payments/webhook
Handle PSP callback/webhook
Request (from PSP):
{
"transaction_id": "txn_abc123",
"status": "SUCCESS",
"psp_transaction_id": "rzp_txn_0009",
"auth_code": "AUTH456"
}
7. POST /api/v1/payments/{transaction_id}/retry
Manually trigger a retry (optional)
{
"retry_reason": "network_error"
}
8. GET /api/v1/routing/suggestion
Get routing suggestion (internal use)
Request:
{
"merchant_id": "m_123456",
"payment_method": "card",
"bin": "411111",
"geo": "IN"
}
Response:
{
"psp": "payu",
"reason": "higher success rate for BIN 411111 in IN"
}
9. GET /api/v1/settlements
Fetch settlement report per merchant
Query:
?merchant_id=m_123456&date=2025-06-07
Response:
{
"settled": true,
"transactions": [
{
"transaction_id": "txn_abc123",
"amount": 799.00,
"status": "SUCCESS",
"settled_on": "2025-06-08T04:00:00Z"
}
]
}
10. GET /api/v1/transactions
Search transactions
Query:
?merchant_id=m_123456&status=FAILED&from=2025-06-01&to=2025-06-07
11. POST /api/v1/admin/psps
Add or update PSP integration
{
"psp_name": "cashfree",
"api_endpoint": "https://api.cashfree.com/v1/payment",
"success_rate": 0.97,
"status": "active"
}
12. GET /api/v1/admin/psps/health
Check real-time health & performance of all PSPs
Databases and Schema
| Component | Recommended DB | Why? |
| -------------------------- | ---------------------- | ----------------------------------- |
| Core transactional data | **PostgreSQL / MySQL** | Strong consistency, relational |
| Routing intelligence | **Redis / DynamoDB** | Fast access, TTL for real-time data |
| Analytics, success rates | **ClickHouse / Druid** | OLAP + high ingestion + query perf |
| Logging & audit trail | **MongoDB / Elastic** | Flexible schema, fast ingestion |
| PSP config, merchant prefs | **PostgreSQL / etcd** | Structured, transactional config |
✅ 1. merchants Table
CREATE TABLE merchants (
merchant_id UUID PRIMARY KEY,
name VARCHAR(255),
contact_email VARCHAR(255),
callback_url TEXT,
api_key TEXT UNIQUE,
created_at TIMESTAMP DEFAULT now()
);
✅ 2. merchant_psp_config
Merchant-specific preferences and routing weights.
CREATE TABLE merchant_psp_config (
merchant_id UUID REFERENCES merchants(merchant_id),
psp_name VARCHAR(50),
weight INT, -- weight in routing decisions
priority INT, -- fallback order
enabled BOOLEAN DEFAULT TRUE,
PRIMARY KEY (merchant_id, psp_name)
);
✅ 3. transactions Table
Main transactional store, strongly consistent.
CREATE TABLE transactions (
transaction_id UUID PRIMARY KEY,
merchant_id UUID REFERENCES merchants(merchant_id),
order_id VARCHAR(100),
amount DECIMAL(10,2),
currency CHAR(3),
payment_method VARCHAR(50),
psp_used VARCHAR(50),
status VARCHAR(20), -- PENDING, SUCCESS, FAILED
failure_reason TEXT,
started_at TIMESTAMP,
completed_at TIMESTAMP,
retry_count INT DEFAULT 0
);
CREATE INDEX idx_merchant_order ON transactions(merchant_id, order_id);
✅ 4. psp_metrics (Real-time stats store, e.g., Redis or DynamoDB)
Used for intelligent routing decisions.
Key: "psp:razorpay"
Value:
{
"success_rate_1min": 0.93,
"success_rate_5min": 0.91,
"latency_avg": 300, // in ms
"last_failure": "Bank timeout",
"updated_at": "2025-06-08T17:00:00Z"
}
TTL of 10–15 minutes for metrics
Sliding window logic using Redis Sorted Sets or streaming pipeline
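To make the sliding-window idea concrete, here is a minimal in-memory sketch that computes a success rate over the last N seconds. It uses a deque as a stand-in for a Redis Sorted Set scored by timestamp, and assumes events arrive in time order; the class name SlidingWindowSuccessRate is hypothetical:

```python
import time
from collections import deque
from typing import Deque, Optional, Tuple


class SlidingWindowSuccessRate:
    """Tracks success/failure events and reports the rate over a time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events: Deque[Tuple[float, bool]] = deque()

    def record(self, success: bool, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._events.append((now, success))
        self._evict(now)

    def _evict(self, now: float) -> None:
        # Drop events that have aged out of the window.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()

    def rate(self, now: Optional[float] = None) -> Optional[float]:
        """Return the success ratio in the window, or None if there is no data."""
        now = time.time() if now is None else now
        self._evict(now)
        if not self._events:
            return None
        successes = sum(1 for _, ok in self._events if ok)
        return successes / len(self._events)


window = SlidingWindowSuccessRate(window_seconds=60)
window.record(True, now=0)
window.record(False, now=30)
window.record(True, now=50)
```

Returning None for an empty window lets the caller distinguish "no recent traffic" from "0% success", which matters when a PSP has just been re-enabled.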
✅ 5. payment_methods Table (Optional reference)
CREATE TABLE payment_methods (
method_id SERIAL PRIMARY KEY,
name VARCHAR(50), -- card, upi, netbanking, etc.
type VARCHAR(50), -- offline, realtime
enabled BOOLEAN DEFAULT TRUE
);
✅ 6. psps Table
Stores all supported PSPs and their configs.
CREATE TABLE psps (
psp_name VARCHAR(50) PRIMARY KEY,
api_endpoint TEXT,
status VARCHAR(20), -- active, down, maintenance
priority INT,
default_weight INT,
success_rate_estimate FLOAT,
created_at TIMESTAMP
);
✅ 7. webhook_logs (MongoDB or PostgreSQL)
{
"transaction_id": "txn_abc123",
"psp": "razorpay",
"status": "SUCCESS",
"payload": {...},
"received_at": "2025-06-08T18:12:00Z"
}
✅ 8. settlements Table
CREATE TABLE settlements (
settlement_id UUID PRIMARY KEY,
transaction_id UUID REFERENCES transactions(transaction_id),
merchant_id UUID,
settled_amount DECIMAL(10,2),
settled_on TIMESTAMP,
status VARCHAR(20) -- PENDING, SETTLED
);
🔁 TTL and Cleanup Strategy
Redis keys for PSP metrics: TTL = 10 min
Audit & logs in Elastic/Mongo: TTL = 30 days (configurable)
Archived transactions: Partitioned table or moved to cold storage monthly
HLD
🔁 Service Interaction Diagram
Sequence: Payment Initiation Flow
[Merchant] → [API Gateway] → [Payment Orchestrator]
↘
→ [Merchant Service] → Validate merchant, fetch PSP config
→ [Routing Engine] → Decide best PSP
→ [PSP Adapter: Razorpay] (1st attempt)
↘
→ [Transaction Service] → Log attempt
→ [Audit Service] → Log routing and PSP status
→ [Webhook Handler] (PSP async callback)
→ [Transaction Service] → Update final status
→ [Notification Service] → Notify merchant
🧠 Routing Engine Logic
Maintains:
Real-time PSP health (via Analytics Service)
Per-merchant PSP weights
BIN-level & geo-based routing hints
Returns: best ranked PSP, and fallback list
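The ranking step could look roughly like the sketch below. It combines real-time success rate, merchant weight, and latency into one score and skips unavailable PSPs. The 0.6 / 0.3 / 0.1 coefficients, the PspHealth type, and the rank_psps function are illustrative assumptions, not values or names from a real system:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PspHealth:
    success_rate_1min: float   # 0.0 to 1.0
    latency_avg_ms: float
    available: bool = True


def rank_psps(merchant_weights: Dict[str, int],
              health: Dict[str, PspHealth],
              max_latency_ms: float = 2000.0) -> List[str]:
    """Return PSP names ordered best-first; the first is the primary choice."""
    total_weight = sum(merchant_weights.values()) or 1
    scored = []
    for name, weight in merchant_weights.items():
        h = health.get(name)
        if h is None or not h.available:
            continue  # no health data or marked down: never route to it
        latency_penalty = min(h.latency_avg_ms / max_latency_ms, 1.0)
        score = (0.6 * h.success_rate_1min
                 + 0.3 * (weight / total_weight)
                 + 0.1 * (1.0 - latency_penalty))
        scored.append((score, name))
    # Highest score first; ties are broken by name for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored]


ranked = rank_psps(
    {"razorpay": 70, "payu": 30, "stripe": 0},
    {
        "razorpay": PspHealth(0.93, 300),
        "payu": PspHealth(0.97, 250),
        "stripe": PspHealth(0.99, 400, available=False),
    },
)
```

The first element of the result is the PSP to try first, and the remainder serves as the fallback list.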
🧾 Reconciliation Flow
[Reconciliation Service] → [Transaction Service] → Get all daily transactions
→ [PSP Adapter] → Pull PSP-side reports
→ Compare & generate settlement report
→ Update [Settlements DB]
→ Notify merchants
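The compare step in this flow can be sketched as a simple diff between the aggregator's own records and the PSP-side report. The Record type, the reconcile function, and the bucket names are hypothetical, and both inputs are assumed to be keyed by the same transaction ID:

```python
from decimal import Decimal
from typing import Dict, List, NamedTuple


class Record(NamedTuple):
    amount: Decimal
    status: str


def reconcile(internal: Dict[str, Record],
              psp_report: Dict[str, Record]) -> Dict[str, List[str]]:
    """Bucket transaction IDs by how the two sides agree or disagree."""
    result: Dict[str, List[str]] = {
        "matched": [], "mismatched": [],
        "missing_at_psp": [], "missing_internally": [],
    }
    for txn_id, ours in internal.items():
        theirs = psp_report.get(txn_id)
        if theirs is None:
            result["missing_at_psp"].append(txn_id)
        elif ours == theirs:
            result["matched"].append(txn_id)
        else:
            result["mismatched"].append(txn_id)
    for txn_id in psp_report:
        if txn_id not in internal:
            result["missing_internally"].append(txn_id)
    return result


report = reconcile(
    {"txn_a": Record(Decimal("799.00"), "SUCCESS"),
     "txn_b": Record(Decimal("120.00"), "SUCCESS"),
     "txn_c": Record(Decimal("50.00"), "FAILED")},
    {"txn_a": Record(Decimal("799.00"), "SUCCESS"),
     "txn_b": Record(Decimal("120.00"), "FAILED"),
     "txn_d": Record(Decimal("10.00"), "SUCCESS")},
)
```

Only the matched bucket would feed the settlement report directly; the other buckets usually need manual or automated follow-up.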
📊 Analytics Flow
Receives event stream (e.g., Kafka) from:
Transaction outcomes
Webhook responses
PSP Adapter latencies
Aggregates PSP stats: success %, latency, failure trends
Pushes summary data to Redis/DynamoDB for Routing Engine
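The aggregation step above can be illustrated with a small batch function that folds outcome events into per-PSP summaries. The event field names (psp, status, latency_ms, failure_reason) and the aggregate_events function are assumptions made for this sketch; a real pipeline would consume these from the event stream:

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Iterable, List


def aggregate_events(events: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Summarize success rate, average latency, and top failure reason per PSP."""
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for event in events:
        grouped[event["psp"]].append(event)

    summary: Dict[str, Dict[str, Any]] = {}
    for psp, items in grouped.items():
        successes = sum(1 for e in items if e["status"] == "SUCCESS")
        reasons = Counter(
            e["failure_reason"] for e in items
            if e["status"] != "SUCCESS" and e.get("failure_reason")
        )
        summary[psp] = {
            "success_rate": successes / len(items),
            "latency_avg": sum(e["latency_ms"] for e in items) / len(items),
            "top_failure": reasons.most_common(1)[0][0] if reasons else None,
        }
    return summary


stats = aggregate_events([
    {"psp": "razorpay", "status": "SUCCESS", "latency_ms": 200},
    {"psp": "razorpay", "status": "FAILED", "latency_ms": 400,
     "failure_reason": "Bank timeout"},
    {"psp": "payu", "status": "SUCCESS", "latency_ms": 250},
])
```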
🔐 Security & Reliability Considerations
Idempotency keys for payment APIs
Circuit breakers in PSP adapters
Rate limiting at API Gateway
Retries with exponential backoff
Chaos testing for PSP failures
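As an illustration of the circuit-breaker item, here is a minimal sketch of a breaker a PSP adapter might wrap around its outbound calls. It opens after a number of consecutive failures and allows a trial call once a cool-down has passed. The thresholds and the CircuitBreaker class are assumptions, not taken from any specific library:

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after consecutive failures; allows a trial call after a cool-down."""

    def __init__(self, failure_threshold: int = 3,
                 reset_after_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_seconds = reset_after_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        if self._opened_at is None:
            return True  # closed: traffic flows normally
        # Open: only allow a call once the cool-down has elapsed (half-open).
        return self._clock() - self._opened_at >= self.reset_after_seconds

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            # (Re)open the breaker and restart the cool-down.
            self._opened_at = self._clock()


breaker = CircuitBreaker(failure_threshold=3, reset_after_seconds=30.0)
```

The adapter would check allow_request() before each PSP call and report the outcome afterwards, so a failing PSP is skipped quickly instead of consuming the customer's wait time.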
⚙️ Deployment Notes
All services should be:
Containerized (Docker)
Deployed via Kubernetes
Connected through a service mesh (e.g., Istio) for observability and resilience
✅ Step-by-Step Explanation
1. [Merchant] → [API Gateway]
The merchant’s system makes an HTTPS POST /api/v1/payments/initiate call. The API Gateway performs:
Authentication (via API key or OAuth)
Rate limiting
Routing the request to internal microservices
2. → [Payment Orchestrator]
The Payment Orchestrator is the brain of the operation.
It manages:
Overall state of the transaction
Coordination of retries and PSP switching
Timeout handling
3. → [Merchant Service]
The Orchestrator calls the Merchant Service to:
Validate the merchant_id and check that it is active.
Retrieve PSP configuration:
Preferred PSPs (e.g., Razorpay, PayU)
PSP weights or priorities
Retry policy (e.g., how many fallbacks allowed)
4. → [Routing Engine]
The Routing Engine uses:
Real-time success rate of each PSP
Merchant-specific preferences
BIN-based and geo-specific heuristics
It returns the best PSP to try first (e.g., Razorpay) and a fallback list.
5. → [PSP Adapter: Razorpay]
The Orchestrator forwards the transaction request to the PSP Adapter (e.g., Razorpay).
The PSP Adapter is a microservice or SDK wrapper that:
Converts internal format to PSP-specific format
Makes a network call to Razorpay
Handles timeouts, retries, circuit breakers
6. → [Transaction Service]
As soon as the attempt is initiated, the Transaction Service:
Creates a transaction_id
Stores initial metadata: amount, PSP used, merchant ID
Sets the status to PENDING
7. → [Audit Service]
Logs metadata like:
Which PSP was selected
Latency
Retry history
Any routing decisions
Useful for debugging, analytics, and compliance audits.
8. → [Webhook Handler]
PSPs often respond asynchronously (e.g., Razorpay sends a callback when the transaction finishes).
The Webhook Handler:
Validates the authenticity of the webhook (HMAC or signature)
Extracts PSP response (e.g., SUCCESS, FAILED, etc.)
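Signature validation typically means recomputing an HMAC over the raw request body with a shared secret and comparing it to the signature the PSP sent. A minimal sketch using HMAC-SHA256 with a hex digest; each PSP defines its own header name and signing scheme, so the function names and the choice of algorithm here are assumptions:

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Compute a hex-encoded HMAC-SHA256 signature of the raw body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_valid_webhook(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Compare signatures in constant time to avoid timing side channels."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"),
                               received_signature.encode("utf-8"))


secret = b"example-shared-secret"  # placeholder value for illustration only
body = b'{"transaction_id": "txn_abc123", "status": "SUCCESS"}'
signature = sign_payload(secret, body)
```

The check should run against the raw bytes as received, before any JSON parsing, since re-serializing the payload can change its byte representation.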
9. → [Transaction Service] (again)
The webhook response is passed to Transaction Service, which:
Updates the final status of the transaction (SUCCESS, FAILED)
Records the PSP's response code and message
10. → [Notification Service]
Once the transaction status is finalized:
The Notification Service notifies the merchant via:
HTTP callback (callback_url)
WebSocket update
Email/SMS (optional)
The merchant uses this update to show success/failure to the end customer.
🔁 If Failure Occurs
If the PSP Adapter call:
fails (timeout, error code) or
the webhook indicates failure (e.g., payment declined)
Then:
Payment Orchestrator consults Routing Engine again
Picks the next best PSP from the fallback list
Repeats steps 5–10
Tracks retry count to avoid infinite loops
🔁 Transaction Service Invocation Points
✅ 1. Initial Invocation – by Payment Orchestrator
When?
Right before the first call to the PSP Adapter is made.
Why?
To:
Create a new transaction record
Set initial status = PENDING
Store metadata: merchant_id, order_id, amount, chosen PSP, etc.
How?
[Payment Orchestrator] → [Transaction Service]
✅ 2. Post PSP Response – by PSP Adapter
When?
Immediately after receiving a synchronous PSP response (if the PSP supports sync flow).
Why?
To update status: SUCCESS or FAILED
Log PSP-specific fields: response code, transaction_ref_id
How?
[PSP Adapter] → [Transaction Service]
✅ 3. Webhook Callback – by Webhook Handler
When?
After receiving an asynchronous webhook from the PSP.
Why?
Update the final status of the transaction (if not already final)
Capture failure reasons, timestamps, PSP IDs
How?
[Webhook Handler] → [Transaction Service]
✅ 4. Retry Flow – by Payment Orchestrator
When?
If a PSP fails and a retry is initiated with a fallback PSP.
Why?
Update the existing transaction with:
New retry count
PSP fallback attempted
Time of retry
Avoid creating a new transaction; append retry details to the same record.
How?
[Payment Orchestrator] → [Transaction Service]
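Taken together, these four invocation points imply two rules: a final status should not be overwritten, and retries should update the same record. A minimal in-memory sketch of both rules; the Transaction dataclass and helper functions are hypothetical, and the sketch assumes a transaction stays PENDING while fallbacks are still being tried:

```python
from dataclasses import dataclass, field
from typing import Dict, List

FINAL_STATUSES = {"SUCCESS", "FAILED"}


@dataclass
class Transaction:
    transaction_id: str
    psp_used: str
    status: str = "PENDING"
    retry_count: int = 0
    attempts: List[Dict[str, str]] = field(default_factory=list)


def apply_final_status(txn: Transaction, new_status: str) -> bool:
    """Set a final status once; later updates (e.g., duplicate webhooks) are ignored."""
    if new_status not in FINAL_STATUSES:
        raise ValueError(f"not a final status: {new_status}")
    if txn.status in FINAL_STATUSES:
        return False
    txn.status = new_status
    return True


def record_retry(txn: Transaction, fallback_psp: str) -> None:
    """Append the failed attempt to the same record and switch to the fallback PSP."""
    if txn.status in FINAL_STATUSES:
        raise ValueError("cannot retry a finalized transaction")
    txn.attempts.append({"psp": txn.psp_used, "outcome": "FAILED"})
    txn.retry_count += 1
    txn.psp_used = fallback_psp


txn = Transaction("txn_abc123", "razorpay")
record_retry(txn, "payu")
first_update = apply_final_status(txn, "SUCCESS")
duplicate_update = apply_final_status(txn, "FAILED")
```

Ignoring updates after a final status makes webhook handling safe against duplicate or out-of-order callbacks.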