[Figure: RL agent routing graphic showing fill rate by venue. EBS 34%, CURRENEX 89%, HOTSPOT 72%, LMAX 81%, CBOE FX 66%.]
Engineering · 10 min read

Optimizing Execution Venue Selection with Reinforcement Learning

Taehyung Cho, Execution Systems Lead · January 28, 2026

Execution quality directly impacts alpha. Depending on turnover, a signal with a 2.0 Sharpe ratio and 2 bps of slippage can net out to roughly the same performance as a signal with a 1.5 Sharpe and no slippage. For systematic strategies trading thousands of times per day, the gap between good and great execution adds up quickly, because the cost is paid on every trade.
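To make the Sharpe comparison concrete, here is a back-of-envelope sketch. The 10% annual volatility and 250x annual turnover are hypothetical values chosen for illustration; neither appears in our actual figures, and the helper `net_sharpe` is an assumed name. It treats slippage as a constant drag on annual return.

```python
def net_sharpe(gross_sharpe, annual_vol, slippage_bps, annual_turnover):
    """Sharpe ratio after subtracting a constant slippage cost from annual return.

    annual_turnover is notional traded per year as a multiple of capital
    (a hypothetical input, used here only for illustration).
    """
    gross_return = gross_sharpe * annual_vol
    cost = slippage_bps * 1e-4 * annual_turnover  # bps -> fraction of capital per year
    return (gross_return - cost) / annual_vol

# Assumed inputs: 10% annual vol, capital turned over 250 times a year.
print(round(net_sharpe(2.0, 0.10, 2.0, 250), 2))  # 1.5
```

Under these assumptions, 2 bps per trade costs 5% of capital a year, which is a quarter of the gross return. With lower turnover the drag shrinks proportionally.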

This post describes how we built an RL agent that dynamically routes orders across 14 connected venues, reducing our average slippage from 1.2 bps to 0.3 bps — a 75% improvement.

The Problem

Traditional smart order routers use rule-based logic: if venue A has the best price, route to venue A. This ignores temporal dynamics — a venue's liquidity can evaporate in milliseconds (as we saw in the EURUSD/EBS incident), and the "best price" at order submission may not be the best price at fill time.
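The gap between the quoted price and the price at fill time can be illustrated with a small sketch. Everything here is hypothetical: the `Quote` fields, the fill probabilities, and the `miss_penalty` (the assumed extra cost paid when a quote is not filled and the order has to be re-routed). It is not a description of any real router.

```python
from dataclasses import dataclass

@dataclass
class Quote:
    venue: str
    ask: float        # displayed price to buy at
    fill_prob: float  # hypothetical estimate of being filled at that price

def best_price_venue(quotes):
    """Rule-based choice: the venue showing the lowest ask right now."""
    return min(quotes, key=lambda q: q.ask).venue

def expected_cost_venue(quotes, miss_penalty):
    """Choose by expected cost: the ask plus the penalty weighted by the miss probability."""
    return min(quotes, key=lambda q: q.ask + (1 - q.fill_prob) * miss_penalty).venue

quotes = [Quote("A", 1.08500, 0.34), Quote("B", 1.08502, 0.89)]
print(best_price_venue(quotes))             # A
print(expected_cost_venue(quotes, 0.0001))  # B
```

Venue A shows the better price, but once the chance of missing the fill is priced in, venue B is cheaper in expectation.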

The RL Formulation

We model venue selection as a contextual bandit problem where:

- State: Current liquidity snapshot across all venues, recent fill rates, spread history, time-of-day, pending macro events
- Action: Which venue(s) to route to, what order type to use, how to split across venues
- Reward: Negative slippage (implementation shortfall vs. arrival price)
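As a rough illustration of this formulation, the sketch below uses a tabular epsilon-greedy contextual bandit. This is a deliberately simplified stand-in, not the agent described in this post: the context is collapsed to a single hypothetical label such as "london_am", the action is one venue only (no order type or split), and the class and function names are assumptions.

```python
import random
from collections import defaultdict

def slippage_bps(side, arrival_price, fill_price):
    """Implementation shortfall vs. arrival price in bps; positive means a cost."""
    sign = 1 if side == "buy" else -1
    return sign * (fill_price - arrival_price) / arrival_price * 1e4

class EpsilonGreedyRouter:
    """Tabular contextual bandit: one running mean reward per (context, venue)."""

    def __init__(self, venues, epsilon=0.1, seed=0):
        self.venues = list(venues)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.count = defaultdict(int)
        # Unseen pairs start at 0.0, which is optimistic when rewards are
        # mostly negative, so untried venues get sampled early.
        self.mean = defaultdict(float)

    def choose(self, context):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.venues)  # explore
        return max(self.venues, key=lambda v: self.mean[(context, v)])  # exploit

    def update(self, context, venue, reward):
        key = (context, venue)
        self.count[key] += 1
        self.mean[key] += (reward - self.mean[key]) / self.count[key]

router = EpsilonGreedyRouter(["EBS", "LMAX"], epsilon=0.0)
router.update("london_am", "EBS", -slippage_bps("buy", 100.0, 100.002))
router.update("london_am", "LMAX", -slippage_bps("buy", 100.0, 100.010))
print(router.choose("london_am"))  # EBS
```

The reward passed to `update` is the negative of the measured slippage, so a venue with lower slippage ends up with the higher mean reward in that context.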

The agent is trained on 18 months of historical execution data (~2.4 million fills), then fine-tuned daily on the previous 5 days of live data.
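The daily fine-tuning step needs a rolling slice of recent fills. A minimal sketch of that selection follows, assuming each fill is a dict with a `date` field and that "5 days" means calendar days; both are assumptions for illustration, and trading days would need a calendar lookup instead.

```python
from datetime import date, timedelta

def fine_tune_window(fills, as_of, days=5):
    """Return fills dated within the `days` calendar days before `as_of` (as_of excluded)."""
    start = as_of - timedelta(days=days)
    return [f for f in fills if start <= f["date"] < as_of]

# Hypothetical fills, used only to show the window boundaries.
fills = [
    {"date": date(2026, 1, 20), "venue": "EBS"},
    {"date": date(2026, 1, 25), "venue": "LMAX"},
    {"date": date(2026, 1, 28), "venue": "HOTSPOT"},
]
recent = fine_tune_window(fills, as_of=date(2026, 1, 28))
print([f["venue"] for f in recent])  # ['LMAX']
```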

Key Learnings

  1. Venue quality is time-varying. EBS might be the best FX venue at 8am London but the worst at 2pm New York. The RL agent learns these temporal patterns.
  2. Splitting is usually better than concentrating. For orders above $500K notional, splitting across 2-3 venues reduces information leakage and improves average fill price.
  3. Pre-event routing changes are critical. The agent learned to shift flow away from primary venues 30-45 minutes before major announcements, exactly the pattern our monitoring agent later codified as a rule.
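The second and third learnings are easy to express as explicit rules. The sketch below is one hypothetical way to write them down, not the rule our monitoring agent uses: the equal split, the 45-minute window (the upper end of the 30-45 minute range), and all function names are assumptions.

```python
from datetime import datetime, timedelta

SPLIT_THRESHOLD_USD = 500_000              # threshold quoted in the list above
PRE_EVENT_WINDOW = timedelta(minutes=45)   # assumed: upper end of the 30-45 minute range

def in_pre_event_window(now, event_time, window=PRE_EVENT_WINDOW):
    """True if the event is still ahead and no more than `window` away."""
    return timedelta(0) <= event_time - now <= window

def rank_venues(scores, primary_venues, now, event_times):
    """Sort venues by score, pushing primary venues to the back ahead of an event."""
    pre_event = any(in_pre_event_window(now, t) for t in event_times)
    return sorted(scores, key=lambda v: (pre_event and v in primary_venues, -scores[v]))

def split_order(notional_usd, ranked_venues, max_venues=3):
    """Route small orders to the top venue; split larger ones equally (an assumed split)."""
    if notional_usd <= SPLIT_THRESHOLD_USD or len(ranked_venues) < 2:
        return {ranked_venues[0]: notional_usd}
    chosen = ranked_venues[:max_venues]
    share = notional_usd / len(chosen)
    return {v: share for v in chosen}

now = datetime(2026, 1, 28, 13, 0)
event = datetime(2026, 1, 28, 13, 30)  # hypothetical announcement time
ranked = rank_venues({"EBS": 0.9, "LMAX": 0.8, "CURRENEX": 0.7}, {"EBS"}, now, [event])
print(ranked)                        # ['LMAX', 'CURRENEX', 'EBS']
print(split_order(900_000, ranked))  # 300000.0 to each of the three venues
```

In practice the split weights would come from the agent rather than being equal; the fixed rule is only meant to show the shape of the behavior.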

Results

Average slippage: 1.2 bps → 0.3 bps. On $2B monthly trading volume, this saves approximately $1.8M per year in execution costs.
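For reference, the straight-line arithmetic behind a savings estimate like this is sketched below. It assumes the full monthly volume benefits from the full 0.9 bps reduction, which is a simplification, and the function name is hypothetical.

```python
def annual_savings_usd(before_bps, after_bps, monthly_volume_usd):
    """Slippage reduction in bps applied to twelve months of traded volume."""
    return (before_bps - after_bps) * 1e-4 * monthly_volume_usd * 12

print(round(annual_savings_usd(1.2, 0.3, 2e9)))  # 2160000
```

This simple product comes out somewhat above the approximately $1.8M quoted above. The difference may reflect adjustments not detailed here, such as flow that does not realize the full improvement.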

© 2026 Quantit