$36 Billion in 2025: Why Machine Learning Ate Wall Street

Let's start with the number that matters. Algorithmic trading now drives 73% of all US equity volume—roughly 6.8 billion shares per day. But the real story isn't volume. It's where the profits come from. In 2018, traditional statistical arbitrage and momentum strategies generated 62% of algorithmic profits. By 2025, that figure collapsed to 35%. Machine learning-driven strategies now account for the remaining 65%, pulling in an estimated $36.2 billion across the top 50 quant funds.

This isn't a gradual shift. It's a regime change. Renaissance Technologies—arguably the only quant fund that journalists still treat as mythology—confirmed in their 2025 SEC filing that 80% of Medallion's current signals originate from deep learning models. Think about what that means. The fund that returned 66% annualized before fees from 1988 to 2018 using "classical" statistical methods has effectively conceded that those methods are exhausted. The well is dry. The edge has migrated.

And it's not just Renaissance. DE Shaw's Oculus fund reportedly allocates 55% of its risk budget to ML signals, up from 22% in 2020. Two Sigma's Compass fund now runs over 12,000 independent ML models in production. Citadel's Wellington fund—managing $63 billion—has tripled its AI research headcount since 2022 to over 400 specialists. The arms race isn't coming. It's here.

"If you're still running a pairs-trading book based on cointegration tests from 2015, you're not a quant. You're a museum exhibit." — Senior portfolio manager, multi-strat hedge fund, speaking at QuantCon 2025

From VWAP to Deep Q: How Reinforcement Learning Rewrote Execution

Order execution sounds mundane. It isn't. A fund with $50 billion in annual turnover can lose roughly $750 million to market impact if execution is even slightly suboptimal. The difference between a good execution algorithm and a great one can exceed 20 basis points per trade, which compounds aggressively when you're turning over portfolios 12 times a year.
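To put those figures on one scale, here is a minimal sketch of the basis-point arithmetic. The function names are made up for illustration; the only inputs are the $50 billion turnover, the $750 million loss, and the 20 basis point gap quoted above.

```python
def cost_in_dollars(turnover_usd: float, cost_bps: float) -> float:
    """Convert a cost in basis points into dollars on a given turnover."""
    return turnover_usd * cost_bps / 10_000


def implied_cost_bps(loss_usd: float, turnover_usd: float) -> float:
    """Inverse: the cost in basis points implied by a dollar loss."""
    return loss_usd / turnover_usd * 10_000


turnover = 50e9  # $50 billion of annual turnover, from the text

# The $750 million figure implies an all-in impact cost of about 150 bps.
impact_bps = implied_cost_bps(750e6, turnover)

# A 20 bp gap between a good and a great algorithm, in dollars per year.
gap_usd = cost_in_dollars(turnover, 20)

print(f"implied impact: {impact_bps:.0f} bps")
print(f"20 bp gap: ${gap_usd / 1e6:.0f} million per year")
```

On these numbers the 20 basis point gap alone is worth about $100 million a year, which is why execution gets its own research teams.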

JPMorgan's LOXM system was the first major deployment of deep reinforcement learning in institutional execution. It uses deep Q-learning to dynamically slice orders, observing order book state, recent fill rates, and detected counterparty behavior every 50 milliseconds. In published benchmarks covering 18 months of live trading, LOXM reduced market impact by 18% compared to standard VWAP baselines. That 18% translates to approximately $240 million in improved execution quality across JPMorgan's 2025 equity flow alone.
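LOXM's internals are not public, so the following is only a toy stand-in: tabular Q-learning rather than a deep Q-network, on a four-step order-slicing problem. The quadratic impact cost, the lot sizes, and every name here are assumptions chosen to keep the example small; the state is just (time step, remaining inventory) rather than a live order book.

```python
import random
from collections import defaultdict

T = 4          # decision steps (hypothetical; a real system re-decides far more often)
INVENTORY = 4  # lots to sell (hypothetical size)


def allowed_actions(t, inv):
    # On the last step everything left must be sold.
    return [inv] if t == T - 1 else list(range(inv + 1))


def impact_cost(lots):
    # Assumed quadratic market impact: big slices cost disproportionately more.
    return float(lots ** 2)


def train(episodes=5000, alpha=0.5, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # (t, inventory, action) -> estimated value
    for _ in range(episodes):
        inv = INVENTORY
        for t in range(T):
            # Uniform random exploration; Q-learning is off-policy, so this is valid.
            a = rng.choice(allowed_actions(t, inv))
            reward = -impact_cost(a)
            nxt = inv - a
            if t == T - 1:
                future = 0.0
            else:
                future = max(q[(t + 1, nxt, b)] for b in allowed_actions(t + 1, nxt))
            q[(t, inv, a)] += alpha * (reward + future - q[(t, inv, a)])
            inv = nxt
    return q


def greedy_schedule(q):
    # Read the learned policy back out as a list of slice sizes.
    inv, schedule = INVENTORY, []
    for t in range(T):
        a = max(allowed_actions(t, inv), key=lambda b: q[(t, inv, b)])
        schedule.append(a)
        inv -= a
    return schedule


q_table = train()
schedule = greedy_schedule(q_table)
print(schedule)
```

With a convex impact cost and nothing else in the state, the learned schedule should settle on even slices, which is the sanity check. The interesting behavior in a production system comes from what this toy leaves out: order book state, fill rates, and counterparties that react.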

But LOXM is old news by now. The frontier has moved to multi-agent reinforcement learning, where competing RL agents simulate different market conditions in parallel. Two Sigma's approach—detailed in their 2025 arXiv preprint—generates synthetic market scenarios by pitting 200 trained agents against each other in a simulated central limit order book. The resulting policy is robust to adversarial market conditions that no single-agent model could anticipate. Their internal numbers suggest a further 7-9% improvement in execution shortfall over single-agent RL.

Jane Street takes a different tack. Rather than deep Q-learning, they've invested heavily in policy gradient methods for their ETF market-making operations. The advantage: policy gradients handle continuous action spaces more naturally, which matters when you're adjusting quotes on 5,000 instruments simultaneously with sub-microsecond latency constraints. Industry sources estimate Jane Street's AI-augmented market making generated $7.8 billion in net trading revenues in 2025—more than the GDP of some small nations.
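Why policy gradients suit continuous actions is easier to see in code. The sketch below is a bare REINFORCE update for a Gaussian policy over a single quote half-spread. It is not Jane Street's method; the reward function (spread earned times an exponentially decaying fill probability), the constants, and the names are all assumptions for illustration.

```python
import math
import random

K = 1.0      # assumed fill-decay rate: wider quotes fill less often
SIGMA = 0.2  # fixed exploration noise of the Gaussian policy


def expected_profit(half_spread):
    # Hypothetical reward: spread earned times the probability of being filled.
    fill_prob = min(1.0, math.exp(-K * half_spread))
    return half_spread * fill_prob


def train(mu=0.3, lr=0.05, batch=64, steps=2000, seed=0):
    rng = random.Random(seed)
    for _ in range(steps):
        actions = [rng.gauss(mu, SIGMA) for _ in range(batch)]
        rewards = [expected_profit(a) for a in actions]
        baseline = sum(rewards) / batch  # batch-mean baseline for variance reduction
        # REINFORCE: the gradient of log N(a | mu, sigma) w.r.t. mu is (a - mu) / sigma^2.
        grad = sum((r - baseline) * (a - mu) / SIGMA**2
                   for a, r in zip(actions, rewards)) / batch
        mu += lr * grad
    return mu


learned_spread = train()
print(round(learned_spread, 2))
```

The action is a real number, so there is no table of discrete choices to enumerate; the policy's mean simply drifts toward spreads that earned more than average. With K = 1 the best fixed quote is a half-spread of 1, and the learned mean should land close to it.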

The Execution Cost Arms Race

| Fund | Execution AI Method | Avg. Impact Reduction | Est. Annual Savings | Latency Target |
|---|---|---|---|---|
| JPMorgan (LOXM) | Deep Q-Learning | 18% vs VWAP | $240M | 50ms cycle |
| Two Sigma | Multi-Agent RL | 25% vs VWAP | $310M* | 10ms cycle |
| Jane Street | Policy Gradient | ~22% vs baseline | N/A (proprietary) | <1μs |
| Citadel Securities | Hybrid RL + LOB | 19% vs VWAP | $280M* | 5ms cycle |
| Jump Trading | Federated RL | 15% vs TWAP | $95M* | <1μs |

* Estimated from public disclosures and industry benchmarks. Some figures extrapolated from conference presentations.

47,000 Signals a Day: NLP and the Alternative Data Industrial Complex

Citadel's NLP pipeline processes 12 million news articles, 8 million earnings call transcripts, and 3 million regulatory filings daily. The output: 47,000 tradable signals, each scored for urgency, sentiment magnitude, and source reliability. These systems use fine-tuned transformer models with domain-specific pre-training on financial text spanning 2005-2025—two decades of market-moving language compressed into weights.

The numbers are staggering, but the competitive dynamics are more interesting. When everyone processes the same news, the edge disappears. This is why the real money in NLP-driven trading isn't in parsing headlines—it's in extracting signal from text that no human would ever read. Consider: the SEC's EDGAR database receives over 1,500 filings per day. Buried in footnote 47 of a 10-Q from a mid-cap industrials company, a change in inventory accounting methodology might signal earnings manipulation. No sell-side analyst is reading that. Citadel's models are.

During the March 2023 regional banking stress, NLP systems scanning bank earnings call transcripts for linguistic hedging markers flagged First Republic's liquidity concerns three weeks before the stock declined 42%. The specific signals weren't about what executives said; they were about what they avoided saying. Hedging language ("we believe," "under the circumstances," "notwithstanding") rose 340% quarter-over-quarter in First Republic's Q4 2022 call, a pattern that the transformer model had learned correlates with subsequent negative surprises. NYU Stern researchers measured an average annualized alpha of 4.7% for NLP-generated signals across 2019-2025, but with a crucial caveat: that alpha decays by roughly 30% per year as more funds adopt similar technology.
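A production system would use a fine-tuned transformer, but the underlying measurement (hedging phrases per thousand words, compared quarter over quarter) can be sketched with a phrase counter. The phrase list comes from the examples above; the two transcripts are invented, and the function names are hypothetical.

```python
import re

HEDGE_PHRASES = ("we believe", "under the circumstances", "notwithstanding")


def hedge_rate(transcript: str) -> float:
    """Hedging phrases per 1,000 words of transcript."""
    text = transcript.lower()
    words = re.findall(r"[a-z']+", text)
    if not words:
        return 0.0
    hits = sum(len(re.findall(r"\b" + re.escape(phrase) + r"\b", text))
               for phrase in HEDGE_PHRASES)
    return 1000.0 * hits / len(words)


def pct_change(previous: float, current: float) -> float:
    """Quarter-over-quarter change, in percent."""
    return 100.0 * (current - previous) / previous


# Invented snippets standing in for two consecutive earnings calls.
prior_call = "We grew deposits this quarter and we believe the outlook is stable."
latest_call = ("We believe liquidity is adequate under the circumstances. "
               "Notwithstanding outflows, we believe funding is stable.")

prior_rate = hedge_rate(prior_call)
latest_rate = hedge_rate(latest_call)
print(f"hedging up {pct_change(prior_rate, latest_rate):.0f}% quarter over quarter")
```

Normalizing by word count matters: a longer call with the same hedging density should not register as a warning. What the keyword version cannot do is notice what was left unsaid, which is where a learned model is supposed to add value.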

DE Shaw's approach is notably different. Rather than processing everything, they've built targeted parsers for patent filings, FDA drug approval transcripts, and construction permit databases. Their thesis: the highest-value NLP signals come from documents that are public but structurally difficult to read—PDFs with tables, handwritten Form 4 filings, multi-hundred-page environmental impact statements. The friction is the moat.

Alpha from NLP is inversely proportional to how many people are reading the same documents. If Bloomberg is already summarizing it, you're too late.

Graph Neural Networks: When Your Supplier's Supplier Knows Something You Don't

Supply chain alpha is one of the most compelling applications of graph neural networks in finance. The premise is deceptively simple: model corporate relationships as a directed graph where nodes represent companies and edges represent supplier-customer connections, weighted by revenue dependency. When Apple cuts iPhone production guidance by 10%, the first-order impact on TSMC is obvious. The second-order impact on TSMC's chemical supplier—Japan's JSR Corporation—is not. The third-order impact on JSR's specialty gas supplier is invisible to any human analyst covering a single sector.
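The mechanics of that cascade can be sketched without any neural network at all. The snippet below pushes a shock along weighted customer-to-supplier edges, one hop at a time. It is plain linear propagation, not a trained GNN, and the dependency weights and the "GasSupplier" node are made-up placeholders.

```python
from collections import defaultdict

# Edges point from customer to supplier. Each weight is an assumed share of the
# supplier's revenue that depends on that customer (illustrative numbers only).
EDGES = {
    "Apple": {"TSMC": 0.25},
    "TSMC": {"JSR": 0.20},
    "JSR": {"GasSupplier": 0.30},
}


def propagate(source, shock, hops):
    """Push a revenue shock down the supply chain, one hop at a time."""
    impact = {source: shock}
    frontier = {source: shock}
    for _ in range(hops):
        reached = defaultdict(float)
        for customer, size in frontier.items():
            for supplier, dependency in EDGES.get(customer, {}).items():
                reached[supplier] += size * dependency
        for name, size in reached.items():
            impact[name] = impact.get(name, 0.0) + size
        frontier = reached
    return impact


impact = propagate("Apple", -0.10, hops=3)
for company, size in impact.items():
    print(f"{company}: {size:+.2%}")
```

Each hop multiplies the shock by a dependency weight, so third-order effects are small in percentage terms. A GNN replaces the fixed multiplication with learned message functions, but the graph traversal is the same idea.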

GNNs make this tractable. Man AHL reported that their supply-chain GNN generated 8.2% annualized excess returns from 2023 to 2025 with a Sharpe ratio of 1.9 and, critically, low correlation to traditional factor exposures (maximum 0.15 correlation to value, momentum, or quality factors). The low factor correlation is what makes this strategy allocatable: you're not paying hedge fund fees for repackaged Fama-French factors.

But GNNs have a dirty secret: graph construction is where the real work happens. Two funds running the same GNN architecture on different graph data will get wildly different results. Point72's Cubist systematic team reportedly employs 30 people whose sole job is maintaining and validating their corporate relationship graph—resolving parent-subsidiary links, estimating private company revenue dependencies, and tracking real-time changes when M&A reshapes the topology. The model is a commodity. The graph is the asset.

The approach is expanding beyond supply chains. Renaissance Technologies has published (obliquely, through academic collaborators) on using GNNs for inter-market contagion modeling: predicting how a volatility spike in one asset class propagates through correlated positions across equities, fixed income, and currencies. In 2025, those collaborators at MIT showed that attention-based GNNs can predict cross-asset contagion 2-3 hours ahead of traditional correlation-based methods during stress events, with particular accuracy when the contagion path involves non-obvious intermediaries (e.g., currency carry trades transmitting emerging market stress to US high-yield credit).

Bayesian Beats Frequentist by 270bps—and Other Inconvenient Truths About Risk

BlackRock's Aladdin AI platform manages risk analytics for $21.5 trillion in assets. In 2025, BlackRock completed a multi-year migration of its tail-risk estimation engine from frequentist value-at-risk models to Bayesian neural networks. The shift wasn't academic vanity. During the August 2025 Yen carry trade unwinding, a 3-day event that wiped $1.2 trillion from global equities, Aladdin's Bayesian risk model outperformed the old frequentist approach by 270 basis points in portfolio loss prediction. That's not a rounding error. On a $10 billion portfolio, 270bps is $270 million of loss that the old model would not have seen coming.

The advantage of Bayesian neural networks isn't point prediction accuracy. It's uncertainty quantification. Frequentist models output a single number: "your 1-day 99% VaR is $45 million." Bayesian models output a full probability distribution, capturing the fat tails that matter most during crises. When the Yen moved 6 sigma in a single session on August 5, 2025, the frequentist model's VaR estimate was essentially meaningless: under its assumptions, an event of that size was all but impossible. The Bayesian model, meanwhile, had assigned non-trivial probability to exactly this kind of regime shift.
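A Bayesian neural network is far more than this, but the core contrast fits in a few lines. The sketch compares the probability of a 6-sigma move under a single Gaussian with the probability under a two-regime predictive distribution, which stands in for posterior uncertainty over volatility. The regime weights and volatilities are invented for illustration, and "6 sigma" is measured in calm-regime units.

```python
from statistics import NormalDist

std_normal = NormalDist()

# Single-regime Gaussian: one-sided chance of a move beyond 6 sigma.
p_gaussian = 1.0 - std_normal.cdf(6.0)

# Two-regime predictive distribution with hypothetical posterior weights:
# 95% on a calm regime (volatility 1) and 5% on a stressed regime (volatility 4).
REGIMES = [(0.95, 1.0), (0.05, 4.0)]  # (weight, volatility)


def tail_prob(threshold):
    """Probability of exceeding the threshold, averaged over the regimes."""
    return sum(weight * (1.0 - NormalDist(0.0, vol).cdf(threshold))
               for weight, vol in REGIMES)


p_mixture = tail_prob(6.0)

print(f"single Gaussian: {p_gaussian:.1e}")
print(f"regime mixture:  {p_mixture:.1e}")
```

The single Gaussian puts the event at roughly one in a billion; the mixture puts it at a fraction of a percent. A small weight on a stressed regime is enough to move the tail by several orders of magnitude, which is the practical meaning of "assigning non-trivial probability" to a regime shift.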

But let's be honest about the costs. Coalition Greenwich's 2025 survey found that top-tier quant hedge funds spend an average of $340 million annually on technology infrastructure, with $120 million being AI-specific (compute, data, talent). The top 10 quant funds now account for 62% of AI-driven trading volume. This concentration creates a disturbing feedback loop: the funds with the most data train the best models, which generate the most profits, which fund the most infrastructure, which attracts the best talent. Everyone else is playing a different game.

| Risk Model | Predicted Loss (Aug 2025 Yen Event) | Actual Loss | Error (relative to prediction) | Tail Coverage (2019-2025) |
|---|---|---|---|---|
| Frequentist VaR (99%) | $45M | $312M | 593% underestimate | 94.1% (should be 99%) |
| Monte Carlo (100K sims) | $67M | $312M | 366% underestimate | 96.3% |
| Bayesian NN (Aladdin AI) | $248M | $312M | 26% underestimate | 98.7% |
| Stressed VaR (Basel) | $89M | $312M | 251% underestimate | 95.8% |

Illustrative comparison based on published research and BlackRock disclosures. Actual figures vary by portfolio composition.
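For readers who want to reproduce the two derived columns, here is a minimal sketch of how such numbers are typically computed. The function names and the ten-day loss series are made up; only the $45M and $312M inputs come from the table.

```python
def tail_coverage(losses, var_forecasts):
    """Share of days on which the realized loss stayed within the VaR forecast."""
    if len(losses) != len(var_forecasts) or not losses:
        raise ValueError("need two equally sized, non-empty series")
    within = sum(1 for loss, var in zip(losses, var_forecasts) if loss <= var)
    return within / len(losses)


def underestimate_pct(predicted, actual):
    """Shortfall of the forecast, as a percentage of the forecast itself."""
    return 100.0 * (actual - predicted) / predicted


# Invented ten-day history: one breach of a constant $10M VaR forecast.
daily_losses = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
daily_var = [10] * 10

coverage = tail_coverage(daily_losses, daily_var)
frequentist_error = underestimate_pct(45, 312)

print(f"coverage: {coverage:.0%}")
print(f"frequentist underestimate: {frequentist_error:.0f}%")
```

A 99% VaR model is well calibrated when its coverage is close to 99%. Coverage materially below that means the model is breached more often than it claims.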

The Skeptics Have a Point

Not everyone is convinced. Nassim Taleb has argued—repeatedly, loudly, and not without justification—that machine learning models in finance are "interpolation machines" that learn the past extremely well and catastrophically fail when the distribution shifts. He's not wrong about the failure mode. The question is whether the failure mode matters enough to offset the gains during normal times.

Consider: the average quant fund using ML generated 11.3% net returns in 2025, compared to 7.8% for systematic funds using traditional methods and 5.2% for the S&P 500. But during the August 2025 Yen event, ML-heavy funds drew down 14.2% on average versus 9.1% for traditional quant funds. The models that excel at extracting alpha from complex patterns also tend to be more levered and more concentrated—amplifying both upside and downside.

There's also the replication crisis. A 2024 study from AQR Capital Management tested 447 published ML trading strategies and found that only 23% remained profitable after accounting for transaction costs, multiple testing bias, and data snooping. The remaining 77%? Statistical artifacts. Overfitting dressed up as alpha. The gap between backtest and live performance—the "deflation factor"—averages 50% across ML strategies, compared to 30% for traditional quant factors.
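The multiple-testing part of that critique is easy to quantify. The sketch below uses a Bonferroni correction under a normal approximation, which is a simple stand-in and not necessarily what the study used, plus a flat haircut for the backtest-to-live gap. The function names and the 12% backtest return are hypothetical.

```python
from statistics import NormalDist


def bonferroni_t_threshold(n_tests, alpha=0.05):
    """Two-sided t-stat (normal approximation) required after testing n strategies."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * n_tests))


def expected_false_positives(n_tests, alpha=0.05):
    """Strategies that would pass a naive test even if none had any edge."""
    return n_tests * alpha


def live_estimate(backtest_return, deflation):
    """Apply a backtest-to-live haircut, e.g. 0.50 for the ML figure in the text."""
    return backtest_return * (1.0 - deflation)


single_test = bonferroni_t_threshold(1)    # the familiar threshold near 1.96
many_tests = bonferroni_t_threshold(447)   # the bar after 447 attempts

print(f"one strategy: t > {single_test:.2f}")
print(f"447 strategies: t > {many_tests:.2f}")
print(f"naive false positives: {expected_false_positives(447):.1f}")
print(f"12% backtest after 50% deflation: {live_estimate(0.12, 0.50):.1%}")
```

At a naive 5% significance level, around 22 of 447 strategies with no edge at all would still look significant. Raising the bar to account for the number of attempts is the cheapest defense against overfitting dressed up as alpha.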

And then there's the data problem. The best models in the world are useless without proprietary data. Renaissance Technologies' Medallion fund reportedly ingests 40 petabytes of alternative data annually, from satellite imagery of retail parking lots to real-time shipping container tracking. Most funds can't afford that. The average mid-size quant fund spends $8-15 million on data annually, roughly a tenth of what the top 5 funds spend. The AI democratization narrative, that open-source models level the playing field, ignores the fact that models are cheap but data is excludable.

The Infrastructure Bill: What It Actually Costs

| Category | Top 5 Funds (Avg.) | Mid-Size Quant (Avg.) | Gap |
|---|---|---|---|
| AI/ML Talent (annual comp) | $95M | $18M | 5.3x |
| GPU Compute (annual) | $45M | $4M | 11.3x |
| Alternative Data (annual) | $120M | $12M | 10x |
| Colocation & Latency Infra | $30M | $3M | 10x |
| Model R&D (annual) | $50M | $8M | 6.3x |
| Total | $340M | $45M | 7.6x |

Source: Coalition Greenwich Quant Technology Survey 2025. Figures represent estimated annual spending.

What's Coming: 2026-2028

Three developments are likely to reshape algorithmic trading over the next 24 months—and they're not the ones getting the most press.

First, foundation models for finance are approaching a tipping point. BloombergGPT was a proof of concept; the next generation—trained on 30+ years of tick data, fundamentals, and text simultaneously—will be multimodal. Early results from a consortium of ETH Zurich and Two Sigma researchers show that a single transformer trained jointly on price data and news text outperforms specialized models by 15-20% on alpha generation benchmarks. The "one model to rule them all" approach has failed before, but the scale of training data and compute available in 2026 makes it worth taking seriously.

Second, synthetic data generation is solving the rare-event problem that Taleb correctly identified. Generative models can now produce realistic market stress scenarios that have never occurred in historical data but are physically plausible—a 1998 LTCM-style unwind combined with a 2020 COVID liquidity freeze, for instance. Training risk models on these synthetic scenarios dramatically improves tail-risk coverage without requiring the actual catastrophe to happen first. Jump Trading and Citadel have both filed patents in this space in 2025.

Third, and most consequentially, regulation of AI is catching up. The SEC's proposed Rule 10b-18 amendment would require algorithmic trading firms to document and audit their ML model decisions in real time, a requirement that is technically infeasible for most deep learning architectures today. If enacted, it would effectively force a migration toward more interpretable model families (gradient-boosted trees, attention-based models with explicit feature importance) at the cost of raw predictive power. The quant industry is lobbying hard against it. Whether they succeed will determine the pace of AI adoption for the next decade.

The irony is palpable. The firms that have profited most from AI's opacity may soon be required to explain themselves—and the explanation might reveal that their "AI-driven alpha" is mostly leverage and data advantage wearing a neural network costume. Some of it is genuine innovation. Some of it is marketing. The line between the two has never been thinner.

Disclaimer: The analysis provided on AI Verticals is for informational purposes only and does not constitute financial, investment, legal, or medical advice. Always consult qualified professionals.