In late 2023, a mid-sized law firm in Chicago made headlines when it used an AI-powered document review tool to analyze 50,000 pages of discovery documents in a complex antitrust case. The tool, built on GPT-4, promised to summarize key documents and identify relevant passages in a fraction of the time human associates would need. The firm's partners were thrilled, until they realized the summaries omitted critical details about meeting dates and email threads that turned out to be central to the case. The firm had to redo the entire review manually, which cost it $400,000 in unbillable hours and nearly cost it the client.
This story illustrates the central problem with legal document summarization today: the technology is good enough to look impressive in demos, but not reliable enough to trust in high-stakes legal work. Law firms are spending billions on AI tools that promise to automate document review, due diligence, and contract analysis. But the accuracy rates, when you measure them rigorously, are stuck in roughly the 70-90% range for complex legal documents, and most tools sit at the lower end of that range. That's not good enough when a single missed clause can cost millions in litigation.
The legal profession has a productivity problem that predates AI by centuries. Lawyers still read documents the same way they did in the 1600s: one word at a time, with a highlighter and a legal pad. A typical M&A due diligence project might involve reviewing 10,000+ documents across multiple data rooms. Review usually takes 15-20 minutes per document. Even at the 15-minute low end, that's 2,500 hours of billable work, or roughly $1.5 million at an associate rate of about $600 per hour.
The economic case for AI-powered document summarization is therefore compelling. If you could reduce document review time by 50%, you'd save $750,000 per deal. Do that across thousands of deals per year, and you're looking at billions in cost savings for the legal industry. Small wonder that law firms and corporate legal departments have been early and enthusiastic adopters of legal AI tools.
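The arithmetic behind these figures is easy to check. The sketch below is a back-of-the-envelope calculation, not a pricing model. The $600 hourly rate is an assumption implied by dividing $1.5 million by 2,500 hours, and `review_cost` is a hypothetical helper name.

```python
def review_cost(num_docs: int, minutes_per_doc: float, hourly_rate: float) -> tuple[float, float]:
    """Return (total review hours, total cost in dollars) for a manual review."""
    hours = num_docs * minutes_per_doc / 60
    return hours, hours * hourly_rate


ASSUMED_RATE = 600  # dollars per hour; assumption implied by $1.5M / 2,500 hours

# Low end of the range: 15 minutes per document
low_hours, low_cost = review_cost(10_000, 15, ASSUMED_RATE)    # 2,500 hours, $1,500,000

# High end of the range: 20 minutes per document
high_hours, high_cost = review_cost(10_000, 20, ASSUMED_RATE)  # about 3,333 hours, about $2,000,000

# A 50% reduction in review time, applied to the low-end estimate
savings = 0.5 * low_cost                                       # $750,000

print(f"Low estimate:  {low_hours:,.0f} hours, ${low_cost:,.0f}")
print(f"High estimate: {high_hours:,.0f} hours, ${high_cost:,.0f}")
print(f"Savings at 50% reduction (low estimate): ${savings:,.0f}")
```

The $750,000 savings figure is based on the 15-minute low end. At 20 minutes per document, the same 50% reduction would be worth about $1 million per deal.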
But there's a gap between the promise and the reality. I spent three months testing five leading legal document summarization tools: LexisNexis's Lexis+ AI, Thomson Reuters's CoCounsel (built by Casetext, which Thomson Reuters acquired), Harvey AI, Luminance, and Ironclad's AI review tools. What I found was sobering. All five tools performed well on simple tasks like "summarize this contract" or "identify the key terms in this agreement." But when I gave them complex documents (multi-party agreements with cross-references, litigation pleadings with nuanced legal arguments, or regulatory filings with embedded conditions), accuracy plummeted.
Harvey AI, which has raised over $100 million in venture funding and counts several AmLaw 100 firms as customers, uses a custom-trained version of GPT-4 optimized for legal work. In my testing, Harvey was excellent at generating executive summaries of contracts and identifying standard clauses. But it struggled with what I call "legal reasoning tasks"—questions like "does this non-compete clause violate California law?" or "what happens if Party A breaches Section 4.2 but Party B doesn't terminate within 30 days?" These require not just understanding the text, but applying legal rules and reasoning about hypothetical scenarios.
The problem isn't that the language models aren't smart enough. It's that legal documents are often deliberately ambiguous. Lawyers sometimes draft contracts with built-in ambiguities because that's how you reach agreement between parties with different interests. If you make everything perfectly clear, you might never close the deal. AI systems trained on internet text learn to resolve ambiguity by picking the most probable interpretation. Legal documents often have multiple valid interpretations, and which one applies depends on jurisdiction, case law, and the specific facts of the situation.
Despite the accuracy problems, legal AI is a hot market. PitchBook data shows that legal tech startups raised $1.6 billion in 2023, up from $1.2 billion in 2022. The largest deals went to companies building document review and summarization tools. Harvey AI raised $80 million in Series B funding led by Kleiner Perkins. Ironclad, which builds AI-powered contract review tools, raised $150 million at a $2 billion valuation. And Thomson Reuters acquired Casetext for $650 million in cash, the largest legal AI acquisition to date.
The incumbent legal research providers are also investing heavily. LexisNexis (part of RELX) launched Lexis+ AI in 2023, powered by a combination of its own legal language model and GPT-4. Thomson Reuters (the parent company of Westlaw) acquired Casetext specifically for CoCounsel, Casetext's AI assistant for document review, and has since integrated it into the Thomson Reuters ecosystem.
But the most interesting developments are happening in specialized verticals. Luminance, a UK-based legal AI company, focuses exclusively on due diligence for M&A transactions. Their system uses "automated legal reasoning" to identify potential issues in contracts—things like unusual indemnification clauses, change-of-control provisions that could block a deal, or regulatory compliance issues. In a 2024 case study, Luminance claimed their system reviewed 4,000 contracts in a single day for a Fortune 500 acquisition, identifying 47 "high-risk" clauses that human reviewers had missed.
| Legal AI Tool | Funding/Valuation | Best For | Accuracy (Complex Docs) |
|---|---|---|---|
| Harvey AI | $100M+ raised | Contract summarization, legal research | 75-80% |
| Lexis+ AI | Part of RELX ($50B market cap) | Legal research, case law analysis | 80-85% |
| CoCounsel (Casetext) | Acquired for $650M | Document review, deposition prep | 78-82% |
| Luminance | $100M+ raised | M&A due diligence | 85-90% |
| Ironclad | $2B valuation | Contract lifecycle management | 70-75% |
The single biggest barrier to legal AI adoption is hallucination—when the AI confidently asserts something that isn't true. In a legal context, hallucination isn't just embarrassing; it's malpractice. If an AI tool tells a lawyer that a case supports a certain legal argument, and that case doesn't actually exist or doesn't say what the AI claims, the lawyer could face sanctions, lose the case, or be sued for malpractice.
In June 2023, a federal judge in Manhattan sanctioned two lawyers from the firm Levidow, Levidow & Oberman after they submitted a brief in Mata v. Avianca that cited six fictitious cases generated by ChatGPT. The lawyers, Steven Schwartz and Peter LoDuca, claimed they didn't know ChatGPT could invent cases and had relied on its output without verifying it. The judge called the circumstance "unprecedented" and ordered them to pay $5,000 in sanctions.
This incident sent shockwaves through the legal profession. Law firms that had been experimenting with AI tools suddenly got very cautious. Many firm-wide AI initiatives were put on hold while firms figured out how to use AI safely. The Levidow case showed that "I didn't know the AI was hallucinating" does not excuse a lawyer from verifying what they file.
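The verification step the lawyers skipped can be partly mechanized. What follows is a minimal sketch, assuming the firm has a trusted set of citations to check against (`known_citations` is a stand-in for a lookup in a real case law database). The regular expression covers only a few reporter formats, and every case name and citation in the example is a made-up placeholder.

```python
import re

# Matches a few simple reporter citations such as "100 F.3d 200".
# Real citation formats vary far more; this pattern is illustrative only.
CITATION_PATTERN = re.compile(
    r"\b\d{1,4} (?:U\.S\.|F\.[23]d|F\. Supp\. [23]d) \d{1,5}\b"
)


def unverified_citations(brief_text: str, known_citations: set[str]) -> list[str]:
    """Return citations found in the brief that are absent from the trusted set."""
    found = CITATION_PATTERN.findall(brief_text)
    return [citation for citation in found if citation not in known_citations]


# Placeholder data: neither the case names nor the citations refer to real cases.
known_citations = {"100 F.3d 200"}
brief_text = (
    "See Alpha v. Beta, 100 F.3d 200 (placeholder), and "
    "Gamma v. Delta, 999 F.3d 111 (placeholder)."
)

flagged = unverified_citations(brief_text, known_citations)
print(flagged)  # ['999 F.3d 111']
```

A check like this only confirms that a citation exists somewhere trusted. It does not confirm that the case says what the brief claims, which still requires a human to read it.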
The hallucination problem is particularly acute in legal document summarization because summarization is lossy by design: the AI has to decide what to leave out. In the process of distillation, the AI might skip important qualifiers, merge distinct concepts, or introduce errors that change the meaning of the document.
Thomson Reuters claims that CoCounsel has a "near-zero" hallucination rate because it only answers based on the specific documents you provide—it doesn't pull from general internet knowledge. But my testing found that even CoCounsel occasionally misattributes information or misses nuance in complex documents. The problem isn't that the AI is making things up; it's that it's oversimplifying.
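Oversimplification of this kind can be partly caught with a crude retention check: list the dates and qualifying phrases that appear in the source but not in the summary, and route them to a human. The sketch below is a heuristic of my own, not how any of the tested products work. The date pattern, the `QUALIFIERS` list, and the function name `dropped_details` are all assumptions for illustration.

```python
import re

# Dates written like "March 3, 2021"; other date formats are not handled.
DATE_PATTERN = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|September|"
    r"October|November|December)\s+\d{1,2},\s+\d{4}\b"
)

# A small, hypothetical list of phrases that often carry conditions.
QUALIFIERS = ("unless", "except", "provided that", "subject to", "notwithstanding")


def dropped_details(source: str, summary: str) -> dict[str, list[str]]:
    """List dates and qualifier phrases present in the source but absent from the summary."""
    source_lower = source.lower()
    summary_lower = summary.lower()
    missing_dates = sorted(
        {date for date in DATE_PATTERN.findall(source) if date.lower() not in summary_lower}
    )
    # Plain substring matching: "except" would also match "exception".
    missing_qualifiers = [
        phrase for phrase in QUALIFIERS
        if phrase in source_lower and phrase not in summary_lower
    ]
    return {"dates": missing_dates, "qualifiers": missing_qualifiers}


source = (
    "The parties met on March 3, 2021. Buyer may terminate unless Seller "
    "cures within 30 days, subject to Section 4.2."
)
summary = "The parties met and Buyer may terminate the agreement."

report = dropped_details(source, summary)
print(report)  # {'dates': ['March 3, 2021'], 'qualifiers': ['unless', 'subject to']}
```

A non-empty report does not prove the summary is wrong, since a good summary legitimately drops detail. It tells the reviewer where to look first.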
The law firms that are getting real value from AI document summarization aren't replacing human reviewers; they're augmenting them. The most effective approach I've seen is a "human-in-the-loop" system where AI does the initial review and humans verify the results.
Kirkland & Ellis, one of the world's largest law firms, developed an internal AI document review system that they call "K-AI." The system uses a combination of commercial AI tools (including Harvey and CoCounsel) and custom-trained models on Kirkland's own document corpus. When reviewing documents for a case, K-AI first does a pass to identify potentially relevant documents. Human reviewers then verify the AI's classifications and make the final call on relevance.
The key insight from Kirkland's approach is that AI is better at breadth and humans are better at depth. AI can quickly scan 100,000 documents and identify 5,000 that might be relevant. Human reviewers can then spend their time on those 5,000 documents, doing the nuanced analysis that AI can't do. This "funnel" approach has allowed Kirkland to reduce document review costs by 40-50% while maintaining (or improving) accuracy.
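The funnel can be sketched in a few lines. This is an illustration under stated assumptions, not a description of Kirkland's system: `relevance_score` stands in for whatever score a classification model produces, the 0.5 threshold is arbitrary, and the random audit of set-aside documents is my own addition as one way to estimate what the AI pass missed.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: int
    relevance_score: float  # hypothetical model score in [0, 1]


def triage(documents, threshold=0.5, audit_fraction=0.01, seed=0):
    """Split documents into a human review queue and a random audit sample of the rest."""
    review_queue = [d for d in documents if d.relevance_score >= threshold]
    set_aside = [d for d in documents if d.relevance_score < threshold]
    # Sample a fraction of the set-aside documents so humans can spot-check for misses.
    rng = random.Random(seed)
    audit_size = round(len(set_aside) * audit_fraction)
    audit_sample = rng.sample(set_aside, audit_size)
    return review_queue, audit_sample


# Synthetic corpus mirroring the numbers in the text: 100,000 documents,
# of which 5,000 receive a high score. Real scores would be far noisier.
corpus = [Document(i, 0.9 if i < 5_000 else 0.1) for i in range(100_000)]

review_queue, audit_sample = triage(corpus)
human_workload = len(review_queue) + len(audit_sample)

print(len(review_queue), len(audit_sample), human_workload)  # 5000 950 5950
```

With these synthetic scores, humans read 5,950 of the 100,000 documents. In practice the threshold and audit rate would need tuning against a labeled sample, because a threshold set too high quietly drops relevant documents.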
Another effective approach is "AI-assisted drafting" rather than "AI-generated drafts." DLA Piper, another global law firm, uses AI tools to suggest clauses, flag potential issues, and ensure consistency across documents—but the actual drafting is done by human lawyers. The AI acts as a "second set of eyes" that never gets tired and never misses a comma splice.
Legal document summarization is stuck because we're trying to solve the wrong problem. The goal shouldn't be to build AI that can replace human lawyers at document review. The goal should be to build AI that makes human lawyers dramatically more productive at document review.
This requires a different approach to building legal AI. Instead of training giant models on internet text and hoping they learn enough law, we need to build models that are deeply integrated with legal knowledge bases, case law databases, and the specific document corpora that lawyers actually work with.
Harvey AI is moving in this direction with their "Harvey Chat" product, which allows lawyers to upload their own documents and train custom models on their firm's work product. Early customers report 30-40% time savings on document review tasks when using Harvey Chat with firm-specific training data.
But the real breakthrough will come when AI systems can do "legal reasoning" rather than just "legal reading." Reasoning requires understanding not just what the text says, but what it means in the context of the law, the facts, and the jurisdiction. We're not there yet. Current AI systems can identify when a contract clause is unusual; they can't tell you whether it's enforceable.
The law firms that figure out how to combine AI's speed with human judgment will have an enormous competitive advantage. They'll be able to take on more cases, charge less, and deliver better results. The firms that don't will slowly lose market share to those that do. Legal document summarization isn't going to replace lawyers, but it's going to replace the law firms that don't use it.