Legal
The Algorithm That Predicts How Judges Think
In 2016, a judge in Wisconsin was presented with data suggesting that the risk scores informing his sentencing decisions were racially skewed. The data came from ProPublica's analysis of COMPAS, a commercial recidivism prediction tool, which found that among defendants who did not go on to reoffend, Black defendants were nearly twice as likely as white defendants to have been labeled higher-risk by the algorithm. The judge said the data was interesting. He continued sentencing as before.
This episode illustrated something important about the relationship between legal systems and quantitative prediction: courts are not inherently hostile to algorithmic tools, but they are resistant to being told what to do by systems they do not understand and cannot interrogate. This tension has defined the development of AI litigation prediction tools ever since -- a technology that is increasingly sophisticated, increasingly commercially viable, and increasingly controversial.
What Litigation Prediction Actually Does
Litigation outcome prediction is not a single technology. It encompasses a family of techniques applied at different stages of the legal process to estimate the probability, magnitude, or characteristics of legal outcomes. Early applications focused on case outcome prediction -- given a set of facts, a judge, a jurisdiction, and opposing counsel, what is the likely result? Later applications expanded to judge behavior modeling, settlement range estimation, damages projection, and duration prediction.
The foundational insight is that judges are human beings with documented preferences, documented reasoning styles, and documented case histories. A judge who has decided 340 intellectual property cases over twelve years has established a track record. That track record contains information: which arguments succeed, which expert witnesses are persuasive, which procedural approaches are tolerated, and which factual patterns produce which outcomes. Machine learning models can extract those patterns at a scale that no human analyst can match.
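The simplest form of extracting a track record is counting how often each argument has succeeded before each judge. The sketch below is a minimal illustration under stated assumptions, not any vendor's method: the rulings and judge names are hypothetical, and the Beta-prior smoothing (which pulls rates for rarely seen judge and argument pairs toward 50 percent) is an illustrative choice.

```python
from collections import defaultdict

def judge_success_rates(rulings, prior_successes=1.0, prior_failures=1.0):
    """Smoothed success rate for each (judge, argument type) pair.

    `rulings` is an iterable of (judge, argument_type, succeeded) tuples.
    The Beta(prior_successes, prior_failures) prior pulls pairs with few
    rulings toward 50%, so one favorable ruling does not read as a 100% rate.
    """
    counts = defaultdict(lambda: [0, 0])  # (judge, argument) -> [successes, rulings]
    for judge, argument, succeeded in rulings:
        tally = counts[(judge, argument)]
        tally[0] += int(succeeded)
        tally[1] += 1
    # Posterior mean of a Beta-Binomial model for each pair.
    return {
        pair: (successes + prior_successes) / (n + prior_successes + prior_failures)
        for pair, (successes, n) in counts.items()
    }

# Hypothetical rulings: (judge, argument type, did the argument succeed?)
rulings = [
    ("Judge A", "mootness", True),
    ("Judge A", "mootness", True),
    ("Judge A", "mootness", False),
    ("Judge A", "standing", False),
    ("Judge B", "mootness", False),
]
rates = judge_success_rates(rulings)
print(rates[("Judge A", "mootness")])  # (2 + 1) / (3 + 2) = 0.6
```

Smoothing matters because even a judge with 340 cases in a field may have ruled on any particular argument only a handful of times.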
Casetext, a legal AI company based in San Francisco, built its CoCounsel platform partly on this insight. The system analyzes court records, briefs, and prior decisions to identify factors correlated with success in similar cases. A lawyer preparing for a motion to dismiss can use the system to understand, based on that specific judge's history, what arguments have succeeded before and what the probability of success is given the specific facts at hand.
Judicata took a different approach: its algorithms analyze the full text of every published California and federal court decision to identify not just outcomes but the specific language and reasoning that courts use to justify those outcomes. This allows attorneys to predict not just whether a motion will succeed but what language the judge is likely to use in the opinion, which matters enormously for appeal strategy.
The Data That Makes It Possible
The accuracy of litigation prediction models depends on three things: the quality of the underlying data, the sophistication of the feature engineering, and the appropriateness of the model architecture for the prediction task. All three have improved dramatically in the past decade.
The legal industry's data infrastructure was, for most of its history, catastrophically poor. Court records were paper-based, inconsistently formatted, and siloed across jurisdictions. The digitization of court records -- driven by state-level e-filing mandates, federal court electronic case management systems, and commercial legal data aggregators like Westlaw and LexisNexis -- has created datasets large enough to train useful models.
CourtListener has collected over 4 million federal and state court opinions. Ravel Law built analytics profiles of individual judges from millions of court decisions. LexisNexis's total database contains over 100 billion legal entities and relationships. Collections of this size are large enough to train neural networks on judge-level behavior patterns.
The feature engineering problem is subtler than it might appear. A naive approach uses superficial case characteristics -- the type of claim, the jurisdiction, the dollar amount in dispute. These are predictive, but they miss the most important signals. The actual language of the pleadings -- the specific facts alleged, the legal theories asserted, the precedential cases cited -- contains information that is far more predictive of outcome. Modern systems use transformer-based NLP models to encode this language as dense vector representations that capture semantic meaning.
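The contrast between superficial metadata and pleading language can be shown with a toy model. This is a minimal sketch under stated assumptions: the two-case dataset is invented, a hashed bag-of-words stands in for the transformer embedding described above so the example runs on the Python standard library alone, and `metadata_features`, `text_features`, and `train_logistic` are hypothetical helpers rather than any vendor's pipeline.

```python
import math
import re
import zlib

DIM = 64  # hashed text dimensions (arbitrary, for illustration)

def metadata_features(case):
    """Superficial characteristics: claim type and venue as indicator features."""
    return {f"claim={case['claim']}": 1.0, f"venue={case['venue']}": 1.0}

def text_features(case):
    """Hashed bag-of-words over the pleading text.

    Stand-in for a transformer embedding; crc32 is used instead of hash()
    because Python randomizes string hashes between runs.
    """
    features = {}
    for token in re.findall(r"[a-z]+", case["text"].lower()):
        key = f"tok{zlib.crc32(token.encode()) % DIM}"
        features[key] = features.get(key, 0.0) + 1.0
    return features

def score(weights, bias, x):
    """Logistic model: probability that the claimant prevails."""
    z = bias + sum(weights.get(k, 0.0) * v for k, v in x.items())
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, epochs=200, lr=0.1):
    """Stochastic gradient descent on the logistic loss over sparse features."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            error = score(weights, bias, x) - y  # gradient of log loss w.r.t. the logit
            bias -= lr * error
            for k, v in x.items():
                weights[k] = weights.get(k, 0.0) - lr * error * v
    return weights, bias

# Invented toy data: identical metadata, different pleading language and outcome.
cases = [
    {"claim": "contract", "venue": "SDNY", "won": 1,
     "text": "breach of written agreement, signed invoice attached"},
    {"claim": "contract", "venue": "SDNY", "won": 0,
     "text": "oral promise alleged, no documents"},
]
labels = [c["won"] for c in cases]
meta_rows = [metadata_features(c) for c in cases]
full_rows = [{**metadata_features(c), **text_features(c)} for c in cases]

meta_model = train_logistic(meta_rows, labels)
full_model = train_logistic(full_rows, labels)
print([round(score(*meta_model, x), 3) for x in meta_rows])  # same score twice
print([round(score(*full_model, x), 3) for x in full_rows])  # text separates the cases
```

Because both cases share a claim type and venue, the metadata-only model must give them the same score; only the text features let it tell them apart.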
Prediction Accuracy Across Case Types
| Case Type | Human Expert Accuracy | AI Model Accuracy | Best AI System |
|---|---|---|---|
| Patent Infringement (Federal Circuit) | 62% | 79% | Lex Machina NLP |
| Personal Injury Damages | 55% | 71% | Legal AI Center |
| Employment Discrimination (motion) | 58% | 68% | Casetext CoCounsel |
| Motion to Compel Arbitration | 65% | 74% | Everlaw Analytics |
| Immigration Asylum Grants | 70% | 83% | Syracuse TRAC |
| Class Certification (securities) | 53% | 69% | Westlaw Edge AI |
The Ethical Minefield
The accuracy of litigation prediction tools is not the primary source of controversy. The primary source is what happens when those tools are used -- and by whom.
The most ethically fraught applications are in litigation finance and insurance. Litigation finance firms -- companies that invest in lawsuits in exchange for a share of any recovery -- use outcome prediction models to decide which cases to fund. This means that a model, rather than a judge, is effectively deciding which claims get access to the legal system. A plaintiff with a genuinely meritorious claim who cannot afford to litigate may be unable to find a funder if the model's probability estimate is unfavorable.
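The screening logic can be made concrete with a hypothetical rule. This minimal sketch assumes a funder that takes a fixed share of any recovery and requires its expected take to be at least twice the capital it advances; the `Claim` fields, the 30 percent share, the 2x hurdle, and the dollar figures are all invented for illustration and do not describe any real funder's underwriting.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    name: str
    p_win: float            # model's estimated probability of winning
    expected_award: float   # projected recovery if the claim wins
    funding_needed: float   # capital the funder would advance

def funder_decision(claim, funder_share=0.30, hurdle_multiple=2.0):
    """Hypothetical screen: fund only if the expected take is at least
    `hurdle_multiple` times the capital advanced."""
    expected_take = claim.p_win * claim.expected_award * funder_share
    return expected_take >= hurdle_multiple * claim.funding_needed, expected_take

for claim in (Claim("A", 0.6, 2_000_000, 150_000), Claim("B", 0.4, 2_000_000, 150_000)):
    funded, take = funder_decision(claim)
    print(claim.name, "funded" if funded else "declined", round(take))
# A funded 360000
# B declined 240000
```

The two claims are identical except for the model's probability estimate, and that difference alone moves claim B below the funding threshold.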
Insurance companies use similar tools to evaluate claims. When an insurer's algorithm estimates that a personal injury claim has a 30 percent probability of winning at trial, the insurer may anchor its settlement offer to that estimate; if the model underestimates the claimant's chances, the offer will sit well below the claim's true expected value. A 2023 investigation by the New York Times found that several major insurance companies were using algorithmic claim valuation tools that systematically undervalued claims from policyholders in majority-Black neighborhoods. The algorithms had been trained on historical settlement data that reflected decades of discriminatory practices.
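The size of that gap follows from a simple expected-value calculation. This worked example is a simplification under stated assumptions: D is a hypothetical $500,000 award if the claimant wins, p = 0.50 is a hypothetical true win probability, and only the 30 percent model estimate \hat{p} comes from the text; litigation costs, fees, and the time value of money are ignored.

```latex
\begin{aligned}
\text{EV} &= p \cdot D, \qquad O = \hat{p} \cdot D, \qquad \text{EV} - O = (p - \hat{p})\,D \\
O &= 0.30 \times \$500{,}000 = \$150{,}000 \\
\text{EV} &= 0.50 \times \$500{,}000 = \$250{,}000 \\
\text{EV} - O &= (0.50 - 0.30) \times \$500{,}000 = \$100{,}000
\end{aligned}
```

The shortfall scales linearly with the model's error (p - \hat{p}), so a bias that consistently lowers \hat{p} for one group of claimants translates directly into lower offers for that group.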
The Explainability Problem
The most technically sophisticated litigation prediction systems are also the least explainable. Deep neural networks make predictions through millions of interacting parameters that resist simple interpretation. A model might correctly predict that a motion to dismiss will be denied, but the lawyer asking why is left with a vector of numbers rather than a legal argument.
Explainable AI (XAI) research in legal AI is addressing this gap. Interpretable machine learning techniques -- attention visualization, counterfactual analysis, concept bottleneck models -- are being applied to litigation prediction to generate natural language explanations. A team at Stanford Law School's CodeX Center developed a concept bottleneck model that achieved 72 percent accuracy while generating human-readable explanations like: "This motion is predicted to succeed because (1) the judge has granted similar mootness arguments in 11 of 14 recent cases, and (2) the defendant's brief lacks the statutory citations that this judge has consistently required in prior rulings."
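A concept bottleneck model routes every prediction through a small set of human-named concepts and computes the outcome only from those concepts, which is what makes explanations like the one above possible. The sketch shows just that final concept-to-outcome layer, under loudly stated assumptions: the two concepts, their weights, and the bias are invented rather than learned, and the upstream networks that would predict concept values from case documents are omitted, with values supplied by hand.

```python
import math

# Hypothetical concepts and outcome-layer weights. In a trained concept
# bottleneck model, upstream networks would predict these concept values from
# the case record and the weights would be learned; both are invented here.
CONCEPT_WEIGHTS = {
    "judge_grant_rate_similar_args": 3.0,
    "opposing_brief_lacks_required_citations": 1.5,
}
BIAS = -2.0
DESCRIPTIONS = {
    "judge_grant_rate_similar_args": "judge's grant rate on similar arguments",
    "opposing_brief_lacks_required_citations": "opposing brief missing this judge's required citations (1 = yes)",
}

def predict_from_concepts(concepts):
    """Outcome layer: the score depends only on the named concept values,
    so each concept's contribution to the logit can be reported."""
    contributions = {name: CONCEPT_WEIGHTS[name] * value for name, value in concepts.items()}
    logit = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    # Largest absolute contributions first, so the explanation leads with them.
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return probability, ranked

def explain(concepts):
    """Turn the ranked concept contributions into a readable explanation."""
    probability, ranked = predict_from_concepts(concepts)
    verdict = "succeed" if probability >= 0.5 else "fail"
    reasons = [
        f"({i}) {DESCRIPTIONS[name]} = {concepts[name]:.2f}, contribution {contribution:+.2f}"
        for i, (name, contribution) in enumerate(ranked, start=1)
    ]
    return f"Predicted to {verdict} (p = {probability:.2f}): " + "; ".join(reasons)

# Concept values mirroring the example explanation: 11 of 14 similar arguments
# granted, and the opposing brief missing the required citations.
case_concepts = {
    "judge_grant_rate_similar_args": 11 / 14,
    "opposing_brief_lacks_required_citations": 1.0,
}
print(explain(case_concepts))
```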
What Courts Are Doing About It
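Courts and regulators are beginning to grapple with algorithmic litigation tools, but unevenly. The EU AI Act, which entered into force in August 2024, classifies AI systems intended to assist judicial authorities in researching and interpreting facts and the law, and in applying the law to a concrete set of facts, as "high-risk" applications subject to strict transparency, accuracy, and human oversight requirements.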
In the United States, the judicial branch has been characteristically cautious. Under Federal Rule of Evidence 702, as interpreted by the Supreme Court in Daubert v. Merrell Dow Pharmaceuticals (1993), expert testimony that rests on an algorithmic model faces the same reliability screening as any other expert testimony. Courts weigh factors such as whether the method has been tested, its known or potential error rate, whether it has been peer reviewed, and whether it is generally accepted in the relevant scientific community. Most current litigation prediction systems cannot easily satisfy these factors.
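What a "known error rate" might look like for a prediction model can be sketched as a held-out evaluation with a confidence interval. This is an illustrative assumption, not a procedure that Rule 702 or Daubert prescribes: the counts are hypothetical, and the Wilson score interval is one common choice among several.

```python
import math

def error_rate_with_interval(errors, total, z=1.96):
    """Observed error rate on held-out cases plus a Wilson score interval.

    z = 1.96 gives an approximately 95% interval.
    """
    if total <= 0:
        raise ValueError("need at least one evaluated case")
    p = errors / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    margin = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return p, center - margin, center + margin

# Hypothetical validation run: 280 misclassified out of 1,000 held-out cases.
rate, low, high = error_rate_with_interval(280, 1000)
print(f"error rate {rate:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

Reporting the interval alongside the point estimate shows how much the error rate could plausibly differ on new cases, which is the kind of figure opposing counsel would probe.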
The result is a legal gray zone. Lawyers can use outcome prediction tools for internal strategy. They cannot, in most jurisdictions, present algorithmic predictions as expert testimony without running the risk that opposing counsel will challenge the model's reliability. This is likely to change as the technology matures and as courts develop clearer standards. But the pace of legal change is slow, and the pace of AI change is not.