Courts Are Using AI to Predict What Judges Will Do. That's Nobody's Idea of Justice

When a criminal defense attorney in New York City wants to know how a particular judge is likely to rule on a bail application, she no longer relies solely on her experience and intuition. She runs the case facts through a predictive analytics platform that has ingested millions of judicial decisions, parsed their factual patterns, and learned the statistical correlations between specific case characteristics and specific judicial outcomes. The system tells her that Judge Martinez grants bail in approximately 34% of cases involving non-violent drug offenses where the defendant has a minor prior record, but that the figure drops to 12% when the arresting officer's name appears on a list of officers whose testimony has been overturned at an unusually high rate. This information can spare her clients real prison time. It also raises questions that the legal profession is only beginning to confront.

Predictive justice—the use of algorithmic systems to forecast legal outcomes—is transforming the practice of law at every level, from personal injury settlements and contract disputes to criminal sentencing and parole decisions. The technology offers genuine benefits: consistency, scale, and the ability to identify patterns that are invisible to individual human observers. But it also introduces a new category of systemic risk: the possibility that algorithmic predictions will embed, automate, and legitimize biases that were always present in the legal system but were previously distributed across individual human judgment rather than concentrated in a single computational system.

Where Predictive Justice Is Already Operating

The most controversial application of predictive analytics in the legal system is in criminal sentencing. COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), developed by Northpointe (now Equivant), has been used by courts in dozens of US states to generate risk scores that inform sentencing and parole decisions. The system draws on a 137-item questionnaire covering criminal history, age, education, substance use, and family history, among other factors, to generate a score predicting the likelihood that a defendant will reoffend.

ProPublica's landmark 2016 investigation of COMPAS produced findings that remain deeply contested to this day. The analysis found that Black defendants were nearly twice as likely as white defendants to be falsely flagged as high-risk by the algorithm—while white defendants who actually went on to reoffend were more frequently mislabeled as low-risk. Northpointe disputed the methodology, and the statistical debate has never been fully resolved, but the core finding is difficult to dismiss: an algorithm used in real courtrooms was producing racially disparate outcomes, and nobody fully understood why.
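
The disparity at the center of that debate is a difference in error rates between groups, and the arithmetic is simple enough to sketch. The Python below is an illustration only: the function name, the group labels "A" and "B", and the toy records are all invented for this example and are not ProPublica's data or code.

```python
from collections import defaultdict

def error_rates_by_group(records):
    """records: iterable of (group, flagged_high_risk, reoffended) tuples.

    Returns, per group, the false positive rate (flagged but did not
    reoffend) and the false negative rate (not flagged but did reoffend).
    """
    counts = defaultdict(lambda: {"fp": 0, "tn": 0, "fn": 0, "tp": 0})
    for group, flagged, reoffended in records:
        if flagged and not reoffended:
            counts[group]["fp"] += 1
        elif not flagged and not reoffended:
            counts[group]["tn"] += 1
        elif not flagged and reoffended:
            counts[group]["fn"] += 1
        else:
            counts[group]["tp"] += 1

    rates = {}
    for group, c in counts.items():
        did_not_reoffend = c["fp"] + c["tn"]
        did_reoffend = c["fn"] + c["tp"]
        rates[group] = {
            "false_positive_rate": c["fp"] / did_not_reoffend if did_not_reoffend else None,
            "false_negative_rate": c["fn"] / did_reoffend if did_reoffend else None,
        }
    return rates

# Invented toy data: 20 people per group, half of whom reoffended.
toy_records = (
    [("A", True, False)] * 4 + [("A", False, False)] * 6
    + [("A", True, True)] * 7 + [("A", False, True)] * 3
    + [("B", True, False)] * 2 + [("B", False, False)] * 8
    + [("B", True, True)] * 5 + [("B", False, True)] * 5
)
rates = error_rates_by_group(toy_records)
print(rates)
```

In this toy data, group A's false positive rate is 0.4 against 0.2 for group B, while group B's false negative rate is 0.5 against 0.3 for group A. That is the shape of the disparity ProPublica described, though the numbers here are made up.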

COMPAS is not unique. The UK Ministry of Justice has deployed the Offender Assessment System (OASys) to generate risk scores used in sentencing and parole decisions. A 2023 independent review found that OASys systematically underestimated reoffending risk for women by approximately 15 percentage points—a significant discrepancy that had gone unaddressed for years. The algorithm was trained predominantly on male offender data and simply did not perform well on a population it was never designed to accurately assess.

On the civil side, insurance companies have used predictive models to value personal injury claims for decades, and these models are increasingly sophisticated. When a plaintiff sues for damages in a car accident case, the defendant's insurer's algorithm will forecast the likely settlement value based on the specific facts of the case, the jurisdiction, the judge, the opposing counsel, and the precedent in similar cases. Both sides effectively know the expected value of the case before discovery is complete—which can lead to earlier, fairer settlements, but can also create information asymmetries between parties with access to better algorithmic tools and those without.
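
The "expected value of the case" is, at its simplest, a probability-weighted average of possible outcomes minus the cost of getting there. The sketch below assumes a hypothetical three-scenario case. The probabilities, payouts, and cost figure are invented, and a real insurer's model would estimate them from far richer data.

```python
def expected_case_value(outcomes, litigation_cost):
    """outcomes: list of (probability, payout) pairs covering every scenario.

    Returns the probability-weighted payout minus the cost of litigating.
    """
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    return sum(p * payout for p, payout in outcomes) - litigation_cost

# Hypothetical plaintiff's view: lose outright, win modestly, or win big.
scenarios = [(0.5, 0), (0.25, 80_000), (0.25, 200_000)]
value = expected_case_value(scenarios, litigation_cost=20_000)
print(value)
```

With these invented inputs the case is worth 50,000 to the plaintiff in expectation. A party with a better estimate of the probabilities knows that number more precisely than its opponent, which is the asymmetry described above.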

The Architecture of Legal Prediction

Modern legal prediction systems use a variety of machine learning techniques, each with distinct strengths and limitations. Natural language processing models—increasingly based on transformer architectures similar to large language models—can ingest and analyze millions of legal documents, including court opinions, briefs, contracts, and statutes. These models can identify semantic relationships between cases, extract legal principles, and surface relevant precedent with a speed and breadth that is simply impossible for human researchers.

Casetext's Cofe, which won the Legal Writing category of the LegalTech Europe competition in 2023, uses transformer-based models to analyze legal documents and generate relevance-ranked lists of potentially applicable precedents. The system processes in seconds documents that would take a team of paralegals weeks to review. ROSS Intelligence, before it shut down amid a copyright dispute with Thomson Reuters, demonstrated that similar systems could handle complex legal research queries posed in natural language, such as "What is the standard for summary judgment in employment discrimination cases involving remote workers?", and return structured, cited answers rather than simple document rankings.

At the more quantitative end of the spectrum, firms like Lex Machina and Westlaw Edge use statistical models trained on vast databases of court decisions to predict case outcomes, expected damages awards, and the likely behavior of individual judges. Their models analyze things like how often a specific judge grants motions to dismiss, the average time to trial in a specific district court, and how similar cases have resolved. These predictions are probabilistic rather than deterministic—they tell you what happens in a given type of case most of the time, not what will happen in your specific case. But in a legal system where uncertainty is currency, even probabilistic guidance is valuable.
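
To see why such predictions are probabilistic, it helps to compute one. The sketch below tallies how often each judge grants a motion and attaches a Wilson score interval, a standard way to express uncertainty around a proportion. It does not describe how Lex Machina or Westlaw Edge work; the judges, the rulings, and the function names are hypothetical.

```python
import math
from collections import Counter

def grant_counts(rulings):
    """rulings: iterable of (judge, granted) pairs.

    Returns {judge: (motions_granted, motions_decided)}.
    """
    granted, total = Counter(), Counter()
    for judge, was_granted in rulings:
        total[judge] += 1
        if was_granted:
            granted[judge] += 1
    return {judge: (granted[judge], total[judge]) for judge in total}

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    if n == 0:
        raise ValueError("need at least one ruling")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width

# Invented rulings: both judges grant 30% of motions, on very different volumes.
rulings = (
    [("Judge X", True)] * 3 + [("Judge X", False)] * 7
    + [("Judge Y", True)] * 30 + [("Judge Y", False)] * 70
)
counts = grant_counts(rulings)
intervals = {judge: wilson_interval(g, n) for judge, (g, n) in counts.items()}
for judge, (low, high) in intervals.items():
    print(judge, counts[judge], round(low, 3), round(high, 3))
```

Both hypothetical judges have the same 30% grant rate, but the interval for the judge with ten rulings is far wider than for the judge with a hundred. A headline percentage without that context can overstate how much is actually known about a judge.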

The Explainability Problem

The most fundamental challenge with legal AI is the explainability paradox: the most accurate predictive models are typically the least explainable, and the most explainable models are typically the least accurate. A deep neural network that can predict judicial behavior with 89% accuracy is operationally useful but legally and ethically problematic if neither the system's developers nor the judges using it can articulate why it produces its predictions. If a defendant is sentenced partly based on a risk score they cannot understand or meaningfully challenge, has due process been satisfied?
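
For contrast, here is what the explainable end of that trade-off can look like: a small logistic score whose output decomposes into one contribution per input. Everything in it is an assumption made for illustration. The feature names, weights, and intercept are invented and do not come from COMPAS or any other real instrument.

```python
import math

# Hypothetical weights for illustration only; not taken from any real tool.
WEIGHTS = {
    "prior_convictions": 0.35,
    "age_under_25": 0.60,
    "failed_to_appear_before": 0.80,
}
INTERCEPT = -2.0

def explain_score(features):
    """Return (probability, per-feature contributions to the log-odds)."""
    contributions = {
        name: weight * features.get(name, 0) for name, weight in WEIGHTS.items()
    }
    log_odds = INTERCEPT + sum(contributions.values())
    probability = 1 / (1 + math.exp(-log_odds))
    return probability, contributions

probability, contributions = explain_score(
    {"prior_convictions": 2, "age_under_25": 1, "failed_to_appear_before": 0}
)
print(round(probability, 3))
for name, value in contributions.items():
    print(f"{name}: {value:+.2f}")
```

For this invented defendant the score comes out at roughly 0.33, and the printout shows how much each factor moved it. A defendant can contest a specific line in a model like this. A deep network offers no equivalent line to contest, which is the core of the due process concern.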

This question has landed in courts across the United States. In State v. Loomis (2016), the Wisconsin Supreme Court ruled that using COMPAS scores in sentencing did not violate due process as long as the score was not the determinative factor in the sentence and the presentence report carried written warnings about the tool's limitations. The defendant had no access to how the proprietary score was calculated. Critics argued that this ruling sidestepped the deeper constitutional question: can a black-box algorithmic prediction ever satisfy the defendant's right to know the basis for their sentence? The debate has continued without resolution in subsequent cases.

The Dark Table: Predictive Justice Systems in Active Use

System	Jurisdiction	Application	Claimed Accuracy	Key Controversy
COMPAS	US (multiple states)	Criminal sentencing, parole	AUC 0.68–0.75	Racial bias in false positive rates
OASys	UK	Risk assessment, parole	Not publicly disclosed	Gender bias, female underestimation
PSA (Arnold Foundation)	US, UK, EU pilots	Pre-trial detention	AUC 0.61–0.79	Lower bias, but limited training data
LSAC Research	US (law school admission)	Admissions analysis	Proprietary	Proxy discrimination via correlated variables
Lex Machina	US federal courts	Civil outcome prediction	AUC 0.78–0.85	Data availability bias toward large firms
Westlaw Edge	Global	Precedent research, outcome prediction	Proprietary	Opacity of weighting methodology
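
The AUC figures in the table are easy to misread as "percent of predictions correct." AUC is instead the probability that a randomly chosen person who reoffended received a higher score than a randomly chosen person who did not, so 0.5 is a coin flip and 1.0 is a perfect ranking. The sketch below computes it by direct pairwise comparison on invented scores. The function name and data are illustrative.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    labels use 1 for the positive outcome (e.g. reoffended) and 0 otherwise.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Invented risk scores on a 1-10 scale for eight hypothetical defendants.
scores = [8, 6, 5, 3, 7, 4, 2, 1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
result = auc(scores, labels)
print(result)
```

On this toy data the result is 0.75: in 12 of the 16 possible pairings, the person who reoffended had the higher score. In the other 4, the tool ranked them the wrong way round. An AUC near 0.7 therefore still leaves a great deal of room for individual error.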

Harvey AI: Building the Infrastructure of Legal Intelligence

Harvey AI, founded in 2022 and backed by investments from Sequoia Capital and the OpenAI Startup Fund totaling over $190 million, represents the most ambitious attempt to date to build a general-purpose AI platform for law firms and corporate legal departments. The platform ingests firms' own data—past deal documents, litigation histories, internal policies—and uses it to power a growing suite of capabilities including due diligence, contract analysis, regulatory compliance, and legal research.

The platform's contract analysis capabilities are particularly notable. Harvey can review a commercial agreement in minutes and flag provisions that deviate from the firm's standard positions, identify non-standard language that may create legal risk, and compare the contract against a database of similar agreements and their litigation outcomes. A 2024 pilot at a major international law firm found that Harvey completed first-pass contract review 73% faster than junior associates, with a comparable error rate—and the firm subsequently restructured its associate training program to emphasize higher-order legal analysis rather than document review, which Harvey now handles as a baseline capability.

Harvey's approach illustrates a broader shift in the legal AI market: from point solutions that do one thing (predict sentencing outcomes, review contracts, draft briefs) toward integrated platforms that serve multiple functions across a firm's workflow. This integration creates significant network effects—more data from more clients improves the models, which attracts more clients—which is driving consolidation in the legal AI market.

"Every time a judge makes a decision, they create a data point. Every data point makes the model more accurate. Every increment of accuracy makes the model more persuasive to the next judge. We are building a system that gradually converges legal judgment toward statistical average, and we are calling that consistency. I am not sure consistency and justice are the same thing." — Professor Sandra Mayson, University of Pennsylvania Law School

The Copyright and Training Data Crisis

A seismic legal battle is unfolding at the intersection of copyright law and AI training. In 2023, a coalition of authors including Paul Tremblay and Mona Awad sued OpenAI, alleging that their copyrighted novels were used without authorization to train ChatGPT. Similar suits have been filed by visual artists, musicians, and—most consequentially for legal AI—by major news organizations including The New York Times.

The Times's lawsuit, filed in December 2023, represents the most significant legal challenge to AI training practices to date. The newspaper argues that OpenAI used millions of its articles to train GPT models that now compete with The Times as a source of legal and general information. If the lawsuit succeeds, the implications for legal AI companies would be substantial: they would need to license training data rather than scrape it freely, dramatically increasing their cost structure and potentially slowing the pace of model improvement.

The legal AI industry has responded with a mix of legal arguments and defensive positioning. Harvey AI and its competitors have entered into licensing agreements with several major legal publishers, including Thomson Reuters and Wolters Kluwer, which provide them with licensed access to authoritative legal content for training purposes. These deals, while expensive, also provide a competitive moat: smaller players who cannot afford licensing agreements may find themselves unable to compete on model quality.

What Courts Are Doing About AI

Courts and legislatures around the world are beginning to regulate the use of AI in legal proceedings, though the approaches vary widely. The European Union's AI Act, which came into force in 2024, classifies AI systems used in judicial processes as "high-risk" applications subject to stringent transparency and human oversight requirements. Courts in EU member states are prohibited from making decisions based solely on automated processing—a provision that will require significant adaptation in jurisdictions where COMPAS-equivalent tools are currently in use.

In the United States, the response has been more fragmented. The Judicial Conference of the United States issued guidance in 2024 advising courts to disclose when AI tools are used in case processing. Several states have passed legislation requiring disclosure of algorithmic risk scores to defendants, but enforcement remains inconsistent. The American Bar Association has issued guidance under its Model Rules of Professional Conduct indicating that lawyers who use AI in case preparation must understand the capabilities and limitations of the tools they use. The requirement is sensible, but it raises questions about how competence is defined and assessed in an era of rapidly evolving AI capabilities.

The path forward almost certainly involves building legal AI that is simultaneously more accurate and more accountable: models that not only predict outcomes but can articulate the reasoning chain that leads to their predictions, in terms that defendants, judges, and appellate courts can evaluate. This is technically challenging, but not impossible. And it is necessary. The legal system derives its legitimacy from the principle that justice is not merely outcome-efficient but visibly fair: open to scrutiny, challenge, and correction. A justice system that operates on algorithmic prediction without algorithmic explanation may be more efficient, but it is not, by any meaningful definition, justice.