
In brief
- Law professors preferred AI-generated contract law answers over those written by fellow professors about 75% of the time.
- AI responses were flagged as harmful less often than professor-written responses.
- Researchers said the results show that large language models can align with professional standards.
Law professors preferred answers generated by artificial intelligence over answers written by fellow professors, according to a recent study led by Stanford University that examined how large language models perform on legal reasoning tasks.
In the study, 16 professors from 14 U.S. law schools, including Stanford, Yale, New York University, the University of Chicago, Georgetown, UCLA, and the University of Virginia, created 40 contract law questions covering legal doctrine, case law, hypotheticals, and policy issues. Researchers saw law as an ideal testing ground for the capabilities of modern AI.
“Large language models (LLMs) are increasingly promoted as educational tutors, yet most evaluations focus on domains with a single ground truth,” the researchers wrote. “Many disciplines, however, hinge on judgment: reasoning, weighing ambiguity, and reaching defensible conclusions. Law provides a sharp test.”
In 2,918 blinded comparisons, professors selected the answer they would rather give a student. Google’s Gemini 2.5 Pro won 75.92% of its matchups against human instructors, while the tech giant’s NotebookLM won 74.75% of the time, giving AI-generated results the nod over humans in roughly three-quarters of responses.
To determine whether the results reflected a broader professional consensus, the researchers analyzed how often professors agreed when evaluating the same answer pairs.
“Observed agreement exceeded the level expected if judgments were entirely idiosyncratic, indicating that the LLMs’ success reflects alignment with common disciplinary criteria,” they wrote.
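As a rough illustration of the agreement check described above, the sketch below compares observed agreement among raters with a chance baseline. The article does not say which statistic the study used, so the pooled pairwise agreement measure, the 0.5 coin-flip baseline, and the vote data here are all assumptions made for illustration.

```python
from itertools import combinations


def observed_agreement(votes_by_pair):
    """Share of rater pairs that chose the same answer, pooled over all answer pairs."""
    agree = 0
    total = 0
    for votes in votes_by_pair.values():
        # Compare every two raters who judged the same answer pair.
        for first, second in combinations(votes, 2):
            agree += int(first == second)
            total += 1
    return agree / total


# Hypothetical votes (not data from the study): each key is one answer pair,
# each list holds the choices of the professors who rated that pair.
votes_by_pair = {
    "q1": ["ai", "ai", "ai"],
    "q2": ["ai", "human", "ai"],
    "q3": ["human", "human"],
    "q4": ["ai", "ai"],
}

# Assumed baseline: two raters picking at random between two options
# agree half the time.
CHANCE_AGREEMENT = 0.5

observed = observed_agreement(votes_by_pair)
print(f"observed agreement: {observed:.2f}, chance baseline: {CHANCE_AGREEMENT:.2f}")
```

With these made-up votes, six of the eight rater pairs agree, so observed agreement sits above the coin-flip baseline. A chance-corrected statistic such as Cohen's or Fleiss' kappa would be a more rigorous alternative.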
The study found that AI models also outperformed human instructors across multiple categories, including recall questions about cases, codes, or doctrine; hypotheticals; and policy discussions.
“To probe whether any LLM advantage might be driven by surface-level writing style rather than substantive content, we additionally engineered a set of lexico-syntactic features—answer length, structural organization, reasoning nuance, legal anchors, confidence tone, clarity, and pedagogical support—and tested how much of the preference pattern they could explain,” the study said.
AI-generated answers were also flagged as harmful less often than those written by professors, with Gemini recording a 3.41% harmfulness rate and NotebookLM 3.64%, compared with 12.06% for human instructors. In a separate analysis of additional models, Anthropic’s Claude Opus 4.7 ranked first, followed by OpenAI’s ChatGPT 5.4 and Gemini 2.5 Pro, while every AI model evaluated outperformed human instructors on average.
The researchers cautioned that the study did not measure whether the answers matched each professor’s individual teaching preferences, leaving open the possibility that AI-generated responses were viewed as generally acceptable rather than tailored to any one instructor’s approach.
“While LLM responses are generally preferred over those of human instructors, our evaluation setting does not allow us to directly measure the extent to which instructor preferences are satisfied,” the study said. “It is at least theoretically possible that LLMs, although generally delivering stronger responses, still generate answers that are merely viewed as ‘good enough.’”
The study comes as courts, law firms, and law schools increasingly grapple with how artificial intelligence should be used in the legal profession.
In March, the Los Angeles Superior Court began testing AI tools to help judges manage growing caseloads, while law schools are adding AI training programs.
“The potential benefits of these new technologies as a force multiplier in the practice of law just can’t be ignored,” Mississippi College School of Law Dean John P. Anderson previously told Decrypt. “Whether our students plan to be litigators or transactional attorneys, their future employers will expect familiarity with these AI tools. We want the firms hiring our students to be confident that every MC Law grad is competent in AI technologies.”
At the same time, however, law firms continue to confront cases undermined by hallucinations and other AI-generated errors. In April, law firm Sullivan & Cromwell admitted to a U.S. bankruptcy court that a recent filing in a high-profile case contained fake citations generated by AI.