Herman on LLMs and Ejusdem Generis

Howard Herman has posted Plausibility Sans Precision: Large Language Models and the Limits of Automated Statutory Interpretation (Ejusdem Generis) on SSRN. Here is the abstract:

Ejusdem generis, a foundational canon of statutory interpretation, requires courts to infer the common genus uniting enumerated specific terms and to apply that genus to constrain general catchall language. This Article argues that contemporary large language models (LLMs) are architecturally incapable of performing authentic ejusdem generis reasoning. It traces the doctrine’s development through Supreme Court jurisprudence from United States v. Palmer (1818) through Yates v. United States (2015), demonstrating that ejusdem generis requires intensional, ontology-grounded, context-sensitive inductive abstraction with explicit rule reification. It then analyzes how transformer-based LLMs, trained via next-token prediction on subword-tokenized text, produce only subsymbolic, extensional, statistically grounded interpolation with implicit pattern matching. The mismatch is not merely a matter of training data or prompting strategy; it is architectural and objective-level. The Article presents evidence from tokenization studies, computational complexity theory (particularly the TC⁰ constraint on transformer expressivity), and distributional semantics research to establish that even character-level tokenization does not remedy these limitations. The gap between what ejusdem generis requires and what LLMs provide has significant implications for the deployment of AI systems in legal interpretation contexts.
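For readers who want to see the abstract's tokenization point concretely, here is a minimal sketch (my own illustration, not code from the paper): it assumes the open-source tiktoken library and a GPT-style byte-pair-encoding vocabulary, and shows how the terms enumerated in 18 U.S.C. § 1519 ("any record, document, or tangible object"), the statute at issue in Yates, reach a transformer as sequences of integer subword IDs rather than as lexical concepts.

```python
# Minimal sketch (not from the paper): how a BPE tokenizer fragments
# the enumerated terms of the statute construed in Yates v. United States.
# Assumes the open-source tiktoken library; exact splits vary by vocabulary.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a GPT-style BPE vocabulary

terms = ["record", "document", "tangible object"]
for term in terms:
    ids = enc.encode(term)
    pieces = [enc.decode([i]) for i in ids]
    # The model receives integer IDs for subword pieces, not the lexical
    # concepts a court would group into a genus (e.g., "objects used to
    # record or preserve information," per the Yates plurality).
    print(f"{term!r} -> {ids} -> {pieces}")
```

Whatever genus unites the enumerated terms must therefore be reconstructed from statistical co-occurrence over such fragments rather than read off an explicit ontology, which is the mismatch the abstract describes.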

Highly recommended! The paper reflects substantial technical competence on the author's part. I am genuinely uncertain about the validity of the reasoning, but it does a better job than most at excavating the core technical issues.
