Jee Won Park (Hallym University) & Sungmi Park (Hallym University) have posted When Correct Isn’t Enough: Deconstructing Legal Causal Reasoning Capability in Large Language Models on SSRN. Here is the abstract:
Knowledge-based systems powered by Large Language Models (LLMs) require process validity as much as output correctness. In legal AI, a highly regulated domain, this translates to a mandatory requirement for transparent and causally justified decision-making. To delineate the current state and fundamental limits of LLMs’ legal causal reasoning, we introduce LEET-Arg, a benchmark derived from South Korea’s Legal Education Eligibility Test, designed to evaluate both answer correctness and justification integrity. We assess seven frontier commercial models on 97 legal reasoning problems using three metrics: Answer Accuracy, Answer Robustness (via five independent trials), and Justification Fidelity (alignment with expert solutions). Through argumentative analysis, we identify core traits necessary for explainable and causally sound reasoning. The results expose systematic failures undermining Causal Reasoning Integrity, including Spurious Success (correct answers from invalid reasoning), Scope Overgeneralization (misinterpreting conditional disagreement as principle rejection), the Is–Ought Fallacy (descriptive–prescriptive confusion), and Argumentative Replication (circular justification). Even top-performing models show significant gaps between answer accuracy and justification quality, indicating reliance on spurious correlations rather than principled causal inference. This work provides empirical evidence that correctness alone is insufficient for high-stakes applications. Our findings emphasize the urgent need for evaluating authentic reasoning processes in legal AI, where transparent and verifiable justification is not optional but a statutory requirement for legitimacy.
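For readers curious how the three reported metrics might be aggregated, here is a minimal sketch. It is not the authors' code: the record fields, function names, and the spurious-success threshold are all assumptions introduced for illustration, standing in for whatever scoring procedure the paper actually uses.

```python
# Illustrative sketch only (not from the paper): hypothetical per-problem
# records holding a gold answer, five independent trial answers, and a
# fidelity score for how well the model's justification matches the expert
# solution (assumed here to be on a 0-1 scale).
from dataclasses import dataclass
from typing import List


@dataclass
class ProblemResult:
    gold_answer: str               # correct choice for the LEET-style item
    trial_answers: List[str]       # answers from five independent trials
    justification_fidelity: float  # assumed 0-1 alignment with expert solution


def answer_accuracy(results: List[ProblemResult]) -> float:
    """Fraction of problems answered correctly on the first trial."""
    return sum(r.trial_answers[0] == r.gold_answer for r in results) / len(results)


def answer_robustness(results: List[ProblemResult]) -> float:
    """Fraction of problems answered correctly on all five trials."""
    return sum(all(a == r.gold_answer for a in r.trial_answers)
               for r in results) / len(results)


def mean_justification_fidelity(results: List[ProblemResult]) -> float:
    """Mean alignment of model justifications with expert solutions."""
    return sum(r.justification_fidelity for r in results) / len(results)


def spurious_success_rate(results: List[ProblemResult],
                          threshold: float = 0.5) -> float:
    """Share of correct answers whose justification falls below a
    (hypothetical) fidelity threshold -- the 'Spurious Success' pattern."""
    correct = [r for r in results if r.trial_answers[0] == r.gold_answer]
    if not correct:
        return 0.0
    return sum(r.justification_fidelity < threshold for r in correct) / len(correct)


if __name__ == "__main__":
    demo = [
        ProblemResult("B", ["B", "B", "B", "B", "B"], 0.9),
        # Correct but unstable across trials, with a weak rationale:
        ProblemResult("C", ["C", "A", "C", "C", "C"], 0.3),
    ]
    print(answer_accuracy(demo), answer_robustness(demo),
          mean_justification_fidelity(demo), spurious_success_rate(demo))
```

Under this framing, the gap the abstract highlights is simply a high answer_accuracy paired with a low mean_justification_fidelity (or a high spurious_success_rate).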
Recommended.
