Fagan on AI Training Data Governance

Frank Fagan (South Texas College of Law Houston) has posted Training Data Governance (NYU Journal of Intellectual Property & Entertainment Law, forthcoming 2026) on SSRN. Here is the abstract:

As AI-generated summaries increasingly displace traditional search results, users are less likely to visit the underlying websites where content is published. This shift has sharply reduced traffic to those sites, threatening the economic viability of content creators and prompting a wave of paywalls, restrictions, and litigation. With referral-based revenue in decline, the continued supply of high-quality content faces mounting risk, precisely as generative AI has grown more dependent on such material. This tension between innovation and sustainability frames the central legal and policy inquiry of training data governance: how to preserve access to essential AI training inputs without undermining the incentives to produce them.

This Article examines licensing as a tool of training-data governance and focuses on the practical question of when content loss threatens model performance and reduces social welfare. The inquiry centers on whether withdrawal of high-value material is likely and whether voluntary bargaining can realistically prevent it. Where the risk of withdrawal is low, additional protection is unnecessary; where the risk is substantial and bargaining fails, a narrowly tailored fallback, such as a standardized, non-exclusive license, can preserve access without disturbing fair-use doctrine or existing private arrangements. This welfare logic is implemented through a three-part test: licensing is warranted only when (1) the content has demonstrable value for AI training, (2) withdrawal is the rational market outcome absent remuneration, and (3) voluntary licensing fails due to transaction costs or bargaining frictions. Together, these conditions ensure that intervention occurs only where it improves overall welfare relative to the status quo.