The Dictionary Sues OpenAI Over AI Training Data

Britannica and Merriam-Webster have sued OpenAI, accusing it of using nearly 100,000 copyrighted articles to train ChatGPT. The case could reshape AI and copyright laws.

A major copyright battle is brewing between publishers and AI giants

Encyclopedia Britannica and dictionary publisher Merriam-Webster have filed a lawsuit against OpenAI, accusing the company of using their copyrighted content to train its AI models without permission. The suit marks the latest escalation in the global fight over artificial intelligence and intellectual property.

What the lawsuit claims

According to filings in Manhattan federal court, Britannica alleges that OpenAI copied nearly 100,000 articles, including encyclopedia entries and dictionary definitions, to train the models that power ChatGPT.

The complaint argues that:

OpenAI used copyrighted material without licensing or consent
ChatGPT can produce near-verbatim outputs from Britannica content
AI-generated answers are diverting traffic away from Britannica’s platforms
OpenAI may have misled users into believing the content was licensed

The publishers say this behavior directly harms their business because users who get answers from ChatGPT no longer need to visit the original sources.

The “dictionary problem” in AI

At the heart of the case is a key issue: how AI models learn language.

Dictionaries and encyclopedias are foundational data sources for training large language models. Britannica argues that OpenAI effectively “memorized” and reproduced its content, rather than transforming it.

This raises a critical legal question:

Is AI training fair use, or is it copyright infringement?

OpenAI’s response

OpenAI has pushed back on these claims, stating that its models are trained on publicly available data and operate under fair use principles.

The company maintains that its systems transform information into new outputs, rather than copying content directly.

Part of a larger wave of lawsuits

This case is not happening in isolation. It’s part of a growing global trend in which content creators are taking AI companies to court.

Recent examples include:

News organizations suing over article usage
Authors filing claims over book datasets
Music publishers challenging AI-generated lyrics

Even in India, Asian News International (ANI) previously sued OpenAI over similar concerns about unauthorized content use.

Legal experts say these cases could define the future of AI development and copyright law.

What Britannica wants

Britannica and Merriam-Webster are seeking:

Financial damages
A court order to stop the alleged misuse of their content
Stronger protections against AI systems reproducing copyrighted material

Why this case matters

This lawsuit could have massive implications for the entire AI industry:

It may reshape how AI companies source training data
It could force licensing deals with publishers
It might redefine fair use in the AI era

If courts rule against OpenAI, the decision could change how tools like ChatGPT are built and what data they can legally learn from.

The bottom line

The battle between publishers and AI companies is intensifying, and this case pits one of the most trusted names in knowledge, Britannica, directly against one of the most powerful AI developers.

The outcome could decide whether AI continues to learn freely from the internet or must start paying for it.