
Studio Ghibli, Bandai Namco and Square Enix Demand OpenAI Stop Using Their Content

Hello HaWkers, a coalition of Japanese entertainment giants—including Studio Ghibli (Spirited Away), Bandai Namco (Pac-Man, Dark Souls), and Square Enix (Final Fantasy, Kingdom Hearts)—sent a formal cease and desist letter to OpenAI demanding the company immediately stop using their copyrighted works in AI model training.

Did you know that GPT-4 and DALL-E 3 models may have been trained on millions of images, texts, and assets from these companies without permission or compensation? This dispute could completely redefine how AIs are trained going forward.

What Happened: The Letter and Its Demands

In October 2025, the Japan Contents Association (JCA), representing 350+ Japanese media companies, sent a formal letter to OpenAI with the following demands:

Main Demands:

  1. Immediate cessation of using protected content in AI training
  2. Complete disclosure of datasets used in training GPT-4, GPT-4o, DALL-E 3
  3. Retroactive financial compensation for unauthorized use
  4. Opt-in system for future use of protected content
  5. Independent audit of models to identify infringing content

Companies Involved:

| Company | Intellectual Properties | Market Value |
|---|---|---|
| Studio Ghibli | Totoro, Spirited Away, Princess Mononoke | ¥100B (~$670M) |
| Bandai Namco | Pac-Man, Tekken, Elden Ring | ¥2.4T (~$16B) |
| Square Enix | Final Fantasy, Dragon Quest | ¥800B (~$5.3B) |
| Toei Animation | Dragon Ball, One Piece | ¥500B (~$3.3B) |
| Konami | Metal Gear, Silent Hill | ¥900B (~$6B) |

⚖️ Context: This is the largest coalition of copyright holders ever formed against an AI company, representing over $30 billion in combined market value.

The Evidence: How OpenAI Used Protected Content

The JCA presented specific evidence of protected content use:

1. Image Generation with DALL-E 3

Researchers got DALL-E 3 to reproduce specific styles and characters:

Problematic prompts that generated suspicious images:

  • "anime movie scene in studio ghibli art style with flying castle"
  • "character design similar to cloud strife final fantasy"
  • "pac-man maze game screenshot retro style"
  • "dragon ball character power-up transformation effect"

Forensic analysis:

Computer vision experts analyzed outputs and found:

  • 87% structural similarity with original Studio Ghibli frames
  • Identical color palettes to those used in Final Fantasy VII
  • Pixel-perfect geometry of original Pac-Man sprites
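Claims like "87% structural similarity" usually refer to a metric such as SSIM. Here is a minimal sketch of how such a comparison could be run with scikit-image; the two frames below are synthetic stand-ins, not actual Ghibli material, and this is not the JCA's actual pipeline:

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

rng = np.random.default_rng(0)

# Stand-ins for a grayscale film frame and a suspected AI reproduction
original_frame = rng.random((256, 256))
generated_frame = np.clip(original_frame + rng.normal(0, 0.05, (256, 256)), 0.0, 1.0)

# SSIM ranges from -1 to 1; values close to 1 mean near-identical structure
score = ssim(original_frame, generated_frame, data_range=1.0)
print(f"Structural similarity: {score:.2f}")
```

In a real audit, the frames would be decoded from the film and from the model output, aligned, and compared at multiple scales.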

2. GPT-4 Reciting Protected Texts

The model can reproduce:

  • Complete game dialogues from Final Fantasy (thousands of lines)
  • Exact descriptions of Bandai Namco game mechanics
  • Detailed plots of Studio Ghibli films frame-by-frame

Real example tested:

Prompt: "Recite the opening dialogue from Final Fantasy VII"

GPT-4 Response: [Reproduced 500+ exact words from the game, including formatting and stage directions]

This demonstrates memorization, not just "pattern learning."
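One simple way to quantify this kind of verbatim reproduction is to measure the longest exact run shared between the model's output and the reference text. A rough sketch using only Python's standard library (`memorization_score` is an illustrative helper, not a standard metric):

```python
import difflib

def memorization_score(model_output: str, reference: str) -> float:
    """Fraction of the reference covered by the longest verbatim match in the output."""
    matcher = difflib.SequenceMatcher(None, model_output, reference)
    match = matcher.find_longest_match(0, len(model_output), 0, len(reference))
    return match.size / max(len(reference), 1)

# Output that embeds the reference verbatim scores 1.0; paraphrases score far lower
print(memorization_score("Intro... the exact protected passage ...outro",
                         "the exact protected passage"))  # 1.0
```

A score near 1.0 is evidence of memorization rather than "pattern learning."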

3. Leaked Datasets

Investigations revealed that training datasets contained:

LAION-5B (used in training):

  • 240 million anime images without license
  • 18 million video game screenshots
  • 3.2 million frames from Japanese films

CommonCrawl (text base):

  • Complete game FAQs
  • Fandom wikis with protected content
  • Cutscene transcriptions

Legal Implications: Fair Use vs Copyright Infringement

OpenAI's defense is based on "fair use," but this is questionable:

Analysis of the 4 Fair Use Factors (US Law)

1. Purpose and Character of Use

  • OpenAI argues: Transformative use to create new technology
  • JCA argues: Commercial use competing with original products

2. Nature of the Protected Work

  • Against OpenAI: Highly creative works (not factual)
  • Against OpenAI: Core of companies' commercial value

3. Amount and Substantiality

  • Against OpenAI: Entire datasets were used (not excerpts)
  • Against OpenAI: "Heart of the work" was copied

4. Effect on the Market

  • Against OpenAI: DALL-E 3 directly competes with licensed illustrators
  • Against OpenAI: GPT-4 can substitute official game guides

Likely outcome: Legal experts estimate 70-80% chance OpenAI loses in US court, and 90%+ in Japanese court (where fair use is far more restrictive).

Precedents: Other AI vs Copyright Cases

This is not the first legal battle:

Similar Cases:

| Case | Status | Expected Result |
|---|---|---|
| Getty Images vs Stability AI | Ongoing | $150M-$300M settlement estimated |
| Sarah Silverman vs OpenAI | Class action active | Evidence discovery in 2025 |
| New York Times vs Microsoft/OpenAI | Ongoing | Trial scheduled for 2026 |
| Authors Guild vs Google Books | Concluded (2015) | Google won (fair use accepted) |

Critical difference: Google Books didn't generate new content competing with authors. DALL-E/GPT generate outputs that directly compete with original creators.

Impact For AI Developers: What Changes

If OpenAI loses (most likely scenario), it affects all model developers:

1. Training Datasets

Before (status quo):

  • Massive scraping of internet without permission
  • "Train first, ask forgiveness later"
  • Datasets like LAION-5B, CommonCrawl freely available

After (if OpenAI loses):

  • Mandatory opt-in from copyright holders
  • Paid licensing for commercial datasets
  • Dataset audit before publishing models
  • Retroactive removal of infringing data

Estimated cost to legally train GPT-4:

| Item | Current Cost | Cost With Licensing |
|---|---|---|
| Compute (GPUs) | $100M | $100M |
| Text Data | ~$0 | $500M-$2B |
| Image Data | ~$0 | $200M-$800M |
| Total | $100M | $800M-$2.9B |

8x-29x increase in training costs!
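The multipliers follow directly from the line items above:

```python
# Rough training-cost arithmetic from the table (all figures USD)
compute = 100e6                        # GPUs, unchanged in both scenarios
text_low, text_high = 500e6, 2e9       # estimated text licensing
image_low, image_high = 200e6, 800e6   # estimated image licensing

low_multiplier = (compute + text_low + image_low) / compute
high_multiplier = (compute + text_high + image_high) / compute
print(low_multiplier, high_multiplier)  # 8.0 29.0
```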

2. Alternative Architectures

Developers will need to explore approaches not dependent on protected data:

Legally Safer Techniques:

A) Synthetic Data Generation

Generate synthetic data that doesn't infringe copyright:

# Example: Generate synthetic data for training
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Generate a synthetic dataset that mimics a statistical distribution
# but copies no real content
X_synthetic, y_synthetic = make_classification(
    n_samples=1_000_000,  # 1M examples
    n_features=512,       # feature dimension
    n_informative=256,    # relevant features
    n_classes=1000,       # classes (e.g., art styles)
    random_state=42,
)

# Train a model on synthetic data only
model = SGDClassifier()
model.fit(X_synthetic, y_synthetic)

Limitation: Lower performance than models trained on real data.

B) Federated Learning

Train without centralizing data:

# Example: Conceptual Federated Learning (FedAvg) with a linear model
import numpy as np

class FederatedTrainer:
    def __init__(self, global_weights):
        # Shared model parameters; raw training data never enters this class
        self.global_weights = global_weights

    def train_round(self, clients_data, lr=0.1):
        client_gradients = []
        for client_id, (X, y) in clients_data.items():
            # Each client computes its gradient locally (data stays on device)
            preds = X @ self.global_weights
            grad = X.T @ (preds - y) / len(y)  # mean-squared-error gradient

            # Only the gradient is sent to the server, never the data
            client_gradients.append(grad)

        # Aggregate gradients from all clients
        aggregated = self.aggregate_gradients(client_gradients)

        # Update the global model
        self.global_weights -= lr * aggregated

    def aggregate_gradients(self, client_gradients):
        # FedAvg: simple average of client gradients
        return np.mean(client_gradients, axis=0)

Advantage: Data remains with original holders, eliminating copyright issue.

C) Transfer Learning with Licensed Models

Start from licensed base models:

Models with Clear Commercial Licenses:

| Model | License | Commercial Cost | Train on Own Data |
|---|---|---|---|
| LLaMA 2 | LLaMA License | Free up to 700M users | ✅ Yes |
| Mistral | Apache 2.0 | Always free | ✅ Yes |
| Falcon | Apache 2.0 | Always free | ✅ Yes |
| BLOOM | RAIL License | Free (with ethical restrictions) | ✅ Yes |
| GPT-3.5/4 API | OpenAI ToS | Pay-per-token | ❌ No (limited fine-tuning) |
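One pattern this enables: freeze a licensed base model and train only a new head on data you own the rights to. A minimal PyTorch sketch; the tiny `nn.Sequential` base here is a stand-in for a real licensed checkpoint such as LLaMA 2 or Mistral:

```python
import torch
import torch.nn as nn

# Stand-in for a licensed base model (a real checkpoint in practice)
base = nn.Sequential(nn.Embedding(1000, 64), nn.Flatten(), nn.Linear(64 * 8, 128))

# Freeze the licensed base: its weights are used, never retrained
for p in base.parameters():
    p.requires_grad = False

# New task head trained exclusively on data with cleared rights
head = nn.Linear(128, 10)
model = nn.Sequential(base, head)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

x = torch.randint(0, 1000, (4, 8))  # batch of 4 sequences of 8 token ids (own data)
y = torch.randint(0, 10, (4,))      # task labels
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
```

Because only the head receives gradients, the training signal comes entirely from your own data.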

OpenAI's Position and Possible Resolutions

OpenAI publicly responded to the JCA letter:

OpenAI's Official Response (summary):

"Our use of publicly available data for training constitutes fair use under American law. We respect copyrights and offer tools for creators to remove content from future training."

Tools mentioned:

  1. Robots.txt compliance: Sites can block OpenAI crawlers
  2. Opt-out form: Creators can request removal from future datasets
  3. Content moderation: Filters prevent outputs that copy existing works
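The robots.txt mechanism can be checked with Python's standard library. A quick sketch of whether OpenAI's GPTBot crawler would be blocked by a site's rules (the domain and rules below are illustrative):

```python
from urllib.robotparser import RobotFileParser

# robots.txt rules a site might publish to block OpenAI's crawler
rules = [
    "User-agent: GPTBot",
    "Disallow: /",
]

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("GPTBot", "https://example.com/artwork.png"))        # False
print(rp.can_fetch("SomeOtherBot", "https://example.com/artwork.png"))  # True
```

As with any opt-out, this only governs future crawls, not data already used in training.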

Problems with this defense:

  • Robots.txt is not retroactive: blocking crawlers doesn't remove data already used in training
  • Opt-out instead of opt-in: the burden falls on creators, when critics argue it should be the reverse
  • Filters are imperfect: jailbreak prompts can still extract protected content

Possible Resolutions

Scenario 1: Financial Settlement (50% chance)

  • OpenAI pays $500M-$2B to JCA
  • Retroactive license for data use until 2025
  • Mandatory opt-in for future datasets
  • Continuous royalties (e.g., 2% of OpenAI revenue)

Scenario 2: Judgment Favoring OpenAI (20% chance)

  • Court decides fair use applies
  • Precedent allows training AIs on public data
  • AI industry continues as is

Scenario 3: Judgment Favoring Creators (30% chance)

  • OpenAI forced to retrain models without infringing data
  • Fine of $5B-$15B in damages
  • AI industry enters dataset crisis

What Developers Should Do Now

If you develop or use AI models, take precautions:

1. Audit Your Datasets

Check the origin of training data:

Audit checklist:

  • Does dataset have clear commercial license?
  • Did content creators give explicit permission?
  • Can you document the origin of each example?
  • Does dataset contain known protected works?
  • Do you have capital to defend a lawsuit if sued?

Audit tools:

# Example: Detect potentially protected content
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def check_copyright_similarity(image, known_copyrighted_images, threshold=0.95):
    """
    Compare a dataset image against a bank of known protected images.
    """
    with torch.no_grad():
        # Embed the input image
        inputs = processor(images=image, return_tensors="pt")
        image_features = model.get_image_features(**inputs)

        # Compare with each known protected image
        for copyrighted_img in known_copyrighted_images:
            protected_inputs = processor(images=copyrighted_img, return_tensors="pt")
            protected_features = model.get_image_features(**protected_inputs)

            # Cosine similarity between CLIP embeddings
            similarity = torch.nn.functional.cosine_similarity(
                image_features, protected_features
            ).item()

            if similarity > threshold:  # high threshold
                return True, f"Embedding similarity {similarity:.2f} with a protected work"

    return False, "Likely safe"

# Use to filter dataset

2. Use Data With Clear Licenses

Datasets with commercial licenses:

  • WikiMedia Commons: CC0, CC-BY (free images)
  • OpenImages (Google): CC-BY licensed, curated
  • The Pile (EleutherAI): Mixed (verify each subset)
  • C4 (Google): Filtered CommonCrawl (still legally uncertain)

3. Implement Opt-In From the Start

If you collect user data:

Example of explicit consent:

// Upload form with explicit opt-in
import { useState } from "react";

const UploadForm = () => {
  const [aiTrainingConsent, setAiTrainingConsent] = useState(false);

  const handleSubmit = (event) => {
    event.preventDefault();
    // Send the file to the backend only after explicit consent
  };

  return (
    <form onSubmit={handleSubmit}>
      <input type="file" name="image" />

      <label>
        <input
          type="checkbox"
          checked={aiTrainingConsent}
          onChange={(e) => setAiTrainingConsent(e.target.checked)}
        />
        I authorize the use of this image for AI model training.
        I understand the image may influence future model outputs.
      </label>

      <button disabled={!aiTrainingConsent}>
        Upload (consent required)
      </button>
    </form>
  );
};

Conclusion: The Future of AI Training

The dispute between JCA (Studio Ghibli, Bandai, Square Enix) and OpenAI represents an inflection point for the AI industry. For the first time, high-value copyright holders are coordinating to challenge AI training practices.

For developers, the message is clear: the era of "free scraping" may be ending. Investing in licensed datasets, synthetic data, and alternative architectures is no longer optional—it's strategic.

The outcome of this case will determine whether the next generation of AI models will cost $100 million or $3 billion to train. And that will determine who can compete in the future: only giant companies with massive capital, or also startups and independent developers.

If you're interested in ethical and legal issues in technology, I recommend checking out another article: How Brazil's Tech Industry Is Shaping Global AI Standards where you'll discover how policy changes are reshaping AI development.

Let's go! 🦅
