What happens when an LLM judges its own work?

Plus, a test of AI translation

In partnership with

Issue 88

On today’s quest:

— Authors: Find your books in the Anthropic settlement database
— LLMs do a so-so job at round-trip translation
— AI video is now indistinguishable from real life
— AI Sidequest: The podcast
— LLMs love their own writing
— Examples of AI quality control from Amazon

Authors: Find your books in the Anthropic settlement database

The official database is now online for books that qualify for a payout in the Anthropic copyright settlement case. Be sure to try different spellings of your name and book titles to catch entries that may contain small errors.

LLMs do a so-so job at round-trip translation

Lech Mazur of Advamag ran a “round trip” translation test: English into another language and then back into English. He tested 8 models across 10 languages on 200 different types of writing (e.g., an etymology column, a school trip FAQ, alt text, a piano lesson plan).

There was some variation between languages, but ChatGPT with medium reasoning generally came out on top. The average score across all models was 8.4 or greater for every language except Swahili, which came in around 7.6. For reference, here’s how he defined scores of 9 and 7:

  • 9.0: tiny phrasing differences only; full fidelity

  • 7.0: generally faithful but some nuance loss or mild shifts

Mazur also provides detailed failure reports cataloging what the models did wrong. To me, these reports make the output sound less impressive than the numerical scores suggest (example report for ChatGPT-5: Spanish).
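If you want to tinker with the idea yourself, here’s a minimal sketch of a round-trip check in Python. It assumes the OpenAI Python SDK; the model name, prompts, and single-call scoring rubric are my own placeholders, not Mazur’s actual harness.

```python
"""Minimal round-trip translation check (illustrative, not Mazur's setup)."""
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def ask(prompt: str, model: str = "gpt-5") -> str:
    """Send a single prompt and return the model's text reply."""
    response = client.chat.completions.create(
        model=model,  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def round_trip(text: str, language: str) -> str:
    """Translate English text into `language` and then back into English."""
    forward = ask(
        f"Translate the following English text into {language}. "
        f"Return only the translation.\n\n{text}"
    )
    back = ask(
        f"Translate the following {language} text into English. "
        f"Return only the translation.\n\n{forward}"
    )
    return back


def judge_fidelity(original: str, returned: str) -> str:
    """Ask a model to score fidelity using the 7/9 anchors described above."""
    rubric = (
        "Score how faithfully the second text preserves the first, from 1 to 10, "
        "where 9 means tiny phrasing differences only and 7 means generally "
        "faithful but with some nuance loss. Reply with the number and one "
        "sentence of justification."
    )
    return ask(f"{rubric}\n\nOriginal:\n{original}\n\nRound trip:\n{returned}")


if __name__ == "__main__":
    sample = "The etymology of 'quarantine' traces back to the Venetian word for forty."
    result = round_trip(sample, "Swahili")
    print(result)
    print(judge_fidelity(sample, result))
```

A real harness would average many samples per language and genre, the way Mazur’s test does, rather than judging a single passage.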

AI video is now indistinguishable from real life

I’ve been worried for a while about AI video and the problems it’s going to cause with people’s perception of reality, and after seeing this AI-generated “behind-the-scenes” video of KPOP Demon Hunters a couple of weeks ago, I felt like we had crossed the Rubicon. Seen out of context, I never would have suspected it was AI. I would have thought it was just some fun teens on set for a movie I didn’t recognize.

This isn’t a leaked clip from KPOP Demon Hunters—it’s actually AI-generated. The more realistic these fake behind-the-scenes clips look, the harder they become to distinguish from the real thing.

— luokai (@luok.ai), September 22, 2025

And now, OpenAI has released Sora, an even better AI video generator. Plus, they’ve launched a new TikTok-style vertical video app filled with Sora videos. Using the app, you can put yourself and your friends into any video. For example, you can make a video that appears to show your best friend shoplifting a bottle of gin on a convenience store security camera. wHaT cOuLd Go WrOnG?

On LinkedIn, Christopher Penn posted completely fake videos of himself (labeled) speaking at TED, on The View, and at the World Economic Forum to demonstrate how careful we’ll need to be going forward when vetting people’s credentials.

We’re entering the golden age of the liar’s dividend — the problem that when reality is hard to determine, liars can just say “that’s not true” when they’re caught, and people believe them.

Ethan Mollick also has a good post on Bluesky showing how far AI video has come in four years. If you aren’t familiar with what AI video can do these days, you owe it to yourself to get up to speed.

AI Sidequest: The podcast

Adam Davidson, co-founder of “Planet Money,” and I discussed Sora on the latest episode of my experimental podcast, AI Sidequest (and he has an interesting story about realism in advertising from his father, who was an actor in food commercials). We also talked about why he uses Claude Code for non-coding tasks, the new OpenAI jobs study, and the new OpenAI parental controls.

If you just want to hear the Sora segment, it and the others are broken out separately on YouTube. As the real YouTubers say, “Like and subscribe!”

LLMs love their own writing

“LLM Evaluators Recognize and Favor Their Own Generations”

Adam Davidson and I also discussed a funny phenomenon in which LLMs seem to prefer their own output. It showed up in the OpenAI jobs study, and I also found a paper that addressed the self-preference problem directly. Researchers at MATS, NYU, and Anthropic found that the more a model was able to recognize its own output, the more it also preferred its own output.

A practical takeaway is to be cautious if you are using an LLM to evaluate LLM writing, especially when the evaluation is subjective rather than something you can measure. You’ll likely get more meaningful results by using a different LLM to do the evaluation than the one that generated the output — for example, using Claude to grade output from Gemini.
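As a rough illustration of that takeaway, here’s a minimal sketch that drafts text with one vendor’s model and grades it with another’s. It assumes the OpenAI and Anthropic Python SDKs; the model names, prompt, and rubric are placeholders of mine, not something prescribed by the paper.

```python
"""Cross-model grading sketch: draft with one vendor, grade with another."""
import anthropic
from openai import OpenAI

openai_client = OpenAI()                # assumes OPENAI_API_KEY is set
claude_client = anthropic.Anthropic()   # assumes ANTHROPIC_API_KEY is set


def generate_draft(task: str) -> str:
    """Draft the text with one vendor's model."""
    response = openai_client.chat.completions.create(
        model="gpt-5",  # illustrative model name
        messages=[{"role": "user", "content": task}],
    )
    return response.choices[0].message.content


def grade_with_other_model(task: str, draft: str) -> str:
    """Grade the draft with a different vendor's model to sidestep self-preference."""
    rubric = (
        f"You are grading writing produced for this task:\n{task}\n\n"
        f"Draft:\n{draft}\n\n"
        "Give a score from 1 to 10 and two sentences explaining it. "
        "Judge clarity, accuracy, and tone."
    )
    message = claude_client.messages.create(
        model="claude-sonnet-4-5",  # illustrative model name
        max_tokens=300,
        messages=[{"role": "user", "content": rubric}],
    )
    return message.content[0].text


if __name__ == "__main__":
    task = "Write a two-sentence product description for a travel mug."
    draft = generate_draft(task)
    print(draft)
    print(grade_with_other_model(task, draft))
```

The key design choice is simply that the grader never sees text it wrote itself; it’s a cheap hedge against the self-preference effect the researchers measured, not a guarantee of objectivity.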

The gold standard for AI news

AI keeps coming up at work, but you still don't get it?

That's exactly why 1M+ professionals working at Google, Meta, and OpenAI read Superhuman AI daily.

Here's what you get:

  • Daily AI news that matters for your career - Filtered from 1000s of sources so you know what affects your industry.

  • Step-by-step tutorials you can use immediately - Real prompts and workflows that solve actual business problems.

  • New AI tools tested and reviewed - We try everything to deliver tools that drive real results.

  • All in just 3 minutes a day

Examples of AI quality control from Amazon

  1. The system uses specific rules to catch bad content. For example, it won’t accept a weight that isn’t followed by a unit such as kilograms or pounds (see the sketch at the end of this section).

  2. A second AI system automatically checks the output of the first system, for example, checking that the color in the product title matches the color of the product in the image. [A second AI “checker” is something I’m frequently seeing these days.]

  3. People step in after the AI review to fix problems and update the system. For example, they found that the LLM would automatically write “no warranty” when the vendor provided no warranty information, and they fixed it.

This was a long, meaty article. The page has a “listen to the article” player that clocks in at 26 minutes, but if you’re interested, it’s worth the read or listen.
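For a sense of what the first kind of check might look like in practice, here’s a minimal sketch of a rule that rejects a weight with no unit. The field format and the list of accepted units are my own assumptions for illustration, not Amazon’s actual rules.

```python
"""Rule-based validation sketch: reject weights that lack a unit."""
import re

# Accept a number followed by a recognizable weight unit (illustrative list).
WEIGHT_PATTERN = re.compile(
    r"^\d+(\.\d+)?\s*(kg|kilograms?|g|grams?|lbs?|pounds?|oz|ounces?)$",
    re.IGNORECASE,
)


def weight_is_valid(weight: str) -> bool:
    """Return True only if the weight includes a unit such as kilograms or pounds."""
    return bool(WEIGHT_PATTERN.match(weight.strip()))


if __name__ == "__main__":
    print(weight_is_valid("2.5 kg"))     # True
    print(weight_is_valid("2.5"))        # False: number with no unit
    print(weight_is_valid("about two"))  # False: not a parseable weight
```

Rules like this catch the obvious failures cheaply; the second AI “checker” and the human reviewers described above handle the cases a regex can’t.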

Quick Hits

Using AI

Resources

Psychology

Ex-OpenAI researcher dissects one of ChatGPT’s delusional spirals — TechCrunch (One piece of advice he offered that I also feel strongly about is that users should start new chats more often for greater safety.)

Climate

WEBINAR: AI’s Environmental Impact. October 17. $99. Librarian Nicole Hennig, who is presenting this webinar, is one of my “must follow” people in AI.

Why US Power Bills Are Surging (It’s not just AI) — Wired

Bad stuff

OpenAI is huge in India. Its models are steeped in caste bias. [Surprisingly, GPT-5 was more biased than GPT-4o.] — Technology Review

I’m laughing

Medicine

Job market

Evaluating the Impact of AI on the Labor Market: Current State of Affairs. TAKEAWAY: “Our metrics indicate that the broader labor market has not experienced a discernible disruption since ChatGPT’s release … undercutting fears that AI automation is currently eroding the demand for cognitive labor across the economy.” — The Budget Lab

Model updates

Education

The business of AI

The AI Boom Isn't a Bubble — Carlos Iacono

Sam Altman claims ChatGPT has 800 million weekly users [That would be more than Twitter and roughly in the same ballpark as Snapchat.] — Simon Willison

Government

Other

Meta plans to sell targeted ads based on data in your AI chats. More than a billion people chat with Meta AI each month. There is no way to opt out. — TechCrunch

The new Claude Sonnet 4.5 is “dramatically better than previous models at recognizing when it’s being tested — raising concerns that it might just be pretending to be aligned to pass its safety tests.” — Transformer

Someone has done a master’s thesis on AI slop! Here’s Gustavo Costa’s blog post summarizing his work. — Simulacro

What is AI Sidequest?

Are you interested in the intersection of AI with language, writing, and culture? With maybe a little consumer business thrown in? Then you’re in the right place!

I’m Mignon Fogarty: I’ve been writing about language for almost 20 years and was the chair of media entrepreneurship in the School of Journalism at the University of Nevada, Reno. I became interested in AI back in 2022 when articles about large language models started flooding my Google alerts. AI Sidequest is where I write about stories I find interesting. I hope you find them interesting too.

If you loved the newsletter, share your favorite part on social media and tag me so I can engage! [LinkedIn | Facebook | Mastodon]

Written by a human