Bad prompts
Sometimes people use them on purpose, and sometimes they don't
Issue 97
On today’s quest:
— Gemini 3 is live
— Grok is broken in the most cringe-worthy, hilarious way possible
— Prompts still matter
— Why checking AI citations is a challenge
— Word watch: AI slop & clanker
Google releases Gemini 3
Gemini 3 is a “big deal” update that has a lot of people excited. The model has topped most of the benchmark leaderboards, but interestingly, I’m seeing very different reviews of its writing ability:
If the task involves seeing or doing, Gemini 3 should be at the top of the list. If it involves writing or persuading, GPT 5.1 or Sonnet 4.5 is still your best friend.
Let's start with the creative writing, because that's where Gemini 3 first floored me. GPT-5.1, which dropped last week, was already a noticeable jump from previous frontier models. But Gemini 3? It wrote book chapters I had to double-check weren't plagiarized from a real book. The voice was coherent. The pacing natural, the turns of phrase genuinely surprising. But most importantly, it didn't feel like the "AI slop" writing we all know just a little too well. It's really impressive... Gemini 3 doesn't just put out "good for AI" writing, it puts out genuinely good writing.
Grok is broken in the most cringe-worthy, hilarious way possible
[Screenshot: a Grok post claiming Elon Musk is more fit than LeBron James]
Grok has a way of breaking that seems clearly tied to Elon Musk’s interests and ego. Back in May, Grok briefly became fixated on white genocide in South Africa — one of Musk’s bugbears. Well, on Thursday, it had Musk winning almost every match-up people could invent. In the screenshot above, it said Elon is more fit than LeBron James.
It’s so cringe-worthy and ridiculous that I actually took a rare dip into X myself to confirm the posts were real because I couldn’t believe it. Grok also said Musk is a better role model for humanity than Jesus, that he would win a fight against Mike Tyson, and that not a single person on earth is currently more fit than he is.
On Tuesday, I gave my AP style webinar for Ragan Communications, and in the AI section, I always share the AP’s guidance to be aware of biases in AI — that biases come from the training data but can also come from the companies themselves. God, how I wish I’d had this example at the time! I picture Elon Musk in some dark office at X making edits to the system prompt at 3 a.m.
I usually recommend you fact-check everything you get from an LLM, but in Grok’s case, I just recommend you don’t use it.
Prompts still matter
I’m getting more and more annoyed with people publishing studies based on bad prompts. Here’s a representative example of the instructions from a recent study that strongly claimed AI is unsuited for the task it studied:
Forms of names and terms should be correct and appropriate to the field of specialization.
This prompt is too vague. Better instructions would spell out what “correct” and “appropriate” mean for names and terms, note any differences between specializations where necessary, and give examples of correct and incorrect output.
Writing a good prompt also usually involves trial and error to figure out which instructions need to be added and which need to be tweaked, and this study didn’t do that either.
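If you wanted to run that kind of trial-and-error comparison yourself, a minimal sketch might look something like the one below. It assumes the OpenAI Python client and an API key in your environment; the model name, the more specific instructions, and the sample sentence are my own placeholders, not materials from the study.

```python
# A rough harness for comparing a vague prompt against a more specific one on
# the same text. Assumes the OpenAI Python client (pip install openai) and an
# OPENAI_API_KEY environment variable; the model name, instructions, and sample
# sentence below are placeholders.
from openai import OpenAI

client = OpenAI()

VAGUE = "Forms of names and terms should be correct and appropriate to the field of specialization."

SPECIFIC = (
    "Check every personal name, organization name, and specialized term. "
    "Spell names exactly as they appear in the reference list provided. "
    "For medical terms, use the forms found in standard medical dictionaries. "
    "Flag every change you make and briefly explain why. "
    "Example of a correct fix: 'Jon Hopkins University' becomes 'Johns Hopkins University'."
)

def edit_with(instructions: str, text: str) -> str:
    """Run the same text through the model under a given set of instructions."""
    response = client.chat.completions.create(
        model="gpt-4.1-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": instructions},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content

sample = "Dr. Jon Hopkins presented the findings at Johns Hopkins university."

print("--- vague prompt ---")
print(edit_with(VAGUE, sample))
print("--- more specific prompt ---")
print(edit_with(SPECIFIC, sample))
```

Running a handful of test sentences through both versions and reading the results side by side is exactly the kind of iteration the study skipped.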
It’s not that I’m dying to have people prove AI is fabulous, but with studies like some of those I’m seeing, we just don’t know. It’s a waste of everyone’s time. I want to know whether AI works or not — for real — and if so, for what and how. And I worry that poorly done studies “proving” that AI fails give writers and editors a false sense of security.
We need to be suspicious of studies done by AI companies themselves, but unfortunately, we also need to be suspicious of studies done by people who have poor prompting skills or who have incentives to design studies that are likely to fail.
Why checking AI citations is a challenge
A study that tested chatbot-generated citations on mental health topics found that a significant share of the references were fabricated:
The AI’s accuracy varied dramatically by topic: depression citations were 94% real, while binge eating disorder and body dysmorphic disorder saw fabrication rates near 30%, suggesting less-studied subjects face higher risks.
One thing that makes the fake citations hard to detect is that they have all the hallmarks of real citations. For example, many of the inaccurate AI-generated citations included DOIs, and when you click on them, they take you to real journal articles — but those articles aren’t related to the piece of information they’re meant to support.
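To make that concrete, here’s a minimal sketch of one way to sanity-check whether a DOI actually points to the article a chatbot claims it does. It uses the public Crossref API; the helper names, the sample citation, and the crude title comparison are my own illustrations, not anything from the study.

```python
# A rough sanity check: does the article a DOI resolves to match the citation
# the chatbot gave you? Uses the public Crossref REST API; the sample citation
# and the loose title comparison are illustrative only.
import requests
from difflib import SequenceMatcher

def crossref_title(doi: str) -> str | None:
    """Look up the registered title for a DOI via Crossref."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    if resp.status_code != 200:
        return None  # DOI isn't registered with Crossref
    titles = resp.json()["message"].get("title", [])
    return titles[0] if titles else None

def roughly_matches(claimed_title: str, registered_title: str) -> bool:
    """Very loose similarity check; a human still needs to read both."""
    ratio = SequenceMatcher(None, claimed_title.lower(), registered_title.lower()).ratio()
    return ratio > 0.8

# A hypothetical citation a chatbot might produce.
claimed_title = "Cognitive outcomes in binge eating disorder: a longitudinal study"
claimed_doi = "10.1000/example-doi"  # placeholder DOI

registered = crossref_title(claimed_doi)
if registered is None:
    print("DOI doesn't resolve to a registered article.")
elif roughly_matches(claimed_title, registered):
    print("Titles roughly match, but still read the article to confirm it supports the claim.")
else:
    print(f"DOI is real but points to a different article: {registered!r}")
```

Even when the titles line up, the article still has to say what the citation claims it says, which is why the authors’ advice to check everything by hand holds either way.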
The paper notes that the researchers intentionally didn’t use the best-known prompting techniques; instead, they sought to represent how real researchers might prompt chatbots (an interesting and important problem in study design):
While advanced prompt engineering strategies (eg, chain of thought, few-shot examples, and retrieval-augmented prompts) were considered, these were not implemented, as our study sought to assess citation reliability under straightforward prompting conditions that approximate typical researcher use.
At first, I was annoyed by the bad prompt in this study too, but when I read further, I saw that the authors knew and acknowledged they weren’t using the best prompts and had a reason for doing it — to replicate current real-world performance.
A study that intentionally uses what I’ll call an “average person prompt” is looking at how AI might perform in the workplace today, where many people are untrained. A study that uses the best prompt possible is looking at what AI is capable of in ideal circumstances.
I think there’s a place for both types of studies. But when you’re reading about one, you need to know which kind it is to accurately understand what the results mean.
For the citation study, we don’t know if a person with rudimentary prompting skills would get the same bad results with the newest models, but the study authors call on researchers to thoroughly check citations and on journals to establish more safeguards before publishing — which is good advice either way.
Forrester expects problems with AI in 2026
— Advertisers will cut display ad budgets by 30% as consumers leave the open web.
— One-third of companies will frustrate customers with bad AI tools.
— Employees with lagging AI skills will cause incidents that cost $10 billion in legal settlements, fines, and declining stock prices.
Word watch: AI slop & clanker
Australia’s Macquarie Dictionary has “AI slop” and “clanker” on its shortlist for the 2025 Word of the Year.
Quick Hits
Using AI
An especially good (and not overwhelming) article on how to get better Deep Research results from chatbots — SearchReSearch
An especially interesting example of the way Gemini 3 is better than older versions — The End(s) of Argument
Climate
‘The Empire of AI’ has a wildly wrong water number (long version, short version) — The Weird Turn Pro
Philosophy
Future Of Publishing Licensing for AI: What Should Book Publishers & Authors Do? [An interesting piece on, among other things, the problems that happen because AI can’t know information behind paywalls.] — Thad McIlroy
Model & Product updates
Education
Inside Yale’s Quiet Reckoning with AI — The New Journal
Other
How to Use ChatGPT Without Brain-Rot — David Epstein
What is AI Sidequest?
Are you interested in the intersection of AI with language, writing, and culture? With maybe a little consumer business thrown in? Then you’re in the right place!
I’m Mignon Fogarty: I’ve been writing about language for almost 20 years and was the chair of media entrepreneurship in the School of Journalism at the University of Nevada, Reno. I became interested in AI back in 2022 when articles about large language models started flooding my Google alerts. AI Sidequest is where I write about stories I find interesting. I hope you find them interesting too.
If you loved the newsletter, share your favorite part on social media and tag me so I can engage! [LinkedIn — Facebook — Mastodon]
Written by a human