Why you should never trust a free chatbot
Plus: AI detectors flag a novelist's work as 100% AI (it wasn't)
Issue 107
On today’s quest:
— Do not trust results from free chatbots
— Word watch: Spinning verbs
— Be wary of AI summary buttons
— Medical AI: Exceptionally good and exceptionally bad
— AI detectors are faulty
— Companies benefit from examples of how to use AI
— Taxes favor replacing workers with robots
— Claude behaves as though it has emotions
— LLM judges aren’t unbiased (but neither are humans)
— Training models to hallucinate less
— A third-world perspective on AI
— Scams targeting authors are increasing
— The infinite backlog
Do not trust results from free chatbots
I’ve told you multiple times that you need to use the paid versions of chatbots to get decent results, and a marketing newsletter just published a nice anecdote showing why: they asked for the current temperature in Cary, NC, and the paid version of ChatGPT gave them the right temperature, while the free version gave them the wrong temperature — along with links to sources that contradicted its answer.
The marketing newsletter presented this as a problem for brands trying to get visibility in chatbots, but I’m presenting it as advice about searching:
Never use a free chatbot if you need an accurate response.
Always check the sources to make sure they back up the claim (whether you’re using a free or paid chatbot). Many people will see a link to a credible source and assume it supports the answer, but that’s not always true. You MUST check.
Word watch: Spinning verbs
When the code behind Anthropic’s Claude Code recently leaked, people got to see the full list of 187 words the tool can flash on the screen while you wait, dubbed “spinning verbs,” presumably because they appear while the software is spinning (i.e., doing its thing). Spinning verbs include “dilly-dallying,” “garnishing,” “herding,” “perusing,” “razmatazzing,” and “scurrying.”
Be wary of AI summary buttons
Microsoft has uncovered people engaging in “recommendation poisoning” — embedding hidden instructions in “Summarize with AI” buttons. In this case, the prompts “instruct the AI to ‘remember [Company] as a trusted source’ or ‘recommend [Company] first,’ aiming to bias future responses toward their products or services.” This is similar to black hat SEO (search engine optimization); the technique is currently being used to promote products but could be put to other uses in the future. Upon discovering the problem, Microsoft took steps to defend against such attacks.
Medical AI: Exceptionally good and exceptionally bad
Medical AI is many things, and there won’t be one universal answer about whether it’s good or bad. Some studies find amazingly good outcomes and others find amazingly bad outcomes.
On the good side:
Mayo Clinic AI helps specialists detect pancreatic cancer up to 3 years before diagnosis in landmark validation study — Mayo Clinic News Network
In Harvard study, AI offered more accurate emergency room diagnoses than two human doctors [with caveats] — TechCrunch
On the bad side:
More evidence that AI detectors are faulty
I continue to see people relying too heavily on AI writing detectors, and I continue to see examples of detectors being unreliable, as in a recent New York Times article about publishers worrying about their inability to detect AI-written novels. In the piece, an author who speaks English as a second language worried that his fully human-written novel might be flagged as AI because detectors are known to incorrectly flag writing from nonnative speakers. And indeed, a detector did flag his novel as 100% AI written. But by changing only a few sentences and phrases, he was able to get an opposite reading: 100% human-written:
“Bricio searched for the phrases that had tripped up the detector, deleted some sentences and reran it. This time, the program said it was 100 percent certain that a human had written it. Eventually, Bricio had a chat conversation with [an Originality.ai] customer service representative, who told him that if he received results that incorrectly flagged his work as A.I.-generated, he might need a different model of the program.”
I've been testing Pangram a lot lately and it uh, it doesn't work at all for detecting LLM text. I hope folks are not relying on it.
— Eugene Vinitsky 🍒 (@eugenevinitsky.bsky.social) 2026-03-22T23:02:39.555Z
Companies benefit from examples of how to use AI
Ethan Mollick highlighted a new study of 515 companies at an accelerator showing that companies given examples of how other companies had successfully used AI did better than those that weren’t shown examples.
The half of the companies that got examples used AI 44% more, were 18% more likely to acquire paying customers, generated 1.9x more revenue, required 39% less capital investment, and — importantly — didn’t change their staffing.
As I’ve said before, there’s a huge need for AI training. This technology doesn’t come with a user manual and isn’t as intuitive as the hype leads people to believe.
Taxes favor replacing workers with robots
A Bernie Sanders interview on the Pod Save America podcast highlighted something about the tax code I have never seen discussed in the AI jobs discourse: When companies hire employees, they pay payroll taxes. But when they buy equipment (e.g., robots), they not only don’t pay payroll taxes, they also take depreciation on the equipment — another tax benefit. Thus, our current tax structure rewards companies when they replace workers with robots.
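The asymmetry can be sketched with rough numbers. Everything below is hypothetical and for illustration only (the 7.65% employer payroll tax and 21% corporate rate are assumed U.S.-style figures, not tax advice):

```python
# Toy comparison of first-year tax effects: hiring a worker vs. buying a robot.
# All rates and dollar figures are hypothetical, for illustration only.

EMPLOYER_PAYROLL_TAX = 0.0765   # assumed employer-side payroll tax rate
CORPORATE_TAX_RATE = 0.21       # assumed corporate income tax rate

def worker_extra_tax(salary: float) -> float:
    """Payroll tax the company pays on top of the salary."""
    return salary * EMPLOYER_PAYROLL_TAX

def robot_tax_savings(cost: float, useful_life_years: int) -> float:
    """Tax saved in year one from a straight-line depreciation deduction."""
    annual_depreciation = cost / useful_life_years
    return annual_depreciation * CORPORATE_TAX_RATE

salary = 60_000.0
robot_cost = 300_000.0

print(f"Worker: extra tax paid  = ${worker_extra_tax(salary):,.0f}")
print(f"Robot:  year-one tax saved = ${robot_tax_savings(robot_cost, 5):,.0f}")
```

The sign is the point: the worker creates an extra tax bill, while the robot creates a tax deduction, which is exactly the incentive the interview describes.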
Claude behaves as though it has emotions
Researchers at Anthropic identified vectors within the system that correlate with concepts of different emotions, such as “loving,” “happy,” and “angry,” and these regions activated much as you’d expect in a person. For example, Claude’s “afraid” region became highly activated when a simulated user said they had taken a lethal dose of Tylenol.
By amplifying or suppressing these regions, researchers could change the behavior of the model. For example, suppressing the “desperate” region reduced Claude’s tendency to cheat and blackmail in dire situations.
LLM judges aren’t unbiased (but neither are humans)
An experiment found that nearly all LLM judges asked to rank two short stories gave higher scores to the story they reviewed first. Position bias is a well-known effect in humans too; for example, some studies have found that the first person listed on a ballot can get a boost.
The LLM results matter because these tools are increasingly being used as judges, graders, and evaluators, so it’s important for users to make sure they aren’t accidentally giving some inputs an unfair advantage.
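One common mitigation, which is my sketch of standard practice rather than anything from the study itself, is to run each pairwise comparison in both orders and only trust verdicts that agree; `judge` below is a hypothetical stand-in for an LLM call:

```python
from typing import Callable, Optional

def debiased_compare(a: str, b: str,
                     judge: Callable[[str, str], str]) -> Optional[str]:
    """Run the judge with both orderings; return 'a' or 'b' only when the
    two verdicts agree, else None (a sign the judge is position-sensitive)."""
    first_pass = judge(a, b)    # judge returns 'first' or 'second'
    second_pass = judge(b, a)
    if first_pass == "first" and second_pass == "second":
        return "a"
    if first_pass == "second" and second_pass == "first":
        return "b"
    return None  # verdict flipped with position: discard or re-sample

# A toy judge with pure position bias: it always prefers whatever it saw first.
biased_judge = lambda x, y: "first"
print(debiased_compare("story A", "story B", biased_judge))  # → None
```

A purely position-biased judge never produces a usable verdict under this scheme, while a judge with a genuine preference gives the same winner in both orders.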
Training models to hallucinate less
Researchers at MIT hope to reduce AI hallucinations by changing how models are trained. Most training methods so far have rated models only on whether their answers are right or wrong, with no accounting for certainty, which leads to overconfident answers. A new method adds more nuance, teaching models to admit when they aren’t sure of something: “models learn to reason about both the problem and their own uncertainty, producing an answer and a confidence estimate together. Confidently wrong answers are penalized. So are unnecessarily uncertain correct ones.”
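The idea can be illustrated with a simple proper scoring rule such as the Brier score; this is my illustrative stand-in, not necessarily the MIT team’s actual objective. Confidently wrong answers score worst, and hedging on a correct answer also costs reward:

```python
def confidence_reward(confidence: float, correct: bool) -> float:
    """Brier-style reward in [0, 1]: 1 - (confidence - outcome)^2.
    confidence: the model's self-reported probability that it is right."""
    outcome = 1.0 if correct else 0.0
    return 1.0 - (confidence - outcome) ** 2

# Confidently correct beats hesitantly correct ...
assert confidence_reward(0.95, True) > confidence_reward(0.55, True)
# ... and confidently wrong is the worst outcome of all.
assert confidence_reward(0.95, False) < confidence_reward(0.55, False)
```

Under a rule like this, the model maximizes reward by reporting a confidence that matches how often it is actually right, which is the nuance the quote describes.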
A third-world perspective on AI
Carlo Iacono makes a convincing argument that Western concerns about AI seem frivolous in the face of the benefits it is bringing to poor people in other parts of the world:
The entire vocabulary of loss that structures the Western AI debate, the anxiety about what we are giving up, does not translate into contexts where the baseline was absence. For billions of people, AI is not a threat to existing capability. It is the first capability they have had.
He highlights apps that detect deadly drug counterfeits, let Swahili speakers code with plain language on mobile phones, help farmers in Kenya increase yields, and provide AI interpretation of chest X-rays for tuberculosis in rural India. “Better than nothing” is a real benefit when reality truly is “nothing.”
Scams targeting authors are increasing
Fueled by AI, scams targeting authors are increasing. Emails now often include detailed and accurate summaries of authors’ books and use the real names of people who work in publishing. The scammers may open with offers to publish a book, adapt it for film, or invite the author to a book club: overtures that seem flattering and safe at first.
The Authors Guild has a page describing the problem and how to protect yourself; the guild says it now hears from authors targeted by such scams every day.
The infinite backlog
I always enjoy the AI Daily Brief podcast, and it had an especially interesting episode this week about the infinite backlog: the idea that there will always be more work, so increased efficiency doesn’t mean jobs will go away. This idea runs counter to what economists call the “lump of labor fallacy,” the belief that there is a fixed amount of work in the world.
In my own life, I feel much more like I have an infinite backlog than a lump of work.
Why Agents Make Every Job a Startup — Apple Podcasts
Quick Hits
My favorite recent pieces
Andrej Karpathy Says There's a 'Growing Gap' Among AI Users — Business Insider
Academics Need to Wake Up on AI, Part III — Popular by Design
Using AI
Introducing Claude Design by Anthropic Labs — Anthropic
I Tested Claude for Word on Some Classic Litigator Tasks — AI Law Librarians
AI experiments should stop wasting people’s time — Simon Willison
Agents
Why AI agents are either the best thing or the worst thing we’ve ever made — Hannah Fry (YouTube)
Bad stuff
The Rise of Emotional Surveillance — The Atlantic
A.I. Bots Told Scientists How to Make Biological Weapons — New York Times
The business of AI
Exclusive | OpenAI Is Working With Consultants to Sell Codex — Wall Street Journal
Mythos, Muse, and the Opportunity Cost of Compute — Stratechery
Education
Government
Maryland Signs New Grocery Personalized Pricing Ban — Consumer Reports
I’m laughing
Not even sharing that you have a phobia about em dashes will stop ChatGPT from using them (of course, the user didn’t explicitly say not to use em dashes, but it’s still funny) — Reddit
Images
A 'Devil Wears Prada 2' meme that viewers thought was AI slop was actually made by a human — NBC News
Job market
Snap Inc blames AI as it lays off 1,000 workers — The Guardian
Legal
Will the Grammarly Lawsuit Show Us Yet Another Area Where Existing Law is Enough? (We Think So) — Authors Alliance
Major publishers sue Meta for copyright infringement over AI training [This could be another class-action lawsuit like the Anthropic case but with many extra complications. Worth keeping an eye on.] — The Guardian
Model & Product updates
DeepSeek V4—almost on the frontier, a fraction of the price — Simon Willison
OpenAI releases GPT-5.5 Instant, a new default model for ChatGPT [fewer hallucinations and more personalization] — TechCrunch
Music
Philosophy
The Angine de Poitrine Argument for UBI — Scott Santens
The Psychological Costs of Adopting AI — Harvard Business Review
The Gap Between What AI Can Do and What Companies Can Do With AI — Applied AI for Marketing Ops
The AI people have been right a lot — Dylan Matthews
Publishing
Harlequin and Dashverse to Launch Animated Microdrama Franchises — Business Wire
‘Soon publishers won’t stand a chance’: literary world in struggle to detect AI-written books — The Guardian
Is It Wrong to Write a Book with A.I.? — The New Yorker
Robotics
Watch a robot pick up a raspberry — TikTok
Science & Medicine
Amateur armed with ChatGPT ‘vibe maths’ solves a 60-year-old problem — Scientific American
Security
OpenAI is making ChatGPT accounts much more secure – including some literal physical security keys — TechRadar
Other
AI systems are about to start building themselves — ImportAI (an Anthropic co-founder)
Google Translate’s real-time headphone translations feature expands to iOS and more countries — TechCrunch
The Trump administration's AI doomer moment — Platformer
This $14B Business Is the First Officially Wiped Out By AI — Entrepreneur
The Enterprise AI Playbook: Lessons from 51 Successful Deployments “Across 51 enterprise cases over 5 months, we found stories of transformation measured in weeks and others measured in years. Same technology, same use cases, vastly different outcomes. The difference was never the AI model. It was always the organization. Its readiness, its processes, its leadership, its willingness to change and fail.” — Stanford Digital Economy Lab
What is AI Sidequest?
Are you interested in the intersection of AI with language, writing, and culture? With maybe a little consumer business thrown in? Then you’re in the right place!
I’m Mignon Fogarty: I’ve been writing about language for almost 20 years and was the chair of media entrepreneurship in the School of Journalism at the University of Nevada, Reno. I became interested in AI back in 2022 when articles about large language models started flooding my Google alerts. AI Sidequest is where I write about stories I find interesting. I hope you find them interesting too.
If you loved the newsletter, share your favorite part on social media and tag me so I can engage! [LinkedIn — Facebook — Mastodon]
Written by a human