AI will continue to need a human nudge

June 25, 2025

AI has come a long way, but still makes mistakes

In just a few years, we've seen incredible AI breakthroughs. OpenAI released GPT-2 in 2019, then GPT-3 in 2020, and GPT-4 in 2023. Google launched Gemini, and Anthropic introduced Claude. Each new model got smarter and more capable.

But one problem hasn't gone away: LLMs still "hallucinate." They sometimes generate information that sounds convincing but is completely wrong. Because LLMs are fundamentally next-token prediction models, balancing creativity against hallucination will remain tricky.

As this NYTimes article points out, AI is getting more powerful, but its hallucinations are getting worse.

For example, when asked "How many r's are in the word strawberry?", LLMs might confidently answer "2" when there are actually 3. Or they might generate a completely fictional scientific study with realistic-sounding citations. The information feels authoritative, but it's entirely made up.
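The irony is that the letter-counting check is trivial for ordinary code; LLMs struggle with it because they process text as tokens rather than individual characters. A minimal sketch of the check in Python:

```python
# Count the r's in "strawberry" at the character level.
# This is exactly the granularity LLMs don't see: they operate on
# tokens (chunks like "straw" + "berry"), not single letters.
word = "strawberry"
r_count = word.count("r")
print(f"Number of r's in '{word}': {r_count}")  # prints 3
```

The gap between "trivial for a program" and "unreliable for an LLM" is a useful reminder that confident-sounding output still needs verification.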

Right now, this isn't a huge problem

Today, most people use AI through simple chat interfaces. When AI tools do connect to other systems (like browsing the web or accessing files), they're limited to safe, basic tasks. Most importantly, humans usually check the AI's work before acting on it.

But this is about to change dramatically.

AI is moving into everything

Soon, AI will be embedded in our phones, cars, home appliances, and countless other devices. When AI starts making decisions that affect our daily lives—not just answering questions—we'll need to be much more careful about checking its work.

Imagine your smart home AI deciding to turn off your heating system because it "thinks" you're not home, or your car's AI choosing a dangerous route because it misinterpreted traffic data. These aren't hypothetical scenarios—they're the reality we're heading toward.

Here's the challenge: AI can generate responses incredibly fast, but humans need time to verify them. As AI researcher Andrej Karpathy put it in his recent Software Is Changing (Again) talk: "Verification is the Bottleneck."

You've probably experienced this already

If you've used AI coding tools like Cursor, Windsurf, or Lovable, you know exactly what this feels like. The AI can instantly write 100+ lines of code, but sometimes it accidentally breaks existing functionality. You end up spending most of your time carefully reviewing and approving or rejecting the AI's suggestions.

Now imagine this same dynamic when AI controls robots in the physical world. The stakes become much higher.

AI faces new challenges every day

Beyond making mistakes, AI sometimes encounters situations it has never seen before. The real world constantly changes—new laws, new objects, new social norms. AI models are trained on data from the past, so they can't always handle brand-new situations.

Consider a delivery robot that encounters a new type of construction barrier it's never seen before, or an AI assistant trying to help with a recently passed law that wasn't in its training data. These "edge cases" happen constantly in the real world, and they require human judgment to navigate safely.

The cost of getting it wrong keeps rising

As AI systems gain more autonomy, the consequences of errors become more severe. A chatbot giving wrong information is annoying. An autonomous vehicle making a poor decision could be deadly. A financial AI making incorrect trades could cost millions.

This is why companies like Tesla still require human drivers to supervise their "Full Self-Driving" system, and why most AI-powered medical tools still require a doctor's sign-off on their diagnoses. The stakes are simply too high to let AI operate without human oversight.

The bottom line: AI will always need human guidance

The verification bottleneck isn't a temporary problem that will disappear as AI gets better. It's a permanent part of how humans and AI will work together. Whether you're reviewing AI-generated code, approving a robot's next move, or helping a delivery robot navigate around an obstacle, humans provide something AI lacks: common sense, ethical judgment, and real-world understanding.
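One concrete way to structure this collaboration is an explicit approval gate: the AI proposes an action, but nothing executes until a human reviewer signs off. The sketch below is illustrative only; the function names and the reviewer callback are hypothetical placeholders, not a real API.

```python
# Minimal human-in-the-loop gate: an AI-proposed action runs only after
# an explicit approval decision. `approve` stands in for a real human
# reviewer (e.g. a prompt in a UI); here it is a plain callback.

def human_approval_gate(action: str, approve) -> bool:
    """Execute `action` only if the reviewer callback approves it."""
    if approve(action):
        print(f"Executing: {action}")
        return True
    print(f"Rejected: {action}")
    return False

# Usage: simulate a reviewer that rejects anything destructive.
reviewer = lambda a: "delete" not in a
human_approval_gate("send weekly report", reviewer)    # approved
human_approval_gate("delete all user data", reviewer)  # rejected
```

The point of the pattern is that the human decision sits between proposal and execution, so a hallucinated or dangerous action is caught before it has any effect.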

What this means for the future

As AI models continue to improve at handling hallucinations, our focus should shift toward building efficient tools that enhance AI-human collaboration.

The companies that succeed in the AI era won't be the ones that eliminate humans from the loop; they'll be the ones that create the smoothest, most effective human-AI partnerships.

As we move toward an AI-integrated future, remember: the goal isn't to replace human judgment, but to augment it. The most powerful AI systems will be the ones that know exactly when to hand control back to a human.
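"Knowing when to hand control back" is often implemented as a confidence-based handoff: the system acts autonomously only when the model's confidence clears a threshold, and escalates to a human otherwise. The sketch below assumes a calibrated confidence score and an arbitrary illustrative threshold; both are assumptions, not a prescription.

```python
# Confidence-based handoff sketch: act autonomously above a threshold,
# escalate to a human below it. The 0.9 threshold is illustrative.

CONFIDENCE_THRESHOLD = 0.9

def decide(action: str, confidence: float) -> str:
    """Return how an action should be handled given model confidence."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return f"auto: {action}"
    return f"escalate to human: {action}"

print(decide("reroute around traffic", 0.97))    # auto
print(decide("navigate unknown barrier", 0.40))  # escalate to human
```

In practice the hard part is calibration, since a model that is confidently wrong defeats the threshold; but the shape of the handoff is the same.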