One of the World's Most Advanced AI Agents Is Completely Stuck Trying to Beat a Pokémon Game for Children

Mar 21, 2025 - 22:37
Claude 3.7 Sonnet, one of the industry's leading AI models, has hit a wall in its playthrough of Pokémon Red.

In case you haven't heard, Anthropic has been livestreaming its AI model, Claude 3.7 Sonnet, as it attempts to complete a playthrough of Pokémon Red.

The experiment, dubbed "Claude Plays Pokémon," is intended as a demonstration of "AI agents," the focus of the industry's ongoing race to create AI models capable of operating autonomously by interacting with their environment.

Claude has managed to get surprisingly far into the game, clinching three Gym badges and reaching, as of this week, Cerulean City. But it plods along at a painfully slow pace, stopping to "think" after every single move, sometimes at great length. For nearly 80 agonizing hours, for instance, Claude bumbled cluelessly around Mt. Moon before finally finding the ladder it needed to escape. Invested Twitch viewers breathed a sigh of relief.

Progress doesn't look poised to speed up. The Anthropic AI's excursion through the Kanto region has mostly devolved into running around in circles as it struggles to settle on its next move. It needs to hop on Route 9 to reach the next stage, but where and how?

A text window in the livestream displaying Claude's thought process shows that the AI is using a process of elimination to rule out locations that aren't the Route 9 entrance. But will it piece together that it needs to use the HM "Cut" on a few destructible trees to access the fabled path? It's not looking likely: it keeps repeating that it needs to find the "gatehouse" to the route instead.

In short, Claude is stuck. One of the AI industry's leading models may well be stumped by a game that's been beaten by literal children for generations.

According to engineers, a major challenge for Claude is visually processing what it sees in the game. Claude excels at interpreting the game's text-based portions, including the Pokémon battles. It also has access to the game's RAM to glean information like its in-game coordinates. But it can't consistently interpret the tiny number of pixels that make up its low-res environment.

"Claude's still not particularly good at understanding what's on the screen at all," David Hershey, the Anthropic engineer behind the Pokémon experiment, told Ars Technica in a recent interview. "You will see it attempt to walk into walls all the time." Ironically, Hershey suggests, if Claude were playing a more visually realistic game, it might do better.

"It's pretty easy for me to understand that [an in-game] building is a building and that I can't walk through a building," Hershey added. "And that's [something] that's pretty challenging for Claude to understand."

There are times, however, when Claude is surprisingly clever, like responding to in-game clues that are designed to be misleading.

"It's pretty funny that they tell you you need to go find Professor Oak next door and then he's not there," Hershey told Ars, describing one of the first missions in the game. "As a 5-year-old, that was very confusing to me. But Claude actually typically goes through that same set of motions where it talks to mom, goes to the lab, doesn't find [Oak], says, 'I need to figure something out.'"

"It's sophisticated enough to sort of go through the motions of the way [humans are] actually supposed to learn it, too," Hershey added.

So maybe all is not lost yet. There's still plenty of time for Claude 3.7 Sonnet to turn things around. It's gotten significantly farther than its predecessor, 3.0 Sonnet, which couldn't even make it out of Pallet Town, the game's starting area. Even so, its struggles show that the technology has a long way to go to be "agentic," let alone fulfill its promise of one day exceeding human capabilities.

The post One of the World's Most Advanced AI Agents Is Completely Stuck Trying to Beat a Pokémon Game for Children appeared first on Futurism.