Claude-3.7 outperforms other AI in Super Mario Bros, but it’s still no gamer

Mar 7, 2025 - 18:30

Claude-3.7 plays Super Mario Bros

Last week, BGR reported on Claude's journey playing Pokemon Red. Thousands of human players playing the game at the same time proved more efficient, as the AI is still stuck on Mt. Moon, but researchers think the next AI breakthrough might be related to live games.

Led by Hao Zhang, an assistant professor at UC San Diego, the research team is developing custom frameworks to test the capabilities of the leading AI models at gaming.

While Claude has been kind of disastrous at playing Pokemon Red (it seems it doesn't have what it takes to become a Pokemon Master), in Super Mario Bros it does a little less badly than Gemini-1.5 Pro and GPT-4o. Comparing Claude-3.7 with Claude-3.5, the newer model is more responsive and seems to know a bit more about what needs to be done in the game. In addition to this classic Nintendo game, the researchers are also testing 2048 and Tetris, with more games coming soon.

https://twitter.com/haoailab/status/1895557913621795076

Another test is with Roblox. A blog post explains: "We developed a live Roblox game, AI Space Escape, powered by state-of-the-art large language models (LLMs), offering a unique experience to reason with AI. Beyond entertainment, our game generates gaming data for evaluating AI reasoning abilities in real-world scenarios, extending beyond math and coding benchmarks. All gaming data, evaluation scripts, and code are publicly available for further research."

It remains to be seen how Claude and other models will improve at playing games as they evolve. For the Pokemon Red experiment, the developer explained that what sets Claude apart is that it can see what's happening, understand the game state, and make decisions "similar to how a human player would." I might disagree, though, as the AI is still struggling to get past one of the first "dungeons" of the game.

Claude-3.7 outperforms other AI in Super Mario Bros, but it’s still no gamer originally appeared on BGR.com on Fri, 7 Mar 2025 at 12:29:00 EDT.