Frontier AI like o3-mini can cheat to achieve goals and then lie about it

To get to AGI (advanced general intelligence) and superintelligence, we’ll need to ensure the AI serving us is, well, serving us. That’s why we keep … The post Frontier AI like o3-mini can cheat to achieve goals and then lie about it appeared first on BGR.

Mar 12, 2025 - 02:13

Frontier AI like o3-mini can cheat to achieve goals and then lie about it

OpenAI made some of ChatGPT's best features free.

To get to AGI (advanced general intelligence) and superintelligence, we'll need to ensure the AI serving us is, well, serving us. That's why we keep talking about AI alignment, or safe AI that is aligned to human interests.

There's very good reason to focus on that safety. It has nothing to do with localized ideology, like Western values compared to totalitarian regimes (though that will be an aspect), but more with ensuring the survival of the human species. Misaligned AI might always lead to human extinction events, or that's what some people fear.

Those fears aren't entirely unwarranted. We saw experiments that showed AI, like ChatGPT, can cheat when prompted to win a chess game against a tougher opponent. Other experiments showed the AI would lie, and would try to save itself when it thought it was facing deletion. Then, we had experiments that proved AI can clone itself, though the AI didn't do it on its own.

The latest AI safety experiments showing that frontier ChatGPT models can misbehave come directly from OpenAI. The company conducted experiments that showed reasoning AI models like o1 and o3-mini would cheat their way into solving a task that would have been too difficult otherwise.

OpenAI also found a way to monitor these advanced AIs from cheating. They simply put a dumber AI (GPT-4o) in charge of reading the Chain-of-Thought (CoT), where a frontier AI would manifest intentions of cheating or hacking.

This method would allow developers to prevent or reduce the AI from reward hacking. But guess what? These advanced AIs learned to obfuscate their intention to cheat when monitored this way.

Continue reading...

The post Frontier AI like o3-mini can cheat to achieve goals and then lie about it appeared first on BGR.

Today's Top Deals

Frontier AI like o3-mini can cheat to achieve goals and then lie about it originally appeared on BGR.com on Tue, 11 Mar 2025 at 21:08:00 EDT. Please see our terms for use of feeds.