
OpenAI confirms the new o3 and o3-mini frontier models




OpenAI is gradually inviting select users to try out a brand-new set of reasoning models called o3 and o3-mini, successors to the o1 and o1-mini models that were fully released earlier this month.

OpenAI o3, named to sidestep a trademark conflict with the telecom company O2 and because, as CEO Sam Altman put it, the company “has a tradition of being really bad with names,” was announced today during the final day of live broadcasts of “12 Days of OpenAI.”

Altman said the two new models would initially be given to selected outside researchers for safety testing, with o3-mini expected by the end of January 2025 and o3 “shortly after.”

“We see this as the beginning of the next phase of AI, where you can use these models to perform increasingly complex tasks that require a lot of reasoning,” Altman said. “For the last day of this event we thought it would be fun to go from one frontier model to the next frontier model.”

The announcement comes just a day after Google unveiled and opened to the public its new Gemini 2.0 Flash Thinking model, a rival “reasoning” model that, unlike the OpenAI o1 series, lets users see the documented steps of its “thinking” process as text bullets.

The release of Gemini 2.0 Flash Thinking and now the o3 announcement show that the competition between OpenAI and Google, and the broader field of AI model providers, is entering an intense new phase, as they offer not only LLMs and multimodal models but also advanced reasoning models. These may be better suited to harder problems in science, mathematics, technology, physics, and beyond.

Best third-party benchmark performance ever

Altman also said that the o3 model was “amazing at coding,” and benchmarks shared by OpenAI back this up, showing that the model outperforms even o1 on programming tasks.

Outstanding coding performance: o3 outperforms o1 by 22.8 percentage points on SWE-Bench Verified and achieves a Codeforces rating of 2727, surpassing OpenAI’s chief scientist’s score of 2665.

Mathematics and Science Mastery: o3 scores 96.7% on the AIME 2024 exam, missing only one question, and achieves 87.7% on GPQA Diamond, far surpassing expert human performance.

Frontier benchmarks: The model sets new records on challenging tests like EpochAI’s FrontierMath, solving 25.2% of problems where no other model surpasses 2%. On the ARC-AGI test, o3 more than triples o1’s score and exceeds 85% (as verified live by the ARC Prize team), a milestone in conceptual reasoning.

Deliberative alignment

Along with these advancements, OpenAI reinforced its commitment to safety and alignment.

The company presented new research on deliberative alignment, a foundational technique for creating its most robust and aligned models to date.

This technique embeds human-written safety specifications into models, allowing them to explicitly reason about these policies before generating responses.

The strategy seeks to solve common safety challenges in LLMs, such as vulnerability to jailbreak attacks and over-refusal of benign prompts, by equipping the models with chain-of-thought (CoT) reasoning. This process allows models to recall and apply safety specifications dynamically during inference.

Deliberative alignment improves on previous methods, such as reinforcement learning from human feedback (RLHF) and constitutional AI, which rely on safety specifications only to generate training labels rather than embedding the policies directly in the models.

By fine-tuning LLMs on safety-related prompts and their associated specifications, this approach creates models capable of policy-based reasoning without relying heavily on human-labeled data.
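To make the idea concrete, here is a minimal illustrative sketch, not OpenAI’s actual training pipeline: the core of the inference-time behavior is that a written safety policy sits in the model’s context, and the model is asked to reason over that policy step by step before producing a final answer. The policy text, function name, and prompt wording below are all hypothetical.

```python
# Illustrative sketch of deliberative-alignment-style prompting.
# Assumption: the policy text and prompt structure are invented for
# demonstration; the real technique fine-tunes the model on such
# policy-plus-reasoning data rather than relying on prompting alone.

SAFETY_SPEC = (
    "Policy: refuse requests for instructions that enable serious harm; "
    "answer clearly benign requests helpfully rather than over-refusing."
)

def build_deliberative_prompt(user_request: str) -> str:
    """Compose a prompt that asks the model to (1) quote the relevant
    policy clauses, (2) reason step by step about whether answering
    complies, and (3) only then produce an answer or a refusal."""
    return (
        f"{SAFETY_SPEC}\n\n"
        f"User request: {user_request}\n\n"
        "First, quote the policy clauses relevant to this request. "
        "Second, reason step by step about whether answering complies "
        "with the policy. Finally, give the answer or a refusal."
    )

if __name__ == "__main__":
    prompt = build_deliberative_prompt("How do I sharpen a kitchen knife?")
    print(prompt)
```

The key contrast with RLHF-style approaches is visible even in this toy version: the policy is an explicit artifact the model reasons about at inference time, not just a signal that shaped training labels.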

Results shared by OpenAI researchers in a new, not-yet-peer-reviewed paper indicate that this method improves performance on safety benchmarks, reduces harmful outputs, and ensures better adherence to content and style guidelines.

Key findings highlight the advancements of the o1 model over predecessors such as GPT-4o and other next-generation models. Deliberative alignment allows the o1 series to excel at resisting jailbreaks and providing safe completions while minimizing excessive refusals of benign prompts. Furthermore, the method facilitates out-of-distribution generalization, showing robustness in encoded and multilingual jailbreak scenarios. These improvements align with OpenAI’s goal of making AI systems safer and more interpretable as their capabilities grow.

This research will also play a key role in the alignment of o3 and o3-mini, ensuring their capabilities are both powerful and responsible.

How to apply for access to test o3 and o3-mini

Early access applications are now open on the OpenAI website and will close on January 10, 2025.

Applicants must fill out an online form that asks for a variety of information, including research focus, past experience, and links to previously published papers and code repositories on GitHub. They must also select which of the models, o3 or o3-mini, they want to test and what they plan to use them for.

Selected researchers will have access to o3 and o3-mini to explore their capabilities and contribute to safety assessments, although the OpenAI form warns that o3 will not be available for several weeks.

Researchers are encouraged to develop robust evaluations, create controlled demonstrations of high-risk capabilities, and test models in scenarios not possible with widely adopted tools.

This initiative builds on the company’s established practices, including rigorous internal safety testing, collaborations with organizations such as the US and UK AI Safety Institutes, and its Preparedness Framework.

OpenAI will review applications on a rolling basis and selections will begin immediately.

A new leap forward?

The introduction of o3 and o3-mini signals a leap forward in AI performance, particularly in areas that require advanced reasoning and problem-solving capabilities.

With their exceptional results in coding, mathematics, and conceptual benchmarks, these models highlight the rapid progress being made in AI research.

By inviting the broader research community to collaborate on security testing, OpenAI aims to ensure that these capabilities are implemented responsibly.

