OpenAI releases its much-awaited model, OpenAI o1 (Strawberry)

OpenAI has unveiled o1, internally referred to as Strawberry, a new large language model trained with reinforcement learning to perform sophisticated reasoning. o1 thinks before it answers: it can work through a lengthy internal chain of reasoning before producing a response.

Although there is still work to be done to make the new model as easy to use as existing models, OpenAI has released early versions, OpenAI o1-preview and o1-mini, for immediate use in ChatGPT and to trusted API users.
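For readers who want to try the preview models over the API, below is a minimal sketch assuming access to OpenAI's Chat Completions endpoint via the official `openai` Python package; the model name `o1-preview` follows the naming used above, and availability, pricing, and supported parameters may differ by account.

```python
# Minimal sketch: calling the o1-preview model through the OpenAI API.
# Assumes the official `openai` Python package and an OPENAI_API_KEY
# environment variable; model access and limits may vary by account.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",  # or "o1-mini" for the smaller, cheaper variant
    messages=[
        {
            "role": "user",
            "content": "A bat and a ball cost $1.10 in total. The bat costs "
                       "$1.00 more than the ball. How much does the ball cost?",
        }
    ],
)

print(response.choices[0].message.content)
```

Note that, because o1 reasons internally before answering, the prompt does not need an explicit "think step by step" instruction.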
Chain-of-Thought
Similar to how a human might think for a long time before responding to a difficult question, o1 uses a chain of thought when attempting to solve a problem. Through reinforcement learning, o1 learns to hone its chain of thought and refine the strategies it uses. It learns to recognize and correct its mistakes. It learns to break down tricky steps into simpler ones. It learns to try a different approach when the current one isn’t working. This process dramatically improves the model’s ability to reason; OpenAI illustrates the leap with chains of thought from o1-preview on several difficult problems in its announcement.
Coding
As per OpenAI, a model initialized from o1 and further trained to improve its programming skills scored 213 points and ranked in the 49th percentile in the 2024 International Olympiad in Informatics (IOI). This model competed in the 2024 IOI under the same conditions as the human contestants: it had ten hours to solve six challenging algorithmic problems and was allowed 50 submissions per problem.
Human preference evaluation
In addition to exams and academic benchmarks, OpenAI assessed human preference for o1-preview versus GPT-4o on difficult, open-ended prompts across a wide range of fields. In this evaluation, human trainers were shown anonymised responses to a prompt from GPT-4o and o1-preview and voted for the one they preferred. In categories that require heavy reasoning, such as algebra, coding, and data analysis, o1-preview is preferred by a large margin over GPT-4o. However, o1-preview is not preferred on all natural language tasks, indicating that it is not the best fit for every use case.
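As a rough illustration of how such a head-to-head preference evaluation can be tallied, the sketch below computes a simple win rate with a normal-approximation confidence interval; the vote counts are hypothetical placeholders, not OpenAI's actual data.

```python
# Sketch: tallying pairwise preference votes (o1-preview vs. GPT-4o) into a
# win rate with a normal-approximation 95% confidence interval.
# The counts below are illustrative placeholders, not real evaluation data.
import math

def win_rate(wins: int, losses: int, ties: int = 0) -> tuple[float, float]:
    """Return (win rate, half-width of a 95% CI), counting ties as half a win."""
    n = wins + losses + ties
    p = (wins + 0.5 * ties) / n
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)
    return p, half_width

# Hypothetical vote counts for one reasoning-heavy category.
p, ci = win_rate(wins=720, losses=230, ties=50)
print(f"o1-preview preferred {p:.1%} of the time (±{ci:.1%})")
```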
Safety
Chain-of-thought reasoning opens up additional possibilities for safety and alignment. OpenAI found that integrating its model-behavior policies into the chain of thought of a reasoning model is an effective way to teach human values and principles. By teaching the model its safety rules and how to reason about them in context, OpenAI saw evidence that this reasoning capability directly improves robustness: o1-preview performed substantially better on key jailbreak evaluations and on OpenAI's hardest internal benchmarks for measuring the model's safety refusal boundaries. In my opinion, using a chain of thought offers major improvements in terms of safety and alignment because (1) it makes the model's reasoning more resilient to out-of-distribution situations, and (2) it makes the model's reasoning about safety rules observable.
Conclusion
o1 significantly pushes the boundaries of AI reasoning. OpenAI intends to release updated versions as it continues to refine the model. These improved reasoning capabilities should also make it easier to align models with human values and principles. OpenAI believes that o1 and its successors will unlock many new AI applications in science, coding, math, and related fields, and is eager to see how users and API developers apply it to their day-to-day work.

Some of the evaluation results
Appendix A
| Dataset | Metric | GPT-4o | o1-preview | o1 |
|---|---|---|---|---|
| Competition Math AIME (2024) | cons@64 | 13.4 | 56.7 | 83.3 |
| | pass@1 | 9.3 | 44.6 | 74.4 |
| Competition Code CodeForces | Elo | 808 | 1,258 | 1,673 |
| | Percentile | 11.0 | 62.0 | 89.0 |
| GPQA Diamond | cons@64 | 56.1 | 78.3 | 78.0 |
| | pass@1 | 50.6 | 73.3 | 77.3 |
| Biology | cons@64 | 63.2 | 73.7 | 68.4 |
| | pass@1 | 61.6 | 65.9 | 69.2 |
| Chemistry | cons@64 | 43.0 | 60.2 | 65.6 |
| | pass@1 | 40.2 | 59.9 | 64.7 |
| Physics | cons@64 | 68.6 | 89.5 | 94.2 |
| | pass@1 | 59.5 | 89.4 | 92.8 |
| MATH | pass@1 | 60.3 | 85.5 | 94.8 |
| MMLU | pass@1 | 88.0 | 90.8 | 92.3 |
| MMMU (val) | pass@1 | 69.1 | n/a | 78.1 |
| MathVista (testmini) | pass@1 | 63.8 | n/a | 73.2 |
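The metrics in the table follow common conventions: pass@1 is the fraction of problems solved with a single sampled answer, and cons@64 (consensus, i.e. majority vote over 64 samples) counts a problem as solved when the most frequent of 64 sampled answers is correct. The sketch below shows how these might be computed from per-problem samples, using the standard unbiased pass@k estimator; it is an illustration of the metric definitions, not OpenAI's evaluation code, and the sample data is hypothetical.

```python
# Sketch of the two metrics used above, on hypothetical per-problem samples.
# pass@k: unbiased estimator 1 - C(n-c, k) / C(n, k) over n samples, c correct.
# cons@64: majority vote over 64 sampled answers, scored against the reference.
from collections import Counter
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples (drawn from n, c correct) passes."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def cons_at_k(answers: list[str], reference: str) -> bool:
    """Majority-vote consensus: is the most common sampled answer the reference?"""
    most_common, _ = Counter(answers).most_common(1)[0]
    return most_common == reference

# Hypothetical data: 64 sampled answers to one AIME-style problem.
samples = ["204"] * 40 + ["210"] * 20 + ["96"] * 4
print(pass_at_k(n=64, c=40, k=1))           # expected pass@1 for this problem
print(cons_at_k(samples, reference="204"))  # cons@64 for this problem
```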