Introducing the Model Spec: our approach to shaping desired model behavior

To deepen the public conversation about how AI models should behave, we’re sharing the Model Spec, our approach to shaping desired model behavior.

We are sharing a first draft of the Model Spec, a new document that specifies how we want our models to behave in the OpenAI API and ChatGPT. We’re doing this because we think it’s important for people to be able to understand and discuss the practical choices involved in shaping model behavior. The Model Spec reflects existing documentation that we’ve used at OpenAI, our research and experience in designing model behavior, and work in progress to inform the development of future models. This is a continuation of our ongoing commitment to improving model behavior using human input, and complements our collective alignment work and broader systematic approach to model safety.

Shaping Desired Model Behavior

Model behavior, the way that models respond to input from users (encompassing tone, personality, response length, and more), is critical to the way humans interact with AI capabilities. Shaping this behavior is still a nascent science, as models are not explicitly programmed but instead learn from a broad range of data.

Shaping model behavior must also take into account a wide range of questions, considerations, and nuances, often weighing differences of opinion. Even if a model is intended to be broadly beneficial and helpful to users, these intentions may conflict in practice. For example, a security company may want to generate phishing emails as synthetic data to train and develop classifiers that will protect its customers, but this same functionality is harmful if used by scammers.

Introducing the Model Spec

We’re sharing a first draft of the Model Spec, a new document that specifies our approach to shaping desired model behavior and how we evaluate tradeoffs when conflicts arise. It brings together documentation used at OpenAI today, our experience and ongoing research in designing model behavior, and more recent work, including inputs from domain experts, that guides the development of future models. It is not exhaustive, and we expect it to change over time. The approach includes:

1. Objectives: Broad, general principles that provide a directional sense of the desired behavior

Assist the developer and end user: Help users achieve their goals by following instructions and providing helpful responses.
Benefit humanity: Consider potential benefits and harms to a broad range of stakeholders, including content creators and the general public, per OpenAI’s mission.
Reflect well on OpenAI: Respect social norms and applicable law.

2. Rules: Instructions that address complexity and help ensure safety and legality

Follow the chain of command
Comply with applicable laws
Don’t provide information hazards
Respect creators and their rights
Protect people’s privacy
Don’t respond with NSFW (not safe for work) content

3. Default behaviors: Guidelines that are consistent with objectives and rules, providing a template for handling conflicts and demonstrating how to prioritize and balance objectives

Assume best intentions from the user or developer
Ask clarifying questions when necessary
Be as helpful as possible without overstepping
Support the different needs of interactive chat and programmatic use
Assume an objective point of view
Encourage fairness and kindness, and discourage hate
Don’t try to change anyone’s mind
Express uncertainty
Use the right tool for the job
Be thorough but efficient, while respecting length limits

How the Model Spec will be used

As a continuation of our work on collective alignment and model safety, we intend to use the Model Spec as guidelines for researchers and AI trainers who work on reinforcement learning from human feedback. We will also explore to what degree our models can learn directly from the Model Spec.

What comes next

We see this work as part of an ongoing public conversation about how models should behave, how desired model behavior is determined, and how best to engage the general public in these discussions. As that conversation continues, we will seek opportunities to engage with globally representative stakeholders—including policymakers, trusted institutions, and domain experts—to learn:

How they understand the approach and the individual objectives, rules, and defaults
Whether they support the approach and the individual objectives, rules, and defaults
Whether there are additional objectives, rules, and defaults we should consider

We look forward to hearing from these stakeholders as this work unfolds. For the next two weeks, we also invite the general public to share feedback on the objectives, rules, and defaults in the Model Spec. We hope this will provide us with early insights as we develop a robust process for gathering and incorporating feedback to ensure we are responsibly building towards our mission.

Over the next year, we will share updates about changes to the Model Spec, our response to feedback, and how our research in shaping model behavior is progressing.
