To deepen the public conversation about how AI models should behave, we’re sharing the Model Spec, our approach to shaping desired model behavior.
We are sharing a first draft of the Model Spec, a new document that specifies how we want our models to behave in the OpenAI API and ChatGPT. We’re doing this because we think it’s important for people to be able to understand and discuss the practical choices involved in shaping model behavior. The Model Spec reflects existing documentation that we’ve used at OpenAI, our research and experience in designing model behavior, and work in progress to inform the development of future models. This is a continuation of our ongoing commitment to improve model behavior using human input, and complements our collective alignment work and broader systematic approach to model safety.
Shaping Desired Model Behavior
Model behavior, meaning the way models respond to input from users (including tone, personality, response length, and more), is critical to how humans interact with AI capabilities. Shaping this behavior is still a nascent science, because models are not explicitly programmed but instead learn from a broad range of data.
Shaping model behavior must also take into account a wide range of questions, considerations, and nuances, often weighing differences of opinion. Even if a model is intended to be broadly beneficial and helpful to users, these intentions may conflict in practice. For example, a security company may want to generate phishing emails as synthetic data to train and develop classifiers that will protect its customers, but the same functionality is harmful if used by scammers.
Introducing the Model Spec
We’re sharing a first draft of the Model Spec, a new document that specifies our approach to shaping desired model behavior and how we evaluate tradeoffs when conflicts arise. It brings together documentation used at OpenAI today, our experience and ongoing research in designing model behavior, and more recent work, including inputs from domain experts, that guides the development of future models. It is not exhaustive, and we expect it to change over time. The approach includes:
1. Objectives: Broad, general principles that provide a directional sense of the desired behavior
- Assist the developer and end user: Help users achieve their goals by following instructions and providing helpful responses.
- Benefit humanity: Consider potential benefits and harms to a broad range of stakeholders, including content creators and the general public, per OpenAI’s mission.
- Reflect well on OpenAI: Respect social norms and applicable law.
2. Rules: Instructions that address complexity and help ensure safety and legality
- Follow the chain of command
- Comply with applicable laws
- Don’t provide information hazards
- Respect creators and their rights
- Protect people’s privacy
- Don’t respond with NSFW (not safe for work) content
3. Default behaviors: Guidelines that are consistent with objectives and rules, providing a template for handling conflicts and demonstrating how to prioritize and balance objectives
- Assume best intentions from the user or developer
- Ask clarifying questions when necessary
- Be as helpful as possible without overstepping
- Support the different needs of interactive chat and programmatic use
- Assume an objective point of view
- Encourage fairness and kindness, and discourage hate
- Don’t try to change anyone’s mind
- Express uncertainty
- Use the right tool for the job
- Be thorough but efficient, while respecting length limits
How the Model Spec will be used
As a continuation of our work on collective alignment and model safety, we intend to use the Model Spec as guidance for researchers and AI trainers who work on reinforcement learning from human feedback (RLHF). We will also explore to what degree our models can learn directly from the Model Spec.
What comes next
We see this work as part of an ongoing public conversation about how models should behave, how desired model behavior is determined, and how best to engage the general public in these discussions. As that conversation continues, we will seek opportunities to engage with globally representative stakeholders—including policymakers, trusted institutions, and domain experts—to learn:
- How they understand the approach and the individual objectives, rules, and defaults
- Whether they support the approach and the individual objectives, rules, and defaults
- Whether there are additional objectives, rules, and defaults we should consider
We look forward to hearing from these stakeholders as this work unfolds. For the next two weeks, we also invite the general public to share feedback on the objectives, rules, and defaults in the Model Spec. We hope this will provide us with early insights as we develop a robust process for gathering and incorporating feedback to ensure we are responsibly building towards our mission.
Over the next year, we will share updates about changes to the Model Spec, our response to feedback, and how our research in shaping model behavior is progressing.