The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Today's LLMs are susceptible to prompt injections, jailbreaks, and other attacks that allow adversaries to overwrite a model's original instructions with their own malicious prompts.

In Openai IA, Research

Today's LLMs are susceptible to prompt injections, jailbreaks, and other attacks that allow adversaries to overwrite a model's original instructions with their own malicious prompts.

Introducing OpenAI Japan

OpenAI’s commitment to child safety: adopting safety by design principles

Related Posts

How to create “humble” AI

Introducing the OpenAI Safety Bug Bounty program

OpenAI and Microsoft extend partnership

Using generative AI to help robots jump higher and land safely