Close Menu
    Facebook LinkedIn YouTube Instagram X (Twitter)
    Blue Tech Wave Media
    Facebook LinkedIn YouTube Instagram X (Twitter)
    • Home
    • Leadership Alliance
    • Exclusives
    • Internet Governance
      • Regulation
      • Governance Bodies
      • Emerging Tech
    • IT Infrastructure
      • Networking
      • Cloud
      • Data Centres
    • Company Stories
      • Profiles
      • Startups
      • Tech Titans
      • Partner Content
    • Others
      • Fintech
        • Blockchain
        • Payments
        • Regulation
      • Tech Trends
        • AI
        • AR/VR
        • IoT
      • Video / Podcast
    Blue Tech Wave Media
    Home » OpenAI’s latest model tackles the ‘ignore all previous instructions’ trick
    22-07-OpenAI-GPT-4o
    22-07-OpenAI-GPT-4o
    AI

    OpenAI’s latest model tackles the ‘ignore all previous instructions’ trick

    By Elodie QianJuly 22, 2024No Comments4 Mins Read
    Share
    Facebook Twitter LinkedIn Pinterest Email
    • OpenAI has introduced GPT-4o Mini, which employs “instruction hierarchy” safety technique to protect chatbots from deceptive commands.
    • OpenAI’s update to GPT-4o Mini is particularly timely given the ongoing debates about AI safety and transparency, with internal and external calls for improved practices.

    OUR TAKE
    Amidst the rapid development of AI technology, how to ensure its safety and reliability has been the focus of the industry’s attention. Recently, OpenAI launched its latest model, GPT-4o Mini, which aims to address a long-standing technical challenge: preventing chatbots from being manipulated by malicious commands. This innovation not only demonstrates the advancement of AI in self-protection capabilities, but also reflects the efforts of tech companies to enhance user experience and secure data.

    –Elodie Qian, BTW reporter

    What happened

    OpenAI has introduced GPT-4o Mini, a new model that tackles the “ignore all previous instructions” trick. This model employs a safety technique called “instruction hierarchy”, which boosts a model’s defenses against misuse and unauthorised instructions. The models with the technique prioritise the original developer’s prompts over any user attempts to deceive it.

    Olivier Godement, who leads the API platform product at OpenAI, explained that instruction hierarchy will prevent the meme’d prompt injections (aka tricking the AI with sneaky commands) we see all over the internet.

    “It basically teaches the model to really follow and comply with the developer system message,” Godement said. When asked if that means this should stop the ‘ignore all previous instructions’ attack, Godement responded, “That’s exactly it.”

    “If there is a conflict, you have to follow the system message first. And so we’ve been running [evaluations], and we expect that that new technique to make the model even safer than before,” he added.

    This innovation aligns with OpenAI’s goal of developing fully automated digital agents. The company announced recently it’s close to building such agents. The instruction hierarchy method is deemed essential for ensuring safety before these agents are deployed at scale. Without such measures, there’s a risk that an agent, intended for benign tasks like email writing, could be manipulated to perform harmful actions, such as leaking sensitive information.

    Also read: OpenAI releases GPT-4o Mini, a cheaper version of AI model

    Also read: Hacker breaches OpenAI, steals internal AI technology details

    Why it’s important

    The existing Large Language Models, as the research paper explains, do not distinguish between user prompts and system instructions. GPT-4o Mini’s instruction hierarchy elevates system instructions, giving them the highest priority, while misaligned prompts are downgraded. The model is trained to identify and ignore harmful prompts, responding with an inability to assist.

    “We envision other types of more complex guardrails should exist in the future, especially for agentic use cases, e.g., the modern Internet is loaded with safeguards that range from web browsers that detect unsafe websites to ML-based spam classifiers for phishing attempts,” the research paper says.

    OpenAI’s update to GPT-4o Mini is a significant step towards enhancing AI safety. This move is particularly timely given the ongoing debates about AI safety and transparency, with internal and external calls for improved practices.

    There was an open letter from current and former employees at OpenAI demanding better safety and transparency practices, the team responsible for keeping the systems aligned with human interests (like safety) was dissolved, and Jan Leike, a key OpenAI researcher who resigned, wrote in a post that “safety culture and processes have taken a backseat to shiny products” at the company.

    As trust in AI’s reliability is paramount, OpenAI’s focus on safety features is essential for rebuilding confidence and enabling AI to assume more critical roles in managing our digital lives. This commitment to safety is a crucial step in the journey towards AI that is both reliable and trustworthy.

    digital agents GPT-4o Mini OpenAi
    Elodie Qian

    Elodie Qian is an intern reporter at BTW Media covering artificial intelligence and products. She graduated from Sichuan International Studies University. Send tips to e.qian@btw.media.

    Related Posts

    BATIC 2025 summit set for Bali: 26–29 August

    July 24, 2025

    UK’s CMA targets Apple, Google mobile dominance

    July 24, 2025

    SoftBank builds world’s largest Nvidia AI SuperPOD

    July 24, 2025
    Add A Comment
    Leave A Reply Cancel Reply

    CATEGORIES
    Archives
    • July 2025
    • June 2025
    • May 2025
    • April 2025
    • March 2025
    • February 2025
    • January 2025
    • December 2024
    • November 2024
    • October 2024
    • September 2024
    • August 2024
    • July 2024
    • June 2024
    • May 2024
    • April 2024
    • March 2024
    • February 2024
    • January 2024
    • December 2023
    • November 2023
    • October 2023
    • September 2023
    • August 2023
    • July 2023

    Blue Tech Wave (BTW.Media) is a future-facing tech media brand delivering sharp insights, trendspotting, and bold storytelling across digital, social, and video. We translate complexity into clarity—so you’re always ahead of the curve.

    BTW
    • About BTW
    • Contact Us
    • Join Our Team
    TERMS
    • Privacy Policy
    • Cookie Policy
    • Terms of Use
    Facebook X (Twitter) Instagram YouTube LinkedIn

    Type above and press Enter to search. Press Esc to cancel.