By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
Pratzo - Daily NewsPratzo - Daily NewsPratzo - Daily News
Notification Show More
Font ResizerAa
  • Technology
    • AI & Machine Learning
    • Software & Apps
    • Hardware & Gadgets
    Technology
    Show More
    Top News
    Samsung Galaxy Z Fold 7, Galaxy Z Flip 7 Battery Capacities Tipped via Certification Site
    May 15, 2025
    Euclid Space Telescope Discovers Rare Einstein Ring, Revealing Dark Matter Secrets
    February 12, 2025
    Amazon Trims Jobs in Devices and Services Unit
    May 16, 2025
    Latest News
    Red Magic Astra Gaming Tablet Launched With Snapdragon 8 Elite SoC, 8,200mAh Battery
    July 2, 2025
    Samsung Galaxy Z Flip 7 FE Name Appears in Alleged Third-Party Case Listing Alongside Galaxy Z Flip 7
    July 2, 2025
    Poco F7 5G Confirmed to Get Snapdragon 8s Gen 4 Chipset Ahead of June 24 Launch
    July 2, 2025
    Threads Rolls Out DMs With Message Controls, Inbox Filters for Users Aged 18 and Above
    July 2, 2025
  • Digital Marketing
    • Social Media Updates
    • PPC & Ads Insights
    • SEO Trends
    • Content Marketing Strategies
    Digital MarketingShow More
    70% of Senior Marketers Support Google’s Decision to Retain Third-Party Cookies on Chrome
    December 6, 2024
  • Lifestyle & Productivity
    • Personal Productivity Tools
    • Smart Home Tech
    • Wearables
    • Wellness Gadgets
    Lifestyle & ProductivityShow More
    Allu Arjun’s Bail Hearing Postponed to January 3
    December 31, 2024
    Pushpa 2 Full Movie Leaked Online
    Pushpa 2 Full Movie Leaked Online: A Major Setback Despite Record Pre-Sales
    December 5, 2024
    Pushpa 2: The Rule Movie Review – A Gripping Mass Entertainer
    December 5, 2024
  • Automobile
    AutomobileShow More
    New Petrol Price in India: Crude Oil Prices Fall – Check Today’s Rates
    January 25, 2025
    All-New Honda Amaze 2025 Launched in India – Prices Start at ₹7.99 Lakh
    December 5, 2024
    Mahindra XEV 9e Launched In India Priced At ₹ 21.90 Lakh: Check Range, Features, and More
    November 27, 2024
Reading: Anthropic Developing Constitutional Classifiers to Safeguard AI Models From Jailbreak Attempts
Share
Font ResizerAa
Pratzo - Daily NewsPratzo - Daily News
Search
Follow US
Pratzo - Daily News > Technology > Anthropic Developing Constitutional Classifiers to Safeguard AI Models From Jailbreak Attempts
Technology

Anthropic Developing Constitutional Classifiers to Safeguard AI Models From Jailbreak Attempts

admin
Last updated: February 4, 2025 2:13 pm
admin Published February 4, 2025
Share
SHARE

Anthropic announced the development of a new system on Monday that can protect artificial intelligence (AI) models from jailbreaking attempts. Dubbed Constitutional Classifiers, it is a safeguarding technique that can detect when a jailbreaking attempt is made at the input level and prevent the AI from generating a harmful response as a result of it. The AI firm has tested the robustness of the system via independent jailbreakers and has also opened a temporary live demo of the system to let any interested individual test its capabilities.

Anthropic Unveils Constitutional Classifiers

Jailbreaking in generative AI refers to unusual prompt writing techniques that can force an AI model to not adhere to its training guidelines and generate harmful and inappropriate content. Jailbreaking is not a new thing, and most AI developers implement several safeguards against it within the model. However, since prompt engineers keep creating new techniques, it is difficult to build a large language model (LLM) that is completely protected from such attacks.

Some jailbreaking techniques include extremely long and convoluted prompts that confuse the AI’s reasoning capabilities. Others use multiple prompts to break down the safeguards, and some even use unusual capitalisation to break through AI defences.

In a post detailing the research, Anthropic announced that it is developing Constitutional Classifiers as a protective layer for AI models. There are two classifiers — input and output — which are provided with a list of principles to which the model should adhere. This list of principles is called a constitution. Notably, the AI firm already uses constitutions to align the Claude models.

constitutional classifier Constitutional Classifiers

How Constitutional Classifiers work
Photo Credit: Anthropic

 

Now, with Constitutional Classifiers, these principles define the classes of content that are allowed and disallowed. This constitution is used to generate a large number of prompts and model completions from Claude across different content classes. The generated synthetic data is also translated into different languages and transformed into known jailbreaking styles. This way, a large dataset of content is created that can be used to break into a model.

This synthetic data is then used to train the input and output classifiers. Anthropic conducted a bug bounty programme, inviting 183 independent jailbreakers to attempt to bypass Constitutional Classifiers. An in-depth explanation of how the system works is detailed in a research paper published on arXiv. The company claimed no universal jailbreak (one prompt style that works across different content classes) was discovered.

Further, during an automated evaluation test, where the AI firm hit Claude using 10,000 jailbreaking prompts, the success rate was found to be 4.4 percent, as opposed to 86 percent for an unguarded AI model. Anthropic was also able to minimise excessive refusals (refusal of harmless queries) and additional processing power requirements of Constitutional Classifiers.

However, there are certain limitations. Anthropic acknowledged that Constitutional Classifiers might not be able to prevent every universal jailbreak. It could also be less resistant towards new jailbreaking techniques designed specifically to beat the system. Those interested in testing the robustness of the system can find the live demo version here. It will stay active till February 10.

For the latest tech news and reviews, follow Gadgets 360 on X, Facebook, WhatsApp, Threads and Google News. For the latest videos on gadgets and tech, subscribe to our YouTube channel. If you want to know everything about top influencers, follow our in-house Who’sThat360 on Instagram and YouTube.


WhatsApp for Android Begins Testing Ability to Open View Once Media on Linked Devices


source

You Might Also Like

Red Magic Astra Gaming Tablet Launched With Snapdragon 8 Elite SoC, 8,200mAh Battery

Samsung Galaxy Z Flip 7 FE Name Appears in Alleged Third-Party Case Listing Alongside Galaxy Z Flip 7

Poco F7 5G Confirmed to Get Snapdragon 8s Gen 4 Chipset Ahead of June 24 Launch

Threads Rolls Out DMs With Message Controls, Inbox Filters for Users Aged 18 and Above

Lumio Arc 5, Arc 7 Projectors Powered by Google TV to Launch in India on July 7

TAGGED:Satellite TechnologySpace TechnologyTechnology
Share This Article
Facebook Twitter Email Print
Leave a comment

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Current Gold Rate: 3681.90 INR per gram

Follow US

Find US on Social Medias
FacebookLike
TwitterFollow
YoutubeSubscribe
TelegramFollow

Weekly Newsletter

Subscribe to our newsletter to get our newest articles instantly!

    Popular News
    Technology

    Mr Housekeeping Now Streaming on Aha Tamil: Everything You Need to Know

    admin admin March 29, 2025
    Sony Announces Unified Beta Testing Program for PS5 and PC Games, Console Features and More
    Étoile OTT Release Date: When and Where to Watch it Online?
    ChatGPT’s Deep Research Feature Can Now Connect With GitHub Repositories
    Oppo Reno 14 With MediaTek Dimensity 8400 SoC Seen on Geekbench Ahead of Debut
    - Advertisement -
    Ad imageAd image

    Always Stay Up to Date

    Subscribe to our newsletter to get our newest articles instantly!

      About US

      At News.Pratzo.com, we are shaping the conversation in business and technology with reliable insights and updates. As part of the Pratzo.com brand, we aim to be your trusted source for impactful stories and trends, empowering professionals and enthusiasts alike. Stay informed, inspired, and ahead with us!
      Quick Link
      • Automobile
      • News
      • Cricket
      • Lifestyle & Productivity
      • Entertainment
      • Reviews & Comparisons
      • Digital Marketing
      • SEO Trends
      • Technology
      • AI & Machine Learning

      © Flair Hair & Beauty Salon London 2025

      © Pratzo News Network. Assets of Pratzo.com . All Rights Reserved.
      Go to mobile version