Pratzo - Daily News

OpenAI’s o3 AI Model Falls Short of Benchmark Claims in FrontierMath Test

admin
Last updated: April 21, 2025 2:02 pm
admin Published April 21, 2025

OpenAI’s o3 artificial intelligence (AI) model, which was released last week, is underperforming on a specific benchmark. Epoch AI, the company behind the FrontierMath benchmark, reported that the publicly available version of o3 scored 10 percent on the test, well below the figure OpenAI cited when it announced the model. OpenAI’s chief research officer, Mark Chen, had said that the model scored 25 percent on the test, setting a new record. However, the discrepancy does not mean that OpenAI lied about the metric.

OpenAI’s o3 AI Model Scores 10 Percent on FrontierMath

In December 2024, OpenAI announced the o3 AI model in a livestream on YouTube and other social media platforms. At the time, the company highlighted the large language model’s (LLM) expanded capabilities, in particular its stronger performance on reasoning-based queries.

The company backed up the claim by sharing the model’s scores on several popular benchmarks, including FrontierMath, created by Epoch AI. The mathematical test is known for being challenging and tamper-proof: more than 70 mathematicians developed it, and the problems are all new and unpublished. Notably, until December, no AI model had solved more than nine percent of the questions in a single attempt.

However, at the December announcement, Chen claimed that o3 had set a new record by scoring 25 percent on the test. External verification was not possible at the time, as the model was not publicly available. After o3 and o4-mini were launched last week, Epoch AI posted on X (formerly known as Twitter) that the released o3 model, in fact, scored 10 percent on the test.

While a score of 10 percent still makes o3 the highest-ranking model on the test, the number is less than half of what OpenAI claimed. The post has led several AI enthusiasts to question the validity of the benchmark scores.

The discrepancy does not mean that OpenAI lied about the performance of its AI model. Instead, the unreleased version that OpenAI tested likely used more compute to reach that score, while the commercial version was likely fine-tuned to be more power efficient, giving up some performance in the process.

Separately, ARC Prize, the organisation behind the ARC-AGI benchmark, which tests an AI model’s general intelligence, also posted on X about the discrepancy. The post confirmed, “The released o3 is a different model from what we tested in December 2024.” ARC Prize said that the released o3 model’s compute tiers are smaller than those of the version it tested. However, it also confirmed that o3 was not trained on ARC-AGI data, even at the pre-training stage.

ARC Prize said that it will re-test the released o3 AI model and publish the updated results. The organisation will also re-test the o4-mini model and label the prior scores as “preview”. It is not certain that the released version of o3 will underperform on this test as well.
