    OpenAI Introduces New Audio Models in API for Speech-Based AI Applications

    By EchoCraft AI | March 21, 2025 | 3 Mins Read

    OpenAI has introduced three new artificial intelligence models designed to enhance speech-to-text and text-to-speech capabilities.

    These models—GPT-4o-transcribe, GPT-4o-mini-transcribe, and GPT-4o-mini-tts—are integrated into OpenAI’s API and are aimed at improving accuracy and reliability for real-world applications.

    They are expected to outperform OpenAI’s earlier Whisper models but, unlike Whisper, they are not open-source.

    Advancing AI-Powered Voice Interaction

    The San Francisco-based AI company states that these models align with its broader goal of enabling more intuitive, multimodal AI interactions.

    OpenAI has previously released AI agents such as Operator and Deep Research, along with the Responses API for building agents; these focus on automating tasks and streamlining workflows. The latest update extends that approach to voice, allowing AI systems to process and generate speech with greater precision.

    One key improvement in the new speech-to-text models is their performance on the Few-shot Learning Evaluation of Universal Representations of Speech (FLEURS) benchmark, a multilingual test set that covers more than 100 languages.

    OpenAI reports that its latest models demonstrate lower word error rates (WER), particularly in challenging conditions such as background noise, diverse accents, and varying speech speeds.

    [Figure: Performance Comparison Dashboard]

    These advancements are attributed to refined training techniques, including reinforcement learning and expanded datasets.
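
    The word error rate mentioned above is the number of word-level substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of reference words. The following is a minimal sketch of that calculation, not OpenAI's evaluation code; the function name and the simple lowercase, whitespace-based tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    # Assumption: lowercase + whitespace split stands in for real text normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from transcript
                curr[j - 1] + 1,     # insertion: extra word in transcript
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

    For example, scoring the transcript "turn on kitchen light" against the reference "turn on the kitchen lights" gives 0.4: one deleted word and one substituted word against five reference words.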

    Natural-Sounding Text-to-Speech Capabilities

    The GPT-4o-mini-tts model introduces enhancements in text-to-speech synthesis, offering more natural-sounding voices.

    According to OpenAI, the model supports customizable inflections, intonations, and emotional expressiveness, making it suitable for applications like virtual assistants, customer service solutions, and interactive storytelling.

    The model currently offers only preset synthetic voices; users cannot create custom voices.
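
    As a rough sketch of how a developer might request speech from this model, the snippet below builds an HTTP request for OpenAI's speech endpoint using only the Python standard library. The endpoint URL, the "instructions" field used to steer tone, and the voice name "alloy" reflect OpenAI's public API reference as understood at the time of writing and should be checked against the current documentation; the helper names are hypothetical.

```python
import json
import os
import urllib.request

# Assumption: speech endpoint as listed in OpenAI's public API reference.
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, voice="alloy", instructions=None, api_key=None):
    """Build (but do not send) a text-to-speech request for gpt-4o-mini-tts."""
    payload = {"model": "gpt-4o-mini-tts", "input": text, "voice": voice}
    if instructions:
        # Free-text guidance on tone and delivery, e.g. "Speak calmly."
        payload["instructions"] = instructions
    key = api_key or os.environ.get("OPENAI_API_KEY", "")
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, out_path, **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
```

    Calling synthesize("Your order has shipped.", "reply.mp3", instructions="Sound warm and upbeat.") would send the request, provided a valid key is set in the OPENAI_API_KEY environment variable.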

    Pricing Structure for Developers

    OpenAI has established different pricing tiers for its new models:

    • GPT-4o-based audio models:
      • $40 per million input tokens
      • $80 per million output tokens
    • GPT-4o-mini models:
      • $10 per million input tokens
      • $20 per million output tokens

    At these rates, the GPT-4o-mini models cost a quarter of what the GPT-4o-based models do for both input and output tokens, giving developers a lower-cost option for integrating AI-driven speech capabilities into their applications.
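
    To see what these rates mean for a given workload, the sketch below estimates cost from token counts. It takes the per-million-token prices listed above at face value; the tier labels and the function name are illustrative, and real bills depend on how OpenAI counts audio and text tokens.

```python
# Rates as listed in this article, in US dollars per million tokens.
RATES_USD_PER_MILLION = {
    "gpt-4o": {"input": 40.0, "output": 80.0},
    "gpt-4o-mini": {"input": 10.0, "output": 20.0},
}


def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in US dollars for one workload on the given tier."""
    rates = RATES_USD_PER_MILLION[tier]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000
```

    For a hypothetical workload of 250,000 input tokens and 50,000 output tokens, this gives $14.00 on the GPT-4o tier and $3.50 on the GPT-4o-mini tier.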

    Technical Innovations Enhancing Performance

    The new GPT-4o and GPT-4o-mini audio models incorporate several advancements:

    • Pretraining with High-Quality Audio Datasets – Extensive training on diverse speech datasets has improved the models’ ability to handle various audio-related tasks with higher accuracy.
    • Advanced Distillation Techniques – Knowledge transfer from larger models to smaller, more efficient ones enhances performance while preserving conversational realism.
    • Reinforcement Learning Integration – The use of reinforcement learning further refines transcription accuracy, making these models competitive in complex speech recognition scenarios.
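
    OpenAI has not published the details of its distillation setup. As a generic illustration of the idea of knowledge transfer from a larger model to a smaller one, the sketch below computes a standard textbook distillation loss: the KL divergence between the teacher's and the student's temperature-softened output distributions. The function names, the temperature value, and the example logits are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2
```

    The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, so minimizing it pushes the smaller model toward the larger model's behavior.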

    Availability

    The new models are now accessible via OpenAI’s API, with additional support available through the Agents SDK, which helps developers build voice-based AI applications.

    OpenAI plans to continue refining its speech-processing technology, exploring ways to enable more personalized experiences through custom voice capabilities while maintaining ethical AI practices.

    OpenAI is collaborating with policymakers, researchers, developers, and content creators to address the challenges associated with synthetic voice technologies, reinforcing its commitment to responsible AI development.
