Blue Tech Wave Media

OpenAI Is Now Capable of Voice and Image Recognition

By Bal Marsius | September 26, 2023 | Updated: November 22, 2023

Image credit: Rawpixel via Freepik 

OpenAI has introduced a series of enhancements to ChatGPT, including two standout features: voice interaction and image recognition.

Literally Chatting with ChatGPT

One of the most significant upgrades is the addition of voice interaction to ChatGPT, which allows users to hold spoken conversations with the AI. Users can choose from five lifelike synthetic voices, each designed to provide a natural conversational experience. It’s like having a real-time phone conversation with a chatbot, with ChatGPT responding promptly to spoken questions.

The underlying technology relies on two distinct models. OpenAI’s Whisper, a pre-existing speech-to-text model, converts spoken words into text, which is then fed to ChatGPT. Conversely, a new text-to-speech model transforms ChatGPT’s responses into spoken language.
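
The flow described above (speech to text, text to the chat model, reply back to speech) can be sketched as a simple three-stage pipeline. This is only an illustration of the data flow, not OpenAI's implementation: the `VoicePipeline` class and the `fake_stt`, `fake_chat`, and `fake_tts` stand-ins are hypothetical names invented here, and a real system would plug in actual speech and language models in their place.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Hypothetical wrapper that chains the three stages described in the article."""
    speech_to_text: Callable[[bytes], str]   # role played by a model such as Whisper
    chat: Callable[[str], str]               # role played by the chat model
    text_to_speech: Callable[[str], bytes]   # role played by the text-to-speech model

    def respond(self, audio_in: bytes) -> bytes:
        transcript = self.speech_to_text(audio_in)  # spoken words -> text
        reply = self.chat(transcript)               # text -> text reply
        return self.text_to_speech(reply)           # text reply -> audio


# Stand-in stages (assumptions for illustration only): they treat the
# "audio" as UTF-8 bytes so the example runs without any real models.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_chat(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_stt, fake_chat, fake_tts)
audio_out = pipeline.respond(b"hello")
print(audio_out)
```

Because each stage is just a callable, any one of them can be swapped out without touching the others, which mirrors the article's point that the speech models sit on either side of the chat model rather than inside it.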

During a recent demonstration, Joanne Jang, a product manager at OpenAI, showcased the range of synthetic voices. The voices were created by training the text-to-speech model on the voices of hired actors, and the primary criterion was that they be pleasant and easy to listen to. OpenAI even envisions a future where users can create their own custom voices.

This advancement extends beyond ChatGPT, as OpenAI is sharing its text-to-speech model with other companies, including Spotify. Spotify, for instance, is using this synthetic voice technology to translate celebrity podcasts into multiple languages using synthetic versions of the podcasters’ voices.

Image Recognition Now Possible

Another groundbreaking addition to ChatGPT is image recognition. This feature, which OpenAI had teased with the introduction of GPT-4, now allows users to upload images to the app and ask questions about their content.

In a practical demonstration, Raul Puri, a scientist working on GPT-4, uploaded a photo of a math homework problem and asked ChatGPT for a solution. Impressively, ChatGPT provided the correct steps. Users have also employed this feature for troubleshooting technical issues by uploading screenshots and seeking guidance.

Moreover, ChatGPT’s image recognition capability has been used by Be My Eyes, an app designed to assist individuals with impaired vision. Users can upload images and ask the chatbot to describe them, offering a new level of independence.

However, OpenAI is acutely aware of the potential risks of these updates, especially when combining different AI models. For instance, users cannot inquire about photos containing private individuals. The company acknowledges the need for vigilance in preventing misuse and is committed to safeguarding both users and non-users from harm.

Challenges Ahead for ChatGPT

These updates mark the rapid evolution of OpenAI’s experimental models into practical products. ChatGPT Plus, the premium version of the app, combines GPT-4 and DALL-E, making it a formidable competitor to voice assistants like Siri, Google Assistant, and Alexa. What was once accessible only to select software developers is now available to everyone for a monthly subscription of $20.

As ChatGPT expands its capabilities to “see, hear, and speak,” there are challenges to consider. Voice recognition may pose accessibility issues for individuals with non-mainstream accents. Additionally, synthetic voices carry social and cultural implications that require further exploration.

OpenAI, however, asserts that it has addressed the major concerns and believes that these updates are safe for release. The journey to refine and expand AI capabilities continues, with ChatGPT leading the way. While there are certainly challenges and questions to address, this latest update represents a significant step toward creating more powerful and interactive AI assistants.

Bal Marsius

Bal was BTW's copywriter specialising in tech and productivity tools. He has experience working in startups, mid-size tech companies, and non-profits.
