OpenAI Introduces Advanced Voice Mode for ChatGPT, Offering Hyper-Realistic Audio Responses

OpenAI has announced the rollout of ChatGPT’s Advanced Voice Mode, providing users with access to GPT-4o’s hyper-realistic audio responses. Initially, this feature will be available to a select group of ChatGPT Plus users, with a gradual expansion to all Plus users planned for the fall of 2024.

When OpenAI first showcased GPT-4o’s voice capabilities in May, the feature stunned audiences with its quick responses and an uncanny resemblance to a real human voice, one that many listeners compared to Scarlett Johansson’s. Johansson denied giving permission for her voice to be used and retained legal counsel to protect her likeness. OpenAI maintained that it had not used Johansson’s voice and subsequently removed the voice from its demo. Then, in June, OpenAI announced it would delay the release of Advanced Voice Mode to improve its safety measures.

Now, after a month of anticipation, OpenAI has introduced the alpha version of Advanced Voice Mode. Note, however, that the video and screen-sharing capabilities showcased in May are not included in this alpha release; for now, ChatGPT Plus users will get access to the voice feature demonstrated in the GPT-4o demo.

OpenAI distinguishes Advanced Voice Mode from the Voice Mode previously available in ChatGPT. The older version chained three separate models: one to transcribe the user’s speech to text, one to process the text prompt, and one to convert the response back into speech. GPT-4o, by contrast, is multimodal and handles all of these tasks within a single model, which allows for significantly lower-latency conversations. OpenAI also claims that GPT-4o can detect emotional intonation in the user’s voice, such as sadness, excitement, or even singing.
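For context, the older three-model pipeline can be sketched roughly as follows. This is a minimal illustration of the chained approach, not OpenAI’s actual implementation; the model names (whisper-1, gpt-4, tts-1), the voice choice, and the file paths are assumptions for demonstration, using calls from the openai Python SDK.

```python
# A minimal sketch of a three-model voice pipeline of the kind the
# older Voice Mode used: speech-to-text, text processing, text-to-speech.
# Model names, voice, and file paths are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Speech-to-text: transcribe the user's spoken prompt.
with open("user_prompt.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# 2. Text processing: generate a response to the transcribed prompt.
completion = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)
reply_text = completion.choices[0].message.content

# 3. Text-to-speech: synthesize the response as audio.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=reply_text,
)
speech.write_to_file("assistant_reply.mp3")
```

Each hand-off in this chain adds latency, and the transcription step discards paralinguistic cues like tone and emphasis. A single multimodal model that consumes and emits audio directly avoids both problems, which is the design change OpenAI is describing with GPT-4o.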

Although TechCrunch had not been able to test the feature at the time of publishing, ChatGPT Plus users in the alpha will be able to judge the realism of Advanced Voice Mode for themselves. OpenAI plans to release the feature gradually so it can closely monitor usage. Users in the alpha group will receive an alert in the ChatGPT app, followed by an email with instructions on how to use the feature.

OpenAI has conducted extensive testing of GPT-4o’s voice capabilities with over 100 external red teamers who speak 45 different languages. The company also plans to release a report on its safety efforts in early August, providing transparency about the measures taken to ensure responsible usage of the technology.

To avoid deepfake controversies, OpenAI has limited Advanced Voice Mode to four preset voices: Juniper, Breeze, Cove, and Ember, all created in collaboration with paid voice actors. The Sky voice demonstrated in OpenAI’s May demo is no longer available in ChatGPT. OpenAI spokesperson Lindsay McCallum said that ChatGPT cannot impersonate the voices of other individuals or public figures, and that outputs deviating from the preset voices will be blocked.

OpenAI is also addressing copyright concerns: it has implemented filters to prevent Advanced Voice Mode from generating copyrighted music or other audio. With the rise of AI-powered audio models like GPT-4o, the risk of legal disputes from copyright holders, particularly record labels, has grown, and these filters are a proactive attempt to head off such conflicts.

By gradually rolling out Advanced Voice Mode and implementing safety precautions, OpenAI is demonstrating its commitment to responsible and ethical use of AI technology. The company’s efforts to address concerns related to voice impersonation and copyright infringement contribute to building trust and ensuring the long-term viability of AI-powered audio applications.
