Unleashing the Power of ChatGPT Advanced Voice Mode: Impersonations, Language Instruction, and More

ChatGPT Advanced Voice Mode: A Game-Changer in Conversational AI

The long-awaited ChatGPT Advanced Voice Mode from OpenAI has finally arrived, but only a select few customers in the “alpha” group have access to it. Despite criticism from Scarlett Johansson and a delay of more than a month, early alpha testers are already sharing examples of what the new feature can do.

1. Language instruction and translation:
One of the standout features of ChatGPT Advanced Voice Mode is its ability to provide interactive language instruction. Users on X have noted that the popular language-learning app Duolingo might face tough competition from this mode, since it can tailor instruction to an individual who is learning or practicing a new language.

2. Multimodal capabilities:
Advanced Voice Mode is powered by GPT-4o, OpenAI’s first natively multimodal large model. Unlike its predecessor, GPT-4, which relied on other domain-specific models, GPT-4o can handle vision and audio inputs and outputs without external support. This means the mode can also use the user’s phone camera as a source of visual information. For example, it can translate text on screens in other languages, as demonstrated by McGill University instructor Manuel Sainsily.

3. Humanlike utterances:
Cristiano Giardina, an Italian-American AI writer, has shared several examples of the new ChatGPT Advanced Voice Mode. In one viral demo, he asked the mode to count to 50 faster and faster, and it stopped to catch its breath near the end. The demo shows the mode mimicking humanlike expressions and natural speaking patterns, not just reading out text.

4. Beatboxing and audio storytelling:
Startup founder Ethan Sutin showcased how ChatGPT Advanced Voice Mode can beatbox fluidly and convincingly, resembling a human MC. Additionally, the mode can engage in audio storytelling and roleplaying. Users can ask it to play along and create fictitious scenarios, like going back in time to Ancient Rome. Its playful vocal tone, complete with vocal style changes and laughter at its own jokes, adds a unique touch to the experience.

5. Accents and character impersonations:
Another fascinating aspect of ChatGPT Advanced Voice Mode is its ability to mimic accents. Giardina demonstrated how it can imitate various regional British accents, as well as impersonate a soccer commentator across languages. Sutin also showcased its attempts to reproduce different U.S. regional accents. Moreover, the mode can imitate fictional characters, further highlighting its versatility.

As OpenAI plans to roll out ChatGPT Advanced Voice Mode to all paying ChatGPT Plus subscribers by the fall, questions arise about its practical applications beyond fun demos and experiments. Will it make ChatGPT more useful and appealing to a wider audience? Could it potentially lead to an increase in audio-based scams? Only time will tell as the company expands access and gathers more feedback from users.
