Tech News : Wait For It… The OpenAI Voice Cloning Tool

OpenAI has announced the preview of its (two years in the making) ‘Voice Engine’ voice cloning tool, although there’s no firm release date yet.

What Can It Do? 

OpenAI says Voice Engine uses “text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker.” It adds that this “small model” can create “emotive and realistic voices” from that one short sample.

Two Years On 

Voice Engine was first developed almost two years ago, in late 2022. Since then, it’s been used to power the preset voices available in the text-to-speech API, ChatGPT Voice, and Read Aloud. ChatGPT Voice is the feature that lets ChatGPT accept voice commands and speak its responses using AI. OpenAI’s text-to-speech (TTS) API is the service that converts text into natural-sounding speech, i.e. it uses AI models to produce speech that closely mimics human voices.

Being Cautious 

Although the voice cloning tool has been powering other aspects of OpenAI’s voice command and text-to-speech features for almost two years, the announcement of Voice Engine itself has been delivered with more than a hint of caution. For example, OpenAI’s announcement about Voice Engine says it’s just “preliminary insights and results from a small-scale preview.” Also, OpenAI admits it is deliberately taking a “cautious and informed approach to a broader release”, which it says is because of the “potential for synthetic voice misuse”, e.g. deepfakes and the use of convincing fake audio recordings for fraud, impersonation, or spreading misinformation.

OpenAI says that it recognises that generating speech that resembles people’s voices “has serious risks, which are especially top of mind in an election year” and is “engaging with U.S. and international partners from across government, media, entertainment, education, civil society and beyond to ensure we are incorporating their feedback as we build.”

Also, testing partners for Voice Engine have had to agree to usage policies that prohibit the impersonation of another individual or organisation without consent or legal right. OpenAI is also asking partners to get explicit and informed consent from the original speaker and to disclose to their audience that the voices they’re hearing are AI-generated.

To enable it to monitor and enforce these policies and requirements, OpenAI says it has implemented a set of safety measures, which include “watermarking to trace the origin of any audio generated by Voice Engine, as well as proactive monitoring of how it’s being used.”

What Now? 

While OpenAI wants to announce that it has developed a powerful AI voice cloning tool, it also wants to temper the disappointment about not releasing it yet by highlighting a few positive uses for Voice Engine. For example, in its recent announcement, OpenAI listed how Voice Engine could be used to:

  • Provide reading assistance to non-readers and children
  • Translate content like videos and podcasts (for creators and businesses)
  • Support people who are non-verbal (therapeutic applications)

OpenAI also highlights how Voice Engine could prove extremely useful for patients recovering their voice or for those people suffering from sudden or degenerative speech conditions, and for improving essential service delivery in remote settings, thereby reaching global communities.

What Does This Mean For Your Business? 

With this being a very important election year for at least 64 countries (including the US, UK and India), each of the large AI companies is very reluctant to be named as the one that allowed misuse of its AI products and/or didn’t take the right precautions to prevent misuse. For example, just as Google has put restrictions on what its Gemini AI model will answer about elections for fear of it being misused, OpenAI has decided that now, without the right protections in place, is not the right time to release its two-years-in-the-making voice cloning tool.

OpenAI, therefore, is happy to let the world and its competitors know that it has an advanced AI ‘Voice Engine’ in the pipeline, but it isn’t prepared to risk the tool and the company’s name being tarnished by misuse within the global arena of elections. It’s likely that we’ll see much more of this caution being exercised by AI companies releasing new features and products, particularly this year.

For businesses and organisations, plus those in the health/therapy sectors hoping to make use of the powerful, value-adding capabilities of Voice Engine, it’s a case of waiting a bit longer. The danger, however, in the fast-moving field of AI is that while time passes (as testing and safety policies are being put in place), a competitor may release a new or updated powerful voice cloning tool in the meantime, thereby stealing some of Voice Engine’s thunder.

Even when Voice Engine is regarded as safe to release, this won’t prevent bad actors from attempting to misuse it, so it will be interesting to see whether it’s as well protected as OpenAI says it will be and what users are able to produce with it. Ultimately, OpenAI will want to get this tool out there, being used by as many people as possible as soon as possible, once this period of caution has passed.

Tech Insight : Voice Changers

In this insight, we look at ‘voice changers’, technology that lets users change and disguise their voices in real time and in playback.

What Do They Do & How? 

Voice-changing software (and hardware) alters the sound of a person’s voice in real time or during playback. Voice changers can be implemented through software applications installed on computers, smartphones, or dedicated devices. Some voice chat programs or messaging apps may also include built-in voice modulation features. The latest voice changers use AI to help change and disguise voices.

Voice changers generally work by applying various effects, such as pitch shifting, modulation, distortion, echo, and other audio manipulations. There are free and paid-for Pro versions (often available on the same platform).
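To give a rough feel for two of the effects mentioned above, the following is a minimal Python sketch of pitch shifting and echo applied to a list of audio samples. It is an illustration only and not how any particular product works: the function names (sine_wave, pitch_shift_resample, add_echo) and the 8,000 Hz sample rate are assumptions made for this example. The pitch shift here is the naive kind, done by resampling, so it also shortens or lengthens the clip; real-time voice changers typically use more sophisticated techniques that keep the duration unchanged.

```python
import math


def sine_wave(freq_hz, duration_s, sample_rate=8000):
    """Generate a test tone as a list of floats in the range -1.0 to 1.0."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def pitch_shift_resample(samples, factor):
    """Naive pitch shift: read through the input 'factor' times faster.

    factor > 1 raises the pitch (and shortens the clip);
    factor < 1 lowers the pitch (and lengthens the clip).
    Linear interpolation is used between neighbouring samples.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    last = len(samples) - 1
    out = []
    for i in range(int(len(samples) / factor)):
        pos = i * factor
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def add_echo(samples, delay_samples, decay=0.5):
    """Mix a single delayed, quieter copy of the signal back into itself."""
    out = list(samples) + [0.0] * delay_samples
    for i, s in enumerate(samples):
        out[i + delay_samples] += s * decay
    return out


# Example: a 440 Hz tone, shifted up an octave, then given a 50 ms echo.
tone = sine_wave(440, 0.1)
higher = pitch_shift_resample(tone, 2.0)
with_echo = add_echo(higher, delay_samples=400, decay=0.4)
```

In this sketch, a factor of 2.0 halves the length of the clip and doubles the frequency of the tone, while the echo simply adds a quieter copy of the signal 400 samples later. Effects such as distortion or modulation can be written in the same style, as small functions that take samples in and return altered samples.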

What Are They Used For? 

Voice changers can be used for various purposes, including entertainment, anonymity, and practical applications. Here are a few common use cases:

– Entertainment. Voice changers are often employed in the entertainment industry for creating unique character voices in movies, TV shows, and video games. They enable actors and voice artists to transform their voices to match specific roles or enhance their performance.

– Gaming. In the gaming community, voice changers are popular for online multiplayer games. Players can modify their voices to match the characters they are portraying, adding an extra layer of immersion and fun to the gaming experience.

– Anonymity. Voice changers can be used to conceal one’s identity during online communication. By altering their voice, individuals can protect their privacy and maintain anonymity during voice chats, online gaming, or other virtual interactions.

– Pranks and humour. Voice changers are often used for practical jokes or comedic purposes. They allow users to mimic famous personalities, create funny sound effects, or generate exaggerated voices for amusement.

– Accessibility. Voice changers can also serve as assistive technology, helping individuals with speech impairments or conditions such as laryngitis to communicate more effectively. By modifying their voice, they can overcome communication barriers and make their speech more understandable.

What’s The Difference Between Voice Generators and Voice Changers?

AI voice changers modify an existing voice, while AI voice generators synthesise new voices from scratch. Voice changers alter and manipulate the characteristics of an input voice, whereas voice generators create entirely new voices using artificial intelligence techniques. That said, some AI voice sites offer both voice changers and voice generators.

Examples of AI Voice Changers 

AI voice changers employ artificial intelligence techniques to modify and transform voices. Here are a few examples of AI-based voice changers:

Voice.ai (https://voice.ai/). Voice.ai is a free, real-time voice changer where users can choose from thousands of different voices in its ‘Voice Universe’ UGC voice library, or use the AI voice changer to change their voice in real time across different apps, e.g. Zoom, Discord, Minecraft, GTA5, Fortnite, Valorant, League of Legends, Among Us, Skype, WhatsApp, Teamspeak and more. Voice.ai does, however, have a low Trustpilot rating, which appears to stem from several issues including the quality of results, privacy, and subscriptions.

Lyrebird (https://www.descript.com/lyrebird). Canada-based Lyrebird (the AI research division of Descript) is an AI voice synthesis platform that can mimic human voices. It allows users to generate speech in different voices by training the AI model on a small voice sample. The resulting synthesised voice can be used to alter the speaker’s voice or create entirely new voices.

Modulate.ai (https://www.modulate.ai/). Modulate.ai is an AI-driven voice modulation platform that enables real-time voice changing in video games and virtual reality. It uses machine learning algorithms to transform the user’s voice into various character voices, enhancing the gaming experience and adding immersion.

Replica Studios (https://replicastudios.com/). Replica Studios offers an AI-powered voice synthesis platform that allows users to generate voices for characters, voice-overs, and other applications. The platform utilises AI models to create realistic and customisable voice effects, enabling voice actors to modify their voices to match specific requirements.

Resemble AI (https://www.resemble.ai/). Resemble AI provides an AI-based voice cloning and voiceover platform. It uses deep learning techniques to analyse and recreate a person’s voice based on a few minutes of recorded audio. This technology enables voice actors, content creators, and companies to generate personalised voices for different applications. Resemble AI’s voice changer allows users to use speech-to-speech to apply their voice’s cadence to an AI voice of their choice.

Respeecher (https://www.respeecher.com/). Although Respeecher is known as a voice cloning platform, it can also transform input voices into a range of output voices.

VoiceMod (https://www.voicemod.net/ai-voices/). This real-time voice changer provides 10+ (and soon many more) different voices and a large number of voice effects for users. It focuses on users in the entertainment industry – games, movie character voices, streaming, and more.

What Does This Mean For Your Business? 

The advent of AI voice changers has ushered in a new era of opportunities for businesses and individuals alike. These innovative technologies have not only opened up new avenues for creativity and entertainment but have also resulted in significant cost and time savings. With free voice changers, voice cloning tools, and other advanced AI tools becoming accessible to everyone, businesses can harness their potential across different industries.

One of the key advantages of AI voice changers is their adaptability. They offer a wide range of uses, allowing businesses to leverage them for various purposes. In the entertainment industry, voice changers enable actors and voice artists to transform their voices, creating unique character voices for movies, TV shows, and video games. Gamers can immerse themselves in virtual worlds by modifying their voices to match their characters. Voice changers also offer a layer of anonymity for online communications, ensuring privacy and security.

Moreover, AI voice generators have revolutionised voice synthesis, enabling businesses to generate artificial voices that sound natural and human-like. This has opened up avenues for virtual assistants, audiobooks, voiceovers, and other applications where synthesised speech is needed. The ability to create personalised voices using AI models has provided content creators, voice actors, and companies with powerful tools to enhance their projects and engage audiences effectively.

Risks

However, as with any technological advancement, there are risks associated with the use of AI tools. The accessibility of AI voice changers and related technologies means that some individuals may employ them for malicious purposes, such as impersonation, fraud, or spreading disinformation. This calls for responsible use and ethical considerations in utilising these tools.

The rise of AI voice changers has brought about unprecedented opportunities for businesses and individuals. These tools have unlocked creative potential, saved time and costs, and opened doors to new experiences across industries. However, it is crucial to be aware of the risks and ensure responsible use of AI tools to mitigate any potential misuse. By harnessing the power of AI voice changers in a responsible manner, businesses can explore new horizons and leverage the transformative capabilities of this technology in areas such as marketing, training, and onboarding.