OpenAI Unveils Voice Engine for Synthetic Voice Generation

  • OpenAI unveils Voice Engine, an AI voice generator capable of mimicking human voices with high accuracy, raising concerns about potential misuse.
  • Despite its potential for aiding accessibility services and translation, Voice Engine prompts concerns over misinformation and fraud, with OpenAI emphasizing the need for ethical guidelines and user consent.

OpenAI recently unveiled an artificial intelligence (AI) voice generator that mimics human voices with remarkable accuracy. The tool could support accessibility services, but it also raises concerns about misinformation and other forms of abuse.

OpenAI showcased early tests of its Voice Engine tool on Friday. The system uses a 15-second sample of someone speaking to generate a convincing AI replica of that person’s voice. Users then provide paragraphs of text, which the tool reads aloud in the synthetic voice.

Other AI voice services already exist, but OpenAI stands out for its record of driving widespread adoption, as it did with its chatbot ChatGPT.

AI text-to-voice tools could aid translation services, provide reading assistance for children, or support people regaining the ability to speak, the company claims. But some are concerned the technology might also fuel disinformation campaigns or make it easier for scammers to commit fraud.

OpenAI states that Voice Engine is currently being used only by “a small group of trusted partners,” including education and health technology firms. Tests with these pilot partners will help determine whether and how the tool should be made more widely available. According to OpenAI, the testers have agreed not to recreate anyone’s voice without that person’s express consent and must clearly tell listeners when the voice they are hearing is AI-generated.

“OpenAI recognizes that creating speech that mimics people’s voices presents serious risks,” according to its blog post. The company also acknowledges that major adjustments will be needed as AI-generated audio becomes more widely accessible. One of its suggestions is phasing out voice authentication for bank accounts as soon as possible.

“Any large deployment of synthetic voice technology should include voice authentication experiences that verify whether original speakers voluntarily contribute their voices for use by this service, as well as an anti-cloning list to detect and block creation of voices that sound too similar to notable figures,” OpenAI suggests.

Voice Engine can also take an audio sample in one language and produce a replica of the speaker’s voice that speaks fluently in multiple other languages.

In its blog post, OpenAI provides an example of a human reading a passage about friendship, followed by AI-generated audio that sounds similar. Each AI sample preserves the tone and accent of the original speaker.

Below are audio samples from OpenAI demonstrating Voice Engine in action. The first is a clip of real human speech that was used as input to the tool.


Voice Engine created an AI-generated voice from the human speech sample, along with written text telling it what to say. Here is the resulting AI voice clip: (AI-generated voice clip).


Voice Engine arrives as users await OpenAI’s public release of Sora, its AI video generation tool first shown off last month. Sora promises realistic 60-second videos from text instructions, with multiple characters, specific types of motion and intricate background details. OpenAI’s ChatGPT can also generate images from text prompts.

OpenAI also announced on Monday that it is making ChatGPT available to everyone without registration, so anyone can use it without first creating an account.

