OpenAI just announced that it recently conducted a small-scale preview of a new tool called Voice Engine. This is a voice cloning technology that can mimic any speaker by analyzing a 15-second audio sample. The company says it generates “natural-sounding speech” with “emotive and realistic voices.”
The technology is based on the company’s pre-existing text-to-speech API and it has been in the works since 2022. OpenAI has already been using a version of the toolset to power the preset voices available in the current text-to-speech API and the Read Aloud feature. There are a bunch of samples on the company’s official blog and they sound eerily close to the real thing. I encourage you to give them a listen and imagine the possibilities, both good and bad.
OpenAI says they see this technology being useful for reading assistance, language translation and helping those who suffer from sudden or degenerative speech conditions. The company brought up a Brown University pilot progra