At MOST, we support the Core Principles for Artificial Intelligence Applications as laid out by the Human Artistry Campaign. Permission, fair payment and transparency are widely recognized cornerstones of the ethical use of creative works, and they are reflected in these core principles.
Also, we are a boutique studio: we do specific, tailored, creative work – not mass automation.
We have developed a workflow for working with AI-generated voices. There are three ways in which we think AI can be a useful tool and deliver added value, in addition to the voice casting & recording services we offer.
You (or we) use written copy as input for the AI voice generation software and select a preset voice clone. The AI software generates an audio file based on the text and on the general sound and delivery that come with the selected voice. If the first result is not to your liking, you can rewrite the text phonetically to generate new takes that might have a different intonation. These takes can then be combined in audio software.
There are no usage fees: this is arranged between the voice talents who have allowed their voices to be cloned for the software and the software creators. As per EU law, there is an opt-out to prevent the software from using your input data (in this case, text only) to train the model.
This type of AI-generated voice can be helpful if you need something quickly and at low cost, in situations where ‘OK is good enough’. If you want to be able to fine-tune the results (as you could with a human voice talent), text-to-speech is best avoided.
We start with a recording of the copy by a human voice. The voice can be your own, someone from our studio or a professional actor. But be aware: the speaker’s native language should match the language of the copy, and the speaker should be experienced, since their performance will be the basis of the output. The voice character of this recording is then altered by AI.
For the recording that is used as an input: usually no buyout, only recording costs (and possibly a talent fee). When hiring a professional actor, a usage rights license will apply¹.
For the voice characters / voice profiles used: no extra fee; this is arranged between the AI software company and the voice talents who have allowed their voices to be cloned.
We always use the opt-out clause to make sure the audio we upload into the AI engine is not used for training.
This type of gen-AI voice is a good ‘in-between’ solution, when you need flexibility in the voice character but also want full control over the intonation. This also allows one voice talent to perform multiple roles with different voices.
In addition to text-to-speech and speech-to-speech, in this approach we also take control of the voice character used for the output. We do this by cloning the voice of a professional voice actor (with permission, of course). From that moment on, we can use text-to-speech and speech-to-speech (with e.g. our own voice as input) to produce copy with the sound of that voice. Of course, we still have the option to not use AI and record copy with the VO talent, for instance for very critical applications.
This should be negotiated with the voice talent beforehand. There will be a fee for the first recordings and after that a license fee for the use of the voice clone, similar to current buyouts. We always use the opt-out clause to make sure the audio we upload into the AI engine is not used for training.
This type of use is good for high-end, critical VO applications where you also need a lot of variations over time, such as ‘brand voice’ applications. It will be cheaper than regular recording sessions with the same VO talent, since fewer recording sessions are needed.
We continue to closely follow the development of AI in the field of music and sound. We see creative potential as well as ethical, cultural and ecological implications to be considered.
Input
⁰¹ Text-to-speech: Text
⁰² Speech-to-speech: Audio recording (yourself, us, or a professional actor)
⁰³ Hybrid: Audio recording, custom voice clone, text

Pros
⁰¹ Text-to-speech
- Quick, therefore cheap (if the quick result is sufficient for your needs)
- You can do it yourself, or we can do it for you
- Many different voices available (less so in Dutch)
⁰² Speech-to-speech
- Compared to text-to-speech, much more control over the performance / intonation
- Many different voice types available (especially in English)
- Usually very natural sounding
⁰³ Hybrid
- Combines the high-quality results of traditional recording with the flexibility of AI-generated voices
- Voice can be specific to a brand or client, even exclusively
- Flexible approach: e.g. do new recordings for a campaign with the talent and use AI for later, additional versions

Cons
⁰¹ Text-to-speech
- Output is generally OK but unpredictable in the details of the performance
- Trial and error is needed to improve results, with an uncertain outcome
- All users of a software package have access to the same AI voice clones, so ‘familiar voices’ can arise
⁰² Speech-to-speech
- You need to record
- Since a recording of the text is the basis, less flexibility in copy compared to text-to-speech
- All users of a software package have access to the same AI voice clones, so ‘familiar voices’ can arise
⁰³ Hybrid
- Creating a custom clone takes time in the beginning, but saves time in the end
- Not all VO talents might be open to this approach; it takes some time and negotiation

Permission & usage rights fees
⁰¹ Text-to-speech
- No fees. This is between the voice talents who have allowed their voices to be cloned for the software and the software creators.
- Opt-out so our (text) input is not used for training.
⁰² Speech-to-speech
- For the recording that is used as an input: no buyout, only recording costs (and possibly a talent fee). When hiring a professional actor, a usage rights license will apply.
- For the AI voice profiles used: no extra fee.
- Opt-out so our (audio) input is not used for training.
⁰³ Hybrid
- To be negotiated with the voice talent beforehand. There will be a fee for the first recordings and after that a license fee for the use of the voice clone, similar to current buyouts.
- Opt-out so our (text and audio) input is not used for training.

When to use
⁰¹ Text-to-speech: “When OK is good enough”: guide VO, instructions, or when text interpretation is not important.
⁰² Speech-to-speech: Mid-level projects and/or when you need multiple voice characters on a budget.
⁰³ Hybrid: High-end projects with lots of variations over time (for one-off projects, just record a human).
¹ Usage rights licenses on request, per use case.