AI Voices

Before we dive into it

At MOST, we support the Core Principles for Artificial Intelligence Applications as laid out by the Human Artistry Campaign. Permission, fair payment and transparency are widely recognized cornerstones of the ethical use of creative works, and they are reflected in these core principles.

Also, we are a boutique studio: we do specific, tailored, creative work – not mass automation.

How does generative AI fit into that?

We have developed a workflow for working with AI-generated voices. There are three ways in which we think AI can be a useful tool and deliver added value, in addition to the voice casting & recording services we offer.

⁰¹ Text-to-speech

How it works

You (or we) use written copy as input for the AI voice generation software and select a preset voice clone. The AI software generates an audio file based on the text and on the general sound and delivery that come with the selected voice. If the first result is not to your liking, you can change the text phonetically to generate new takes that might have a different intonation. These takes can then be combined in audio software.

Pros

Cons

Permission & fair payment

Permission and payment are arranged between the voice talents who have allowed their voices to be cloned for the software and the software creators. As per EU law, there is an opt-out to prevent the software from using your input data (in this case, text only) to train the model.

When to use

This type of AI-generated voice can be helpful if you need something quickly and at low cost, in situations where ‘OK is good enough’. If you want to be able to fine-tune the results (as you could with a human voice talent), text-to-speech is best avoided.

How it sounds

⁰² Speech-to-speech

How it works

We start with a recording of the copy by a human voice. The voice can be your own, that of someone from our studio, or that of a professional actor. But be aware: the speaker's native language should be the same as the language of the copy. The speaker should also be experienced, since their performance will be the basis of the output. The voice character of this recording is then altered by AI.

Pros

Cons

Permission & fair payment

For the recording that is used as an input: usually no buyout, only recording costs (and possibly a talent fee). When hiring a professional actor, a usage rights license will apply¹.
For the voice characters / voice profiles used: no extra fee. This is arranged between the AI software company and the voice talents behind those profiles.
We always use the opt-out clause to make sure the audio we upload into the AI engine is not used for training.

When to use

This type of gen-AI voice is a good ‘in-between’ solution when you need flexibility in the voice character but also want full control over the intonation. It also allows one voice talent to perform multiple roles with different voices.

How it sounds

⁰³ Hybrid

How it works

On top of text-to-speech and speech-to-speech, this approach also gives us control of the voice character used for the output. We do this by cloning the voice of a professional voice actor (with permission, of course). From that moment on, we can use text-to-speech and speech-to-speech (with, for example, our own voice as input) to voice copy with the sound of that voice. Of course, we still have the option not to use AI and to record the copy with the VO talent, for instance for very critical applications.

Pros

Cons

Permission & fair payment

This should be negotiated with the voice talent beforehand. There will be a fee for the first recordings and after that a license fee for the use of the voice clone, similar to current buyouts. We always use the opt-out clause to make sure the audio we upload into the AI engine is not used for training.

When to use

This type of use is good for high-end, critical VO applications where you also need a lot of variations over time, such as ‘brand voice’ applications. It will be cheaper than regular recording sessions with the same VO talent, since fewer recording sessions are needed.

How it sounds

Thoughts on using (generative) AI for music and sound

We continue to closely follow the development of AI in the field of music and sound. We see creative potential as well as ethical, cultural and ecological implications to be considered.

Overview

Input

⁰¹ Text-to-speech
Text

⁰² Speech-to-speech
Audio recording (yourself, us, or a professional actor)

⁰³ Hybrid
Audio recording, custom voice clone, text

Pros

⁰¹ Text-to-speech
Quick, therefore cheap (if the quick result is sufficient for your needs)
You can do it yourself, or we can do it for you
Many different voices available (less so in Dutch)

⁰² Speech-to-speech
Compared to text-to-speech, much more control over the performance / intonation
Many different voice types available (especially in English)
Usually very natural sounding

⁰³ Hybrid
Combines the high-quality results of traditional recording with the flexibility of AI-generated voices
Voice can be specific to a brand or client, even exclusive
Flexible approach: e.g. do new recordings for a campaign with the talent and use AI for later, additional versions

Cons

⁰¹ Text-to-speech
Output is generally OK but unpredictable in the details of the performance
Trial and error is needed to improve results, with an uncertain outcome
All users of a software package have access to the same AI voice clones, so ‘familiar voices’ can arise

⁰² Speech-to-speech
You need to record
Since a recording of the text is the basis, there is less flexibility in copy compared to text-to-speech
All users of a software package have access to the same AI voice clones, so ‘familiar voices’ can arise

⁰³ Hybrid
Creating a custom clone takes time in the beginning, but saves time in the end
Not all VO talents might be open to this approach; it takes some time and negotiation

Permission & usage rights fees

⁰¹ Text-to-speech
No fees. This is between the voice talents that have allowed their voices to be cloned for the software, and the software creators.
Opt-out so our (text) input is not used for training.

⁰² Speech-to-speech
For the recording that is used as an input: no buyout, only recording costs (and possibly a talent fee).
When hiring a professional actor, a usage rights license will apply.
For the AI voice profiles used: no extra fee.
Opt-out so our (audio) input is not used for training.

⁰³ Hybrid
To be negotiated with the voice talent beforehand. There will be a fee for the first recordings and after that a license fee for the use of the voice clone, similar to current buyouts.
Opt-out so our (text and audio) input is not used for training.

When to use

⁰¹ Text-to-speech
“When OK is good enough” – guide VO, instructions, or whenever text interpretation is not important.

⁰² Speech-to-speech
Mid-level projects and/or when you need multiple voice characters on a budget.

⁰³ Hybrid
High-end projects with lots of variations over time (for one-off projects, just record a human).

Footnote

¹ Usage rights licenses are available on request, per use case.
