AI Voices

Before we dive into it

At MOST, we support the Core Principles for Artificial Intelligence Applications as laid out by the Human Artistry Campaign. Permission, fair payment and transparency are widely recognized cornerstones of the ethical use of creative works, and they are reflected in these core principles.

Also, we are a boutique studio: we do specific, tailored, creative work – not mass automation.

How does generative AI fit into that?

If used responsibly and with care, AI voices can sometimes be a welcome addition to our palette. Below, we outline three ways in which we think AI can be a useful tool and deliver added value, alongside the voice casting & recording services we offer.

Prompt responsibly

AI has a huge environmental impact. If you use AI for voice generation, finalise scripts as much as possible before you generate. Think first, AI later.

⁰¹ Text-to-speech

How it works

You (or we) use written copy as input for the AI voice generation software and select a preset voice clone. The software generates an audio file based on the text and on the general sound and delivery that come with the selected voice. If the first result is not to your liking, you can respell the text phonetically to generate new takes that might have a different intonation. These takes can then be combined in audio software.

Pros

Cons

Permission & fair payment

This is arranged between the software creators and the voice talents who have allowed their voices to be cloned for the software. As per EU law, there is an opt-out to prevent the software from using your input data (in this case, text only) to train the model.

When to use

This type of AI-generated voice can be helpful if you need something quickly and at low cost, in situations where ‘OK is good enough’. If you want to be able to fine-tune the results (as you can with a human voice talent), text-to-speech is best avoided.

How it sounds

⁰² Speech-to-speech

How it works

We start with a recording of the copy by a human voice. The voice can be your own, that of someone from our studio, or that of a professional actor. But be aware: the speaker's native language should match the language of the copy. The speaker should also be experienced, since their performance will be the basis of the output. The voice character of this recording is then altered by AI.

Pros

Cons

Permission & fair payment

For the recording that is used as input: usually no buyout, only recording costs (and possibly a talent fee). When we hire a professional actor, a usage rights license will apply¹.
For the voice characters / voice profiles used: no extra fee. This is arranged between the AI software company and the voice talents who have allowed their voices to be cloned.
We always use the opt-out clause to make sure the audio we upload into the AI engine is not used for training.

When to use

This type of gen-AI voice is a good ‘in-between’ solution, when you need flexibility in the voice character but also want full control over the intonation. This also allows one voice talent to perform multiple roles with different voices.

How it sounds (Dutch)

⁰³ Hybrid

How it works

In addition to text-to-speech and speech-to-speech, this approach also gives us control over the voice character used for the output. We do this by cloning the voice of a professional voice actor (with permission, of course). From then on, we can use text-to-speech and speech-to-speech (with, for example, our own voice as input) to produce copy with the sound of that voice. We still have the option to skip AI and record the copy with the VO talent directly, for instance for very critical applications.

Pros

Cons

Permission & fair payment

This should be negotiated with the voice talent beforehand. There will be a fee for the first recordings and after that a license fee for the use of the voice clone, similar to current buyouts. We always use the opt-out clause to make sure the audio we upload into the AI engine is not used for training.

When to use

This type of use is good for high-end, critical VO applications where you also need a lot of variations over time, such as ‘brand voice’ applications. It will be cheaper than regular recording sessions with the same VO talent, since fewer recording sessions are needed.

How it sounds (Dutch)

How we like to use AI-generated voices: use case for Het Utrechts Archief

We like to use AI voices to extend our possibilities and make the good even better. There are many creative ways to use AI voices, but not all of them sound good or communicate real emotion: the words are there, but the story isn’t told. In this Note, we describe a specially tailored workflow for Het Utrechts Archief, in which we combine voice acting and AI speech-to-speech to create a whole exhibition.

Overview

⁰¹ Text-to-speech

Input: Text

Pros: Quick; cheap; you can do it yourself, and we can do it for you; many different voices available (less so in Dutch).

Cons: Output is unpredictable; trial-and-error is needed to improve results; standard available voices are non-exclusive.

Permission & usage rights fees: No fees. This is between the voice talents that have allowed their voices to be cloned for the software, and the software creators. Opt-out so our (text) input is not used for training.

When to use: “When OK is good enough” – guide VO, instructions, if text interpretation is not important.

⁰² Speech-to-speech

Input: Audio

Pros: More control over the performance and intonation; many different voices available (less so in Dutch); natural sounding.

Cons: A quality recording is needed; less flexibility in adjusting copy; standard available voices are non-exclusive.

Permission & usage rights fees: For the recording that is used as an input: no buyout, only recording costs (and possibly talent fee). When hiring a professional actor a usage rights license will apply. For the AI voice profiles used: no extra fee. Opt-out so our (audio) input is not used for training.

When to use: Mid-level projects and/or when you need multiple voice characters on a budget.

⁰³ Hybrid

Input: Audio and/or text

Pros: Combines traditional, high-quality recordings with flexibility of AI-generated voices; the voice can be specific and exclusive for a brand or client; highly flexible – start with a real voice talent, finish with AI.

Cons: Creating a custom clone takes time in the beginning, but saves time in the end; not all voice talents are open to this approach.

Permission & usage rights fees: To be negotiated with the voice talent beforehand. There will be a fee for the first recordings and after that a license fee for the use of the voice clone, similar to current buyouts. Opt-out so our (text and audio) input is not used for training.

When to use: High-end projects with lots of variations over time (for one-off projects, just record a human).

Footnote

¹ Usage rights licenses are available on request, per use case.