GenAI limitations

Audio output

How could you use it?

Maybe you don’t like the sound of your voice but a video assessment is due. Perhaps you need sound effects or music for your art exhibition, animation or marketing campaign. Or maybe you want to learn a language through conversation or create a focus study soundtrack. Audio genAI can augment all of these things.

What are audio outputs?

GenAI audio exists in several formats but typically covers generative speech and generative music. Technology in these areas includes AI text-to-speech, AI generative voices, AI voice cloning, AI dubbing and AI music. Like most genAI tools, those designed for audio are trained on large datasets. From existing audio recordings, genAI identifies patterns and structures that it uses to create a remixed piece of audio.
 

Misuse, manipulation or just plain bad 

While genAI technology can generate lifelike or genuine-sounding audio, such as a person’s voice, outputs can be used for malicious purposes. In fact, generated audio is already being used to make scams more sophisticated and propaganda more impactful. If you’re using a song snippet, or a sound bite of someone else speaking, without permission, the odds are that this unauthorised use breaches privacy, intellectual property or other legal and ethical boundaries.

Generative audio also challenges traditional notions of authenticity in terms of creativity and identity. There are ethical and professional concerns, such as the potential replacement of human voices in various industries, which has implications for employment and personal identity. Another limitation is that generated voices or sounds can lack authenticity, have an unusual cadence, or sound incongruous or bland. Check out the social media account of this AI singer-songwriter. Think about how real and authentic this voice sounds.

Explore different scenarios about generated audio and trust in the interactive below. Click on the arrows to navigate.
 


Specific risks and limitations

As outlined earlier, audio output from genAI has risks and limitations that affect both creation and use. To take a deeper dive into specific areas of concern and engage with reflective questions, listen to the audio recording below. Can you tell that we created this information using generative audio?

 

 

Activity overview

This activity plays audio that has been generated by a genAI tool that has converted text to speech.

Audio transcript

This is not the voice of a real librarian – or even a real person. A genAI tool was used to convert text to speech. Now, let’s look at the risks and limitations that accompany the use of genAI for creating audio.

Using AI to mimic or synthesise human voices raises ethical issues, especially when it breaches consent and privacy. Unauthorised use of someone’s voice to create synthetic audio recordings infringes on personal rights. You can hear more about these issues and concerns on this page. As a student, you’re expected to develop critical thinking skills. It’s about being a global citizen who engages ethically and productively with technologies, so in relation to audio-based genAI outputs, we want you to consider ethical questions like:

  • Why are you using generative tools to create audio? 
  • How are you sharing the audio you are generating? 
  • Are you using audio that is under copyright?

There are a lot of misuse and manipulation concerns to consider in the audio and genAI space. One of the malicious purposes of generative audio is that AI-generated voices or sounds can be used to create convincing fake recordings for deceptive purposes, such as:  

  • Spreading misinformation, disinformation or malinformation  
  • Impersonating individuals 
  • Generating fake evidence  

The authenticity or provenance of audio matters, so it’s essential to be aware that someone’s voice could be faked. The capacity to replicate someone’s voice also poses security risks if AI-generated audio is convincing enough to fool voice authentication systems. Sophisticated technology could potentially be exploited for unauthorised access or fraudulent activities.

Have you ever heard deepfake audio? Do you sometimes wonder if your voice has been harvested and used without your consent?

Let’s talk about the inconsistent or not-quite-right quality you can come across in generated audio. The quality of generative audio varies enormously. You can find heaps of examples where the output sounds unnatural: sometimes robotic, sometimes incoherent, and sometimes just lacking originality. There’s no standard for how genAI audio will sound. Add to that the multitude of tools available and you can imagine the variability of output. Using generated audio may not get you the results you want.

Hopefully you took a look at the social media account of Anna Indiana (an AI singer-songwriter) flagged earlier. Apparently, Anna Indiana stands for Artificial Neural Networks Accelerate Innovative New Developments Igniting A New Age. Some critics have called Anna Indiana’s work “deeply mediocre”.

Have a listen to her songs and think about whether they present an authentic use of generated audio. How do you feel about generated voices? Do they bother you, or are you just used to them now?

What do bias and fairness mean when you’re talking about generated audio?

Like other genAI tools, if the data used to train genAI audio models is biased, the generated audio may exhibit those biases too. This could reinforce stereotypes, discrimination or other forms of prejudice present in the training data. Think about how everyone you’ve ever met has their own biases in play. Then think about what that means for genAI datasets that have scraped the web or social media. Mind-boggling!

Using genAI in an informed and evaluative way means being conscious of these biases, both in output you generate with a tool and in output others have generated and you come across. When producing or listening to generated audio, be mindful of whether biases are present, such as stereotypical accents, music, phrases or gender roles.

In what ways do you think the use of generated audio could lead to unfair outcomes? What might be a way you could influence the development of generated audio to be more inclusive and fair?

Generative AI audio also raises legal questions regarding ownership, copyright, and intellectual property. If you’re using a genAI tool for audio that involves the input of existing music or sounds, you could be at risk of breaching copyright – especially if it contains substantial parts from another artist.  

Any generated audio that closely resembles existing copyrighted material is going to be problematic.  

The point of reflection here is that you need to be mindful of whether you have consent to use audio files to generate content. Permission to use another person’s voice in genAI must be obtained to avoid possible legal issues.

One of the often overlooked limitations of generated audio is that current tools offer only basic options for remixing or modification. Depending on the genAI tool used, it can be difficult to edit the music or sounds that are generated; for example, you may be unable to specify instruments, alter the melody or add sound effects. This means users may have limited control over the specific characteristics of the generated audio, making it challenging to fine-tune the output to specific requirements or preferences.

The other side of the production coin is that what gets generated needs to be evaluated by the creator. If the person who created it has limited understanding of music, audio genres or sound production then it’s not surprising that mediocre or worse quality is the result. If you jumped onto a genAI tool to create a soundscape or song, do you think you would be able to judge its artistic merit?

Here’s our concluding thought. You’ve now listened to the various concerns of using genAI to create audio output – and all spoken to you with the assistance of a generative audio tool. Hopefully these considerations help you with responsible use, and awareness of what generative audio can do.

 

Caution

Audio generated by AI tools is entirely dependent on the quality and diversity of the training data. When the data is limited or biased, the audio output can be skewed or lack variety. Unauthorised or deliberate misuse of certain audio types in AI tools, as well as the resulting output, can lead to ethical and legal breaches. 

How to minimise the risks

There are different ways to minimise the risks related to genAI and audio. Work through the scenario below to build your awareness of the limitations of generative audio and ways to mitigate them.

Scenario 

Hannah has been using AudioCraft to generate some background music for a video presentation. She is also using it to provide a voiceover for her script. What are the risks with using a genAI tool for audio and what can Hannah do to minimise them?  

In the activity below, match the minimising strategy to the risk. Read through the suggested strategies and place the appropriate response in line with the risk it addresses. Then check your answer. You can rerun the activity as many times as you like. 


Stories

Working through this page you’ve read, watched or listened to different scenarios and stories that unpack the relationship between genAI and audio. Below is a final real-world tale that shares the potential of what generative audio can do and the ethical implications that can arise. It also touches on questions of trust and ownership. 

Listen to the audio story below about how artists like Drake and The Weeknd have had their style and voices mimicked without permission. While the details are real, this story was both written and voiced by genAI.

 

 

Activity overview

This activity plays audio that has been generated by a genAI tool that has converted text to speech.

Audio transcript

This is the story of artistic identity, generated songs and streaming platforms.  

Our title is "The Beat Battle: Universal Music Clashes with AI - A Cautionary Tale" 

What's up, everyone? Today, we're diving into a real-life drama unfolding in the music industry, and trust me, it's something straight out of a tech thriller. We know music is no longer just being created by humans. GenAI is both mimicking the style of real artists and generating new content. So what happened to Drake and The Weeknd?

Picture this: Universal Music Group, yeah, the big guns behind so many of our favorite tracks, are throwing down the gauntlet against the world of AI. 

So, here’s the lowdown: UMG noticed something sketchy going on. AI companies have been secretly tapping into their treasure trove of songs. And I'm not just talking about a few tracks here and there. We're talking about a massive operation, using these songs to "train" their AI systems. Existing or original music compositions were being used to hone the genAI audio technology, ‘violating copyright en masse’.

A musical track, Heart On My Sleeve, used AI-generated vocals that sounded like musicians Drake and The Weeknd. In contrast to the Grimes story we shared earlier, these artists definitely did not give permission for their style and voices to be remixed and used in genAI platforms. 

UMG wasn't having any of it. The record label slammed the song's maker for imitating their talent. They shot off emails to the big players like Spotify and Apple Music – you know, the usual suspects in the streaming game. Their message? "Cut these AI guys off. They're using our stuff without permission." Pretty bold move, right? 

And get this, the plot thickens. These AI platforms have been like sponges, soaking up all sorts of tunes to create new, AI-generated music. It's like they're trying to mimic our favorite artists without even asking. Sounds like something from a sci-fi movie, but it's happening right now. 

But wait, it's not just about the tunes. There's this whole legal drama unfolding too. Visual artists and big names like Getty Images are already taking AI companies to court for using their work without permission. So, UMG is like, "Hold up, we might be next." 

UMG's stance is clear: they're standing up for their artists. They're saying, "We can't let these platforms use our music to beef up their AI without giving credit where credit's due." They want to make sure artists get their fair share. 

Now, here's where it gets tricky. Spotify, Apple Music, and the rest haven't really said much. It's all hush-hush. And with the complicated business ties between these giants, it's like watching a high-stakes chess game. 

What UMG's really doing here is laying down the law. They're pushing these streaming services to tighten up their rules, to make sure no one's stepping on artists' toes. It's all about protecting the music and the people who create it. 

So, what do you think? Is UMG the hero in this story, standing up for artists' rights? Or is this just the beginning of a bigger battle over who controls the future of music in the age of AI? 

Think about what you listen to. How do you feel when listening to music created by AI compared with music created by humans? How do you think artists should be compensated if their style is mimicked?

 


Remember and reflect

Key takeaway

Not everything you hear is real or trustworthy. genAI can generate amazing soundscapes, songs or speeches. However, generated audio can be misused or even deliberately produced for malicious purposes.  

Consider

Here are some key points to keep in mind when producing or engaging with generated audio: