

Reliability of GenAI - limitations and considerations

Generative AI (GenAI) tools are impressive, often producing content that seems near-human. But they have their limits. Their outputs are only as good as the datasets they've been trained on. Ask any GenAI chatbot a question or make a request – what it gives you can be helpful, funny, flawed or just plain made-up.

While GenAI tools are a groundbreaking development in the realm of artificial intelligence, it’s crucial to approach their use with an understanding of their limitations. They are tools best used with a blend of human judgement and technological assistance, ensuring their outputs are leveraged effectively and responsibly.
 

Exploring output fails

Try using Microsoft Bing Chat to generate an infographic about a topic you’re studying. Does it give you accurate and readable information?

Explore our collage of examples below. Click on the plus (+) icons to read the prompt behind the output and the discussion about what wasn’t quite right in what was generated.  

 

Activity overview

This interactive image hotspot provides some examples of AI generated infographics. Clicking on the hotspots reveals information about the outputs. Hotspots are displayed as plus (+) icons.

Hotspot 1: AI in the library

Prompt asked: Create a powerpoint slide about using AI in the library.

Image description: A grey robot holding a stack of books, surrounded by a mindmap of random images inside circles such as cogs, technology and the brain.

What was wrong with the output:

  • The text in the image – at a glance the title and subtitle read well, but closer examination shows obvious spelling and formatting errors.
  • The graphics in the image – stereotyped, one-dimensional representation in the choice of images (the library is depicted as just books and AI as a robot).
  • The output format was also problematic: we asked for a PowerPoint slide and it produced a JPEG image.

What was helpful:

  • It’s a good example of how information can be presented using visual cues in a presentation slide.
  • The short text chunks are a good design benchmark for presenting written information.

Main take-away: It’s important to use the right GenAI tool for the output you need.

 

Hotspot 2: Data visualisation

Prompt asked: Create a data visualisation in a pie chart of all the regions in the world and digital poverty statistics.

Image description: Circular chart that has multicoloured segments on the left and depicts clouds and continents on the right.

What was wrong with the output:

  • No data provided or visualised.
  • Too much content within the image makes it unreadable.
  • The information has no logical structure, so the image conveys no meaning.
  • The prompt asked for a pie chart, but this type of data (demographic and statistical) would be better visualised in other formats. GenAI tools currently don't have evaluative judgement; they action a prompt without suggesting fit-for-purpose options.

What was helpful?

Looking at and evaluating this output made it clear that a pie chart wouldn't work well as a data visualisation. The positive outcome was being able to test visualisation ideas before building them.

Main take-away: GenAI tools won't always recognise their limitations and can provide the wrong format or information as a result.

 

Hotspot 3: Representation in images

Prompt asked: Create an image that represents artists.

Image description: Black and white image of 4 men and 3 women standing/sitting around a table. They are all focused on painting. In the centre of the table are jars of paint brushes.

What was wrong with the output:

This image reflects bias in a number of ways:

  • Artists are all young, with no age diversity.
  • Minimal racial diversity in the group.
  • Artists seem to have similar body types.
  • Assumption that art = painting and sketching; digital art and other forms of art practice are not considered.

When GenAI produces an image that confirms or reinforces existing stereotypes or assumptions, the type of bias is often referred to as "confirmation bias". In an AI or machine learning context it's more accurately "algorithmic bias" or "data bias".

What was helpful in the output:

  • This output steps away from the "male artist as hero" stereotype.
  • The overall image quality would work well in a presentation slide deck. It would also function well as a stimulus image that an artist could respond to or extend.

Main take-away: You need to be specific with your prompt; the more detail the better, as this helps create a quality output and mitigates bias.

 

Hotspot 4: Pros and cons table

Prompt asked: Create a pros and cons table of using Generative AI in study.

Image description: Top-down view of a table. In the centre of the table is a diagram divided into 4 sections, and each section contains lots of boxes with random indistinct imagery. Surrounding the diagram are objects such as pens, pencils and cups of coffee.

What was wrong with the output:

Bing Chat gave us two outputs for this prompt: 1. an image of a table and 2. a text-based table.

The problem with the image output:

  • A text-based response was required, but Bing Chat guessed we also wanted an image because of the chat prompt history.
  • No useful information at all: the words are unreadable and the images have no meaning.
  • It gave us an image of an actual table scattered with irrelevant objects.

The problem with the text-based output:

  • Teacher rather than student perspective in the content.
  • Errors in the text, with random numbers added.
  • American rather than Australian English spelling used.

What was helpful:

The image output was not usable, but the text-based table could be refined or extended.

 

Hotspot 5: Infographic about misinformation

Prompt asked: Create me an infographic about mis, dis and malinformation in connection to generative AI.

Image description: An infographic that is black, blue and orange, which contains illegible words and a bunch of random images such as clouds, computer screens, cogs and more. It is very cluttered.

What was wrong with the output:

  • It looks like an infographic from a distance, but closer examination shows none of the content makes any sense.
  • The data visualisations are generic images, not actual analysis.
  • Infographics balance meaningful text and visualisation; this output has no real text, the data is missing, and the images have no meaning.

What was helpful?

  • The design and layout is a starting point for creating your own infographic.
  • Good balance of text, data and imagery in design.
  • Colour palette and visual elements are well designed.

Main take-away: Pause, reflect and dive deeper into the output. The image is a great reminder to always assess the generated output.


A range of limitations

Our GenAI foundational module highlighted that GenAI tools use machine learning to take in a lot of data and create content based on what you tell them. They're a bit like the word suggestions on your phone when you type a message: they get better as they learn more. For example, GPT-4 has been trained on enough data to pass the U.S. Bar exam and the Legal Ethics exam.
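To make the "word suggestions" analogy concrete, here is a minimal sketch in plain Python. It's a toy illustration only, not how any real GenAI product is implemented: it predicts the next word purely from patterns observed in a tiny, invented training text.

```python
from collections import Counter, defaultdict

# A tiny "training set": the only patterns this toy model will ever know.
corpus = (
    "the library offers quiet study spaces . "
    "the library offers research help . "
    "the library hosts writing workshops ."
).split()

# Count which word follows which word (a simple bigram model).
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def suggest(word: str):
    """Suggest the most frequently observed next word, like a phone keyboard."""
    candidates = following.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

print(suggest("library"))  # 'offers' (seen twice; 'hosts' only once)
print(suggest("the"))      # 'library' (the only word ever seen after 'the')
```

Real GenAI models work at a vastly larger scale, but the principle is the same: the output is a pattern continuation, not an understood fact.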

However, GenAI can give you unreliable or flat-out wrong content if the data has issues (gaps or inaccuracies) or if your instructions are unclear. GenAI tools create content based on patterns they’ve learned and not through actual understanding. That means the content can be inaccurate, nonsensical, biased or superficial. At times GenAI tools can hallucinate responses. That's why it's always a good idea to check their responses carefully, especially when the answers are important.

Read on for a deep dive into how inaccuracies, hallucinations, bias and ineffective prompts all impact the quality of GenAI output.


Inaccuracies

GenAI has reshaped the rules of information creation, communication and consumption. The technology's ability to generate vast amounts of diverse and complex content quickly, drawing from huge and at times unknown sources of data, has increased the likelihood of inaccuracies occurring.

Keep in mind that the quality and breadth of the training data directly influence GenAI outputs.

These systems learn from vast datasets, and their responses reflect the information in those datasets. GenAI works by looking at a lot of examples and trying to copy or remix what it sees. Sometimes it has learned from outdated or incorrect information, which means the answers it gives can also be outdated or incorrect.


Hallucinations

GenAI tools use pre-defined sets of training information, with their predictive technology identifying patterns in this information. The tools then use these predictive patterns when generating content in response to user prompts. 'Hallucinations' are AI-generated responses that are fabricated, inaccurate or incorrect. They occur when the GenAI tool attempts to respond to prompts for which it has insufficient or flawed information. As machines, GenAI tools do not have the capacity to reason about, or reflect on, whether the content they generate makes sense.
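As a loose illustration (a toy sketch reusing the bigram idea from earlier, not how production systems are built), notice what a pure pattern-matcher does when asked about something outside its training data: it still produces a fluent-looking answer, because nothing in the mechanism can flag "I don't know".

```python
import random
from collections import Counter, defaultdict

# Toy training text: the only "knowledge" this model has.
corpus = "the library opens at nine and closes at five".split()

following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def complete(word: str) -> str:
    """Always answer. If the word was never seen in training, fall back to
    a random known word: fluent-looking output with no factual basis."""
    candidates = following.get(word)
    if candidates:
        return candidates.most_common(1)[0][0]
    return random.choice(corpus)  # a fabricated continuation: the 'hallucination'

print(complete("library"))  # 'opens' (grounded in the training text)
print(complete("museum"))   # arbitrary word; the model cannot say "I don't know"
```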

GenAI companies know hallucinations are built into their tools. Did you know that researchers have explored hallucination rates for different GenAI technologies? 

You can’t unquestionably trust GenAI content or take the output at face value.

As the technology continues to evolve, hallucination rates should decrease, but it's still a limitation you need to be aware of.

What causes GenAI hallucinations?

There are several reasons for hallucinations. Click on the flipcards below to learn more. 

 
Tool quality and purpose
GenAI tools are at varying levels of development, which impacts their effectiveness. A tool's purpose must match what users ask it to do.
 
Training data quality
Garbage in = garbage out. The quality of what's put into the tool (training data) impacts output quality. Data limitations (e.g. insufficient or outdated data) mean the tool generates inaccurate responses.
 
Prompt wording
Clear instructions and limited ambiguity are critical. Prompts need to spell out the specifics of what you need the output to be. The quality of the wording impacts the quality of the output.
 
Malicious prompts
Malicious prompts are deliberately designed to confuse the GenAI tool or to corrupt the training data and can lead to hallucinations.

 


Biases

Humans are biased, and we are typically unaware of our own bias. We make assumptions based on our own understandings and experiences; we see and react to things through a very individual lens. This kind of thinking is known as cognitive bias. Confirmation bias is when we look for information to confirm our own biases, which can distort the output of the tool.

Humans are part of the GenAI loop at different points. Human bias therefore impacts how we interact with these tools. 

  • GenAI tools are influenced when designers or moderators, knowingly or unknowingly, introduce their own cognitive biases as they develop and iterate the tool.
  • Users also inadvertently introduce their biases to a GenAI tool through the types of data they enter, including prompts. An example of generating self-confirming output is the prompt "Give me a picture of a parent with a spoon in her hand", which already presumes the parent's gender.
  • Our biases also impact how we interpret GenAI information or output: we favour results that align with our beliefs or expectations and disregard outputs that contradict them.

 

Training Data Bias  

If the data used to train the AI is biased, the generated outputs reflect that bias, perpetuating stereotypes, misinformation and other inaccuracies.
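A hypothetical sketch of how this happens mechanically, using deliberately skewed, invented counts: if nine out of ten training snippets pair "doctor" with "he", a pattern-based model will output "he" by default.

```python
from collections import Counter

# Invented, deliberately skewed training snippets (illustration only).
training_snippets = ["the doctor said he"] * 9 + ["the doctor said she"]

# A frequency model simply reproduces whatever skew it was shown.
pronouns = Counter(snippet.split()[-1] for snippet in training_snippets)

print(pronouns.most_common())         # [('he', 9), ('she', 1)]
print(pronouns.most_common(1)[0][0])  # 'he': the data skew becomes the default
```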

Read through the types of data bias that GenAI can include in the interactive below. Match each type to its definition by drag and drop.


Prompts

Prompts are the instructions you give generative AI tools. They can be questions, files, images or other data that the tool responds to when producing output. They have a huge influence on the quality of the material you generate, so be careful with the prompts you provide.

Tip

You can explore how to prompt more effectively in our Prompt engineering module.
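As an illustration of how much specificity matters, here is a hedged sketch. The `generate` function is a hypothetical placeholder for whichever GenAI tool you use, and both prompts are invented examples.

```python
# Hypothetical stand-in for a real GenAI tool; the point is the difference
# between the two prompts, not the API call itself.
def generate(prompt: str) -> str:
    raise NotImplementedError("swap in a real GenAI tool here")

# Vague: the tool must guess the audience, format, scope and language.
vague_prompt = "Make an infographic about AI."

# Specific: audience, format, content and language are all stated.
specific_prompt = (
    "Create a one-page infographic outline for first-year university students "
    "explaining three limitations of generative AI (inaccuracies, hallucinations "
    "and bias), with one plain-English example per limitation, "
    "written in Australian English."
)
```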


The trouble with nuance

While generative models can handle a wide range of queries, they can sometimes miss subtle nuances, sarcasm, or cultural contexts. They may produce answers that are technically correct but might not fit a specific cultural, regional, or nuanced understanding.

Your essential role in this context is to bridge the gap between the technically correct outputs of GenAI and the nuanced understanding required for specific situations. This involves several key responsibilities that you can explore by moving through the interactive below:

 

 

Activity overview

This interactive image hotspot provides some key responsibilities to explore. Hotspots are displayed as plus (+) icons that can be clicked to reveal the information.

Hotspot 1: Contextual interpretation

People must interpret GenAI outputs within specific cultural and situational contexts to ensure relevance and appropriateness.

Hotspot 2: Critical evaluation

It's essential for people to critically assess whether GenAI's responses accurately reflect intended nuances and sentiments.

Hotspot 3: Prompt refinement

Human input is crucial in refining prompts to guide GenAI towards more contextually accurate and relevant outputs.

Hotspot 4: Ethical oversight

People are responsible for overseeing GenAI's use to prevent cultural insensitivity and ensure ethical standards.

Hotspot 5: Supplementing information

People need to fill gaps in GenAI's capabilities, providing additional information or adjustments where necessary.

 

Alert

Trust plays a critical role when using any information source or digital tool. As a user of GenAI tools, you need to decide how much to trust the output being produced. Taking an active role in generating, interpreting and extending the generated content is a key strategy for effective GenAI use.


Remember and reflect

Key takeaway

Using generative AI can help with many aspects of work and study, saving you a lot of time. Just remember that while convenient, GenAI tools are not perfect, and their outputs have known limitations. Trusting the quality of what's generated relies on your critical evaluation and sense-making. The type of tool you use, how it has been trained, and the data you input all determine the results and whether they are appropriate.

Consider

Be mindful of the various limitations and don't rely on the product or answers being completely accurate, up-to-date or fit for purpose. Here is a suggested checklist of what to consider when using and creating with GenAI tools: