ChatGPT is an incredible tool that has revolutionized the way we interact with technology. With no barrier to entry, remarkable knowledge, and apparent creativity, more and more people are relying on this powerful AI chatbot to answer their most pressing questions.
The latest in AI services, ChatGPT demonstrates a phenomenal leap forward in machine learning technology, but it’s not without its flaws, chief among them the reliability of its answers. To solve this problem, we must turn to the rising second branch of artificial intelligence: semantic reasoning.
If you ask ChatGPT what the earliest feature-length animated film was, it will, quite confidently, answer ‘Snow White and the Seven Dwarfs (1937)’. This sounds about right. We’ve certainly heard it before, and we’re sure many wouldn’t question its accuracy. The reality, however, is that Snow White and the Seven Dwarfs was not the first feature-length animated film. Two decades earlier came a relatively unknown Argentine film called El Apóstol, released in 1917 and now lost to time.
And that’s where the problem lies. ChatGPT ‘knows’ Snow White and the Seven Dwarfs to be the original animated feature because, collectively, we already believed that it was. It’s a common misconception that has been floating around the internet for years and, as such, has been absorbed into the training data that ChatGPT now feeds back to us. ChatGPT even knows about El Apóstol and can provide all the information a human would need to answer the question correctly. The problem, therefore, lies not in the data but in the steps taken to interpret the question and produce the correct answer.
The real issue here, though, is not just that we’re given a false answer; it’s that there’s no way to differentiate it from the truth. As with all machine learning models, ChatGPT can’t explain why it gave the answer that it did. Yes, you can ask for an explanation, but the same problem applies: you can’t be sure of its truth. Suddenly we’re spiralling into an existential crisis over a fictional hypersomniac and her entourage.
This highlights both one of machine learning’s key strengths and its greatest weakness: the answers it gives are based on statistical probability. For some applications, that is the very thing that makes them possible, but when results must be accurate, or must come with context, it becomes clear that we need another approach.
Semantic reasoning, the driving force of rules-based AI, provides a contrast to machine learning: it applies a set of logical rules to infer new information from existing data, rather than applying a learned behaviour to a new scenario. The clear benefit of semantic reasoning is that any conclusion drawn can be traced back, step by step, to exactly where it came from. Explaining a result no longer amounts to peering into a black box; the workings can be shown in precise detail, and you can be 100% confident that the results follow logically from the rules that were set.
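To make that contrast concrete, here is a minimal toy sketch in Python (not RDFox, and much simpler than a real reasoner) of rule-based inference with provenance: a single hand-written rule derives a new fact, and the engine records exactly which rule and which existing facts justify it. All fact and rule names here are invented for illustration.

```python
# Facts are (subject, predicate, object) triples. A rule derives a new fact
# and records its provenance: the rule name plus the facts that supported it.

facts = {
    ("El Apóstol", "type", "AnimatedFeature"),
    ("El Apóstol", "year", 1917),
    ("Snow White and the Seven Dwarfs", "type", "AnimatedFeature"),
    ("Snow White and the Seven Dwarfs", "year", 1937),
}

provenance = {}  # derived fact -> (rule name, list of supporting facts)

def apply_oldest_rule():
    """Rule: the animated feature with the smallest year is tagged oldest."""
    films = [(s, o) for (s, p, o) in facts
             if p == "year" and (s, "type", "AnimatedFeature") in facts]
    film, year = min(films, key=lambda pair: pair[1])
    derived = (film, "tag", "OldestAnimatedFilm")
    provenance[derived] = ("oldest-rule",
                           [(film, "type", "AnimatedFeature"),
                            (film, "year", year)])
    facts.add(derived)
    return derived

fact = apply_oldest_rule()
print(fact)              # ('El Apóstol', 'tag', 'OldestAnimatedFilm')
print(provenance[fact])  # the rule and the facts that justify the answer
```

Unlike a statistical model, asking this system ‘why?’ has a precise answer: the provenance entry lists the exact rule and facts behind the conclusion.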
To understand the importance of certainty, you need only imagine an autonomous vehicle as it makes a statistical-based decision that could mean the difference between avoiding or causing a collision. By instead relying on the logic of semantic reasoning, some automotive companies are already seeing the benefit of rules-based artificial intelligence.
This is where RDFox comes in. RDFox is a knowledge graph and semantic reasoning engine developed at the University of Oxford.
Using RDFox, we can create a rule that simply states: find the oldest feature-length animated film and give it the tag ‘OldestAnimatedFilm’. By loading a comprehensive dataset into RDFox (we used Wikidata), we can then ask the same question as before: ‘which was the first feature-length animated film?’ We ask via a query rather than natural language, but the meaning is the same. RDFox will, of course, give the answer ‘El Apóstol’ and, if asked why, will point to the rule we just created along with the facts that matched its conditions: complete explainability.
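In RDFox, such a rule is written in a Datalog dialect and the question is asked in SPARQL. The sketch below is illustrative only: the property names (:type, :releaseYear) are invented for readability rather than being real Wikidata identifiers, and the exact rule syntax should be checked against the RDFox documentation. The idea is that a film earns the tag when no animated feature with an earlier release year exists:

```
[?film, :tag, :OldestAnimatedFilm] :-
    [?film, :type, :AnimatedFeature],
    [?film, :releaseYear, ?year],
    NOT EXISTS ?other, ?otherYear IN (
        [?other, :type, :AnimatedFeature],
        [?other, :releaseYear, ?otherYear],
        FILTER(?otherYear < ?year)
    ) .
```

The question then becomes a one-line SPARQL query over the derived facts:

```
SELECT ?film WHERE { ?film :tag :OldestAnimatedFilm }
```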
As you can see, the challenge goes beyond the availability of accurate data: good data quality is vital, but so too are the interpretability and correctness of results. In the current context that might not seem as life-and-death as we’re making it out to be, but consider for a moment some of the questions being asked in the real world.
Has this individual committed fraud?
Does this patient require treatment?
Suddenly the need for accurate, auditable results is critical. There is no room for error when the stakes are this high. And even if they weren’t, wouldn’t you rather decisions were made based on answers that could be backed up by logic and reason?
Can rules-based AI like RDFox write you a poem? No. But it will give you correct answers and for some, that’s the difference between revolutionary success and complete catastrophe.