The Olympics: How to Build a Linked Data Application

Combining RDFox and Wallscope’s Platform

Photo from Unsplash - Edited by Felicity Mulford and Angus Addlesee

In 2012, the BBC famously used linked data to support coverage of the London Olympics on its website, app, and interactive video player. They have continued to champion the benefits of semantic technologies to this day.

Coincidentally, as I started to write this, the Tokyo 2020 Olympics were planned to be in full swing. Only one topic could therefore be chosen for this article: the Olympics.

I usually write linked data tutorials, but in this article I walk through the full development of a small project, from a few disparate data sources to a complete dynamic dashboard. To do this smoothly, I have selected RDFox by Oxford Semantic Technologies as my triplestore and reasoning engine. Wallscope’s platform is of course triplestore agnostic, so any triplestore can be used, but the reasons for this choice are made clear throughout the article.

Finally, before I begin, I am currently working full-time on my PhD (combining linked data, speech processing, and computer vision to make more natural voice assistants) and could not have done this alone! All of these fantastic people contributed a huge amount to this project - please do check out their work:
- Antero Duarte, Lead Developer at Wallscope
- Johnny Strachan, Full Stack Developer at Wallscope
- Dorota Burdach, UI/UX Designer & Front-End Developer at Wallscope
- Emma Findlow, Communications Manager at Wallscope
- Valerio Cocchi, Knowledge Engineer at Oxford Semantic Technologies
- Felicity Mulford, Marketing Analyst at Oxford Semantic Technologies

The Problem

Data management and integration have always been a huge problem, with floods of new documents (health records, contracts, spreadsheets with columns titled ‘CY3’ and a Word document naturally stored somewhere else telling you what ‘CY3’ means), financial transactions, social media mentions, staff payroll updates, website traffic tracking data, and the list goes on…

Handling all of this data as it comes in live and assimilating it with existing information is a challenge for businesses of every size. This challenge cannot be ignored either - especially in the current climate. Efficiently utilising this unified data in real time could help the public sector control budget cuts, large businesses retain their staff, and small businesses survive.

I really hope I have conveyed how important it is to tackle this challenge but with the ‘sales pitch’ over - let’s move on and show you what we can do.

We stick to the Olympics theme as an example but of course, the same methodologies can be applied to a huge variety of use-cases.

The Planned Output

To add a little context to the following sections, I thought I’d share my initial designs of this project’s final output. The plan was to create an interface that lets users seamlessly compare Olympic athletes (spoiler alert: we succeeded). We run through the data sources, reasoning, queries, and final result below but these drawings should clarify why we make certain decisions.

First we want an “Athlete View”, to allow the user to single out an individual athlete. In this view we have an infobox, news column, and dynamic charts to compare the selected athlete with their competitors.

Athlete View

Next we aggregate up to a “Sport View”. We again have an infobox and news column but our charts are centred around the selected sport. For example, we could investigate whether skiers have become lighter or heavier over the years.

Sport View

Finally, we have the “Continent View” and there are once again some dynamic charts, an infobox, and a news column containing unstructured text that mentions that continent. You could check whether there is a recent rise in African footballers for example.

Continent View

With the three planned views in mind, we need the data to populate our dashboard.

The Data

To imitate the disparity in data sources within organisations, we have brought together a few different data sources. These are:

Knowledge Graph

The initial knowledge graph we are using was originally created by me for another tutorial. Running queries over an athlete’s age was not a concern at the time, for example, as I was explaining how to create a knowledge graph from a tabular dataset using OpenRefine. I downloaded the “120 Years of Olympic History” CSV file from Kaggle and made the RDF version available on GitHub.

This knowledge graph contains athletes with their attributes (height, weight, age, sex, team) and medals they won (including the sport, games and year they won them). Each team was also attached to their National Olympic Committee (NOC) code. For example, let’s take a quick look at the gold medal Jessica Ennis-Hill won at London 2012:

Colours are for display purposes only. The prefix “walls” represents <http://wallscope.co.uk/ontology/olympics/> in this case. All other prefixes can be found on prefix.cc
Prior to RDFox V4 I had to tediously draw the above graph. However, you can now visualise graphs within RDFox (read more about what’s new in RDFox V4 here):
RDFox Graph Visualisation - coming with v4

Small Tabular Dataset

Our small tabular dataset just contained a list of NOCs next to their relevant continents. For transcontinental countries (those with land in more than one continent), the continent containing the majority of their land was chosen. This was transformed into triples using a small script. To give an example, here is Portugal (NOC is “POR”) in Turtle format:

@prefix noc: <http://wallscope.co.uk/resource/olympics/NOC/> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix dbr: <http://dbpedia.org/resource/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <https://schema.org/> .

noc:POR dbo:continent dbr:Europe .
dbr:Europe a schema:Continent ;
 rdfs:label "Europe"@en .
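
A minimal sketch of what such a script could look like, in Python. This is an illustration rather than the script used in the project: the input rows, the `rows_to_turtle` name, and the underscore convention for multi-word continents are all assumptions.

```python
# Hypothetical input rows of (NOC code, continent). Only the POR row appears
# in the article; the others are illustrative.
ROWS = [("POR", "Europe"), ("KEN", "Africa"), ("JPN", "Asia")]

PREFIXES = """\
@prefix noc: <http://wallscope.co.uk/resource/olympics/NOC/> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix dbr: <http://dbpedia.org/resource/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <https://schema.org/> .
"""


def rows_to_turtle(rows):
    """Return a Turtle document linking each NOC to its continent."""
    lines = [PREFIXES]
    described = set()  # continents whose type and label were already written
    for noc, continent in rows:
        local = continent.replace(" ", "_")  # assumed naming for multi-word continents
        lines.append(f"noc:{noc} dbo:continent dbr:{local} .")
        if local not in described:
            described.add(local)
            lines.append(f"dbr:{local} a schema:Continent ;")
            lines.append(f' rdfs:label "{continent}"@en .')
    return "\n".join(lines)


turtle = rows_to_turtle(ROWS)
print(turtle)
```

Each continent is typed and labelled only once, however many NOCs point at it.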

If you are unfamiliar with RDF formats, here is a quick guide.

You may notice that I used dbo:continent to link each NOC to its appropriate continent entity. I then used schema:Continent as each continent entity’s type, indicating that the entity represents a continent. To find these, I used Wallscope’s Pronto tool which is free and open-source (developed by Francesco Belvedere). I will explain how to do this in just two screenshots:

To link two entities (NOC and continent) I need a predicate. I typed “continent” in the predicate search and used the top result.

To find an appropriate entity type for our continent entities, I typed “continent” into the type search. As this is an article read by humans, I figured the dbo result might cause confusion as it looks similar to the predicate (difference is the capital C). I therefore chose the schema result.

Reddit

In order to represent a major challenge that businesses face, we needed to include some unstructured text. To do this, I downloaded dumps of Reddit during the last few Olympic games (namely: London 2012, Sochi 2014, Rio 2016, and PyeongChang 2018). I then filtered this data to the 30k submissions in r/olympics for relevance.

This was then processed using Wallscope’s Data Foundry to extract relevant entities and map them to their counterparts in the initial knowledge graph (described above). Essentially, Data Foundry reads through each Reddit submission and extracts any information that it deems relevant. This information is then transformed into a knowledge graph which, as mentioned, is then linked to entities within existing knowledge graphs.

Data Foundry is getting a new face! Here is a sneak preview of the prototype - designed by Dorota Burdach

This enhanced graph can then be queried for Reddit submissions relevant to a specific athlete, sport, country, continent, etc…

At this point, our various data sources have been linked together for analysis but we want to do this quickly as new data comes in. This is where RDFox’s incremental reasoning comes in.

The Reasoning

Semantic Reasoning is the ability to make logical deductions from the information that is explicitly available. In RDFox, this is done using rules written in Datalog - a rule language for knowledge representation.

The knowledge graphs that Wallscope create and utilise are stored in RDF triplestores (a database type) and RDFox is the one we are using in this project. One reason for this is its fantastic reasoning engine, which runs incrementally as new data comes in; we have therefore used it here to improve the performance of our final demo. There are other pros to using RDFox of course, such as its impressive speed.

When working with a client, Wallscope’s team work closely with you to design an easy-to-use and intuitive interface to explore and present your data in the most suitable manner for your use case. We therefore know what calculations and aggregations will likely be requested through this interface and can optimise for this.

There are countless examples of heavy queries over large graphs. For example, if a business had years of financial transactions of varying types (materials, payroll, insurance, etc…) to process and the staff often require a condensed overview of this information - they may request a summary of all material purchases.

Example summary of all material purchases. Source

This query has to run through all material transactions and run several aggregations to return a full and accurate report, taking a significant amount of time. Instead (since we know this is a critical summary), we could run these calculations as material transactions take place and integrate this new information into the knowledge graph. As a result, the team can output this report in an instant, make decisions faster, and continue with their day.
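
To make the idea concrete, here is a small Python analogy (not RDFox code). The `MaterialSummary` class and the purchase figures are hypothetical; the point is only that the aggregates are updated when each transaction arrives, so producing the report never requires a scan over the history.

```python
from collections import defaultdict


class MaterialSummary:
    """Running totals per material, updated as each purchase arrives."""

    def __init__(self):
        self.total = defaultdict(int)
        self.purchases = defaultdict(int)

    def record(self, material, amount):
        # Constant work per transaction: no scan over past purchases.
        self.total[material] += amount
        self.purchases[material] += 1

    def report(self):
        # The report only reads the pre-computed values.
        return {
            material: {
                "total": self.total[material],
                "purchases": self.purchases[material],
                "average": self.total[material] / self.purchases[material],
            }
            for material in self.total
        }


summary = MaterialSummary()
for material, amount in [("steel", 1200), ("timber", 300), ("steel", 800)]:
    summary.record(material, amount)

report = summary.report()
print(report["steel"])
```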

An example in a health-related context could be live reporting of how many patients are in the cardiovascular ward. What capacity remains, and how does that compare across the rest of the country?

To illustrate this in practice, we will apply some reasoning to our Olympics data using RDFox rules.

Rules in Practice

Full documentation for reasoning in RDFox can be found here.

Starting off very simply, we can restructure an existing graph if needed. For example, we can link an athlete to the games they participated in with the following rule:

[?athlete, wso:athleteInGames, ?games]
:-
[?instance, wso:athlete, ?athlete],
[?instance, wso:games, ?games].

wso represents <http://wallscope.co.uk/ontology/olympics/>.

The head of the rule (before “:-”) is created and stored in RDFox if the conditions in the body of the rule (after “:-”) hold. In this example, when an instance links to both an athlete and an Olympic games (in the original graph), we can deduce that the athlete in question took part in those games.
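
If Datalog is new to you, the following Python sketch mimics what this one rule does over a toy set of triples. It is only an analogy under simplifying assumptions: the identifiers are hypothetical stand-ins for full IRIs, and the `derive_athlete_in_games` function is made up for this illustration.

```python
from collections import defaultdict

# Toy triples. The identifiers are hypothetical stand-ins for full IRIs.
triples = {
    ("instance_1", "wso:athlete", "athlete_a"),
    ("instance_1", "wso:games", "games_2012"),
    ("instance_2", "wso:athlete", "athlete_b"),
    ("instance_2", "wso:games", "games_2016"),
    ("instance_3", "wso:athlete", "athlete_b"),  # no games: body not satisfied
}


def derive_athlete_in_games(triples):
    """Emit the rule head for every instance that satisfies both body atoms."""
    athletes = defaultdict(set)
    games = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "wso:athlete":
            athletes[subject].add(obj)
        elif predicate == "wso:games":
            games[subject].add(obj)

    derived = set()
    for instance, instance_athletes in athletes.items():
        for athlete in instance_athletes:
            for g in games.get(instance, ()):
                derived.add((athlete, "wso:athleteInGames", g))
    return derived


derived = derive_athlete_in_games(triples)
print(sorted(derived))
```

Note that `instance_3` produces nothing, because only one of the two conditions in the body holds for it.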

As our original Olympic knowledge graph was made for a short tutorial, we really need to refactor the athlete ages. Essentially, each athlete is linked to the age at which they won a medal - resulting in athletes with multiple medals also having multiple ages (whoops, my fault). With access to RDFox’s rules this is not a problem, however, as we can grab the year that an athlete won their first medal, grab the youngest age linked to that athlete, and calculate their birth year. Problem solved!

To begin refactoring, let’s attach each athlete to the youngest age at which they won an Olympic medal:

[?ath, wso:minAge, ?min]
:-
AGGREGATE(
 [?ath, foaf:age, ?age]
 ON ?ath BIND MIN(?age) AS ?min ) .

In this example, all of an athlete’s ages are grabbed and the MIN found. Then, this minimum age is linked to the athlete with wso:minAge.

Similarly, we need the earliest year that an athlete won a medal:

[?ath, wso:earliestYear, ?min]
:-
AGGREGATE(
 [?ath, wso:athleteInGames, ?g],
 [?g, dbp:year, ?y]
 ON ?ath BIND MIN(?y) AS ?min ) .

Notably here, we are using wso:athleteInGames which we created two rules above - starting to create a small hierarchy of rules.

Finally, using the earliest year that an athlete won an Olympic medal and their age at the time, we can calculate each athlete’s birth year:

[?ath, wso:birthYear, ?by]
:-
[?ath, wso:earliestYear, ?ey],
[?ath, wso:minAge, ?age],
BIND(?ey - ?age AS ?by) .
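
The arithmetic behind these three rules can be checked with a short Python sketch. The athletes and numbers are hypothetical, and `minimum_by_key` is a helper invented for this example; it plays the role of the two MIN aggregates.

```python
# Hypothetical (athlete, age) and (athlete, games year) pairs.
ages = [("athlete_a", 23), ("athlete_a", 27), ("athlete_b", 30)]
years = [("athlete_a", 2008), ("athlete_a", 2012), ("athlete_b", 2016)]


def minimum_by_key(pairs):
    """Group the pairs by their first element and keep the smallest value."""
    result = {}
    for key, value in pairs:
        result[key] = min(value, result.get(key, value))
    return result


min_age = minimum_by_key(ages)             # like wso:minAge
earliest_year = minimum_by_key(years)      # like wso:earliestYear
birth_year = {                             # like wso:birthYear
    athlete: earliest_year[athlete] - min_age[athlete]
    for athlete in min_age
    if athlete in earliest_year
}
print(birth_year)
```

As in the rule, an athlete only gets a birth year when both the minimum age and the earliest year are known.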

Hopefully these short examples are relatively clear, so let’s move on to designing whole new entities as the foundation for later rules:

[?part, a, wso:Participation],
[?part, wso:hasAthlete, ?ath],
[?part, wso:hasGames, ?g],
[?part, wso:hasYear, ?y],
[?part, wso:hasAthleteAge, ?age],
[?part, wso:hasCountry, ?ctry]
:-
[?ath, wso:athleteInGames, ?g],
[?ath, wso:birthYear, ?by],
[?ath, wso:hasCountry, ?ctry],
[?ath, foaf:age, ?age],
[?g, dbp:year, ?y],
FILTER( ?age + ?by = ?y),
BIND(IRI( CONCAT(STR(wsr:), "participation/",
 REPLACE(STR(?ath), STR(wsr:),""), "_",
 REPLACE(STR(?g), STR(wsr:),""))) AS ?part ) .

wsr represents <http://wallscope.co.uk/resource/olympics/>.
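
The two less obvious parts of this rule are the FILTER, which picks the one age that belongs to these particular games, and the BIND, which mints a new IRI. Here is a rough Python equivalent of both. The games IRI is a hypothetical placeholder (I am not reproducing the real naming scheme), and Python’s `str.replace` is a literal replacement whereas REPLACE works on patterns.

```python
WSR = "http://wallscope.co.uk/resource/olympics/"


def participation_iri(athlete_iri, games_iri):
    """Mirror the BIND: strip the wsr: namespace and join the local names."""
    athlete_local = athlete_iri.replace(WSR, "")
    games_local = games_iri.replace(WSR, "")
    return f"{WSR}participation/{athlete_local}_{games_local}"


def age_at_games(ages, birth_year, games_year):
    """Mirror FILTER(?age + ?by = ?y): keep the age that matches these games."""
    return [age for age in ages if age + birth_year == games_year]


athlete = WSR + "MichaelFredPhelpsII"
games = WSR + "Rio2016"  # hypothetical games IRI, for illustration only

print(participation_iri(athlete, games))
print(age_at_games([19, 23, 27, 31], 1985, 2016))
```

Without the filter, an athlete with four recorded ages would be given four ages for every games they attended.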

Essentially, this rule creates Participation (?part) entities which are similar to instances in the original graph but less convoluted for further extensions and rule building. To illustrate an extension to this participation entity, we can link the number of medals an athlete wins to their participation at an Olympic games:

[?part, wso:medalsAtGames, ?ct]
:-
AGGREGATE(
 [?part, wso:hasInstance, ?inst],
 [?inst, wso:medal, ?med]
 ON ?part
 BIND COUNT(?med) AS ?ct ) .

To illustrate the development of further rules using participation entities, we can create a rule to link an athlete to the total number of medals they have ever won at the Olympics:

[?ath, wso:totalMedalCount, ?mc]
:-
AGGREGATE(
 [?part, wso:hasAthlete, ?ath],
 [?part, wso:medalsAtGames, ?meds]
 ON ?ath
 BIND SUM(?meds) AS ?mc
) .

[?ath, wso:totalMedalCount, 0]
:-
[?ath, a, foaf:Person],
NOT EXISTS ?meds, ?part IN (
 [?part, wso:hasAthlete, ?ath],
 [?part, wso:medalsAtGames, ?meds] ) .

As you can see here, athletes are either connected to their wso:totalMedalCount (if they have won a medal), or to a wso:totalMedalCount of zero (if they have not ever won a medal).

Developing this even further, we can link the birth years we calculated earlier to the average wso:totalMedalCount of all athletes born in that year:

[?year, wso:yearHasAverageMedals, ?avg]
:-
AGGREGATE(
 [?ath, wso:birthYear, ?year],
 [?ath, wso:totalMedalCount, ?tot]
 ON ?year BIND AVG(?tot) AS ?avg
) .

You will see how the output of this rule is used in the first query in the queries section below.
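
To see why the zero default in the previous pair of rules matters for this average, here is a small Python sketch with hypothetical athletes. The function names are invented for this example and the data is made up.

```python
from collections import defaultdict

# Hypothetical data: medals per (athlete, games) participation, and birth years.
medals_at_games = {("a", "g1"): 3, ("a", "g2"): 2, ("b", "g1"): 1}
birth_year = {"a": 1985, "b": 1985, "c": 1985, "d": 1990}


def total_medal_count(athletes, medals_at_games):
    """Sum medals per athlete, defaulting to 0 for athletes with none."""
    totals = {athlete: 0 for athlete in athletes}
    for (athlete, _games), medals in medals_at_games.items():
        totals[athlete] += medals
    return totals


def average_medals_by_birth_year(birth_year, totals):
    """Average the per-athlete totals within each birth year."""
    grouped = defaultdict(list)
    for athlete, year in birth_year.items():
        grouped[year].append(totals[athlete])
    return {year: sum(values) / len(values) for year, values in grouped.items()}


totals = total_medal_count(birth_year, medals_at_games)
averages = average_medals_by_birth_year(birth_year, totals)
print(totals)
print(averages)
```

If athletes without medals had no `totalMedalCount` at all, they would silently drop out of the average and every year would look more successful than it was.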

There are many other rules used in this project but I really wanted to highlight this hierarchy of rules dependent on supporting rules (which in turn depend on supporting rules, etc…). This may sound minor but is very valuable in practice.

Returning to our healthcare example: How many patients are in the cardiovascular ward and what capacity remains? Hospital management may want to know this for their specific hospital, regional directors may want to know this on a healthboard level, and an MP may want to know this at a national level. In addition to the levels of aggregation, they all want to know this information is accurate, updating live as patients are admitted and discharged.

Depiction of a hierarchy of rules (each node represents a rule). If data enters that matches the bottom right rule’s conditions, all the marked rules update the graph as required.

With our hierarchy of rules, information is efficiently updated at all relevant levels of aggregation as data is received.
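
The healthcare example can be sketched in a few lines of Python. This is an analogy for the behaviour described above, not a description of how RDFox implements incremental reasoning; the ward, hospital, and board names are hypothetical.

```python
from collections import defaultdict

# Hypothetical hierarchy: ward -> hospital -> health board -> nation.
PARENT = {
    "ward:cardio_1": "hospital:A",
    "ward:cardio_2": "hospital:B",
    "hospital:A": "board:North",
    "hospital:B": "board:North",
    "board:North": "nation",
}

occupancy = defaultdict(int)


def apply_event(ward, delta):
    """Apply +1 (admission) or -1 (discharge) to the ward and every ancestor."""
    touched = []
    node = ward
    while node is not None:
        occupancy[node] += delta
        touched.append(node)
        node = PARENT.get(node)
    return touched


apply_event("ward:cardio_1", +1)
apply_event("ward:cardio_1", +1)
apply_event("ward:cardio_2", +1)
touched = apply_event("ward:cardio_1", -1)

print(touched)
print(dict(occupancy))
```

Each event only touches the nodes on the path from the ward to the top of the hierarchy, so every level stays correct without recomputing anything from scratch.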

The Queries

So far we have designed an interface, enhanced a variety of data sources, and developed some incremental reasoning to support our dashboard’s live performance. To finally populate our charts, we just need to write a few SPARQL queries to return the appropriate results.

If you are new to SPARQL queries and would like to learn, I have previously written two tutorials: one on basic SPARQL queries and the other on more advanced SPARQL queries. In addition, Felicity Mulford has written an article on SPARQL basics and RDFox.

I am going to assume that you are comfortable writing queries to populate the infoboxes (see the note above if not), so I will run through some of the queries that populate the charts and the news columns.

Chart Queries

Starting with the “athlete view”, we have a histogram that displays the average number of medals athletes have won, bucketed by athlete age. In the last rule we described earlier, we combined each athlete’s birthYear and totalMedalCount to link each year to exactly this metric. Therefore, our query is much simpler and can be written like so:

PREFIX wso: <http://wallscope.co.uk/ontology/olympics/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT (YEAR(?date) - ?birthYear AS ?age) ?avgMedalCount

WHERE {
 BIND(xsd:dateTime(NOW()) AS ?date)
 ?birthYear wso:yearHasAverageMedals ?avgMedalCount .
}
ORDER BY ?age

Note the use of the link wso:yearHasAverageMedals - which is the link we created in the last rule described.

RDFox Query Console
Without the rules, we would have to run a hugely complex query spanning the entire graph. In an industrial use case, this could save a significant amount of time!

For the parallel coordinates plot we need a slightly larger query, using some of the rules that I didn’t detail above (they can all be found on GitHub). Essentially, we have another hierarchy of rules to aggregate athlete stats by sex, sport, continent, and year. This design allows us to populate our more dynamic charts very quickly as the dropdown options are used.

RDFox Faceted Search - Finding the average African female swimmer’s height in 2016.

Using this extra information we deduced through our reasoning, we can send the following query:

PREFIX wso: <http://wallscope.co.uk/ontology/olympics/>
PREFIX wSport: <http://wallscope.co.uk/resource/olympics/sport/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT
 (((AVG(?mWeight) + AVG(?fWeight))/2) AS ?avgWeight)
 (((AVG(?mHeight) + AVG(?fHeight))/2) AS ?avgHeight)
 (((AVG(?mAge) + AVG(?fAge))/2) AS ?avgAge)

WHERE {
 ?cis wso:continentInSportAverageMaleWeight ?mWeight ;
      wso:continentInSportAverageMaleHeight ?mHeight ;
      wso:continentInSportAverageMaleAge ?mAge ;
      wso:continentInSportAverageFemaleWeight ?fWeight ;
      wso:continentInSportAverageFemaleHeight ?fHeight ;
      wso:continentInSportAverageFemaleAge ?fAge ;
      wso:hasContinent ?continent ;
      wso:hasSport ?sport .

 # When user selects "Africa", ?continent is set to dbr:Africa.
 # When user selects "Swimming", ?sport is set to wSport:Swimming.
}

This calculates and returns the global average weight, height, and age of all Olympic athletes ever. By switching the ?continent or ?sport variables to a fixed entity (see the comments in the query), we can return more specific aggregates to the user. We do not aggregate across the sexes in our rules, as this can be done very easily within the query. To clarify, we could theoretically have one billion athletes in this graph but the query would still only be finding the mean of two returned values.
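
The shape of this computation can be sketched in Python as follows. The rows and heights are hypothetical, and `average_height` is a name made up for this example. Like the query, it takes an unweighted mean of the pre-computed male and female averages.

```python
# One row per continent-in-sport node. All values are hypothetical.
ROWS = [
    {"continent": "Africa", "sport": "Swimming", "m_height": 184.0, "f_height": 172.0},
    {"continent": "Europe", "sport": "Swimming", "m_height": 188.0, "f_height": 176.0},
]


def average_height(rows, continent=None, sport=None):
    """Average the pre-computed male and female averages, optionally filtered."""
    selected = [
        row for row in rows
        if continent in (None, row["continent"]) and sport in (None, row["sport"])
    ]
    if not selected:
        return None  # nothing matches the chosen filters
    male = sum(row["m_height"] for row in selected) / len(selected)
    female = sum(row["f_height"] for row in selected) / len(selected)
    return (male + female) / 2


print(average_height(ROWS))
print(average_height(ROWS, continent="Africa", sport="Swimming"))
```

The work done here depends on the number of continent and sport combinations, not on the number of athletes behind them.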

Sensitive information is often littered throughout a company’s files and databases. This includes payroll, medical records, staff personal contact details, customer information, private financial transactions, and the list goes on…

To allow public data access without compromising security, Wallscope’s Platform provides a data access management layer (called HiCCUP) that allows fine-grained control over the data that can be accessed within the knowledge graph, and makes it available as an API that can be consumed from different applications.

A HiCCUP “recipe” for the query above. The chart can populate live but through a controlled endpoint.

In the “sport” view, we want to report the top athletes, ordered by total medal count. Luckily, we made a rule above to attach athletes directly to the number of Olympic medals they have won in their career. Let’s find the top five male swimmers:

PREFIX wso: <http://wallscope.co.uk/ontology/olympics/>
PREFIX wSex: <http://wallscope.co.uk/resource/olympics/gender/>
PREFIX wSport: <http://wallscope.co.uk/resource/olympics/sport/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT DISTINCT ?name ?mc
WHERE {
 ?instance wso:event ?event ;
           wso:athlete ?athlete .

 ?event rdfs:subClassOf wSport:Swimming . # Swimming for example.

 ?athlete foaf:gender wSex:M ; # Switch "M" to "F" for female.
          wso:totalMedalCount ?mc ;
          rdfs:label ?name .
}
ORDER BY DESC(?mc)
LIMIT 5

Again, this query would be significantly more complicated if we had not used wso:totalMedalCount. As a preview, here are the results of this specific example:

Query results in RDFox’s query console.

With these examples, I hope you can see how our ‘on the fly’ reasoning and aggregation removes a lot of the pressure from the queries themselves. This results in a more responsive application without compromising accuracy.

News Queries

To populate the ‘News’ section of the interface we do something a little different. As mentioned above, we used Wallscope’s platform to process the 30k Reddit texts that we downloaded. This process outputs a knowledge graph that represents the submissions and their content. Finally, we then map this graph to our core entities for retrieval of related “news”.

Usually we do this disambiguation and filtering within Wallscope’s platform, while indexing. RDFox did not have a full-text search functionality at the time so we wanted to show off how fast it is at doing this - even at runtime. In V4 of RDFox however, Apache Solr has been integrated.
Essentially, when you open a page in the demo, a series of queries is run.

The first query returns all of the platform’s output entities (Reddit submissions) that match a given text. For this example, we are looking for Michael Phelps.

PREFIX dct: <http://purl.org/dc/terms/>
PREFIX schema: <http://schema.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX walls: <http://wallscope.co.uk/ontology/>
CONSTRUCT{
 ?file a walls:File ;
       dct:subject ?subs ;
       schema:text ?text ;
       schema:url ?url .
 ?subs rdfs:label ?match .
}
WHERE {
{SELECT ?file ?subs ?match ?text ?url
WHERE {
 BIND(CONCAT(".*",CONCAT(replace("Michael Phelps"," ",".*"),".*")) as ?candidate)
 ?file dct:subject ?subs ; schema:text ?text ; schema:url ?url .
 ?subs rdfs:label ?match .
 FILTER regex(lcase(str(?match)), lcase(str(?candidate)))
 BIND(SHA512(CONCAT(str(?file), str(RAND()))) as ?random)
}
ORDER BY ?random
LIMIT 10
}
}

This query outputs up to ten randomly selected Reddit submissions that mention Michael Phelps and the output should look like this:

<file://DF/reddit/results-sm/olympic-rs-2016-08-4420.txt> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://wallscope.co.uk/ontology/File> .
<file://DF/reddit/results-sm/olympic-rs-2016-08-4420.txt> <http://purl.org/dc/terms/subject> <http://wallscope.co.uk/resource/cc00aafe-a54c-41ae-8ea0-a10570e493c9> .
<file://DF/reddit/results-sm/olympic-rs-2016-08-4420.txt> <http://schema.org/text> "Michael Phelps Wins Gold in Men's Swimming 200M Butterfly | Olympics 201...\n" .
<file://DF/reddit/results-sm/olympic-rs-2016-08-4420.txt> <http://schema.org/url> <https://www.reddit.com/r/olympics/comments/4x4hwi/michael_phelps_wins_gold_in_mens_swimming_200m/> .
<http://wallscope.co.uk/resource/cc00aafe-a54c-41ae-8ea0-a10570e493c9> <http://www.w3.org/2000/01/rdf-schema#label> "Michael Phelps Wins Gold" .
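
The matching logic in the first query above is worth spelling out, so here is a rough Python equivalent. The `labels` list is hypothetical apart from the first entry, and the function names are invented for this sketch. As in the query, the name is not escaped, so any regex special characters in it would be treated as part of the pattern.

```python
import random
import re


def candidate_pattern(name):
    """Build the same pattern as the query: spaces become '.*'."""
    return ".*" + name.replace(" ", ".*") + ".*"


def matching_labels(name, labels, limit=10, rng=None):
    """Return up to `limit` labels matching the name, in random order."""
    pattern = candidate_pattern(name).lower()
    hits = [label for label in labels if re.search(pattern, label.lower())]
    (rng or random.Random()).shuffle(hits)
    return hits[:limit]


labels = ["Michael Phelps Wins Gold", "Swimming", "Phelps and Michael"]
print(candidate_pattern("Michael Phelps"))
print(matching_labels("Michael Phelps", labels))
```

The pattern is order-sensitive: “Phelps and Michael” is not a match because “Michael” must appear before “Phelps”.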

With these matching submissions, we now want to retrieve all the other entities that are linked to the same submissions. This query will return entities related to Michael Phelps and provide us with navigation hooks.

Wallscope’s platform indexes temporal entities (such as dates and times) by default, but we have decided not to explore them in this project. We have therefore used this query to also filter for entities of specific types.

PREFIX dct: <http://purl.org/dc/terms/>
CONSTRUCT { <file://DF/reddit/results-sm/olympic-rs-2016-08-4420.txt> dct:subject ?subs }
WHERE {
 VALUES (?type) {
   (<http://wallscope.co.uk/ontology/nlp/PERSON>)
   (<http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing>)
   (<http://schema.org/Organization>)
   (<http://www.w3.org/2004/02/skos/core#Concept>)
 }
 <file://DF/reddit/results-sm/olympic-rs-2016-08-4420.txt> dct:subject ?subs .
 ?subs a ?type .
}

We now have submissions linked to Michael Phelps and other related entities. Again, here is the example output:

<file://DF/reddit/results-sm/olympic-rs-2016-08-4420.txt> <http://purl.org/dc/terms/subject> <http://wallscope.co.uk/resource/cc00aafe-a54c-41ae-8ea0-a10570e493c9> .
<file://DF/reddit/results-sm/olympic-rs-2016-08-4420.txt> <http://purl.org/dc/terms/subject> <http://wallscope.co.uk/resource/baede3ee-9471-494d-8d11-3544b53e1067> .

In this case, “cc00aafe-a54c-41ae-8ea0-a10570e493c9” represents “Michael Phelps” and “baede3ee-9471-494d-8d11-3544b53e1067” represents “Swimming”.

Finally, we need to map the related entities to our core knowledge graph. Like the previous query, we filter results for entities that have types which interest us. In this case we want to find athletes, sports, and continents for navigation hooks.

PREFIX dct: <http://purl.org/dc/terms/>
PREFIX schema: <http://schema.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

CONSTRUCT {
 ?mention dct:relation ?s .
 ?s rdfs:label ?name .
}
WHERE {
 BIND(<http://wallscope.co.uk/resource/baede3ee-9471-494d-8d11-3544b53e1067> as ?mention)

 VALUES (?types) {
   (<http://xmlns.com/foaf/0.1/Person>)
   (<https://schema.org/Continent>)
   (<http://dbpedia.org/ontology/Sport>)
 }
 ?s a ?types ; rdfs:label ?name .
 ?mention rdfs:label ?mentionLabel .
 BIND(replace(str(?mentionLabel)," ",".*") as ?candidate)
 FILTER regex(?name, ?candidate)
}
LIMIT 1

This final query in the chain links Michael Phelps to swimming through a Reddit submission that mentions them both. Once again, here is the example output:

<http://wallscope.co.uk/resource/baede3ee-9471-494d-8d11-3544b53e1067> <http://purl.org/dc/terms/relation> <http://wallscope.co.uk/resource/olympics/sport/Swimming> .

In a usual project, we wouldn’t store any of these intermediary entities - this is for demonstration purposes only. We would output the following:

<file://DF/reddit/results-sm/olympic-rs-2016-08-4420.txt> <http://purl.org/dc/terms/subject> <http://wallscope.co.uk/resource/olympics/sport/Swimming> .
<file://DF/reddit/results-sm/olympic-rs-2016-08-4420.txt> <http://purl.org/dc/terms/subject> <http://wallscope.co.uk/resource/olympics/MichaelFredPhelpsII> .

This allows us to open the page for Michael Phelps and quickly display the submission relating him to swimming in the interface.
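
Collapsing the intermediary entities into those direct links is a simple join, sketched below in Python using the IRIs from the example outputs above. The `direct_links` function and the dictionary layout are assumptions made for this illustration, not the platform’s implementation.

```python
FILE = "file://DF/reddit/results-sm/olympic-rs-2016-08-4420.txt"
MENTION_PHELPS = "http://wallscope.co.uk/resource/cc00aafe-a54c-41ae-8ea0-a10570e493c9"
MENTION_SWIMMING = "http://wallscope.co.uk/resource/baede3ee-9471-494d-8d11-3544b53e1067"
DCT_SUBJECT = "http://purl.org/dc/terms/subject"

# Submission -> intermediary mention entities (second query in the chain).
file_subjects = {FILE: [MENTION_PHELPS, MENTION_SWIMMING]}

# Intermediary mention -> core knowledge graph entity (third query in the chain).
mention_to_core = {
    MENTION_PHELPS: "http://wallscope.co.uk/resource/olympics/MichaelFredPhelpsII",
    MENTION_SWIMMING: "http://wallscope.co.uk/resource/olympics/sport/Swimming",
}


def direct_links(file_subjects, mention_to_core):
    """Link each submission straight to the core entities it mentions."""
    links = []
    for file, mentions in file_subjects.items():
        for mention in mentions:
            core = mention_to_core.get(mention)
            if core is not None:  # skip mentions with no core counterpart
                links.append((file, DCT_SUBJECT, core))
    return links


links = direct_links(file_subjects, mention_to_core)
for triple in links:
    print(triple)
```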

Screenshot from dashboard interface.

The Dashboard

How can we present all of the above work? Well, we developed a dashboard with three views: the athlete view, sport view, and continent view. Now I know that my drawings at the start of the article were incredible (sarcasm of course) but Johnny, Dorota, and Antero really brought this to life.

When you open the dashboard you land on a completely random athlete’s page - for example, I began with Allyson Felix.

source

Allyson Felix is an outstanding sprinter, competing in the Olympics from 2004 to 2016. The top of her page is shown below.

It is important to note that each “athlete view” has a common layout for all athletes.
Athlete View - Allyson Felix

You can see information that we know about an athlete, personalised charts, and related Reddit posts. Every section contains an info button which explains what is being shown and what we are doing behind the scenes.

We are not caching any results so every page in this demo is populated at runtime. As you are roaming the interface, note how fast the charts populate and how quickly they are updated as you filter.

Further down in the athlete view there is a chart with some filters. The parallel coordinates plot titled “Statistics Comparison” compares the current athlete to the average Olympian. Using the filters, I have compared Allyson Felix to the average male basketball player from a country in Oceania (a very tall bunch it seems):

Athlete View - Allyson Felix

From the sport in Allyson’s infobox, we can navigate through to the sport view by clicking “Athletics”:

Sport View - Athletics

In this view, we can see that Allyson Felix is not just a great sprinter, but the top female athlete competing in athletics events (by medal count). The sport view contains many charts and I noticed something interesting while examining the “Medals Per Continent” chart:

Sport View - Athletics

You may have noticed that I selected 1972 on the “Medals Per Continent” slider. This chart displays two groups of bars. On the left we can see the number of medals that were won by continents (well, athletes representing countries within that continent). On the right we can see the number of athletes that competed for countries within each continent.

In 1972 the chart looks as expected, but let’s look at four sequential summer Olympics (1972 to 1984):

I was sliding through the years and noticed that Africa all but disappears in 1976 - but only for the 1976 games? Similarly, North America follows the same pattern but at the 1980 games?

To investigate this further, I will type “Africa” into the search bar and head to the continent view:

Continent View - Africa

The summer and winter Olympics occur every four years but staggered by two years (summer in 2012, winter in 2014, summer in 2016, winter in 2018, etc…). African athletes rarely compete at the winter Olympics (only 5 at Sochi 2014), hence the vastly different numbers of athletes every two years.

Investigating the earlier question, we can filter the chart for “Athletics”:

Continent View - Africa

Now the drop in athlete numbers is very obvious! The number of African athletes dropped to almost zero in 1976.

I genuinely noticed this using our interface so had to dig for an answer. It turns out that many African countries boycotted the Olympics in 1976 because New Zealand were not banned from the games. South Africa had been banned from the Olympics since 1964 because they refused to condemn apartheid. New Zealand’s rugby team were touring South Africa at the time, which sparked the start of the boycott. Source here.

It turns out that the drop in North American athletes in 1980 was also the result of a boycott. As every continent view has the same layout, we can check the same filtered chart as above on the “North America” page:

Continent View - North America

The 1980 Olympics were held in Moscow (then part of the Soviet Union) and the United States boycotted the games in protest of the Soviet invasion of Afghanistan (source).

Finally, to show off the “news” sections in each view - let’s look at a couple of athletes.

I have decided to choose one historical athlete that still gets talked about on Reddit for obvious reasons - Jesse Owens:

Athlete View - Jesse Owens

As you can see, we display Reddit posts that are related to the entity of interest (Jesse Owens in this case). In addition, if other entities are also mentioned, a related second tag appears as a navigation tool.

Each of these posts can be clicked to head to the actual post on Reddit.

In other very unrelated news, bobsledder Johnny Quinn got stuck twice at Sochi 2014:

Johnny first got stuck in a bathroom while taking a shower. After calling for help, he had to use his bobsleigh skills to bash through the door!

source

He then got stuck in a lift… I don’t imagine he managed to barge out of that one however.

The Conclusion

From a few disparate data sources, we noticed data anomalies through an interface and learned something new. This was all built using the combined power of RDFox and Wallscope’s platform, so I hope I have conveyed the power of applications built like this one (not just for the Olympics use-case, but more generally).

If you want to discuss how you could benefit from anything discussed in this article, please feel free to get in touch with us here or here.

You can find the application here and if you are a developer, you can find the GitHub repo here.

If you are looking to read more, you can find all of my articles here but I really urge you to check out the profiles of all the contributors to this project:
- Antero Duarte, Lead Developer at Wallscope
- Johnny Strachan, Full Stack Developer at Wallscope
- Dorota Burdach, UI/UX Designer & Front-End Developer at Wallscope
- Emma Findlow, Communications Manager at Wallscope
- Valerio Cocchi, Knowledge Engineer at Oxford Semantic Technologies
- Felicity Mulford, Marketing Analyst at Oxford Semantic Technologies
Many of these contributors have written in Wallscope’s publication or Oxford Semantic Technologies’ publication.

We have also created a quick video to show off the demo:

Finally, when the next Olympics go ahead, we are planning to update this project. I will tweet about this at the time - I really hope we have the time!

...

The Team and Resources

The team behind Oxford Semantic Technologies started working on RDFox in 2011 at the Computer Science Department of the University of Oxford with the conviction that flexible and high-performance reasoning was a possibility for data-intensive applications without jeopardising the correctness of the results. RDFox is the first market-ready knowledge graph designed from the ground up with reasoning in mind. Oxford Semantic Technologies is a spin-out of the University of Oxford and is backed by leading investors including Samsung Venture Investment Corporation (SVIC), Oxford Sciences Innovation (OSI) and Oxford University’s investment arm (OUI). The author is proud to be a member of this team.