The music industry is a dynamic space, with new releases, artists, bands and albums appearing every day. The volume of information about the industry is vast, which poses a considerable challenge for music platform providers that aim to offer a complete, up-to-date service to their users.
This two-part article demonstrates how RDFox can be used within a music streaming service to link, enrich, validate and query large datasets with high accuracy and speed. With RDFox, the provider can run a responsive application that gets real value from its data. This use case shows the power of RDFox and its applicability to modern applications.
RDFox is a knowledge graph and semantic reasoning engine. As a triplestore, it provides more flexibility than the strict tabular structure of an SQL database, and its reasoning makes it easy to link various data sources. Because RDFox runs in memory, it is extremely fast, even for very large datasets. Its reasoning engine is unmatched in power, and RDFox is the fastest graph-based querying system available.
In RDFox, domain expertise is encoded into the knowledge graph as rules, which drive its semantic reasoning. Rules can be used to compute metrics, identify missing features, categorise behaviours, discover repeated patterns and highlight inconsistencies. What's more, RDFox does this incrementally: when data is added or removed, the derived information is updated without recomputing everything from scratch. The music provider no longer has to worry about keeping the database consistent or the service responsive, because RDFox manages both.
In this case study, a hypothetical music platform, 'RDFox Music', was created. Users can create an account, listen to music, search for artists and songs, and receive song recommendations.
To function, the platform needed a complete understanding of the music industry. A broad dataset of music industry information was therefore created and stored in a knowledge graph, which users can query in natural language. The music knowledge graph incorporates three data sources: Wikidata, Discogs and MusicBrainz. Each source has its own strengths and weaknesses.
None of these data sources holds enough information for the music platform on its own. Linked together, however, they provide a wealth of knowledge that can be cross-referenced for validity. Each data source uses a different format. Because RDFox is a Resource Description Framework (RDF) triplestore, all of the information must be imported as triples.
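Each source must therefore be converted into triples before it can be loaded. The sketch below illustrates that conversion step in plain Python for a JSON record and a CSV row. The field names, the `discogs:`/`mb:` prefixes and the IDs are all hypothetical; they are not the real Discogs or MusicBrainz export schemas, and a real pipeline would produce RDF (for example Turtle) rather than Python tuples.

```python
import csv
import io
import json

# Hypothetical extracts: field names, prefixes and IDs are illustrative only.
DISCOGS_JSON = '{"id": 101, "name": "The Knife", "members": [201, 202]}'
MUSICBRAINZ_CSV = "mbid,name,country\nmb-42,The Knife,SE\n"

Triple = tuple[str, str, str]  # (subject, predicate, object)


def discogs_to_triples(raw: str) -> list[Triple]:
    """Turn one JSON artist record into triples, one per property value."""
    record = json.loads(raw)
    subject = f"discogs:artist/{record['id']}"
    triples = [(subject, "discogs:name", record["name"])]
    for member_id in record["members"]:
        # Each band member becomes a link to another artist resource.
        triples.append((subject, "discogs:member", f"discogs:artist/{member_id}"))
    return triples


def musicbrainz_to_triples(raw: str) -> list[Triple]:
    """Turn each CSV row into triples, one per non-key column."""
    triples = []
    for row in csv.DictReader(io.StringIO(raw)):
        subject = f"mb:artist/{row['mbid']}"
        triples.append((subject, "mb:name", row["name"]))
        triples.append((subject, "mb:country", row["country"]))
    return triples
```

Once every source is expressed in the same subject, predicate, object shape, it can be loaded into its own graph, whatever its original format.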
Initially, the information from each source was stored in its own knowledge graph within RDFox, giving three separate graphs. Using RDFox's reasoning capabilities, rules were then written to link these three graphs into a fourth, unified graph containing information on artists, bands and recordings.
In RDFox, rules are expressed in Datalog and represent 'if-then' statements. Here, rules determine that an artist named in Discogs is the same artist found in MusicBrainz and Wikidata, and then store information about this artist in the unified knowledge graph (in grey).
The artist in the unified knowledge graph is equivalent to the artists found in the three source knowledge graphs. However, a query would then return four separate artists. To prevent this, rules establish that the data from the three sources represents one and the same artist (ostmusic:artist/1), as seen below.
This example of linking data is applicable to other use cases and demonstrates the flexibility and power of reasoning with knowledge graphs.
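To make the 'if-then' logic concrete, the following Python sketch emulates one such linking rule. It is not RDFox Datalog. It assumes a simplified matching strategy in which a Wikidata entity records a MusicBrainz ID and a Discogs ID, and both referenced artists exist in their own graphs. When that holds, the sketch mints a single unified artist and links all three source entities to it. The predicates `wd:musicbrainzId`, `wd:discogsId` and `ostmusic:sameAs`, and all the data, are hypothetical.

```python
from itertools import count

# Hypothetical source graphs; predicate names are illustrative, not real schemas.
WIKIDATA = {
    ("wd:Q1", "wd:name", "The Knife"),
    ("wd:Q1", "wd:musicbrainzId", "mb-42"),
    ("wd:Q1", "wd:discogsId", "101"),
}
MUSICBRAINZ = {("mb:artist/mb-42", "mb:name", "The Knife")}
DISCOGS = {("discogs:artist/101", "discogs:name", "The Knife")}


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def link_artists(wikidata, musicbrainz, discogs):
    """Emulate one rule:
    IF a Wikidata entity names a MusicBrainz ID and a Discogs ID,
    AND both artists exist in their own graphs,
    THEN create one unified artist linked to all three source entities."""
    unified = set()
    ids = count(1)  # hypothetical scheme for minting ostmusic:artist/N
    mb_subjects = {s for s, _, _ in musicbrainz}
    dc_subjects = {s for s, _, _ in discogs}
    for wd in sorted({s for s, _, _ in wikidata}):  # sorted for stable IDs
        for mbid in objects(wikidata, wd, "wd:musicbrainzId"):
            for dcid in objects(wikidata, wd, "wd:discogsId"):
                mb, dc = f"mb:artist/{mbid}", f"discogs:artist/{dcid}"
                if mb in mb_subjects and dc in dc_subjects:
                    artist = f"ostmusic:artist/{next(ids)}"
                    for source in (wd, mb, dc):
                        unified.add((artist, "ostmusic:sameAs", source))
    return unified
```

Calling `link_artists(WIKIDATA, MUSICBRAINZ, DISCOGS)` yields three links, all from `ostmusic:artist/1`. A query against the unified graph therefore sees one artist rather than four. If either source lacks the referenced artist, the rule does not fire.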
For tips and tricks on writing rules, read the article here.
To enrich the data, rules were used to materialise new information. This lets users ask simpler queries and get results more quickly. One example is calculating the number of members in each band.
'The Knife' has two members. This count is stored within the unified knowledge graph, so the music platform's users can query it directly. Thanks to RDFox's incremental reasoning capabilities, if a new member joins 'The Knife', this number is updated to three. Similarly, if one of the members leaves, the count is updated to one immediately and automatically.
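The idea behind incremental maintenance can be sketched for this single aggregate. When a membership fact is added or removed, only the affected band's count changes, and nothing is recounted from scratch. This Python class is a simplified illustration with hypothetical member IDs. RDFox's own incremental reasoning algorithms are far more general than this.

```python
from collections import Counter


class MemberCounts:
    """Keep each band's member count up to date as membership facts are
    added or removed, touching only the affected band."""

    def __init__(self):
        self.members = set()     # explicit facts: (band, person)
        self.counts = Counter()  # derived facts: band -> number of members

    def add_member(self, band, person):
        if (band, person) not in self.members:  # ignore duplicate facts
            self.members.add((band, person))
            self.counts[band] += 1

    def remove_member(self, band, person):
        if (band, person) in self.members:
            self.members.remove((band, person))
            self.counts[band] -= 1
            if self.counts[band] == 0:
                del self.counts[band]

    def count(self, band):
        return self.counts.get(band, 0)
```

For example, adding two members to "The Knife" gives a count of 2. Adding a third raises it to 3, and removing two of them leaves 1.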
Using rules again, RDFox knows that a band with two members is called a 'duo', three a 'trio', four a 'quartet', and so on. This broadens what users can ask. For example, if a user asks for a 'Swedish electronic duo', RDFox knows that the user means 'a Swedish electronic band with two members'.
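The labelling step amounts to a lookup from member count to ensemble name, which a query can then filter on. The Python sketch below shows that idea. The band records are hypothetical, "Example Band" is made up, and the cut-off at a quintet is an arbitrary choice for illustration.

```python
# Member count -> ensemble label; sizes not listed get no label.
ENSEMBLE_LABELS = {2: "duo", 3: "trio", 4: "quartet", 5: "quintet"}

# Hypothetical band records, as they might look after enrichment.
BANDS = [
    {"name": "The Knife", "country": "Sweden", "genre": "electronic", "members": 2},
    {"name": "Example Band", "country": "Sweden", "genre": "electronic", "members": 4},
]


def ensemble_label(member_count):
    """Return 'duo', 'trio', etc., or None for sizes without a label."""
    return ENSEMBLE_LABELS.get(member_count)


def find(bands, country, genre, label):
    """Answer queries such as 'Swedish electronic duo'."""
    return [
        b["name"]
        for b in bands
        if b["country"] == country
        and b["genre"] == genre
        and ensemble_label(b["members"]) == label
    ]
```

Here `find(BANDS, "Sweden", "electronic", "duo")` returns `["The Knife"]`. In RDFox the label would instead be materialised by a rule, so the query only has to match the stored label.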
The following image provides an example of how to label a quartet:
RDFox Music wants to provide recommendations to its users. This requires further enrichment of the data, which was done in a number of ways.
Hierarchies were established to enrich the data within the knowledge graph, harnessing RDFox's ontological design. Genres and locations were mapped into hierarchies, so that information about the music industry is added to the knowledge graph as relationships between data points. By understanding genres and locations, the music platform can offer more diverse recommendations without compromising user satisfaction.
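A hierarchy is useful because membership propagates upwards. An artist based in Stockholm is also based in Sweden and in Europe, and an electropop track is also an electronic track. In RDFox this would typically be a recursive rule. The Python sketch below computes the same transitive closure over a small, hypothetical hierarchy.

```python
# Hypothetical hierarchies: each entry maps an item to its direct parents.
GENRE_PARENTS = {"synth-pop": ["electronic"], "electropop": ["electronic", "pop"]}
LOCATION_PARENTS = {
    "Stockholm": ["Sweden"],
    "Sweden": ["Scandinavia", "Europe"],
    "Scandinavia": ["Europe"],
}


def ancestors(parent_of, item):
    """Every broader category reachable from item (transitive closure)."""
    found = set()
    stack = [item]
    while stack:
        current = stack.pop()
        for parent in parent_of.get(current, ()):
            if parent not in found:  # skip categories already visited
                found.add(parent)
                stack.append(parent)
    return found


def expand_tags(tags, parent_of):
    """Materialise the implied broader tags alongside the original ones."""
    expanded = set(tags)
    for tag in tags:
        expanded |= ancestors(parent_of, tag)
    return expanded
```

With this data, `ancestors(LOCATION_PARENTS, "Stockholm")` gives `{"Sweden", "Scandinavia", "Europe"}`. A recommendation for "European electronic music" can therefore match a Stockholm-based synth-pop act without a specialised query.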
Recommendation services can also suggest tracks based on a user's streaming history, or on songs listened to by similar users. With rules, RDFox can also identify trends by detecting the growing popularity of genres or artists, based on how users interact with the platform.
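Both ideas can be sketched in a few lines of Python. Similar users are found with Jaccard overlap of listening histories, and a genre counts as trending when its play count has at least doubled between two periods. The similarity measure, the 0.5 threshold and the 2x growth factor are all assumptions chosen for illustration. They are not how RDFox Music scores users or trends.

```python
def jaccard(a, b):
    """Overlap of two listening histories: 0 = nothing shared, 1 = identical."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(histories, user, threshold=0.5):
    """Suggest tracks played by users whose history is similar enough."""
    mine = histories[user]
    suggestions = set()
    for other, theirs in histories.items():
        if other != user and jaccard(mine, theirs) >= threshold:
            suggestions |= theirs - mine  # only tracks the user hasn't heard
    return suggestions


def trending(plays_before, plays_now, min_growth=2.0):
    """Genres whose play count grew by at least min_growth times.
    A genre with no earlier plays is compared against a baseline of 1."""
    return sorted(
        genre
        for genre, now in plays_now.items()
        if now >= min_growth * max(plays_before.get(genre, 0), 1)
    )
```

With histories `{"alice": {"t1", "t2", "t3"}, "bob": {"t1", "t2", "t4"}}`, Alice and Bob share half of their combined tracks, so Alice is recommended `"t4"`.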
It is also possible to discover similar entities by determining compatibility or finding similar patterns within the rich graph of connections. The music platform therefore includes information on covers of the same track (e.g. 'Que Sera, Sera' performed in different styles by different artists), alternative versions (e.g. remixes or reworks) and songs by the same artist. For more information on how RDFox can be used to determine compatibility between entities, read our article on configuration.
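Covers, alternative versions and same-artist songs share a common pattern: recordings connected through a shared node in the graph, either the underlying work or the artist. The Python sketch below groups related recordings in that way. The recording IDs, the "Artist A/B" names and the `kind` field are hypothetical placeholders, not data from the platform.

```python
# Hypothetical recordings; 'work' is the underlying song they perform.
RECORDINGS = [
    {"id": "rec/1", "work": "Que Sera, Sera", "artist": "Artist A", "kind": "original"},
    {"id": "rec/2", "work": "Que Sera, Sera", "artist": "Artist B", "kind": "cover"},
    {"id": "rec/3", "work": "Que Sera, Sera", "artist": "Artist A", "kind": "remix"},
    {"id": "rec/4", "work": "Another Song", "artist": "Artist A", "kind": "original"},
]


def related(recordings, rec_id):
    """Group the other recordings by how they connect to rec_id."""
    target = next(r for r in recordings if r["id"] == rec_id)
    others = [r for r in recordings if r["id"] != rec_id]
    same_work = [r for r in others if r["work"] == target["work"]]
    return {
        "covers": [r["id"] for r in same_work if r["kind"] == "cover"],
        "other_versions": [r["id"] for r in same_work if r["kind"] != "cover"],
        "same_artist": [
            r["id"]
            for r in others
            if r["artist"] == target["artist"] and r["work"] != target["work"]
        ],
    }
```

For the original recording `rec/1`, this finds one cover (`rec/2`), one alternative version (`rec/3`) and one other song by the same artist (`rec/4`).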
Part One has explained how RDFox can be used to link and enrich data, giving users a unified knowledge graph to query and richer results. To find out how RDFox Music validated and queried its data, and to view performance statistics, read Part Two.