Writing RDFox Scripts

A short guide to improving your workflow

Using the shell is the simplest way to get started with RDFox, but having to manually input each command every time you want to use it can become a hassle. Thankfully, the RDFox shell can process text files with pre-written commands and execute those commands just as if you typed them one by one. This can save you a lot of time and make the process of restarting your work much simpler.

In this article we will talk about what scripts are and how you can use them to improve your RDFox workflow. You can find the accompanying GitHub repository here.

Why RDFox?

RDFox is a highly optimised knowledge graph and rules engine, designed from the ground up with performance and reasoning in mind. Its unique in-memory approach gives it unmatched speed and its advanced semantic reasoning capabilities make it the perfect fit for any demanding production environment.

What is a script?

A script is a text file containing RDFox shell commands. For example, it might look a bit like this:

my-start-script.rdfox

prefix : <http://oxfordsemantic.tech/myDataStore/entities#>

set output out
endpoint start

dstore create myDataStore # this is a comment
active myDataStore

# this is also a comment

import ! :Alice a :Person .

Each line is a separate shell command and comments are introduced using the hash (#) symbol.

Running scripts in RDFox

If our script is located in <working_directory>, we can run it on startup of RDFox using:

<path_to_rdfox>/RDFox sandbox <working_directory> my-start-script

(on Mac/Linux) or

<path_to_rdfox>/RDFox.exe sandbox <working_directory> my-start-script

(on Windows).

Notice that the path for the start script is relative to the <working_directory> and that we can omit the .rdfox file extension. A script can be any plain text file, but any extension other than .rdfox has to be written out in full.

Alternatively, we can run any script at any point after starting the RDFox shell by just typing in the path to it (again, relative to the <working_directory>). Note that this means we can execute scripts within scripts.

Structuring your workspace

With scripts you can easily keep your files in order while still being able to jump straight back into your workflow. For example, if your <working_directory> contains separate scripts/, data/, rules/, queries/ and output/ subdirectories (the layout assumed by settings.rdfox further down), you can have one main script you run each time you restart RDFox:

start.rdfox

settings.rdfox
prefixes.rdfox

set output out
endpoint start

dstore create myDataStore
active myDataStore

import-data.rdfox
import-basic-rules.rdfox
answer-basic-queries.rdfox

Notice we do not specify where our script files are located — the secret to that is modifying the settings:

settings.rdfox

set dir.scripts "$(dir.root)scripts/"

set dir.facts "$(dir.root)data/"

set dir.dlog "$(dir.root)rules/"

set dir.queries "$(dir.root)queries/"

set dir.output "$(dir.root)output/"

These commands change the directories where RDFox searches for given types of files.
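The settings above assume that the five subdirectories already exist under the <working_directory>. As a minimal sketch, they could be created up front with a few lines of Python. The function name create_workspace is made up for illustration and the directory names are simply copied from settings.rdfox; none of this is part of RDFox itself.

```python
from pathlib import Path
import tempfile

# Subdirectory names copied from settings.rdfox; adjust them to your own layout.
WORKSPACE_DIRS = ["scripts", "data", "rules", "queries", "output"]


def create_workspace(root):
    """Create each workspace subdirectory under `root` if it is missing."""
    root = Path(root)
    created = []
    for name in WORKSPACE_DIRS:
        path = root / name
        # exist_ok=True makes the call safe to repeat on an existing workspace.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    # Demonstrate on a throwaway directory rather than a real workspace.
    with tempfile.TemporaryDirectory() as tmp:
        for path in create_workspace(tmp):
            print(path.name)
```

Running it against your real <working_directory> instead of a temporary one gives RDFox a place to find scripts, data, rules and queries, and a place to write output.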

It is often beneficial to split your rules and scripts into multiple files, each with a very specific purpose, so that you can easily modify and reuse them when needed. In particular, setting up a data source can require quite a long script, so it might be better to separate it from other script files.

Answering queries

Just like in the shell, there are two ways to run queries in scripts. One is to use the answer (for read queries), update (for write queries) or evaluate (for any type of query) command and pass a query file to it. This is usually easier to write as we do not have to worry about line breaks.

The other way is to simply insert your query directly into your script:

SELECT ?person ?firstName ?dateOfBirth WHERE { \
   ?person prop:hasPosition :accountant ; \
       prop:hasFirstName ?firstName ; \
       prop:hasDateOfBirth ?dateOfBirth \
}

Notice though that we had to add backslashes (\) at the end of our lines to tell RDFox we were not finished with our input yet. This can become a hassle for long and complex queries when we need to make adjustments.
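If you prefer to edit a query in its normal multi-line form, one option is to add the backslashes mechanically just before pasting it into a script. The following is a rough Python sketch of that idea. The helper name add_continuations is hypothetical, and dropping blank lines is a precaution taken here rather than documented RDFox behaviour.

```python
def add_continuations(query: str) -> str:
    """Append ' \\' to every line except the last one.

    Blank lines are dropped as a precaution (an assumption of this sketch),
    so that every remaining line clearly belongs to the same command.
    """
    lines = [line.rstrip() for line in query.splitlines() if line.strip()]
    if not lines:
        return ""
    continued = [line + " \\" for line in lines[:-1]]
    return "\n".join(continued + [lines[-1]])


if __name__ == "__main__":
    query = """SELECT ?person ?firstName WHERE {
   ?person prop:hasPosition :accountant ;
       prop:hasFirstName ?firstName
}"""
    print(add_continuations(query))
```

For anything longer than a few lines, keeping the query in its own file and passing it to answer is still likely to be the simpler route.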

If we want to save our results, we can write our query answers directly into a file by modifying our output settings, for example:

write-accountants-csv.rdfox

set output accountants.csv
set query.answer-format text/csv

SELECT ?person ?firstName ?dateOfBirth WHERE { \
   ?person prop:hasPosition :accountant ; \
       prop:hasFirstName ?firstName ; \
       prop:hasDateOfBirth ?dateOfBirth \
}

# Return to previous values
set output out
set query.answer-format "application/x.sparql-results+turtle-abbrev"

Here, we first set our desired output file and format, then query the data store and return to the previous settings. Note that although we do not have to create the output file beforehand, the directory it is placed in (in this case <working_directory>/output, as set in settings.rdfox) must already exist.
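Once the file has been written, it can be read back with any CSV library. Here is a small Python sketch, assuming the output follows the W3C SPARQL 1.1 CSV results layout, where the first row holds the variable names without the leading question mark. The sample row is invented purely for illustration and does not come from a real data store.

```python
import csv
import io


def read_answers(csv_text):
    """Parse CSV query results into a list of dicts keyed by variable name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Invented sample standing in for the contents of accountants.csv.
SAMPLE = (
    "person,firstName,dateOfBirth\r\n"
    "http://example.com/alice,Alice,1990-01-01\r\n"
)

if __name__ == "__main__":
    for row in read_answers(SAMPLE):
        print(row["firstName"], row["dateOfBirth"])
```

To read the real file, pass its contents to read_answers instead of SAMPLE.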

Parameters

Suppose we want to create a more general script for writing query results to CSV files. It could look something like this:

write-general-csv.rdfox

set output $(1)
set query.answer-format text/csv

answer $(2)

# Return to previous values
set output out
set query.answer-format "application/x.sparql-results+turtle-abbrev"

The $(<n>) notation used here indicates script parameters: $(1) is replaced by the first value passed to the script, $(2) by the second, and so on. When running this script, we supply two values that replace them during execution, for example:

write-general-csv developers.csv developers.rq

This can be very useful if we reuse similar yet not quite identical code at many points in our scripts.
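To make the substitution concrete, here is a toy Python model of what happens to the script text when it is run with arguments. It is only an illustration of the idea, not how RDFox implements it. The function name substitute_parameters is made up, and raising an error for a missing argument is a choice made in this sketch rather than documented RDFox behaviour.

```python
import re

# Matches $(1), $(2), ... but not named variables such as $(dir.root).
PARAMETER = re.compile(r"\$\((\d+)\)")


def substitute_parameters(script_text, args):
    """Replace each $(n) placeholder with the n-th argument (1-based)."""
    def replace(match):
        index = int(match.group(1))
        if index < 1 or index > len(args):
            # Assumption of this sketch: a missing argument is an error.
            raise ValueError(f"no value supplied for $({index})")
        return args[index - 1]

    return PARAMETER.sub(replace, script_text)


if __name__ == "__main__":
    script = "set output $(1)\nset query.answer-format text/csv\n\nanswer $(2)"
    print(substitute_parameters(script, ["developers.csv", "developers.rq"]))
```

With the arguments from the example above, the first line becomes set output developers.csv and the last becomes answer developers.rq.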

Parallelisation and transactions

With RDFox, you can control the way it handles your requests. Instead of importing files or answering queries one after another, it can do these things in parallel. In order to achieve that, we pass multiple files to the same import or answer command:

import data-1.ttl data-2.ttl

How many threads RDFox uses is controlled by the threads <n> command. By default, this will be set to the number of logical processors on your machine.

Now, suppose we would like to ensure that some operations are either executed together or not at all. The concept that helps us with that is RDFox transactions.

A transaction is a sequence of commands that starts with the begin keyword. It has an optional parameter specifying the type of transaction to be started — either “read/write” (write, the default), “read-only” (read) or “interruptible read-only” (interruptible-read). You can end a transaction using commit (if you want to keep the changes made in it) or rollback (if you want to discard them).

transaction.rdfox

begin

import manager-rule.dlog
mat

answer managers.rq

# Try to commit transaction first. If that fails, roll back
commit
rollback

Outside a transaction, rules are materialised as soon as they are imported. Inside a transaction they are not: materialisation is deferred, although running any query triggers it automatically before the answer is returned. Sometimes it is useful to trigger it manually, and that is where the mat command comes in. When used inside a transaction, it materialises any new rules and reports exactly what has changed, which can be useful for troubleshooting.

When to use scripts?

Scripts can be a great help in the development phase of your project: when going through multiple iterations of rules, it is often easier to start from scratch than to remove the old versions each time.

Scripts are also useful when setting up a production environment with RDFox running in daemon mode, as we can use a single command to start RDFox in shell mode and execute a script (that ends with switching to daemon mode) on startup of our machine:

<path_to_rdfox>/RDFox shell <working_directory> start

We can, of course, run RDFox with persistence and have our data available after each restart automatically. That said, running in-memory only (with persistence turned off) can offer a performance benefit, and even if we do decide to turn it on, shell variables (such as prefixes, endpoint and directory settings, or active datastore) are not persisted and it is still recommended that we create a restart script to set these automatically.

Interested in RDFox?

You can request a free 30-day evaluation license here. We also offer free academic licenses.

To learn more about RDFox, go to our website or Medium page. If you have any questions or would like to schedule a consultation session with one of our Knowledge Engineers, you can email info@oxfordsemantic.tech.

...

The Team and Resources

The team behind Oxford Semantic Technologies started working on RDFox in 2011 at the Computer Science Department of the University of Oxford with the conviction that flexible and high-performance reasoning was a possibility for data-intensive applications without jeopardising the correctness of the results. RDFox is the first market-ready knowledge graph designed from the ground up with reasoning in mind. Oxford Semantic Technologies is a spin-out of the University of Oxford and is backed by leading investors including Samsung Venture Investment Corporation (SVIC), Oxford Sciences Innovation (OSI) and Oxford University’s investment arm (OUI). The author is proud to be a member of this team.