Scripts can save you time and avoid duplicated code when setting up data stores and executing your rules and queries. This article explains what a script is, how to write scripts, when to use them, how to use them with persistence, how to structure your workspace, and finally how to use scripts within scripts to set up your data sources.
This article uses RDFox, a high-performance knowledge graph and semantic reasoning engine that was designed from the ground up with reasoning and performance in mind. The powerful reasoning engine is unmatched in efficiency and reasoning capabilities, and by using rules it can provide flexible, incremental addition and retraction of data, as well as fast parallel materialisation of new facts.
So why not combine the efficiency of scripts with the power of RDFox in your production environment? Get yourself a free RDFox licence today!
A script is a text file which contains the instructions you would enter through the command line. Rather than entering each command manually, the script is run at import time and executes all the commands at once.
For example, the following script includes all the commands from the getting started guide, which were entered sequentially in the command line. By putting the commands into a script, which can be run when you first start RDFox, you can save a considerable amount of time:
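As a rough sketch of what such a script might contain, the shell snippet below writes a small script file from the terminal. The file name start.script, the data file data.ttl and the example.com namespace are hypothetical, and the RDFox commands inside it (dstore create, active, prefix, import, set output) are written from memory of the getting started guide, so treat them as assumptions and check them against the documentation for your RDFox version.

```shell
# Write a minimal, hypothetical start-up script to start.script.
# The RDFox commands inside are assumptions; verify them for your RDFox version.
cat > start.script <<'EOF'
# Create a data store and make it the active one
dstore create default
active default

# Declare a prefix (hypothetical namespace) and load some data
prefix : <https://example.com/>
import data.ttl

# Print query answers to the console
set output out
EOF
```

Keeping the commands in a file like this means the whole set-up can be repeated with a single command instead of being retyped.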
In order to get the most out of your scripts, the team at Oxford Semantic Technologies (OST) have developed and refined the following tips.
OST recommends creating multiple scripts, such as one for commands and one for prefixes, and structuring your workspace to improve clarity and maintainability. For example:
The OST team suggests storing the queries, rules, data sources, and other scripts (see below) separately within the workspace and having one main script, the startup script, to control the others.
The scripts include an explanation for each step in the comments, which follow the hash (#) symbols.
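One possible layout, sketched as shell commands that create it. The folder names (data, queries, rules, scripts) and the top-level file name start.script are assumptions chosen to match the categories this article mentions, not a required convention.

```shell
# Hypothetical workspace layout; folder and file names are assumptions.
mkdir -p workspace/data workspace/queries workspace/rules workspace/scripts

# One main script at the top level controls the others
touch workspace/start.script

# Helper scripts live in their own folder
touch workspace/scripts/prefixes.script workspace/scripts/dsource1.script
```

Separating the folders by purpose makes it easier to see which files a given script touches.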
The first part of the script initialises the datastore settings:
In the scripts folder, we find it useful to store the prefixes in a dedicated script. For example, prefixes.script would look like:
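A minimal sketch of such a file, written from the terminal. Both namespaces are hypothetical placeholders; replace them with the prefixes your rules and queries actually use.

```shell
# Create the scripts folder if needed and write a hypothetical prefixes.script.
mkdir -p scripts
cat > scripts/prefixes.script <<'EOF'
# Prefixes shared by the rules and queries (hypothetical namespaces)
prefix : <https://example.com/ontology/>
prefix data: <https://example.com/data/>
EOF
```

Keeping the prefixes in one place means a namespace only has to be changed once.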
We can use scripts to import the prefixes, add and attach live data sources to RDFox, along with the rules which map the data sources to triples. Additionally, we can set the number of threads, which allows for parallelisation.
Next we can import the triples and rules in parallel, and materialise the rules within the datastore.
To see the information about our datastore we can include the ‘info’ command. Additionally, we can expose an endpoint and access RDFox’s console. As well as querying or managing your knowledge graph using an IDE of your choice, such as Emacs or Visual Studio Code, you can type SPARQL queries into the RDFox console.
You can find more information on initiating the console in our docs.
As well as data and rules being imported in parallel, queries can also be answered in parallel. This is done using the following commands within the script:
RDFox can provide multiple formats for the query results, including printing the results to a file. You can specify the query output format in the script.
You can find this script on GitHub.
We recommend keeping your rules in separate files so that you can reuse the rule templates.
For example, a rule folder could contain the main rule-patterns which would then be called upon when you need them.
Rule files can also be tailored to the use case you are solving. For example, a mapping rule that turns tabular data into triples looks like this:
Scripts are useful in the development phase of your projects since they can be reused and allow you to pick up where you left off. Tailoring rules to your problem often involves deleting and restarting datastores, so a script lets you iterate quickly until you reach a solution.
Scripts are also useful for setting up RDFox instances in a production environment. On a cloud machine (Linux in this case), this can be done with a .service file in which you specify the script to run:
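A hedged sketch of what a unit file of this kind could look like, written from the terminal. The file name rdfox.service, the install path /opt/rdfox and the workspace path are hypothetical. The ExecStart line in particular is an assumption: the RDFox mode and the way the script is passed should be checked against the RDFox documentation, including whether the chosen mode runs correctly without a terminal.

```shell
# Write a hypothetical systemd unit file to rdfox.service in the current directory.
cat > rdfox.service <<'EOF'
[Unit]
Description=RDFox started from a script (hypothetical example)
After=network.target

[Service]
# Assumed install location and workspace; adjust to your machine
WorkingDirectory=/opt/rdfox/workspace
# Assumed invocation: check the RDFox docs for the mode and arguments you need
ExecStart=/opt/rdfox/RDFox sandbox . "exec start.script"
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
```

The [Install] section is what allows the service to be enabled at boot.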
Copy this into /etc/systemd/system using sudo:
Enable the service so that it starts when the machine boots:
Start the service manually:
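The three steps above (copy, enable, start) can be collected into one small shell helper. This is a sketch under assumptions: the unit name rdfox.service is hypothetical, and the daemon-reload step is an addition that makes systemd pick up the newly copied file. By default the helper only prints the commands; set APPLY=1 to run them, which requires sudo and a machine that uses systemd.

```shell
# Hypothetical unit name; use the name of your own .service file
SERVICE=rdfox.service

# Print each command by default; set APPLY=1 to really run it
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

install_service() {
    run sudo cp "$SERVICE" /etc/systemd/system/   # copy the unit file into place
    run sudo systemctl daemon-reload              # make systemd re-read unit files
    run sudo systemctl enable "$SERVICE"          # start automatically at boot
    run sudo systemctl start "$SERVICE"           # start now, without rebooting
}

install_service
```

Printing the commands first is a cheap way to review exactly what will be run with elevated privileges.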
Since RDFox can be persisted, one might ask why scripts are needed if RDFox can simply start from its persisted state.
1) Only the database itself is persisted at this stage, but not the shell variables, such as the active datastore, prefixes, working directories, endpoint ports, etc. Endpoints also need to be restarted every time one quits the RDFox process.
OST advises that a script is run to set these up for a better user experience.
This could be achieved with a restart.script which would look like this:
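A minimal sketch of such a file, written from the terminal. The data store name default and the path scripts/prefixes.script are hypothetical, and the RDFox commands used (active, exec, endpoint start) are assumptions to be checked against the shell documentation for your version.

```shell
# Write a hypothetical restart.script that restores shell state after a restart.
cat > restart.script <<'EOF'
# Re-select the persisted data store (name is hypothetical)
active default

# Restore the prefixes from the dedicated script
exec scripts/prefixes.script

# Restart the endpoint, which does not survive quitting RDFox
endpoint start
EOF
```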
Notice that there was no need to import triples or rules in this script, as they will have been persisted and reloaded into RDFox.
2) Running RDFox in-memory only (i.e. with persistence switched off) may be faster, and it will load more quickly when starting from scratch than from a persisted state. If you are still in an exploratory phase, it may be helpful to work in-memory and from a script.
Note that you can always save your datastore from the shell and then reload it like this:
You can use a separate script to set up data sources without clogging up the start script. You could call it dsource1.script, for example.
Once the script is ready, we will be able to execute it from the starting script. A script within a script!
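As an illustration, the snippet below appends a call to the data source script to the start script. The file name start.script and the scripts/ folder are hypothetical, and the use of the exec command to run one script from another is an assumption; confirm the exact command in the RDFox shell documentation.

```shell
# Append a call to the data source script to a hypothetical start.script.
cat >> start.script <<'EOF'
# Run the data source set-up script from within the start script
exec scripts/dsource1.script
EOF
```

The start script then stays short, and each data source can be edited or swapped in its own file.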
You can find this article’s template workspace and scripts here. Alternatively, you can book a script consultation with an OST Knowledge Engineer by emailing email@example.com or by joining our community Slack channel.