All usage of RDFox, whether via its built-in shell, REST API, or Java library, involves an RDFox server. In turn, wherever there is an RDFox server there is an RDFox server directory. This is a filesystem location containing any settings and data the user wishes to persist to the next session. This article is meant to help developers, architects and operators working with RDFox become comfortable managing server directories. It describes how and when server directories are used, explains how new directories are initialized, and briefly touches on some key operational considerations.
Although RDFox is a main memory data store, it can be configured to persist data to disk for easier restarts and for use in online transaction processing (OLTP) settings. When persistence is enabled, the RDFox server saves the data, along with the corresponding access control settings, to its server directory. The server directory also acts as the default location for the RDFox license key and as the default base directory for API log files written by the server. To manage an RDFox server configured for persistence safely and securely, it is important to understand the basics of server directories, including how to initialize, secure, back up and eventually destroy them.
Before we go any further, it’s worth mentioning that not all configurations of RDFox use the server directory. You can safely use RDFox without any knowledge of the server directory if all of the following hold: persistence is disabled (an in-memory-only setup), you are providing the license key explicitly, and you have not enabled API logging. Just save the link to this article in case things change later on, and you are free to stop reading here!
The path to the server directory is determined at server startup by the server-directory parameter. If the parameter is not specified, a default value is used. This default is a location within the user’s home directory, and its exact value depends on the operating system RDFox is running on (see the documentation for details). When persistence is enabled (as controlled by the persist-roles and persist-ds parameters), the RDFox server requires exclusive access to the configured server directory. It ensures this by locking the directory at startup and holding the lock until shutdown. This prevents any other RDFox server from using the same server directory concurrently.
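The locking behaviour can be illustrated with a minimal Java sketch of the general technique, using java.nio file locks. It is not RDFox's implementation. The ServerDirectoryLock class and the server.lock file name are hypothetical. The caller also supplies the fallback directory, so the sketch does not have to guess RDFox's platform-specific default.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of exclusive server-directory locking; not RDFox's implementation. */
final class ServerDirectoryLock implements AutoCloseable {
    private static final String LOCK_FILE_NAME = "server.lock"; // hypothetical file name
    // Directories locked by this JVM. Checked before opening a second channel, because
    // closing another channel on the same file can release the OS lock on some platforms.
    private static final Set<Path> HELD = ConcurrentHashMap.newKeySet();

    private final Path dir;
    private final FileChannel channel;
    private final FileLock lock;

    private ServerDirectoryLock(Path dir, FileChannel channel, FileLock lock) {
        this.dir = dir;
        this.channel = channel;
        this.lock = lock;
    }

    /** Uses the "server-directory" parameter if present, otherwise the caller's default. */
    static Path resolve(Map<String, String> params, Path defaultDir) {
        String explicit = params.get("server-directory");
        return explicit != null ? Paths.get(explicit) : defaultDir;
    }

    /** Returns a held lock, or null if this JVM or another process already holds it. */
    static ServerDirectoryLock tryAcquire(Path serverDir) {
        Path key = serverDir.toAbsolutePath().normalize();
        if (!HELD.add(key)) {
            return null; // already locked inside this JVM
        }
        FileChannel ch = null;
        boolean acquired = false;
        try {
            ch = FileChannel.open(key.resolve(LOCK_FILE_NAME),
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            FileLock fl = ch.tryLock(); // null if another process holds the lock
            if (fl == null) {
                return null;
            }
            acquired = true;
            return new ServerDirectoryLock(key, ch, fl);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (!acquired) {
                HELD.remove(key);
                if (ch != null) {
                    try { ch.close(); } catch (IOException ignored) { }
                }
            }
        }
    }

    /** Releases the lock, as a server would at shutdown. */
    @Override
    public void close() {
        try {
            lock.release();
            channel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            HELD.remove(dir);
        }
    }
}
```

A second tryAcquire on the same directory returns null until the first holder calls close(). This mirrors the one-server-per-directory rule described above.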
The first stage in the life cycle of an RDFox server directory is initialization. An RDFox server directory is initialized when the first role is created within a server configured to persist roles to that directory. The first role is granted privileges that give it full control of the server, including the ability to create data stores, create other roles and grant privileges to other roles. Because this role is created during initialization, its name and password must be supplied at that point. Once the server directory has been initialized, however, the server can be restarted without a role name or password (for example, using the daemon mode of the RDFox executable).
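The initialization rule can be summarised in a short Java sketch. RoleRegistry, InMemoryRoleRegistry and DirectoryInitializer are hypothetical stand-ins for a server with role persistence enabled; they are not part of JRDFox. The sketch shows only the control flow: create the first role when no roles exist, and otherwise leave the directory as it is.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a server with role persistence; not a JRDFox type. */
interface RoleRegistry {
    int numberOfRoles();
    void createFirstRole(String roleName, String password);
}

/** In-memory toy; a real server would persist roles to its server directory. */
final class InMemoryRoleRegistry implements RoleRegistry {
    private final List<String> roles = new ArrayList<>();

    @Override
    public int numberOfRoles() {
        return roles.size();
    }

    @Override
    public void createFirstRole(String roleName, String password) {
        if (!roles.isEmpty()) {
            throw new IllegalStateException("server directory is already initialized");
        }
        // Password hashing and storage are deliberately left out of this toy.
        roles.add(roleName);
    }
}

final class DirectoryInitializer {
    /** Creates the first role only on first use; returns true if initialization happened. */
    static boolean initializeIfNeeded(RoleRegistry registry, String roleName, String password) {
        if (registry.numberOfRoles() > 0) {
            return false; // already initialized: no first-role credentials are needed
        }
        registry.createFirstRole(roleName, password);
        return true;
    }
}
```

Calling initializeIfNeeded twice creates the role only once. This corresponds to the difference between the first run and subsequent runs described later in this article.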
Initialization can be achieved with any of the deployment configurations supported by RDFox, namely:
Once initialized, all server directories are equal in the eyes of RDFox. That is, a directory initialized using a Java program is indistinguishable from one initialized by the RDFox executable. This makes it possible to, for instance, load a server directory initially created by a JRDFox app into the RDFox executable in order to interrogate it with the RDFox shell.
To demonstrate the equivalence of the above options, we will walk through the steps to initialize a directory using the RDFox executable and then give example code for achieving the same thing in a Java program. In each case, we wish to initialize /var/lib/RDFox/data as an RDFox server directory. Both examples assume that the directory exists, is empty except for a valid RDFox key file called RDFox.lic, and has permissions such that the user running the program is able to read from and write to it.
As the directory we’re seeking to initialize does not match the default, we must specify the server-directory parameter explicitly. When using the RDFox executable, server parameters are given as key-value pairs immediately after the executable name. We will combine this with the shell mode of the RDFox executable, which by default enables role and data persistence. This leads us to the following RDFox command:
When executed, the above command will immediately create an RDFox Server instance with the specified server parameters. As the server will not find any pre-existing access control settings in the specified server directory, the program will ask for the desired name and password for the first role:
Once the password has been confirmed, access control is initialized and the settings are saved to the directory. The program confirms this with the message:
Since we specified shell mode in our RDFox launch command, the same role name and password are next used to create a connection to the server within the RDFox shell:
One can check that initialization succeeded by exiting the process (using the quit command) and rerunning the same RDFox command. This time, the prompt asks the user to log in rather than to provide credentials for the first role.
The following minimal Java code is sufficient to initialize /var/lib/RDFox/data as an RDFox server directory with both role and data store persistence, exactly as was shown with the RDFox executable above.
The code first starts a local RDFox server instance with the desired parameters (line 16). Once the server is started, it is possible to check how many roles the server contains as shown on line 19. If there are no pre-existing roles, the first role is created on line 21, thus triggering the initialization of the directory.
On first run, the above code should print:
and on subsequent runs it should print:
Once initialized, an RDFox server directory contains data that may be valuable, sensitive, difficult or impossible to rebuild from other sources, or all of the above. As such, it’s important to put appropriate measures in place to keep the server directory secure.
Since RDFox does not encrypt any of the data or settings it stores, it is crucial that access to the file system containing the server directory be appropriately controlled. Role passwords are hashed using the Argon2 password-hashing algorithm and the hashes are stored directly in the directory. Although this provides some protection against password cracking attacks, the hashes should still be considered sensitive. Locating the server directory on an encrypted filesystem may also be a good option for improving operational security.
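On POSIX systems, one way to control access is to make the server directory accessible only to the account that runs RDFox. The following Java sketch assumes a POSIX filesystem; the java.nio permission calls it uses throw UnsupportedOperationException on other filesystems. It checks whether group or other users have any access to the directory and, if needed, restricts it to mode 700. The class name is hypothetical. This measure complements filesystem encryption rather than replacing it.

```java
import static java.nio.file.attribute.PosixFilePermission.*;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical helper; assumes a POSIX filesystem. */
final class ServerDirectoryPermissions {
    private static final Set<PosixFilePermission> GROUP_OR_OTHERS = EnumSet.of(
            GROUP_READ, GROUP_WRITE, GROUP_EXECUTE, OTHERS_READ, OTHERS_WRITE, OTHERS_EXECUTE);

    /** True if anyone other than the owner has any permission on the directory itself. */
    static boolean isExposed(Path dir) {
        try {
            return Files.getPosixFilePermissions(dir).stream().anyMatch(GROUP_OR_OTHERS::contains);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Sets the directory to rwx for its owner only (mode 700). Without execute permission
     * on the directory, other users cannot reach the files inside it.
     */
    static void restrictToOwner(Path dir) {
        try {
            Files.setPosixFilePermissions(dir, PosixFilePermissions.fromString("rwx------"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The owner should be the account the RDFox server runs as, since the server still needs read and write access.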
To guard against unintended data loss, the server directory should be backed up at regular intervals. RDFox version 5.4 and all earlier versions supporting data store persistence use a log format to persist data stores. The advantage is that corruption in one part of a log file does not prevent RDFox from loading the complete transactions recorded before it, which helps limit data loss in the face of certain hardware and software failures. To prevent the log from growing too large, RDFox provides the compact shell command and equivalent REST and Java APIs. These rewrite the persistence of the data store from its current state, discarding the earlier transactions accumulated in the log file. The compact operation is mandatory after RDFox has encountered corruption in the persistence of a data store. In this situation, compaction effectively resets the persistence to the state after the last uncorrupted transaction in the log file, permanently discarding everything after that point in the file.
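The recovery and compaction behaviour described above can be modelled with a toy append-only log. This Java sketch is purely illustrative and does not reproduce RDFox's on-disk format. Each record carries a CRC32 checksum. Recovery replays records until the first one that fails its check, and compaction rewrites the log as a single record holding the recovered state. Facts are assumed not to contain ';' or '#'.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

/** Toy log-based persistence; illustrative only, not RDFox's format. */
final class ToyTransactionLog {
    // Stand-in for the log file: one "payload#checksum" record per committed transaction.
    private final List<String> records = new ArrayList<>();

    private static String checksum(String payload) {
        CRC32 crc = new CRC32();
        crc.update(payload.getBytes(StandardCharsets.UTF_8));
        return Long.toHexString(crc.getValue());
    }

    /** Appends one committed transaction, given as the facts it adds. */
    void append(List<String> facts) {
        String payload = String.join(";", facts);
        records.add(payload + "#" + checksum(payload));
    }

    /** Simulates damage to the record at the given position. */
    void corrupt(int index) {
        records.set(index, records.get(index) + "garbage");
    }

    /** Replays complete transactions in order, stopping at the first corrupted record. */
    List<String> recover() {
        List<String> state = new ArrayList<>();
        for (String record : records) {
            int sep = record.lastIndexOf('#');
            if (sep < 0) {
                break;
            }
            String payload = record.substring(0, sep);
            if (!record.substring(sep + 1).equals(checksum(payload))) {
                break; // corruption: this record and everything after it are ignored
            }
            if (!payload.isEmpty()) {
                state.addAll(List.of(payload.split(";")));
            }
        }
        return state;
    }

    /** Rewrites the log from the recovered state, permanently dropping the discarded tail. */
    void compact() {
        List<String> state = recover();
        records.clear();
        append(state);
    }

    int size() {
        return records.size();
    }
}
```

If the second of three transactions is corrupted, recovery returns only the facts from the first transaction, and compaction leaves a single record containing exactly those facts.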
To support migration between incompatible persistence versions, RDFox provides the transcribe shell command. Running transcribe exports the contents of the server and writes an RDFox shell script that, when executed in the newer RDFox version, imports the exported content into the new server. To minimise the risk of data loss, the original server directory should be kept intact until the restoration has been completed and verified. The script produced by transcribe should therefore be executed against a new, clean server directory rather than the original one.
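Before running a script produced by transcribe, it may help to confirm that the target is a clean directory separate from the original. The helper below is a hypothetical sketch, not part of RDFox. It treats a missing or empty directory as clean and also tolerates a license key file named RDFox.lic, as in the example setup earlier in this article. It rejects the original directory and any directory nested inside it or containing it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical pre-restore check; not part of RDFox. */
final class RestoreTargetCheck {
    // License key file name taken from this article's example setup.
    private static final String LICENSE_FILE = "RDFox.lic";

    /** True if target does not overlap the original and is missing or holds only a license file. */
    static boolean isCleanTarget(Path original, Path target) {
        Path o = original.toAbsolutePath().normalize();
        Path t = target.toAbsolutePath().normalize();
        if (t.startsWith(o) || o.startsWith(t)) {
            return false; // same directory or nested: the original must stay untouched
        }
        if (!Files.exists(t)) {
            return true;
        }
        if (!Files.isDirectory(t)) {
            return false;
        }
        try (Stream<Path> entries = Files.list(t)) {
            return entries.allMatch(p -> p.getFileName().toString().equals(LICENSE_FILE));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```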
When you are sure you are finished with the data and settings in a given RDFox server directory, simply delete it to remove all traces of that server from your machine. RDFox does not persist any data or settings outside of the server directory on any platform, other than when an explicit export operation is performed by the user.
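Deleting a server directory amounts to a recursive delete. The Java sketch below removes children before their parents; the class name is hypothetical. It assumes the server using the directory has already been shut down, since a running server configured for persistence holds a lock on the directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper for removing a server directory that is no longer needed. */
final class ServerDirectoryRemover {
    /** Deletes dir and everything below it; assumes the server has been stopped first. */
    static void deleteRecursively(Path dir) {
        if (!Files.exists(dir)) {
            return; // nothing to delete
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // A parent path sorts before its descendants, so reverse order deletes children first.
            List<Path> ordered = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            for (Path p : ordered) {
                Files.delete(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Files.walk does not follow symbolic links by default, so a link inside the directory is removed without touching its target.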
We have learned about the RDFox server directory:
The server directory is an essential component to understand for the proper use of RDFox. Should you require more information, see the RDFox documentation or get in touch with us via the Oxford Semantic Technologies website. Both can provide further detail to help you on your journey with RDFox. As ever, for those not already doing so, we encourage you to try RDFox for yourself (for free!) and see how you can make use of the world’s most performant knowledge graph.
The team behind Oxford Semantic Technologies started working on RDFox in 2011 at the Computer Science Department of the University of Oxford with the conviction that flexible and high-performance reasoning was a possibility for data-intensive applications without jeopardising the correctness of the results. RDFox is the first market-ready knowledge graph designed from the ground up with reasoning in mind. Oxford Semantic Technologies is a spin-out of the University of Oxford and is backed by leading investors including Samsung Venture Investment Corporation (SVIC), Oxford Sciences Enterprises (OSE) and Oxford University Innovation (OUI).