Installing and Managing Virtuoso SPARQL Endpoint

Level: LOGD Related Technologies
Description: Instructions for installing, configuring, and managing Virtuoso SPARQL endpoints (community edition)

Overview

This tutorial explains how to install, configure, and manage Virtuoso Open Source Edition (VOSE) on 64-bit Linux servers. Some of the content is drawn from the VOSE documentation (TODO: add link) but is tailored to the machine setup of TWC@RPI. The tutorial also introduces shell scripts, developed by the LOGD team at RPI, for common administrative tasks (e.g., probing the status of a VOSE SPARQL endpoint, or starting, stopping, and restarting a VOSE endpoint).

Installation

Packages and source code can be downloaded from http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSDownload. For this tutorial, we use the archived package available at http://sf.net/projects/virtuoso/files. Checking out the source code from Virtuoso's CVS server is also possible; see the download page above for details.
After downloading the archived package (virtuoso-opensource-6.1.1.tar.gz), unpack it on the server where you want Virtuoso installed. A detailed guide to compiling and installing Virtuoso is available online at http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSMake. The following is a step-by-step walk-through.
  • Make sure you have all the Package Dependencies.
  • Set the compiler flags according to the hardware processor and OS of your machine.
  • Unpack the downloaded VOSE package and navigate to that folder.
  • In the command prompt, enter
    ./autogen.sh
    • This checks that the required packages are present and at the right versions.
  • Enter
    ./configure
    • By default, the target installation directories are under /usr/local, but you can specify your desired directory using:
      ./configure --prefix=/path/to/dir
  • Enter
    make
  • Enter
    make install
    • If you use the default target directory /usr/local, you need root privileges.
    • You can also specify the target directory using
      make install prefix=/path/to/dir
      Installing to a directory to which the current user has write access does not require root privileges.
If no errors occur during any of the above steps, the installation is complete.
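
The build steps above can be collected into a small shell function. This is only a sketch: the names run, build_virtuoso, and DRY_RUN are introduced here for illustration and are not part of the Virtuoso distribution, and the function assumes it is called from the unpacked source directory. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the build sequence described above.
# run, build_virtuoso and DRY_RUN are hypothetical names used for illustration.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build and install Virtuoso from the current (unpacked source) directory.
# $1: installation prefix; defaults to /usr/local, the ./configure default.
build_virtuoso() {
    prefix="${1:-/usr/local}"
    # Stop at the first step that fails.
    run ./autogen.sh &&
    run ./configure --prefix="$prefix" &&
    run make &&
    run make install
}
```

For example, after setting DRY_RUN=1, calling build_virtuoso /opt/virtuoso6 only prints the four commands. The prefix /opt/virtuoso6 is chosen to match the paths used in the rest of this tutorial.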

Administrative Tasks

Manual Start-up

The Virtuoso server instance can be started by running
/opt/virtuoso6/bin/virtuoso-t -f &
from the directory where virtuoso.ini is located. The default location of virtuoso.ini is
/opt/virtuoso6/var/lib/virtuoso/db
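
Because the server must be started from the directory that contains virtuoso.ini, it can help to wrap the command in a function that checks for the file first. The following is a minimal sketch; start_virtuoso, VIRTUOSO_BIN, and VIRTUOSO_DB are names made up for this example, with defaults taken from the paths quoted above.

```shell
#!/bin/sh
# Hypothetical variable names; the defaults are the paths quoted in this tutorial.
VIRTUOSO_BIN="${VIRTUOSO_BIN:-/opt/virtuoso6/bin/virtuoso-t}"
VIRTUOSO_DB="${VIRTUOSO_DB:-/opt/virtuoso6/var/lib/virtuoso/db}"

# Start virtuoso-t from the database directory, refusing to start
# when no virtuoso.ini is found there.
start_virtuoso() {
    if [ ! -f "$VIRTUOSO_DB/virtuoso.ini" ]; then
        echo "virtuoso.ini not found in $VIRTUOSO_DB" >&2
        return 1
    fi
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$VIRTUOSO_DB" && "$VIRTUOSO_BIN" -f & )
}
```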

Manual Shutdown

The Virtuoso server instance can be shut down using the following steps:
  • Log into the isql interactive SQL command-line environment, substituting <password> accordingly. The initial password set by Virtuoso is 'dba'.
    /opt/virtuoso6/bin/isql 1111 dba <password>
  • Execute the shutdown() function.
    SQL> shutdown();
Alternatively, the following shell command also shuts down a running Virtuoso instance:
/opt/virtuoso6/bin/isql 1111 dba <password> -K
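
The one-line shutdown command above can be wrapped as follows. This is a sketch only: stop_virtuoso, ISQL_BIN, and DRY_RUN are hypothetical names introduced for this example, and the defaults (port 1111, user dba) are simply the values used in the commands above.

```shell
#!/bin/sh
# Hypothetical variable name; the default is the isql path quoted above.
ISQL_BIN="${ISQL_BIN:-/opt/virtuoso6/bin/isql}"

# Shut down the instance listening on the given SQL port using "isql ... -K".
# $1: dba password (required); $2: port (default 1111); $3: user (default dba).
stop_virtuoso() {
    password="$1"
    port="${2:-1111}"
    user="${3:-dba}"
    if [ -z "$password" ]; then
        echo "usage: stop_virtuoso <password> [port] [user]" >&2
        return 2
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Mask the password so it never appears in dry-run output.
        echo "$ISQL_BIN $port $user ******** -K"
    else
        "$ISQL_BIN" "$port" "$user" "$password" -K
    fi
}
```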

Start-up/Shutdown Scripts

We have written command-line scripts for 64-bit Linux (CentOS 5) that start, shut down, or restart the Virtuoso server instance and SPARQL endpoint with a single command.
  • To check the status of the Virtuoso instance:
    sudo /etc/init.d/virtuosod status
  • To start the Virtuoso instance:
    sudo /etc/init.d/virtuosod start
  • To stop the Virtuoso instance:
    sudo /etc/init.d/virtuosod stop
  • To restart the Virtuoso instance:
    sudo /etc/init.d/virtuosod restart
Please note that:
  • All commands require a user account with sudo privileges.
  • Once the Virtuoso server instance has started successfully, the SPARQL endpoint immediately becomes accessible at
    http://<host>:<port>/sparql
  • Before starting the Virtuoso instance, use the 'ps' command to make sure that no live Virtuoso instance is already running from the directory /opt/virtuoso6/var/lib/virtuoso/db. Otherwise, the startup command will fail because of the file locking that Virtuoso uses.
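
The 'ps' check and the endpoint URL from the notes above can be scripted. The sketch below uses two helper names invented for this example, list_virtuoso_pids and sparql_url. The default HTTP port 8890 is an assumption (it is the port that appears in one of the documentation URLs in this tutorial); use the port configured in your virtuoso.ini.

```shell
#!/bin/sh
# Hypothetical helpers for the pre-start check and the post-start probe.

# Read "PID COMMAND" lines on stdin and print the PIDs of virtuoso-t processes.
list_virtuoso_pids() {
    awk '$2 == "virtuoso-t" { print $1 }'
}

# Build the SPARQL endpoint URL for a host and HTTP port.
# The defaults (localhost, 8890) are assumptions, not values from this tutorial.
sparql_url() {
    host="${1:-localhost}"
    port="${2:-8890}"
    echo "http://$host:$port/sparql"
}
```

A possible use is:
    ps -e -o pid= -o comm= | list_virtuoso_pids
    curl -s -o /dev/null -w '%{http_code}\n' "$(sparql_url localhost 8890)"
The first command prints nothing when no virtuoso-t process is running. Note that it does not tell you which database directory a running instance uses. The second command prints the HTTP status code returned by the endpoint.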

Loading Triples

We have written command-line utility scripts for loading triples in different formats into a named graph in the Virtuoso triple store. The scripts are hosted on Google Code and are installed on LOGD at
/opt/virtuoso/scripts
Newer, forked versions of the scripts are available on GitHub. The supported formats are:
  • RDF/XML
  • Turtle
  • N-Triples
  • N-Quads
Please follow these steps to load a data file (in any of the formats above) into a named graph:
  • Change directory to where the scripts are located.
    cd /opt/virtuoso/scripts
  • Run the script vload with exactly three arguments:
    • format: [rdf | ttl | nt | nq], corresponding to RDF/XML, Turtle, N-Triples, and N-Quads, respectively.
    • data_file: path to the raw data file.
    • graph_uri: URI of the named graph into which the triples should be loaded.
    sudo ./vload nt /path/to/data/file/data-1554.nt http://data-gov.tw.rpi.edu/vocab/Dataset_1554
  • Wait until the loading finishes. Depending on the size of the dataset, this might take anywhere from several seconds to several hours.
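
When several files go into the same named graph, the vload call can be repeated in a loop. The sketch below is an illustration only. The helper names format_for_file and load_all are made up, and the mapping from file extension to format code (.rdf, .ttl, .nt, .nq) is an assumption about how your files are named. It expects to be run from /opt/virtuoso/scripts, like the vload command above.

```shell
#!/bin/sh
# Hypothetical helpers around the vload script described above.

# Map a file name to a vload format code based on its extension.
# The extension convention is an assumption; it returns 1 for unknown extensions.
format_for_file() {
    case "$1" in
        *.rdf) echo rdf ;;
        *.ttl) echo ttl ;;
        *.nt)  echo nt ;;
        *.nq)  echo nq ;;
        *)     return 1 ;;
    esac
}

# Load every listed file into one named graph.
# $1: graph URI; remaining arguments: data files.
# With DRY_RUN=1 the vload commands are printed instead of executed.
load_all() {
    graph_uri="$1"
    shift
    for f in "$@"; do
        if ! fmt=$(format_for_file "$f"); then
            echo "skipping $f: unknown format" >&2
            continue
        fi
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "sudo ./vload $fmt $f $graph_uri"
        else
            sudo ./vload "$fmt" "$f" "$graph_uri"
        fi
    done
}
```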

Deleting Named Graphs

There is a utility command for deleting a specific named graph from the triple store. It is located at
/opt/virtuoso/scripts
It takes a single argument: the URI of the named graph to be deleted. For example, to delete all the triples in the named graph <http://data-gov.tw.rpi.edu/vocab/Dataset_1554>, use the following command:
sudo ./vdelete http://data-gov.tw.rpi.edu/vocab/Dataset_1554
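
Since deleting a graph cannot be undone from the command line, a thin guard around vdelete can catch obvious typos before anything is removed. This is a sketch under assumptions: delete_graph and DRY_RUN are hypothetical names, and the check (an http or https URI without spaces) is only a sanity test chosen for this example, not a rule imposed by vdelete itself.

```shell
#!/bin/sh
# Hypothetical guard around the vdelete script described above.
# $1: URI of the named graph to delete.
delete_graph() {
    graph_uri="$1"
    case "$graph_uri" in
        *" "*)
            echo "refusing to delete: URI contains a space" >&2
            return 2 ;;
        http://?*|https://?*)
            ;;  # looks plausible, continue
        *)
            echo "refusing to delete: '$graph_uri' is not an http(s) URI" >&2
            return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "sudo ./vdelete $graph_uri"
    else
        sudo ./vdelete "$graph_uri"
    fi
}
```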

Performance Tuning

Online documentation on tuning VOSE for better performance is available, for example at http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtRDFPerformanceTuning and http://plato.cs.rpi.edu:8890/doc/html/rdfperformancetuning.html. Generally, setting some of the parameters in the virtuoso.ini file to suitable values improves performance both for loading large datasets and for query evaluation. The following parameters in virtuoso.ini are worth reviewing:
  • ServerThreads
    • The maximum number of threads used by the server. It should be set close to the number of concurrent connections if heavy usage is expected. A value of 100 should work on most systems.
  • O_DIRECT
    • This may be useful if a large fraction of RAM is configured as database buffers. If this is on, the file system cache will not grow at the expense of the database process; for example, the memory that Virtuoso uses for its own database buffers is less likely to be swapped out.
  • NumberOfBuffers
    • This controls the amount of RAM that Virtuoso uses to cache database files. It has a critical impact on performance, so the value should be fairly high for large databases. Exceeding physical memory with this setting has a significant negative impact. For a database-only server, about 65% of available RAM could be configured for database buffers. Note also that each buffer takes about 8700 bytes (see http://docs.openlinksw.com/virtuoso/dbadm.html for details about the size of each buffer).
  • CompileProceduresOnStartup
    • Setting this to 0 speeds up Virtuoso startup, because stored procedures are not loaded until the first time they are called.
  • FDsPerFile
    • The number of file descriptors per file to obtain from the OS. This parameter only affects databases that use striping. Having multiple file descriptors per file means that as many concurrent I/O operations may be pending per file at the same time. This gives the OS more flexibility to schedule the operations, potentially improving file I/O throughput.
  • ResultSetMaxRows
    • This setting limits the number of rows in a result set. Adjusting its value can help mitigate denial-of-service (DoS) attacks.
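
Before changing any of the parameters listed above, it is useful to see their current values. The following sketch prints the active lines for these parameters from a virtuoso.ini file. The function name show_tuning_params is made up for this example, the default path is the one quoted earlier in this tutorial, and the pattern assumes that commented-out settings start with ';', so they are skipped.

```shell
#!/bin/sh
# Hypothetical helper: print the current values of the parameters listed above.
# $1: path to virtuoso.ini (defaults to the location quoted in this tutorial).
show_tuning_params() {
    ini="${1:-/opt/virtuoso6/var/lib/virtuoso/db/virtuoso.ini}"
    # Match only active "Name = value" lines; lines starting with ';' do not match.
    grep -E '^[[:space:]]*(ServerThreads|O_DIRECT|NumberOfBuffers|CompileProceduresOnStartup|FDsPerFile|ResultSetMaxRows)[[:space:]]*=' "$ini"
}
```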
In our experience so far, on a 64-bit Linux machine with 8 CPU cores (2 quad-core processors) and 32 GB of memory, setting the NumberOfBuffers parameter to about 2,400,000 (32959832*0.6/8 ≈ 2,470,000) improves performance significantly.
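
The arithmetic behind this value can be reproduced with a small function. It is a sketch under stated assumptions: we read 32959832 as the machine's total memory in KB and the divisor 8 as roughly 8 KB per buffer, and the name number_of_buffers is introduced only for this example. The percentage is passed as an integer so that plain shell arithmetic can be used.

```shell
#!/bin/sh
# Hypothetical helper: estimate NumberOfBuffers from total memory.
# $1: total memory in KB (assumed unit); $2: percent of RAM for buffers (default 60).
# Assumes about 8 KB per buffer, matching the divisor in the formula above.
number_of_buffers() {
    mem_kb="$1"
    percent="${2:-60}"
    echo $(( mem_kb * percent / 100 / 8 ))
}
```

On Linux, the total memory in KB can be read from /proc/meminfo, for example:
    number_of_buffers "$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)" 60
Treat the result as an upper bound for the chosen percentage and round it down as needed.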

See also

http://tw.rpi.edu/web/inside/endpoints/installing-virtuoso