Category Archives: Results

GeoKnow Research Contributions

Geospatial information extraction and management

One of the main contributions of GeoKnow was to take geospatial data out of GIS systems and make it accessible on the Web. This opens access to non-experts and provides self-explanatory spatial information structures reachable via standard Web protocols, with support for ad hoc, flexibly definable information structures. GeoKnow developed and improved Sparqlify and TripleGeo. These tools transform geospatial data from several conventional formats into RDF triples compliant with several standards (GeoSPARQL, the Virtuoso vocabulary, etc.). Sparqlify has been tested for mapping the OpenStreetMap (OSM) database to RDF, and TripleGeo supports several geospatial databases (PostgreSQL, Oracle, MySQL, IBM DB2) as well as file formats (ESRI shapefiles, GML, KML).
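To make the transformation concrete, here is a minimal sketch of the kind of conversion such tools perform: turning a conventional geospatial record into RDF triples carrying a GeoSPARQL WKT literal. This is not TripleGeo's or Sparqlify's actual code; the example URI and label are illustrative assumptions.

```python
# Hypothetical sketch of a format-to-RDF conversion: one point feature
# becomes N-Triples with a GeoSPARQL wktLiteral. URIs are illustrative.

GEO = "http://www.opengis.net/ont/geosparql#"

def record_to_triples(uri, name, lon, lat):
    """Emit N-Triples lines for one point feature."""
    wkt = f"POINT({lon} {lat})"
    return [
        f'<{uri}> <http://www.w3.org/2000/01/rdf-schema#label> "{name}" .',
        f'<{uri}> <{GEO}asWKT> "{wkt}"^^<{GEO}wktLiteral> .',
    ]

triples = record_to_triples("http://example.org/poi/1", "Leipzig Hbf", 12.382, 51.3455)
for t in triples:
    print(t)
```

The real tools additionally handle schema mappings, multiple geometry types and bulk input, but the output shape is the same: one resource per feature plus a typed WKT literal.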

As the need for location data within Linked Data applications has increased, it has accordingly become a requirement for RDF triple stores to support multiple geometries at Web scale. At the beginning of the GeoKnow project the Virtuoso RDF QUAD store only supported the Point geometry type; during the course of the project it was enhanced to support some 14 additional geometry types (Pointlist, Ring, Box, Linestring, Multilinestring, Polygon, Multipolygon, Collection, Curve, Closedcurve, Curvepolygon and Multicurve) and associated functions, with near-full compliance with the GeoSPARQL/OGC standards now in place. Support for the GEOS (Geometry Engine – Open Source) library has been implemented in Virtuoso, further enhancing its geospatial capabilities.

The Virtuoso query optimiser has been enhanced to improve geospatial query performance, including parallelisation of query execution. In addition, improvements have been made to RDF storage optimisation, reorganising physical data placement according to geospatial properties and implementing structure-aware RDF storage using Characteristic Sets. Over the course of the project, annual benchmarking of the Virtuoso QUAD store was performed with the GeoBench tool to demonstrate improvements in the state of the art of geospatial querying. In the final report, Virtuoso was benchmarked using a newer OSM dataset. The Description of Work envisioned scalability up to 25 billion triples with query times below 0.5 s; the presented results used more than 50 billion triples with an average query execution time of 0.46 s for a power run, so the results exceeded expectations.

Spatial knowledge aggregation and fusing

The GeoKnow project also aimed at enriching the Web of Data with the geospatial dimension, so it contributed interlinking and fusing methods adapted to spatial information. Two of the first tools achieving this are the Data Extraction and Enrichment Framework (DEER), formerly known as GeoLift, and LIMES. DEER adds the spatial dimension to a dataset by describing locations found in its links or in unstructured data (using an NLP component). LIMES was extended to enable linking of complex resources in geospatial datasets (e.g., polygons, line-strings, etc.). Furthermore, these geo-linking tools were extended to scale using map-reduce algorithms in a cloud-based architecture. For evaluating these developments, corresponding benchmarks were created (see the benchmarking section). One experimental benchmark consisted of linking cities from DBpedia and LinkedGeoData. Initial results suggested that by using the geospatial dimension and a mean distance when linking datasets, perfect linking accuracy could be achieved. This research received the Best Research Paper award at ESWC 2013.
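The distance-based linking idea can be sketched in a few lines: link two city records when their coordinates lie within a distance threshold. This toy example only mirrors the spirit of LIMES; the datasets, identifiers and the 5 km threshold are invented for illustration.

```python
import math

# Toy geospatial link discovery: link a DBpedia-style record to a
# LinkedGeoData-style record when the great-circle distance between
# their WGS84 coordinates is below a threshold. All data is made up.

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

dbpedia = {"Leipzig": (51.3397, 12.3731), "Berlin": (52.5200, 13.4050)}
lgd = {"node1": (51.3406, 12.3747), "node2": (48.1374, 11.5755)}  # Leipzig, Munich

links = []
for a, (alat, alon) in dbpedia.items():
    for b, (blat, blon) in lgd.items():
        if haversine_km(alat, alon, blat, blon) < 5.0:
            links.append((a, b))
# Only the Leipzig pair is within 5 km, so only it gets linked.
```

The real tool adds declarative link specifications, richer similarity measures for polygons and line-strings, and the map-reduce scaling mentioned above.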
For working directly with geometries, the FAGI framework was created for fusing different RDF representations of geometries into a consistent map. The tool receives as input two datasets and a set of links that interlink entities between them, and produces a new dataset where each pair of linked entities is fused into a single entity. The fusion is performed for each pair of matched properties between two linked entities, according to a selected fusion action, and considers both spatial and non-spatial properties (metadata). Fusing geospatial data can be very time consuming, so improvements were proposed and implemented to optimise several processes (focusing on minimising data transfer and exploiting graph-joining functionality), and a benchmark was designed to evaluate those improvements. FAGI was also extended with additional functionality that supports exploration, manual authoring, several options for batch fusion actions, link discovery, and learning mechanisms for recommending fusion actions and annotation classes for fused entities.
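The property-wise fusion described above can be sketched as follows. This is a minimal stand-in, not FAGI's actual API: the action names, the entity dicts and the fusion plan are illustrative assumptions.

```python
# Minimal sketch of property-wise fusion: each matched property of a
# linked entity pair is fused according to a chosen fusion action.

ACTIONS = {
    "keep-left":  lambda a, b: a,
    "keep-right": lambda a, b: b,
    "keep-both":  lambda a, b: sorted({a, b}),
}

def fuse(left, right, plan):
    """Fuse two property dicts; `plan` maps property -> action name."""
    fused = {}
    for prop, action in plan.items():
        fused[prop] = ACTIONS[action](left.get(prop), right.get(prop))
    return fused

a = {"name": "Voelkerschlachtdenkmal",
     "geometry": "POLYGON((12.41 51.31, 12.42 51.31, 12.42 51.32, 12.41 51.31))"}
b = {"name": "Monument to the Battle of the Nations",
     "geometry": "POINT(12.413 51.312)"}
result = fuse(a, b, {"name": "keep-both", "geometry": "keep-left"})
```

In the real framework the action set is much richer for geometries (e.g. keeping the more detailed representation), and actions can be recommended by the learning mechanisms mentioned above.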

Quality Assessment

GeoKnow also worked on tools to improve the quality of existing datasets. The OSM community constantly enriches and enhances OSM maps, and GeoKnow provided tools to support this work. GeoKnow contributed to improving the quality of user-provided annotations by generating classification and clustering models that recommend categories for new entities inserted into OSM. OSMRec, the tool developed for this purpose, recommends OSM categories for newly created geospatial entities based on already existing annotated entities in OSM. Other quality assessment of geospatial data was also investigated, first by identifying metrics that can be used to assess the data with respect to aspects such as coverage, surface area and structuredness. These metrics were used to evaluate community-generated datasets, and their outcome informed two software tools for assessing dataset quality. CROCUS produces statistics about the data as three Data Cubes: the first addresses accuracy, while the second and third address the completeness and consistency of spatial data. The GeoKnow Quality Evaluator (GQE) reuses CROCUS and implements a set of geospatial data quality metrics (e.g., dataset coverage, coherence, average polygons per class, etc.) to compare datasets across these metrics. These tools were used to evaluate three datasets: LinkedGeoData, NUTS and GeoLinkedData. The results from this evaluation helped in understanding the overall structure of the datasets and the variety of the data. Another data assessment tool created in GeoKnow was the RDF Data Validation Tool, which is based on the integrity constraints defined by the RDF Data Cube vocabulary and focuses on statistical data.

Visualisation and Data Curation

The exploration and visualisation of data is a crucial task for end users. GeoKnow aimed at creating maps that are dynamically enriched and adapted to the needs of specific user communities, so modern software frameworks were explored to support the creation of such interfaces. GeoKnow developed reusable JavaScript libraries for interfacing with SPARQL endpoints. These libraries were used, for instance, in Mappify, a tool for easily generating and sharing maps as widgets, and Facete, a faceted browser for RDF spatial and non-spatial data enhanced with editing support. The editing capabilities consist basically of defining the interaction between an endpoint and the UI (Facete). The RDF Edit eXtension (REX) tool interface supports two kinds of data editing: one dealing with geospatial data on a map, the other for editing triples. Furthermore, Lodtenant was developed to support curating RDF data by means of workflows realised as batch processes. After the data curation process, one may need to save changes for later use or propagate them to other datasets. One of the Unister requirements was the capability to manage and synchronise changes between different versions of private and public interlinked datasets (see http://svn.aksw.org/projects/GeoKnow/Public/D4.3.1_Concept_for_public-private_Co-Evolution.pdf). This requirement led to the Co-Evolution Service component, a web application with a REST interface that allows managing dataset changes.
Another visualisation component was developed for spatio-temporal data: Exploratory Spatio-Temporal Analysis of Linked Data (ESTA-LD), a tool for spatio-temporal analysis of linked statistical data that appear at different levels of granularity. Finally, mobile-based visualisation was also covered in GeoKnow. The GEM application supports faceted browsing that fully exploits the Linked Open Data paradigm. This tool allows browsing any number of SPARQL endpoints and filtering resources based on their type and constraints on properties, as well as leveraging GPS positioning to deliver semantic routing.

GeoKnow Generator Workbench

The GeoKnow Generator Workbench provides a unified interface and data access to most of the tools described earlier in this section, and is available online for testing. It enables simple access to, and interaction with, the different components needed in the Linked Data lifecycle. The Workbench was designed according to the requirements specification of the GeoKnow use cases. In general, these requirements include:

  • Scalability for working with large data sets
  • Authentication, Authorisation and Role Management as a primary requirement in companies
  • Data Provenance tracking for traceability of changes
  • Job Monitoring and Robustness for applicability in production
  • Modularity and Composability in order to provide flexibility w.r.t. integrating linked data tools

FAGI-gis: fusing geospatial RDF data

GeoKnow introduces the latest version of FAGI-gis, a framework for fusing Linked Data that focuses on the geospatial properties of the linked entities. FAGI-gis receives as input two datasets (through SPARQL endpoints) and a set of links that interlink entities between the datasets, and produces a new dataset where each pair of linked entities is fused into a single entity. Fusion is performed for each pair of matched properties between two linked entities, according to a selected fusion action, and considers both spatial and non-spatial properties.

The tool supports an interactive interface, offering visualization of the data at every part of the process. Especially for spatial data, it provides map previewing and graphical manipulation of geometries. Further, it provides advanced fusion functionality through batch mode fusion, clustering of links, link discovery/creation, property matching, property creation, etc.

As the first step of the fusion workflow, the tool allows the user to select and filter the interlinked entities (using the classes they belong to or SPARQL queries) to be loaded for further fusion. Then, at the schema matching step, a semi-automatic process facilitates the mapping of entity properties from one dataset to the other. Finally, the fusion panel allows the map-based manipulation of geometries and the selection from a set of fusion actions in order to produce a final entity, where each pair of matched properties is fused according to the most suitable action.

The above process can be enriched with several advanced fusion facilities. The user can cluster the linked entities according to the way they are interlinked, so as to apply different fusion actions to different clusters of linked entities. Moreover, the user can load unlinked entities and receive recommendations of candidate entities to interlink. Finally, by training on past fusion actions and on OpenStreetMap data, FAGI-gis can recommend suitable fusion actions and OSM categories (classes), respectively, for pairs of fused entities.

FAGI-gis is provided as free software and its current version is available from GitHub. An introductory user guide is also available. More detailed information on FAGI-gis is provided in the following documents:


 

GeoKnow Public Datasets

In this blog post we present three public datasets that were improved or created in the GeoKnow project.

LinkedGeoData
Size: 177GB zipped turtle file
URL: http://linkedgeodata.org/

LinkedGeoData is the RDF version of OpenStreetMap (OSM), which covers geospatial information for the entire planet. As of September 2014 the zipped XML file from OSM was 36 GB, while the zipped LGD files in Turtle format total 177 GB. A detailed description of the dataset can be found in the D1.3.2 Continuous Report on Performance Evaluation. Technically, LinkedGeoData is a set of SQL files, database-to-RDF (RDB2RDF) mappings, and bash scripts. The actual RDF conversion is carried out by the SPARQL-to-SQL rewriter Sparqlify; the Sparqlify mappings for LinkedGeoData are available online. The maintenance and improvement of the mappings required to transform OSM data to RDF has been ongoing throughout the project. This dataset has been used in several use cases, and especially in all benchmarking tasks within GeoKnow.

Wikimapia
URL: http://wikimapia.org/api/

Wikimapia is a crowdsourced, open-content, collaborative mapping initiative where users can contribute mapping information. This dataset already existed before the project started; however, it was only accessible through Wikimapia’s API and provided in XML or JSON formats. Within GeoKnow, we downloaded several sets of geospatial entities from Wikimapia, including both spatial and non-spatial attributes for each entity, and transformed them into RDF data. The process we followed is described next. We considered a set of cities throughout the world (Athens, London, Leipzig, Berlin, New York) and downloaded the whole content provided by Wikimapia for the geospatial entities in those geographical areas. These cities were preferred since they are the base cities of several partners in the project, while the remaining two cities were randomly selected, with the aim of reaching our target of more than 100,000 spatial entities from Wikimapia. Apart from geometries, Wikimapia provides a very rich set of metadata (non-spatial properties) for each entity (e.g. tags and categories describing the geospatial entities, topological relations with nearby entities, user comments, etc.). The aforementioned dumps were transformed into RDF triples in a straightforward way: (a) defining intermediate resources (functioning as blank nodes) where information was organised in more than one level, (b) flattening the information of deeper levels where possible in order to simplify the structure of the dataset, and (c) transforming tags into OWL classes. Specifically, we developed a parsing tool to communicate with the Wikimapia API and construct appropriate N-Triples from the dataset. The tool takes as input a bounding box in the form of WGS84 coordinates (min long, min lat, max long, max lat). We chose five initial bounding boxes, one for each of the cities mentioned above. Each bounding box was defined so that it covered the whole area of the selected city.
Each bounding box was then further divided by the tool into a grid of smaller bounding boxes in order to overcome the upper limit per area on entities returned by the Wikimapia API. For each place returned, we transformed all properties into RDF triples. Every tag was assigned an OWL class and an appropriate label, corresponding to the textual description in the initial Wikimapia XML file, and each place became an instance of the classes provided by its tags. For the rest of the returned Wikimapia attributes, we created a custom property in a uniform way for each attribute of the returned Wikimapia XML file; these properties point to their literal values. For example, we construct properties for each place’s language id, Wikipedia link, URL link, title, description, edit info, location info, global administrative areas, available languages and geometry information. Where these attributes follow a deeper tree structure, we assign the properties to intermediate custom nodes by concatenating the property with the place ID; these nodes function as blank nodes and connect the initial entity with a set of properties and their respective values. This process resulted in an initial geospatial RDF dataset containing, for each entity, the polygon geometry that represents it, along with a wealth of non-spatial properties. The dataset contains 102,019 geospatial entities and 4,629,223 triples.
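The grid subdivision step can be sketched as follows: one WGS84 bounding box (min long, min lat, max long, max lat) is split into an n×n grid of smaller boxes, so each API request stays under the per-area result limit. The grid size and the Athens coordinates are assumptions for illustration.

```python
# Sketch of bounding-box subdivision: split one (min_lon, min_lat,
# max_lon, max_lat) box into an n x n grid of smaller boxes.

def split_bbox(min_lon, min_lat, max_lon, max_lat, n):
    dlon = (max_lon - min_lon) / n
    dlat = (max_lat - min_lat) / n
    return [
        (min_lon + i * dlon, min_lat + j * dlat,
         min_lon + (i + 1) * dlon, min_lat + (j + 1) * dlat)
        for i in range(n) for j in range(n)
    ]

# Roughly the bounding box of Athens, split into a 4x4 grid:
cells = split_bbox(23.60, 37.90, 23.85, 38.10, 4)
```

In practice the subdivision can be made adaptive (split a cell again when it still exceeds the API limit), but a fixed grid conveys the idea.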
On top of that, in order to create a synthetically interlinked pair of datasets, we split the Wikimapia RDF dataset, duplicating the geometries and dividing them between the two datasets in the following way. For each polygon geometry, we created a point geometry located at the centroid of the polygon and then shifted the point by a random (but bounded) factor. The polygon was left in the first dataset while the point was transferred to the second dataset. The rest of the properties were distributed between the two datasets as follows: the first dataset consists of metadata containing the main information about the Wikimapia places and edit information about users, timestamps, deletion state and editors; the second dataset consists of metadata concerning basic info, location and language information. This way, the two sub-datasets essentially refer to the same Wikimapia entities, differing only in geometric and metadata information. Each of the two sub-datasets contains 102,019 geospatial entities; the first contains 1,225,049 triples and the second 4,633,603 triples.
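The geometry-splitting step above can be sketched in a few lines: compute a centroid for the polygon (here a simple vertex average, which is adequate for small, regular shapes) and shift it by a random but bounded offset. The 0.001-degree bound is an illustrative assumption, not the value used in the actual dataset.

```python
import random

# Sketch of the centroid-and-shift step: derive a point geometry from a
# polygon and perturb it by a bounded random offset (in degrees).

def shifted_centroid(polygon, bound=0.001, rng=random):
    """polygon: list of (lon, lat) vertices; returns a shifted point."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    return (cx + rng.uniform(-bound, bound),
            cy + rng.uniform(-bound, bound))

poly = [(23.72, 37.97), (23.73, 37.97), (23.73, 37.98), (23.72, 37.98)]
pt = shifted_centroid(poly)
```

A true area-weighted polygon centroid differs from the vertex average for irregular shapes; the sketch keeps the simpler form for clarity.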

Seven Greek INSPIRE-compliant data themes of Annex I
URL: http://geodata.gov.gr/sparql/

For the INSPIRE-to-RDF use case, we selected seven data themes from Annex I, which are described in the table below. Although all metadata in geodata.gov.gr is fully compatible with INSPIRE regulations, the data itself is not, because it has been integrated from several diverse sources which have rarely followed the proper standards. Thus, due to data variety, provenance, and excessive volume, its transformation into INSPIRE-compliant datasets is a time-consuming and demanding task. The first step was the alignment of the data to INSPIRE Annex I. To this end, we utilised the Humboldt Alignment Editor, a powerful open-source tool with a graphical interface and a high-level language for expressing custom alignments. Such a transformation can be used to turn a non-harmonised data source into an INSPIRE-compliant dataset; it only requires a source schema (an .xsd for the local GML file) and a target one (an .xsd implementing an INSPIRE data schema). As soon as the schema mapping was defined, the source GML data was loaded and the INSPIRE-aligned GML file was produced. The second step was the transformation into RDF. This process was quite straightforward, given a set of suitable XSL stylesheets. We developed all these transformations in XSLT 2.0, implementing one parametrised stylesheet per selected data theme. By default, all geometries were encoded in WKT serialisations according to GeoSPARQL. The produced RDF triples were finally loaded and made available in both the Virtuoso and Parliament RDF stores, at http://geodata.gov.gr/sparql, as a proof of concept.

INSPIRE Data Theme | Greek dataset | Features | Triples
[GN] Geographical names | Settlements, towns, and localities in Greece | 13,259 | 304,957
[AU] Administrative units | All Greek municipalities after the most recent restructuring (“Kallikratis”) | 326 | 9,454
[AD] Addresses | Street addresses in Kalamaria municipality | 10,776 | 277,838
[CP] Cadastral parcels | Building blocks in Kalamaria (data from the official Greek Cadastre are not available through geodata.gov.gr) | 965 | 13,510
[TN] Transport networks | Urban road network in Kalamaria | 2,584 | 59,432
[HY] Hydrography | All rivers and water streams in Greece | 4,299 | 120,372
[PS] Protected sites | All areas of natural preservation in Greece according to the EU Natura 2000 network | 419 | 10,894
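The GML-to-RDF step described above was implemented with XSLT 2.0 stylesheets. As a rough illustration of the underlying idea, here is a Python sketch that extracts a gml:pos from a tiny GML fragment and emits a GeoSPARQL WKT triple; the element layout, namespace usage and feature URI are simplified assumptions, not the project's actual schema.

```python
import xml.etree.ElementTree as ET

# Illustrative GML -> RDF conversion: pull the coordinates out of a
# minimal GML fragment and emit one GeoSPARQL triple in N-Triples form.

GML = "{http://www.opengis.net/gml}"
GEO = "http://www.opengis.net/ont/geosparql#"

fragment = """
<feature xmlns:gml="http://www.opengis.net/gml" id="river42">
  <gml:Point><gml:pos>38.0 23.7</gml:pos></gml:Point>
</feature>
"""

root = ET.fromstring(fragment)
lat, lon = root.find(f"{GML}Point/{GML}pos").text.split()
# GML gives "lat lon"; WKT expects "lon lat".
triple = (f'<http://example.org/hy/{root.get("id")}> '
          f'<{GEO}asWKT> "POINT({lon} {lat})"^^<{GEO}wktLiteral> .')
```

The actual stylesheets additionally handle polygons, curves and the INSPIRE property model, one parametrised stylesheet per data theme.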

DEER at ESWC 2015

GeoKnow was present at the Extended Semantic Web Conference (which took place in Portoroz, Slovenia) in many ways. In particular, we presented a novel approach for automating RDF dataset transformation and enrichment in the machine learning track of the main conference. The presentation was followed by a constructive discussion of results and insights, plus hints pertaining to further possible applications of the approach.

GeoKnow presented semantic search approach with geospatial context at KNOW@LOD workshop on ESWC

One of the key questions concerning the use of Linked Data is search. Search-driven applications are widespread, for example in the e-commerce industry and in business information systems. Hence, GeoKnow is also aiming at improving semantic search components, particularly for geospatial data. Within the GeoKnow consortium, the partner Unister, active as a B2C service provider, is focusing on this topic in the last year of the project.
At the 12th instance of the Extended Semantic Web Conference (ESWC 2015), one result of this research was presented at the well-organised 4th Workshop on Knowledge Discovery and Data Mining meets Linked Open Data (KNOW@LOD). The presented paper is titled “Computing Geo-Spatial Motives from Linked Data for Search-driven Applications”. It considers geo-spatial motives within search queries (e.g., “winter holiday”, “cultural”, “sports activities”) that cannot be answered by a data instance itself, but require interpreting the available data to discover relevant regions (e.g., populated places having a sufficient number of cultural hotspots nearby).

GeoKnow presented paper at WASABI workshop on ESWC

Data-driven processes are the focus of the International Workshop on Semantic Web Enterprise Adoption and Best Practice (WASABI 2015) at the 12th Extended Semantic Web Conference (ESWC 2015). At the 3rd instance of the workshop, Andreas Both presented, on behalf of the GeoKnow consortium, the GeoKnow Generator Workbench, a key outcome of our project.
The GeoKnow Generator is a stack of components that covers several steps in the Linked Data lifecycle; these components have been integrated in the Generator Workbench. To demonstrate its usability we also presented four real-world use cases where the Generator and Workbench are used, dedicated to the verticals e-commerce, logistics, e-government and the automotive industry.

OSMRec – A tool for automatic annotation of spatial entities in OpenStreetMap

GeoKnow has recently introduced OSMRec, a JOSM plugin for automatic annotation of spatial features (entities) in OpenStreetMap. OSMRec trains on existing OSM data and is able to recommend OSM categories to users, in order to annotate newly inserted spatial entities. This is important for two reasons. First, users may not be familiar with the OSM categories; searching and browsing the OSM category hierarchy to find appropriate categories for the entity they wish to insert can often be a time-consuming and frustrating process, to the point of users neglecting to add this information. Second, if an already existing category that matches the new entity cannot be found quickly and easily (although it exists), the user may resort instead to using his/her own term, resulting in synonyms that later need to be identified and dealt with.

The category recommendation process takes into account the similarity of new spatial entities to already existing (and category-annotated) ones at several levels: spatial similarity (e.g. the number of nodes of the feature’s geometry), textual similarity (e.g. common important keywords in the names of the features) and semantic similarity (similarities in the categories that characterise already annotated entities). For each level (spatial, textual, semantic) we define and implement a series of training features that represent spatial entities in a multidimensional space. By training on these features, we are able to correlate their values with the categories of the spatial entities and, consequently, recommend categories for new features. To this end, we apply multiclass SVM classification using LIBLINEAR.
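The vector-space formulation can be illustrated with a toy example: represent an entity by a small numeric vector (geometry-node count plus a few keyword indicators) and recommend the category whose training centroid is nearest. OSMRec itself uses multiclass SVMs via LIBLINEAR; this nearest-centroid stand-in, with invented features and training data, only mirrors the idea of correlating feature values with categories.

```python
# Toy category recommendation: entities become feature vectors, and the
# nearest category centroid wins. Features and data are invented.

KEYWORDS = ["station", "park", "school"]

def to_vector(n_nodes, name):
    """Spatial feature (node count) + textual keyword indicators."""
    words = name.lower()
    return [float(n_nodes)] + [1.0 if k in words else 0.0 for k in KEYWORDS]

def centroid(vectors):
    return [sum(c) / len(vectors) for c in zip(*vectors)]

training = {
    "railway=station": [to_vector(1, "Central Station"), to_vector(1, "West Station")],
    "leisure=park":    [to_vector(24, "City Park"), to_vector(31, "River Park")],
}
centroids = {cat: centroid(vs) for cat, vs in training.items()}

def recommend(n_nodes, name):
    v = to_vector(n_nodes, name)
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda cat: dist(v, centroids[cat]))

best = recommend(28, "Memorial Park")  # a 28-node polygon named like a park
```

An SVM replaces the centroid comparison with learned separating hyperplanes, but the pipeline (featurise, train, score new entities) is the same.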

The following figure shows a screen of OSMRec within JOSM. The user can select an entity or draw a new entity on the map and ask for recommendations by clicking the “Add Recommendation” button. The recommendation panel opens and the plugin automatically loads the appropriate recommendation model, which has previously been trained offline.

Recommendation panel

The recommendation panel provides a list of the top-10 recommended categories; the user can select from this list and click “Add and continue”, which adds the selected category to the OSM tags. When the user adds a new tag to the selected object, a new vector is computed for that OSM instance in order to recalculate the predictions and display an updated list of recommendations (taking the previously selected categories/tags into account as extra training information). Further, OSMRec allows the user to combine several recommendation models, based on (a) a selected geographic area, (b) the user’s past editing history on OSM, or (c) a combination of (a) and (b). This way, personalised category recommendations can be provided that take into account the user’s editing history and/or the specific characteristics of a geographic area of OSM.

The OSMRec plugin can be downloaded and installed in JOSM following the standard procedure. Detailed implementation information can be found in the following documents:

The GeoKnow Generator Workbench v1.1.0 Release Announcement

To demonstrate the GeoKnow software tools, we are developing a Workbench that integrates different components to support users in generating Linked Data out of spatial data. Several tools can be used for transforming, authoring, interlinking and visualising spatial data as Linked Data. In this post we introduce the public release of the GeoKnow Generator Workbench, which implements most of the user requirements collected at the beginning of the project and integrates Virtuoso, LIMES, TripleGeo, GeoLift, FAGI-gis, Mappify, Facete and Sparqlify.

The Workbench also provides Single Sign-On functionality, user and role management, and data access control for different users. The Workbench comprises a front end and a back end. The front end provides GUIs for software components where a REST API is available (LimesService, GeoLiftService and TripleGeoService); components that provide their own GUI are integrated using containers (FAGI-gis, OntoWiki, Mappify, Sparqlify and the Virtuoso SPARQL query interface). The front end also provides GUIs for administrative features such as user and role management, data source management and graph management, as well as the Dashboard, which gives the user visual feedback on registered jobs and the status of their executions. The back end provides REST interfaces for managing users, roles, graphs, data sources and batch jobs, for retrieving the system configuration, and for importing RDF data. All system information is stored in the Virtuoso RDF store.

Generator Workbench Architecture.


A more detailed description of this Workbench can be found in the GeoKnow D1.4.2 Intermediate release of the GeoKnow Generator. The GeoKnow software, including this Workbench, is open source and available on GitHub. Easy-to-install, preconfigured versions of all GeoKnow software are available as Debian packages in the Linked Data Stack Debian repository.

GeoKnow Generator 2nd Year Releases

The second year of GeoKnow has passed and we have several new releases to announce. Among the new software tools are:

FAGI-gis 1.1+rev0
FAGI aims to provide data fusion on the geometries of linked entities. This latest version provides several optimisations that increase scalability and efficiency. It also provides a map-based interface for facilitating fusion actions through visualisation and filtering of linked entities.
RDF Data Cube Validation Tool 0.0.1
The validation tool aims to ensure the quality of statistical datasets. It is based primarily on the integrity constraints defined by the RDF Data Cube vocabulary, and it can be used to detect violations of the integrity constraints, identify violating resources, and fix detected issues. Furthermore, to ensure the proper use of vocabularies other than the RDF Data Cube vocabulary, it relies on RDFUnit. It can be configured to work with any SPARQL endpoint, which needs to be writeable in order to perform fix operations; if this is not the case, the user is provided with the SPARQL Update query that performs the fix, so that it can be executed manually. The main purpose of the tool within the GeoKnow project is to ensure the quality of input data that is to be processed and visualised with ESTA-LD.
spring-batch-admin-geoknow 0.0.1
The spring-batch-admin-geoknow is the first version of the batch-processing component that functions as the Workbench’s back end.

Besides brand-new components, there are also new releases of existing components, also available as Debian packages:

virtuoso-opensource 7.1.0
Virtuoso 7.1.0 includes improvements in the Engine (SQL Relational Tables and RDF Property/Predicate Graphs); Geo-Spatial support; SPARQL compiler; Jena and Sesame provider performance; JDBC Driver; Conductor CA root certificate management; WebDAV; and the Faceted Browser.
linkedgeodata 0.4.2
The LinkedGeoData package contains scripts and mapping files for converting spatial data from non-RDF (currently relational) sources to RDF. OpenStreetMap is so far the best-covered data source. Recently, initial support for GADM and Natural Earth was added.

  • Added an alternative lgd load script which improves
    throughput by inserting data into a different schema first,
    followed by a conversion step.
  • Optimised export scripts by using pbzip2, the parallel
    version of bzip2.
  • Added rdfs:isDefinedBy triples providing licence information
    for each resource.
Facete2-tomcat7 0.0.1
Facete2 is a web application for exploring (spatial) data in SPARQL endpoints. It features faceted browsing, auto-detection of relations to spatial data, export, and customisation of which data to show.

  • Context menus are now available in the result view, enabling
    one to conveniently visit resources in other browser tabs,
    create facet constraints from selected items and copy values
    into the clipboard.
  • Improved Facete’s auto-detection for cases when counting
    facets is infeasible because of the size of the data.
  • Suggestions of resources related to the facet selection that
    can be shown on the map are now sortable by the length of the
    corresponding property path.
facete2-tomcat-common 0.0.1
This package is a helper package and is mainly responsible for
the Facete database setup. There were no significant changes.
sparqlify-tomcat7 0.6.13
This package provides a web admin interface for Sparqlify. The
system supports running different mappings simultaneously under
different context paths. Minor user interface improvements.
sparqlify-tomcat-common 0.6.13
This package is a helper package and is mainly responsible for
the Sparqlify database setup. There were no significant changes.
sparqlify-cli 0.6.13
This package provides a command line interface for Sparqlify.
Sparqlify is an advanced scalable SPARQL-to-SQL rewriter and the
main engine for the LinkedGeoData project.

  • Fixed some bugs that caused the generation of invalid SQL.
  • Added improvements for aggregate functions that make
    Sparqlify work with Facete.
  • Added initial support for Oracle 11g database.
limes-service 0.5
The limes-service has been updated to the latest LIMES library. The main enhancement this year was refactoring the service to provide a RESTful interface.
geoknow-generator-ui 1.1.0
The first public release of the GeoKnow Generator Workbench extends the initial prototype with user and role management, graph access control management, and process monitoring within a dashboard.
Deer 0.0.1
GeoLift has been renamed to DEER. The functionalities provided in GeoLift have been generalised to support not only geospatial data but structured data in general.

If you need help installing or using the components available as Debian packages in the Linked Data Stack, do not hesitate to join the linkeddatastack Google group and ask.

GeoKnow First Year Benchmark Results

A GeoKnow benchmarking laboratory has been set up for comparing benchmark results over the duration of the GeoKnow project. In its current form this takes the form of the original LOD2 project GeoBench program, which has been taken over and adapted for use in the GeoKnow project and is available from the GeobenchLab Git repository.

The current improvements in the GeoBench program are primarily related to expanding the benchmark to make it applicable not only to RDF data but to relational data as well. This opens opportunities for a performance comparison between RDF and relational spatial data management systems.

Below are some comparison results using the planet wide Open Street Map (OSM) datasets hosted in both Virtuoso and PostGIS.


In summary, the results demonstrate that Virtuoso, in both SQL and SPARQL, outperformed PostGIS by a significant factor. Specifically, all queries in the power run executed 33 times slower in PostGIS than in Virtuoso SQL (single server). Comparing PostGIS with Virtuoso SPARQL, the factor is even greater: 131 for low-zoom-level queries, 23 for high-zoom-level queries, and 113 in total. Comparing Virtuoso SPARQL and SQL (single server), the relational version is almost 4 times slower on low-zoom-level queries, while 23% faster on high-zoom-level queries; in total, the SQL version is more than 3 times slower.
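Such speedup factors come from the ratio of execution times between systems. The sketch below shows the arithmetic with invented per-query timings (the deliverable reports the actual numbers); it is not GeoBench code.

```python
# Sketch of how a "times slower" factor is derived: the ratio of total
# execution times between a baseline and a contender. Timings invented.

def speedup(baseline_times, contender_times):
    """How many times slower the baseline is than the contender."""
    return sum(baseline_times) / sum(contender_times)

postgis = [3.2, 4.1, 2.7]          # hypothetical per-query seconds
virtuoso_sql = [0.10, 0.12, 0.08]  # hypothetical per-query seconds
factor = speedup(postgis, virtuoso_sql)
```

GeoBench runs each query multiple times in a power run; a geometric mean over per-query ratios is another common choice when individual queries vary widely in cost.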

The full descriptions and results of these benchmarks can be found in the GeoKnow D1.3.2 – Continuous Report on Performance Evaluation deliverable.