Category Archives: News

Linked Open Data Switzerland at SWBI2015

Daniel Hladky from Ontos presented GeoKnow in two talks at the SWBI2015 conference.

The first talk was the keynote on October 7, 2015, titled “Linked Data Service (LINDAS): Status quo of the Linked Data life-cycle and lessons learned”. The keynote introduced the LOD2-based Linked Data life-cycle and the LINDAS platform. The LINDAS system is based on the GeoKnow Generator tool that Ontos developed during the GeoKnow project. The talk closed with an outlook on future developments, such as an improved natural language processing system based on neural networks and new visualisation dashboards for RDF data.

The second talk, on October 8, 2015, was part of the Linked Data Switzerland workshop. Its focus was to set the stage for Linked Open Data in Switzerland using the LINDAS platform. The participants also discussed issues that still have to be solved, for example how to build a linked data economy that publishes as many datasets as possible, and how to motivate companies and individuals to start developing new applications based on those datasets.

GeoKnow at Semantics 2015, Vienna

Several GeoKnow partners were present at the Semantics 2015 conference this year.
On the day before the conference we organised a workshop about the work done during the last three years of GeoKnow.
At the conference, three papers with a GeoKnow acknowledgement were presented:

  • Integrating custom index extensions into Virtuoso RDF store for E-Commerce applications, presented by Matthias Wauer,
  • An Optimization Approach for Load Balancing in Parallel Link Discovery, presented by Mohamed Ahmed Sherif, and
  • Data Licensing on the Cloud – Empirical Insights and Implications for Linked Data, presented by Ivan Ermilov

And two posters in the poster session:

  • The GeoKnow Generator Workbench – An Integrated Tool Supporting the Linked Data Lifecycle for Enterprise Usage, and
  • RDF Editing on the Web

Moreover, the GeoKnow team demonstrated its tools and the Workbench at the booth reserved for us. It was a nice experience and a good opportunity to share our work and to see other people's projects.


The GeoKnow Generator Workbench v1.1.0 Release Announcement

To demonstrate the GeoKnow software tools, we are developing a Workbench that integrates different components to support users in the task of generating Linked Data out of spatial data. Several tools can be used for transforming, authoring, interlinking and visualising spatial data as Linked Data. In this post we introduce the public release of the GeoKnow Generator Workbench, which implements most of the user requirements collected at the beginning of the project and integrates Virtuoso, LIMES, TripleGeo, GeoLift, FAGI-gis, Mappify, Facete and Sparqlify.

The Workbench also provides Single Sign-On functionality, user and role management, and data access control for the different users. It comprises a front-end and a back-end. The front-end provides GUIs for software components that expose a REST API (LimesService, GeoLiftService and TripleGeoService); components that ship their own GUI (FAGI-gis, OntoWiki, Mappify, Sparqlify and the Virtuoso SPARQL query interface) are integrated using containers. The front-end also provides GUIs for administrative features such as user and role management, data source management and graph management, as well as the Dashboard GUI, which gives the user visual feedback on registered jobs and the status of their executions. The back-end provides REST interfaces for managing users, roles, graphs, data sources and batch jobs, for retrieving the system configuration, and for importing RDF data. All system information is stored in the Virtuoso RDF store.
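
To make the back-end's REST surface a little more concrete, here is a minimal Python sketch that builds (without sending) a request for importing RDF data into a named graph. Note that the base URL and the `/graphs/import` path are assumptions for illustration only; the actual endpoint paths are documented in deliverable D1.4.2.

```python
import json
import urllib.request

# Base URL of a locally running Generator Workbench back-end.
# NOTE: host, port and endpoint path below are illustrative assumptions,
# not the documented API -- consult the D1.4.2 deliverable for the real one.
BASE_URL = "http://localhost:8080/generator/rest"

def build_import_request(graph_uri, rdf_payload):
    """Build (but do not send) a POST request that would import RDF data
    into the given named graph via the Workbench back-end."""
    body = json.dumps({"graph": graph_uri, "data": rdf_payload}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/graphs/import",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_import_request(
    "http://example.org/graphs/osm",
    "<http://example.org/s> <http://example.org/p> <http://example.org/o> .",
)
print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` would then require a running Workbench instance and, given the access control described above, valid user credentials.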

Generator Workbench Architecture.


A more detailed description of this Workbench can be found in the GeoKnow deliverable D1.4.2, Intermediate release of the GeoKnow Generator. The GeoKnow software, including this Workbench, is open source and available on GitHub. Easy-to-install, preconfigured versions of all GeoKnow software are available as Debian packages in the Linked Data Stack Debian repository.

GeoKnow Generator 2nd Year Releases

The second year of GeoKnow has passed and we have several new releases to announce. Among the new software tools are:

FAGI-gis 1.1+rev0
FAGI aims to provide fusion of the geometries of linked entities. This latest version provides several optimisations that increase scalability and efficiency. It also provides a map-based interface that facilitates fusion actions through visualisation and filtering of linked entities.
RDF Data Cube Validation Tool 0.0.1
The validation tool aims to ensure the quality of statistical datasets. It is based primarily on the integrity constraints defined by the RDF Data Cube vocabulary, and it can be used to detect violations of these constraints, identify the violating resources, and fix the detected issues. To ensure the proper use of vocabularies other than the RDF Data Cube vocabulary, it additionally relies on RDFUnit. It can be configured to work with any SPARQL endpoint; the endpoint needs to be writeable in order to perform fix operations, but if it is not, the user is provided with the SPARQL Update query that implements the fix, so that it can be executed manually. The main purpose of the tool within the GeoKnow project is to ensure the quality of input data that is to be processed and visualized with ESTA-LD.
spring-batch-admin-geoknow 0.0.1
The spring-batch-admin-geoknow is the first version of the batch processing component that serves as the Workbench's back-end.

Besides brand new components, there are also new releases of existing components, likewise available as Debian packages:

virtuoso-opensource 7.1.0
Virtuoso 7.1.0 includes improvements in the Engine (SQL Relational Tables and RDF Property/Predicate Graphs); Geo-Spatial support; SPARQL compiler; Jena and Sesame provider performance; JDBC Driver; Conductor CA root certificate management; WebDAV; and the Faceted Browser.
linkedgeodata 0.4.2
The LinkedGeoData package contains scripts and mapping files for converting spatial data from non-RDF (currently relational) sources to RDF. OpenStreetMap is so far the best-covered data source. Recently, initial support for GADM and Natural Earth was added.

  • Added an alternative lgd load script which improves throughput by inserting data into a different schema first, followed by a conversion step.
  • Optimized the export scripts by using pbzip2, a parallel version of bzip2.
  • Added rdfs:isDefinedBy triples providing licence information for each resource.
Facete2-tomcat7 0.0.1
Facete2 is a web application for exploring (spatial) data in SPARQL endpoints. It features faceted browsing, auto-detection of relations to spatial data, export, and customization of which data to show.

  • Context menus are now available in the result view, enabling one to conveniently visit resources in other browser tabs, create facet constraints from selected items and copy values to the clipboard.
  • Improved Facete's auto-detection for cases where counting facets is infeasible because of the size of the data.
  • Suggestions of resources related to the facet selection that can be shown on the map are now sortable by the length of the corresponding property path.
facete2-tomcat-common 0.0.1
This package is a helper package and is mainly responsible for
the Facete database setup. There were no significant changes.
sparqlify-tomcat7 0.6.13
This package provides a web admin interface for Sparqlify. The
system supports running different mappings simultaneously under
different context paths. Minor user interface improvements.
sparqlify-tomcat-common 0.6.13
This package is a helper package and is mainly responsible for
the Sparqlify database setup. There were no significant changes.
sparqlify-cli 0.6.13
This package provides a command line interface for Sparqlify.
Sparqlify is an advanced scalable SPARQL-to-SQL rewriter and the
main engine for the LinkedGeoData project.

  • Fixed some bugs that caused the generation of invalid SQL.
  • Added improvements for aggregate functions that make
    Sparqlify work with Facete.
  • Added initial support for Oracle 11g database.
limes-service 0.5
The LIMES service was updated to the latest LIMES library. The main enhancement this year was refactoring the service to provide a RESTful interface.
geoknow-generator-ui 1.1.0
This first public release of the GeoKnow Generator Workbench extends the initial prototype with user and role management, graph access control management, and processing monitoring within a dashboard.
Deer 0.0.1
GeoLift has been renamed to DEER. The functionality provided in GeoLift has been generalised to support not only geospatial data but structured data in general.

If you need help installing or using the components available as Debian packages in the Linked Data Stack, do not hesitate to join the linkeddatastack Google group and ask.

Spatial Data on the Web

On January 6, 2015, the W3C and the Open Geospatial Consortium (OGC) officially launched a joint working group for spatial data. Ontos voted for and supported the charter on behalf of the GeoKnow project. This is a result of the joint workshop held in March 2014 together with the SmartOpenData project. The GeoKnow team looks forward to supporting this activity and future standards for spatial data on the web. As an example, the GeoKnow team (represented by Valentina Janev and Jens Lehmann) is co-organising a spatial data session with the W3C (represented by Phil Archer) at the ICIST conference in March: http://www.yuinfo.org/icist2015/icist_odagis.html.

More information about the OGC can be found at http://www.opengeospatial.org.
W3C news: http://www.w3.org/blog/news/archives/4287
W3C press release: http://www.w3.org/2015/01/spatial.html.en
The workshop on linking geospatial data: http://www.w3.org/2014/03/lgd/
The W3C working group for spatial data: https://www.w3.org/2015/spatial/wiki/Main_Page

The Linked Data Stack


The Linked Data Stack aims to simplify the deployment and distribution of tools that support the Linked Data life cycle. Moreover, it eases the information flow between components to enhance the end-user experience while harmonising their look and feel. It comprises a number of tools for managing the life cycle of Linked Data. At the moment it consists of two software repositories for distributing Linked Data software components to developer communities: a Debian repository that provides installers, so that users can install components directly on Linux servers using the standard packaging tools, and a Maven repository for managing the binary software components used for developing, deploying and provisioning.

The Linked Data Stack is a result of the LOD2 EU project's efforts, and the GeoKnow team has now officially become the manager of the Linked Data Stack. This announcement took place at the 10th International Conference on Semantic Systems, held on 4–5 September 2014 in Leipzig.

If you are a Linked Data user, visit the Linked Data Stack, where you will find instructions on how to install and use the demonstrations, as well as documentation for installing specific components. If you want to contribute your software to the stack, you will also find guidelines on how to contribute.

How can GeoKnow help you?

At this stage of the GeoKnow project, we are shaping our dissemination and exploitation strategy. In order to stay as close as possible to the needs of our potential users, we have created a survey to find out how we can help your business most.

If you work with geospatial data, your participation would be greatly appreciated. The survey will take no longer than 10 minutes to complete. There is a little incentive as well: If you leave your email address, you will be automatically entered to win one of three Amazon vouchers worth 50 Euro.

Please click on this link to take part: GeoKnow exploitation plan survey

GeoKnow Athens Meeting

In the last days of July, the second meeting of the GeoKnow project took place in Athens. GeoKnow members had the opportunity to meet again after the Leipzig kick-off meeting, discuss the work performed during the first seven months of the project, and agree on the next steps. Apart from that, our fellow partners had the chance to stroll around some of the most historic and picturesque sites of Athens, such as the old and the new parliament buildings, the National University of Athens, the Temple of Olympian Zeus, the Monastiraki and Plaka quarters and the Acropolis, and taste some of the most iconic Greek dishes!

The first part of the meeting focused on the work performed so far. Advances from each Work Package were presented, feeding discussions about (a) integrating the currently developed tools, (b) utilizing these tools to manage and process the use case datasets, and (c) resolving research issues that had come up and enhancing the functionality and efficiency of the developed solutions. All partners agreed that the project has advanced significantly since December: the first tools for managing and processing geospatial RDF data have already been developed, and very informative reports on the state of the art in geospatial and RDF data management, benchmarking and system requirements have been published as well.


Several discussions, during both meeting days, revolved around the system architecture and, specifically, the GeoKnow Generator (GKG). Concrete decisions were made about the GKG backend that will include a set of loosely integrated components for consuming, processing and exposing geospatial, Linked Data, based on Virtuoso RDF store. We also considered issues regarding user management, GKG’s front end, workflow processing and implementation of gathered system and user requirements.

With respect to geospatial information management, where GeoKnow has already provided solutions for transforming conventional geospatial data into RDF and exposing it (Sparqlify, LinkedGeoData, TripleGeo), all partners agreed that there is the potential to build, based on the work performed in Tasks 2.1 and 1.3, a timely geospatial RDF benchmark that will test the efficiency and functionality of today's RDF stores with geospatial support. Next steps were also discussed, with emphasis on further optimizing the geospatial query capabilities of the underlying RDF store (Virtuoso).

As far as the semantic integration of geospatial data is concerned, the tools developed within GeoKnow for enriching and interlinking geospatial RDF data (GeoLift, LIMES) were first presented to the consortium. They triggered further discussions about the fusion and aggregation solutions currently under design and development, as well as about how these tools can be directly tested and utilized in processing the commercial datasets of the use case partners. Finally, a large part of our discussions was dedicated to quality measures and the quality assessment of geospatial data; although these tasks are due in later periods of the project, they are of high importance for all the functionality being built in Work Package 3, since quality indicators of datasets constitute valuable input for processing steps such as interlinking and fusion.

After the presentation of (implemented or under development) GeoKnow tools for visualization and authoring of geospatial RDF data, such as Facete, creative ideas were exchanged, discussing both detailed technical implementation solutions and desired functionality for end users. Some important aspects that were considered are the implementation and functionality of spatial authoring, the issue of public and spatial Linked Data co-evolution and the potential for spatial-social networking. Again, the discussions considered the use case scenarios of the project, that is, how the offered functionality can serve commercial and industrial Linked Data management and visualization needs.

In conclusion, during the GeoKnow meeting in Athens all partners exchanged interesting ideas about ongoing and future work and set more concrete objectives to achieve through the next months. We thank all the GeoKnow members for attending and contributing to this constructive meeting!

Virtual Machines of geospatial RDF stores

For our evaluation of current geospatial RDF stores (about which you can read more here), we have set up five pre-built Virtual Machines. Each one contains a working installation, on a Debian 6.x “squeeze” OS, of one of the following geospatial RDF stores:

  • Virtuoso Universal Server
  • Parliament
  • uSeekM
  • OWLIM-SE
  • Strabon

The VMs are available at: ftp://guest@geoknow-server.imis.athena-innovation.gr/ (username: guest, password: be-my-guest).

The images are normal Xen domU disk images, which can be used to create a functional domU guest. Of course, you should also provide:

  • your own guest configuration file at /etc/xen/<the-vm>.cfg (or wherever you have chosen to configure your domU guests)
  • a swap image file

Use root:root as the root password for your newly created machine (and, of course, change it right after your first login). After creating the machine, log in via the serial console (e.g. xm console <the-vm>) and:

  • configure your network interfaces to adapt to your local network
  • configure your hostname and your /etc/hosts file
  • restart your network interfaces
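
For orientation, a guest configuration file for one of these VMs might look roughly like the sketch below. Every value here (guest name, memory, image paths, MAC-less bridge setup, kernel version) is a placeholder to adapt to your local Xen setup, not a file we ship.

```
# /etc/xen/geoknow-vm.cfg -- minimal domU configuration sketch (placeholders).
name    = "geoknow-vm"
memory  = 2048
vcpus   = 2
# Downloaded disk image plus the swap image you created yourself:
disk    = ['file:/srv/xen/geoknow-vm.img,xvda1,w',
           'file:/srv/xen/geoknow-vm-swap.img,xvda2,w']
vif     = ['bridge=xenbr0']
# A dom0 kernel/initrd suitable for Debian 6.x "squeeze" guests:
kernel  = "/boot/vmlinuz-2.6.32-5-xen-amd64"
ramdisk = "/boot/initrd.img-2.6.32-5-xen-amd64"
root    = "/dev/xvda1 ro"
extra   = "console=hvc0"
```

With such a file in place, `xm create geoknow-vm.cfg` starts the guest and `xm console geoknow-vm` attaches the serial console for the network setup steps above.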

We have also uploaded to the same FTP location the dataset we used for our evaluation: OpenStreetMap data for Great Britain, covering England, Scotland, and Wales (as of 05/03/2013), in ESRI shapefile format. Of the available OSM layers, only those concerning the road network (roads), points of interest (points) and natural parks and water bodies (natural) were actually utilized. From each original layer, only the most important attributes were retained: shape, name, osm_id, type.

You can find more information on how we used these VMs for our evaluation, in the public deliverable D2.1.1 ‘Market and Research Overview’.

Please enjoy!

Geospatial RDF stores: where do we stand?

Our goal in GeoKnow is to bring geospatial data into the Linked Data Web. Among other things, our work will provide the means for efficiently managing and querying geospatial data in RDF stores.

But how can we measure our success? And what is the yardstick we will compare ourselves against?

While many popular RDF stores advertise geospatial capabilities, we found little or no quantitative information in the relevant literature regarding the following:

  • Conformance to GeoSPARQL. Given GeoSPARQL's status as an OGC standard and the ongoing work of various research groups on its integration with other Semantic Web technologies, there is a clear need for examining the GeoSPARQL compliance of RDF stores. In the highly standards-driven domain of GIS, conformance with open standards is a highly advertised and desirable property for open and proprietary software alike: end users select products based on features which have been validated under open tests that anyone can reproduce. For GeoSPARQL to reach the mainstream and be actively adopted by the Semantic Web and GIS communities alike, we need a similarly open process for compliance testing of the various products and systems.
  • Performance under realistic geospatial workloads. Although there are several acknowledged benchmarks for evaluating RDF stores (e.g. BSBM), the only benchmark handling geospatial RDF is SSWB. While SSWB includes some important geospatial query types, it is based on synthetic data and evaluates only one aspect of performance (query duration). There is a clear lack of a feature-complete benchmark based on realistic data and workloads that would measure and highlight both the novel capabilities geospatial RDF stores provide and their support for typical everyday applications of geospatial data management.
  • Comparison with geospatial RDBMSs. Geospatial databases are used across the globe by a diverse community that produces, queries, and analyzes geospatial data. The performance, integration capabilities, and overall maturity of geospatial RDBMSs form the natural baseline of the GIS community for data management, and geospatial RDF stores will be judged along these lines. Of course, RDF stores also provide functionality that typical GIS and RDBMSs cannot support, which is something we can easily advertise to the GIS community. However, we have no answers about the side effects of applying a geospatial RDF store in a real-world setting. Such stores might never reach the native performance of an RDBMS, but is that lack of speed balanced by increased interoperability? Similar questions arise for scaling, compatibility with existing tools, and so on. Potential users would like accurate information that enables them to reach informed decisions regarding these trade-offs.

We feel that since incorporating geospatial capabilities into RDF stores is a subject of ongoing research and commercial efforts, it is important for the community to have access to a concrete methodology that describes the required resources and steps for objectively and realistically evaluating geospatial RDF stores.

With these goals, we introduced a benchmarking methodology for evaluating geospatial RDF stores, which will form the basis for a formal, feature-complete benchmark in the future.

Based on this methodology, Athena completed a first-cut evaluation of five RDF stores with geospatial support: Virtuoso Universal Server, Parliament, uSeekM, OWLIM-SE and Strabon. Further, we compared these RDF stores with two prominent geospatial RDBMSs: Oracle Spatial and PostGIS.

  • For our evaluation we used real geospatial data, and specifically, OpenStreetMap data for Great Britain covering England, Scotland, and Wales (as of 05/03/2013).
  • We transformed this dataset into RDF triples (~26M) using TripleGeo.
  • We tested a total of 21 SPARQL queries, covering several geospatial features: location queries, range queries, spatial joins, nearest neighbor queries and spatial aggregates. For each query we created a proper representation that would be compatible with each RDF store.
  • In parallel, we tested the same set of queries on the geospatial RDBMSs.
  • Apart from query times, we also measured other quantities, such as number of loaded triples, loading time and index creation time.
  • We evaluated the conformance of each RDF store to the GeoSPARQL standard.
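
To illustrate one of the query categories above, the following Python sketch assembles a GeoSPARQL range query. The geo:/geof: prefixes and the sfWithin function come from the OGC GeoSPARQL standard; the triple pattern and the WKT rectangle are illustrative placeholders, and the actual queries in our evaluation were adapted per store.

```python
# Sketch: building a GeoSPARQL "range query" (features inside a rectangle).
# Prefixes and geof:sfWithin are standard GeoSPARQL; the data properties
# and the polygon below are placeholders, not our benchmark queries.

PREFIXES = """\
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
"""

def range_query(wkt_rectangle: str) -> str:
    """Return a GeoSPARQL query selecting features whose geometry lies
    within the given WKT polygon (a classic range query)."""
    return PREFIXES + f"""
SELECT ?feature ?wkt WHERE {{
  ?feature geo:hasGeometry ?geom .
  ?geom geo:asWKT ?wkt .
  FILTER(geof:sfWithin(?wkt, "{wkt_rectangle}"^^geo:wktLiteral))
}}
"""

print(range_query("POLYGON((-1 51, -1 52, 0 52, 0 51, -1 51))"))
```

A store with full GeoSPARQL support evaluates such a query directly; for stores with other geospatial dialects, the filter has to be rewritten into their native functions, which is exactly the per-store adaptation mentioned above.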

A short description of the test queries used is given in the table below:

Our evaluation surfaced several interesting issues (some already known and others newly found), as well as valuable intuition and directions towards improving the functionality of geospatially-enabled RDF Stores.

  • First of all, it became clear that considerable effort from industry and academia is needed to add baseline functionality to current RDF stores, regarding both the geospatial features and the geospatial standards they support. For example, some stores support only point geometries, while only a few of them support the GeoSPARQL standard.
  • Another important issue is the efficiency of current RDF stores with respect to loading, indexing and querying times, as compared to RDBMSs.
  • Finally, several enhancements regarding both functionality and efficiency pose very interesting research questions; addressing them can lift the current performance of RDF stores and establish them as first-class research and industrial data management systems.

Detailed results, as well as a thorough survey of current geospatial and semantic standards can be found in our public deliverable D2.1.1 ‘Market and Research Overview’.

You can also find more information about our testing environment in this post. The complete set of pre-built Virtual Machines, queries, and datasets we used is also freely available for download.