The GeoKnow Generator Workbench v1.1.0 Release Announcement

To demonstrate the GeoKnow software tools, we are developing a Workbench that integrates different components to support users in generating Linked Data from spatial data. Several tools can be used for transforming, authoring, interlinking and visualising spatial data as Linked Data. In this post we introduce the public release of the GeoKnow Generator Workbench, which implements most of the user requirements collected at the beginning of the project and integrates Virtuoso, Limes, TripleGeo, GeoLift, FAGI-gis, Mappify, Facete and Sparqlify.

The Workbench also provides Single Sign-On functionality, user and role management, and data access control for the different users. The Workbench comprises a front-end and a back-end implementation. The front-end provides GUIs for software components for which a REST API is available (LimesService, GeoLiftService and TripleGeoService). Components that provide their own GUI are integrated using containers (FAGI-gis, OntoWiki, Mappify, Sparqlify and the Virtuoso SPARQL query interface). The front-end also provides GUIs for administrative features such as user and role management, data source management and graph management, as well as the Dashboard GUI. The Dashboard gives the user visual feedback on the registered jobs and the status of their executions. The Workbench back-end provides REST interfaces for managing users, roles, graphs, data sources and batch jobs, for retrieving the system configuration, and for importing RDF data. All system information is stored in the Virtuoso RDF store.

Generator Workbench Architecture.

A more detailed description of this Workbench can be found in the GeoKnow D1.4.2 Intermediate release of the GeoKnow_Generator. The GeoKnow software, including this Workbench, is open source and available on GitHub. Easy-to-install, preconfigured versions of all GeoKnow software are available as Debian packages in the Linked Data Stack Debian repository.

GeoKnow Generator 2nd Year Releases

The second year of GeoKnow has passed and we have several new releases to announce. The new software tools are:

FAGI-gis 1.1+rev0
FAGI aims to provide data fusing on geometries of linked entities.
This latest version provides several optimisations that increased
the scalability and efficiency. It also provides a map-based interface
for facilitating the fusion actions through visualisation and
filtering of linked entities.
RDF Data Cube Validation Tool 0.0.1
The validation tool aims to ensure the quality of statistical datasets.
It is based primarily on the integrity constraints defined by the
RDF Data Cube vocabulary, and it can be used to detect violations
of the integrity constraints, identify violating resources, and
fix detected issues. Furthermore, to ensure the proper use of
vocabularies other than the RDF Data Cube vocabulary, it relies
on RDFUnit. It can be configured to work with any SPARQL endpoint,
which needs to be writeable in order to perform fix operations.
If this is not the case, the user is provided with the SPARQL
Update query that performs the fix, so that it can be executed
manually. The main purpose of the tool within the GeoKnow project
is to ensure the quality of input data that is to be processed and
visualized with ESTA-LD.
spring-batch-admin-geoknow 0.0.1
The spring-batch-admin-geoknow package is the first version of the batch
processing component that functions as the backend of the Workbench.
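
The RDF Data Cube Validation Tool described above checks datasets against the integrity constraints of the RDF Data Cube vocabulary. As a rough illustration of what such a check involves, here is a minimal Python sketch modelled on constraint IC-1 of that vocabulary (every qb:Observation has exactly one associated qb:DataSet). It is not the tool's own code: the function name, the in-memory list of triples and the sample data are assumptions made for this example, whereas the real tool runs against a SPARQL endpoint.

```python
# Toy in-memory check modelled on integrity constraint IC-1 of the
# RDF Data Cube vocabulary: every qb:Observation has exactly one
# associated qb:DataSet. Names and data below are hypothetical.
RDF_TYPE = "rdf:type"
QB_OBSERVATION = "qb:Observation"
QB_DATASET_PROP = "qb:dataSet"


def find_ic1_violations(triples):
    """Return the observations that do not link to exactly one dataset."""
    observations = {
        s for s, p, o in triples if p == RDF_TYPE and o == QB_OBSERVATION
    }
    # Collect the distinct datasets each observation points to.
    datasets = {obs: set() for obs in observations}
    for s, p, o in triples:
        if p == QB_DATASET_PROP and s in datasets:
            datasets[s].add(o)
    return sorted(obs for obs, linked in datasets.items() if len(linked) != 1)


# Hypothetical sample data: ex:obs2 has no dataset, ex:obs3 has two.
SAMPLE_TRIPLES = [
    ("ex:obs1", RDF_TYPE, QB_OBSERVATION),
    ("ex:obs1", QB_DATASET_PROP, "ex:ds1"),
    ("ex:obs2", RDF_TYPE, QB_OBSERVATION),
    ("ex:obs3", RDF_TYPE, QB_OBSERVATION),
    ("ex:obs3", QB_DATASET_PROP, "ex:ds1"),
    ("ex:obs3", QB_DATASET_PROP, "ex:ds2"),
]

if __name__ == "__main__":
    print(find_ic1_violations(SAMPLE_TRIPLES))
```

Identifying the violating resources, as this sketch does, is the step that precedes any fix: a missing dataset link can be repaired by an update, while a duplicate one needs a decision about which link to keep.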

Besides brand new components, there are also new releases, which are also available as Debian packages:

virtuoso-opensource 7.1.0
Virtuoso 7.1.0 includes improvements in the Engine (SQL Relational Tables and RDF Property/Predicate Graphs); Geo-Spatial support; SPARQL compiler; Jena and Sesame provider performance; JDBC Driver; Conductor CA root certificate management; WebDAV; and the Faceted Browser.
linkedgeodata 0.4.2
The LinkedGeoData package contains scripts and mapping files
for converting spatial data from non-RDF (currently relational)
sources to RDF. OpenStreetMap is so far the best covered data
source. Recently, initial support for GADM and Natural Earth was
added.

  • Added an alternative lgd load script which improves
    throughput by inserting data into a different schema first
    followed by a conversion step.
  • Optimized export scripts by using parallel version of pbzip.
  • Added rdfs:isDefinedBy triples providing licence information
    for each resource.
Facete2-tomcat7 0.0.1
Facete2 is a web application for exploring (spatial) data in SPARQL
endpoints. It features faceted browsing, auto detection of relations
to spatial data, export, and customization of which data to
show.

  • Context menus are now available in the result view enabling
    one to conveniently visit resources in other browser
    tabs, create facet constraints from selected items and copy
    values into the clipboard.
  • Improved Facete’s autodetection when counting facets is
    infeasible because of the size of the data.
  • Suggestions of resources related to the facet selection that
    can be shown on the map are now sortable by the length
    of the corresponding property path.
facete2-tomcat-common 0.0.1
This package is a helper package and is mainly responsible for
the Facete database setup. There were no significant changes.
sparqlify-tomcat7 0.6.13
This package provides a web admin interface for Sparqlify. The
system supports running different mappings simultaneously under
different context paths. Minor user interface improvements.
sparqlify-tomcat-common 0.6.13
This package is a helper package and is mainly responsible for
the Sparqlify database setup. There were no significant changes.
sparqlify-cli 0.6.13
This package provides a command line interface for Sparqlify.
Sparqlify is an advanced scalable SPARQL-to-SQL rewriter and the
main engine for the LinkedGeoData project.

  • Fixed some bugs that caused the generation of invalid SQL.
  • Added improvements for aggregate functions that make
    Sparqlify work with Facete.
  • Added initial support for Oracle 11g database.
limes-service 0.5
The limes-service package has been updated to the latest LIMES library.
The main enhancement this year was refactoring the service to provide
a RESTful interface.
geoknow-generator-ui 1.1.0
The first public release of the GeoKnow Generator Workbench extends
the initial prototype by including user and role management,
graph access control management, and processing monitoring
within a dashboard.
Deer 0.0.1
GeoLift has been renamed to DEER. The functionalities provided
in GeoLift have been generalised to support not only geospatial
data but structured data in general.
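
Sparqlify, listed above, is described as a SPARQL-to-SQL rewriter. To give a feel for the general idea of answering a triple pattern from a relational table, here is a deliberately tiny Python sketch using SQLite. It does not use Sparqlify's mapping language or its rewriting algorithm: the mapping dictionary, the table, the prefixes and the data values are all made up for this illustration.

```python
import sqlite3

# Hypothetical mapping (not Sparqlify syntax): each predicate is backed
# by one column, and subject URIs are minted from the primary key.
MAPPING = {
    "table": "places",
    "key": "id",
    "subject_prefix": "http://example.org/place/",
    "predicates": {
        "rdfs:label": "name",
        "ex:population": "population",
    },
}


def rewrite_triple_pattern(predicate, mapping=MAPPING):
    """Rewrite the pattern `?s <predicate> ?o` into a SQL SELECT string."""
    column = mapping["predicates"][predicate]  # KeyError if unmapped
    return (
        f"SELECT '{mapping['subject_prefix']}' || {mapping['key']} AS s, "
        f"{column} AS o FROM {mapping['table']} WHERE {column} IS NOT NULL"
    )


def run_demo():
    """Build a throwaway table and answer `?s ex:population ?o` via SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE places (id INTEGER PRIMARY KEY, name TEXT, population INTEGER)"
    )
    # Made-up rows; the second one has no population value.
    conn.executemany(
        "INSERT INTO places VALUES (?, ?, ?)",
        [(1, "Place A", 100), (2, "Place B", None)],
    )
    sql = rewrite_triple_pattern("ex:population")
    rows = conn.execute(sql + " ORDER BY s").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(rewrite_triple_pattern("ex:population"))
    print(run_demo())
```

The point of the sketch is that no triples are materialised: the data stays in the relational table, and the pattern is translated into a query over it. Rows with a NULL value produce no triple, which is why the generated SQL filters them out.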

If you need help installing or using the components that are available as Debian packages in the Linked Data Stack, do not hesitate to join the linkeddatastack Google group and ask.

Spatial Data on the Web

On January 6, 2015, the W3C and the Open Geospatial Consortium (OGC) officially launched a joint working group for spatial data. Ontos voted for and supported the charter on behalf of the GeoKnow project. This is the result of the joint workshop held in March 2014 together with the SmartOpenData project. The GeoKnow team is looking forward to supporting this activity and future standards for spatial data on the web. As an example, the GeoKnow team (represented by Valentina Janev and Jens Lehmann) is co-organising a spatial data session with W3C (represented by Phil Archer) at the ICIST conference in March: http://www.yuinfo.org/icist2015/icist_odagis.html.

More information about the OGC can be found on http://www.opengeospatial.org.
W3C News here http://www.w3.org/blog/news/archives/4287
W3C press release http://www.w3.org/2015/01/spatial.html.en
The workshop about linking geospatial data http://www.w3.org/2014/03/lgd/
The W3C working group for spatial data: https://www.w3.org/2015/spatial/wiki/Main_Page

Ontos starts project at SECO using GeoKnow Generator

Ontos was selected as an implementation partner at SECO to implement a linked data stack platform. Based on the GeoKnow Generator, Ontos will develop a data management and search platform that will allow the management of linked open government data. The GeoKnow Generator will be used as the backend system that orchestrates the various tools. The first version will implement the triplification of data and the interlinking.

The Linked Data Stack

The Linked Data Stack aims to simplify the deployment and distribution of tools that support the Linked Data life cycle. Moreover, it eases the information flow between components to enhance the end-user experience while harmonising the look and feel. It comprises a number of tools for managing the life cycle of Linked Data. At the moment it consists of two software repositories for distributing Linked Data software components to the developer communities: (1) a Debian repository that provides installers, so that users can install components directly on Linux servers using the standard packaging tools, and (2) a Maven repository for managing binary software components used for developing, deploying and provisioning.

The Linked Data Stack is the result of the LOD2 EU project's efforts, and the GeoKnow team has now officially become its manager. This was announced at the 10th International Conference on Semantic Systems, held on 4 and 5 September 2014 in Leipzig.

If you are a Linked Data user, visit the Linked Data Stack, where you can find instructions on how to install and use the demonstrations, as well as documentation for installing specific components. If you want to contribute your software to the stack, you can also find guidelines on how to contribute.

GeoLD Workshop

The GeoKnow team organised the first international workshop on Geospatial Linked Data. The workshop was part of the SEMANTICS 2014 conference, which took place in Leipzig, Germany. On behalf of GeoKnow, Ontos acted as the sponsor for the GeoLD workshop, coordinated the call for papers and the final agenda, and invited some of the guest speakers. Jens Lehmann (AKSW) gave the welcome address, and Phil Archer from W3C reported on the progress towards a joint W3C/OGC working group. In addition, Matthias Wauer (Unister) and Claus Stadler (AKSW) presented the GeoKnow tools. More about the GeoLD workshop can be found at http://geold.geoknow.eu/.

GeoKnow Plenary Meeting Belgrade

The GeoKnow team meets in Belgrade for the plenary meeting. During the two days the team discusses the achievements since the first-year review meeting. Besides the ongoing improvement of the various tools, the team discusses benchmarking and quality assessment. The benchmarking focuses on the Virtuoso store, Facete and Mappify, LIMES, FAGI, GeoLift and TripleGeo. Results of the benchmarks will be published at https://github.com/GeoKnow/GeoBenchLab.

On the second day, in the break-out sessions, each individual work package was thoroughly discussed and next steps were defined. Some of the findings are:
- Dashboard requirements and batch processing
- Parallelisation of LIMES process
- Notification and subscription service
- Mobile version for smart phones and tablets
- More free datasets that can be used for the use cases

W3C Swiss Day and GeoKnow

Ontos is the W3C Switzerland representative and presents the results of the GeoKnow project at the W3C Swiss Day. Approximately 30 people attend the event, which takes place in Fribourg, Switzerland. Daniel Hladky shows the GeoKnow Generator and tools during the talk on “Linked Open Data”. Using the online demo server, a simple scenario is shown in order to attract people and customers to use the results of the GeoKnow project. For more details, visit the event home page at http://www.ontos.com/web-25-celebrating-25-years-of-the-web/.

Linked Geospatial Data 2014 Workshop, Part 4: GeoKnow, London, Brussels, The Message

Last Friday (2014-03-14) I (Orri Erling) gave a talk about GeoKnow at the EC Copernicus Big Data workshop. This was a trial run for more streamlined messaging. I have, aside from the practice of geekcraft, occupied myself with questions of communication these last weeks.

The clear take-home from London and Brussels alike is that these events have full days and 4 or more talks an hour. It is not quite TV commercial spots yet but it is going in this direction.

If you say something complex, little will get across unless the audience already knows what you will be saying.

I had a set of slides from Jens Lehmann, the GeoKnow project coordinator, for whom I was standing in. Now these are a fine rendition of the description of work. What is wrong with partners, work packages, objectives, etc? Nothing, except everybody has them.

I recall the old story about the journalist and the Zen master: The Zen master repeatedly advises the reporter to cut the story in half. We get the same from PR professionals, “If it is short, they have at least thought about what should go in there,” said one recently, talking of pitches and messages. The other advice was to use pictures. And to have a personal dimension to it.

Enter “Ms. Globe” and “Mr. Cube”. Frans Knibbe of Geodan gave the Linked Geospatial Data 2014 workshop’s most memorable talk, entitled “Linked Data and Geoinformatics – a love story” (pdf), about the excitement and the pitfalls of the burgeoning courtship of Ms. Globe (geoinformatics) and Mr. Cube (semantic technology). They get to talking, later Ms. Globe thinks to herself… “Desiloisazation, explicit semantics, integrated metadata…” Mr. Cube, young upstart now approaching a more experienced and sophisticated lady, dreams of finally making an entry into adult society, “critical mass, global scope, relevant applications…” There is a vibration in the air.

So, with Frans Knibbe’s gracious permission, I borrowed the storyline and some of the pictures.

We ought to make a series of cartoons about the couple. There will be twists and turns in the story to come. Mr. Cube is not Ms. Globe’s first lover, though; there is also rich and worldly Mr. Table. How will Mr. Cube prove himself? The eternal question… Well, not by moping around, not by wise-cracking about semantics, no. By boldly setting out upon a journey to fetch the Golden Fleece from beyond the crashing rocks. “Column store, vectored execution, scale out, data clustering, adaptive schema…” he affirms, with growing confidence.

This is where the story stands, right now. Virtuoso runs circles around PostGIS doing aggregations and lookups on geometries in a map-scrolling scenario (GeoKnow’s GeoBenchLab). Virtuoso SPARQL outperforms PostGIS SQL against planet-scale OpenStreetMap; Virtuoso SQL goes 5-10x faster still.

Mr. Cube is fast on the draw, but some corners can still be smoothed out.

Later in GeoKnow, there will be still more speed but also near parity between SQL and SPARQL via taking advantage of data regularity in guiding physical storage. If it is big, it is bound to have repeating structure.

The love story grows more real by the day. To be consummated still within GeoKnow.

Talking of databases has the great advantage that this has been a performance game from the start. There are few people who need convincing about the desirability of performance, as this also makes for lower cost and more flexibility on the application side.

But this is not all there is to it.

In Brussels, the audience came from E-science (Earth observation). In science, it is understood that qualitative aspects can be even more crucial. I told the story about an E-science-oriented workshop I attended in America years ago. The practitioners, from high energy physics to life sciences to climate, had invariably come across the need for self-description of data and for schema-last. This was essentially never provided by RDF, except for some life science cases. Rather, we had one-off schemes, ranging from key-value pairs to putting the table name in a column of the same table to preserve the origin across data export.

Explicit semantics and integrated metadata are important, Ms. Globe knows, but she cannot sacrifice operational capacity for this. So it is more than a DBMS or even data model choice — there must be a solid tool chain for data integration and visualization. GeoKnow provides many tools in this space.

Some of these, such as the LIMES entity matching framework (pdf), are probably close to the best there is. For other parts, the SQL-based products with hundreds of person-years invested in user interaction are simply unbeatable.

In these cases, the world can continue to talk SQL. If the regular part of the data is in fact tables already, so much the better. You connect to Virtuoso via SQL, just like to PostGIS or Oracle Spatial, and talk SQL MM. The triples, in the sense of flexible annotation and integrated metadata, stay there; you just do not see them if you do not want them.

There are possibilities all right. In the coming months I will showcase some of the progress, starting with a detailed look at the OpenStreetMap experiments we have made in GeoKnow.

Linked Geospatial Data 2014 Workshop posts: