Category Archives: GeoKnow Generator

EDF2015 and Linked Data Europe: Big Geospatial Data Workshop

In 2015, the European Data Forum took place in Luxembourg on 16 and 17 November. The GeoKnow team had the pleasure of attending the event with a booth showcasing GeoKnow results. The conference welcomed over 700 participants from industry and research, policy makers, and community initiatives from all over Europe.

Our representatives at EDF2015

The day after the conference we participated in the Linked Data Europe Workshop, which was organized by the IQmulus, GeoKnow, LEO and MELODIES teams. Jens Lehmann of the University of Leipzig and Jonas Schulz from Ontos AG demonstrated our GeoKnow Workbench, talked about the tools in our Linked Data Stack and gained insights into other projects in the field of Linked Geo Data and Big Data. Overall, 10 projects were presented, and the workshop ended with an informative discussion about Linked Geo Data tools replacing or extending existing GIS solutions.

Thanks to everyone who organized the EDF and the workshop.

FAGI-gis: fusing geospatial RDF data

GeoKnow introduces the latest version of FAGI-gis, a framework for fusing Linked Data that focuses on the geospatial properties of the linked entities. FAGI-gis receives as input two datasets (through SPARQL endpoints) and a set of links that interlink entities between the datasets, and produces a new dataset in which each pair of linked entities is fused into a single entity. Fusion is performed for each pair of matched properties between two linked entities, according to a selected fusion action, and considers both spatial and non-spatial properties.

The tool offers an interactive interface that visualizes the data at every step of the process. For spatial data in particular, it provides map previews and graphical manipulation of geometries. It also provides advanced fusion functionality such as batch-mode fusion, clustering of links, link discovery/creation, property matching and property creation.

As the first step of the fusion workflow, the tool allows the user to select and filter the interlinked entities (using the classes they belong to or SPARQL queries) to be loaded for fusion. Then, at the schema matching step, a semi-automatic process facilitates the mapping of entity properties from one dataset to the other. Finally, the fusion panel allows map-based manipulation of geometries and the selection of fusion actions, in order to produce a final entity in which each pair of matched properties is fused according to the most suitable action.
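To make the per-property fusion step more concrete, here is a minimal Python sketch of fusing one pair of linked entities after schema matching. Entities are plain dicts, the schema matching result is a dict from left to right property names, and geometries are WKT strings. The fusion action names (`keep_left`, `keep_right`, `keep_both`, `keep_more_detailed_geometry`) and the vertex-count heuristic are hypothetical illustrations, not FAGI-gis's actual action set or API.

```python
from typing import Callable, Dict, List, Union

# Values are plain strings (literals or WKT geometries) or lists of strings.
Value = Union[str, List[str]]
Entity = Dict[str, Value]


def count_vertices(wkt: str) -> int:
    """Rough proxy for geometric detail: number of coordinate pairs in a simple WKT string."""
    if "(" not in wkt:
        return 0
    return wkt[wkt.index("("):].count(",") + 1


def keep_left(a: Value, b: Value) -> Value:
    return a


def keep_right(a: Value, b: Value) -> Value:
    return b


def keep_both(a: Value, b: Value) -> Value:
    # Merge both values into one list, dropping duplicates but keeping order.
    merged: List[str] = []
    for v in (a, b):
        for item in (v if isinstance(v, list) else [v]):
            if item not in merged:
                merged.append(item)
    return merged


def keep_more_detailed_geometry(a: Value, b: Value) -> Value:
    # Prefer the geometry with more vertices; ties go to the left dataset.
    if not (isinstance(a, str) and isinstance(b, str)):
        raise TypeError("geometries must be WKT strings")
    return a if count_vertices(a) >= count_vertices(b) else b


# Hypothetical action names; the tool's real set of fusion actions may differ.
FUSION_ACTIONS: Dict[str, Callable[[Value, Value], Value]] = {
    "keep_left": keep_left,
    "keep_right": keep_right,
    "keep_both": keep_both,
    "keep_more_detailed_geometry": keep_more_detailed_geometry,
}


def fuse(left: Entity, right: Entity, matching: Dict[str, str],
         actions: Dict[str, str], default: str = "keep_left") -> Entity:
    """Fuse one pair of linked entities, one matched property pair at a time."""
    fused: Entity = {}
    for left_prop, right_prop in matching.items():
        if left_prop in left and right_prop in right:
            action = FUSION_ACTIONS[actions.get(left_prop, default)]
            fused[left_prop] = action(left[left_prop], right[right_prop])
        elif left_prop in left:  # only one side has a value: keep it
            fused[left_prop] = left[left_prop]
        elif right_prop in right:
            fused[left_prop] = right[right_prop]
    return fused


left = {"name": "Central Station", "geometry": "POINT (10 20)"}
right = {"label": "Central Stn.", "wkt": "POLYGON ((9 19, 11 19, 11 21, 9 21, 9 19))"}
matching = {"name": "label", "geometry": "wkt"}  # output of the schema matching step
actions = {"name": "keep_both", "geometry": "keep_more_detailed_geometry"}

fused = fuse(left, right, matching, actions)
print(fused)
```

In this sketch, properties that were not matched are simply dropped, and every matched pair gets its own action, which mirrors the idea of choosing the most suitable action per property pair.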

This process can be enriched with several advanced fusion facilities. The user can cluster the linked entities according to the way they are interlinked, so that different clusters of linked entities can be handled with different fusion actions. Moreover, the user can load unlinked entities and receive recommendations of candidate entities to interlink. Finally, by training on past fusion actions and on OpenStreetMap data, FAGI-gis can recommend suitable fusion actions and OSM categories (classes), respectively, for pairs of fused entities.

FAGI-gis is provided as free software, and its current version is available on GitHub. An introductory user guide is also available. More detailed information on FAGI-gis is provided in the following documents:


GeoKnow presented a paper at the WASABI workshop at ESWC

Data-driven processes are the focus of the International Workshop on Semantic Web Enterprise Adoption and Best Practice (WASABI 2015), held at the 12th Extended Semantic Web Conference (ESWC 2015). At this third edition of the workshop, Andreas Both presented, on behalf of the GeoKnow consortium, the GeoKnow Generator Workbench, which is a key outcome of our project.
The GeoKnow Generator is a stack of components that covers several steps of the Linked Data life cycle. These components have been integrated into the Generator Workbench. To demonstrate its usability, we also presented four real-world use cases in which the Generator and Workbench are used, covering the e-commerce, logistics, e-government and automotive verticals.

The GeoKnow Generator Workbench v1.1.0 Release Announcement

To demonstrate the GeoKnow software tools, we are developing a Workbench that integrates different components to support users in generating Linked Data from spatial data. Several tools can be used for transforming, authoring, interlinking and visualising spatial data as Linked Data. In this post we introduce the public release of the GeoKnow Generator Workbench, which implements most of the user requirements collected at the beginning of the project and integrates Virtuoso, Limes, TripleGeo, GeoLift, FAGI-gis, Mappify, Facete and Sparqlify.

The Workbench also provides Single Sign-On functionality, user and role management, and data access control for the different users. The Workbench consists of a front-end and a back-end. The front-end provides GUIs for the software components that offer a REST API (LimesService, GeoLiftService and TripleGeoService). Components that provide their own GUI are integrated using containers (FAGI-gis, OntoWiki, Mappify, Sparqlify and the Virtuoso SPARQL query interface). The front-end also provides GUIs for administrative features such as user and role management, data source management and graph management, as well as the Dashboard GUI. The Dashboard gives the user visual feedback on the registered jobs and the status of their executions. The Workbench back-end provides REST interfaces for managing users, roles, graphs, data sources and batch jobs, for retrieving the system configuration, and for importing RDF data. All system information is stored in the Virtuoso RDF store.

Generator Workbench Architecture.

A more detailed description of this Workbench can be found in GeoKnow deliverable D1.4.2, Intermediate release of the GeoKnow Generator. The GeoKnow software, including this Workbench, is open source and available on GitHub. Easy-to-install, preconfigured versions of all GeoKnow software are available as Debian packages in the Linked Data Stack Debian repository.

GeoKnow Generator 2nd Year Releases

The second year of GeoKnow has passed, and we have several new releases to announce. The new software tools include:

FAGI-gis 1.1+rev0
FAGI aims to provide data fusion on the geometries of linked entities.
This latest version provides several optimisations that improve
scalability and efficiency. It also provides a map-based interface
that facilitates the fusion actions through visualisation and
filtering of linked entities.
RDF Data Cube Validation Tool 0.0.1
The validation tool aims to ensure the quality of statistical datasets.
It is based primarily on the integrity constraints defined by the
RDF Data Cube vocabulary, and it can be used to detect violations
of the integrity constraints, identify violating resources, and
fix detected issues. Furthermore, to ensure the proper use of
vocabularies other than the RDF Data Cube vocabulary, it relies
on RDFUnit. It can be configured to work with any SPARQL endpoint,
which needs to be writeable in order to perform fix operations.
If the endpoint is not writeable, the user is given the SPARQL
Update query that performs the fix, so that it can be executed
manually. The main purpose of the tool within the GeoKnow project
is to ensure the quality of input data that is to be processed and
visualized with ESTA-LD.
spring-batch-admin-geoknow 0.0.1
spring-batch-admin-geoknow is the first version of the batch processing
component that serves as the back-end of the Workbench.
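To illustrate what checking a Data Cube integrity constraint and proposing a fix can look like, here is a minimal Python sketch of constraint IC-1 (Unique DataSet: every qb:Observation has exactly one associated qb:DataSet) from the RDF Data Cube specification. It works on an in-memory list of triples rather than a SPARQL endpoint, and the fix (attaching an orphan observation to a dataset chosen by the user) and the `example.org` URIs are hypothetical; this is not the validation tool's own code or fix logic.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB = "http://purl.org/linked-data/cube#"

Triple = Tuple[str, str, str]


def check_ic1(triples: Iterable[Triple]) -> Dict[str, List[str]]:
    """IC-1 (Unique DataSet): every qb:Observation must have exactly one qb:dataSet.

    Returns each violating observation together with the datasets it points to.
    """
    triples = list(triples)
    observations = {s for s, p, o in triples if p == RDF_TYPE and o == QB + "Observation"}
    datasets: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == QB + "dataSet":
            datasets[s].add(o)
    return {obs: sorted(datasets[obs])
            for obs in sorted(observations) if len(datasets[obs]) != 1}


def fix_missing_dataset(observation: str, dataset: str) -> str:
    """Hypothetical fix: a SPARQL Update attaching an orphan observation to a chosen dataset."""
    return (f"PREFIX qb: <{QB}>\n"
            f"INSERT DATA {{ <{observation}> qb:dataSet <{dataset}> . }}")


triples = [
    ("http://example.org/obs1", RDF_TYPE, QB + "Observation"),
    ("http://example.org/obs1", QB + "dataSet", "http://example.org/ds1"),
    ("http://example.org/obs2", RDF_TYPE, QB + "Observation"),  # no dataset: violates IC-1
]

violations = check_ic1(triples)
for obs, found in violations.items():
    if not found:
        # Printed rather than executed, as in the read-only endpoint case.
        print(fix_missing_dataset(obs, "http://example.org/ds1"))
```

Observations linked to two or more datasets are also reported, but no automatic fix is proposed for them here, since choosing which link to drop needs a human decision.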

Besides the brand new components, there are new releases of existing ones, also available as Debian packages:

virtuoso-opensource 7.1.0
Virtuoso 7.1.0 includes improvements in the Engine (SQL Relational Tables and RDF Property/Predicate Graphs); Geo-Spatial support; SPARQL compiler; Jena and Sesame provider performance; JDBC Driver; Conductor CA root certificate management; WebDAV; and the Faceted Browser.
linkedgeodata 0.4.2
The LinkedGeoData package contains scripts and mapping files
for converting spatial data from non-RDF (currently relational)
sources to RDF. OpenStreetMap is so far the best covered data
source. Recently, initial support for GADM and Natural Earth was
added.

  • Added an alternative lgd load script which improves
    throughput by inserting data into a different schema first
    followed by a conversion step.
  • Optimized export scripts by using pbzip, a parallel version of
    bzip2.
  • Added rdfs:isDefinedBy triples providing licence information
    for each resource.
facete2-tomcat7 0.0.1
Facete2 is a web application for exploring (spatial) data in SPARQL
endpoints. It features faceted browsing, auto detection of relations
to spatial data, export, and customization of which data to
show.

  • Context menus are now available in the result view enabling
    one to conveniently visit resources in other browser
    tabs, create facet constraints from selected items and copy
    values into the clipboard.
  • Improved Facete’s autodetection for cases where counting facets
    is infeasible because of the size of the data.
  • Suggestions of resources related to the facet selection that
    can be shown on the map are now sortable by the length
    of the corresponding property path.
facete2-tomcat-common 0.0.1
This package is a helper package and is mainly responsible for
the Facete database setup. There were no significant changes.
sparqlify-tomcat7 0.6.13
This package provides a web admin interface for Sparqlify. The
system supports running different mappings simultaneously under
different context paths. This release brings minor user
interface improvements.
sparqlify-tomcat-common 0.6.13
This package is a helper package and is mainly responsible for
the Sparqlify database setup. There were no significant changes.
sparqlify-cli 0.6.13
This package provides a command line interface for Sparqlify.
Sparqlify is an advanced scalable SPARQL-to-SQL rewriter and the
main engine for the LinkedGeoData project.

  • Fixed some bugs that caused the generation of invalid SQL.
  • Added improvements for aggregate functions that make
    Sparqlify work with Facete.
  • Added initial support for Oracle 11g database.
limes-service 0.5
limes-service has been updated to the latest LIMES library. The main
enhancement this year was refactoring the service to provide a RESTful
interface.
geoknow-generator-ui 1.1.0
The first public release of the GeoKnow Generator Workbench extends
the initial prototype with user and role management, graph access
control management, and process monitoring within a dashboard.
Deer 0.0.1
GeoLift has been renamed to DEER. The functionality provided
by GeoLift has been generalised to support not only geospatial
data but structured data in general.
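The LinkedGeoData and Sparqlify packages above revolve around mapping relational data to RDF (Sparqlify goes further and rewrites SPARQL queries into SQL instead of materializing triples). As a rough illustration of the underlying triplification idea only, here is a Python sketch that applies one hand-written mapping to an in-memory SQLite table and emits N-Triples. The `nodes` table, its columns and the `http://example.org/node/` namespace are hypothetical; this is neither the LinkedGeoData schema nor Sparqlify's mapping language.

```python
import sqlite3
from typing import Iterator

# Hypothetical namespace for generated resources; the vocabulary URIs are standard W3C ones.
BASE = "http://example.org/node/"
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"
GEO_LAT = "<http://www.w3.org/2003/01/geo/wgs84_pos#lat>"
GEO_LONG = "<http://www.w3.org/2003/01/geo/wgs84_pos#long>"
XSD_DOUBLE = "<http://www.w3.org/2001/XMLSchema#double>"


def escape_literal(text: str) -> str:
    # N-Triples string literals must escape backslashes, double quotes and newlines.
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def nodes_to_ntriples(conn: sqlite3.Connection) -> Iterator[str]:
    """A single hand-written mapping: one row of `nodes` becomes one RDF resource."""
    for node_id, name, lat, lon in conn.execute(
            "SELECT id, name, lat, lon FROM nodes ORDER BY id"):
        subject = f"<{BASE}{node_id}>"
        if name is not None:  # NULL columns produce no triple
            yield f'{subject} {RDFS_LABEL} "{escape_literal(name)}" .'
        yield f'{subject} {GEO_LAT} "{lat}"^^{XSD_DOUBLE} .'
        yield f'{subject} {GEO_LONG} "{lon}"^^{XSD_DOUBLE} .'


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT, lat REAL, lon REAL)")
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)",
                 [(1, 'Cafe "Central"', 51.34, 12.37), (2, None, 51.3, 12.4)])

triples = list(nodes_to_ntriples(conn))
print("\n".join(triples))
```

Keeping the mapping declarative (as mapping files do) rather than hard-coding it as above is what lets tools like these reuse one engine across many sources.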

If you need help installing or using the components available as Debian packages in the Linked Data Stack, do not hesitate to join the linkeddatastack Google group and ask there.

Ontos starts project at SECO using GeoKnow Generator

Ontos was selected by SECO as an implementation partner to implement a linked data stack platform. Based on the GeoKnow Generator, Ontos will develop a data management and search platform for managing linked open government data. The GeoKnow Generator will be used as the back-end system that orchestrates the various tools. The first version will implement data triplification and interlinking.

The Linked Data Stack


The Linked Data Stack aims to simplify the deployment and distribution of tools that support the Linked Data life cycle. Moreover, it eases the information flow between components to enhance the end-user experience while harmonising the look and feel. It comprises a number of tools for managing the life cycle of Linked Data. At the moment it consists of two software repositories for distributing Linked Data software components to the developer communities: (1) a Debian repository that provides component installers, so that users can install components directly on Linux servers using the standard packaging tools; and (2) a Maven repository for managing binary software components used for developing, deploying and provisioning.

The Linked Data Stack is a result of the LOD2 EU project, and the GeoKnow team has now officially become its manager. This was announced at the 10th International Conference on Semantic Systems, held on 4 and 5 September 2014 in Leipzig.

If you are a Linked Data user, visit the Linked Data Stack website, where you can find instructions for installing and using the demonstrations, as well as documentation for installing specific components. If you want to contribute your software to the stack, you can also find contribution guidelines there.

W3C Swiss Day and GeoKnow

Ontos is the W3C Switzerland representative and presents the results of the GeoKnow project at the W3C Swiss Day. Approximately 30 people attend the event, which takes place in Fribourg, Switzerland. Daniel Hladky shows the GeoKnow Generator and tools during the talk “Linked Open Data”. A simple scenario based on the online demo server is shown to encourage people and potential customers to use the results of the GeoKnow project. For more details about the event, visit the event home page at http://www.ontos.com/web-25-celebrating-25-years-of-the-web/.

Ontos to present GeoKnow Generator at Swiss Archives on 29.01.2014

Ontos has been invited to present the first-year results of the GeoKnow Generator at the Swiss Archives in Bern on January 29, 2014. The event is organised by the Swiss Archives, and many representatives of the Swiss Government are expected to attend. The objective of the event is to launch the “Open Data” initiative in Switzerland. Jon Jay Le Grange will give a brief overview of the current GeoKnow Generator Framework.

GeoLift – A Spatial Mapping Framework for Enriching RDF Datasets with Geo-spatial Information

Many RDF datasets contain implicit references to geographic data. For example, music datasets such as Jamendo include references to the locations of record labels, places where artists were born or have been, etc. The aim of the spatial mapping component, dubbed GeoLift, is to retrieve this information and make it explicit.

Geographical information can be mentioned in three different ways within Linked Data:

  1. Through dereferencing: Several datasets contain links to datasets with explicit geographical information such as DBpedia or LinkedGeoData. For example, in a music dataset, one might find information such as http://example.org/Leipzig owl:sameAs http://dbpedia.org/resource/Leipzig. We call this type of reference explicit. We can then use the semantics of RDF to fetch geographical information from DBpedia and attach it to the resource in the other ontology, since http://example.org/Leipzig and http://dbpedia.org/resource/Leipzig refer to the same real-world object.
  2. Through linking: It is known that the Web of Data contains an insufficient number of links. The latest estimates suggest that the Linked Open Data Cloud alone consists of 31+ billion triples but contains only approximately 0.5 billion links (i.e., less than 2% of the triples are links between knowledge bases). The second intuition behind GeoLift is thus to use link discovery to map resources in an input knowledge base to resources in a knowledge base that contains explicit geographical information. For example, given a resource http://example.org/Athen, GeoLift should aim to find a resource such as http://dbpedia.org/resource/Athen to map it to. Once the link between the two resources has been established, GeoLift can fall back on the dereferencing approach described above.
  3. Through Natural Language Processing: In some cases, the geographic information is hidden in the objects of datatype properties. For example, some datasets contain biographies, textual abstracts describing resources, comments from users, etc. The idea here is to exploit this information by extracting named entities and keywords using automated Information Extraction techniques. Semantic Web frameworks such as FOX have the main advantage of providing URIs for the keywords and entities that they detect. These URIs can then be linked with the resources to which the datatype properties were attached. Finally, the geographical information can be dereferenced and attached to the resources whose datatype properties were analyzed.
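The first two intuitions can be sketched in a few lines of Python. This is a simplified illustration, not GeoLift's implementation: datasets are in-memory sets of triples with literals kept as plain strings, link discovery is reduced to label similarity with the standard library's `difflib.SequenceMatcher` (a dedicated link discovery framework such as LIMES uses far more efficient and expressive measures), the 0.9 threshold is arbitrary, and the coordinates are approximate, illustrative values. The NLP-based intuition is not covered.

```python
from difflib import SequenceMatcher
from typing import Set, Tuple

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
GEO_LAT = "http://www.w3.org/2003/01/geo/wgs84_pos#lat"
GEO_LONG = "http://www.w3.org/2003/01/geo/wgs84_pos#long"

# Literals are kept as plain Python strings to keep the sketch short.
Triple = Tuple[str, str, str]


def link_by_label(source: Set[Triple], geo_kb: Set[Triple],
                  threshold: float = 0.9) -> Set[Triple]:
    """Intuition 2 (simplified): add owl:sameAs links between resources with similar labels."""
    linked = set(source)
    kb_labels = [(t, o) for t, p, o in geo_kb if p == RDFS_LABEL]
    for s, p, label in source:
        if p != RDFS_LABEL:
            continue
        for target, kb_label in kb_labels:
            if SequenceMatcher(None, label.lower(), kb_label.lower()).ratio() >= threshold:
                linked.add((s, OWL_SAME_AS, target))
    return linked


def dereference(source: Set[Triple], geo_kb: Set[Triple]) -> Set[Triple]:
    """Intuition 1: copy coordinates of owl:sameAs targets onto the source resource."""
    enriched = set(source)
    for s, p, target in source:
        if p != OWL_SAME_AS:
            continue
        for t, p2, value in geo_kb:
            if t == target and p2 in (GEO_LAT, GEO_LONG):
                enriched.add((s, p2, value))
    return enriched


# Illustrative stand-in for a geo knowledge base such as DBpedia (approximate coordinates).
geo_kb = {
    ("http://dbpedia.org/resource/Leipzig", RDFS_LABEL, "Leipzig"),
    ("http://dbpedia.org/resource/Leipzig", GEO_LAT, "51.34"),
    ("http://dbpedia.org/resource/Leipzig", GEO_LONG, "12.37"),
}
music = {("http://example.org/Leipzig", RDFS_LABEL, "leipzig")}

linked = link_by_label(music, geo_kb)
enriched = dereference(linked, geo_kb)
for triple in sorted(enriched):
    print(triple)
```

Dereferencing the unlinked `music` set directly would add nothing; only after linking does the resource gain coordinates, which is exactly why the second intuition feeds the first.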

The idea behind GeoLift is to provide a generic architecture that contains means to exploit these three characteristics of Linked Data. In the following, we present the technical approach underlying GeoLift.

GeoLift Architecture

GeoLift was designed to be a modular tool which can be easily extended and re-purposed. In its first version, it provides two main types of artifacts:

  1. Modules: These artifacts are in charge of generating geographical data based on RDF data. To this end, they implement the three intuitions presented above. The input for such a module is an RDF dataset (in Java, a Jena Model). The output is also an RDF dataset, enriched with geographical information (in Java, an enriched Jena Model).
  2. Operators: The idea behind operators is to enable users to define a workflow for processing their input dataset. Thus, if a user knows the type of enrichment to be carried out (for example, linking followed by dereferencing), they can define the sequence of modules that must be used to process the dataset. Note that all modules share the same input and output format. Thus, users can create workflows of arbitrary complexity by simply connecting modules.
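Because every module maps an RDF model to an enriched RDF model, chaining modules reduces to function composition. The following minimal Python sketch illustrates that idea under stated assumptions: a frozenset of triples stands in for a Jena Model, and the two modules, their lookup tables and the `workflow` helper are hypothetical illustrations rather than GeoLift's Java API (the text notes that operators were still planned for a future version).

```python
from functools import reduce
from typing import Callable, Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]
Model = FrozenSet[Triple]          # stand-in for a Jena Model
Module = Callable[[Model], Model]  # every module maps a model to an enriched model

SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
LAT = "http://www.w3.org/2003/01/geo/wgs84_pos#lat"
LONG = "http://www.w3.org/2003/01/geo/wgs84_pos#long"

# Hypothetical lookup tables standing in for link discovery and a remote geo dataset.
LINKS: Dict[str, str] = {"http://example.org/Leipzig": "http://dbpedia.org/resource/Leipzig"}
COORDS: Dict[str, Tuple[str, str]] = {"http://dbpedia.org/resource/Leipzig": ("51.34", "12.37")}


def linking_module(model: Model) -> Model:
    subjects = {s for s, _, _ in model}
    return model | {(s, SAME_AS, LINKS[s]) for s in subjects if s in LINKS}


def dereferencing_module(model: Model) -> Model:
    extra = set()
    for s, p, o in model:
        if p == SAME_AS and o in COORDS:
            lat, lon = COORDS[o]
            extra |= {(s, LAT, lat), (s, LONG, lon)}
    return model | extra


def workflow(*modules: Module) -> Module:
    """Operator sketch: chain modules left to right. This only works because
    all modules share the same input and output type."""
    def run(model: Model) -> Model:
        return reduce(lambda current, module: module(current), modules, model)
    return run


start: Model = frozenset({("http://example.org/Leipzig",
                           "http://www.w3.org/2000/01/rdf-schema#label", "Leipzig")})
result = workflow(linking_module, dereferencing_module)(start)
print(len(result))  # label + sameAs + lat + long
```

Order matters: `workflow(dereferencing_module, linking_module)` finds no owl:sameAs link when dereferencing runs, so it only adds the link and no coordinates.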

The corresponding architecture is shown below. The input layer reads RDF in different serializations. The enrichment modules are in the second layer and add geographical information to RDF datasets by different means. The operators (which will be implemented in a future version of GeoLift) will combine the enrichment modules and allow defining a workflow for processing information. The output layer serializes the results in different formats. The enrichment procedure will be monitored by a controller, which will also be added in a future version of GeoLift.

Architecture of GeoLift 

Please see http://aksw.org/Projects/GeoLift and https://github.com/GeoKnow/GeoLift for all the technical details and give us feedback. Thank you!

GeoLift poster at the 3rd ESWC Summer School 2013.